· mdcbowen

Cymer/ASML

As the lead data engineer on a high-visibility engagement with Cymer (a division of ASML), I ran a structured technology evaluation of Amazon Redshift as a candidate to replace a legacy Microsoft SQL Server (MSSQL) data warehouse. The client provided an extensive test plan, including performance benchmarks and reproducibility criteria. I was responsible for implementing the plan and for interpreting its results.

To meet these objectives, I designed and built a repeatable test harness in Ruby, generating SQL from ERB templates to simulate a complex suite of analytic workloads. The framework supported over 800 unique scenarios, each a distinct combination of table structure, data volume, indexing strategy, and query style. These scenarios were validated against both the MSSQL and Redshift environments.
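A minimal sketch of how a scenario matrix like this can be generated with ERB, assuming three illustrative dimensions. The dimension names, table names, and columns below are hypothetical placeholders, not the client's actual test plan, and the real harness covered far more permutations.

```ruby
require "erb"

# Hypothetical test dimensions; the actual axes and values of the client's
# test plan are not given in the text above.
DIMENSIONS = {
  table:     %w[events_wide events_narrow],
  row_count: [1_000_000, 100_000_000],
  style:     %w[aggregate join]
}.freeze

# One ERB template per query style. Table and column names are illustrative.
TEMPLATES = {
  "aggregate" => ERB.new(<<~SQL),
    SELECT device_id, COUNT(*) AS n
    FROM <%= table %>_<%= row_count %>
    GROUP BY device_id;
  SQL
  "join" => ERB.new(<<~SQL)
    SELECT d.site, COUNT(*) AS n
    FROM <%= table %>_<%= row_count %> e
    JOIN devices d ON d.device_id = e.device_id
    GROUP BY d.site;
  SQL
}.freeze

# Expands the dimensions into one hash per scenario (their Cartesian product).
def scenarios(dimensions)
  names = dimensions.keys
  first, *rest = dimensions.values
  first.product(*rest).map { |values| names.zip(values).to_h }
end

# Renders the SQL for one scenario; each hash key becomes a template variable.
def render_sql(scenario)
  TEMPLATES.fetch(scenario[:style]).result_with_hash(scenario)
end

matrix = scenarios(DIMENSIONS)
puts "#{matrix.size} scenarios"
puts render_sql(matrix.first)
```

Keeping the dimensions in one hash means a new axis (for example, a compression encoding) multiplies the scenario count without touching the expansion logic; only the templates that use the new variable need to change.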

Key elements of my contribution included:

Data Modeling & Loading: Automated generation of synthetic data at scale using Ruby scripts and bulk loaders tailored to each platform. On Redshift, I implemented parallel load strategies using COPY with optimized file slicing, compression, and distribution key tuning.
Query Test Framework: Engineered a flexible and fully parameterized system for templated SQL queries, enabling systematic testing of performance differences across schema configurations (e.g., row vs. columnar layout, sort/dist keys, compression encodings).
Benchmarking & Metrics Capture: Developed instrumentation to track query runtime, memory usage, and I/O behavior. Benchmarks were executed in isolated environments to preserve consistency and logged to flat files for post-analysis.
Platform Diagnostics: Leveraged Redshift’s system tables and views (e.g., STL_QUERY, SVL_QLOG, STL_ALERT_EVENT_LOG) to profile execution plans and identify bottlenecks such as skewed joins, ineffective distribution, or unnecessary vacuuming.
Recommendations & Reporting: Authored a detailed performance comparison and architectural suitability report that advised the client on Redshift’s capabilities and limitations in relation to their workloads, including factors like concurrency handling, sort key tuning, and ETL trade-offs.
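The runtime capture and flat-file logging described above could be sketched as follows. This is an illustrative outline under stated assumptions, not the original instrumentation: the method names, record fields, and tab-separated layout are hypothetical, the block passed to `time_scenario` stands in for executing a query on the target platform, and only wall-clock time is measured (memory and I/O metrics would come from each platform's own tooling).

```ruby
require "tmpdir"

# Runs the given block `runs` times and returns one timing record per run.
# The block is a stand-in for "execute this scenario's SQL on the platform".
def time_scenario(scenario_id, platform, runs: 3)
  Array.new(runs) do |i|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    { scenario_id: scenario_id, platform: platform, run: i + 1, seconds: elapsed }
  end
end

# Appends records to a tab-separated flat file, writing the header only once.
def append_results(path, records)
  return if records.empty?

  write_header = !File.exist?(path) || File.zero?(path)
  File.open(path, "a") do |file|
    file.puts(records.first.keys.join("\t")) if write_header
    records.each do |record|
      cells = record.values.map { |v| v.is_a?(Float) ? format("%.6f", v) : v }
      file.puts(cells.join("\t"))
    end
  end
end

# Usage: time a placeholder workload twice and log it to a temporary file.
Dir.mktmpdir do |dir|
  log_path = File.join(dir, "timings.tsv")
  records = time_scenario("scenario_001", "redshift", runs: 2) { sleep 0.01 }
  append_results(log_path, records)
  puts File.read(log_path)
end
```

A monotonic clock is used so that system clock adjustments during a long benchmark run cannot distort the measured durations.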

This project demonstrated my ability to connect engineering rigor with business decision-making. It required advanced SQL skills, deep familiarity with cloud-native analytical databases, and the ability to rapidly prototype and iterate on complex testing pipelines under client-defined constraints. My final deliverables were praised for their clarity, technical depth, and actionability, and influenced strategic data infrastructure decisions at Cymer.