04/Projects

Selected work.

A deeper look at the systems I've built — what was broken, the architecture I chose, and what changed because of it.

// Project 01 · 2022–2024

Enterprise Cloud Modernization Program

// Headline number

~1 TB

processed / day

// Architecture

live · 9 nodes · 9 edges

// Active jobs · 2 running · 1 queued

RUN_3421 · ingest_users_mysql · 24s
RUN_3422 · transform_price_history · 12s
RUN_3423 · load_to_bigquery · 3s
RUN_3424 · backfill_alteryx_exports

Company-wide migration from Alteryx / MySQL to a cloud-native GCP stack. Started as a POC, became the production architecture of a $100M+ org.

// The problem

Alteryx / MySQL couldn't keep up. ETL ran for hours. No visibility. Engineering bottleneck for every new pipeline.

// My approach

Metadata-driven GCP architecture: BigQuery + Cloud Composer + Dataproc + PySpark, all IaC'd with Terraform. Analysts add pipelines via config — no engineering tickets. Incremental migration, production never stopped.
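A minimal sketch of the "pipelines via config" idea, not the production code. It assumes a pipeline is described by a small dictionary and expanded into an ordered task plan; every name here (`PipelineSpec`, `parse_spec`, `plan_tasks`, the config keys and modes) is hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical config contract: these keys and modes are assumptions.
REQUIRED_KEYS = {"name", "source_table", "target_table", "schedule"}
ALLOWED_MODES = {"incremental", "full"}


@dataclass(frozen=True)
class PipelineSpec:
    """One pipeline as an analyst would declare it in config."""
    name: str
    source_table: str
    target_table: str
    schedule: str
    mode: str = "incremental"


def parse_spec(raw: dict) -> PipelineSpec:
    """Validate a raw config entry and turn it into a typed spec."""
    missing = REQUIRED_KEYS - raw.keys()
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")
    mode = raw.get("mode", "incremental")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return PipelineSpec(
        name=raw["name"],
        source_table=raw["source_table"],
        target_table=raw["target_table"],
        schedule=raw["schedule"],
        mode=mode,
    )


def plan_tasks(spec: PipelineSpec) -> list[str]:
    """Expand a spec into ordered task ids (extract, transform, load)."""
    return [f"{spec.name}.{step}" for step in ("extract", "transform", "load")]
```

In a setup like this, an orchestrator would loop over the validated specs and create one DAG per entry, so adding a pipeline means adding a config row rather than writing new code.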

// Stack

gcp · bigquery · airflow · cloud-composer · pyspark · terraform · dataproc · gcs · python · cloud-migration · alteryx · mysql

// Outcome

  • Runtime cut from hours to minutes
  • ~1 TB processed daily
  • Right-sized Dataproc clusters reduced cost
  • Self-service onboarding for non-engineering teams

// Project 02 · 2023–2024

Cortex

// Headline number

~90%

routine investigations · self-serve

// Operations app

live · cortex · pipeline ops · v22

DAGS · 4 dags

user_etl · 142ms
price_sync
refresh_cache
tenant_metrics · 89ms

Single operational interface for DAGs, BigQuery state, logs, and metadata. React on App Engine. Ops teams self-serve without engineering.

// The problem

Engineering owned all ops visibility. Business teams filed tickets to see their own data — slow, frustrating, trust-eroding.

// My approach

React + App Engine app pulling DAGs, BigQuery state, MySQL metadata, and logs into one view. Real-time updates, structured search, one-click validation, rerun / skip / override without engineering tickets.
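A rough sketch of the "one view" idea: merge per-DAG state from several sources into a single row per DAG. The function name, field names, and the rule for when a rerun is offered are all assumptions made for illustration; the real app's data model is not shown here.

```python
from __future__ import annotations


def build_ops_view(
    dag_states: dict[str, str],
    rows_loaded: dict[str, int],
    last_errors: dict[str, str],
) -> list[dict]:
    """Join scheduler state, warehouse row counts, and log errors by DAG id."""
    view = []
    for dag_id in sorted(dag_states):
        state = dag_states[dag_id]
        view.append(
            {
                "dag_id": dag_id,
                "state": state,
                # Missing entries stay None so gaps are visible, not hidden.
                "rows_loaded": rows_loaded.get(dag_id),
                "last_error": last_errors.get(dag_id),
                # Assumed rule: only failed runs offer a rerun action.
                "can_rerun": state == "failed",
            }
        )
    return view
```

Keeping the merge in one pure function makes the view easy to test without touching the scheduler, the warehouse, or the log store.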

// Stack

app-engine · react · python · bigquery · mysql · gcs · airflow · dags · gcp · internal-tools · metadata-driven · metadata-management

// Outcome

  • Operational tickets dropped sharply
  • Business teams self-serve ~90% of routine investigations
  • Mean-time-to-detect on data issues materially improved
  • Built and maintained as a single-engineer project

// Project 03 · 2021–2022

Automated Mover Modeling System

// Headline number

100s

models trained in parallel

// ML pipeline · stage 1/6

running

Split data: train · test · val (train 70%)

8,420 rows · stratified split

Metadata-driven ML platform — parallel training, retraining, and scoring of 100s of models on PySpark + Airflow, controlled from a UI.

// The problem

Hundreds of models, all running from notebooks — slow, brittle, no audit trail, no reproducibility.

// My approach

PySpark + Airflow platform driven by metadata config. Imbalance correction, parallel training, versioning, scoring. Every run reproducible from logged config. Internal teams add or modify clients from a UI — no engineering involvement.
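Two of the ideas above in miniature, as a sketch under stated assumptions. Reproducibility is shown as a stable fingerprint of the run config, and imbalance correction as inverse-frequency class weights. Class weighting is only one common technique and is swapped in here for illustration; the platform's actual method is not specified. The names `config_fingerprint` and `balanced_class_weights` are hypothetical.

```python
import hashlib
import json
from collections import Counter


def config_fingerprint(config: dict) -> str:
    """Return a short, stable id for a run config.

    Keys are sorted before hashing, so two configs with the same content
    get the same id regardless of key order.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def balanced_class_weights(labels: list) -> dict:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}
```

Logging the fingerprint next to each trained model gives a cheap way to tell whether two runs used the same config, which is the property an audit trail needs.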

// Stack

machine-learning · pyspark · airflow · metadata-driven · model-orchestration · imbalanced-classification · retraining · scoring

// Outcome

  • Hundreds of models trained in parallel
  • Manual notebook work eliminated
  • Imbalance correction by default
  • Reproducibility through metadata-versioned configs

// Project 04 · 2020–2022

ROI and QBR Reporting Automation

// Headline number

€60k

annual time savings unlocked

// Report sheet · Q4 ROI

computing

fx =SUM(B2:E4) · xlsx · auto-saved

    A            B      C      D      E
1                Q1     Q2     Q3     Q4
2   ACME CORP    42k    58k    68k    74k
3   BETA INC     38k    45k    52k    60k
4   GAMMA LTD    28k    33k    41k    48k
5   TOTAL        0k     0k     0k     0k

ROI reports cut from minutes / hours to seconds. QBR PowerPoint decks built in under a minute.

// The problem

ROI reports: 30 min to hours, formatting errors. QBR decks: full day of slide assembly. Account managers buried in formatting.

// My approach

Django + React + BigQuery. ROI reports on-demand in seconds. QBR engine reads analytics tables and writes branded PowerPoint slides — typography, charts, narrative — programmatically.
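A small sketch of the step that turns analytics rows into slide content, assuming the deck is first built as plain data and only then rendered. The `Slide` type, the function names, and the bullet wording are hypothetical. Rendering to an actual .pptx file (for example with a library such as python-pptx) is left out, and the sample figures are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Slide:
    """Renderer-agnostic slide content: a title plus bullet lines."""
    title: str
    bullets: list[str] = field(default_factory=list)


def growth_pct(first: float, last: float) -> float:
    """Percentage change from the first to the last value, to one decimal."""
    return round((last - first) / first * 100, 1)


def build_account_slide(account: str, quarterly_k: list[float]) -> Slide:
    """Summarize one account's quarterly values (in thousands) as a slide."""
    total = sum(quarterly_k)
    growth = growth_pct(quarterly_k[0], quarterly_k[-1])
    return Slide(
        title=f"{account}: quarterly review",
        bullets=[
            f"Total for the year: {total:g}k",
            f"Q1 to Q4 change: {growth:+.1f}%",
        ],
    )


# Illustrative input only.
slide = build_account_slide("ACME CORP", [42, 58, 68, 74])
```

Separating content from rendering keeps the narrative logic testable on its own, and the same `Slide` objects could feed a PowerPoint writer or an on-screen preview.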

// Stack

reporting · automation · roi · qbr · powerpoint · django · react · bigquery · python · analytics

// Outcome

  • ROI generation: 30 min to hours → seconds
  • QBR PowerPoint: full day → under a minute
  • ~€60k annual time savings unlocked
  • Account teams refocus on analysis, not assembly