London-based Data Engineer and Data Scientist. First Class BSc, Birkbeck, University of London.
I'm a London-based data professional with a First Class BSc in Data Science & Computing from Birkbeck, University of London. Studying part-time while building real-world projects shaped how I approach engineering: deliberately, methodically, and with a strong bias for systems that hold up under pressure.
My work is grounded in data engineering and analytics engineering: building pipelines and systems that are reliable, observable, and produce data that people can trust and act on. I specialise in the full pipeline lifecycle, from raw ingestion and orchestration (Airflow, CDC with Debezium/Redpanda) to transformation and warehousing (dbt, DuckDB, BigQuery), data quality enforcement (Great Expectations), and lineage tracking (OpenLineage).
On the data science side, I apply machine learning where there's genuine predictive value: financial time-series forecasting with LSTM networks, statistical modelling, and rigorous model evaluation. I treat ML as an engineering discipline — reproducible workflows, honest backtesting, and results you can stand behind.
I work across finance, sport, and urban/environmental data, domains where data quality and pipeline reliability have real consequences. My projects span TfL real-time data, Premier League and F1 analytics warehouses, London air quality monitoring, and equity price forecasting.
First Class Honours · BSc Data Science & Computing · Birkbeck, University of London
Designed and maintained production-grade end-to-end data pipelines to ingest, clean, and model member and event engagement data using Python, SQL, dbt, and Apache Airflow. This cut data latency from 6 hours to 45 minutes (an ~87% reduction) and ensured 95% of loads met a 1-hour freshness SLA.
Automated data collection and reporting workflows for workshops, hackathons, and startup events, eliminating ~65% of manual reporting tasks and shortening time-to-insight for weekly stakeholder updates.
Developed self-service dashboards and reporting packages highlighting KPIs — attendance trends, participation rates, and program impact — reducing manual reporting by ~18 hours per month.
Partnered with cross-functional leaders to establish metrics and enforce reporting SLAs; achieved 99% on-time weekly delivery with end-to-end pipeline latency under 60 minutes at the 95th percentile, contributing to a 22% increase in attendance for flagship programs.
Provided structured 1-on-1 mentoring through the Caawi Mentorship Platform, guiding cohorts of ~12 early-career candidates in data engineering, portfolio development, and job readiness.
Pre-processed large-scale, multi-source datasets using Python and SQL, implementing deduplication and schema validation to reduce data defects by 30% and cut data preparation time from 10 hours to 5 hours per reporting cycle.
Automated data extraction and established daily refresh pipelines, cutting stakeholder turnaround from 3 days to same-day delivery (a ~67% reduction).
Conducted comprehensive EDA to identify key operational drivers (lead-time variance, fulfilment delays, returns, and margin leakage), producing 12+ actionable recommendations that stakeholders adopted.
Engineered features such as lateness flags and supplier segments, improving separation between high- and low-performing suppliers by 15%.
Built interactive Power BI KPI dashboards for real-time monitoring, reducing manual reporting by 16 hours per month, growing adoption to 25+ users across operations and commercial teams, and contributing to an 8% reduction in logistics costs and 12% increase in on-time delivery.