Projects

Projects that ship.

Real systems built to production standards — pipelines, warehouses, ML workflows, and data science work across finance, sport, and urban data.

Data Engineering · Real-time · ⭐ 3

TfL Real-Time Lakehouse

Real-time London transport data on your laptop. Airflow ingests TfL bus and tube arrivals and stores them in Parquet. dbt and DuckDB handle transformation, Great Expectations validates every batch, and OpenLineage provides full data lineage tracking — a completely observable, production-pattern lakehouse with zero cloud dependency.

Python · Apache Airflow · dbt · DuckDB · Parquet · Great Expectations · OpenLineage · TfL API
Data Engineering · CDC

StreamShop CDC Analytics Stack

Production-style Change Data Capture pipeline: Postgres → Debezium → Redpanda (Kafka-compatible API) → ClickHouse. Includes a synthetic e-commerce event generator, Python CDC sink, and dbt models with tests for analytical queries on captured changes.

Python · Debezium · Redpanda · ClickHouse · dbt · PostgreSQL
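
As a rough illustration of what a Python CDC sink does with captured changes, the sketch below replays Debezium-style change events onto an in-memory table. It assumes the default Debezium envelope fields (`op`, `before`, `after`) and a primary key column named `id`. The `apply_change` helper and the sample order rows are hypothetical and are not taken from the StreamShop repository.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def apply_change(table: Dict[int, Row], event: Dict[str, Any]) -> None:
    """Apply one Debezium-style change event to an in-memory table keyed by `id`."""
    op: str = event["op"]
    before: Optional[Row] = event.get("before")
    after: Optional[Row] = event.get("after")

    if op in ("c", "r", "u"):
        # create, snapshot read, update: upsert the new row image
        if after is None:
            raise ValueError(f"op {op!r} requires an 'after' image")
        table[after["id"]] = after
    elif op == "d":
        # delete: drop the row identified by the old row image
        if before is None:
            raise ValueError("op 'd' requires a 'before' image")
        table.pop(before["id"], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


# Hypothetical events for an `orders` table (illustrative values only).
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "pending", "total": 20.0}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "pending", "total": 35.5}},
    {
        "op": "u",
        "before": {"id": 1, "status": "pending", "total": 20.0},
        "after": {"id": 1, "status": "shipped", "total": 20.0},
    },
    {"op": "d", "before": {"id": 2, "status": "pending", "total": 35.5}, "after": None},
]

orders: Dict[int, Row] = {}
for event in events:
    apply_change(orders, event)

print(orders)
```

Replaying the events in order leaves only order 1, in its updated state. A real sink would more likely append each change to an analytical table and resolve the latest row version at query time. The upsert here just keeps the sketch short.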
Data Engineering · Environmental

London Air Quality & Weather Lakehouse

Local-first lakehouse ingesting OpenAQ air quality and Open-Meteo weather data. Airflow orchestrates ingestion, raw JSON lands in MinIO (S3-compatible), and dbt applies incremental models with full validation in PostgreSQL.

Python · Apache Airflow · dbt · PostgreSQL · MinIO (S3) · OpenAQ API
Analytics Engineering · Sports

F1 RaceOps Analytics Warehouse

Formula 1 analytics warehouse built on an Ergast-compatible schema, converting raw race timing data into KPIs for pit stop performance, strategy effectiveness, and reliability. Dimensional modelling with dbt and full CI via GitHub Actions.

Python · dbt · SQL · GitHub Actions · Ergast API
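
To give a feel for the kind of KPI a dimensional model makes easy, here is a small sketch of a pit stop query over one fact table and one dimension table, run with Python's built-in `sqlite3`. The table names (`fact_pit_stop`, `dim_constructor`), the columns, and every value are made up for illustration. They are not the warehouse's actual dbt models or real race data.

```python
import sqlite3

# In-memory database holding a hypothetical star schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_constructor (
        constructor_id INTEGER PRIMARY KEY,
        name TEXT
    );
    CREATE TABLE fact_pit_stop (
        race_id INTEGER,
        constructor_id INTEGER REFERENCES dim_constructor (constructor_id),
        stop_number INTEGER,
        duration_ms INTEGER
    );
    INSERT INTO dim_constructor VALUES (1, 'Team A'), (2, 'Team B'), (3, 'Team C');
    INSERT INTO fact_pit_stop VALUES
        (1, 1, 1, 2300), (1, 1, 2, 2500),
        (1, 2, 1, 2600), (1, 2, 2, 3400),
        (1, 3, 1, 2800);
    """
)

# KPI: number of stops and mean stationary time per constructor, fastest first.
kpi_rows = conn.execute(
    """
    SELECT c.name, COUNT(*) AS stops, AVG(f.duration_ms) AS avg_duration_ms
    FROM fact_pit_stop AS f
    JOIN dim_constructor AS c USING (constructor_id)
    GROUP BY c.name
    ORDER BY avg_duration_ms
    """
).fetchall()

for name, stops, avg_ms in kpi_rows:
    print(f"{name}: {stops} stops, {avg_ms:.0f} ms on average")
```

In a dbt project the `SELECT` would typically live in a mart model, with tests on the keys it joins on.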
Analytics Engineering · Football

Premier League Analytics Warehouse

End-to-end football analytics warehouse on BigQuery, built with dbt and GitHub Actions CI. Covers Premier League seasons 2014–15 to 2024–25 from openfootball JSON, modelled into analytics-ready dimensional tables.

dbt · BigQuery · GitHub Actions · SQL · Python
Data Engineering · Lakehouse

Mini Lake — dbt + DuckDB

Local-first, portable lakehouse demo using dbt and DuckDB. Features versioned seeds, staging-to-marts models, schema tests, auto-generated documentation, and a GitHub Actions CI example — ideal for learning and prototyping analytics engineering patterns.

dbt · DuckDB · Python · GitHub Actions · SQL
Machine Learning · Finance

Stock Market Prediction with LSTM Networks

End-to-end ML pipeline for financial time-series forecasting using LSTM deep learning. Achieves an R² of 0.95, mean absolute error of $2.4, and 87% directional accuracy. Covers data ingestion, feature engineering (technical indicators, rolling stats, lag features), LSTM architecture design, hyperparameter tuning, and rigorous backtesting methodology.

Python · TensorFlow / Keras · LSTM · Pandas · NumPy · Scikit-learn · Matplotlib
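
Two of the ideas above, lag features and directional accuracy, can be sketched in a few lines of plain Python. This is a minimal illustration under assumptions: the helper names, the toy price series, and the predictions are hypothetical, and directional accuracy is taken here to mean the share of steps where the predicted and actual moves from the previous price share a sign. The project itself may define it differently.

```python
from typing import List, Sequence, Tuple


def make_lag_windows(
    series: Sequence[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn a series into (window of n_lags past values, next value) pairs."""
    windows: List[List[float]] = []
    targets: List[float] = []
    for t in range(n_lags, len(series)):
        windows.append(list(series[t - n_lags:t]))
        targets.append(series[t])
    return windows, targets


def directional_accuracy(
    previous: Sequence[float], actual: Sequence[float], predicted: Sequence[float]
) -> float:
    """Share of steps where predicted and actual moves have the same sign."""
    if not actual:
        raise ValueError("need at least one observation")
    hits = sum(
        1
        for prev, act, pred in zip(previous, actual, predicted)
        if (act - prev) * (pred - prev) > 0
    )
    return hits / len(actual)


# Toy closing prices and hypothetical model outputs (not real results).
prices = [100.0, 101.0, 100.5, 102.0, 103.0, 102.5]
X, y = make_lag_windows(prices, n_lags=2)
last_known = [window[-1] for window in X]  # price just before each target
predictions = [100.8, 101.0, 101.5, 102.8]

print(X[0], y[0])
print(directional_accuracy(last_known, y, predictions))
```

Each window contains only values that precede its target, which is the property a backtest relies on to avoid look-ahead leakage.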
Data Science · ML

Missing Data — Imputation Techniques

Interactive notebook explaining the MCAR, MAR, and MNAR missingness mechanisms, NumPy masked arrays, and modern imputation with scikit-learn's KNN and iterative (MICE-style) imputers. Includes reproducible experiments and visual analysis.

Python · Scikit-learn · NumPy · Jupyter
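
The difference between the three mechanisms is easiest to see by generating them. The sketch below is a standard-library illustration, not code from the notebook: the age and income data, the cutoffs, and the probabilities are all hypothetical. Income goes missing completely at random (MCAR), depending only on the observed age (MAR), or depending on the income value itself (MNAR).

```python
import random
from typing import Dict, List, Sequence

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical data: age is always observed, income may go missing.
ages = [rng.randint(20, 70) for _ in range(1000)]
incomes = [20_000 + 800 * (age - 20) + rng.gauss(0, 5_000) for age in ages]
true_mean = sum(incomes) / len(incomes)


def mcar_mask(n: int, p: float) -> List[bool]:
    """MCAR: every value is missing with the same probability p."""
    return [rng.random() < p for _ in range(n)]


def mar_mask(observed_ages: Sequence[int], p: float, cutoff: int) -> List[bool]:
    """MAR: income can only go missing when the observed age is at or above cutoff."""
    return [age >= cutoff and rng.random() < p for age in observed_ages]


def mnar_mask(values: Sequence[float], p: float, cutoff: float) -> List[bool]:
    """MNAR: income can only go missing when the income itself is at or above cutoff."""
    return [value >= cutoff and rng.random() < p for value in values]


def observed_mean(values: Sequence[float], mask: Sequence[bool]) -> float:
    """Mean of the values that are not masked out (complete-case mean)."""
    kept = [v for v, missing in zip(values, mask) if not missing]
    return sum(kept) / len(kept)


masks: Dict[str, List[bool]] = {
    "MCAR": mcar_mask(len(incomes), 0.3),
    "MAR": mar_mask(ages, 0.6, 50),
    "MNAR": mnar_mask(incomes, 0.6, true_mean),
}

print(f"true mean: {true_mean:.0f}")
for name, mask in masks.items():
    print(f"{name}: observed mean {observed_mean(incomes, mask):.0f}")
```

In this setup the complete-case mean under MNAR cannot exceed the true mean, because only above-average incomes are ever removed. Under MCAR it should stay close to the true mean.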
Data Science · Statistics

Chebyshev, LLN & CLT — Probability Fundamentals

Notebook exploring core probability theory: binomial distributions, the Law of Large Numbers, the Central Limit Theorem, and Chebyshev's inequality with Uniform/Normal/Gamma tail comparisons — all implemented and visualised from scratch.

Python · NumPy · Matplotlib · Jupyter
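
Chebyshev's inequality states that for any distribution with finite variance, P(|X − μ| ≥ kσ) ≤ 1/k². A quick way to see how loose that bound is for particular distributions is to compare it with empirical tail fractions. The sketch below uses only the standard library. The sample size, seed, and distribution parameters are arbitrary choices for illustration and are not taken from the notebook.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

rng = random.Random(0)  # fixed seed for reproducibility


def tail_fraction(samples: List[float], k: float) -> float:
    """Fraction of samples at least k standard deviations from the sample mean."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)  # population std of the sample itself
    return sum(abs(x - mu) >= k * sigma for x in samples) / len(samples)


samplers: Dict[str, Callable[[], float]] = {
    "Uniform(0, 1)": lambda: rng.uniform(0.0, 1.0),
    "Normal(0, 1)": lambda: rng.gauss(0.0, 1.0),
    "Gamma(2, 1)": lambda: rng.gammavariate(2.0, 1.0),
}

# (distribution, k) -> (empirical tail fraction, Chebyshev bound)
results: Dict[Tuple[str, float], Tuple[float, float]] = {}
for name, draw in samplers.items():
    samples = [draw() for _ in range(20_000)]
    for k in (1.5, 2.0, 3.0):
        results[(name, k)] = (tail_fraction(samples, k), 1.0 / k**2)

for (name, k), (tail, bound) in results.items():
    print(f"{name:14s} k={k}: tail={tail:.4f}  bound={bound:.4f}")
```

Because the mean and standard deviation are computed from the sample itself, the bound holds exactly for each empirical distribution, not just approximately.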
Data Science · Statistics

Linear Regression from Scratch

Implements OLS linear regression from the ground up — slope/intercept derivation, loss function, gradient descent, and model evaluation with visualisation. Demonstrates core ML foundations without library abstractions.

Python · NumPy · Matplotlib · Jupyter
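
For a single feature, the closed-form OLS solution is slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄, and gradient descent on the mean squared error should converge to the same line. The sketch below checks that in plain Python. The function names, toy data, learning rate, and epoch count are illustrative assumptions, not the notebook's own code.

```python
from typing import Sequence, Tuple


def fit_closed_form(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """OLS slope and intercept from the normal equations for one feature."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def mse(x: Sequence[float], y: Sequence[float], slope: float, intercept: float) -> float:
    """Mean squared error of the line on the data."""
    return sum((slope * xi + intercept - yi) ** 2 for xi, yi in zip(x, y)) / len(x)


def fit_gradient_descent(
    x: Sequence[float], y: Sequence[float], lr: float = 0.02, epochs: int = 5000
) -> Tuple[float, float]:
    """Minimise the MSE with batch gradient descent, starting from zero."""
    n = len(x)
    slope, intercept = 0.0, 0.0
    for _ in range(epochs):
        errors = [slope * xi + intercept - yi for xi, yi in zip(x, y)]
        grad_slope = (2.0 / n) * sum(e * xi for e, xi in zip(errors, x))
        grad_intercept = (2.0 / n) * sum(errors)
        slope -= lr * grad_slope
        intercept -= lr * grad_intercept
    return slope, intercept


# Toy data scattered around y = 2x + 1 (made-up values).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

cf_slope, cf_intercept = fit_closed_form(xs, ys)
gd_slope, gd_intercept = fit_gradient_descent(xs, ys)

print(f"closed form:      slope={cf_slope:.4f}, intercept={cf_intercept:.4f}")
print(f"gradient descent: slope={gd_slope:.4f}, intercept={gd_intercept:.4f}")
print(f"MSE at closed-form fit: {mse(xs, ys, cf_slope, cf_intercept):.4f}")
```

The learning rate is picked to be stable for this unscaled toy input. With features on larger scales, standardising first usually makes gradient descent much easier to tune.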

Earlier & exploratory projects.

Systems design, UI prototyping, and foundational programming work.


Systems Design
Fitness Gym Systems Design
UML diagrams, CRC cards, UI flows, and a prototype for a gym information system — full systems analysis deliverable.
Frontend
Fitness Gym UI Prototype
Framework-free JavaScript SPA for a gym management UI. Hash routing, responsive styles, Home/Classes/Trainers/Memberships/Dashboard/Sign-in pages — all in a single file.
Java · OOP
Wallet Polymorphism Demo
Java OOP demo illustrating inheritance, method overriding, and polymorphism through a Wallet holding CallingCard, IDCard, and DriverLicense types.
Cryptography
Casual Coded Correspondence
Caesar and Vigenère cipher implementation in a Jupyter Notebook, including a brute-force Caesar solver. Classical cryptography as a teaching exercise.
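
For readers unfamiliar with these ciphers, here is a compact sketch of both, plus a brute-force Caesar solver that simply tries all 26 shifts. It is an independent illustration, not the notebook's code: the function names and sample messages are hypothetical, and only ASCII letters are shifted, with everything else passed through unchanged.

```python
from typing import Dict


def caesar_shift(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift` places, preserving case and other characters."""
    shifted = []
    for ch in text:
        if "a" <= ch <= "z":
            shifted.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            shifted.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            shifted.append(ch)
    return "".join(shifted)


def brute_force_caesar(ciphertext: str) -> Dict[int, str]:
    """Return every candidate plaintext, keyed by the shift assumed for encryption."""
    return {shift: caesar_shift(ciphertext, -shift) for shift in range(26)}


def vigenere(text: str, key: str, decrypt: bool = False) -> str:
    """Vigenère cipher: a Caesar shift per letter, cycling through the key's letters.

    Assumes `key` is non-empty and contains only ASCII letters.
    """
    key_shifts = [ord(k) - ord("a") for k in key.lower()]
    out = []
    used = 0  # number of key letters consumed so far
    for ch in text:
        if ch.isascii() and ch.isalpha():
            shift = key_shifts[used % len(key_shifts)]
            out.append(caesar_shift(ch, -shift if decrypt else shift))
            used += 1
        else:
            out.append(ch)
    return "".join(out)


secret = caesar_shift("Meet me at noon", 3)
candidates = brute_force_caesar(secret)
print(secret)
print(candidates[3])

vig = vigenere("ATTACKATDAWN", "LEMON")
print(vig, vigenere(vig, "LEMON", decrypt=True))
```

The brute-force solver returns all candidates and leaves choosing the readable one to the caller. Scoring candidates by letter frequency would be a natural next step.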
Java · OOP
Storage Unit Manager
Java mini-project for a self-storage unit system. Item and SelfStorageUnit classes with store, manage, and analyse functionality, tested with a driver class.
All source code on GitHub
15 public repositories · github.com/aosman101