Technical Stack

What I work with.

Tools chosen for what they're genuinely good at — not for résumé padding. Each one earns its place.

My stack spans the full data engineering lifecycle, from real-time ingestion and CDC pipelines to warehouse modelling, BI, data quality, and ML workflows. These are tools I use actively and understand in depth, built up across production-style projects and professional work.
01 / Programming Languages
Languages
Python
SQL
Java
R
Bash / Shell
JavaScript
Jinja (dbt templating)
02 / Data Engineering
Pipelines & Orchestration
Apache Airflow
Apache Spark
dbt Core / dbt Cloud
Databricks
Airbyte / Fivetran (EL)
Docker
Terraform (IaC)
ELT / ETL design
Incremental loading (sketch below)
Schema enforcement
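
To make the incremental-loading item concrete: a minimal sketch of the high-watermark pattern, using DuckDB because it runs anywhere. Every table and column name here (src_events, dwh_events, updated_at) is invented for the example.

    import duckdb

    con = duckdb.connect()  # in-memory for the sketch; a file path in practice

    # Toy source table standing in for an upstream extract.
    con.execute("""
        CREATE TABLE src_events AS
        SELECT * FROM (VALUES
            (1, TIMESTAMP '2024-01-01 10:00:00'),
            (2, TIMESTAMP '2024-01-02 10:00:00')
        ) AS t(event_id, updated_at)
    """)

    # Empty target with the same schema.
    con.execute("CREATE TABLE dwh_events AS SELECT * FROM src_events WHERE 1 = 0")

    def load_increment():
        # High-watermark: only pull rows newer than what the target already holds.
        watermark = con.execute(
            "SELECT COALESCE(MAX(updated_at), TIMESTAMP '1970-01-01') FROM dwh_events"
        ).fetchone()[0]
        con.execute(
            "INSERT INTO dwh_events SELECT * FROM src_events WHERE updated_at > ?",
            [watermark],
        )

    load_increment()   # first run loads both rows
    load_increment()   # second run is a no-op: nothing newer than the watermark
    print(con.execute("SELECT COUNT(*) FROM dwh_events").fetchone()[0])  # -> 2

The same logic scales up directly: dbt's incremental materializations are this pattern with better ergonomics.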
03 / Streaming & CDC
Real-time & Event-driven
Debezium (CDC)
Redpanda (Kafka API)
Apache Kafka
Apache Flink (concepts)
Event streaming patterns
Change data capture design (sketch below)
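
A sketch of what consuming Debezium change events looks like, assuming kafka-python and JSON envelopes with schema wrapping disabled; the topic name, broker address, and order_id key are placeholders. Redpanda works unchanged here since it speaks the Kafka protocol.

    import json
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "pg.public.orders",                  # typical Debezium topic naming
        bootstrap_servers="localhost:9092",  # placeholder broker
        value_deserializer=lambda v: json.loads(v) if v else None,
        auto_offset_reset="earliest",
    )

    state = {}  # key -> latest row image; a real sink would be a table

    for msg in consumer:
        event = msg.value
        if event is None:          # tombstone record after a delete
            continue
        op = event["op"]           # c=create, u=update, d=delete, r=snapshot read
        if op in ("c", "u", "r"):
            row = event["after"]
            state[row["order_id"]] = row
        elif op == "d":
            state.pop(event["before"]["order_id"], None)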
04 / Cloud Platforms
Cloud & Infrastructure
Microsoft Azure
Azure Data Factory
Azure Synapse Analytics
Amazon Web Services (AWS)
AWS S3 / Glue / Lambda
Google Cloud Platform (GCP)
Databricks
Snowflake
Google BigQuery
04b / Databases & Storage
Data Stores
ClickHouse
DuckDB
PostgreSQL
MySQL
MinIO (S3-compatible)
Parquet / Delta Lake / columnar formats
05 / Analytics Engineering
Warehousing & Modelling
dbt (models, tests, docs)
Dimensional modelling
Lakehouse architecture
Looker / LookML
dbt data contracts & tests
Slowly changing dimensions (sketch below)
Staging → marts modelling
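
The slowly-changing-dimension entry deserves a concrete illustration: a minimal pandas sketch of the Type 2 pattern (expire the old version, append the new one), the same logic a dbt snapshot generates in SQL. All column names are invented for the example.

    import pandas as pd

    dim = pd.DataFrame({
        "customer_id": [1],
        "city": ["Berlin"],
        "valid_from": [pd.Timestamp("2024-01-01")],
        "valid_to": [pd.NaT],
        "is_current": [True],
    })

    def apply_scd2(dim, incoming, now):
        # Compare incoming rows against the current dimension versions.
        merged = incoming.merge(
            dim[dim["is_current"]], on="customer_id", how="left",
            suffixes=("", "_old"),
        )
        changed = merged[merged["city"] != merged["city_old"]]
        # Close out the superseded versions...
        expire = dim["customer_id"].isin(changed["customer_id"]) & dim["is_current"]
        dim.loc[expire, ["valid_to", "is_current"]] = [now, False]
        # ...and append the new versions.
        new_rows = changed[["customer_id", "city"]].assign(
            valid_from=now, valid_to=pd.NaT, is_current=True
        )
        return pd.concat([dim, new_rows], ignore_index=True)

    incoming = pd.DataFrame({"customer_id": [1], "city": ["Munich"]})
    dim = apply_scd2(dim, incoming, pd.Timestamp("2024-06-01"))
    print(dim)  # two rows for customer 1: Berlin (expired) and Munich (current)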
06 / BI & Visualisation
Dashboards & Reporting
Power BI
Tableau
Looker
Metabase
Apache Superset
Streamlit
Plotly / Dash
KPI dashboard design
07 / Data Quality & Observability
Quality & Lineage
Great Expectations
Soda Core
OpenLineage / Marquez
dbt schema tests
Row-level validation (sketch below)
SLA monitoring
Pipeline observability
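
Row-level validation, hand-rolled here to show the shape of the pattern that Great Expectations and Soda formalize into declarative suites; the checks and column names are illustrative.

    import pandas as pd

    df = pd.DataFrame({
        "order_id": [1, 2, 2, 4],
        "amount": [9.99, -5.00, 12.50, 3.25],
    })

    checks = {
        "order_id_not_null": df["order_id"].notna(),
        "order_id_unique": ~df["order_id"].duplicated(keep=False),
        "amount_non_negative": df["amount"] >= 0,
    }

    # Keep the failing rows per check so they can be quarantined or alerted on.
    failures = {name: df[~mask] for name, mask in checks.items() if not mask.all()}
    for name, rows in failures.items():
        print(f"FAILED {name}: {len(rows)} row(s)")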
08 / Machine Learning & AI
Modelling & ML Ops
Scikit-learn
TensorFlow / Keras
MLflow (experiment tracking)
Weights & Biases
Hugging Face (transformers)
LSTM / RNN architectures
Feature engineering
Time-series modelling
Backtesting & evaluation (sketch below)
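
For the backtesting entry, a minimal walk-forward evaluation with scikit-learn's TimeSeriesSplit: each fold trains strictly on the past and scores on the window that follows, which is what keeps time-series evaluation honest. The synthetic series is only there to keep the sketch runnable.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import TimeSeriesSplit

    rng = np.random.default_rng(42)
    n = 500
    X = rng.normal(size=(n, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
        # Fit on the past, score on the next window -- never the reverse.
        model = Ridge().fit(X[train_idx], y[train_idx])
        scores.append(mean_absolute_error(y[test_idx], model.predict(X[test_idx])))

    print([round(s, 4) for s in scores])  # one MAE per walk-forward fold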
09 / Python Data Stack
Libraries & Analysis
Pandas
NumPy / SciPy
Matplotlib / Seaborn
Plotly
SQLAlchemy
Jupyter Notebooks
FastAPI
10 / CI/CD & DevOps
Workflow & Tooling
GitHub Actions
Git / GitHub
Docker & Docker Compose
Terraform
Kubernetes (concepts)
CI/CD pipeline design
Reproducible environments

I pick tools that fit the problem. Airflow for orchestration with retries, dependencies, and monitoring. dbt because SQL-first transformations with built-in testing and lineage are the right model for analytics engineering. Databricks and Snowflake for scalable, cloud-native data processing where performance and collaboration matter. Azure and AWS as the infrastructure backbone for production pipelines. Debezium + Redpanda when the problem demands capturing every database change in real time. Power BI and Tableau to translate data into decisions stakeholders can act on. The stack reflects the problem — not the other way around.
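
As a concrete footnote to the Airflow point: the retry and dependency mechanics live directly in the DAG definition. A minimal Airflow 2.x sketch, with placeholder task bodies:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pulling from source")

    def transform():
        print("running transformations")

    with DAG(
        dag_id="example_elt",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        default_args={
            "retries": 3,                      # retried before the task fails
            "retry_delay": timedelta(minutes=5),
        },
    ) as dag:
        t1 = PythonOperator(task_id="extract", python_callable=extract)
        t2 = PythonOperator(task_id="transform", python_callable=transform)
        t1 >> t2   # transform only runs after extract succeeds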

See it in use · Get in Touch