Technical Stack

What I work with.

Tools chosen for what they're genuinely good at — not for résumé padding. Each one earns its place.

My stack spans the full data engineering lifecycle, from real-time ingestion and CDC pipelines to warehouse modelling, BI, data quality, and ML workflows. These are tools I use actively and understand in depth, built up across production-style projects and professional work.
01 / Programming Languages
Languages
Python
SQL
Java
R
Bash / Shell
JavaScript
Jinja (dbt templating)
02 / Data Engineering
Pipelines & Orchestration
Apache Airflow
Apache Spark
dbt Core / dbt Cloud
Databricks
Airbyte / Fivetran (EL)
Docker
Terraform (IaC)
ELT / ETL design
Incremental loading (sketch below)
Schema enforcement
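
To make the incremental-loading item concrete: a minimal sketch of the high-watermark pattern, using DuckDB because it runs anywhere. Every table and column name here (src_events, dwh_events, updated_at) is invented for the example.

    import duckdb

    con = duckdb.connect()  # in-memory for the sketch; a file path in practice

    # Toy source table standing in for an upstream extract.
    con.execute("""
        CREATE TABLE src_events AS
        SELECT * FROM (VALUES
            (1, TIMESTAMP '2024-01-01 10:00:00'),
            (2, TIMESTAMP '2024-01-02 10:00:00')
        ) AS t(event_id, updated_at)
    """)

    # Empty target with the same schema.
    con.execute("CREATE TABLE dwh_events AS SELECT * FROM src_events WHERE 1 = 0")

    def load_increment():
        # High-watermark: only pull rows newer than what the target already holds.
        watermark = con.execute(
            "SELECT COALESCE(MAX(updated_at), TIMESTAMP '1970-01-01') FROM dwh_events"
        ).fetchone()[0]
        con.execute(
            "INSERT INTO dwh_events SELECT * FROM src_events WHERE updated_at > ?",
            [watermark],
        )

    load_increment()   # first run loads both rows
    load_increment()   # second run is a no-op: nothing newer than the watermark
    print(con.execute("SELECT COUNT(*) FROM dwh_events").fetchone()[0])  # -> 2

The same logic scales up directly: dbt's incremental materializations are this pattern with better ergonomics.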
03 / Streaming & CDC
Real-time & Event-driven
Debezium (CDC)
Redpanda (Kafka API)
Apache Kafka
Apache Flink (concepts)
Event streaming patterns
Change data capture design (sketch below)
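
A sketch of what consuming Debezium change events looks like, assuming kafka-python and JSON envelopes with schema wrapping disabled; the topic name, broker address, and order_id key are placeholders. Redpanda works unchanged here since it speaks the Kafka protocol.

    import json
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "pg.public.orders",                  # typical Debezium topic naming
        bootstrap_servers="localhost:9092",  # placeholder broker
        value_deserializer=lambda v: json.loads(v) if v else None,
        auto_offset_reset="earliest",
    )

    state = {}  # key -> latest row image; a real sink would be a table

    for msg in consumer:
        event = msg.value
        if event is None:          # tombstone record after a delete
            continue
        op = event["op"]           # c=create, u=update, d=delete, r=snapshot read
        if op in ("c", "u", "r"):
            row = event["after"]
            state[row["order_id"]] = row
        elif op == "d":
            state.pop(event["before"]["order_id"], None)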
04 / Cloud Platforms
Cloud & Infrastructure
Microsoft Azure
Azure Data Factory
Azure Synapse Analytics
Amazon Web Services (AWS)
AWS S3 / Glue / Lambda
Google Cloud Platform (GCP)
Databricks
Snowflake
Google BigQuery
04b / Databases & Storage
Data Stores
ClickHouse
DuckDB
PostgreSQL
MySQL
MinIO (S3-compatible)
Parquet / Delta Lake / columnar formats
05 / Analytics Engineering
Warehousing & Modelling
dbt (models, tests, docs)
Dimensional modelling
Lakehouse architecture
Looker / LookML
dbt data contracts & tests
Slowly changing dimensions (sketch below)
Staging → marts modelling
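
The slowly-changing-dimension entry deserves a concrete illustration: a minimal pandas sketch of the Type 2 pattern (expire the old version, append the new one), the same logic a dbt snapshot generates in SQL. All column names are invented for the example.

    import pandas as pd

    dim = pd.DataFrame({
        "customer_id": [1],
        "city": ["Berlin"],
        "valid_from": [pd.Timestamp("2024-01-01")],
        "valid_to": [pd.NaT],
        "is_current": [True],
    })

    def apply_scd2(dim, incoming, now):
        # Compare incoming rows against the current dimension versions.
        merged = incoming.merge(
            dim[dim["is_current"]], on="customer_id", how="left",
            suffixes=("", "_old"),
        )
        changed = merged[merged["city"] != merged["city_old"]]
        # Close out the superseded versions...
        expire = dim["customer_id"].isin(changed["customer_id"]) & dim["is_current"]
        dim.loc[expire, ["valid_to", "is_current"]] = [now, False]
        # ...and append the new versions.
        new_rows = changed[["customer_id", "city"]].assign(
            valid_from=now, valid_to=pd.NaT, is_current=True
        )
        return pd.concat([dim, new_rows], ignore_index=True)

    incoming = pd.DataFrame({"customer_id": [1], "city": ["Munich"]})
    dim = apply_scd2(dim, incoming, pd.Timestamp("2024-06-01"))
    print(dim)  # two rows for customer 1: Berlin (expired) and Munich (current)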
06 / BI & Visualisation
Dashboards & Reporting
Power BI
Tableau
Looker
Metabase
Apache Superset
Streamlit
Plotly / Dash
KPI dashboard design
07 / Data Quality & Observability
Quality & Lineage
Great Expectations
Soda Core
OpenLineage / Marquez
dbt schema tests
Row-level validation (sketch below)
SLA monitoring
Pipeline observability
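
Row-level validation, hand-rolled here to show the shape of the pattern that Great Expectations and Soda formalize into declarative suites; the checks and column names are illustrative.

    import pandas as pd

    df = pd.DataFrame({
        "order_id": [1, 2, 2, 4],
        "amount": [9.99, -5.00, 12.50, 3.25],
    })

    checks = {
        "order_id_not_null": df["order_id"].notna(),
        "order_id_unique": ~df["order_id"].duplicated(keep=False),
        "amount_non_negative": df["amount"] >= 0,
    }

    # Keep the failing rows per check so they can be quarantined or alerted on.
    failures = {name: df[~mask] for name, mask in checks.items() if not mask.all()}
    for name, rows in failures.items():
        print(f"FAILED {name}: {len(rows)} row(s)")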
08 / Machine Learning & AI
Modelling & ML Ops
Scikit-learn
TensorFlow / Keras
MLflow (experiment tracking)
Weights & Biases
Hugging Face (transformers)
LSTM / RNN architectures
Feature engineering
Time-series modelling
Backtesting & evaluation (sketch below)
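
For the backtesting entry, a minimal walk-forward evaluation with scikit-learn's TimeSeriesSplit: each fold trains strictly on the past and scores on the window that follows, which is what keeps time-series evaluation honest. The synthetic series is only there to keep the sketch runnable.

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import TimeSeriesSplit

    rng = np.random.default_rng(42)
    n = 500
    X = rng.normal(size=(n, 3))
    y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)

    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
        # Fit on the past, score on the next window -- never the reverse.
        model = Ridge().fit(X[train_idx], y[train_idx])
        scores.append(mean_absolute_error(y[test_idx], model.predict(X[test_idx])))

    print([round(s, 4) for s in scores])  # one MAE per walk-forward fold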
09 / Python Data Stack
Libraries & Analysis
Pandas
NumPy / SciPy
Matplotlib / Seaborn
Plotly
SQLAlchemy
Jupyter Notebooks
FastAPI
10 / CI/CD & DevOps
Workflow & Tooling
GitHub Actions
Git / GitHub
Docker & Docker Compose
Terraform
Kubernetes (concepts)
CI/CD pipeline design
Reproducible environments

I pick tools that fit the problem. Airflow for orchestration with retries, dependencies, and monitoring. dbt because SQL-first transformations with built-in testing and lineage are the right model for analytics engineering. Databricks and Snowflake for scalable, cloud-native data processing where performance and collaboration matter. Azure and AWS as the infrastructure backbone for production pipelines. Debezium + Redpanda when the problem demands capturing every database change in real time. Power BI and Tableau to translate data into decisions stakeholders can act on. The stack reflects the problem — not the other way around.
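
As a concrete footnote to the Airflow point: the retry and dependency mechanics live directly in the DAG definition. A minimal Airflow 2.x sketch, with placeholder task bodies:

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pulling from source")

    def transform():
        print("running transformations")

    with DAG(
        dag_id="example_elt",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",
        catchup=False,
        default_args={
            "retries": 3,                      # retried before the task fails
            "retry_delay": timedelta(minutes=5),
        },
    ) as dag:
        t1 = PythonOperator(task_id="extract", python_callable=extract)
        t2 = PythonOperator(task_id="transform", python_callable=transform)
        t1 >> t2   # transform only runs after extract succeeds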

See it in use · Get in Touch