ELT Pipeline using dbt, Airflow, Snowflake, AWS services (S3, EC2, IAM, SSM), and Power BI

Goal:
I built a production-grade data engineering platform to solve a common enterprise problem: transforming raw, unreliable data into trusted, analytics-ready insights—not just dashboards built on ad-hoc scripts.

Using the Netflix dataset as a real-world proxy, I designed a modern ELT architecture with clear separation of ingestion, transformation, orchestration, and analytics. The platform leverages Snowflake for scalable analytics, dbt for tested and documented transformations, and Airflow for reliable orchestration with proactive failure alerts.

This project demonstrates senior-level business and engineering maturity—treating data as a dependable asset, enforcing quality and lineage, and building systems that are maintainable, observable, and production-ready. It reflects how modern data teams operate in real organizations.

Architecture Overview

High-level flow:

  • Raw Netflix data ingested into Amazon S3
  • Data loaded into Snowflake as the cloud data warehouse
  • dbt used to build analytics-ready fact and dimension models
  • Apache Airflow orchestrates the full workflow end-to-end
  • Slack and Amazon SNS alerts notify on pipeline failures
  • Clean data exposed for BI & analytics use cases

This architecture follows the ELT paradigm, separating ingestion, transformation, and analytics for scalability and maintainability.
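The load step from S3 into Snowflake is typically a COPY INTO statement pointed at an external stage. A minimal sketch of assembling such a statement is below; the table, stage, and file-format names are illustrative placeholders, not the project's actual Snowflake objects.

```python
# Sketch of the S3 -> Snowflake load step. Object names are hypothetical;
# in practice the statement would be executed via the Snowflake connector.

def build_copy_into(table: str, stage: str, file_format: str = "csv_format") -> str:
    """Build a Snowflake COPY INTO statement for files in an S3-backed stage."""
    return (
        f"COPY INTO {table} "
        f"FROM @{stage} "
        f"FILE_FORMAT = (FORMAT_NAME = '{file_format}') "
        f"ON_ERROR = 'ABORT_STATEMENT'"
    )

sql = build_copy_into("raw.netflix_titles", "netflix_stage")
print(sql)
```

Keeping the SQL generation in a small helper like this makes the load step easy to unit-test before it ever touches the warehouse.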

Tools & Technologies

  • Cloud: AWS (S3, SNS, IAM, Systems Manager)
  • Data Warehouse: Snowflake
  • Transformation: dbt (models, tests, documentation)
  • Orchestration: Apache Airflow
  • Programming: Python, SQL
  • Analytics: Power BI / SQL analytics layer
  • Version Control: Git & GitHub
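Conceptually, the Airflow DAG wires the ingest, load, transform, and test steps into a dependency graph and executes them in order. A library-free sketch of that ordering logic follows; the task names are illustrative placeholders, and in the real pipeline each step would be an Airflow operator with these same upstream dependencies.

```python
# Stand-in for the pipeline's task graph: each task maps to the set of
# tasks it depends on. Task names are hypothetical, not the actual DAG.
from graphlib import TopologicalSorter

dag = {
    "upload_to_s3": [],
    "load_snowflake": ["upload_to_s3"],
    "dbt_run": ["load_snowflake"],
    "dbt_test": ["dbt_run"],
}

# Resolve a valid execution order, as a scheduler would.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['upload_to_s3', 'load_snowflake', 'dbt_run', 'dbt_test']
```

Because the graph here is a simple chain, the order is unique; Airflow expresses the same dependencies with the `>>` operator between tasks.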

Why This Project Matters (Business Lens)

  • Simulates real enterprise data pipelines
  • Implements failure handling & alerting
  • Follows analytics engineering best practices (dbt tests, models, lineage)
  • Designed for scalability, observability, and collaboration
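Failure alerting of the kind described above can be as simple as an Airflow on_failure_callback that posts a message to Slack or publishes to an SNS topic. A stdlib-only sketch of building such a message is shown here; the field names and message format are assumptions for illustration, not the project's exact setup.

```python
import json

def build_failure_alert(dag_id: str, task_id: str, run_id: str) -> str:
    """Build a Slack-style JSON payload describing a failed task.

    In Airflow, these values would come from the `context` dict passed to
    an on_failure_callback; sending the payload would use the team's
    Slack webhook URL or an SNS publish call.
    """
    payload = {
        "text": (
            f":red_circle: Pipeline failure\n"
            f"DAG: {dag_id}\nTask: {task_id}\nRun: {run_id}"
        )
    }
    return json.dumps(payload)

msg = build_failure_alert("netflix_elt", "dbt_run", "manual__2024-01-01")
print(msg)
```

Separating message construction from delivery keeps the alert text testable and lets the same payload feed both Slack and SNS channels.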

View full implementation, code, and documentation on GitHub
(Includes dbt models, Airflow DAGs, SQL transformations, and screenshots)

Results & Deliverables

  • End-to-end automated ELT pipeline
  • Analytics-ready Snowflake models
  • dbt documentation and lineage graphs
  • Airflow DAGs with monitoring and alerts
  • BI-ready datasets for reporting and insights

Screenshots of architecture, Airflow DAGs, dbt lineage, and analytics outputs are included.