I will build a real-time data lakehouse pipeline

Sri Lanka

I speak Sinhala and English

Python Developer, FastAPI, Web Scraping, AI Automation, Data Engineering

I'm a Data Engineer with 3+ years of industry experience building RESTful APIs, web scraping systems, and AI-powered applications. I specialize in FastAPI, Streamlit, and LangGraph, and work with lead...
About this Gig

Looking to build a real-time data pipeline that keeps your data warehouse always up to date without manual ETL jobs?


I will design and deliver a fully automated, end-to-end data lakehouse pipeline that captures every change in your database the moment it happens, streams it through Kafka, and lands it as queryable Delta Lake tables, with the whole pipeline orchestrated and monitored by Apache Airflow.
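The heart of CDC is turning a stream of change events into current table state. In the pipeline described above, that step would run in Spark and write to Delta Lake (for example with a Delta Lake MERGE); the pure-Python sketch below only illustrates the semantics. It assumes Debezium's JSON event envelope (`op`, `before`, `after`) and a hypothetical `id` primary-key column.

```python
# Minimal sketch: folding Debezium-style change events into a keyed table.
# Assumptions: events use Debezium's JSON envelope ("op", "before", "after"),
# and every row has a primary-key column named "id" (hypothetical).
from typing import Any, Optional

Row = dict[str, Any]


def apply_change_event(table: dict[Any, Row], event: Optional[dict[str, Any]], key: str = "id") -> None:
    """Apply one change event to an in-memory table keyed by primary key."""
    if event is None:
        # Debezium follows each delete with a tombstone (null value); nothing to apply
        return
    # With schemas enabled in the JSON converter, the envelope sits under "payload"
    payload = event.get("payload", event)
    op = payload["op"]
    if op in ("c", "r", "u"):
        # create, snapshot read, and update all carry the new row image in "after"
        row = payload["after"]
        table[row[key]] = row
    elif op == "d":
        # deletes carry the old row image in "before"; "after" is null
        table.pop(payload["before"][key], None)
    # other ops (e.g. "t" for truncate) are ignored in this sketch


events = [
    {"op": "r", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}},
    {"payload": {"op": "d", "before": {"id": 2, "status": "new"}, "after": None}},
    None,  # tombstone emitted after the delete
]

orders: dict[Any, Row] = {}
for e in events:
    apply_change_event(orders, e)

print(orders)  # {1: {'id': 1, 'status': 'paid'}}
```

Replaying the events in order is what keeps the target table consistent with the source; a streaming job does the same thing in micro-batches instead of one event at a time.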

What you get:


  • Live change data capture (CDC) from your MySQL database (no downtime, no manual exports)
  • Scalable stream processing with Apache Spark
  • S3-compatible Delta Lake storage on MinIO, queryable with Trino or Spark SQL
  • Airflow DAG for automated health checks and pipeline monitoring
  • Fully Dockerized: runs on your own server or a cloud VM
  • Setup guide and documentation included
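One way the Airflow health checks could work is to poll Kafka Connect's REST status endpoint (`GET /connectors/{name}/status`) for the CDC connector. This is a minimal sketch, not the gig's actual DAG: the base URL and the connector name `mysql-cdc` are hypothetical, and only the standard library is used.

```python
# Minimal sketch of a CDC health check that an Airflow task could run.
# Assumptions: Kafka Connect's REST API is reachable at CONNECT_URL and the
# Debezium connector is named "mysql-cdc" (both hypothetical).
import json
import urllib.request
from typing import Any

CONNECT_URL = "http://localhost:8083"  # hypothetical; 8083 is Kafka Connect's default REST port


def fetch_status(connector: str, base_url: str = CONNECT_URL) -> dict[str, Any]:
    """GET /connectors/{name}/status from the Kafka Connect REST API."""
    with urllib.request.urlopen(f"{base_url}/connectors/{connector}/status", timeout=10) as resp:
        return json.load(resp)


def unhealthy_parts(status: dict[str, Any]) -> list[str]:
    """List the connector/tasks that are not RUNNING (empty list means healthy)."""
    problems = []
    if status["connector"]["state"] != "RUNNING":
        problems.append(f"connector: {status['connector']['state']}")
    for task in status.get("tasks", []):
        if task["state"] != "RUNNING":
            problems.append(f"task {task['id']}: {task['state']}")
    if not status.get("tasks"):
        problems.append("no tasks assigned")
    return problems


def check_connector(connector: str = "mysql-cdc") -> None:
    """Raise so the calling Airflow task fails (and alerts) when CDC is unhealthy."""
    problems = unhealthy_parts(fetch_status(connector))
    if problems:
        raise RuntimeError(f"{connector} unhealthy: {', '.join(problems)}")
```

Inside a monitoring DAG, `check_connector` could be wrapped in a `PythonOperator` or an `@task`-decorated function, so a failed check fails the task and triggers Airflow's normal retry and alerting behavior.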


Perfect for startups, data teams, and businesses that need reliable, real-time data availability without managing complex infrastructure from scratch.

Destination Platforms:

Databricks Lakehouse

PostgreSQL

MySQL

Tools & Platforms:

Airbyte

Kafka Connect

Debezium

Apache NiFi
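Debezium runs as a Kafka Connect source connector, so capturing MySQL changes starts with registering a connector through Kafka Connect's REST API (`POST /connectors`). The sketch below assumes Debezium 2.x property names; every hostname, credential, and database name is a hypothetical placeholder. A real setup also needs a MySQL user with replication privileges and the binlog enabled in ROW format.

```python
# Minimal sketch: registering a Debezium MySQL source connector with Kafka Connect.
# Assumptions: Debezium 2.x property names; all hosts, credentials, and
# database names passed in are hypothetical placeholders.
import json
import urllib.request
from typing import Any


def debezium_mysql_config(
    name: str,
    host: str,
    user: str,
    password: str,
    database: str,
    server_id: int,
    topic_prefix: str,
    kafka_bootstrap: str,
    port: int = 3306,
) -> dict[str, Any]:
    """Build the JSON body that Kafka Connect expects on POST /connectors."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "database.hostname": host,
            "database.port": str(port),
            "database.user": user,
            "database.password": password,
            # must be unique among all replication clients of this MySQL server
            "database.server.id": str(server_id),
            # change topics are named <topic.prefix>.<database>.<table>
            "topic.prefix": topic_prefix,
            "database.include.list": database,
            # Debezium keeps the MySQL schema history in its own Kafka topic
            "schema.history.internal.kafka.bootstrap.servers": kafka_bootstrap,
            "schema.history.internal.kafka.topic": f"schema-history.{topic_prefix}",
        },
    }


def register_connector(body: dict[str, Any], connect_url: str = "http://localhost:8083") -> int:
    """POST the connector definition to Kafka Connect and return the HTTP status code."""
    req = urllib.request.Request(
        f"{connect_url}/connectors",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

For example, `register_connector(debezium_mysql_config(name="mysql-cdc", host="mysql", user="debezium", password="...", database="shop", server_id=184054, topic_prefix="dbserver1", kafka_bootstrap="kafka:9092"))` would, on success, start publishing changes for a hypothetical `orders` table to the topic `dbserver1.shop.orders`, which the Spark job can then consume.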
