Build a real-time data lakehouse pipeline by Tharangadassana

FAQ

What information do you need to get started?

I need details about your source database (type, version, size), your preferred storage destination, and your server/cloud environment. If you're unsure, a free discovery call can help scope it out.

Can you connect to my existing database without downtime?

Yes. Using CDC (Change Data Capture) via Debezium, the pipeline reads changes from your MySQL binary log instead of querying your tables, so ongoing capture needs no table locks or downtime and adds minimal load to your running application.
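To make the binary-log approach concrete, here is a minimal sketch of the request body a Debezium MySQL connector is usually registered with through the Kafka Connect REST API. The property names follow Debezium 2.x; the helper `build_connector_request`, the hostnames, the credentials, the server id, and the table names are all illustrative assumptions, not the configuration delivered with this gig.

```python
import json


def build_connector_request(name, host, database, tables):
    """Return a JSON body suitable for POST /connectors on a Kafka Connect worker.

    All concrete values below are placeholders for illustration.
    """
    config = {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": host,
        "database.port": "3306",
        "database.user": "cdc_user",        # hypothetical replication account
        "database.password": "change-me",   # placeholder, use a secret store
        "database.server.id": "184054",     # must be unique among binlog clients
        "topic.prefix": name,               # prefix for the Kafka topic names
        "database.include.list": database,
        "table.include.list": ",".join(f"{database}.{t}" for t in tables),
        "snapshot.mode": "initial",         # snapshot existing rows, then stream the binlog
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": f"schema-history.{name}",
    }
    return json.dumps({"name": name, "config": config}, indent=2)


if __name__ == "__main__":
    print(build_connector_request("shop", "mysql", "inventory", ["orders", "customers"]))
```

The source database also needs row-based binary logging enabled and an account with replication privileges; how the initial snapshot behaves on a large table depends on the snapshot settings chosen for the project.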

What does the pipeline deliver in real time?

Every INSERT, UPDATE, and DELETE in your source database is captured as it is committed and lands in Delta Lake tables on MinIO (S3-compatible) within seconds, where it is queryable via Spark SQL or Trino.
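As a rough illustration of what "landing" a change means, the sketch below replays Debezium-style change events onto an in-memory table. The `op`, `before`, and `after` fields follow the Debezium event envelope; the function `apply_change`, the `id` key column, and the sample rows are assumptions made for this example. In a Spark job the same upsert-and-delete logic is typically expressed as a Delta Lake MERGE rather than a Python dictionary.

```python
def apply_change(table, event, key="id"):
    """Apply one Debezium-style change event to a dict keyed by primary key.

    op codes: "c" = insert, "r" = snapshot read, "u" = update, "d" = delete.
    """
    op = event["op"]
    if op in ("c", "r", "u"):
        row = event["after"]          # new row image
        table[row[key]] = row         # upsert by primary key
    elif op == "d":
        table.pop(event["before"][key], None)  # delete uses the old row image
    else:
        raise ValueError(f"unsupported op code: {op!r}")
    return table


if __name__ == "__main__":
    # Hypothetical events for an "orders" table.
    events = [
        {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
        {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
        {"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}},
        {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
    ]
    orders = {}
    for event in events:
        apply_change(orders, event)
    print(orders)
```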

Do I need cloud infrastructure or does this run on-premise?

Both. The entire stack runs on Docker Compose, so you can deploy it on your local server, a cloud VM (AWS EC2, GCP, Azure), or any Linux machine with at least 8 GB of RAM.

Can you handle schema changes in my source database?

Yes. The pipeline is built with schema evolution in mind. I configure Debezium and Spark so that new columns and type changes in the source are handled without breaking the pipeline.
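One way to picture schema evolution is as a merge between the schema the target table already has and the schema arriving with new events. The sketch below is a simplified policy written for illustration only: the function `merge_schema`, the type names, and the `SAFE_WIDENINGS` set are assumptions, and the actual rules applied by Delta Lake and Spark depend on their versions and settings.

```python
# Hypothetical set of type changes treated as safe (narrow type -> wide type).
SAFE_WIDENINGS = {("int", "bigint"), ("float", "double")}


def merge_schema(current, incoming):
    """Merge an incoming column->type mapping into the current one.

    New columns are added, safe widenings are applied, and any other
    type change raises so it can be reviewed instead of silently applied.
    """
    merged = dict(current)
    for column, new_type in incoming.items():
        old_type = merged.get(column)
        if old_type is None:
            merged[column] = new_type            # brand-new column
        elif old_type == new_type or (new_type, old_type) in SAFE_WIDENINGS:
            continue                             # existing type already fits
        elif (old_type, new_type) in SAFE_WIDENINGS:
            merged[column] = new_type            # widen the stored type
        else:
            raise TypeError(
                f"incompatible change for {column!r}: {old_type} -> {new_type}"
            )
    return merged


if __name__ == "__main__":
    table_schema = {"id": "int", "status": "string"}
    event_schema = {"id": "bigint", "status": "string", "email": "string"}
    print(merge_schema(table_schema, event_schema))
```

Rejecting incompatible changes explicitly, rather than coercing them, keeps a bad source migration from quietly corrupting the lakehouse tables.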

Will you sign an NDA if my data is sensitive?

Absolutely. I am happy to sign an NDA before the project starts.

Do you offer post-delivery support?

Yes: 7 days (Basic), 14 days (Standard), or 30 days (Premium) of support for bug fixes and deployment issues.
