I will help with data engineering, governance, and discovery
India
Architect in Data Engineering, Data Warehouse, and Data Lake: Delta
About this Gig
In today's data-driven landscape, organizations demand agility and scalability to unlock the full potential of their data assets. My expertise lies in architecting and implementing robust, high-performance ETL pipelines that bridge the gap between traditional OLTP databases and cutting-edge data lakehouse architectures, empowering your organization to derive actionable insights from both analytical and operational workloads.
I specialize in the design, development, and deployment of data pipelines for batch, real-time, and near real-time data ingestion and transformation from OLTP databases such as MySQL, AWS Aurora, and GCP Cloud SQL. These pipelines write to modern data lakehouse formats, including Apache Hudi, Apache Iceberg, and Delta Lake, enabling you to build a unified and scalable data foundation.
By implementing my ETL pipelines, your organization can:
- Enhance data accessibility and usability for both analytical and operational purposes.
- Reduce data management complexity by leveraging the unified data foundation of a data lakehouse.
- Improve data governance and compliance through robust data lineage and audit trails.
Data solutions for your edge
FAQ
Do you also ingest CSV, JSON, and Parquet files from S3/GCS?
Yes. A highly configurable Scala ETL pipeline ingests these file types into a Hudi or Delta lakehouse. Hive Metastore integration makes the data discoverable through Athena, Trino, or Presto.
Do you also ingest data directly from Kafka topics?
Yes. I have a highly configurable Scala ETL pipeline that reads a Kafka topic in micro-batches and writes to a lakehouse file format. Hive Metastore provides a unified data catalog for Athena, Trino, Presto, or any other SQL-based query engine.
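To make the micro-batch idea concrete, here is a minimal sketch in plain Python rather than the Scala codebase described in this gig. It is an illustration under stated assumptions, not the actual pipeline: the "topic" is an in-memory list of (offset, value) pairs, the "lakehouse" is a Python list, batches are cut by size rather than by time, and the names `micro_batches` and `run_pipeline` are hypothetical.

```python
from itertools import islice


def micro_batches(records, max_batch_size):
    """Yield lists of at most max_batch_size records from any iterable."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, max_batch_size))
        if not batch:
            return
        yield batch


def run_pipeline(records, sink, max_batch_size, committed_offset=-1):
    """Write each micro-batch to the sink, then advance the committed offset.

    Records at or below committed_offset are skipped, so replaying the same
    input after a restart does not write duplicates.
    """
    for batch in micro_batches(records, max_batch_size):
        fresh = [(offset, value) for offset, value in batch if offset > committed_offset]
        if not fresh:
            continue
        # Stand-in for one write to the lakehouse table.
        sink.append([value for _, value in fresh])
        # Commit only after the write has succeeded.
        committed_offset = fresh[-1][0]
    return committed_offset


# Hypothetical topic contents: (offset, value) pairs.
topic = [(i, f"event-{i}") for i in range(5)]

first_sink = []
last_offset = run_pipeline(topic, first_sink, max_batch_size=2)

# Replay the same topic as if offsets up to 2 had already been committed.
replay_sink = []
replay_offset = run_pipeline(topic, replay_sink, max_batch_size=2, committed_offset=2)
```

Committing the offset only after the write is what lets a restarted job resume without losing or duplicating records. A real deployment would rely on the streaming engine's own checkpointing instead of a hand-rolled counter.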
Do you read the MySQL instance using JDBC connection or binlog?
I have a fully configurable ETL codebase that reads MySQL tables either over a JDBC connection (incremental or full loads) or from the binlog (using Debezium or Maxwell), which pushes changes to a Kafka topic for real-time ingestion into the lakehouse file format. Data discovery is enabled through Hive Metastore.
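The difference between a full and an incremental read can be sketched as follows. This is a hedged illustration in plain Python, not the Scala codebase itself: an in-memory SQLite database stands in for MySQL, and the `orders` table, the `updated_at` watermark column, and the helpers `read_table` and `next_watermark` are all hypothetical names chosen for the example.

```python
import sqlite3


def read_table(conn, table, watermark_col=None, last_watermark=None):
    """Return all rows (full load) or only rows newer than last_watermark."""
    # Table and column names are assumed to come from trusted pipeline config.
    query = f"SELECT * FROM {table}"
    params = ()
    if watermark_col is not None and last_watermark is not None:
        query += f" WHERE {watermark_col} > ?"
        params = (last_watermark,)
    if watermark_col is not None:
        query += f" ORDER BY {watermark_col}"
    return conn.execute(query, params).fetchall()


def next_watermark(rows, col_index, previous):
    """Highest watermark value seen in rows, or the previous one if rows is empty."""
    return max((row[col_index] for row in rows), default=previous)


# SQLite stands in for the MySQL source in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10.0, "2024-01-01 00:00:00"),
        (2, 20.0, "2024-01-02 00:00:00"),
        (3, 30.0, "2024-01-03 00:00:00"),
    ],
)

# Full load: every row in the table.
full_rows = read_table(conn, "orders")

# Incremental load: only rows changed since the stored watermark.
stored = "2024-01-01 00:00:00"
incremental_rows = read_table(conn, "orders", "updated_at", stored)
new_watermark = next_watermark(incremental_rows, 2, stored)
```

A watermark-based read only sees rows whose watermark column changes, so hard deletes go unnoticed. That gap is one reason to choose binlog-based capture when deletes matter.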

