Looks Like This Service Is On Hold

I will help in data engineering, governance, and discovery

India

I speak English, Hindi, Japanese

Architect in Data Engineering, Data Warehouse, and Data Lake: Delta

Looking to build a data pipeline to bring data from diverse sources (MySQL, DynamoDB, MongoDB, etc.) into a data lake on S3 or Google Cloud Storage? I specialize in creating these pipelines using infr...

About this Gig

In today's data-driven landscape, organizations demand agility and scalability to unlock the full potential of their data assets. My expertise lies in architecting and implementing robust, high-performance ETL pipelines that bridge the gap between traditional OLTP databases and cutting-edge data lakehouse architectures, empowering your organization to derive actionable insights from both analytical and operational workloads.

I specialize in the design, development, and deployment of data pipelines for batch, near real-time, and real-time ingestion and transformation of data from OLTP databases such as MySQL, AWS Aurora, and GCP Cloud SQL. These pipelines write to modern data lakehouse table formats, including Apache Hudi, Apache Iceberg, and Delta Lake, enabling you to build a unified and scalable data foundation.

By implementing my ETL pipelines, your organization can:

Enhance data accessibility and usability for both analytical and operational purposes.
Reduce data management complexity by leveraging the unified data foundation of a data lakehouse.
Improve data governance and compliance through robust data lineage and audit trails.

Data solutions for your edge

Expertise: Big data • Data acquisition • Data extraction • Data flow • ETL

Technology: Apache Kafka • Apache Spark • BigQuery • Scala • Databricks

FAQ

Do you also ingest CSV, JSON, and Parquet files from S3/GCS?

Yes. A highly configurable Scala ETL pipeline ingests these files into a Hudi or Delta Lake lakehouse. Hive Metastore integration makes the data discoverable through Athena, Trino, or Presto.
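
As a rough illustration of what configurable file ingestion can look like, the sketch below picks a reader from a config dictionary and normalises CSV and JSON Lines input into a list of records. It is plain Python using only the standard library, chosen for brevity, and is not the Scala codebase described in this gig. The `READERS` registry, the `ingest` function, and the config keys `format` and `columns` are hypothetical names, and Parquet is left out because reading it needs a third-party library.

```python
import csv
import io
import json

def read_csv(text):
    """Parse CSV text with a header row into a list of dict records."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def read_jsonl(text):
    """Parse JSON Lines text (one JSON object per line), skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

# Hypothetical registry: format name -> reader function.
READERS = {"csv": read_csv, "jsonl": read_jsonl}

def ingest(config, text):
    """Read `text` according to config["format"]; keep only config["columns"] if given."""
    fmt = config["format"]
    if fmt not in READERS:
        raise ValueError(f"unsupported format: {fmt}")
    records = READERS[fmt](text)
    columns = config.get("columns")
    if columns:
        # Project each record onto the configured columns (missing ones become None).
        records = [{c: r.get(c) for c in columns} for r in records]
    return records
```

Adding a new source format in this shape means registering one more reader function, which is one common way to keep such a pipeline driven by configuration instead of code changes.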

Do you also ingest data directly from Kafka topics?

Yes. I have a highly configurable Scala ETL pipeline that reads a Kafka topic in micro-batches and writes to a lakehouse file format. Hive Metastore provides a unified data catalog for Athena, Trino, Presto, or any other SQL-based query engine.
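
The micro-batch idea can be sketched without any Kafka or Spark dependency. The example below is an in-memory stand-in, in plain Python, not the Scala pipeline itself: `messages` is assumed to be an iterable of dicts with hypothetical `offset` and `value` keys, and `sink` is a plain list standing in for a lakehouse writer.

```python
from itertools import islice

def micro_batches(messages, batch_size):
    """Yield lists of at most `batch_size` messages from an iterable, in order."""
    it = iter(messages)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

def run(messages, batch_size, sink):
    """Write each micro-batch to `sink` and return the last committed offset."""
    committed = None
    for batch in micro_batches(messages, batch_size):
        # One write per micro-batch instead of one write per message.
        sink.append([m["value"] for m in batch])
        # Record the offset only after the write has completed.
        committed = batch[-1]["offset"]
    return committed
```

Tracking the offset after each completed write is what lets a restarted job resume from the last batch that reached the sink.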

Do you read the MySQL instance using JDBC connection or binlog?

I have a fully configurable ETL codebase that reads MySQL tables either over a JDBC connection (incremental or full loads) or from the binlog (using Debezium or Maxwell), which pushes changes to a Kafka topic for real-time ingestion into a lakehouse file format. Data discovery is enabled through Hive Metastore.
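
For the JDBC path, the difference between a full and an incremental load usually comes down to a watermark column. The sketch below shows that pattern only; it does not cover the binlog path. It is plain Python with the standard-library `sqlite3` module standing in for a MySQL JDBC connection, and the `extract` function and its parameters are hypothetical names.

```python
import sqlite3

def extract(conn, table, watermark_column, last_value=None):
    """Full read when last_value is None, otherwise only rows past the watermark.

    Returns (rows, new_watermark). Table and column names are assumed to come
    from trusted configuration, since identifiers cannot be bound as parameters.
    """
    query = f"SELECT * FROM {table}"
    params = ()
    if last_value is not None:
        query += f" WHERE {watermark_column} > ?"
        params = (last_value,)
    query += f" ORDER BY {watermark_column}"
    cursor = conn.execute(query, params)
    columns = [d[0] for d in cursor.description]
    rows = [dict(zip(columns, r)) for r in cursor.fetchall()]
    # Keep the previous watermark when nothing new arrived.
    new_watermark = rows[-1][watermark_column] if rows else last_value
    return rows, new_watermark
```

A scheduler would persist the returned watermark between runs and pass it back as `last_value`. This approach misses hard deletes, which is one reason a binlog-based feed is used when changes must be captured in real time.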
