I will write and optimize PySpark ETL pipelines for your data workflows
Senior Data Engineer, Spark, Scala, AWS, Airflow, Kafka, Big Data
About this Gig
Are you looking for a reliable PySpark Data Engineer to build or optimize your ETL pipelines?
You're in the right place.
I'm Pankaj, a Data Engineer with 3+ years of experience at Paytm, where I built 200+ production ETL pipelines processing over 5 TB/day using PySpark, Airflow, AWS, and Kafka.
This gig focuses 100% on delivering fast, scalable, and clean PySpark ETL solutions for your business.
What I Can Do for You
- Write clean and optimized PySpark ETL code
- Build end-to-end ETL workflows (extract, transform, load)
- Convert SQL logic into PySpark transformations
- Fix failing or slow PySpark jobs
- Optimize Spark jobs to reduce runtime and EMR cost
- Integrate PySpark with AWS Glue, S3, EMR, and Athena
- Clean, validate, and transform data
- Debug existing ETL pipelines
Why Choose Me
- Production-ready, clean code
- Strong real-world experience running production pipelines at scale
- Fast communication and delivery
- 100% focus on reliability and scalability
- Practical understanding of pipeline failures & optimizations
Technologies I Use
- PySpark / Spark
- AWS Glue, S3, EMR
- SQL
- Airflow (workflow orchestration)
- Kafka
- Python & Scala
Have a custom requirement?
Message me anytime; I reply fast.
Let's build something scalable.
FAQ
What do you need from me to start?
Database/API access, sample data, existing SQL logic, or a problem statement.
Can you connect to my database or API?
Yes — MySQL, PostgreSQL, MongoDB, APIs, S3, and more.
Do you optimize existing pipelines?
Yes — I specialize in runtime optimization and debugging.
Can you integrate AWS services?
Yes — Glue, S3, EMR, Lambda, Athena.
Can you sign an NDA?
Yes — I can work under NDA if required.

