I will build Spark ETL pipelines for batch processing and big data workflows
Scalable Solutions, Clean Code and Clear Communication
About this Gig
I will design and develop scalable Spark ETL pipelines for batch data processing, transformation, and large-volume workflows.
This gig is ideal for businesses that need to process data from files, databases, APIs, or other structured sources in a reliable and maintainable way. Whether you need a new batch pipeline from scratch or improvements to an existing job, I can help you build a clean and production-oriented solution.
I focus on practical data engineering outcomes such as ingestion, transformation, validation, aggregation, and delivery into analytics-ready datasets or downstream systems.
What this gig can include
- Spark or PySpark ETL pipeline development
- Batch processing for large datasets
- Data ingestion from CSV, JSON, Parquet, APIs, and databases
- Data cleaning, normalization, and transformation
- Joins, aggregations, filtering, and enrichment logic
- Output to files, data warehouses, or databases
- Optimization and refactoring of existing Spark jobs
- Structured logging and maintainable code organization
- Basic documentation and handover support
Technology: Apache Spark • BigQuery • Python • Scala • SQL • Apache Airflow
FAQ
Can you work with an existing Spark codebase?
Yes. I can improve, refactor, debug, or extend an existing Spark pipeline.
Can this include PySpark?
Yes. PySpark is fully supported.
Can you help with performance improvements?
Yes. If your current job is slow or hard to maintain, I can optimize the pipeline structure and processing flow.
Do you handle full deployment as well?
This gig focuses mainly on development, but deployment support can be discussed depending on your environment.
