I will build big data pipelines and process datasets using PySpark and SQL

Pakistan

I speak English and French

AI, Data, and Web3 Engineer

I am an ML & Data Engineer with a Master in Data & Intelligence from Université Claude Bernard Lyon 1. I specialize in bridging the gap between advanced AI research and scalable, production-ready soft...
About this Gig

Struggling with massive datasets or slow processing times?


I am a Data Engineer specializing in large-scale data processing, ETL, and analytics. I build optimized pipelines that ingest, clean, and transform gigabytes of data using PySpark and Python. Whether you need complex aggregations, geospatial mapping, or clean visualizations, I deliver production-ready code.


My Core Services:


  • Big Data Pipelines: High-performance ETL workflows using Apache Spark, PySpark, and Python.
  • Advanced Transformations: Optimized Spark SQL queries, complex window functions, UDFs, and large-scale joins.
  • Data Integration: Cleaning and formatting structured/semi-structured data for downstream analytics.
  • Geospatial Data: Processing location-based and time-series data.
  • Visual Insights: Translating big data into actionable visualizations using Pandas and Matplotlib.


Tech Stack: Python | Apache Spark | PySpark | Spark SQL | Pandas | Matplotlib


Why Me?

I write clean, scalable, and fully documented code, so your data operations stay accurate and computationally efficient.


Please message me before ordering to discuss your dataset!

Destination Platform: Databricks Lakehouse | PostgreSQL

Tools & Platforms: Other