I will build an AWS data lake and ETL pipeline using PySpark
About this Gig
As a Data Engineer, I design robust cloud-native architectures and scalable ETL pipelines. Whether processing high-volume logs or building Medallion Data Lakes, I deliver clean, optimized solutions.
What I Offer:
- End-to-End ETL Pipelines: Automated data extraction, transformation, and loading using Python and PySpark.
- Cloud Data Lakes: Architecting serverless Medallion Data Lakes (Bronze, Silver, Gold) on AWS (S3, Glue, Athena).
- Database Architecture: Designing relational databases (3NF) and optimizing complex SQL queries (CTEs, Window Functions) in PostgreSQL.
- Performance Optimization: Reducing data processing times and cutting storage costs using compressed, columnar formats such as Apache Parquet.
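To show what each Medallion layer is responsible for, here is a minimal plain-Python sketch. It is an illustration only: in the AWS stack described here these steps would typically be PySpark jobs in Glue writing Parquet to separate S3 prefixes. The sample records, field names, and function names below are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical raw events, as they might land in the Bronze layer (e.g. JSON lines in S3).
RAW_LINES = [
    '{"order_id": 1, "customer": "a", "amount": "19.99"}',
    '{"order_id": 2, "customer": "b", "amount": "5.00"}',
    '{"order_id": 2, "customer": "b", "amount": "5.00"}',  # duplicate record
    '{"order_id": 3, "customer": "a", "amount": null}',    # missing amount
    'not valid json',                                      # corrupt line
]


def to_bronze(lines):
    """Bronze: keep raw records untouched, only tagging them with their source."""
    return [{"raw": line, "source": "orders"} for line in lines]


def to_silver(bronze):
    """Silver: parse, validate, cast types, and deduplicate on the business key."""
    seen = set()
    silver = []
    for rec in bronze:
        try:
            row = json.loads(rec["raw"])
        except json.JSONDecodeError:
            continue  # a real pipeline would route this to a quarantine location
        if row.get("amount") is None or row["order_id"] in seen:
            continue  # drop incomplete rows and duplicates
        seen.add(row["order_id"])
        silver.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"],
            "amount": float(row["amount"]),
        })
    return silver


def to_gold(silver):
    """Gold: business-level aggregate, here total revenue per customer."""
    totals = defaultdict(float)
    for row in silver:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


gold = to_gold(to_silver(to_bronze(RAW_LINES)))
print(gold)  # {'a': 19.99, 'b': 5.0}
```

The design point is that Bronze never loses data, Silver is the single cleaned source of truth, and Gold holds only query-ready aggregates for tools like Athena.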
Tech Stack: AWS (S3, Glue, Athena) | PySpark | Python | PostgreSQL | Advanced SQL | Git/GitHub
Why choose me? I write production-ready code, ensure scalable designs, and strictly follow data engineering best practices.
Please message me before ordering to discuss your exact project!
FAQ
Do you provide architecture diagrams before starting the project?
Yes! For Standard and Premium packages, I provide a complete high-level cloud architecture diagram (e.g., AWS S3, Glue, Athena flow) before writing the code to ensure we are on the same page.
What technologies do you use for data transformation?
I primarily use PySpark (via AWS Glue) for big data transformations and advanced SQL (PostgreSQL) for relational data engines, ensuring high performance and scalability.
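As a small example of the kind of SQL involved, the sketch below combines a CTE with window functions to compute a per-customer running total and order sequence. It uses Python's built-in sqlite3 module as a self-contained stand-in for PostgreSQL (the WITH and OVER (PARTITION BY ... ORDER BY ...) syntax shown is also valid in PostgreSQL); the orders table and its data are hypothetical.

```python
import sqlite3

# In-memory database standing in for PostgreSQL; table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, order_date TEXT, amount REAL);
INSERT INTO orders VALUES
  (1, 'a', '2024-01-01', 20.0),
  (2, 'a', '2024-01-05', 10.0),
  (3, 'b', '2024-01-02', 5.0),
  (4, 'b', '2024-01-03', 15.0),
  (5, 'b', '2024-01-09', 30.0);
""")

QUERY = """
WITH january_orders AS (          -- CTE: restrict to the reporting period
    SELECT customer, order_date, amount
    FROM orders
    WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31'
)
SELECT customer,
       order_date,
       amount,
       SUM(amount)  OVER (PARTITION BY customer ORDER BY order_date) AS running_total,
       ROW_NUMBER() OVER (PARTITION BY customer ORDER BY order_date) AS order_seq
FROM january_orders
ORDER BY customer, order_date;
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
conn.close()
```

Window functions keep every input row while adding the aggregate alongside it, which avoids a self-join or a second GROUP BY pass.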

