I will write PySpark jobs for both batch and streaming data
About this Gig
Are you in need of a skilled Data Engineer to streamline your data processing, ETL pipelines, and data lake architecture? Look no further! I bring in-depth expertise in crafting robust solutions using PySpark, EMR, Apache Hive, and even Apache Hudi. With a strong background in batch and streaming data processing, I'm here to optimize your data workflows for efficiency and accuracy.
Services I Offer:
PySpark ETL Jobs:
Transform, clean, and process your data efficiently using PySpark. I'll create custom ETL pipelines tailored to your specific data requirements, ensuring high-quality results.
Batch & Streaming Jobs:
Whether it's processing data in bulk or handling real-time streams, I can design and implement both batch and streaming jobs using industry best practices.
EMR Expertise:
Leverage the power of Amazon Elastic MapReduce (EMR) for distributed data processing. I'll create EMR clusters, optimize job execution, and fine-tune performance.
Others:
I can integrate your jobs with Apache Hive and also offer expertise in Apache Hudi. If you are building a data lake, I can write your data to Amazon S3 as well.
Looking forward to working with you. Cheers!
Expertise:
Big data • Data manipulation • ETL • Transformation • SQL • NoSQL
Technology:
Apache Hadoop • Apache Spark • Excel • Python • SQL • NoSQL
