I will build end-to-end GCP data pipelines using Pub/Sub, Kafka, and Dataform
About this Gig
A modern data platform requires robust ingestion and meticulously modeled analytics. As a Google Cloud Certified Data Engineer, I build end-to-end systems guaranteeing data integrity from source to dashboard.
I engineer high-volume, event-driven pipelines with strict at-least-once delivery, and I architect centralized BigQuery models that unify disparate tables from 19+ business units.
What I Can Do For You:
- Real-Time Ingestion: Architect secure systems using Apache Kafka & GCP Pub/Sub in Java Spring Boot.
- Serverless Processing: Design decoupled microservices via Cloud Run to transform large-scale datasets.
- Dimensional Modeling: Transform raw BigQuery data into Star Schemas using Dataform, applying SCD Type 2 & 4.
- Orchestration: Run multi-stage ELT workflows via Cloud Composer (Airflow) to automate Dataform jobs.
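
To make the SCD Type 2 idea concrete, here is a minimal Python sketch for illustration only. In a real project this logic would live in Dataform SQL, not Python, and the names used here (`DimRow`, `customer_id`, `tier`, `apply_scd2`) are hypothetical. It shows the core rule: a changed attribute closes the current row and appends a new version instead of overwriting history.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    customer_id: str          # business key (hypothetical)
    tier: str                 # tracked attribute (hypothetical)
    valid_from: date
    valid_to: Optional[date]  # None means "still the current version"


def apply_scd2(dim: list[DimRow], customer_id: str, tier: str, as_of: date) -> list[DimRow]:
    """Return a new dimension with one incoming record applied as SCD Type 2."""
    result = []
    needs_new_version = True  # true for new keys and for changed attributes
    for row in dim:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.tier == tier:
                needs_new_version = False  # nothing changed: keep the row
                result.append(row)
            else:
                # Close the old version instead of overwriting it.
                result.append(replace(row, valid_to=as_of))
        else:
            result.append(row)
    if needs_new_version:
        result.append(DimRow(customer_id, tier, as_of, None))
    return result


history = apply_scd2([], "c1", "silver", date(2024, 1, 1))
history = apply_scd2(history, "c1", "gold", date(2024, 6, 1))
```

After the two calls, `history` holds a closed "silver" row and an open "gold" row, so a report can be reproduced as of any past date.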
Technologies I Use: GCP Pub/Sub, Kafka, BigQuery, Dataform, Java (Spring Boot), Cloud Run, Airflow, and Terraform.
Why Choose Me? You get a certified cloud expert who implements robust data quality frameworks, logging assertion failures to persistent error tables so your analytics remain trustworthy.
Let's chat before you order to align on scope!
FAQ
How do you handle the difference between streaming data and batch modeling?
I separate the two concerns. Pub/Sub and Cloud Run handle real-time ingestion and land the data safely in raw BigQuery tables. Cloud Composer (Airflow) then runs Dataform on a schedule to clean, test, and model that raw data into business-ready curated tables.
Can you guarantee that no streaming messages will be lost?
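Yes. I engineer systems with strict at-least-once delivery guarantees: a message is acknowledged only after it has been stored durably, robust retry logic absorbs transient failures, and intermediate object storage holds anything that still cannot be written. Because at-least-once delivery can redeliver a message, downstream writes are designed to tolerate duplicates.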
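
The pattern behind that answer can be sketched in a few lines of Python. This is a simplified simulation under stated assumptions, not a Pub/Sub or Kafka client: `FlakySink` stands in for the BigQuery write, the `parked` dictionary stands in for object storage, and `handle` is a hypothetical function name. A real handler would also wait between retries.

```python
from typing import Callable


class FlakySink:
    """Stand-in for a destination table that fails a few times, then works."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.rows: dict[str, str] = {}

    def write(self, message_id: str, payload: str) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise IOError("simulated transient failure")
        # Keyed by message ID, so a redelivered message does not duplicate.
        self.rows[message_id] = payload


def handle(message_id: str, payload: str,
           write: Callable[[str, str], None],
           parked: dict[str, str],
           max_attempts: int = 3) -> str:
    """Store the message durably before reporting that it may be acked."""
    for _ in range(max_attempts):
        try:
            write(message_id, payload)
            return "written"
        except IOError:
            continue  # transient failure: retry (backoff omitted for brevity)
    # Retries exhausted: park the payload instead of dropping it.
    parked[message_id] = payload
    return "parked"


parked: dict[str, str] = {}
sink = FlakySink(failures=2)
first = handle("m1", "payload-1", sink.write, parked)
again = handle("m1", "payload-1", sink.write, parked)  # simulated redelivery

broken = FlakySink(failures=5)
fallback = handle("m2", "payload-2", broken.write, parked)
```

The message is only considered safe to acknowledge once `handle` returns, which is what makes redelivery the failure mode rather than data loss.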
Do you use Dataform or dbt for the BigQuery modeling?
I highly recommend Dataform for native GCP stacks, as it is fully managed within BigQuery and integrates perfectly with Cloud Composer. However, I am proficient in both tools depending on your environment.
How do you ensure the modeled data is accurate?
I implement a robust data quality framework within Dataform to capture assertion failures. Any validation failure is automatically routed to a persistent BigQuery error log table for review.
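
As a rough illustration of that routing idea, the Python sketch below runs named checks over a batch of rows and produces one error-log record per failed check. It is not Dataform code and makes no claim about Dataform's API. The table name `curated.orders`, the check names, and the log fields are all assumptions chosen for the example.

```python
from datetime import datetime, timezone
from typing import Callable

Row = dict[str, object]
Check = Callable[[Row], bool]


def run_assertions(table: str, rows: list[Row], checks: dict[str, Check]) -> list[Row]:
    """Return one error-log record for every check that has failing rows."""
    errors: list[Row] = []
    for name, passes in checks.items():
        failing = [row for row in rows if not passes(row)]
        if failing:
            errors.append({
                "table_name": table,
                "assertion": name,
                "failed_rows": len(failing),
                "logged_at": datetime.now(timezone.utc).isoformat(),
            })
    return errors


orders = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": None, "amount": 5.0},   # violates the not-null check
    {"order_id": 3, "amount": -1.0},     # violates the non-negative check
]

error_log = run_assertions("curated.orders", orders, {
    "order_id_not_null": lambda row: row["order_id"] is not None,
    "amount_non_negative": lambda row: row["amount"] >= 0,
})
```

In the sketch, `error_log` is the list that would be appended to the persistent error table, so failures stay visible after the run finishes.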

