I will provide AIOps and SRE consulting for DevOps and cloud reliability
GPU Infrastructure and LLMOps Engineer: NVIDIA, Kubernetes, Neo Cloud
About this Gig
Are you shipping LLM products but struggling with GPU infrastructure, scaling, and reliability? I help teams build production-grade GPU platforms end-to-end.
What you get:
• Neo cloud GPU setup and cluster hardening
• Kubernetes GPU scheduling and autoscaling for LLM training and inference (vLLM/Ollama/Triton)
• MLOps/LLMOps CI/CD for models and data pipelines
• GPU monitoring and alerts using NVIDIA DCGM + Prometheus + Grafana
• Cost optimization, capacity planning, and observability best practices
Deliverables can include an architecture review, a deployment plan, and hands-on implementation, depending on the package tier.
Tools: Docker • GitLab • Jenkins • GitHub • CircleCI
Frameworks: Terraform • Ansible
Programming language: Bash • Python • Golang
Expertise: Installation • Migration • Configuration
