Design professional Grafana dashboards for Kubernetes, Linux, and HPC by Zeeshan_766

FAQ

Can you monitor GPU usage for AI model training?

Yes! I specialize in tracking NVIDIA and AMD GPU metrics, including memory usage, temperature, and power consumption. This is essential for optimizing AI training clusters and ensuring your hardware is running efficiently.

Which data sources do you support?

I work with a wide range of data sources, including Prometheus, VictoriaMetrics, InfluxDB, Loki (for logs), and cloud monitoring services like AWS CloudWatch and Google Cloud Monitoring (formerly Stackdriver). I can also integrate custom AI/ML metric exporters.

Can you set up alerts for Slack or Email?

Absolutely. I configure alerting rules that notify you by Slack or email as soon as CPU/GPU load runs high, a Kubernetes pod crashes, or a job fails in your HPC cluster. I can also set up on-call routing.

Do you support HPC schedulers like Slurm?

Yes. I can build dashboards that visualize Slurm job queues, node availability, and partition health. This provides HPC administrators and researchers with a clear view of their cluster's utilization.

Do I need to provide the server for Grafana?

Not necessarily. I can work with your existing setup or help you deploy a new instance on AWS, GCP, Azure, or bare metal. I also support Grafana Cloud if you prefer a managed solution.
