I will do local llm deployment on premise using vllm sglang ollama and llamacpp

IMRAN ULLAH

do local llm deployment on premise using vllm sglang ollama and llamacpp

Full Screen

About this gig

Advanced local and enterprise LLM deployment with secure on premises AI infrastructure and OpenAI compatible API.

If you want to run open-source language models on your own servers with full privacy, high speed, and no cloud dependency, you are in the right place.

I deploy and optimize LLM, Mixture of Experts, embedding models, multi model embeddings, and VLM systems using vLLM, SGLang, Ollama, TGI and llama.cpp for low latency and high tokens per second, exposed through an OpenAI compatible API for easy integration.

I work with modern models from Qwen3, DeepSeek 4.5, and GLM 4.5 for text, vision, and embedding workloads.

From lightweight local models to large deployments up to 500B+ parameters, I build production ready inference servers with multiuser support, batch processing, and real time monitoring.

Message me before ordering to discuss your system and goals.

Model expertise
- Custom model development
- Fine-tuning models
- Generative AI
- Predicitive analyatics
- Recommendation systems
- Other
Industry
- Art & design
- Audio & video
- Biotech
- Data analytics
- Financial services
- Gaming
- Transportation & automotive
Language
- English
- Korean
- Spanish
Technical expertise
- Machine learning (Supervised, Unsupervised, Reinforcement)
- Deep learning (Neural networks, GANs)
- Natural language processing (NLP)
- Computer Vision (Object detection, Image recognition)
- Reinforcement learning (Decision-making systems)
- Algorithm development and optimization
- Feature engineering and data processing
- AI ethics and bias mitigation

Get to know IMRAN ULLAH

IMRAN ULLAH

Building intelligent AI systems with NLP and Vision

FromPakistan
Member sinceMay 2026
Avg. response time1 hour
Languages
English, Urdu, Korean, Spanish, French, Arabic, Bengali, Kurdish

I am a Senior AI ML Engineer. I am new here but bring years of enterprise experience designing deep learning architectures. I build multi agent systems with agent2agent and MCP workflows. For NLP and vision, I create smart systems hybrid RAG and OCR pipelines using Qwen3 YOLOv12 and SAM3. I specialize in synthetic dataset generation and model fine tuning using PEFT LoRA QLoRA DoRA and Unsloth. I apply the latest reinforcement learning algorithms like RLHF DPO ORPO GRPO and DR GRPO. I optimize deployments using lightning-fast inference frameworks like vLLM SGLang TGI ONNX and TensorFlow.

Need to get creative?

Looking for tech experts?

Ready to reach and convert consumers?

Looking for writers?

Get your business running smarter

I will do local llm deployment on premise using vllm sglang ollama and llamacpp

About this gig

Get to know IMRAN ULLAH

Related tags