I will do local llm deployment on premise using vllm sglang ollama and llamacpp

C
cortexforge_ai
C
cortexforge_ai
IMRAN ULLAH

About this gig

Advanced local and enterprise LLM deployment with secure on premises AI infrastructure and OpenAI compatible API.


If you want to run open-source language models on your own servers with full privacy, high speed, and no cloud dependency, you are in the right place.

I deploy and optimize LLM, Mixture of Experts, embedding models, multi model embeddings, and VLM systems using vLLM, SGLang, Ollama, TGI and llama.cpp for low latency and high tokens per second, exposed through an OpenAI compatible API for easy integration.

I work with modern models from Qwen3, DeepSeek 4.5, and GLM 4.5 for text, vision, and embedding workloads.


From lightweight local models to large deployments up to 500B+ parameters, I build production ready inference servers with multiuser support, batch processing, and real time monitoring.


Message me before ordering to discuss your system and goals.

Get to know IMRAN ULLAH

IMRAN ULLAH

Building intelligent AI systems with NLP and Vision

  • FromPakistan
  • Member sinceMay 2026
  • Avg. response time1 hour
  • Languages

    English, Urdu, Korean, Spanish, French, Arabic, Bengali, Kurdish
I am a Senior AI ML Engineer. I am new here but bring years of enterprise experience designing deep learning architectures. I build multi agent systems with agent2agent and MCP workflows. For NLP and vision, I create smart systems hybrid RAG and OCR pipelines using Qwen3 YOLOv12 and SAM3. I specialize in synthetic dataset generation and model fine tuning using PEFT LoRA QLoRA DoRA and Unsloth. I apply the latest reinforcement learning algorithms like RLHF DPO ORPO GRPO and DR GRPO. I optimize deployments using lightning-fast inference frameworks like vLLM SGLang TGI ONNX and TensorFlow.