I will private, on device ai desktop applications


Level 2
About this gig
Your clients won't let you send their data to ChatGPT. I get it. I will build you a Windows desktop application that runs a real LLM Phi-3, Llama 3, Mistral, Qwen, or Gemma entirely on the user's machine
No cloud. No API keys. No data leakage. No recurring OpenAI bills. No GDPR headaches. No HIPAA concerns.
I'm a senior Windows developer (13+ years) currently working on hardware debugging tools at a global silicon company. I understand ONNX Runtime, DirectML, GPU/NPU acceleration, and how to ship a 24 GB model inside an MSIX or Inno Setup installer without breaking it.
Perfect for:
- Law firms handling privileged client data
- Healthcare providers (HIPAA-bound)
- Financial advisors and accountants
- Defence and government contractors (ITAR, CMMC)
- HR tools with employee PII
- Any team bound by GDPR, SOC 2, or internal data-residency rules
- Companies in regions with poor internet connectivity
Tech stack: C# / WPF / WinUI, ONNX Runtime GenAI, llama.cpp, Microsoft.ML.OnnxRuntime, DirectML, Semantic Kernel (local mode), LiteDB for vector storage, MSIX / Inno Setup packaging
Hardware requirements I'll help you plan for: I will recommend minimum specs for your end-users based on the model size
Get to know Shashank
Windows Desktop Developer C Sharp, C plus plus , Python , WPF, XAML, AI
Level 2
- FromIndia
- Member sinceJan 2018
- Avg. response time1 hour
- Last delivery3 weeks
Languages
English, German, Portuguese, French
My Portfolio
FAQ
How good are local models compared to GPT-4
Honestly, not as good at everything — but surprisingly close for many tasks. Phi-3-mini and Llama 3 8B handle Q&A;, summarisation, extraction, and drafting very well. For tasks needing broad world knowledge or complex reasoning, cloud models still lead.
How big is the final installer
Between 2 GB and 8 GB depending on the model. I use installers that download the model on first launch if you prefer a smaller initial download
Will this work on a 5-year-old laptop
Yes, with a smaller model (Phi-3-mini, 3.8B parameters) on CPU — slower, maybe 3–6 tokens per second. For real-time-feeling responses, 16 GB RAM and a modern CPU are recommended
Can it use the NPU on newer Copilot+ PCs
Yes. ONNX Runtime with DirectML can target the NPU on Qualcomm Snapdragon X and newer Intel / AMD NPUs for dramatically faster inference with lower power use
What if I want to update the model later?
The Standard and Premium packages include a model-swap mechanism so you (or your users) can drop in a newer or different model without needing a new installer.
Do you handle fine-tuning
Fine-tuning is a separate engagement. For most use cases RAG (retrieving from your own documents) gets you the same practical result without the cost and complexity of fine-tuning. I'll advise honestly on which you need
Can you sign a HIPAA BAA?
I do not sign BAAs as a solo freelancer, but your app itself can be HIPAA-compliant by design — which is exactly what I build. I'll explain the distinction in our first chat.
What about commercial licensing of the models?
I only use models with permissive licences (Phi-3 MIT, Llama 3 with Meta's commercial licence, Mistral Apache 2.0, Qwen). I will flag licensing implications before we finalise which model to use.

