I will test your llm and ai chatbot for bugs, accuracy and prompt failures
Manual Tester and QA Specialist
About this Gig
Are you deploying LLMs but worried about hallucinations or prompt injections? Standard QA fails with non-deterministic AI. I bridge the gap between AI development and software reliability by testing, breaking, and securing your LLM APIs.
### What I Will Do:
1. LLM API & Endpoint Testing: Validate status codes, payload schemas, and latency benchmarks (OpenAI, Anthropic, Custom models).
2. Prompt Validation & Vulnerability Testing: Evaluate prompts using Promptfoo or DeepEval. Test for injections, drift, and toxicity.
3. Hallucination Audits: Set up programmatic assertions to measure factual accuracy and semantic similarity.
4. CI/CD Integration: Build regression pipelines to auto-validate prompts on every backend change.
### Tech & Tools:
- Python / TypeScript
- Promptfoo / DeepEval / TruLens
- Postman / Newman / PyTest / Playwright
- CI/CD (GitHub Actions, GitLab CI)
### Why Choose This Gig?
Traditional QA checks static results. LLMs require an engineering mindset to track probability, semantic metrics, and adversarial prompt structures.
Ensure your AI behaves exactly as intended. Message me with your project details today!
Testing application:
API
Development technology:
C/C++
•
HTML & CSS
•
SQL
Device:
PC
•
Linux
•
Android mobile phone
•
Windows phone
FAQ
What tools do you use for prompt testing?
I primarily use open-source automation frameworks like Promptfoo, DeepEval, or custom PyTest configurations.

