I will test your llm chatbot for jailbreaks, data leaks and unsafe behavior

Vladislav Boev

test your llm chatbot for jailbreaks, data leaks and unsafe behavior

Full Screen

View Presentation

About this gig

LLM Behavioral & Safety Testing by a QA Lead

I'm a QA Lead (6+ yrs) applying systematic test design to AI. I build test sets that surface where your LLM-powered bot behaves unsafely or breaks its own rules jailbreaks, prompt injection, prompt leaks, hallucinations, refusal failures, and data-access risks.

How it works:

You share your system prompt + how the bot is used
I map the risk zones specific to your use-case
I build the test cases (input expected behavior + severity + rationale)
You get JSONL + CSV + a readable report ready for your eval harness

Premium: I also run the tests against your model and deliver a findings report each failure with input, expected vs actual, and severity.

What I don't do: I don't judge factual or domain accuracy (legal, medical, etc.) that needs a subject-matter expert. I test behavior, safety & instruction-following.

Need a large or ongoing set? Message me for a custom quote. Written-first, GMT+7. Message me before ordering.

Model expertise
- Generative AI
- Other
Industry
- Cyber security
- Data analytics
- Food & beverage
- Gaming
- Marketing & advertising
- Travel & tourism
- Other
Programming language
- Python
- Other
Language
- English
- Russian
Technical expertise
- Natural language processing (NLP)
- AI ethics and bias mitigation
- Other

Get to know Vladislav Boev

Vladislav Boev

Senior QA Lead and Test Architect

FromVietnam
Member sinceJun 2026
Avg. response time1 hour
Languages
Russian, English

QA Lead with 6+ yrs. Test at architecture level: data flows, integrations, system design, risks. Services: QA Audit: process + test code review. Top risks + roadmap. Test Strategy: levels, tools, effort estimates. Auto-tests: Python + Playwright + Pytest (UI/API). Code Review for test automation. Requirements analysis: find contradictions, gaps, risks. I don't: CI/CD setup (only requirements), performance testing. Written-first. Clear reports. GMT+7 (Asia). Message me before ordering.

FAQ

Do you check if my bot's answers are factually correct?

No — I test behavior, safety and instruction-following (does it break rules, leak data, get jailbroken). Judging factual or domain accuracy (legal, medical, etc.) needs a subject-matter expert. I'll tell you upfront if your case needs that.

What do you need from me to start?

Your system prompt (the instructions you give the model) and a short description of how the bot is used. For Premium runs: API access to your model, or you run my test cases and send back the outputs.

Which models do you support?

Any text-based LLM or chatbot (GPT, Claude, Gemini, Llama, open-source, fine-tuned). I test behavior at the prompt level, so the underlying model doesn't matter.

Can you test legal, medical or financial bots?

I can test their safety and rule-following behavior (e.g. that they refuse advice they shouldn't give), but not whether their domain answers are correct. For high-risk domains I keep scope to behavior/safety and say so clearly.

I need a large or recurring test set — can you do that?

Yes. The packages cover focused sets; for large volumes or ongoing testing, message me before ordering and I'll send a custom quote.

Related tags

llm evaluation

Need to get creative?

Looking for tech experts?

Ready to reach and convert consumers?

Looking for writers?

Get your business running smarter

I will test your llm chatbot for jailbreaks, data leaks and unsafe behavior

About this gig

Get to know Vladislav Boev

FAQ

Related tags