I will build a domain specific sft dataset for llm finetuning

Name: build a domain specific sft dataset for llm finetuning
Brand: Fiverr
Availability: InStock

Vietnam

I speak English, Vietnamese

LLM FineTuning Data and AI Automation

I'm an AI Engineer with a Computer Science background, specializing in LLM fine-tuning data and AI automation systems. I build production-ready SFT datasets, custom AI pipelines, and document-aware kn...

About this Gig

Fine-tuning a language model starts with the data. Vague responses, duplicate samples, or wrong formats will hurt your model regardless of how good your training setup is.

I build domain-specific SFT datasets through a 5-stage pipeline: generation, validation, deduplication, LLM-as-judge scoring, and human quality review. Every sample that reaches your training loop has passed all five stages.

WHAT YOU RECEIVE

train.jsonl + val.jsonl (90/10 split)
data_card.md (dataset documentation)

FORMATS

Alpaca single-turn, all packages
ShareGPT multi-turn, Standard and Premium

COMPATIBLE WITH

Axolotl, LLaMA-Factory, Unsloth, OpenAI Fine-tune API, Together AI

DOMAINS

E-commerce, healthcare Q&A, legal summarization, coding assistant, SaaS support, finance, HR, EdTech, multilingual support, and more. Message me if yours isn't listed.

Not sure which package fits your use case? Send me a message before ordering.

build a domain specific sft dataset for llm finetuning

Full Screen

View Presentation

Programming Language:

Python

•

Pytorch

AI Model Frameworks & Tools:

Hugging Face Transformers

+1 more

Data Type:

Text

AI Engine:

GPT

•

Gemini

•

DeepSeek

•

Llama

•

Grok

My Portfolio

FAQ

Is the data quality guaranteed?

Every sample passes a 5-stage pipeline - generation, validation, deduplication, LLM-as-judge scoring, and human quality review. Vague, inconsistent, or off-topic samples are filtered out or trigger a re-run. What you receive passed all five stages.

Is this synthetic data?

Yes, generated by a state-of-the-art LLM. This is standard practice for SFT dataset construction and works well for most fine-tuning use cases. Real-world edge cases may benefit from additional human-written examples on top.

What's the difference between Alpaca and ShareGPT?

Alpaca is single-turn - one instruction, one response. ShareGPT is multi-turn conversational. Use Alpaca for task-following or Q&A. Use ShareGPT for chatbot or assistant fine-tuning where context carry-over matters.

Can you handle niche or rare domains?

Yes. I've worked with domains like mental health support, Islamic finance, Vietnamese legal assistance, and technical B2B SaaS. If your domain isn't on the list, message me - most are doable.

What fine-tuning frameworks does this support?

Axolotl, LLaMA-Factory, Unsloth, OpenAI Fine-tune API, and Together AI. Both Alpaca and ShareGPT are production-ready for all of these out of the box.

What does the data card include?

Domain, sample count, train/val split, format, average tokens per sample, deduplication method, and intended use. Standard documentation for production ML datasets.

What do I need to provide to get started?

Fiverr will guide you through everything when you place the order. Just a few details about your use case and preferences - nothing complicated.

Related tags

machine learning

Need to get creative?

Looking for tech experts?

Ready to reach and convert consumers?

Looking for writers?

Get your business running smarter

I will build a domain specific sft dataset for llm finetuning

About this Gig

My Portfolio

FAQ

Related tags