I will fine tune open source llms with lora full tuning and rl


Level 1
About this gig
I can help you design and implement advanced LLM training and fine-tuning workflows for domain-specific assistants, reasoning models, chatbots, instruction-following models, and task-optimized language systems.
Data collection and dataset preparation
* Web and document-based data collection
* Instruction dataset creation
* Prompt-response pair generation
* Conversation and domain dataset curation
* Data cleaning, deduplication, filtering, and formatting
* Preference data preparation for reward modeling or RL
Supervised Fine-Tuning (SFT)
* LoRA / QLoRA fine-tuning
* Freeze fine-tuning
* Full fine-tuning
* Instruction tuning
* Chat model tuning
* Domain adaptation for finance, crypto, legal, support, technical, and private datasets
Reinforcement Learning methods
* RLHF-style pipeline design
* Reward modeling
* Preference optimization
* DPO / ORPO / PPO-style training workflows
* Alignment tuning for response quality, format, and task behavior
Training framework setup
* Hugging Face Transformers
* TRL
* PEFT
* DeepSpeed
* Accelerate
* PyTorch
* bitsandbytes
* vLLM inference integration
* Multi-GPU and distributed training setup
Get to know Djordje S
Level 1
- FromSerbia
- Member sinceJul 2024
- Avg. response time1 hour
- Last delivery1 month
Languages
English, Serbian
