I will fine-tune open-source LLMs with LoRA, full fine-tuning, and RL

djordje2024
Djordje S

Level 1

About this gig

I can help you design and implement advanced LLM training and fine-tuning workflows for domain-specific assistants, reasoning models, chatbots, instruction-following models, and task-optimized language systems.


Data collection and dataset preparation


 * Web and document-based data collection

 * Instruction dataset creation

 * Prompt-response pair generation

 * Conversation and domain dataset curation

 * Data cleaning, deduplication, filtering, and formatting

 * Preference data preparation for reward modeling or RL
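
To give a sense of what the cleaning and deduplication step can look like, here is a minimal sketch using only the Python standard library. The field names `prompt` and `response`, the `min_chars` threshold, and the helper names are assumptions for illustration; a real project would adapt them to the client's data and often add near-duplicate detection on top of exact matching.

```python
import hashlib
import json
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return re.sub(r"\s+", " ", text).strip().lower()


def clean_pairs(records, min_chars=10):
    """Drop too-short pairs and exact duplicates (after normalization).

    `records` is assumed to be an iterable of dicts with hypothetical
    "prompt" and "response" keys.
    """
    seen = set()
    cleaned = []
    for rec in records:
        prompt = rec.get("prompt", "").strip()
        response = rec.get("response", "").strip()
        # Filter out pairs that are too short to be useful training examples.
        if len(prompt) < min_chars or len(response) < min_chars:
            continue
        # Hash the normalized pair to detect duplicates cheaply.
        key = hashlib.sha256(
            (normalize(prompt) + "\x00" + normalize(response)).encode("utf-8")
        ).hexdigest()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"prompt": prompt, "response": response})
    return cleaned


def to_jsonl(records) -> str:
    """Serialize cleaned pairs as JSON Lines, one example per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

Calling `to_jsonl(clean_pairs(raw_records))` would produce one JSON object per line, a layout that most fine-tuning data loaders can read directly.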


Supervised Fine-Tuning (SFT)


 * LoRA / QLoRA fine-tuning

 * Freeze fine-tuning

 * Full fine-tuning

 * Instruction tuning

 * Chat model tuning

 * Domain adaptation for finance, crypto, legal, support, technical, and private datasets
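
The practical difference between LoRA and full fine-tuning comes down to how many parameters are trained. As a rough illustration, the sketch below compares the two for a single weight matrix, assuming the standard LoRA formulation in which a frozen `d_out x d_in` matrix receives a trainable low-rank update built from two matrices of shapes `d_out x rank` and `rank x d_in`. The function names and the 4096 x 4096 example size are illustrative assumptions, not measurements from a specific model.

```python
def full_trainable_params(d_in: int, d_out: int) -> int:
    """Parameters updated when the whole weight matrix is fine-tuned."""
    return d_in * d_out


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank LoRA factors for one weight matrix."""
    return rank * (d_in + d_out)


# Hypothetical example: one 4096 x 4096 projection with LoRA rank 8.
full = full_trainable_params(4096, 4096)        # 16777216
lora = lora_trainable_params(4096, 4096, 8)     # 65536
fraction = lora / full                          # 0.00390625, i.e. about 0.39%
```

This is why LoRA and QLoRA usually fit on far smaller GPUs than full fine-tuning, while full fine-tuning remains an option when the budget and the task justify it.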


Reinforcement Learning methods


 * RLHF-style pipeline design

 * Reward modeling

 * Preference optimization

 * DPO / ORPO / PPO-style training workflows

 * Alignment tuning for response quality, format, and task behavior
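
For readers curious about what preference optimization computes, here is a minimal sketch of the DPO loss for a single preference pair, written in plain Python. It assumes the four inputs are summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model; the function name and the default `beta=0.1` are illustrative assumptions. In practice a library such as TRL computes this over batches of tensors.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """DPO loss for one pair: -log(sigmoid(beta * margin)).

    The margin measures how much more the policy favors the chosen
    response over the rejected one, relative to the reference model.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    x = beta * margin
    # -log(sigmoid(x)) equals log(1 + exp(-x)); this form avoids overflow.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the policy and the reference agree, the margin is zero and the loss equals log 2. The loss falls as the policy shifts probability toward the chosen response.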


Training framework setup


 * Hugging Face Transformers

 * TRL

 * PEFT

 * DeepSpeed

 * Accelerate

 * PyTorch

 * bitsandbytes

 * vLLM inference integration

 * Multi-GPU and distributed training setup

Get to know Djordje S

5.0 (17)

  • From: Serbia
  • Member since: Jul 2024
  • Avg. response time: 1 hour
  • Last delivery: 1 month
  • Languages: English, Serbian
Hi! I'm Djordje, a dedicated blockchain and AI expert with extensive experience developing innovative solutions for the blockchain ecosystem. With a focus on blockchain technology and artificial intelligence, I help clients navigate the complexities of decentralized systems and harness emerging technologies to drive business growth and innovation.
