Active filters: dpo, trl
syntropy-ai/Soren-1-Small
Text Generation
• 2B • Updated • 1.68k
• 26
codelion/Qwen3-0.6B-PTS-DPO
Text Generation
• 0.6B • Updated • 3
• 2
trentmkelly/gpt-4o-distil-Llama-3.3-70B-Instruct
Text Generation
• Updated • 15
• 3
mradermacher/gpt-4o-distil-Llama-3.1-8B-Instruct-PaperWitch-heresy-GGUF
8B • Updated • 720
• 7
mrshu/qwen35-0.8b-dpo-think
Text Generation
• 0.8B • Updated • 10
• 1
lewtun/zephyr-7b-dpo-full
Text Generation
• 7B • Updated • 5
alignment-handbook/zephyr-7b-dpo-full
Text Generation
• 7B • Updated • 20
• 3
alignment-handbook/zephyr-7b-dpo-qlora
Updated • 7
• 9
amirali1985/gpt-neo-125m_hh_reward
Text Generation
• 0.1B • Updated • 16
lewtun/zephyr-7b-dpo-qlora
sambar/zephyr-7b-ipo-lora
Text Generation
• Updated • 4
nikkoyabut/merged_model_dpo
sambar/zephyr-7b-ipo-lora-5ep
Text Generation
• Updated • 5
alexredna/TinyLlama-1.1B-Chat-v1.0-reasoning-v2-dpo
Text Generation
• 1B • Updated • 9
• 2
Yaxin1992/mixtral-dpo-1000
adhi29/openhermes-mistral-dpo-gptq
Updated
Text Generation
• 1.03M • Updated • 5
ybelkada/test-tags-model-2
Text Generation
• 1.03M • Updated • 7
justinj92/dpoplatypus-phi2
Text Generation
• 3B • Updated
lewtun/zephyr-7b-dpo-qlora-8e0975a
akashkumarbtc/openhermes-mistral-dpo-gptq
Updated
darshan8950/openhermes-mistral-dpo-gptq
Updated
ondevicellm/zephyr-7b-dpo-full
Text Generation
• 7B • Updated • 4
jdang/openhermes-mistral-dpo-gptq
Updated