Qwen2.5-7B QLoRA Adapter Fine-Tuned on Dolly-15K

This repository contains a QLoRA adapter fine-tuned from Qwen/Qwen2.5-7B-Instruct on databricks/databricks-dolly-15k.

Project Purpose

This is a supervised fine-tuning experiment for learning and demonstrating the full 7B QLoRA workflow:

  1. Load a 7B instruction model in 4-bit
  2. Prepare the model for k-bit training
  3. Convert Dolly-15K into chat format
  4. Apply the Qwen chat template
  5. Add LoRA adapters
  6. Train with TRL SFTTrainer
  7. Save adapter weights
  8. Run inference with base model + adapter
  9. Upload adapter to Hugging Face Hub
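
Step 3 can be sketched as follows. This is a minimal illustration, not necessarily the preprocessing used for this adapter: the function name `dolly_to_messages`, the system prompt, and the way `context` is appended to the instruction are assumptions. The field names `instruction`, `context`, and `response` are those of the Dolly-15K records.

```python
def dolly_to_messages(record, system_prompt="You are a helpful assistant."):
    """Convert one Dolly-15K record into a list of chat messages (hypothetical helper)."""
    instruction = record["instruction"].strip()
    context = (record.get("context") or "").strip()
    # Assumption: when a record carries context, append it below the instruction.
    if context:
        user_content = f"{instruction}\n\nContext:\n{context}"
    else:
        user_content = instruction
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
        {"role": "assistant", "content": record["response"].strip()},
    ]


# Made-up record for illustration only; it is not taken from the dataset.
example = {
    "instruction": "Summarize the passage in one sentence.",
    "context": "QLoRA trains low-rank adapters on top of a 4-bit quantized base model.",
    "response": "QLoRA fine-tunes small adapters over a quantized model.",
    "category": "summarization",
}
messages = dolly_to_messages(example)
```

For step 4, the resulting list can be passed to `tokenizer.apply_chat_template(messages, tokenize=False)` to produce the training text.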

Base Model

  • Qwen/Qwen2.5-7B-Instruct

Dataset

  • databricks/databricks-dolly-15k
  • Training subset: 10,000 examples
  • Evaluation subset: 1,000 examples
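
One way to draw disjoint training and evaluation subsets of these sizes is sketched below. The card does not state how the split was made, so the shuffling, the seed, the helper name `split_indices`, and the placeholder dataset size are all assumptions.

```python
import random


def split_indices(total, n_train=10_000, n_eval=1_000, seed=42):
    """Return two disjoint index lists sampled without replacement (hypothetical helper)."""
    if n_train + n_eval > total:
        raise ValueError("Requested subsets are larger than the dataset.")
    indices = list(range(total))
    # A seeded shuffle keeps the split reproducible across runs.
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:n_train + n_eval]


# Placeholder size; replace with len(dataset) for the real data.
train_idx, eval_idx = split_indices(total=15_000)
```

With the `datasets` library, the index lists could then be applied through `dataset.select(train_idx)` and `dataset.select(eval_idx)`.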

Training Method

  • Method: QLoRA
  • Quantization: 4-bit NF4
  • Double quantization: enabled
  • Compute dtype: bfloat16
  • LoRA rank: 16
  • LoRA alpha: 32
  • LoRA dropout: 0.05
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Max sequence length: 2048
  • Epochs: 1
  • Learning rate: 2e-4
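
To give a sense of scale, the number of trainable parameters implied by these LoRA settings can be estimated with plain arithmetic. The layer shapes below are assumptions based on the published Qwen2.5-7B configuration (verify them against the model's `config.json`), and the estimate assumes that no bias terms are trained.

```python
# Assumed Qwen2.5-7B shapes; check against the model's config.json.
HIDDEN = 3584
INTERMEDIATE = 18944
KV_DIM = 4 * 128  # 4 key/value heads x head_dim 128 (grouped-query attention)
NUM_LAYERS = 28

RANK = 16
ALPHA = 32

# (in_features, out_features) of each targeted linear layer.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, rank, num_layers):
    """LoRA adds A (rank x in) and B (out x rank) to every wrapped layer."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


trainable = lora_param_count(TARGET_SHAPES, RANK, NUM_LAYERS)
scaling = ALPHA / RANK  # factor applied to the low-rank update
print(f"trainable LoRA parameters: {trainable:,}")
print(f"LoRA scaling factor (alpha / r): {scaling}")
```

Under these assumed shapes the adapter holds roughly 40 million trainable parameters, a small fraction of the 7B base model, which is what keeps the saved adapter weights compact.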

Intended Use

This adapter is intended for instruction-following experiments and PEFT/QLoRA learning.

Example use cases:

  • Comparing base model and QLoRA-adapted outputs
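
A base-versus-adapter comparison does not require loading the model twice, because PEFT models expose a `disable_adapter()` context manager that temporarily bypasses the LoRA weights. The sketch below assumes a generation helper with the same signature as `generate_response` from the Example Usage section; the name `compare_outputs` is hypothetical.

```python
def compare_outputs(peft_model, tokenizer, prompt, generate_fn):
    """Return (base_output, adapted_output) for the same prompt."""
    # Inside this block the LoRA weights are bypassed, so the base model answers.
    with peft_model.disable_adapter():
        base_output = generate_fn(peft_model, tokenizer, prompt)
    # Outside the block the adapter is active again.
    adapted_output = generate_fn(peft_model, tokenizer, prompt)
    return base_output, adapted_output
```

A call might look like `compare_outputs(model, tokenizer, prompt, generate_response)`. Because `generate_response` samples, outputs vary between runs; switching to greedy decoding or fixing the random seed before each call gives a more like-for-like comparison.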

Example Usage

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model = "Qwen/Qwen2.5-7B-Instruct"
adapter = "Kurapika993/qwen2.5-7b-qlora-dolly15k"

# 4-bit NF4 quantization with double quantization, matching the training setup
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# Load the tokenizer from the adapter repository
tokenizer = AutoTokenizer.from_pretrained(adapter)

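# Load the quantized base model. Qwen2.5 is supported natively by recent
# transformers releases, so trust_remote_code is not needed.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",
)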

# Attach the LoRA adapter to the quantized base model and switch to inference mode
model = PeftModel.from_pretrained(model, adapter)
model.eval()

def generate_response(model, tokenizer, user_prompt, max_new_tokens=250):
    messages = [
        {
            "role": "system",
            "content": "You are a helpful assistant."
        },
        {
            "role": "user",
            "content": user_prompt
        }
    ]

    # Render the messages with the Qwen chat template and open the assistant turn
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    inputs = tokenizer(
        text,
        return_tensors="pt"
    ).to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            repetition_penalty=1.05,
            pad_token_id=tokenizer.eos_token_id,
        )

    
    # Keep only the newly generated tokens, dropping the prompt
    generated_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    response = tokenizer.decode(generated_tokens, skip_special_tokens=True)

    return response.strip()

prompt = "Explain instruction tuning to a beginner using a simple analogy."

response = generate_response(model, tokenizer, prompt)
print(response)