gemma-4-12B-it-abliterated-uncensored

Overview

Update incoming: this version needed a fix.

Full BF16 weights of gemma-4-12B-it-abliterated-uncensored — an abliterated, uncensored variant of google/gemma-4-12B-it (Gemma 4 12B Unified, dense, ~11.95B parameters). The model keeps Gemma 4's encoder-free unified multimodal stack intact — text, image, and audio inputs flow straight into a single decoder-only transformer — so this checkpoint is a drop-in replacement for the original instruction-tuned model at the architecture level.

The pipeline:

  1. Refusal Ablation — Residual-stream refusal directions (one per decoder layer) were extracted via diff-in-means on a labeled harmful/harmless prompt set and baked into the weights as a per-matrix delta on the residual-write modules, using our own custom abliteration framework.
  2. Multimodal Preservation — Gemma 4 12B is encoder-free: image patches and audio waveforms are projected directly into the embedding space, so there is no separate vision/audio tower to graft back. Tensor names, shapes, and the config.json schema (Gemma4UnifiedForConditionalGeneration, model_type: gemma4_unified) match the base model exactly — this checkpoint loads anywhere the original loads.
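
The first pipeline step can be sketched on toy data. The NumPy example below illustrates diff-in-means direction extraction and one common way to bake a direction out of a matrix that writes into the residual stream (orthogonal projection of its output). It is not the custom framework mentioned above: the function names, shapes, and random "activations" are assumptions chosen for illustration.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Diff-in-means: unit vector pointing from the harmless mean to the harmful mean."""
    diff = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablation_delta(weight, direction):
    """Delta that removes `direction` from the output of a residual-write matrix.

    `weight` has shape (d_model, d_in), so y = weight @ x lands in the residual stream.
    """
    return -np.outer(direction, direction) @ weight

# Toy stand-ins for per-layer residual activations (rows = prompts).
rng = np.random.default_rng(0)
d_model, d_in = 8, 16
harmless = rng.normal(size=(32, d_model))
harmful = rng.normal(size=(32, d_model)) + 2.0 * np.eye(d_model)[0]

r = refusal_direction(harmful, harmless)
W = rng.normal(size=(d_model, d_in))
W_abl = W + ablation_delta(W, r)  # the "baked" weight
```

After the update, `r @ W_abl` is zero up to floating-point error, so this matrix can no longer write along the extracted direction.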

Key Properties:

  • Uncensored across the standard refusal axes
  • Reasoning preserved (configurable thinking mode — see Best Practices)
  • Multimodal: text + image + audio carried forward
  • Drop-in shape compatibility with google/gemma-4-12B-it
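
One way to check the shape-compatibility claim locally is to compare tensor names, dtypes, and shapes from the safetensors headers without loading any weights. The sketch below reads the header directly (an 8-byte little-endian length followed by a JSON table). The helper names and file paths are hypothetical, and a sharded base checkpoint would need its shards merged first.

```python
import json
import struct

def read_safetensors_header(path):
    """Return {tensor_name: (dtype, shape)} from a .safetensors file header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64 length
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return {name: (info["dtype"], tuple(info["shape"])) for name, info in header.items()}

def layout_mismatches(base_path, candidate_path):
    """List tensor names whose presence, dtype, or shape differs between two files."""
    base = read_safetensors_header(base_path)
    cand = read_safetensors_header(candidate_path)
    return sorted(n for n in base.keys() | cand.keys() if base.get(n) != cand.get(n))
```

An empty list from `layout_mismatches("base/model.safetensors", "abliterated/model.safetensors")` would mean the two files share the same tensor layout.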

Architecture

Property | Value
Architecture | Gemma4UnifiedForConditionalGeneration (model_type: gemma4_unified)
Total Parameters | ~11.95B (dense)
Decoder Layers | 48
Hidden Size | 3840
Attention | 16 heads / 8 KV heads, hybrid sliding-window (1024) + global (full) attention, p-RoPE
Vocabulary | 262,144
Context Length | up to 256K tokens
Modalities | Text, Image, Audio (encoder-free / unified)
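
To make the hybrid attention entry concrete, the sketch below computes which key positions a causal query can see in a sliding-window layer versus a global layer. The helper name and the window convention (the window counts the current token) are assumptions; the actual implementation may differ by one position.

```python
def visible_key_range(query_pos, window=None):
    """Return (first, last) key positions a causal query may attend to.

    window=None models a global layer; an integer models a sliding-window layer.
    """
    if window is None:
        return 0, query_pos
    return max(0, query_pos - window + 1), query_pos

# A query at position 5000 in a sliding-window (1024) layer vs. a global layer.
first, last = visible_key_range(5000, window=1024)
local_keys = last - first + 1
first_g, last_g = visible_key_range(5000)
global_keys = last_g - first_g + 1
```

Under this convention the sliding-window layer sees a fixed number of keys regardless of position, while the global layer's span grows with the sequence.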

Files

File | Description | Size
model.safetensors | BF16 weights (48 decoder layers, unified multimodal) | ~23.9 GB
config.json | Unified multimodal config (Gemma4UnifiedForConditionalGeneration) |
processor_config.json | Multimodal processor config |
tokenizer.json, tokenizer_config.json, chat_template.jinja, generation_config.json | Standard |

Total on disk: ~24 GB.

Usage

from transformers import AutoProcessor, AutoModelForMultimodalLM

repo = "OpenYourMind/gemma-4-12B-it-abliterated-uncensored"

# Load the multimodal processor and the BF16 weights.
processor = AutoProcessor.from_pretrained(repo)
model = AutoModelForMultimodalLM.from_pretrained(
    repo, dtype="bfloat16", device_map="auto",
)

# Image content comes before the text in the user turn.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {"type": "image", "url": "path/to/image.jpg"},
        {"type": "text",  "text": "Describe this image in detail."},
    ]},
]
# Render the chat template and tokenize in one call; thinking mode is off here.
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_tensors="pt", return_dict=True, enable_thinking=False,
).to(model.device)
input_len = inputs["input_ids"].shape[-1]

# Generate, then decode only the newly generated tokens (skip the prompt).
out = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(out[0][input_len:], skip_special_tokens=True))

Text-only, audio, and video inputs work through the same class. For best results, place image content before the text and audio content after the text in the prompt. Requires a recent transformers release that ships the Gemma 4 unified classes.
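
The ordering advice above can be captured in a small helper that builds the user turn. This is a sketch only: `build_user_content` is a hypothetical name, the file paths are placeholders, and the `"url"` key on audio parts is an assumption that mirrors the image entry shown in the usage example.

```python
def build_user_content(text, image_urls=(), audio_urls=()):
    """Order the parts of a user turn as: images, then text, then audio."""
    parts = [{"type": "image", "url": u} for u in image_urls]
    parts.append({"type": "text", "text": text})
    # Assumption: audio parts use the same "url" key as the image entry.
    parts += [{"type": "audio", "url": u} for u in audio_urls]
    return parts

messages = [
    {"role": "user", "content": build_user_content(
        "Does the audio match the picture?",
        image_urls=["path/to/image.jpg"],
        audio_urls=["path/to/clip.wav"],
    )},
]
```

With no images or audio, the helper returns a single text part, so the same code path covers text-only prompts.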

Best Practices

  • Sampling: temperature=1.0, top_p=0.95, top_k=64 (the values shipped in generation_config.json).
  • Thinking mode: enabled by setting enable_thinking=True in apply_chat_template; the processor's parse_response separates the reasoning block from the final answer. Do not feed previous-turn thoughts back into multi-turn history.
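
A minimal sketch of both recommendations, assuming the caller has already separated the reasoning from the final answer. The `SAMPLING` dict and `append_assistant_turn` helper are hypothetical names, not part of the processor API.

```python
# Sampling values quoted above; do_sample=True makes generate() honor them
# when they are passed explicitly.
SAMPLING = {"do_sample": True, "temperature": 1.0, "top_p": 0.95, "top_k": 64}

def append_assistant_turn(history, final_answer, reasoning=None):
    """Return a new history that stores only the final answer.

    `reasoning` is accepted so callers can log it elsewhere, but it is
    deliberately left out of the conversation history.
    """
    return history + [{"role": "assistant", "content": final_answer}]

history = [{"role": "user", "content": "What is 2 + 2?"}]
history = append_assistant_turn(history, "4", reasoning="Add the two numbers.")
```

The dict can be splatted into the call from the usage section, for example `model.generate(**inputs, max_new_tokens=512, **SAMPLING)`.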

Hardware

Full BF16 weights (~24 GB). Fits on a single 24 GB GPU for inference with modest context, and comfortably on a 40–80 GB card for long context and multimodal batches. For Apple Silicon, an MLX quant can be produced from these weights.
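
The weight footprint follows directly from the parameter count. The back-of-the-envelope sketch below reproduces the ~23.9 GB figure and estimates the headroom on a 24 GB card. The int8/int4 entries are illustrative and ignore quantization metadata, and the headroom line assumes the card exposes a full 24 GiB.

```python
PARAMS = 11.95e9  # parameter count quoted in this card
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}  # int8/int4 are illustrative

def weight_gb(dtype):
    """Weight footprint in decimal gigabytes for a given storage dtype."""
    return PARAMS * BYTES_PER_PARAM[dtype] / 1e9

bf16_gb = weight_gb("bf16")              # matches the ~23.9 GB file size
headroom_gib = 24 - PARAMS * 2 / 2**30   # space left for KV cache and activations
```

Under these assumptions the BF16 weights leave roughly 1.7 GiB on a 24 GiB card, which is why only modest context fits there.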

Notes

  • License: Gemma (inherits the Gemma 4 license from the base model)
  • Base Model: google/gemma-4-12B-it
  • Modality: Text + Image + Audio (encoder-free / unified)
  • Architecture: Gemma 4 12B Unified (dense, ~11.95B)

Disclaimer

Use is the responsibility of the user. Ensure your usage complies with applicable laws, platform rules, the Gemma 4 license terms, and your deployment requirements.
