gemma-4-12B-it-abliterated-uncensored

Overview

Update incoming: this version needed a fix.

Full BF16 weights of gemma-4-12B-it-abliterated-uncensored — an abliterated, uncensored variant of google/gemma-4-12B-it (Gemma 4 12B Unified, dense, ~11.95B parameters). The model keeps Gemma 4's encoder-free unified multimodal stack intact — text, image, and audio inputs flow straight into a single decoder-only transformer — so this checkpoint is a drop-in replacement for the original instruction-tuned model at the architecture level.

The pipeline:

Refusal Ablation — Residual-stream refusal directions (one per decoder layer) were extracted via diff-in-means on a labeled harmful/harmless prompt set and baked into the weights as a per-matrix delta on the residual-write modules, using our own custom abliteration framework.
Multimodal Preservation — Gemma 4 12B is encoder-free: image patches and audio waveforms are projected directly into the embedding space, so there is no separate vision/audio tower to graft back. Tensor names, shapes, and the config.json schema (Gemma4UnifiedForConditionalGeneration, model_type: gemma4_unified) match the base model exactly — this checkpoint loads anywhere the original loads.
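
To make the Refusal Ablation step more concrete, here is a minimal sketch of diff-in-means direction extraction followed by a weight-space projection for a single layer. The framework used for this checkpoint is not described in detail here, so the projection `W - r r^T W` is an assumption taken from the commonly described abliteration recipe, not a statement of the exact method. The function names and the toy activations are hypothetical.

```python
import math


def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def refusal_direction(harmful_acts, harmless_acts):
    """Diff-in-means: unit vector pointing from the harmless mean to the harmful mean."""
    diff = [h - b for h, b in zip(mean_vector(harmful_acts), mean_vector(harmless_acts))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("harmful and harmless means coincide; no direction to extract")
    return [x / norm for x in diff]


def ablate_write_matrix(W, r):
    """Return W - r r^T W, where the rows of W index the residual dimension.

    Assumption: a residual-write module computes out = W @ x, so removing the
    component of every output along r is the same as subtracting r (r^T W).
    """
    rows, cols = len(W), len(W[0])
    r_t_W = [sum(r[i] * W[i][j] for i in range(rows)) for j in range(cols)]
    return [[W[i][j] - r[i] * r_t_W[j] for j in range(cols)] for i in range(rows)]


# Toy example: a 2-dimensional "residual stream" and one write matrix.
harmful = [[2.0, 0.0], [4.0, 0.0]]    # hypothetical activations on harmful prompts
harmless = [[0.0, 0.0], [0.0, 0.0]]   # hypothetical activations on harmless prompts
r = refusal_direction(harmful, harmless)
W = [[1.0, 2.0], [3.0, 4.0]]
W_ablated = ablate_write_matrix(W, r)
# The per-matrix delta that would be baked into the checkpoint.
delta = [[a - w for a, w in zip(row_a, row_w)] for row_a, row_w in zip(W_ablated, W)]
```

In this toy case the direction is the first axis, so the ablated matrix can no longer write anything along it while the second row is untouched. In a real run the activations would come from the residual stream at each decoder layer, with one direction per layer.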

Key Properties:

Uncensored across the standard refusal axes
Reasoning preserved (configurable thinking mode — see Best Practices)
Multimodal: text + image + audio carried forward
Drop-in shape compatibility with google/gemma-4-12B-it

Architecture

Property	Value
Architecture	`Gemma4UnifiedForConditionalGeneration` (`model_type: gemma4_unified`)
Total Parameters	~11.95B (dense)
Decoder Layers	48
Hidden Size	3840
Attention	16 heads / 8 KV heads, hybrid sliding-window (1024) + global (full) attention, p-RoPE
Vocabulary	262,144
Context Length	up to 256K tokens
Modalities	Text, Image, Audio (encoder-free / unified)
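
The attention row above combines two ideas that a small sketch may help illustrate: grouped-query attention (16 query heads sharing 8 KV heads) and a causal mask that is either windowed (1024) or global. The head counts and window size come from the table. The consecutive head grouping and the exact window boundary (whether the current token counts toward the 1024) are assumptions, and the function names are hypothetical.

```python
NUM_Q_HEADS = 16
NUM_KV_HEADS = 8
SLIDING_WINDOW = 1024


def kv_head_for(q_head, num_q_heads=NUM_Q_HEADS, num_kv_heads=NUM_KV_HEADS):
    """Grouped-query attention: consecutive query heads share one KV head (assumed layout)."""
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size


def can_attend(query_pos, key_pos, window=None):
    """Causal visibility check.

    window=None models a global (full) attention layer; an integer models a
    sliding-window layer that sees only the most recent `window` positions,
    counting the query position itself (assumed boundary convention).
    """
    if key_pos > query_pos:
        return False  # never attend to the future
    if window is None:
        return True
    return query_pos - key_pos < window
```

Under these assumptions, `kv_head_for(0)` and `kv_head_for(1)` both map to KV head 0, and a sliding-window layer at position 2000 sees key 977 but not key 976, while a global layer sees both.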

Files

File	Description	Size
`model.safetensors`	BF16 weights (48 decoder layers, unified multimodal)	~23.9 GB
`config.json`	Unified multimodal config (`Gemma4UnifiedForConditionalGeneration`)	—
`processor_config.json`	Multimodal processor config	—
`tokenizer.json`, `tokenizer_config.json`, `chat_template.jinja`, `generation_config.json`	Standard	—

Total on disk: ~24 GB.
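
The claim that tensor names and shapes match the base model can be checked without loading any weights, because a `.safetensors` file starts with an 8-byte little-endian header length followed by a JSON header listing every tensor. The following is a rough sketch of such a check using only the standard library. The helper names are hypothetical, and for a sharded checkpoint the per-shard indexes would have to be merged first.

```python
import json
import struct


def read_tensor_index(path):
    """Map tensor name -> (dtype, shape) from a .safetensors header, skipping the weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian unsigned 64-bit
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    return {name: (info["dtype"], tuple(info["shape"])) for name, info in header.items()}


def diff_tensor_indexes(a, b):
    """Return (only_in_a, only_in_b, mismatched) tensor-name lists for two indexes."""
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    mismatched = sorted(name for name in set(a) & set(b) if a[name] != b[name])
    return only_a, only_b, mismatched
```

Calling `diff_tensor_indexes(read_tensor_index(path_a), read_tensor_index(path_b))` on two local files should return three empty lists when the checkpoints are shape-compatible. The paths are whatever the files are called in your local download.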

Usage

from transformers import AutoProcessor, AutoModelForMultimodalLM

repo = "OpenYourMind/gemma-4-12B-it-abliterated-uncensored"

# Load the multimodal processor and the BF16 weights.
processor = AutoProcessor.from_pretrained(repo)
model = AutoModelForMultimodalLM.from_pretrained(
    repo, dtype="bfloat16", device_map="auto",
)

# Chat-format prompt: the image entry comes before the text entry.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {"type": "image", "url": "path/to/image.jpg"},
        {"type": "text",  "text": "Describe this image in detail."},
    ]},
]
# Render the chat template and tokenize; thinking mode is off for this call.
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_tensors="pt", return_dict=True, enable_thinking=False,
).to(model.device)
input_len = inputs["input_ids"].shape[-1]  # prompt length, used to skip the echoed prompt

# Generate, then decode only the newly generated tokens.
out = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(out[0][input_len:], skip_special_tokens=True))

Text-only, audio, and video inputs work through the same class. For best results, place image content before the text and audio content after the text in the prompt. Requires a recent transformers release (one that ships the Gemma 4 unified classes).
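
The ordering advice above (image before text, audio after text) is easy to get wrong when a prompt is assembled from several sources, so a small helper can enforce it. This is only a sketch: the helper names are hypothetical, and the `"url"` key on the audio entry is an assumption that mirrors the image entry in the usage example.

```python
# Lower rank comes first: images, then text, then audio.
_MODALITY_RANK = {"image": 0, "text": 1, "audio": 2}


def order_content(parts):
    """Stable-sort content parts so images precede text and audio follows it.

    Parts with an unrecognized "type" are ranked alongside text.
    """
    return sorted(parts, key=lambda part: _MODALITY_RANK.get(part["type"], 1))


def user_message(parts):
    """Build a user turn whose content follows the recommended modality order."""
    return {"role": "user", "content": order_content(parts)}


message = user_message([
    {"type": "audio", "url": "path/to/clip.wav"},   # assumed key name for audio
    {"type": "text", "text": "Does the audio match the picture?"},
    {"type": "image", "url": "path/to/image.jpg"},
])
```

Because the sort is stable, several images or several audio clips keep the relative order in which they were supplied.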

Best Practices

Sampling: temperature=1.0, top_p=0.95, top_k=64 (the values shipped in generation_config.json).
Thinking mode: enabled by setting enable_thinking=True in apply_chat_template; the processor's parse_response separates the reasoning block from the final answer. Do not feed previous-turn thoughts back into multi-turn history.
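
The advice not to feed previous-turn thoughts back into the history can be applied with a small filter before each new call to `apply_chat_template`. The sketch below assumes that each assistant turn is stored as a dict with the final answer under `"content"` and the reasoning under a separate `"thinking"` key. That storage layout and the helper name are hypothetical, since the exact structure returned by `parse_response` is not shown here.

```python
import copy

# Sampling values quoted above. do_sample=True is added because temperature,
# top_p, and top_k only take effect when sampling is switched on.
SAMPLING_KWARGS = {"do_sample": True, "temperature": 1.0, "top_p": 0.95, "top_k": 64}


def strip_thoughts(history, thought_key="thinking"):
    """Return a copy of `history` with reasoning removed from assistant turns."""
    cleaned = []
    for message in history:
        message = copy.deepcopy(message)  # leave the caller's history untouched
        if message.get("role") == "assistant":
            message.pop(thought_key, None)
        cleaned.append(message)
    return cleaned


history = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "thinking": "Simple arithmetic.", "content": "4"},
    {"role": "user", "content": "And doubled?"},
]
next_turn_messages = strip_thoughts(history)
```

The cleaned list can then be passed to `apply_chat_template`, and `SAMPLING_KWARGS` can be unpacked into `model.generate(**inputs, **SAMPLING_KWARGS)` if you prefer to set the sampling values explicitly.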

Hardware

Full BF16 weights (~24 GB). The weights alone nearly fill a single 24 GB GPU, so inference there works only with a short context; a 40–80 GB card is comfortable for long context and multimodal batches. For Apple Silicon, an MLX quant can be produced from these weights.
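
The weight size follows directly from the parameter count, since BF16 stores each parameter in 2 bytes. The short calculation below reproduces it and estimates what is left over on a 24 GB card. It assumes the card exposes 24 GiB and ignores the CUDA context, activations, and allocator overhead, so the real headroom is smaller.

```python
NUM_PARAMS = 11_950_000_000   # ~11.95B parameters, from the Architecture table
BYTES_PER_PARAM = 2           # BF16 is 16 bits per parameter


def weight_footprint_bytes(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Memory needed for the weights alone."""
    return num_params * bytes_per_param


def headroom_gib(card_gib, used_gib):
    """Memory left for the KV cache and activations on a card of `card_gib` GiB."""
    return card_gib - used_gib


weights_gb = weight_footprint_bytes(NUM_PARAMS) / 1e9       # decimal gigabytes
weights_gib = weight_footprint_bytes(NUM_PARAMS) / 2**30    # binary gibibytes

print(f"weights: {weights_gb:.1f} GB = {weights_gib:.2f} GiB")
print(f"headroom on a 24 GiB card: {headroom_gib(24, weights_gib):.2f} GiB")
# prints:
# weights: 23.9 GB = 22.26 GiB
# headroom on a 24 GiB card: 1.74 GiB
```

The remaining budget on a 24 GB card is under 2 GiB before any runtime overhead, which is why context length has to stay short there.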

Support & Community

Discord: https://discord.gg/rhUZY5GEZr
Bitcoin Donations: bc1qsvfduzj9fjs9fugpc52yver3f2g8fp7xjxecdv

Notes

License: Gemma (inherits the Gemma 4 license from the base model)
Base Model: google/gemma-4-12B-it
Modality: Text + Image + Audio (encoder-free / unified)
Architecture: Gemma 4 12B Unified (dense, ~11.95B)

Thanks

Google DeepMind — for the Gemma 4 open models.

Disclaimer

Use is the responsibility of the user. Ensure your usage complies with applicable laws, platform rules, the Gemma 4 license terms, and your deployment requirements.
