gemma-4-12B-it-abliterated-uncensored
Overview
Update incoming: this version needed a fix.
Full BF16 weights of gemma-4-12B-it-abliterated-uncensored — an abliterated, uncensored variant of google/gemma-4-12B-it (Gemma 4 12B Unified, dense, ~11.95B parameters). The model keeps Gemma 4's encoder-free unified multimodal stack intact — text, image, and audio inputs flow straight into a single decoder-only transformer — so this checkpoint is a drop-in replacement for the original instruction-tuned model at the architecture level.
The pipeline:
- Refusal Ablation — Residual-stream refusal directions (one per decoder layer) were extracted via diff-in-means on a labeled harmful/harmless prompt set and baked into the weights as a per-matrix delta on the residual-write modules, using our own custom abliteration framework.
- Multimodal Preservation — Gemma 4 12B is encoder-free: image patches and audio waveforms are projected directly into the embedding space, so there is no separate vision/audio tower to graft back. Tensor names, shapes, and the config.json schema (Gemma4UnifiedForConditionalGeneration, model_type: gemma4_unified) match the base model exactly — this checkpoint loads anywhere the original loads.
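The drop-in claim above (identical tensor names and shapes) is something a reader can check for themselves. The following is a minimal sketch of such a check, not the authors' tooling: `diff_tensor_layouts` is a hypothetical helper, and the tensor names in the toy data are made up for illustration.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_tensor_layouts(base: Dict[str, Shape], other: Dict[str, Shape]) -> List[str]:
    """Return human-readable mismatches between two name -> shape maps."""
    problems = []
    # Tensors present in only one of the two checkpoints.
    for name in sorted(base.keys() - other.keys()):
        problems.append(f"missing in other: {name}")
    for name in sorted(other.keys() - base.keys()):
        problems.append(f"unexpected in other: {name}")
    # Tensors present in both, but with different shapes.
    for name in sorted(base.keys() & other.keys()):
        if tuple(base[name]) != tuple(other[name]):
            problems.append(f"shape mismatch for {name}: {base[name]} vs {other[name]}")
    return problems


# Toy maps with made-up tensor names, for illustration only.
base = {"layers.0.o_proj.weight": (3840, 3840), "embed.weight": (262144, 3840)}
same = dict(base)
broken = {"layers.0.o_proj.weight": (3840, 3840), "embed.weight": (262145, 3840)}

print(diff_tensor_layouts(base, same))    # no mismatches expected
print(diff_tensor_layouts(base, broken))  # one shape mismatch expected
```

In practice the two maps could be read from the checkpoints' safetensors files, for instance with `safetensors.safe_open(path, framework="pt")`, iterating `f.keys()` and calling `f.get_slice(name).get_shape()`, which reads shapes without loading the weights.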
Key Properties:
- Uncensored across the standard refusal axes
- Reasoning preserved (configurable thinking mode — see Best Practices)
- Multimodal: text + image + audio carried forward
- Drop-in shape compatibility with google/gemma-4-12B-it
Architecture
| Property | Value |
|---|---|
| Architecture | Gemma4UnifiedForConditionalGeneration (model_type: gemma4_unified) |
| Total Parameters | ~11.95B (dense) |
| Decoder Layers | 48 |
| Hidden Size | 3840 |
| Attention | 16 heads / 8 KV heads, hybrid sliding-window (1024) + global (full) attention, p-RoPE |
| Vocabulary | 262,144 |
| Context Length | up to 256K tokens |
| Modalities | Text, Image, Audio (encoder-free / unified) |
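To make the "hybrid sliding-window + global" attention entry concrete, here is a small sketch of the two causal mask shapes involved. It is an illustration under assumptions: the exact window semantics (whether the 1024 count includes the current token) and which layers use which mask are not stated in this card, and a window of 3 is used only to keep the printout readable.

```python
from typing import List


def sliding_window_causal_mask(seq_len: int, window: int) -> List[List[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Assumption: the window counts the current token, so each query sees
    at most `window` positions (itself and the window - 1 before it).
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def global_causal_mask(seq_len: int) -> List[List[bool]]:
    """Full causal attention: every query sees all earlier positions and itself."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


local = sliding_window_causal_mask(seq_len=6, window=3)
full = global_causal_mask(seq_len=6)

# Print the local mask: 'x' marks an allowed query/key pair.
for row in local:
    print("".join("x" if allowed else "." for allowed in row))
```

The practical consequence is that sliding-window layers only need to cache keys and values for the most recent window, while global layers keep the whole context.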
Files
| File | Description | Size |
|---|---|---|
| model.safetensors | BF16 weights (48 decoder layers, unified multimodal) | ~23.9 GB |
| config.json | Unified multimodal config (Gemma4UnifiedForConditionalGeneration) | — |
| processor_config.json | Multimodal processor config | — |
| tokenizer.json, tokenizer_config.json, chat_template.jinja, generation_config.json | Standard | — |
Total on disk: ~24 GB.
Usage
```python
from transformers import AutoProcessor, AutoModelForMultimodalLM

repo = "OpenYourMind/gemma-4-12B-it-abliterated-uncensored"

# Load the multimodal processor and the BF16 weights.
processor = AutoProcessor.from_pretrained(repo)
model = AutoModelForMultimodalLM.from_pretrained(
    repo, dtype="bfloat16", device_map="auto",
)

# Image content goes before the text in the user turn.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": [
        {"type": "image", "url": "path/to/image.jpg"},
        {"type": "text", "text": "Describe this image in detail."},
    ]},
]

inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_tensors="pt", return_dict=True, enable_thinking=False,
).to(model.device)
input_len = inputs["input_ids"].shape[-1]

# Generate, then decode only the newly produced tokens.
out = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(out[0][input_len:], skip_special_tokens=True))
```
Text-only, audio, and video inputs work through the same class — place image content before the text and audio content after the text in the prompt for best results. Requires a recent transformers (the version that ships the Gemma 4 unified classes).
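The ordering advice above (images before the text, audio after it) can be captured in a small helper. This is a sketch, not part of the model's API: `build_user_message` is a hypothetical name, and using a `"url"` key for audio entries is an assumption that mirrors the image entry in the usage example.

```python
from typing import Dict, List, Sequence


def build_user_message(
    text: str,
    images: Sequence[str] = (),
    audios: Sequence[str] = (),
) -> Dict:
    """Build one user turn with images first, then text, then audio."""
    content: List[Dict[str, str]] = []
    content += [{"type": "image", "url": path} for path in images]
    content.append({"type": "text", "text": text})
    # Assumption: audio entries take the same "url" key as image entries.
    content += [{"type": "audio", "url": path} for path in audios]
    return {"role": "user", "content": content}


msg = build_user_message(
    "Does the recording match the picture?",
    images=["path/to/image.jpg"],
    audios=["path/to/clip.wav"],
)
print([part["type"] for part in msg["content"]])
```

A text-only turn is the same call with no `images` or `audios`, and the resulting dict can be placed in the `messages` list passed to `apply_chat_template`.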
Best Practices
- Sampling: temperature=1.0, top_p=0.95, top_k=64 (the values shipped in generation_config.json).
- Thinking mode: enabled by setting enable_thinking=True in apply_chat_template; the processor's parse_response separates the reasoning block from the final answer. Do not feed previous-turn thoughts back into multi-turn history.
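The two recommendations above can be sketched as follows. The sampling values are the ones quoted in this card. Everything else is an assumption made for illustration: `do_sample=True` is added because transformers only applies these settings when sampling is on, `append_answer_only` is a hypothetical helper, and the `"thinking"`/`"content"` keys stand in for whatever structure `parse_response` actually returns.

```python
from typing import Dict, List

# Sampling values from this card; do_sample=True is an added assumption.
SAMPLING_KWARGS = {"do_sample": True, "temperature": 1.0, "top_p": 0.95, "top_k": 64}


def append_answer_only(history: List[Dict], parsed: Dict[str, str]) -> List[Dict]:
    """Return a new history containing the final answer but not the reasoning."""
    return history + [{"role": "assistant", "content": parsed["content"]}]


history = [{"role": "user", "content": "What is 17 * 3?"}]

# Hypothetical parsed response: reasoning and final answer already separated.
parsed = {"thinking": "17 * 3 = 51", "content": "51"}

history = append_answer_only(history, parsed)
print(history[-1])
```

With a loaded model, the sampling settings would be passed as `model.generate(**inputs, max_new_tokens=512, **SAMPLING_KWARGS)`.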
Hardware
Full BF16 weights (~24 GB). Fits on a single 24 GB GPU for inference with modest context, comfortably on a 40–80 GB card for long context and multimodal batches. For Apple Silicon, an MLX quant can be produced from these weights.
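The size figure follows directly from the parameter count, since BF16 stores each parameter in 2 bytes. The short calculation below is a back-of-the-envelope sketch: it counts weights only (no KV cache, activations, or framework overhead) and assumes a "24 GB" card exposes 24 GiB of memory, which is not stated in this card.

```python
def weight_bytes(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the weights alone (BF16 uses 2 bytes per parameter)."""
    return num_params * bytes_per_param


PARAMS = 11.95e9  # approximate parameter count from this card

size_bytes = weight_bytes(PARAMS)
size_gb = size_bytes / 1e9       # decimal gigabytes, as used for file sizes
size_gib = size_bytes / 2**30    # binary gibibytes, as GPUs usually report

# Assumption: the card exposes 24 GiB; what is left must hold everything else.
headroom_gib = 24 - size_gib

print(f"weights: {size_gb:.1f} GB = {size_gib:.2f} GiB")
print(f"headroom on a 24 GiB card: about {headroom_gib:.2f} GiB")
```

Under these assumptions the headroom is under 2 GiB, which is why the single-24-GB case is limited to modest context lengths.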
Support & Community
- Discord: https://discord.gg/rhUZY5GEZr
- Bitcoin Donations: bc1qsvfduzj9fjs9fugpc52yver3f2g8fp7xjxecdv
Notes
- License: Gemma (inherits the Gemma 4 license from the base model)
- Base Model: google/gemma-4-12B-it
- Modality: Text + Image + Audio (encoder-free / unified)
- Architecture: Gemma 4 12B Unified (dense, ~11.95B)
Thanks
- Google DeepMind — for the Gemma 4 open models.
Disclaimer
Use is the responsibility of the user. Ensure your usage complies with applicable laws, platform rules, the Gemma 4 license terms, and your deployment requirements.