Qwen3 4B Hermes Strict Tool-Call V4 Targeted

Release status: public experimental adapter release.

Summary

This is a LoRA adapter for Qwen/Qwen3-4B-MLX-4bit trained for strict local Hermes-style tool-call output.

Adapter repo: https://huggingface.co/edithatogo/qwen3-4b-hermes-lora

The adapter is intended for local evaluation and agent-runtime packaging. It requires the recorded runtime prompt condition:

first user turn prefixed with /no_think
assistant prefill: <think>\n\n</think>\n\n

Without the assistant prefill, the model still emits an empty leading thinking wrapper and does not satisfy the strict raw-output gate.

Base Model

Base: Qwen/Qwen3-4B-MLX-4bit
Base license: Apache-2.0, checked via Hugging Face API on 2026-05-25

Training

Training config: gemma4/scripts/train_config.qwen3-4b.strict-toolcall-v4-targeted.yaml
Data: gemma4/data/strict_tool_call/expanded_splits_v4_targeted
Adapter: gemma4/experiments/qwen3-4b-strict-toolcall-v4-targeted/lora_adapter
Training tokens: 37,936
Dataset token audit: reports/publication/qwen3-4b-strict-toolcall-v4-targeted/dataset-token-audit.json
Dataset overlap audit: reports/publication/qwen3-4b-strict-toolcall-v4-targeted/dataset-overlap-audit.json
Peak memory: 3.785 GB

Evaluation

Held-out strict local tool-call gate:

Suite	Pass	JSON valid	Arguments	Invalid tool	Multi-turn
`benchmarks/tool_call_local/heldout_suite.json`	1.000	1.000	1.000	1.000	1.000

Mirrored regression:

Suite	Pass
`benchmarks/tool_call_local/suite.json`	1.000

Repo-native pilot benchmarks:

Pilot	Pass	Notes
BFCL-style pilot	0.667	local pilot only, not official BFCL
IFEval-style pilot	0.667	local pilot only, not official IFEval
Coding sanity pilot	1.000	local pilot only, not HumanEval/MBPP

Exact held-out command:

source scripts/env.sh
PYTHONPATH=scripts ./.venv/bin/python scripts/run_tool_call_benchmark.py \
  --model Qwen/Qwen3-4B-MLX-4bit \
  --adapter gemma4/experiments/qwen3-4b-strict-toolcall-v4-targeted/lora_adapter \
  --suite benchmarks/tool_call_local/heldout_suite.json \
  --user-prefix /no_think \
  --assistant-prefill $'<think>\n\n</think>\n\n' \
  --run-id qwen3-4b-strict-toolcall-v4-targeted-heldout-prefill-20260525 \
  --max-tokens 256

Raw local artifact:

/Volumes/PortableSSD/hermes-evals/tool-call-benchmark/qwen3-4b-strict-toolcall-v4-targeted-heldout-prefill-20260525

The reusable runtime prompt contract is recorded in RUNTIME_PROMPT_PROFILES.yaml as qwen3-no-think-assistant-prefill.

Limitations

This is a small local strict-format benchmark, not broad BFCL or production tool-use evidence.
The adapter is sensitive to runtime prompt formatting.
The V4 training data has no held-out user-prompt overlap in the recorded audit, but it shares one generic held-out tool name, notify_care_team.
Dataset/source redistribution review is complete for adapter-release purposes with caveats; public dataset publication remains separate and blocked pending scope approval.
Public release approval is recorded in release-decision.md; the publication bundle is expected to pass scripts/validate_publication_bundle.py --require-ready.

Downloads last month: -; Downloads are not tracked for this model. How to track

MLX

Hardware compatibility

Quantized

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for edithatogo/qwen3-4b-hermes-lora

Base model

Qwen/Qwen3-4B-MLX-4bit

Adapter

(1)

this model