Non-reasoning mode
Hi
Thank you for your work, much appreciated.
Do I understand correctly that if I turn off the reasoning mode, there won't be much improvement over the base model?
The issue I'm having is that in agentic workflows, the tools already force the model to reason by issuing prompts like "create a step-by-step plan" or "research methods to perform X". In my experience, turning on reasoning doubles the time to reach the goal without noticeably improving the quality.
What's your opinion on this?
Hi,
I think your interpretation is mostly correct.
In agentic workflows, the tools already push the model into structured reasoning through their prompts. So in those cases, turning on reasoning mode really comes down to whether the extra overhead is worth it. But when a task doesn’t have a clear structure, needs the model to plan on its own, or involves longer reasoning chains, reasoning mode can still help with stability and consistency.
It’s just that these benefits are much less noticeable when the whole process is tool‑driven.
Thanks for the explanation.
So would you say your finetuning has a chance of measurably improving the output quality in non-reasoning mode, or does it not work that way?
Sorry if it's a stupid question. I'm but a simple user, looking to automate some mundane daily tasks with agents on my own hardware. Qwen 3.5 was a real game changer for my case; what we had before was absolutely not suitable for anything beyond "hello world".
How do you turn off reasoning?
I tried /no_think, which doesn't work anymore with Qwen3.5.
Setting "chat_template_kwargs": {"enable_thinking": false} also didn't work for me.
Adding {%- set enable_thinking = false %} to the template also didn't work.
(Full disclosure: I have no idea what I'm doing, just trying things I saw people doing on the internets.)
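For reference, the per-request variant mentioned above is normally sent as a top-level field of the chat completion body. The following is only a minimal sketch, assuming a llama-server instance on its default port 8080; the helper names `build_payload` and `send` are hypothetical, and whether the model's chat template honours the flag is not guaranteed (per the post above, it did not in this case).

```python
import json
import urllib.request

# Assumption: llama-server is listening locally on its default port.
BASE_URL = "http://localhost:8080/v1"


def build_payload(prompt, enable_thinking=False):
    """Build a chat completion body that asks the chat template to skip thinking."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        # Forwarded to the chat template; it has no effect if the template
        # or the server build does not look at this key.
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }


def send(payload, base_url=BASE_URL):
    """POST the payload to the OpenAI-compatible endpoint and return parsed JSON."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Print the body only; call send(...) once a server is actually running.
    print(json.dumps(build_payload("Hi"), indent=2))
```

Printing the body first makes it easy to check that the key sits at the top level of the request rather than inside "messages", which is a common reason for it being silently ignored.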
If you are using llama.cpp to run this model, you can try "--reasoning-budget 0" to turn off thinking mode.
Thanks, the "--reasoning-budget 0" parameter works for me.
You could also try "--reasoning off". The reasoning budget option may have issues with some models, but if it works for you, keep it that way.
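The two server-side options suggested in this thread can be sketched as a small launcher helper. This is an illustration under assumptions, not a verified recipe: the flags are copied from the replies above, their availability depends on the llama.cpp build, and the helper name `server_command` is hypothetical.

```python
import shlex


def server_command(model, method="budget"):
    """Return a llama-server argv that tries to disable thinking.

    method="budget" uses the flag from the first reply in this thread,
    method="off" uses the alternative from the last reply.
    """
    command = ["llama-server", "-hf", model]
    if method == "budget":
        command += ["--reasoning-budget", "0"]
    elif method == "off":
        # Assumption: this flag exists in the installed llama.cpp build.
        command += ["--reasoning", "off"]
    else:
        raise ValueError(f"unknown method: {method!r}")
    return command


if __name__ == "__main__":
    model = "Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M"
    # Print the shell command instead of starting the server.
    print(shlex.join(server_command(model)))
```

To actually start the server, the returned list can be passed to `subprocess.run(command, check=True)`. Keeping the choice in one argument makes it simple to switch between the two flags if one of them misbehaves with a given model.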