Instructions to use Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF",
	filename="Qwen3.5-9B.Q4_K_M.gguf",
)

response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Describe this image in one sentence."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	]
)
# The result is an OpenAI-style dict; print the assistant's reply:
print(response["choices"][0]["message"]["content"])
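`create_chat_completion` returns an OpenAI-style dictionary. The sketch below extracts the assistant text and separates out reasoning. It assumes the model wraps its reasoning in a leading `<think>...</think>` block inside `content`, which is an assumption: depending on the chat template and runtime, reasoning may be returned separately or not at all. `split_reasoning` and `answer_text` are hypothetical helper names, not part of llama-cpp-python.

```python
import re

# Matches an optional leading <think>...</think> block (assumed tag format).
THINK_RE = re.compile(r"^\s*<think>(.*?)</think>\s*", re.DOTALL)


def split_reasoning(content: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" when no <think> block is present."""
    match = THINK_RE.match(content)
    if match is None:
        return "", content.strip()
    return match.group(1).strip(), content[match.end():].strip()


def answer_text(response: dict) -> str:
    """Extract the final answer from an OpenAI-style chat completion dict."""
    content = response["choices"][0]["message"]["content"] or ""
    _, answer = split_reasoning(content)
    return answer


if __name__ == "__main__":
    sample = {"choices": [{"message": {"content": "<think>It is a statue.</think>\nA green statue on an island."}}]}
    print(answer_text(sample))
```

With a real call, this would look like `print(answer_text(llm.create_chat_completion(messages=...)))`.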

Notebooks
Google Colab
Kaggle

llama.cpp

How to use Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M
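Once `llama-server` is running, any OpenAI-compatible client can talk to it. The following dependency-free sketch uses only the Python standard library. It assumes the server's default address `http://localhost:8080` and sends a text-only prompt with no image. `build_chat_request` and `chat` are illustrative names, not part of llama.cpp.

```python
import json
import urllib.request

# llama-server listens on http://localhost:8080 by default; change this if you pass --port.
BASE_URL = "http://localhost:8080/v1"


def build_chat_request(prompt: str, max_tokens: int = 512) -> dict:
    """Build a minimal OpenAI-style chat completion body (text only)."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str, base_url: str = BASE_URL, timeout: float = 300.0) -> str:
    """POST the request to /chat/completions and return the assistant's text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(chat("Summarize what a GGUF file is in one sentence."))
```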

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M

Use Docker

docker model run hf.co/Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M

LM Studio
Jan

vLLM

How to use Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
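The curl call above points at a remote image URL. To send a local file instead, OpenAI-compatible servers such as vLLM generally accept a base64 `data:` URL in the `image_url` field. The sketch below builds such a request body. It assumes the served model actually accepts image input; `image_message`, `request_body`, and `photo.jpg` are hypothetical.

```python
import base64
import json
import mimetypes
from pathlib import Path


def image_message(text: str, image_bytes: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build one user message with a text part and an inline base64 image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }


def request_body(model: str, message: dict) -> str:
    """Serialize a /v1/chat/completions body containing a single message."""
    return json.dumps({"model": model, "messages": [message]})


if __name__ == "__main__":
    path = Path("photo.jpg")  # hypothetical local image
    mime = mimetypes.guess_type(path.name)[0] or "image/jpeg"
    msg = image_message("Describe this image in one sentence.", path.read_bytes(), mime)
    body = request_body("Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF", msg)
    print(body[:200])
```

You can pass the resulting string to curl's `--data` option or send it with any HTTP client.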


Ollama
How to use Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF with Ollama:
```
ollama run hf.co/Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M
```
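Ollama also serves a native HTTP API, by default at `http://localhost:11434`. The following standard-library sketch calls `/api/chat` without streaming. The optional `think` field is sent only when you set it explicitly, and it is an assumption that it has any effect here: it depends on whether Ollama recognizes thinking support in this GGUF's template. `build_ollama_request` and `ollama_chat` are hypothetical names.

```python
import json
import urllib.request
from typing import Optional

OLLAMA_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint
MODEL = "hf.co/Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M"


def build_ollama_request(prompt: str, think: Optional[bool] = None) -> dict:
    """Build a non-streaming /api/chat body; "think" is included only when explicitly set."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one JSON object instead of a stream of chunks
    }
    if think is not None:
        body["think"] = think  # assumption: only honored if the model's template supports thinking
    return body


def ollama_chat(prompt: str, think: Optional[bool] = None) -> str:
    """POST to the local Ollama server and return the assistant's reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_ollama_request(prompt, think)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)["message"]["content"]


if __name__ == "__main__":
    print(ollama_chat("Give me a three-item checklist for backing up a laptop."))
```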

Unsloth Studio

How to use Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF to start chatting

Pi

How to use Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M"
        }
      ]
    }
  }
}
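Rather than editing `~/.pi/agent/models.json` by hand, you can use the snippet below to merge the `llama-cpp` provider into whatever the file already contains, so any existing providers and models survive. It assumes only the file layout shown above and validates nothing beyond JSON syntax. `add_llama_cpp_provider` is a hypothetical helper.

```python
import json
from pathlib import Path

MODEL_ID = "Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M"

# Provider settings from the models.json example above (everything except "models").
PROVIDER_SETTINGS = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
}


def add_llama_cpp_provider(path: Path) -> dict:
    """Insert or update the "llama-cpp" provider while leaving other providers untouched."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    providers = config.setdefault("providers", {})
    existing = providers.get("llama-cpp", {})
    models = existing.get("models", [])
    # Keep models already listed under this provider; add ours only once.
    if not any(m.get("id") == MODEL_ID for m in models):
        models.append({"id": MODEL_ID})
    providers["llama-cpp"] = {**existing, **PROVIDER_SETTINGS, "models": models}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    add_llama_cpp_provider(Path.home() / ".pi" / "agent" / "models.json")
```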

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M
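Before running Hermes, it can help to confirm that the server set as `model.base_url` is reachable. The sketch below queries the OpenAI-style `GET /v1/models` endpoint, which llama-server exposes, and prints the reported model ids. Those ids may not match the `repo:quant` string used above. `list_model_ids` and `parse_model_ids` are hypothetical helpers.

```python
import json
import urllib.request


def parse_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response."""
    return [item["id"] for item in payload.get("data", []) if "id" in item]


def list_model_ids(base_url: str = "http://127.0.0.1:8080/v1", timeout: float = 5.0) -> list[str]:
    """Fetch {base_url}/models and return the model ids it lists."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))


if __name__ == "__main__":
    try:
        print("Server is up; models:", list_model_ids())
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print("Server not reachable:", exc)
```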

Run Hermes

hermes

Docker Model Runner
How to use Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF with Docker Model Runner:
```
docker model run hf.co/Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M
```

Lemonade

How to use Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF-Q4_K_M

List all available models

lemonade list

Non-reasoning mode

by Arikshtein - opened Mar 20

Discussion

Arikshtein

Mar 20

Hi
Thank you for your work, much appreciated.
Do I understand correctly that if I turn off the reasoning mode, there won't be much improvement over the base model?
The issue I'm having is that in agentic workflows, the tools already force the model to reason by issuing prompts like "create a step-by-step plan" or "research methods to perform X", and, in my experience, turning on reasoning doubles the time to achieve the goal without noticeably improving the quality.
What's your opinion on this?
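One way to check the "doubles the time" observation on your own hardware is to time the same agent task with reasoning on and off and compare medians. The median is less sensitive to one unusually slow run than the mean. The minimal standard-library sketch below does this. The task callables are placeholders you would supply yourself, and output quality still has to be judged separately.

```python
import statistics
import time
from typing import Callable


def median_seconds(task: Callable[[], object], runs: int = 5) -> float:
    """Call task() `runs` times and return the median wall-clock duration in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def slowdown(with_reasoning: float, without_reasoning: float) -> float:
    """Ratio of the two medians; 2.0 means reasoning mode took twice as long."""
    return with_reasoning / without_reasoning


if __name__ == "__main__":
    # Stand-ins for real agent runs; replace with your own (hypothetical) task functions.
    fast = median_seconds(lambda: time.sleep(0.01), runs=3)
    slow = median_seconds(lambda: time.sleep(0.02), runs=3)
    print(f"slowdown: {slowdown(slow, fast):.1f}x")
```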

Jackrong

Owner Mar 20

Hi,
I think your interpretation is mostly correct.
In agentic workflows, the tools already push the model into structured reasoning through their prompts. So in those cases, turning on reasoning mode really comes down to whether the extra overhead is worth it. But when a task doesn’t have a clear structure, needs the model to plan on its own, or involves longer reasoning chains, reasoning mode can still help with stability and consistency.
It’s just that these benefits are much less noticeable when the whole process is tool‑driven.

Arikshtein

Mar 20

Thanks for the explanation.
So would you say your fine-tuning has a chance of measurably improving output quality in non-reasoning mode, or does it not work that way?
Sorry if it's a stupid question; I'm but a simple user looking to automate some mundane daily tasks with agents on my own hardware. Qwen 3.5 was a real game changer for my case; what we had before was absolutely not suitable for anything beyond "hello world".

mindplay

Mar 23

How do you turn off reasoning?

I tried /no_think, which doesn't work anymore with Qwen3.5.

The "chat_template_kwargs": {"enable_thinking": false} setting also didn't work for me.

Adding {%- set enable_thinking = false %} to the template also didn't work.

(Full disclosure: I have no idea what I'm doing, just trying things I saw people doing on the internets.)

cmy2019

Mar 24


If you are using llama.cpp to run this model, you can try "--reasoning-budget 0" to turn off thinking mode.
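For repeatable comparisons (the same agent with reasoning on vs. off), it can help to script the server launch. The minimal sketch below assumes `llama-server` is on `PATH` and uses only the `-hf`, `--port`, and `--reasoning-budget` options. `server_command` is a hypothetical helper.

```python
import shlex
import subprocess

REPO = "Jackrong/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF:Q4_K_M"


def server_command(thinking: bool, port: int = 8080) -> list[str]:
    """Build a llama-server command line; thinking=False appends --reasoning-budget 0."""
    cmd = ["llama-server", "-hf", REPO, "--port", str(port)]
    if not thinking:
        cmd += ["--reasoning-budget", "0"]  # 0 disables thinking; -1 means unrestricted
    return cmd


if __name__ == "__main__":
    cmd = server_command(thinking=False)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)  # blocks until the server exits (Ctrl+C to stop)
```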

samge

Mar 26


Thanks, the --reasoning-budget 0 parameter works for me.

gargaud3029

Mar 27


You can also try "--reasoning off". The reasoning budget may have issues with some models, but if it works, keep it that way.
