Qwen3.6-27B-unsloth with llama.cpp recipe

Last Updated: April 23, 2026

Overview

This recipe runs Qwen3.6-27B-unsloth with the official llama.cpp CUDA server on a Salad GPU. The model downloads automatically on startup, the built-in llama.cpp web UI is available at your deployment URL, and the container exposes an OpenAI-compatible API for tools such as OpenClaw, OpenCode, and other compatible clients.

This recipe is designed to be easy for nontechnical users:

the model is already chosen for you
it is public by default, so you can test it immediately after deployment
thinking is enabled by default
you can start with the built-in web UI, then connect other tools later

Qwen describes Qwen3.6-27B-unsloth as a coding-focused dense model with stronger agentic coding behavior and support for preserving reasoning context across turns.

Quick Start

1. Open the SaladCloud Portal.
2. Deploy the Qwen3.6-27B-unsloth (llama.cpp) recipe.
3. Enter a Container Group Name.
4. Decide whether to enable Require Container Gateway Authentication:
   - Disabled: public access.
   - Enabled: requests must include your SaladCloud API key.
5. Choose whether to keep Enable Thinking / Reasoning turned on.
6. Deploy and wait for the first startup to finish.

The model is downloaded at startup, so the first startup can take several minutes before the deployment becomes ready.

Once the container is ready, you can either open the built-in UI in a browser or connect an OpenAI-compatible client to /v1/chat/completions.
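Because the first startup can take several minutes, it can help to poll the server before sending real requests. The sketch below assumes the deployment forwards llama.cpp's /health endpoint, that authentication is disabled, and that BASE_URL holds your deployment URL; the function name wait_for_ready is hypothetical.

```shell
# Placeholder: replace with your own deployment URL.
BASE_URL="${BASE_URL:-https://<your-dns>.salad.cloud}"

# Poll /health until it answers with HTTP 200.
# Usage: wait_for_ready [max_attempts] [delay_seconds]
wait_for_ready() {
  attempts="${1:-60}"
  delay="${2:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s hides progress, -o discards the body, -w prints only the status code.
    code=$(curl -s -o /dev/null -w '%{http_code}' "$BASE_URL/health" || true)
    if [ "$code" = "200" ]; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    echo "not ready yet (HTTP ${code:-none}), retrying in ${delay}s" >&2
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up after $attempts attempts" >&2
  return 1
}
```

Calling `wait_for_ready 60 10` would poll every 10 seconds for roughly ten minutes before giving up.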

Use With OpenClaw

If you want to connect this recipe to OpenClaw or other agentic tools, follow these guides:

Defaults

The recipe comes preconfigured with these defaults:

Model source: unsloth/Qwen3.6-27B-GGUF
Model file: Qwen3.6-27B-UD-Q4_K_XL.gguf
Model alias: qwen3.6-27b
Configured context size: 131072
Native model context: 262144
Parallel slots: 1
Thinking: enabled by default
Sampling defaults: temperature 0.6, top_p 0.95, min_p 0.0, top_k 20
Authentication: disabled by default

temperature, top_p, and min_p are startup defaults. You can still override them per request in your inference payload.

The recipe defaults to a 131072-token context, half of the model's native 262144, because that is a more practical starting point for llama.cpp on a single GPU. If you need more context later, you can raise LLAMA_ARG_CTX_SIZE in Advanced Configuration.
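As an illustration of overriding the sampling defaults for a single request, the sketch below builds a payload with explicit temperature, top_p, and min_p values and posts it to /v1/chat/completions. The function names, the prompt, and the BASE_URL variable are hypothetical; only the field names and the model alias come from this recipe.

```shell
# Placeholder: replace with your own deployment URL.
BASE_URL="${BASE_URL:-https://<your-dns>.salad.cloud}"

# Build a single-line JSON payload with per-request sampling values.
# Usage: sampling_payload <temperature> <top_p> <min_p>
sampling_payload() {
  printf '{"model":"qwen3.6-27b",'
  printf '"messages":[{"role":"user","content":"Explain what a mutex does in one sentence."}],'
  printf '"temperature":%s,"top_p":%s,"min_p":%s,' "$1" "$2" "$3"
  printf '"max_tokens":128}\n'
}

# Send the payload; curl reads the request body from stdin via "-d @-".
chat_with_sampling() {
  sampling_payload "$1" "$2" "$3" |
    curl -sS "$BASE_URL/v1/chat/completions" \
      -H 'Content-Type: application/json' \
      -d @-
}
```

For example, `chat_with_sampling 0.2 0.9 0.05` would request a more deterministic answer than the startup defaults for that one call, leaving the defaults unchanged for other requests.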

Thinking Mode

Thinking is enabled by default. Qwen3.6 does not officially rely on prompt switches such as /think or /no_think, and this recipe does not use them. Instead:

turn off Enable Thinking / Reasoning in the deployment form to make non-thinking mode the default
or pass chat_template_kwargs: {"enable_thinking": false} in clients that support extra request fields

Advanced clients can also request preserved reasoning context across turns with chat_template_kwargs: {"preserve_thinking": true}.
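The chat_template_kwargs field mentioned above can be sketched as an extra key in the request body. This is a minimal example, assuming your client lets you send arbitrary JSON; the function names, the prompt, and the BASE_URL variable are hypothetical.

```shell
# Placeholder: replace with your own deployment URL.
BASE_URL="${BASE_URL:-https://<your-dns>.salad.cloud}"

# Build a single-line JSON payload carrying the given chat_template_kwargs object.
# Usage: thinking_payload '<chat_template_kwargs JSON object>'
thinking_payload() {
  printf '{"model":"qwen3.6-27b",'
  printf '"messages":[{"role":"user","content":"What is 17 * 23?"}],'
  printf '"chat_template_kwargs":%s,' "$1"
  printf '"max_tokens":256}\n'
}

# Send the payload; curl reads the request body from stdin via "-d @-".
chat_with_kwargs() {
  thinking_payload "$1" |
    curl -sS "$BASE_URL/v1/chat/completions" \
      -H 'Content-Type: application/json' \
      -d @-
}
```

Calling `chat_with_kwargs '{"enable_thinking": false}'` would ask for a non-thinking reply, while `chat_with_kwargs '{"preserve_thinking": true}'` would request preserved reasoning context.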

Authentication

Require Container Gateway Authentication is available in the deployment form and is unchecked by default.

Disabled: anyone with the URL can call the API.
Enabled: every request must include the Salad-Api-Key header.

If you enable authentication, see Sending Requests for the header format.

Example Request

curl https://<your-dns>.salad.cloud/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "qwen3.6-27b",
    "messages": [
      {"role": "system", "content": "You are a helpful coding assistant."},
      {"role": "user", "content": "Write a short explanation of why preserving reasoning context can help with iterative coding tasks."}
    ],
    "max_tokens": 256
  }'

If you enabled authentication during deployment, add:

-H 'Salad-Api-Key: <api-key>'
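If you switch between authenticated and public deployments, a small wrapper can add the header only when a key is present. The sketch below assumes the key is stored in an environment variable named SALAD_API_KEY and that BASE_URL holds your deployment URL; both names, and the chat function itself, are hypothetical.

```shell
# Placeholder: replace with your own deployment URL.
BASE_URL="${BASE_URL:-https://<your-dns>.salad.cloud}"

# Post a JSON payload, adding Salad-Api-Key only when SALAD_API_KEY is set.
# Usage: chat '<json payload>'
chat() {
  payload="$1"
  # Rebuild the positional parameters as the curl argument list.
  set -- -sS "$BASE_URL/v1/chat/completions" \
    -H 'Content-Type: application/json' \
    -d "$payload"
  if [ -n "${SALAD_API_KEY:-}" ]; then
    set -- "$@" -H "Salad-Api-Key: $SALAD_API_KEY"
  fi
  curl "$@"
}
```

With this in place, `SALAD_API_KEY=<api-key> chat '{"model":"qwen3.6-27b","messages":[{"role":"user","content":"Hello"}]}'` would send an authenticated request, and the same call without the variable would send a public one.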

For Technical Users

If you want to tune llama.cpp later, open the container group in the SaladCloud Portal and edit Advanced Configuration. Useful environment variables include:

LLAMA_ARG_HF_REPO to change the Hugging Face GGUF repo
LLAMA_ARG_HF_FILE to choose a specific GGUF file inside the repo
LLAMA_ARG_MODEL_URL to point directly to a .gguf file
LLAMA_ARG_CTX_SIZE to change the context window
LLAMA_ARG_N_GPU_LAYERS to control GPU offload
LLAMA_ARG_N_PARALLEL to change concurrency
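As a rough illustration of how these variables fit together, the fragment below lists one possible set of name and value pairs. The repo and file are the recipe defaults; the other values are assumptions chosen for the example, not tuned recommendations.

```shell
# Illustrative Advanced Configuration values (one name=value pair per line).
# Repo and file are the recipe defaults.
LLAMA_ARG_HF_REPO=unsloth/Qwen3.6-27B-GGUF
LLAMA_ARG_HF_FILE=Qwen3.6-27B-UD-Q4_K_XL.gguf
# Raise the context window to the model's native 262144 tokens (uses more GPU memory).
LLAMA_ARG_CTX_SIZE=262144
# Assumed value: a large number is a common way to request offload of every layer.
LLAMA_ARG_N_GPU_LAYERS=99
# Allow two parallel slots instead of the default single slot.
LLAMA_ARG_N_PARALLEL=2
```

A larger context and more parallel slots both increase memory use, so it is safer to change one value at a time and confirm the deployment still starts.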

For full llama.cpp server options, see:
