Last Updated: April 23, 2026Documentation Index
Fetch the complete documentation index at: https://docs.salad.com/llms.txt
Use this file to discover all available pages before exploring further.
Overview
This recipe runsQwen3.6-27B-unsloth with the official llama.cpp CUDA server
on a Salad GPU. The model downloads automatically on startup, the built-in llama.cpp web UI is available at your
deployment URL, and the container exposes an OpenAI-compatible API for tools such as OpenClaw, OpenCode, and other
compatible clients.
This recipe is designed to be easy for nontechnical users:
- the model is already chosen for you
- it is public by default, so you can test it immediately after deployment
- thinking is enabled by default
- you can start with the built-in web UI, then connect other tools later
Quick Start
- Open the SaladCloud Portal.
- Deploy the Qwen3.6-27B-unsloth (llama.cpp) recipe.
- Enter a Container Group Name.
- Decide whether to enable Require Container Gateway Authentication:
- Disabled: public access.
- Enabled: requests must include your SaladCloud API key.
- Choose whether to keep Enable Thinking / Reasoning turned on.
- Deploy and wait for the first startup to finish.
Model is downloaded at startup, so the first startup can take several minutes before the deployment becomes ready.
/v1/chat/completions.
Use With OpenClaw
If you want to connect this recipe to OpenClaw or other agentic tools, follow these guides:- Use OpenClaw with a Salad-hosted LLM
- Use OpenCode with a Salad-hosted LLM
- Use Cline with a Salad-hosted LLM
- Use Aider with a Salad-hosted LLM
- Use Kilo Code with a Salad-hosted LLM
- Use Roo Code with a Salad-hosted LLM
- Use Continue with a Salad-hosted LLM
- Use Vercel AI SDK with a Salad-hosted LLM
- Use Goose with a Salad-hosted LLM
- Use Hermes with a Salad-hosted LLM
Defaults
The recipe comes preconfigured with these defaults:- Model source:
unsloth/Qwen3.6-27B-GGUF - Model file:
Qwen3.6-27B-UD-Q4_K_XL.gguf - Model alias:
qwen3.6-27b - Configured context size:
131072 - Native model context:
262144 - Parallel slots:
1 - Thinking: enabled by default
- Sampling defaults:
temperature 0.6,top_p 0.95,min_p 0.0,top_k 20 - Authentication: disabled by default
temperature, top_p, and min_p are startup defaults. You can still override them per request in your inference
payload.
The recipe defaults to 131072 tokens because that is a more practical single-GPU llama.cpp starting point. If you need
more context later, you can raise LLAMA_ARG_CTX_SIZE in Advanced Configuration.
Thinking Mode
Thinking is enabled by default. Qwen3.6 does not officially rely on prompt switches such as/think or /no_think in this recipe. Instead:
- turn off Enable Thinking / Reasoning in the deployment form to make non-thinking mode the default
- or pass
chat_template_kwargs: {"enable_thinking": false}in clients that support extra request fields
chat_template_kwargs: {"preserve_thinking": true}.
Authentication
Require Container Gateway Authentication is available in the deployment form and is unchecked by default.- Disabled: anyone with the URL can call the API.
- Enabled: every request must include the
Salad-Api-Keyheader.
Example Request
For Technical Users
If you want to tune llama.cpp later, open the container group in the SaladCloud Portal and edit Advanced Configuration. Useful environment variables include:LLAMA_ARG_HF_REPOto change the Hugging Face GGUF repoLLAMA_ARG_HF_FILEto choose a specific GGUF file inside the repoLLAMA_ARG_MODEL_URLto point directly to a.gguffileLLAMA_ARG_CTX_SIZEto change the context windowLLAMA_ARG_N_GPU_LAYERSto control GPU offloadLLAMA_ARG_N_PARALLELto change concurrency