Last Updated: April 23, 2026
Deploy from the SaladCloud Portal.

Overview

This recipe runs Qwen3.6-27B-unsloth with the official llama.cpp CUDA server on a Salad GPU. The model downloads automatically on startup, the built-in llama.cpp web UI is available at your deployment URL, and the container exposes an OpenAI-compatible API for tools such as OpenClaw, OpenCode, and other compatible clients.

This recipe is designed to be easy for nontechnical users:
  • the model is already chosen for you
  • it is public by default, so you can test it immediately after deployment
  • thinking is enabled by default
  • you can start with the built-in web UI, then connect other tools later
Qwen describes Qwen3.6-27B-unsloth as a coding-focused dense model with stronger agentic coding behavior and support for preserving reasoning context across turns.

Quick Start

  1. Open the SaladCloud Portal.
  2. Deploy the Qwen3.6-27B-unsloth (llama.cpp) recipe.
  3. Enter a Container Group Name.
  4. Decide whether to enable Require Container Gateway Authentication:
    • Disabled: public access.
    • Enabled: requests must include your SaladCloud API key.
  5. Choose whether to keep Enable Thinking / Reasoning turned on.
  6. Deploy and wait for the first startup to finish.
The model is downloaded at startup, so the first startup can take several minutes before the deployment becomes ready.
Once the container is ready, you can either open the built-in UI in a browser or connect an OpenAI-compatible client to /v1/chat/completions.
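Because the first startup can take several minutes, a client script may want to wait before sending requests. The sketch below polls the llama.cpp server's /health endpoint, which returns 200 once the model is loaded and 503 while it is still loading. It assumes that /health is reachable through your deployment URL and that authentication is disabled; the function names and retry values are illustrative, not part of the recipe.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for a GET request, or None on a network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # for example, 503 while the model is still loading
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, timeout, or connection refused


def wait_until_ready(base_url, attempts=60, delay=10.0,
                     fetch=fetch_status, sleep=time.sleep):
    """Poll <base_url>/health until it returns 200 or the attempts run out."""
    url = base_url.rstrip("/") + "/health"
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next poll
    return False


# Example (replace the placeholder with your deployment URL):
# wait_until_ready("https://<your-dns>.salad.cloud")
```

The fetch and sleep parameters exist only so the polling loop can be exercised without a live deployment. If you enabled authentication, the health request would also need the Salad-Api-Key header.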

Use With OpenClaw

If you want to connect this recipe to OpenClaw or other agentic tools, follow these guides:

Defaults

The recipe comes preconfigured with these defaults:
  • Model source: unsloth/Qwen3.6-27B-GGUF
  • Model file: Qwen3.6-27B-UD-Q4_K_XL.gguf
  • Model alias: qwen3.6-27b
  • Configured context size: 131072
  • Native model context: 262144
  • Parallel slots: 1
  • Thinking: enabled by default
  • Sampling defaults: temperature 0.6, top_p 0.95, min_p 0.0, top_k 20
  • Authentication: disabled by default
temperature, top_p, and min_p are startup defaults; you can still override them per request in your inference payload.
The recipe sets the context size to 131072 tokens, half of the model's native 262144, because that is a more practical starting point for llama.cpp on a single GPU. If you need more context later, raise LLAMA_ARG_CTX_SIZE in Advanced Configuration.
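As a minimal sketch of a per-request override, the helper below builds a /v1/chat/completions body and adds only the sampling fields you pass in; any field left out falls back to the startup default. The helper name build_payload and the prompt text are hypothetical, while the model alias and max_tokens value mirror the example request in this guide.

```python
import json


def build_payload(prompt, **sampling_overrides):
    """Build a chat completions body, adding only the sampling fields to override."""
    payload = {
        "model": "qwen3.6-27b",  # model alias from the recipe defaults
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    # Fields set here apply to this request only; omitted fields keep
    # the startup defaults configured on the server.
    payload.update(sampling_overrides)
    return payload


body = build_payload("Summarize what a mutex does.", temperature=0.2, top_p=0.9)
print(json.dumps(body, indent=2))
```

Here temperature and top_p are overridden for one request, while min_p is left out and so keeps its startup value.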

Thinking Mode

Thinking is enabled by default. Qwen3.6 does not officially rely on prompt switches such as /think or /no_think in this recipe. Instead:
  • turn off Enable Thinking / Reasoning in the deployment form to make non-thinking mode the default
  • or pass chat_template_kwargs: {"enable_thinking": false} in clients that support extra request fields
Advanced clients can also request preserved reasoning context across turns with chat_template_kwargs: {"preserve_thinking": true}.
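The two chat_template_kwargs options above can be sketched as a request body builder. This is an illustrative helper, not part of the recipe: the function name build_thinking_payload is hypothetical, and it assumes your client lets you send extra top-level fields in the request body.

```python
import json


def build_thinking_payload(messages, enable_thinking=None, preserve_thinking=None):
    """Build a chat completions body with optional chat_template_kwargs."""
    template_kwargs = {}
    if enable_thinking is not None:
        template_kwargs["enable_thinking"] = bool(enable_thinking)
    if preserve_thinking is not None:
        template_kwargs["preserve_thinking"] = bool(preserve_thinking)

    payload = {"model": "qwen3.6-27b", "messages": messages, "max_tokens": 256}
    if template_kwargs:
        # Only sent when at least one option was requested, so the
        # deployment's own default applies otherwise.
        payload["chat_template_kwargs"] = template_kwargs
    return payload


messages = [{"role": "user", "content": "Rename this variable for clarity."}]
no_thinking = build_thinking_payload(messages, enable_thinking=False)
print(json.dumps(no_thinking, indent=2))
```

Leaving both arguments unset omits chat_template_kwargs entirely, so the request follows whatever the deployment form selected.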

Authentication

Require Container Gateway Authentication is available in the deployment form and is unchecked by default.
  • Disabled: anyone with the URL can call the API.
  • Enabled: every request must include the Salad-Api-Key header.
If you enable authentication, see Sending Requests for the header format.

Example Request

curl https://<your-dns>.salad.cloud/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "qwen3.6-27b",
    "messages": [
      {"role": "system", "content": "You are a helpful coding assistant."},
      {"role": "user", "content": "Write a short explanation of why preserving reasoning context can help with iterative coding tasks."}
    ],
    "max_tokens": 256
  }'
If you enabled authentication during deployment, add:
-H 'Salad-Api-Key: <api-key>'
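For clients written in Python, the same request can be sketched with the standard library alone. The helper names build_request and send_request are hypothetical; the URL path, model alias, and Salad-Api-Key header come from the curl example above, and the response parsing assumes the usual OpenAI-style choices[0].message.content layout.

```python
import json
import urllib.request


def build_request(base_url, payload, api_key=None):
    """Build a POST request for /v1/chat/completions without sending it."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when Require Container Gateway Authentication is enabled.
        headers["Salad-Api-Key"] = api_key
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send_request(request, timeout=300.0):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.loads(resp.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]


payload = {
    "model": "qwen3.6-27b",
    "messages": [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Explain what a race condition is."},
    ],
    "max_tokens": 256,
}

# Example (replace the placeholders with your deployment URL and API key):
# req = build_request("https://<your-dns>.salad.cloud", payload, api_key="<api-key>")
# print(send_request(req))
```

Building the request separately from sending it makes it easy to inspect the headers and body before any network call is made.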

For Technical Users

If you want to tune llama.cpp later, open the container group in the SaladCloud Portal and edit Advanced Configuration. Useful environment variables include:
  • LLAMA_ARG_HF_REPO to change the Hugging Face GGUF repo
  • LLAMA_ARG_HF_FILE to choose a specific GGUF file inside the repo
  • LLAMA_ARG_MODEL_URL to point directly to a .gguf file
  • LLAMA_ARG_CTX_SIZE to change the context window
  • LLAMA_ARG_N_GPU_LAYERS to control GPU offload
  • LLAMA_ARG_N_PARALLEL to change concurrency
For full llama.cpp server options, see:

Source Code