Overview
This recipe runs Qwen3.5-9B with the official llama.cpp CUDA server on a
Salad GPU. The model downloads automatically on first startup, the built-in llama.cpp web UI is available at your
deployment URL, and the container exposes an OpenAI-compatible API for tools such as OpenClaw, OpenCode, and other
compatible clients.
This recipe is designed to be easy for non-technical users:
- the model is already chosen for you
- it is public by default, so you can test it immediately after deployment
- thinking is enabled by default
- you can start with the built-in web UI, then connect other tools later
Quick Start
- Open the SaladCloud Portal.
- Deploy the Qwen3.5-9B (llama.cpp) recipe.
- Enter a Container Group Name.
- Decide whether to enable Require Container Gateway Authentication:
- Disabled: public access.
- Enabled: requests must include your SaladCloud API key.
- Choose whether to keep Enable Thinking / Reasoning turned on.
- Deploy and wait for the first startup to finish.
The model is downloaded from Hugging Face at startup, so it can take several minutes before the deployment becomes ready. Once it is ready, the OpenAI-compatible API is available at /v1/chat/completions.
Use With OpenClaw
If you want to connect this recipe to OpenClaw, follow this guide:
Defaults
The recipe comes preconfigured with these defaults:
- Model source: unsloth/Qwen3.5-9B-GGUF
- Model alias: qwen3.5-9b
- Context size: 262144
- Parallel slots: 1
- Thinking: enabled by default
- Sampling defaults: temperature 0.6, top_p 0.95, min_p 0.0, top_k 20
- Authentication: disabled by default
temperature, top_p, and min_p are startup defaults. You can still override them per request in your inference
payload.
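To illustrate, here is a minimal sketch of a per-request override, assuming a standard OpenAI-style chat completion payload. The `build_chat_payload` helper and the prompt text are illustrative, not part of the recipe:

```python
import json

def build_chat_payload(prompt, temperature=0.6, top_p=0.95, min_p=0.0, top_k=20):
    """Build an OpenAI-style chat completion payload. Any sampling value
    passed here overrides the recipe's startup default for this request only."""
    return {
        "model": "qwen3.5-9b",  # the model alias configured by the recipe
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_p": top_p,
        "min_p": min_p,
        "top_k": top_k,
    }

# Override temperature for a single, more deterministic request.
payload = build_chat_payload("Summarize llama.cpp in one sentence.", temperature=0.2)
print(json.dumps(payload, indent=2))
```

Values you do not override fall back to the startup defaults listed above.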
Thinking Mode
When thinking is enabled, you can control it per request:
- Add /think to explicitly enable reasoning for that turn.
- Add /no_think to disable reasoning for that turn.
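The toggles above are plain text appended to the user message. A minimal sketch (the `with_thinking` helper is illustrative, not part of the recipe):

```python
def with_thinking(prompt, enabled):
    """Append the per-turn thinking toggle to a user prompt.
    /think enables reasoning for the turn; /no_think disables it."""
    return f"{prompt} {'/think' if enabled else '/no_think'}"

# Skip reasoning for a simple arithmetic question.
print(with_thinking("What is 17 * 23?", enabled=False))
# → What is 17 * 23? /no_think
```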
Authentication
Require Container Gateway Authentication is available in the deployment form and is unchecked by default.
- Disabled: anyone with the URL can call the API.
- Enabled: every request must include the Salad-Api-Key header.
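A small sketch of building request headers for both modes. The `auth_headers` helper and the `SALAD_API_KEY` environment variable name are assumptions for illustration:

```python
import os

def auth_headers(api_key=None):
    """Build HTTP headers for the recipe's API. Include the Salad-Api-Key
    header only when gateway authentication is enabled for the deployment."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Salad-Api-Key"] = api_key
    return headers

# With authentication enabled, pass your SaladCloud API key:
print(auth_headers(api_key=os.environ.get("SALAD_API_KEY", "example-key")))
```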
Example Request
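Here is a minimal sketch of a chat completion request using only the Python standard library. The BASE_URL is a placeholder; replace it with your deployment's Container Gateway URL:

```python
import json
import urllib.request

# Placeholder — replace with your deployment's URL from the SaladCloud Portal.
BASE_URL = "https://example.salad.cloud"

payload = {
    "model": "qwen3.5-9b",  # the model alias configured by the recipe
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = urllib.request.Request(
    url=f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# To send the request and print the reply:
#   response = urllib.request.urlopen(req)
#   print(json.loads(response.read())["choices"][0]["message"]["content"])
print(req.full_url)
```

If you enabled gateway authentication, also add your key with a `Salad-Api-Key` header.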
For Technical Users
If you want to switch this recipe to a different model later, open the container group in the SaladCloud Portal and edit Advanced Configuration. Useful environment variables include:
- LLAMA_ARG_HF_REPO to use a different Hugging Face GGUF repo
- LLAMA_ARG_HF_FILE to select a specific file from that repo
- LLAMA_ARG_MODEL_URL to point directly to a .gguf file
- LLAMA_ARG_CTX_SIZE to change the context window
- LLAMA_ARG_N_GPU_LAYERS to control GPU offload
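As an illustrative config fragment, swapping to a different GGUF repo with a smaller context window might look like this (the repo name and values below are hypothetical, not recommendations):

```
LLAMA_ARG_HF_REPO=some-org/some-other-model-GGUF   # hypothetical repo
LLAMA_ARG_CTX_SIZE=131072                          # hypothetical context size
```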