Overview
This recipe runs Qwen3.5-35B-A3B with the official llama.cpp CUDA server on a
Salad GPU. The model downloads automatically on first startup, the built-in llama.cpp web UI is available at your
deployment URL, and the container exposes an OpenAI-compatible API for tools such as OpenClaw, OpenCode, and other
compatible clients.
This recipe is designed to be easy for nontechnical users:
- the model is already chosen for you
- it is public by default, so you can test it immediately after deployment
- thinking is enabled by default
- you can start with the built-in web UI, then connect other tools later
Quick Start
- Open the SaladCloud Portal.
- Deploy the Qwen3.5-35B-A3B (llama.cpp) recipe.
- Enter a Container Group Name.
- Decide whether to enable Require Container Gateway Authentication:
- Disabled: public access.
- Enabled: requests must include your SaladCloud API key.
- Choose whether to keep Enable Thinking / Reasoning turned on.
- Deploy and wait for the first startup to finish.
The model is downloaded at startup, so it can take up to about 20 minutes before the deployment becomes ready.
Once the deployment is running, the OpenAI-compatible chat endpoint is available at /v1/chat/completions.
Use With OpenClaw
If you want to connect this recipe to OpenClaw, follow this guide:
Defaults
The recipe comes preconfigured with these defaults:
- Model source: unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q4_K_XL
- Model alias: qwen3.5-35b-a3b
- Context size: 262144
- Parallel slots: 1
- Thinking: enabled by default
- Sampling defaults: temperature 0.6, top_p 0.95, min_p 0.0, top_k 20
- Authentication: disabled by default
The sampling values (temperature, top_p, min_p, and top_k) are startup defaults. You can still override them per
request in your inference payload.
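As a minimal sketch of a per-request override, the snippet below builds an OpenAI-style chat payload that replaces the startup sampling defaults for a single call. The helper function name is illustrative, not part of the recipe; only the model alias comes from the defaults above.

```python
def build_chat_payload(prompt: str, **sampling_overrides) -> dict:
    """Build an OpenAI-style chat payload for this recipe.

    Any keyword arguments (e.g. temperature, top_p, min_p, top_k)
    override the server's startup sampling defaults for this request only.
    """
    payload = {
        "model": "qwen3.5-35b-a3b",  # the recipe's model alias
        "messages": [{"role": "user", "content": prompt}],
    }
    payload.update(sampling_overrides)
    return payload

# Override temperature and top_k for one request; other defaults still apply.
payload = build_chat_payload("Explain GGUF in one paragraph.", temperature=0.2, top_k=40)
```

Any field you omit falls back to the startup default, so you only need to send the values you want to change.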
Thinking Mode
When thinking is enabled, you can control it per request:
- Add /think to explicitly enable reasoning for that turn.
- Add /no_think to disable reasoning for that turn.
If you disable thinking at deployment time, /think will no longer override
it.
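The per-turn tags above can be sketched in code. Assuming the tag is appended to the user message content (a common convention for these toggles; the exact placement may vary), a request that skips reasoning for one turn could look like this:

```python
def tag_message(prompt: str, think: bool) -> dict:
    """Build a user message with a per-turn thinking toggle appended.

    Appends /think to request reasoning for this turn, or /no_think
    to skip it; other turns keep the deployment-level setting.
    """
    tag = "/think" if think else "/no_think"
    return {"role": "user", "content": f"{prompt} {tag}"}

# This turn skips reasoning even though thinking is enabled by default.
message = tag_message("List three uses of GGUF quantization.", think=False)
```

Remember that these tags only matter when thinking was left enabled at deployment; with thinking disabled, /think has no effect.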
Authentication
Require Container Gateway Authentication is available in the deployment form and is unchecked by default.
- Disabled: anyone with the URL can call the API.
- Enabled: every request must include the Salad-Api-Key header.
Example Request
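Here is a minimal sketch of a chat request using only the Python standard library. The deployment URL and API key are placeholders you must replace with your own; the Salad-Api-Key header is only required when gateway authentication is enabled.

```python
import json
from urllib import request

# Hypothetical values; substitute your own deployment URL and SaladCloud API key.
BASE_URL = "https://example.salad.cloud"
API_KEY = "your-salad-api-key"

def make_chat_request(prompt: str) -> request.Request:
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    return request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps({
            "model": "qwen3.5-35b-a3b",
            "messages": [{"role": "user", "content": prompt}],
        }).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Only required when gateway authentication is enabled.
            "Salad-Api-Key": API_KEY,
        },
    )

def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with request.urlopen(make_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Once the deployment shows as ready, calling chat("Say hello in one sentence.") returns the model's reply; if authentication is disabled, you can drop the Salad-Api-Key header entirely.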
For Technical Users
If you want to tune llama.cpp later, open the container group in the SaladCloud Portal and edit Advanced Configuration. Useful environment variables include:
- LLAMA_ARG_HF_REPO to change the Hugging Face GGUF repo
- LLAMA_ARG_MODEL_URL to point directly to a .gguf file
- LLAMA_ARG_CTX_SIZE to change the context window
- LLAMA_ARG_N_GPU_LAYERS to control GPU offload
- LLAMA_ARG_N_PARALLEL to change concurrency
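As an illustrative sketch only (the values here are assumptions, not recommendations), an edited Advanced Configuration might set:

```
LLAMA_ARG_HF_REPO=unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q4_K_XL
LLAMA_ARG_CTX_SIZE=131072
LLAMA_ARG_N_GPU_LAYERS=99
LLAMA_ARG_N_PARALLEL=2
```

Lowering LLAMA_ARG_CTX_SIZE reduces memory pressure, while raising LLAMA_ARG_N_PARALLEL trades per-request context for concurrency, since the context window is shared across parallel slots.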