Last Updated: March 24, 2026
Deploy from the SaladCloud Portal.

Overview

This recipe runs Qwen3.5-9B with the official SGLang runtime on a Salad GPU. The model is downloaded automatically on first startup, and the container exposes an OpenAI-compatible API for tools such as OpenClaw, OpenCode, and other compatible clients. This recipe is designed to be easy to use:
  • the model is already chosen for you
  • it is public by default, so you can test it immediately after deployment
  • it is preconfigured for Qwen reasoning and tool-calling
  • you can connect it to OpenClaw or any other OpenAI-compatible client

Quick Start

  1. Open the SaladCloud Portal.
  2. Open your Organization and click “Deploy Container Group”.
  3. Pick Qwen3.5-9B (SGLang) from the recipe list.
  4. Enter a Container Group Name.
  5. Decide whether to enable Require Container Gateway Authentication:
    • Disabled: public access.
    • Enabled: requests must include your SaladCloud API key.
  6. Deploy and wait for the first startup to finish.
The model is downloaded from Hugging Face at startup, so it can take several minutes before the deployment becomes ready.
Once the container is ready, connect your client to https://<your-dns>.salad.cloud/v1/chat/completions.
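As a quick check once the endpoint is up, the sketch below builds and sends a request matching the OpenAI chat-completions schema using only the Python standard library. The DNS placeholder, the `chat_payload` helper, and the `send` helper are illustrative names, not part of the recipe.

```python
import json
import urllib.request

BASE_URL = "https://<your-dns>.salad.cloud/v1"  # replace with your deployment's DNS


def chat_payload(prompt: str, model: str = "qwen3.5-9b") -> dict:
    """Build a minimal OpenAI-style chat-completions request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }


def send(prompt: str) -> dict:
    """POST the payload to the gateway and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(chat_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Once deployed, replace BASE_URL and call:
# print(send("Hello!")["choices"][0]["message"]["content"])
```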

Use With OpenClaw

If you want to connect this recipe to OpenClaw, follow this guide:

Current Defaults

The recipe currently defaults to:
  • Model source: Qwen/Qwen3.5-9B
  • Served model name: qwen3.5-9b
  • Host bind: ::
  • Runtime image: lmsysorg/sglang:latest-runtime
  • Context length: 131072
  • Tensor parallel size: 1
  • Memory fraction: 0.8
  • Attention backend: triton
  • Reasoning parser: qwen3
  • Tool call parser: qwen3_coder
  • Authentication: disabled by default
This follows the official Qwen SGLang launch pattern for Qwen/Qwen3.5-9B, with a triton attention backend selected for better compatibility on Salad GPU nodes.
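The defaults above map onto an SGLang `launch_server` invocation roughly like the following sketch. The flag names follow SGLang's CLI conventions but may differ between image versions, so treat this as an illustration rather than the recipe's exact entrypoint.

```python
# Sketch: assemble the launch arguments implied by the recipe defaults.
# Flag names follow SGLang's launch_server CLI; verify against your image version.
defaults = {
    "--model-path": "Qwen/Qwen3.5-9B",
    "--served-model-name": "qwen3.5-9b",
    "--host": "::",
    "--context-length": "131072",
    "--tp-size": "1",
    "--mem-fraction-static": "0.8",
    "--attention-backend": "triton",
    "--reasoning-parser": "qwen3",
    "--tool-call-parser": "qwen3_coder",
}

argv = ["python", "-m", "sglang.launch_server"]
for flag, value in defaults.items():
    argv += [flag, value]
```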

Thinking Mode

Qwen thinking is enabled by default. If you want a direct response for a specific request, include this field in the JSON body:
"chat_template_kwargs": {"enable_thinking": false}

Authentication

Require Container Gateway Authentication is available in the deployment form and is unchecked by default.
  • Disabled: anyone with the URL can call the API.
  • Enabled: every request must include the Salad-Api-Key header.
If you enable authentication, see Sending Requests for the header format.

Example Request

curl https://<your-dns>.salad.cloud/v1/chat/completions \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "qwen3.5-9b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Write a short explanation of mixture-of-experts models."}
    ],
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "max_tokens": 512
  }'
If you enabled authentication during deployment, add:
-H 'Salad-Api-Key: <api-key>'
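The response follows the OpenAI chat-completions schema, so the assistant's text sits at `choices[0].message.content`. A minimal extraction sketch, with an illustrative truncated response:

```python
def extract_reply(response: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion."""
    return response["choices"][0]["message"]["content"]


# Illustrative response shape (truncated to the relevant fields):
sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Mixture-of-experts models..."}}
    ]
}
# extract_reply(sample) -> "Mixture-of-experts models..."
```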

For Technical Users

If you want to change the model or runtime settings later, open the container group in the SaladCloud Portal and edit Advanced Configuration. Useful environment variables include:
  • SGLANG_MODEL_PATH
  • SGLANG_SERVED_MODEL_NAME
  • SGLANG_CONTEXT_LENGTH
  • SGLANG_MEM_FRACTION_STATIC
  • SGLANG_TP_SIZE
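To illustrate how these variables relate to the defaults listed earlier, the sketch below reads them with fallback values. This is an illustration only, not the recipe's actual entrypoint; the defaults mirror the Current Defaults section above.

```python
import os


def read_settings(env=None) -> dict:
    """Illustration: map the recipe's environment variables onto launch settings."""
    env = dict(os.environ if env is None else env)
    return {
        "model_path": env.get("SGLANG_MODEL_PATH", "Qwen/Qwen3.5-9B"),
        "served_name": env.get("SGLANG_SERVED_MODEL_NAME", "qwen3.5-9b"),
        "context_length": int(env.get("SGLANG_CONTEXT_LENGTH", "131072")),
        "mem_fraction": float(env.get("SGLANG_MEM_FRACTION_STATIC", "0.8")),
        "tp_size": int(env.get("SGLANG_TP_SIZE", "1")),
    }
```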
For more detail, see:

Source Code