Last Updated: April 7, 2026Documentation Index
Fetch the complete documentation index at: https://docs.salad.com/llms.txt
Use this file to discover all available pages before exploring further.
Overview
This recipe runsGemma 4 31B IT with the official llama.cpp CUDA server. The
model downloads automatically on first startup, the built-in llama.cpp web UI is available at your deployment URL, and
the container exposes an OpenAI-compatible API for tools such as OpenClaw, OpenCode, and other compatible clients.
This recipe is designed to be easy for nontechnical users:
- the model is already chosen for you
- it is public by default, so you can test it immediately after deployment
- it includes the built-in llama.cpp web UI
- it works with OpenAI-compatible apps and agent tools
Quick Start
- Open the SaladCloud Portal.
- Deploy the Gemma 4 31B IT (llama.cpp) recipe.
- Enter a Container Group Name.
- Decide whether to enable Require Container Gateway Authentication:
- Disabled: public access.
- Enabled: requests must include your SaladCloud API key.
- Deploy and wait for the first startup to finish.
The model is downloaded from Hugging Face at startup, so it can take several minutes before the deployment becomes
ready.
Use With Agentic Tools
This recipe exposes an OpenAI-compatible API, so you can connect tools such as OpenClaw, OpenCode, Cline, Cursor, and other compatible clients. Useful setup guides:Defaults
The recipe comes preconfigured with these defaults:- Model source:
unsloth/gemma-4-31B-it-GGUF - Model file:
gemma-4-31B-it-UD-Q4_K_XL.gguf - Model alias:
gemma-4-31b-it - Context size:
262144 - GPU offload:
auto - Parallel slots:
1 - KV cache types:
q8_0 / q8_0 - Sampling defaults:
temperature 1.0,top_p 0.95,min_p 0.0,top_k 64 - Authentication: disabled by default
temperature, top_p, and min_p are startup defaults. You can still override them per request in your inference
payload.
Thinking Mode
Reasoning is controlled per request. To turn reasoning on, start the system prompt with<|think|>:
Authentication
Require Container Gateway Authentication is available in the deployment form and is unchecked by default.- Disabled: anyone with the URL can call the API.
- Enabled: every request must include the
Salad-Api-Keyheader.
Example Request
Reasoning Request
For Technical Users
If you want to tune llama.cpp later, open the container group in the SaladCloud Portal and edit Advanced Configuration. Useful environment variables include:LLAMA_ARG_HF_REPOto use a different Hugging Face GGUF repoLLAMA_ARG_HF_FILEto select a specific file from that repoLLAMA_ARG_CTX_SIZEto change the context windowLLAMA_ARG_CACHE_TYPE_KandLLAMA_ARG_CACHE_TYPE_Vto tune KV cache memory useLLAMA_ARG_N_GPU_LAYERSto control GPU offloadLLAMA_ARG_N_PARALLELto change concurrency