Last Updated: March 31, 2026
Introduction
Hermes Agent is an open-source AI agent by
Nous Research with persistent memory, 40+ built-in tools, MCP integration, and a
self-improvement loop that learns from every task it completes. It supports custom OpenAI-compatible endpoints, making
it straightforward to point at a self-hosted model on SaladCloud.
Hermes Agent pairs especially well with SaladCloud because:
- OpenAI-compatible endpoint support - connects to any SaladCloud deployment via the built-in setup wizard
- Rich MCP integration - connect GitHub, Webflow, Playwright, and any other MCP server
- Per-hour pricing - no per-token costs during long agentic sessions with memory and multi-step tasks
- Self-improving - the agent writes structured records of what worked and what didn’t, improving future task
performance automatically
Hermes Agent does not yet support custom HTTP headers for the main LLM provider. SaladCloud authenticates requests
with a Salad-Api-Key header, which Hermes Agent cannot send. When using Hermes Agent, deploy your SaladCloud
container group with authentication disabled, or place a small proxy in front of the endpoint to add the header.
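If you prefer to keep authentication enabled, one workaround is a local reverse proxy that injects the header for you. A minimal sketch using nginx (the endpoint URL and API key are placeholders - substitute your own):

```nginx
# Local reverse proxy that adds the Salad-Api-Key header Hermes Agent cannot send.
# Point Hermes Agent's base URL at http://localhost:8080/v1 instead of the SaladCloud URL.
server {
    listen 8080;
    location / {
        proxy_pass https://your-endpoint.salad.cloud;
        proxy_set_header Host your-endpoint.salad.cloud;
        proxy_ssl_server_name on;
        proxy_set_header Salad-Api-Key "your-salad-api-key";
    }
}
```

With this running locally, the agent talks to an unauthenticated local port while the proxy handles SaladCloud authentication.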
Prerequisites
Before getting started, make sure you have:
- A SaladCloud account
- Linux, macOS, or Windows (WSL2)
- Python 3.10+ and Node.js 18+ (the installer handles these automatically)
Step-by-Step Setup
Step 1: Deploy an LLM Recipe on SaladCloud (Without Authentication)
Deploy an OpenAI-compatible LLM server on SaladCloud with the Authentication toggle disabled.
- Go to the SaladCloud portal and create an account if you do not already have one.
- Create an organization or choose an existing one, then click “Deploy a container group”.
- Select an LLM recipe. For the best experience with Hermes Agent’s tool-calling capabilities, use
vLLM with a Hermes-format model, or the
Qwen3.5-35B-A3B (llama.cpp) recipe.
- Leave Authentication disabled - Hermes Agent cannot pass a
Salad-Api-Key header at the moment.
- Once deployed, note your API endpoint URL from SaladCloud (e.g.,
https://your-endpoint.salad.cloud). The
OpenAI-compatible API will be available at {endpoint}/v1.
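To confirm the endpoint is reachable before configuring Hermes Agent, you can query the OpenAI-compatible models route (the URL below is a placeholder):

```shell
# Build the OpenAI-compatible base URL from your SaladCloud access domain
ENDPOINT="https://your-endpoint.salad.cloud"   # placeholder - use your real URL
BASE_URL="${ENDPOINT}/v1"
echo "$BASE_URL"

# With the container group running, list the models it serves:
# curl -s "${BASE_URL}/models"
```

The `/models` response is what the Hermes setup wizard later uses to auto-detect your model.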
Available recipes:
Ready-to-deploy recipes (best for less technical users) include the Qwen3.5-35B-A3B (llama.cpp) recipe mentioned above.
Recipes for custom deployments (best for advanced users):
- vllm - Recommended; enable Hermes tool calling with
--enable-auto-tool-choice --tool-call-parser hermes
- llama.cpp - Supports GGUF models
- ollama - Simple model management
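For a custom vLLM deployment, the container start command needs the two flags above so the server emits Hermes-format tool calls. A sketch (the model name is illustrative - substitute the Hermes-format model you deploy):

```shell
# Illustrative vLLM start command with Hermes tool calling enabled
vllm serve NousResearch/Hermes-3-Llama-3.1-8B \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```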
Step 2: Install Hermes Agent
Run the official installer - it handles Python, Node.js, dependencies, and the hermes command automatically:
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
Reload your shell, then verify:
source ~/.bashrc # or ~/.zshrc on macOS
hermes --version
Step 3: Run the Setup Wizard
Run the interactive setup wizard; it walks you through all model configuration.
When the wizard asks for a model provider, select Custom OpenAI-compatible endpoint and fill in:
- Base URL:
https://your-endpoint.salad.cloud/v1
- API key:
dummy (any non-empty string - the endpoint has no auth)
- Context length:
262144
The wizard will auto-detect the available models from your endpoint, so you do not need to specify a model name
manually. It also configures memory, tools, terminal backend, and optional messaging channels - you can accept defaults
for everything and adjust later via hermes config edit, which opens ~/.hermes/config.yaml.
If you have previously configured OpenClaw, Hermes Agent can import those settings automatically.
Step 4: Verify Your Configuration
Before testing, open ~/.hermes/config.yaml and verify that:
- model.base_url points to your SaladCloud endpoint with /v1 appended
- model.api_key is set to any non-empty value (e.g., dummy)
- model.default is detected correctly from your endpoint
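With those values in place, the model section of ~/.hermes/config.yaml looks roughly like this (a sketch - the detected model id will differ for your deployment):

```yaml
model:
  base_url: 'https://your-endpoint.salad.cloud/v1'
  api_key: 'dummy'           # any non-empty string; the endpoint has no auth
  default: 'your-model-id'   # auto-detected from the endpoint
```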
If anything looks incorrect, run hermes config edit to adjust it.
Step 5: Test the Connection
Launch Hermes Agent and test it with a simple task:
“Summarize what SaladCloud is in two sentences.”
If Hermes Agent responds, your setup is complete.
Hermes Agent ships with 40+ built-in tools out of the box - no MCP servers or plugins required for most common tasks.
Web & Search
| Tool | What it does |
|---|---|
| web_search | Search the web and return results |
| web_extract | Extract and parse content from any URL |
| browser_navigate | Control a real browser to navigate pages |
| browser_snapshot | Capture the current state of a browser page |
| browser_vision | Interact with pages using vision (click, read, etc.) |
Files & Terminal
| Tool | What it does |
|---|---|
| terminal | Execute shell commands |
| process | Start and manage background processes |
| read_file | Read file contents |
| patch | Modify files with targeted edits |
| execute_code | Run code in an isolated environment |
Media
| Tool | What it does |
|---|---|
| text_to_speech | Convert text to spoken audio |
| vision_analyze | Analyze and describe images |
| image_generate | Create images via an image generation model |
Memory & Tasks
| Tool | What it does |
|---|---|
| memory | Read and write persistent memory across sessions |
| session_search | Search past conversation sessions |
| todo | Create and manage task lists |
| cronjob | Schedule recurring tasks |
| clarify | Ask the user a clarifying question before proceeding |
Agent Orchestration
| Tool | What it does |
|---|---|
| delegate_task | Spin up a subagent to handle a subtask in parallel |
| send_message | Send a message to a connected messaging channel |
Most tools are enabled by default, and you can enable or disable individual tools in your configuration.
MCP Server Integration
Hermes Agent has first-class MCP support. Add any MCP server to ~/.hermes/config.yaml under mcp_servers. More
details can be found in the official MCP documentation.
Filesystem MCP
mcp_servers:
filesystem:
command: 'npx'
args: ['-y', '@modelcontextprotocol/server-filesystem', '/path/to/your/project']
GitHub MCP
mcp_servers:
github:
command: 'npx'
args: ['-y', '@modelcontextprotocol/server-github']
env:
GITHUB_PERSONAL_ACCESS_TOKEN: your-github-token
Playwright MCP (Browser Automation)
mcp_servers:
playwright:
command: 'npx'
args: ['-y', '@playwright/mcp@latest']
Webflow MCP
mcp_servers:
webflow:
command: 'npx'
args: ['-y', 'webflow-mcp-server@latest']
env:
WEBFLOW_TOKEN: your-webflow-token
Remote HTTP MCP Server
mcp_servers:
remote_api:
url: 'https://mcp.example.com/mcp'
headers:
Authorization: 'Bearer your-token'
After editing the config, you can reload MCP servers without restarting the agent.
You can also filter which tools each server exposes:
mcp_servers:
github:
command: 'npx'
args: ['-y', '@modelcontextprotocol/server-github']
env:
GITHUB_PERSONAL_ACCESS_TOKEN: your-token
tools:
include: [create_issue, list_issues, get_file_contents]
Hermes Agent’s persistent memory is one of its strongest features. It works out of the box but can be tuned:
memory:
memory_enabled: true
memory_char_limit: 2200 # ~800 tokens
Increase memory_char_limit if the agent is working on large projects and needs to retain more context between
sessions.
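For example, to roughly double the retained context (the value is illustrative):

```yaml
memory:
  memory_enabled: true
  memory_char_limit: 4400 # ~1600 tokens
```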
For safer code execution and better isolation, set the terminal backend to Docker:
terminal:
backend: docker
docker_image: 'nikolaik/python-nodejs:python3.11-nodejs20'
Model Recommendations
- Qwen 3.5-35B-A3B with llama.cpp: Good for conversational tasks and simpler agentic workflows
- Qwen 3.5-9B: Suitable for lightweight tasks; less reliable for complex multi-step tool chains