Last Updated: January 8, 2026

This guide covers deploying any ComfyUI workflow as a production API on SaladCloud, including image generation models (Stable Diffusion 1.5, SDXL, SD 3.5, Flux) and video generation models (LTX Video, Hunyuan, CogVideoX, Mochi, Cosmos).

Overview

ComfyUI is a powerful, modular inference engine for diffusion models. ComfyUI API wraps ComfyUI to provide:
  • Stateless REST API - Horizontally scalable, returns images/videos in responses or via webhooks
  • Multiple storage backends - S3, Azure Blob, Hugging Face, or HTTP endpoints
  • Dynamic model loading - Download and cache models on-demand via URLs
  • Custom workflow endpoints - Create simplified APIs with TypeScript/JavaScript
  • Production features - Health probes, warmup workflows, execution statistics
For full documentation, see the ComfyUI API GitHub repository.

Choosing Your Deployment Pattern

Container Gateway (Recommended for image generation)

  • Simple setup with synchronous HTTP responses
  • Images returned directly in the response body
  • 100-second timeout limit (Cloudflare hard limit)
  • Best for: SD 1.5, SDXL, Flux image generation (typically 2-30 seconds)

Job Queue (Required for video and long-running tasks)

  • Asynchronous processing with webhooks or polling
  • No timeout limit - handles multi-minute generation
  • Automatic retries on node interruption
  • Built-in request buffering and autoscaling
  • Best for: Video generation, batch processing, complex multi-step workflows
See the Job Queue documentation for setup details, or the Video Generation Guide for a complete example.

Hardware Recommendations

| Model Type | Min VRAM | Recommended GPU | System RAM | Typical Time | Deployment |
| --- | --- | --- | --- | --- | --- |
| SD 1.5 | 8GB | RTX 3060 Ti | 12GB | 2-5s | Gateway |
| SDXL | 12GB | RTX 4070 Ti | 24GB | 5-15s | Gateway |
| SDXL + Refiner | 24GB | RTX 4090 | 30GB | 15-30s | Gateway |
| Flux (fp8) | 16GB | RTX 4090 | 24GB | 5-20s | Gateway |
| SD 3.5 | 16GB | RTX 4090 | 24GB | 5-15s | Gateway |
| Video (short) | 24GB | RTX 4090 | 30GB | 1-5 min | Job Queue |
| Video (long) | 24GB+ | RTX 4090/5090 | 32GB+ | 5-30 min | Job Queue |

Base Docker Images

Pre-built images are available from the GitHub Container Registry:
ghcr.io/saladtechnologies/comfyui-api:<tag>
Tag format: comfy<version>-api<version>-torch<version>-cuda<version>-<runtime|devel>
| Tag | Use Case |
| --- | --- |
| comfy0.7.0-api1.16.1-torch2.8.0-cuda12.8-runtime | Most image generation workloads |
| comfy0.7.0-api1.16.1-torch2.8.0-cuda12.8-devel | Video generation (requires full CUDA toolkit) |
Browse all available tags on the GitHub Container Registry. Key directories in the image:
  • Model directory: /opt/ComfyUI/models
  • Custom nodes: /opt/ComfyUI/custom_nodes
  • Workflows directory: /workflows (for custom endpoints)
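If you automate image updates (for example, pinning versions in CI), the tag format above is straightforward to parse. Here is a hypothetical Python helper, written for illustration and not part of the comfyui-api project:

```python
import re

# Grammar taken from the documented tag format:
# comfy<version>-api<version>-torch<version>-cuda<version>-<runtime|devel>
TAG_RE = re.compile(
    r"comfy(?P<comfy>[\d.]+)"
    r"-api(?P<api>[\d.]+)"
    r"-torch(?P<torch>[\d.]+)"
    r"-cuda(?P<cuda>[\d.]+)"
    r"-(?P<flavor>runtime|devel)"
)

def parse_tag(tag: str) -> dict:
    """Return the version components of an image tag, or raise ValueError."""
    m = TAG_RE.fullmatch(tag)
    if m is None:
        raise ValueError(f"unrecognized tag: {tag}")
    return m.groupdict()
```

This makes it easy to assert, for example, that a deployment only uses `devel` images for video workloads.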

Building Your Image

We recommend using a manifest file to download models at container startup rather than baking them into your Docker image. Model registries (like Hugging Face and Civitai) often deliver models faster than pulling and decompressing large image layers from a container registry, and this approach keeps your images small and flexible. Create a manifest.yaml file that specifies your models and custom nodes:
manifest.yaml
# Custom nodes to install from the Comfy Registry
custom_nodes:
  - comfyui-impact-pack
  - comfyui-videohelpersuite

# Models to download
models:
  # Downloaded before the server starts accepting requests
  before_start:
    - url: https://huggingface.co/Comfy-Org/flux1-schnell/resolve/main/flux1-schnell-fp8.safetensors
      local_path: /opt/ComfyUI/models/checkpoints/flux1-schnell-fp8.safetensors
    - url: https://civitai.com/api/download/models/128713
      local_path: /opt/ComfyUI/models/checkpoints/dreamshaper_8.safetensors
  # Optional: Downloaded in the background after startup
  after_start:
    - url: https://huggingface.co/some/optional-model.safetensors
      local_path: /opt/ComfyUI/models/checkpoints/optional-model.safetensors
Then create a minimal Dockerfile:
FROM ghcr.io/saladtechnologies/comfyui-api:comfy0.7.0-api1.16.1-torch2.8.0-cuda12.8-runtime

# Copy manifest for automatic model/node installation at startup
COPY manifest.yaml .
ENV MANIFEST=manifest.yaml

# Optional: Warmup workflow to pre-load models into VRAM
COPY warmup_workflow.json .
ENV WARMUP_PROMPT_FILE=warmup_workflow.json

CMD ["./comfyui-api"]
The manifest supports:
  • apt packages - System dependencies
  • pip packages - Python dependencies
  • custom_nodes - From Comfy Registry (by name) or Git URLs
  • models.before_start - Downloaded before accepting requests
  • models.after_start - Downloaded in the background after startup

Manifest URL Sources

Models can be downloaded from multiple sources:
| Source | URL Format | Notes |
| --- | --- | --- |
| HTTP/HTTPS | https://example.com/model.safetensors | Direct downloads |
| Hugging Face | https://huggingface.co/org/repo/resolve/main/model.safetensors | Supports gated models with HF_TOKEN |
| Civitai | https://civitai.com/api/download/models/12345 | Use API download URLs |
| S3 | s3://bucket/path/model.safetensors | Requires AWS credentials |
| Azure | https://account.blob.core.windows.net/container/model.safetensors | Requires Azure credentials |
For authenticated sources, set the appropriate environment variables (HF_TOKEN, AWS_ACCESS_KEY_ID, etc.) or use per-request credentials.

Alternative: Baking Models Into Images

If you prefer to include models in your Docker image (useful for air-gapped environments or guaranteed availability):
FROM ghcr.io/saladtechnologies/comfyui-api:comfy0.7.0-api1.16.1-torch2.8.0-cuda12.8-runtime

# Copy model into the image (increases image size significantly)
COPY my-model.safetensors /opt/ComfyUI/models/checkpoints/

# Install custom nodes at build time
RUN comfy node registry-install comfyui-impact-pack

CMD ["./comfyui-api"]
Note: This approach results in larger images and slower cold starts on SaladCloud.

Dynamic Model Loading

In addition to the manifest, you can load models dynamically at runtime. This is useful for:
  • User-specified models per request
  • A/B testing different model versions
  • Multi-tenant deployments

Model URLs in Workflows

Specify a URL instead of a filename in your workflow, and the model will be automatically downloaded and cached:
{
  "4": {
    "inputs": {
      "ckpt_name": "https://huggingface.co/Comfy-Org/flux1-schnell/resolve/main/flux1-schnell-fp8.safetensors"
    },
    "class_type": "CheckpointLoaderSimple"
  }
}
Supported URL schemes:
  • https:// - Direct HTTP downloads (including Hugging Face and Azure Blob Storage)
  • s3://bucket/path - S3-compatible storage

The /download Endpoint

Pre-download models before they’re needed:
# Asynchronous download (returns immediately with 202, default behavior)
curl -X POST http://localhost:3000/download \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://huggingface.co/Comfy-Org/flux1-schnell/resolve/main/flux1-schnell-fp8.safetensors",
    "model_type": "checkpoints"
  }'

# Synchronous download (blocks until complete)
curl -X POST http://localhost:3000/download \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://huggingface.co/Comfy-Org/flux1-schnell/resolve/main/flux1-schnell-fp8.safetensors",
    "model_type": "checkpoints",
    "wait": true
  }'
Parameters:
  • url - The URL to download from
  • model_type - Where to save the model (e.g., checkpoints, loras, vae, controlnet)
  • filename - Optional filename override (defaults to URL basename)
  • wait - Set to true to wait for download to complete (default: false)
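The documented default-filename rule (the URL's basename) can be sketched as follows. This is an illustration of the rule, not the server's code; note that Civitai API URLs have numeric basenames, which is a common reason to pass filename explicitly:

```python
import os
from urllib.parse import urlparse

def default_filename(url: str) -> str:
    """Mimic the documented default: the saved filename is the URL's basename."""
    return os.path.basename(urlparse(url).path)
```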

Credentials for Protected Sources

Credentials for model downloads and storage backends can be configured globally via environment variables, and optionally overridden per-request. Global credentials (environment variables):
# Hugging Face (for gated models)
ENV HF_TOKEN=hf_xxxxxxxxxxxxx

# AWS S3
ENV AWS_ACCESS_KEY_ID=your-access-key
ENV AWS_SECRET_ACCESS_KEY=your-secret-key
ENV AWS_REGION=us-east-1

# Azure Blob Storage
ENV AZURE_STORAGE_CONNECTION_STRING=your-connection-string

# Custom HTTP auth header (for any HTTP source)
ENV HTTP_AUTH_HEADER_NAME=Authorization
ENV HTTP_AUTH_HEADER_VALUE=Bearer your-token
Per-request credentials: you can also provide credentials in individual requests, which override any global settings:
{
  "prompt": { "...": "..." },
  "credentials": [
    {
      "url_pattern": "https://huggingface.co/**",
      "auth": {
        "type": "bearer",
        "token": "hf_xxxxxxxxxxxxx"
      }
    },
    {
      "url_pattern": "s3://my-private-bucket/**",
      "auth": {
        "type": "s3",
        "access_key_id": "AKIAIOSFODNN7EXAMPLE",
        "secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      }
    }
  ]
}
Supported authentication types: bearer, basic, header, query, s3.

LRU Caching

Limit local storage usage with automatic cache eviction:
ENV LRU_CACHE_SIZE_GB=50
When the cache exceeds this size, least-recently-used files are automatically deleted.
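Conceptually, the eviction policy works like the sketch below — an illustrative LRU eviction over a directory of cached files, not the server's actual implementation:

```python
import os

def evict_lru(cache_dir: str, max_bytes: int) -> list[str]:
    """Delete least-recently-used files until the cache fits under max_bytes."""
    files = [os.path.join(cache_dir, f) for f in os.listdir(cache_dir)]
    files = [f for f in files if os.path.isfile(f)]
    # Oldest access time first = least recently used
    files.sort(key=lambda f: os.path.getatime(f))
    total = sum(os.path.getsize(f) for f in files)
    removed = []
    for f in files:
        if total <= max_bytes:
            break
        total -= os.path.getsize(f)
        os.remove(f)
        removed.append(f)
    return removed
```

Size the cache so your working set of models fits comfortably; if requests frequently pull models larger than the limit allows, the cache will thrash.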

Warmup Configuration

The first request to a cold instance is slow because models must be loaded into VRAM. Configure a warmup workflow to pre-load models before accepting traffic:
  1. Create a warmup workflow JSON (use reduced steps/resolution for speed)
  2. Add it to your Dockerfile:
COPY warmup_workflow.json .
ENV WARMUP_PROMPT_FILE=warmup_workflow.json
The /ready probe will return 503 until warmup completes, ensuring the load balancer only routes to warm instances. Tip: Use 1-4 steps and small dimensions (256x256) for fast warmups while still loading the model.
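For example, a warmup workflow for a simple checkpoint pipeline might look like the sketch below. The node layout follows ComfyUI's standard API format; the ckpt_name value is a placeholder for a model you actually ship or download:

```json
{
  "3": {
    "inputs": {
      "seed": 42, "steps": 1, "cfg": 7, "sampler_name": "euler",
      "scheduler": "normal", "denoise": 1,
      "model": ["4", 0], "positive": ["6", 0], "negative": ["7", 0],
      "latent_image": ["5", 0]
    },
    "class_type": "KSampler"
  },
  "4": { "inputs": { "ckpt_name": "your-model.safetensors" }, "class_type": "CheckpointLoaderSimple" },
  "5": { "inputs": { "width": 256, "height": 256, "batch_size": 1 }, "class_type": "EmptyLatentImage" },
  "6": { "inputs": { "text": "warmup", "clip": ["4", 1] }, "class_type": "CLIPTextEncode" },
  "7": { "inputs": { "text": "", "clip": ["4", 1] }, "class_type": "CLIPTextEncode" },
  "8": { "inputs": { "samples": ["3", 0], "vae": ["4", 2] }, "class_type": "VAEDecode" },
  "9": { "inputs": { "filename_prefix": "warmup", "images": ["8", 0] }, "class_type": "SaveImage" }
}
```

The checkpoint loaded here should match the one your production requests use, so the model is already in VRAM when the first real request arrives.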

Output Handling

Base64 Response (Default)

Images are returned as base64-encoded strings in the response:
{
  "id": "abc123",
  "images": ["base64encodeddata..."],
  "filenames": ["ComfyUI_00001.png"],
  "execution_time_ms": 2500
}
Decode with CLI tools:
jq -r '.images[0]' response.json | base64 -d > image.png

Image Format Conversion

Convert output to JPEG or WebP with quality control:
{
  "prompt": { "...": "..." },
  "convert_output": {
    "format": "jpeg",
    "options": {
      "quality": 85,
      "progressive": true
    }
  }
}

Storage Backends

Upload outputs directly to cloud storage and receive URLs in the response instead of base64 data. This is especially important for video generation where files are large. Credentials can be configured globally via environment variables (see Credentials for Protected Sources) or provided per-request. S3-Compatible Storage (AWS, Cloudflare R2, etc.):
{
  "prompt": { "...": "..." },
  "s3": {
    "bucket": "my-bucket",
    "prefix": "outputs/",
    "async": false
  }
}
Response returns S3 URLs instead of base64:
{
  "id": "abc123",
  "images": ["s3://my-bucket/outputs/abc123_ComfyUI_0.png"],
  "filenames": ["abc123_ComfyUI_0.png"],
  "execution_time_ms": 2500
}
Azure Blob Storage:
{
  "prompt": { "...": "..." },
  "azure_blob_upload": {
    "container": "my-container",
    "blob_prefix": "outputs/",
    "async": false
  }
}
Response returns Azure URLs:
{
  "id": "abc123",
  "images": ["https://your-account.blob.core.windows.net/my-container/outputs/abc123_ComfyUI_0.png"],
  "filenames": ["abc123_ComfyUI_0.png"]
}
Hugging Face:
{
  "prompt": { "...": "..." },
  "hf_upload": {
    "repo": "username/my-outputs",
    "path_prefix": "generated/"
  }
}
Upload behavior:
  • async: false (default) - Waits for upload to complete, returns URLs in response
  • async: true - Returns immediately, uploads in background (use with webhooks)
See the ComfyUI API documentation for all storage options.

Webhook Delivery

Send results to a webhook URL for async processing:
{
  "prompt": { "...": "..." },
  "webhook": "https://api.example.com/comfy-complete"
}
Webhooks follow the Standard Webhooks specification with HMAC signature verification. Configure a secret:
ENV WEBHOOK_SECRET=your-secret-here
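Per the Standard Webhooks specification, the signature is an HMAC-SHA256 over `<id>.<timestamp>.<body>` delivered via webhook-id, webhook-timestamp, and webhook-signature headers. The sketch below verifies such a signature; confirm the exact secret encoding against the comfyui-api docs, since details like a whsec_ prefix vary between implementations:

```python
import base64
import hashlib
import hmac

def verify_webhook(secret: str, msg_id: str, timestamp: str, body: str,
                   signature_header: str) -> bool:
    """Check a Standard Webhooks style signature.

    signature_header may contain several space-separated entries like
    "v1,<base64sig> v1,<base64sig>"; the payload is valid if any matches.
    """
    signed_content = f"{msg_id}.{timestamp}.{body}".encode()
    expected = base64.b64encode(
        hmac.new(secret.encode(), signed_content, hashlib.sha256).digest()
    ).decode()
    for candidate in signature_header.split():
        version, _, sig = candidate.partition(",")
        if version == "v1" and hmac.compare_digest(sig, expected):
            return True
    return False
```

Always verify before trusting webhook payloads: the signature proves the request came from a holder of your WEBHOOK_SECRET, and the timestamp lets you reject replays.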

Custom Workflow Endpoints

Create simplified REST endpoints by adding TypeScript/JavaScript files to the /workflows directory. This transforms complex ComfyUI node graphs into clean, validated REST APIs with automatic Swagger documentation. Why use custom endpoints?
  • Hide ComfyUI’s node-based complexity from API consumers
  • Add input validation with helpful error messages
  • Set sensible defaults for optional parameters
  • Create task-specific APIs (txt2img, img2img, upscale, etc.)

Basic Structure

Each workflow file must export a Workflow object with:
  • RequestSchema - A Zod schema defining the input parameters
  • generateWorkflow - A function that transforms input into a ComfyUI prompt
  • summary / description - Optional metadata for Swagger docs
Example: Create /workflows/txt2img.ts:
import { z } from 'zod'
import config from '../config'

const RequestSchema = z.object({
  prompt: z.string().describe('The positive prompt for image generation'),
  negative_prompt: z.string().optional().default('text, watermark').describe('The negative prompt'),
  width: z.number().int().min(256).max(2048).optional().default(1024).describe('Width of the generated image'),
  height: z.number().int().min(256).max(2048).optional().default(1024).describe('Height of the generated image'),
  steps: z.number().int().min(1).max(100).optional().default(20).describe('Number of sampling steps'),
  cfg_scale: z.number().min(0).max(20).optional().default(7).describe('Classifier-free guidance scale'),
  seed: z
    .number()
    .int()
    .optional()
    .default(() => Math.floor(Math.random() * 100000000000))
    .describe('Seed for reproducibility'),
  // Use config to get available models, samplers, schedulers
  sampler_name: config.samplers.optional().default('euler'),
  scheduler: config.schedulers.optional().default('normal'),
  checkpoint: z
    .string()
    .optional()
    .default(config.warmupCkpt || config.models.checkpoints.all[0]),
})

type Input = z.infer<typeof RequestSchema>

function generateWorkflow(input: Input) {
  return {
    '3': {
      inputs: {
        seed: input.seed,
        steps: input.steps,
        cfg: input.cfg_scale,
        sampler_name: input.sampler_name,
        scheduler: input.scheduler,
        denoise: 1,
        model: ['4', 0],
        positive: ['6', 0],
        negative: ['7', 0],
        latent_image: ['5', 0],
      },
      class_type: 'KSampler',
    },
    '4': {
      inputs: { ckpt_name: input.checkpoint },
      class_type: 'CheckpointLoaderSimple',
    },
    '5': {
      inputs: { width: input.width, height: input.height, batch_size: 1 },
      class_type: 'EmptyLatentImage',
    },
    '6': {
      inputs: { text: input.prompt, clip: ['4', 1] },
      class_type: 'CLIPTextEncode',
    },
    '7': {
      inputs: { text: input.negative_prompt, clip: ['4', 1] },
      class_type: 'CLIPTextEncode',
    },
    '8': {
      inputs: { samples: ['3', 0], vae: ['4', 2] },
      class_type: 'VAEDecode',
    },
    '9': {
      inputs: { filename_prefix: 'output', images: ['8', 0] },
      class_type: 'SaveImage',
    },
  }
}

export default {
  RequestSchema,
  generateWorkflow,
  summary: 'Text to Image',
  description: 'Generate an image from a text prompt',
}

Directory Structure = URL Paths

The directory structure maps directly to endpoint URLs:
/workflows
├── txt2img.ts              → POST /workflow/txt2img
├── img2img.ts              → POST /workflow/img2img
└── sdxl/
    ├── txt2img.ts          → POST /workflow/sdxl/txt2img
    ├── txt2img-refiner.ts  → POST /workflow/sdxl/txt2img-refiner
    └── img2img.ts          → POST /workflow/sdxl/img2img
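The mapping can be expressed as a small helper (hypothetical, for illustration only — the server handles this internally):

```python
import os

def route_for(workflow_file: str, root: str = "/workflows") -> str:
    """Map a workflow file path to its endpoint URL, per the convention above."""
    rel = os.path.relpath(workflow_file, root)
    base, _ext = os.path.splitext(rel)
    return "/workflow/" + base.replace(os.sep, "/")
```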

The Config Object

Import config to access runtime information:
import config from '../config'

// Available models (scanned from /opt/ComfyUI/models at startup)
config.models.checkpoints.all // ['model1.safetensors', 'model2.safetensors']
config.models.loras.all // ['lora1.safetensors', ...]
config.models.controlnet.all // [...]

// Zod enums for validation
config.samplers // z.enum(['euler', 'euler_ancestral', 'heun', ...])
config.schedulers // z.enum(['normal', 'karras', 'exponential', ...])

// Warmup checkpoint (if WARMUP_PROMPT_FILE was set)
config.warmupCkpt // 'model-used-in-warmup.safetensors'

Adding to Your Dockerfile

COPY workflows /workflows
Workflow files are loaded at runtime via eval, so no rebuild is needed when using the pre-built binary.

Calling Custom Endpoints

Custom endpoints wrap your input parameters in an input field, and support all the same output options as /prompt:
curl -X POST http://localhost:3000/workflow/txt2img \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "prompt": "a beautiful sunset over mountains",
      "width": 1024,
      "height": 768
    },
    "s3": {
      "bucket": "my-bucket",
      "prefix": "outputs/"
    }
  }'
The input object contains your workflow-specific parameters (defined by RequestSchema), while output options like s3, webhook, and convert_output remain at the root level.

AI-Assisted Workflow Generation

The comfyui-api repository includes a script to automatically generate workflow endpoints from ComfyUI JSON exports using Claude:
# Requires ANTHROPIC_API_KEY environment variable
./generate-workflow workflow_api.json /workflows/my-workflow.ts
See the comfyui-api documentation for details.

Debugging

Enable debug logging to troubleshoot workflow loading:
ENV LOG_LEVEL=debug
This shows which workflows are loaded, transpilation status, and any errors. For a complete real-world example with video generation, see the Video Generation Guide.

Local Testing

  1. Build your Docker image:
docker build -t my-comfyui .
  2. Run locally with GPU access:
docker run -it --rm --gpus all \
  -p 3000:3000 \
  -p 8188:8188 \
  my-comfyui
  3. Access the interfaces:
    • ComfyUI API: http://localhost:3000
    • ComfyUI Web UI: http://localhost:8188
  4. Design your workflow in the Web UI, then export it:
    • Open Settings (gear icon)
    • Enable “Dev mode Options”
    • Click “Save (API Format)” to download the workflow JSON
  5. Test the API:
curl -X POST http://localhost:3000/prompt \
  -H "Content-Type: application/json" \
  -d '{"prompt": <your-workflow-json>}' \
  | jq -r '.images[0]' | base64 -d > output.png

Deploying to SaladCloud

Container Gateway Deployment

  1. Push your image to a container registry:
docker push your-registry/your-image:tag
  2. Create a Container Group via the Portal or API:
    • Set appropriate hardware (see Hardware Recommendations)
    • Enable Container Gateway on port 3000
    • Use least-connection load balancing (recommended for inference workloads)
    • Configure health probes:
      • Startup probe: GET /health
      • Readiness probe: GET /ready
    • Set replicas to 3 minimum (5+ for production)
    • Optional: Enable authentication with Salad-Api-Key header
  3. Wait for deployment:
    • Image caches to SaladCloud’s internal registry
    • Nodes download and start your container
    • Warmup workflow runs (if configured)
    • Instances become “ready”
  4. Submit requests to your Access Domain Name.

Job Queue Deployment

For video generation or long-running tasks (anything that might exceed the 100-second gateway timeout), use the Job Queue system.

How the Job Queue Works

Flow:
  1. Submit jobs to the queue - Your client sends requests to the SaladCloud Job Queue API (not directly to your container)
  2. Queue buffers and distributes - Jobs are held in the queue and distributed to available container instances
  3. Worker fetches and processes - The job queue worker binary running in your container fetches jobs and forwards them to comfyui-api on localhost
  4. Results returned to queue - Outputs are sent back to the queue, where your client can fetch them or receive them via webhook
  5. Automatic retries - If a node is interrupted (common on SaladCloud), the job is automatically retried on another node (up to 3 times)

When to Use Job Queues

| Use Case | Gateway | Job Queue |
| --- | --- | --- |
| Image generation (<30s) | Recommended | Works |
| Complex image workflows (30-90s) | Usually OK | Safer |
| Video generation (1-30 min) | Won’t work | Required |
| Batch processing | Not ideal | Recommended |
| Production with SLA requirements | Risky | Recommended |

Setup Steps

  1. Add the job queue worker to your Dockerfile:
# Download and install the job queue worker
RUN wget https://github.com/SaladTechnologies/salad-cloud-job-queue-worker/releases/download/v0.5.0/salad-http-job-queue-worker_x86_64.tar.gz && \
    tar -xvf salad-http-job-queue-worker_x86_64.tar.gz && \
    rm salad-http-job-queue-worker_x86_64.tar.gz && \
    chmod +x salad-http-job-queue-worker

# Run both the worker and comfyui-api
CMD ["bash", "-c", "./salad-http-job-queue-worker & ./comfyui-api"]
  2. Create a Job Queue via the API:
curl -X POST "https://api.salad.com/api/public/organizations/{org}/projects/{project}/queues" \
  -H "Salad-Api-Key: $SALAD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "comfyui-jobs",
    "display_name": "ComfyUI Job Queue",
    "description": "Queue for ComfyUI image/video generation"
  }'
  3. Create a Container Group connected to the queue (do not enable Container Gateway)
  4. Submit jobs to the queue:
curl -X POST "https://api.salad.com/api/public/organizations/{org}/projects/{project}/queues/{queue}/jobs" \
  -H "Salad-Api-Key: $SALAD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "prompt": {"4": {"inputs": {}, "class_type": "CheckpointLoaderSimple"}},
      "s3": {
        "bucket": "my-bucket",
        "prefix": "outputs/"
      }
    },
    "webhook": "https://api.example.com/job-complete"
  }'
  5. Receive results via webhook or poll the job status endpoint
See the Job Queue documentation for full API details, and the Video Generation Guide for a complete implementation example.

API Reference

POST /prompt

Submit a ComfyUI workflow for execution. Request:
{
  "id": "optional-request-id",
  "prompt": {},
  "webhook": "https://example.com/callback",
  "convert_output": {
    "format": "jpeg",
    "options": { "quality": 85 }
  },
  "credentials": [],
  "s3": { "bucket": "my-bucket", "prefix": "outputs/" }
}
The prompt field contains your ComfyUI workflow JSON. See Output Handling for storage options. Response:
{
  "id": "request-id",
  "images": ["base64..."],
  "filenames": ["output_00001.png"],
  "execution_time_ms": 2500,
  "execution_stats": {
    "comfy_time_ms": 2300,
    "preprocessing_ms": 100,
    "upload_time_ms": 100
  }
}
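For programmatic use, a minimal Python client might look like the sketch below. It assumes the API is reachable on the default port 3000 and uses only the standard library:

```python
import base64
import json
import urllib.request

def decode_images(response: dict) -> list[bytes]:
    """Decode the base64-encoded images in a /prompt response into raw bytes."""
    return [base64.b64decode(img) for img in response["images"]]

def submit_prompt(workflow: dict, base_url: str = "http://localhost:3000") -> dict:
    """POST a ComfyUI workflow to /prompt and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/prompt",
        data=json.dumps({"prompt": workflow}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    # workflow_api.json is a graph exported via "Save (API Format)"
    with open("workflow_api.json") as f:
        workflow = json.load(f)
    result = submit_prompt(workflow)
    for name, data in zip(result["filenames"], decode_images(result)):
        with open(name, "wb") as out:
            out.write(data)
```

For production, add the output options described above (s3, convert_output, webhook) to the request body alongside prompt.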

GET /health

Returns 200 when the server has started. Use for startup probes.

GET /ready

Returns 200 when ready to serve traffic (warmup complete, not overloaded). Returns 503 otherwise. Use for readiness probes.

POST /download

Download a model to the local cache.

GET /models

List available models by type (checkpoints, loras, etc.).

POST /interrupt

Cancel a running prompt by ID.

Video Generation

ComfyUI API supports video generation models including:
  • LTX Video - Fast, Apache 2.0 licensed
  • Hunyuan Video - High quality, open source
  • CogVideoX - Text and image to video
  • Mochi Video - Research model
  • Cosmos 1.0 - NVIDIA’s video model
Important: For video generation, we strongly recommend uploading outputs to cloud storage (S3, Azure, Hugging Face) rather than returning them as base64 strings. Video files are significantly larger than images and can cause memory issues or timeout problems when returned in the response body. Video generation typically takes 1-30 minutes depending on length and quality, so Job Queues are required. For a complete video generation tutorial, see the Video Generation Guide.

Next Steps