Overview
ComfyUI is a powerful, modular inference engine for diffusion models. ComfyUI API wraps ComfyUI to provide:
- Stateless REST API - Horizontally scalable, returns images/videos in responses or via webhooks
- Multiple storage backends - S3, Azure Blob, Hugging Face, or HTTP endpoints
- Dynamic model loading - Download and cache models on-demand via URLs
- Custom workflow endpoints - Create simplified APIs with TypeScript/JavaScript
- Production features - Health probes, warmup workflows, execution statistics
Choosing Your Deployment Pattern
Container Gateway (Recommended for most image generation)
- Simple setup with synchronous HTTP responses
- Images returned directly in the response body
- 100-second timeout limit (Cloudflare hard limit)
- Best for: SD 1.5, SDXL, Flux image generation (typically 2-30 seconds)
Job Queue (Required for video and long-running tasks)
- Asynchronous processing with webhooks or polling
- No timeout limit - handles multi-minute generation
- Automatic retries on node interruption
- Built-in request buffering and autoscaling
- Best for: Video generation, batch processing, complex multi-step workflows
Hardware Recommendations
| Model Type | Min VRAM | Recommended GPU | System RAM | Typical Time | Deployment |
|---|---|---|---|---|---|
| SD 1.5 | 8GB | RTX 3060 Ti | 12GB | 2-5s | Gateway |
| SDXL | 12GB | RTX 4070 Ti | 24GB | 5-15s | Gateway |
| SDXL + Refiner | 24GB | RTX 4090 | 30GB | 15-30s | Gateway |
| Flux (fp8) | 16GB | RTX 4090 | 24GB | 5-20s | Gateway |
| SD 3.5 | 16GB | RTX 4090 | 24GB | 5-15s | Gateway |
| Video (short) | 24GB | RTX 4090 | 30GB | 1-5 min | Job Queue |
| Video (long) | 24GB+ | RTX 4090/5090 | 32GB+ | 5-30 min | Job Queue |
Base Docker Images
Pre-built images are available from the GitHub Container Registry, tagged as:
comfy<version>-api<version>-torch<version>-cuda<version>-<runtime|devel>
| Tag | Use Case |
|---|---|
| comfy0.7.0-api1.16.1-torch2.8.0-cuda12.8-runtime | Most image generation workloads |
| comfy0.7.0-api1.16.1-torch2.8.0-cuda12.8-devel | Video generation (requires full CUDA toolkit) |
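A Dockerfile can then start from one of these tags. The registry path below is inferred from the project's GitHub organization and should be verified against the comfyui-api releases page:

```dockerfile
# Base image for image generation workloads (registry path assumed; verify it)
FROM ghcr.io/saladtechnologies/comfyui-api:comfy0.7.0-api1.16.1-torch2.8.0-cuda12.8-runtime
```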
Key paths inside these images:
- Model directory: /opt/ComfyUI/models
- Custom nodes: /opt/ComfyUI/custom_nodes
- Workflows directory: /workflows (for custom endpoints)
Building Your Image
We recommend using a manifest file to download models at container startup rather than baking them into your Docker image. Model registries (like Hugging Face and Civitai) are often faster than our container registry plus decompression time, and this approach keeps your images small and flexible.
Recommended: Manifest-Based Dockerfile
Create a manifest.yaml file that specifies your models and custom nodes. The manifest supports the following sections:
- apt packages - System dependencies
- pip packages - Python dependencies
- custom_nodes - From Comfy Registry (by name) or Git URLs
- models.before_start - Downloaded before accepting requests
- models.after_start - Downloaded in the background after startup
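A minimal manifest might look like the following sketch. The section names come from the list above, but the exact nesting (model types as keys under before_start/after_start) is an assumption to verify against the comfyui-api documentation; the Git URL is a hypothetical placeholder:

```yaml
apt:
  - ffmpeg
pip:
  - opencv-python-headless
custom_nodes:
  - comfyui-videohelpersuite                    # by Comfy Registry name
  - https://github.com/example/example-node.git # or by Git URL (placeholder)
models:
  before_start:
    checkpoints:
      - https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors
  after_start:
    loras:
      - https://civitai.com/api/download/models/12345
```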
Manifest URL Sources
Models can be downloaded from multiple sources:
| Source | URL Format | Notes |
|---|---|---|
| HTTP/HTTPS | https://example.com/model.safetensors | Direct downloads |
| Hugging Face | https://huggingface.co/org/repo/resolve/main/model.safetensors | Supports gated models with HF_TOKEN |
| Civitai | https://civitai.com/api/download/models/12345 | Use API download URLs |
| S3 | s3://bucket/path/model.safetensors | Requires AWS credentials |
| Azure | https://account.blob.core.windows.net/container/model.safetensors | Requires Azure credentials |
Set credentials globally via environment variables (HF_TOKEN, AWS_ACCESS_KEY_ID, etc.) or use per-request credentials.
Alternative: Baking Models Into Images
If you prefer to include models in your Docker image (useful for air-gapped environments or guaranteed availability), download them in your Dockerfile at build time.
Dynamic Model Loading
In addition to the manifest, you can load models dynamically at runtime. This is useful for:
- User-specified models per request
- A/B testing different model versions
- Multi-tenant deployments
Model URLs in Workflows
Specify a URL instead of a filename in your workflow, and the model will be automatically downloaded and cached. Supported URL schemes:
- https:// - Direct HTTP downloads (including Hugging Face and Azure Blob Storage)
- s3://bucket/path - S3-compatible storage
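For example, a checkpoint loader node can point at a URL rather than a local filename; the API downloads and caches the file, then substitutes the local filename before execution. The node structure below is a sketch (class_type and the specific model URL are illustrative):

```typescript
// Fragment of a ComfyUI prompt: ckpt_name is a URL instead of a filename.
const loaderNode = {
  class_type: "CheckpointLoaderSimple",
  inputs: {
    ckpt_name:
      "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors",
  },
};

// The cached file keeps the URL basename by default:
console.log(loaderNode.inputs.ckpt_name.split("/").pop()); // sd_xl_base_1.0.safetensors
```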
The /download Endpoint
Pre-download models before they're needed. Parameters:
- url - The URL to download from
- model_type - Where to save the model (e.g., checkpoints, loras, vae, controlnet)
- filename - Optional filename override (defaults to URL basename)
- wait - Set to true to wait for download to complete (default: false)
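Assembled as a JSON body, a /download request using these parameters might look like this (the filename value is a hypothetical example):

```typescript
// Pre-download a LoRA and block until it is cached, so later prompts can use it.
const downloadRequest = {
  url: "https://civitai.com/api/download/models/12345",
  model_type: "loras",
  filename: "my-style.safetensors", // optional; defaults to the URL basename
  wait: true, // wait for the download to complete before responding
};

// POST this to http://<host>:3000/download with Content-Type: application/json
const body = JSON.stringify(downloadRequest);
console.log(body);
```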
Credentials for Protected Sources
Credentials for model downloads and storage backends can be configured globally via environment variables (HF_TOKEN, AWS_ACCESS_KEY_ID, and so on), and optionally overridden per-request. Per-request credentials support the following types: bearer, basic, header, query, and s3.
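For example, global credentials can be passed to the container as environment variables (values below are placeholders; HF_TOKEN and AWS_ACCESS_KEY_ID are named in this guide, and AWS_SECRET_ACCESS_KEY follows the standard AWS SDK convention):

```
HF_TOKEN=hf_xxxxxxxxxxxx
AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXX
AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxx
```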
LRU Caching
Limit local storage usage with automatic cache eviction.
Warmup Configuration
The first request to a cold instance is slow because models must be loaded into VRAM. Configure a warmup workflow to pre-load models before accepting traffic:
- Create a warmup workflow JSON (use reduced steps/resolution for speed)
- Add it to your Dockerfile
The /ready probe will return 503 until warmup completes, ensuring the load balancer only routes to warm instances.
Tip: Use 1-4 steps and small dimensions (256x256) for fast warmups while still loading the model.
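Wiring this into an image might look like the following sketch; the WARMUP_PROMPT_FILE variable name is an assumption to verify against the comfyui-api README for your version:

```dockerfile
# Copy a small warmup workflow (1-4 steps, 256x256) into the image
COPY warmup.json /warmup.json
# Hypothetical variable name: point the API at the warmup workflow
ENV WARMUP_PROMPT_FILE=/warmup.json
```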
Output Handling
Base64 Response (Default)
Images are returned as base64-encoded strings in the response.
Image Format Conversion
Convert output to JPEG or WebP with quality control.
Storage Backends
Upload outputs directly to cloud storage and receive URLs in the response instead of base64 data. This is especially important for video generation where files are large. Credentials can be configured globally via environment variables (see Credentials for Protected Sources) or provided per-request.
S3-Compatible Storage (AWS, Cloudflare R2, etc.):
- async: false (default) - Waits for upload to complete, returns URLs in response
- async: true - Returns immediately, uploads in background (use with webhooks)
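As a sketch, a /prompt request with S3 output might look like the following. The root-level s3 field and its async option are described in this guide; the bucket/prefix key names inside it are assumptions to check against the comfyui-api docs:

```typescript
const request = {
  prompt: {
    // ...your ComfyUI workflow JSON goes here
  },
  s3: {
    bucket: "my-outputs", // hypothetical bucket name
    prefix: "renders/",   // hypothetical key prefix
    async: false,         // wait for the upload; response contains URLs
  },
};
console.log(Object.keys(request).sort().join(",")); // prompt,s3
```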
Webhook Delivery
Send results to a webhook URL for async processing.
Custom Workflow Endpoints
Create simplified REST endpoints by adding TypeScript/JavaScript files to the /workflows directory. This transforms
complex ComfyUI node graphs into clean, validated REST APIs with automatic Swagger documentation.
Why use custom endpoints?
- Hide ComfyUI’s node-based complexity from API consumers
- Add input validation with helpful error messages
- Set sensible defaults for optional parameters
- Create task-specific APIs (txt2img, img2img, upscale, etc.)
Basic Structure
Each workflow file must export a Workflow object with:
- RequestSchema - A Zod schema defining the input parameters
- generateWorkflow - A function that transforms input into a ComfyUI prompt
- summary / description - Optional metadata for Swagger docs
/workflows/txt2img.ts:
Directory Structure = URL Paths
The directory structure maps directly to endpoint URLs.
The Config Object
Import config to access runtime information.
Adding to Your Dockerfile
Workflow files are loaded at runtime via eval, so no rebuild is needed when using the pre-built binary.
Calling Custom Endpoints
Custom endpoints wrap your input parameters in an input field, and support all the same output options as /prompt. The input object contains your workflow-specific parameters (defined by RequestSchema), while output options like s3, webhook, and convert_output remain at the root level.
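For instance, a call to a hypothetical /workflows/txt2img endpoint might send the following body (the fields inside input are defined by that workflow's RequestSchema; the webhook URL is a placeholder):

```typescript
const body = {
  input: {
    // Workflow-specific parameters, validated by the endpoint's RequestSchema:
    prompt: "a watercolor painting of a fox",
    width: 1024,
    height: 1024,
  },
  // Output options stay at the root level, exactly as with /prompt:
  webhook: "https://example.com/hooks/comfy-done",
};
console.log("input" in body && "webhook" in body); // true
```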
AI-Assisted Workflow Generation
The comfyui-api repository includes a script to automatically generate workflow endpoints from ComfyUI JSON exports using Claude.
Debugging
Enable debug logging to troubleshoot workflow loading.
Local Testing
- Build your Docker image
- Run locally with GPU access
- Access the interfaces:
  - API: http://localhost:3000
  - Swagger docs: http://localhost:3000/docs
  - ComfyUI Web UI: http://localhost:8188
- Design your workflow in the Web UI, then export it:
  - Open Settings (gear icon)
  - Enable “Dev mode Options”
  - Click “Save (API Format)” to download the workflow JSON
- Test the API
Deploying to SaladCloud
Container Gateway Deployment
- Push your image to a container registry
- Create a Container Group via the Portal or API:
  - Set appropriate hardware (see Hardware Recommendations)
  - Enable Container Gateway on port 3000
  - Use least-connection load balancing (recommended for inference workloads)
  - Configure health probes:
    - Startup probe: GET /health
    - Readiness probe: GET /ready
  - Set replicas to 3 minimum (5+ for production)
  - Optional: Enable authentication with the Salad-Api-Key header
- Wait for deployment:
  - Image caches to SaladCloud’s internal registry
  - Nodes download and start your container
  - Warmup workflow runs (if configured)
  - Instances become “ready”
- Submit requests to your Access Domain Name.
Job Queue Deployment
For video generation or long-running tasks (anything that might exceed the 100-second gateway timeout), use the Job Queue system.
How the Job Queue Works
Flow:
- Submit jobs to the queue - Your client sends requests to the SaladCloud Job Queue API (not directly to your container)
- Queue buffers and distributes - Jobs are held in the queue and distributed to available container instances
- Worker fetches and processes - The job queue worker binary running in your container fetches jobs and forwards them to comfyui-api on localhost
- Results returned to queue - Outputs are sent back to the queue, where your client can fetch them or receive them via webhook
- Automatic retries - If a node is interrupted (common on SaladCloud), the job is automatically retried on another node (up to 3 times)
When to Use Job Queues
| Use Case | Gateway | Job Queue |
|---|---|---|
| Image generation (<30s) | Recommended | Works |
| Complex image workflows (30-90s) | Usually OK | Safer |
| Video generation (1-30 min) | Won’t work | Required |
| Batch processing | Not ideal | Recommended |
| Production with SLA requirements | Risky | Recommended |
Setup Steps
- Add the job queue worker to your Dockerfile:
- Create a Job Queue via the API:
- Create a Container Group connected to the queue (do not enable Container Gateway)
- Submit jobs to the queue:
- Receive results via webhook or poll the job status endpoint
API Reference
POST /prompt
Submit a ComfyUI workflow for execution. The request body's prompt field contains your ComfyUI workflow JSON; see Output Handling for storage options. The response contains the generated outputs (base64-encoded by default, or URLs when a storage backend is configured).
GET /health
Returns 200 when the server has started. Use for startup probes.
GET /ready
Returns 200 when ready to serve traffic (warmup complete, not overloaded). Returns 503 otherwise. Use for readiness probes.
POST /download
Download a model to the local cache.
GET /models
List available models by type (checkpoints, loras, etc.).
POST /interrupt
Cancel a running prompt by ID.
Video Generation
ComfyUI API supports video generation models including:
- LTX Video - Fast, Apache 2.0 licensed
- Hunyuan Video - High quality, open source
- CogVideoX - Text and image to video
- Mochi Video - Research model
- Cosmos 1.0 - NVIDIA’s video model
Next Steps
- ComfyUI Recipe - Deploy pre-configured images from the Portal
- Video Generation Guide - Complete video generation tutorial
- Job Queue Documentation - Async processing setup
- ComfyUI API GitHub - Full documentation and examples
- Salad Recipes Repository - Example Dockerfiles for popular models