Overview
ComfyUI is a powerful, modular inference engine for diffusion models. ComfyUI API wraps ComfyUI to provide:
- Stateless REST API - Horizontally scalable, returns images/videos in responses or via webhooks
- Multiple storage backends - S3, Azure Blob, Hugging Face, or HTTP endpoints
- Dynamic model loading - Download and cache models on-demand via URLs
- Custom workflow endpoints - Create simplified APIs with TypeScript/JavaScript
- Production features - Health probes, warmup workflows, execution statistics
Choosing Your Deployment Pattern
Container Gateway (Recommended for most image generation)
- Simple setup with synchronous HTTP responses
- Images returned directly in the response body
- 100-second timeout limit (Cloudflare hard limit)
- Best for: SD 1.5, SDXL, Flux image generation (typically 2-30 seconds)
Job Queue (Required for video and long-running tasks)
- Asynchronous processing with webhooks or polling
- No timeout limit - handles multi-minute generation
- Automatic retries on node interruption
- Built-in request buffering and autoscaling
- Best for: Video generation, batch processing, complex multi-step workflows
Hardware Recommendations
| Model Type | Min VRAM | Recommended GPU | System RAM | Typical Time | Deployment |
|---|---|---|---|---|---|
| SD 1.5 | 8GB | RTX 3060 Ti | 12GB | 2-5s | Gateway |
| SDXL | 12GB | RTX 4070 Ti | 24GB | 5-15s | Gateway |
| SDXL + Refiner | 24GB | RTX 4090 | 30GB | 15-30s | Gateway |
| Flux (fp8) | 16GB | RTX 4090 | 24GB | 5-20s | Gateway |
| SD 3.5 | 16GB | RTX 4090 | 24GB | 5-15s | Gateway |
| Video (short) | 24GB | RTX 4090 | 30GB | 1-5 min | Job Queue |
| Video (long) | 24GB+ | RTX 4090/5090 | 32GB+ | 5-30 min | Job Queue |
Base Docker Images
Pre-built images are available from the GitHub Container Registry, tagged as:
comfy<version>-api<version>-torch<version>-cuda<version>-<runtime|devel>
| Tag | Use Case |
|---|---|
| comfy0.7.0-api1.16.1-torch2.8.0-cuda12.8-runtime | Most image generation workloads |
| comfy0.7.0-api1.16.1-torch2.8.0-cuda12.8-devel | Video generation (requires full CUDA toolkit) |
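A Dockerfile can then start from one of these tags. The registry path below is inferred from the project's GitHub organization and should be verified against the comfyui-api releases page:

```dockerfile
# Base image for image generation workloads (registry path assumed; verify it)
FROM ghcr.io/saladtechnologies/comfyui-api:comfy0.7.0-api1.16.1-torch2.8.0-cuda12.8-runtime
```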
Key paths inside these images:
- Model directory: /opt/ComfyUI/models
- Custom nodes: /opt/ComfyUI/custom_nodes
- Workflows directory: /workflows (for custom endpoints)
Building Your Image
We recommend using a manifest file to download models at container startup rather than baking them into your Docker image. Model registries (like Hugging Face and Civitai) are often faster than our container registry plus decompression time, and this approach keeps your images small and flexible.
Recommended: Manifest-Based Dockerfile
Create a manifest.yaml file that specifies your models and custom nodes. The manifest supports the following sections:
- apt packages - System dependencies
- pip packages - Python dependencies
- custom_nodes - From Comfy Registry (by name) or Git URLs
- models.before_start - Downloaded before accepting requests
- models.after_start - Downloaded in the background after startup
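A minimal manifest might look like the following sketch. The section names come from the list above, but the exact nesting (model types as keys under before_start/after_start) is an assumption to verify against the comfyui-api documentation; the Git URL is a hypothetical placeholder:

```yaml
apt:
  - ffmpeg
pip:
  - opencv-python-headless
custom_nodes:
  - comfyui-videohelpersuite                    # by Comfy Registry name
  - https://github.com/example/example-node.git # or by Git URL (placeholder)
models:
  before_start:
    checkpoints:
      - https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors
  after_start:
    loras:
      - https://civitai.com/api/download/models/12345
```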
Manifest URL Sources
Models can be downloaded from multiple sources:
| Source | URL Format | Notes |
|---|---|---|
| HTTP/HTTPS | https://example.com/model.safetensors | Direct downloads |
| Hugging Face | https://huggingface.co/org/repo/resolve/main/model.safetensors | Supports gated models with HF_TOKEN |
| Civitai | https://civitai.com/api/download/models/12345 | Use API download URLs |
| S3 | s3://bucket/path/model.safetensors | Requires AWS credentials |
| Azure | https://account.blob.core.windows.net/container/model.safetensors | Requires Azure credentials |
Set credentials globally via environment variables (HF_TOKEN, AWS_ACCESS_KEY_ID, etc.) or use per-request credentials.
Alternative: Baking Models Into Images
If you prefer to include models in your Docker image (useful for air-gapped environments or guaranteed availability), download them in your Dockerfile at build time.
Dynamic Model Loading
In addition to the manifest, you can load models dynamically at runtime. This is useful for:
- User-specified models per request
- A/B testing different model versions
- Multi-tenant deployments
Model URLs in Workflows
Specify a URL instead of a filename in your workflow, and the model will be automatically downloaded and cached. Supported URL schemes:
- https:// - Direct HTTP downloads (including Hugging Face and Azure Blob Storage)
- s3://bucket/path - S3-compatible storage
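For example, a checkpoint loader node can point at a URL rather than a local filename; the API downloads and caches the file, then substitutes the local filename before execution. The node structure below is a sketch (class_type and the specific model URL are illustrative):

```typescript
// Fragment of a ComfyUI prompt: ckpt_name is a URL instead of a filename.
const loaderNode = {
  class_type: "CheckpointLoaderSimple",
  inputs: {
    ckpt_name:
      "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors",
  },
};

// The cached file keeps the URL basename by default:
console.log(loaderNode.inputs.ckpt_name.split("/").pop()); // sd_xl_base_1.0.safetensors
```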
The /download Endpoint
Pre-download models before they're needed. Parameters:
- url - The URL to download from
- model_type - Where to save the model (e.g., checkpoints, loras, vae, controlnet)
- filename - Optional filename override (defaults to URL basename)
- wait - Set to true to wait for download to complete (default: false)
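Assembled as a JSON body, a /download request using these parameters might look like this (the filename value is a hypothetical example):

```typescript
// Pre-download a LoRA and block until it is cached, so later prompts can use it.
const downloadRequest = {
  url: "https://civitai.com/api/download/models/12345",
  model_type: "loras",
  filename: "my-style.safetensors", // optional; defaults to the URL basename
  wait: true, // wait for the download to complete before responding
};

// POST this to http://<host>:3000/download with Content-Type: application/json
const body = JSON.stringify(downloadRequest);
console.log(body);
```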
Credentials for Protected Sources
Credentials for model downloads and storage backends can be configured globally via environment variables (HF_TOKEN, AWS_ACCESS_KEY_ID, and so on), and optionally overridden per-request. Per-request credentials support the following types: bearer, basic, header, query, and s3.
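For example, global credentials can be passed to the container as environment variables (values below are placeholders; HF_TOKEN and AWS_ACCESS_KEY_ID are named in this guide, and AWS_SECRET_ACCESS_KEY follows the standard AWS SDK convention):

```
HF_TOKEN=hf_xxxxxxxxxxxx
AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXX
AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxx
```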
LRU Caching
Limit local storage usage with automatic cache eviction.
Warmup Configuration
The first request to a cold instance is slow because models must be loaded into VRAM. Configure a warmup workflow to pre-load models before accepting traffic:
- Create a warmup workflow JSON (use reduced steps/resolution for speed)
- Add it to your Dockerfile
The /ready probe will return 503 until warmup completes, ensuring the load balancer only routes to warm instances.
Tip: Use 1-4 steps and small dimensions (256x256) for fast warmups while still loading the model.
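Wiring this into an image might look like the following sketch; the WARMUP_PROMPT_FILE variable name is an assumption to verify against the comfyui-api README for your version:

```dockerfile
# Copy a small warmup workflow (1-4 steps, 256x256) into the image
COPY warmup.json /warmup.json
# Hypothetical variable name: point the API at the warmup workflow
ENV WARMUP_PROMPT_FILE=/warmup.json
```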
Output Handling
Base64 Response (Default)
Images are returned as base64-encoded strings in the response.
Image Format Conversion
Convert output to JPEG or WebP with quality control.
Storage Backends
Upload outputs directly to cloud storage and receive URLs in the response instead of base64 data. This is especially important for video generation where files are large. Credentials can be configured globally via environment variables (see Credentials for Protected Sources) or provided per-request.
S3-Compatible Storage (AWS, Cloudflare R2, etc.):
- async: false (default) - Waits for upload to complete, returns URLs in response
- async: true - Returns immediately, uploads in background (use with webhooks)
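As a sketch, a /prompt request with S3 output might look like the following. The root-level s3 field and its async option are described in this guide; the bucket/prefix key names inside it are assumptions to check against the comfyui-api docs:

```typescript
const request = {
  prompt: {
    // ...your ComfyUI workflow JSON goes here
  },
  s3: {
    bucket: "my-outputs", // hypothetical bucket name
    prefix: "renders/",   // hypothetical key prefix
    async: false,         // wait for the upload; response contains URLs
  },
};
console.log(Object.keys(request).sort().join(",")); // prompt,s3
```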
Webhook Delivery
Send results to a webhook URL for async processing.
Custom Workflow Endpoints
Create simplified REST endpoints by adding TypeScript/JavaScript files to the /workflows directory. This transforms
complex ComfyUI node graphs into clean, validated REST APIs with automatic Swagger documentation.
Why use custom endpoints?
- Hide ComfyUI’s node-based complexity from API consumers
- Add input validation with helpful error messages
- Set sensible defaults for optional parameters
- Create task-specific APIs (txt2img, img2img, upscale, etc.)
Basic Structure
Each workflow file must export a Workflow object with:
- RequestSchema - A Zod schema defining the input parameters
- generateWorkflow - A function that transforms input into a ComfyUI prompt
- summary / description - Optional metadata for Swagger docs
/workflows/txt2img.ts:
Directory Structure = URL Paths
The directory structure maps directly to endpoint URLs.
The Config Object
Import config to access runtime information.
Adding to Your Dockerfile
Workflow files are loaded at runtime via eval, so no rebuild is needed when using the pre-built binary.
Calling Custom Endpoints
Custom endpoints wrap your input parameters in an input field, and support all the same output options as /prompt. The input object contains your workflow-specific parameters (defined by RequestSchema), while output options like s3, webhook, and convert_output remain at the root level.
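For instance, a call to a hypothetical /workflows/txt2img endpoint might send the following body (the fields inside input are defined by that workflow's RequestSchema; the webhook URL is a placeholder):

```typescript
const body = {
  input: {
    // Workflow-specific parameters, validated by the endpoint's RequestSchema:
    prompt: "a watercolor painting of a fox",
    width: 1024,
    height: 1024,
  },
  // Output options stay at the root level, exactly as with /prompt:
  webhook: "https://example.com/hooks/comfy-done",
};
console.log("input" in body && "webhook" in body); // true
```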
AI-Assisted Workflow Generation
The comfyui-api repository includes a script to automatically generate workflow endpoints from ComfyUI JSON exports using Claude.
Debugging
Enable debug logging to troubleshoot workflow loading.
Local Testing
- Build your Docker image
- Run locally with GPU access
- Access the interfaces:
  - API: http://localhost:3000
  - Swagger docs: http://localhost:3000/docs
  - ComfyUI Web UI: http://localhost:8188
- Design your workflow in the Web UI, then export it:
  - Open Settings (gear icon)
  - Enable “Dev mode Options”
  - Click “Save (API Format)” to download the workflow JSON
- Test the API
Deploying to SaladCloud
Container Gateway Deployment
- Push your image to a container registry
- Create a Container Group via the Portal or API:
  - Set appropriate hardware (see Hardware Recommendations)
  - Enable Container Gateway on port 3000
  - Use least-connection load balancing (recommended for inference workloads)
  - Configure health probes:
    - Startup probe: GET /health
    - Readiness probe: GET /ready
  - Set replicas to 3 minimum (5+ for production)
  - Optional: Enable authentication with the Salad-Api-Key header
- Wait for deployment:
  - Image caches to SaladCloud’s internal registry
  - Nodes download and start your container
  - Warmup workflow runs (if configured)
  - Instances become “ready”
- Submit requests to your Access Domain Name.
Job Queue Deployment
For video generation or long-running tasks (anything that might exceed the 100-second gateway timeout), use the Job Queue system.
How the Job Queue Works
Flow:
- Submit jobs to the queue - Your client sends requests to the SaladCloud Job Queue API (not directly to your container)
- Queue buffers and distributes - Jobs are held in the queue and distributed to available container instances
- Worker fetches and processes - The job queue worker binary running in your container fetches jobs and forwards them to comfyui-api on localhost
- Results returned to queue - Outputs are sent back to the queue, where your client can fetch them or receive them via webhook
- Automatic retries - If a node is interrupted (common on SaladCloud), the job is automatically retried on another node (up to 3 times)
When to Use Job Queues
| Use Case | Gateway | Job Queue |
|---|---|---|
| Image generation (<30s) | Recommended | Works |
| Complex image workflows (30-90s) | Usually OK | Safer |
| Video generation (1-30 min) | Won’t work | Required |
| Batch processing | Not ideal | Recommended |
| Production with SLA requirements | Risky | Recommended |
Setup Steps
- Add the job queue worker to your Dockerfile:
- Create a Job Queue via the API:
- Create a Container Group connected to the queue (do not enable Container Gateway)
- Submit jobs to the queue:
- Receive results via webhook or poll the job status endpoint
API Reference
POST /prompt
Submit a ComfyUI workflow for execution. The request body's prompt field contains your ComfyUI workflow JSON; see Output Handling for storage options. The response contains the generated outputs (base64-encoded by default, or URLs when a storage backend is configured).
GET /health
Returns 200 when the server has started. Use for startup probes.
GET /ready
Returns 200 when ready to serve traffic (warmup complete, not overloaded). Returns 503 otherwise. Use for readiness probes.
POST /download
Download a model to the local cache.
GET /models
List available models by type (checkpoints, loras, etc.).
POST /interrupt
Cancel a running prompt by ID.
Video Generation
ComfyUI API supports video generation models including:
- LTX Video - Fast, Apache 2.0 licensed
- Hunyuan Video - High quality, open source
- CogVideoX - Text and image to video
- Mochi Video - Research model
- Cosmos 1.0 - NVIDIA’s video model
Next Steps
- ComfyUI Recipe - Deploy pre-configured images from the Portal
- Video Generation Guide - Complete video generation tutorial
- Job Queue Documentation - Async processing setup
- ComfyUI API GitHub - Full documentation and examples
- Salad Recipes Repository - Example Dockerfiles for popular models