Overview
This guide covers migrating workloads from Lambda Cloud On-Demand (Linux GPU virtual machines from lambda.ai) to SaladCloud (containerized workloads running on Salad’s distributed GPU network).
If you currently work on Lambda Cloud by launching an on-demand instance, SSH-ing in, installing dependencies, and
running notebooks or scripts, the SaladCloud equivalent is to build a container image that encodes that setup and
deploy it as a container group. Your code stays the same; only the runtime environment changes.
What Stays Exactly the Same
- Your application code, models, and algorithms remain unchanged
- Same Python libraries, PyTorch/TensorFlow frameworks, and CUDA operations
- Identical API patterns, data processing workflows, and model inference logic
- Same Docker containers if you’re already containerized
Key Differences (VMs vs. Containers)
Lambda Cloud (On-Demand):
- You get a GPU VM with SSH access and open ports via firewall rules
- Default image is a pre-configured Ubuntu environment with ML tooling (Lambda Stack)
- Optional persistent Filesystems mount at /lambda/nfs/<FILESYSTEM_NAME>
- Instances start in minutes and are billed per hour

SaladCloud:
- You deploy a container image to a container group
- You can access replicas via web terminal or SSH (coming soon)
- Local disk is ephemeral; plan to persist state to external storage
- Containers run on distributed consumer GPUs worldwide
- Savings of up to 90% compared to Lambda Cloud pricing (workload dependent)
- Longer startup times but with significant cost advantages
- Consumer GPUs (RTX 5090, 4090, 3090 and more) instead of datacenter GPUs (H100, A100)
- Nodes are interruptible; design for resilience, similar to spot instances
Instead of SSH-ing into a fresh VM and running the same apt-get and pip install commands each time, you write those commands once in a Dockerfile. Docker builds an immutable image with everything pre-installed, and that same image runs consistently across all instances.
New to containerization? Check out our comprehensive getting started guide
for a step-by-step introduction to deploying on SaladCloud, or explore our
architectural overview to understand how
SaladCloud’s distributed GPU network works.
Product Comparison: Lambda vs. SaladCloud
Lambda Cloud offers three main products. Below we map each to its closest SaladCloud equivalent and assess migration fit.

| Lambda Product | SaladCloud Equivalent | Fit |
|---|---|---|
| On-Demand Cloud: GPU VM instances with SSH, managed images, and optional add-ons | Salad Container Engine (SCE): Managed container service with Container Gateway for real-time inference or Job Queues for batch processing | ✅ Excellent for inference, batch, and rendering workloads. Interactive dev is different but supported via terminal/SSH. |
| 1-Click Clusters: Multi-node clusters for distributed training (Kubernetes or Slurm) | No equivalent | ❌ Not a perfect fit. SaladCloud does not support multi-node training with InfiniBand/NVLink interconnect patterns, but does support long-running jobs which can be adapted for training. |
| Private Cloud: Single-tenant clusters and low-level infrastructure control | No equivalent | ❌ Not a fit. SaladCloud is a distributed public cloud. |
Workload Mapping
| Workload Type | Best Fit on SaladCloud | Notes |
|---|---|---|
| Real-Time Inference (HTTP APIs, streaming tokens) | Container groups + Container Gateway | Ensure your app is IPv6-ready: Enabling IPv6 |
| Batch Processing (embeddings, transcription, rendering) | Job Queues or Kelpie + autoscaling | Prefer queues when cold starts and retries are acceptable |
| Interactive Development (SSH, JupyterLab, notebooks) | Container groups + terminal/SSH + Tailscale (optional) | Use VS Code Remote Dev or Tailscale basic |
| Single-Node Fine-Tuning (LoRA, QLoRA) | SCE + checkpointing (often with Kelpie patterns) | See Kelpie and Long-running tasks |
| Distributed Training (multi-node) | Keep on Lambda (1-Click Clusters) | SaladCloud does not support multi-node interconnects at the moment |
Hybrid Strategy: Many teams keep multi-node distributed training on Lambda Cloud, then deploy the trained model to
SaladCloud for cost-effective production inference.
Cost Comparison
SaladCloud pricing shown below is for the Batch priority tier (lowest cost). Higher priority tiers cost more but provide greater availability guarantees.

| Use Case | Lambda GPU | Lambda $/hr | SaladCloud GPU | SaladCloud $/hr | Savings |
|---|---|---|---|---|---|
| LLM Inference (7–13B) | A10 (24 GB) | ~$0.86 | RTX 4090 (24 GB) | ~$0.16 | ~81% |
| Image Generation (SD/FLUX) | A10 (24 GB) | ~$0.86 | RTX 3090 (24 GB) | ~$0.09 | ~90% |
| Transcription (Whisper) | Quadro RTX 6000 (24 GB) | ~$0.58 | RTX 3090 (24 GB) | ~$0.09 | ~84% |
| Batch Embeddings | A100 PCIe (40 GB) | ~$1.48 | RTX 4090 (24 GB) | ~$0.16 | ~89% |
| LLM Inference (34B–70B quant.) | A6000 (48 GB) | ~$0.92 | RTX 5090 (32 GB) | ~$0.25 | ~73% |
Example (100 GPU-hours per day):
- Lambda (A10): 100 hrs/day × $0.86/hr × 30 days ≈ $2,580/month
- SaladCloud (RTX 4090, batch): 100 hrs/day × $0.16/hr × 30 days ≈ $480/month
Prices are approximate and subject to change. Check salad.com/pricing and
lambda.ai/pricing for current rates.
Step-by-Step Migration
1. Inventory What Runs on Lambda Cloud
For each workload, capture:
- Work type: notebook/dev, training, batch processing, or real-time inference
- Data size and where it lives today (Lambda Filesystem, local disk, S3, etc.)
- Startup cost: installs, model downloads, compilation steps
- Whether the workload must survive interruptions (checkpointing requirements)
2. Turn Your VM Setup Into a Container Image
In Lambda Cloud you often “bootstrap” a VM by running apt-get and pip install, cloning a repo, and launching your app.
On SaladCloud, put those steps into a Dockerfile so every replica starts the same way.
Practical tips:
- Prefer reproducible builds (pin dependencies where reasonable)
- Keep images as small as possible to reduce startup time
- Bake models into the image or download them on startup
3. Handle Data and Persistence (Replacing Lambda Filesystems)
Lambda Cloud Filesystems are NFS-mounted for sharing datasets and checkpoints across instances. SaladCloud does not support mounted network filesystems. Recommended approach:
- Store datasets, checkpoints, and outputs in cloud object storage (S3-compatible is common)
- Download inputs at startup (or per job), and upload outputs/checkpoints periodically if needed
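As a sketch, the download/upload pattern above might look like this with boto3; the bucket name, env-var names, and checkpoint key layout are assumptions, not SaladCloud requirements:

```python
import os


def checkpoint_key(job_id: str, step: int) -> str:
    # Deterministic object key so a restarted replica can find its checkpoints.
    return f"checkpoints/{job_id}/step-{step:08d}.pt"


def upload_checkpoint(local_path: str, job_id: str, step: int) -> str:
    # boto3 is imported lazily so the pure helpers above have no dependencies.
    import boto3

    s3 = boto3.client(
        "s3",
        # endpoint_url lets the same code target R2, B2, or any S3-compatible store
        endpoint_url=os.environ.get("S3_ENDPOINT_URL"),
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    )
    key = checkpoint_key(job_id, step)
    s3.upload_file(local_path, os.environ.get("S3_BUCKET", "my-bucket"), key)
    return key
```

On startup, the same client can `download_file` the latest checkpoint before resuming work.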
4. Deploy on SaladCloud
Deploy your image as a container group and configure:
- GPU/CPU/RAM resources appropriate for your workload
- Environment variables and secrets
- Health probes for services (startup, readiness, liveness)
5. Scale and Operate
Common patterns:
- Interactive: 1 replica, high priority, SSH/terminal access. Recommended for dev and debugging, not production workloads, since nodes are interruptible.
- Inference service: N replicas, gateway load balancing, readiness checks
- Batch: Job Queues + autoscaling based on queue depth / Kelpie for long-running jobs
6. Cut Over Safely
- Deploy on SaladCloud and run in parallel
- Replay a small sample of production events (shadow traffic) and compare outputs
- Gradually shift traffic/jobs (canary) and monitor performance
Real-Time Inference: End-to-End Walkthrough
Here is a concrete walkthrough for migrating a FastAPI inference service from Lambda Cloud to SaladCloud.

Capture your Lambda environment
On your Lambda instance, document everything you need to reproduce the setup:
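For example, a few commands like these (run on the Lambda instance) capture most of what a Dockerfile needs to reproduce; the exact packages will vary for your setup:

```shell
# Record the interpreter version the code currently runs under
python3 --version > environment.txt

# Freeze the exact Python packages installed on the VM
python3 -m pip freeze > requirements.txt

# Capture GPU and driver details (falls back cleanly on non-GPU machines)
nvidia-smi >> environment.txt || echo "no GPU visible" >> environment.txt

# Note CUDA-related system packages installed via apt
dpkg -l | grep -i cuda >> environment.txt || true
```

Keep `requirements.txt` next to your Dockerfile; it becomes the input to `pip install` during the image build.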
Create a Dockerfile
Convert your manual Lambda setup into a reproducible Dockerfile. Here’s a minimal example for a FastAPI server:
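A sketch of such a Dockerfile; the base image tag, app module (`main:app`), and port are assumptions to adapt to your project:

```dockerfile
# CUDA runtime base roughly matching the CUDA version from Lambda Stack
FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04

WORKDIR /app

# System packages you previously installed with apt-get on the VM
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip git \
    && rm -rf /var/lib/apt/lists/*

# Python packages captured from the Lambda instance (pip freeze)
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

# Application code
COPY . .

# Bind to :: so the server is reachable over IPv6 via Container Gateway
EXPOSE 8000
CMD ["python3", "-m", "uvicorn", "main:app", "--host", "::", "--port", "8000"]
```

Binding uvicorn to `::` matters because Container Gateway reaches replicas over IPv6 (see Enabling IPv6 above).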
Build and push your image
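For example (the registry name is a placeholder; the local smoke test requires an NVIDIA GPU and the NVIDIA Container Toolkit):

```shell
# Build the image for x86_64, which SaladCloud nodes run
docker build -t your-registry/your-inference:latest .

# Optional local smoke test before deploying
docker run --rm --gpus all -p 8000:8000 your-registry/your-inference:latest

# Push so SaladCloud can pull the image
docker push your-registry/your-inference:latest
```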
Deploy on SaladCloud
Deploy via the SaladCloud Portal or the
API:
- Image Source: your-registry/your-inference:latest
- GPU Class: Select based on your model’s VRAM needs (e.g., RTX 4090 for 24 GB, RTX 5090 for 32 GB)
- Replicas: Start with 3+ for production reliability
- Networking: Enable Container Gateway for an HTTPS endpoint
- Health Probes: Configure health probes for your server
- Environment Variables: Pass any config (model paths, API keys, etc.)
Batch Processing: Lambda to SaladCloud Job Queues
If you’re running batch processing on Lambda, SaladCloud’s Job Queues provide automatic job distribution and retries.

Before (Lambda)
- Spin up one or more GPU instances
- SSH in and install dependencies
- Upload data or connect to storage
- Run processing scripts
- Download results
- Terminate instances (avoid idle billing)
After (SaladCloud)
- Containerize your processing logic as an HTTP API
- Deploy on SaladCloud with Job Queues enabled
- Submit jobs via the Job Queue API
- Jobs are distributed across healthy GPUs and retried on transient failures
- Retrieve results via API polling
- Autoscale replicas based on queue depth
Converting Your Script to an HTTP API
Your batch script needs to be wrapped as an HTTP endpoint that the Job Queue Worker can call.

Autoscaling: Enable autoscaling to automatically scale replicas based on queue depth.
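As an illustration of the script-to-endpoint conversion, here is a minimal wrapper using only the Python standard library; `process_job` is a stand-in for your real batch logic, and the request/response schema is an assumption, not a Job Queue requirement:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def process_job(payload: dict) -> dict:
    # Placeholder for your existing batch logic (transcribe, embed, render, ...).
    text = payload.get("text", "")
    return {"length": len(text), "upper": text.upper()}


class JobHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON job body the Job Queue Worker delivers
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(process_job(payload)).encode()
        # A 2xx response marks the job complete; errors trigger a retry
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep per-request noise out of container logs


def serve(port: int = 8080):
    # Call serve() from your entrypoint to start handling jobs
    HTTPServer(("", port), JobHandler).serve_forever()
```

In practice most teams use FastAPI or Flask instead of the raw stdlib server; the shape is the same: one POST endpoint that takes a job in and returns a result.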
Storage Migration
Lambda offers persistent networked filesystems that mount at /lambda/nfs/. SaladCloud containers are ephemeral, so
you must use external cloud storage for durable data.
Recommended Storage Solutions
| Provider | Advantages | Best For |
|---|---|---|
| Cloudflare R2 | S3-compatible; no egress fees | Many workloads, large outputs |
| AWS S3 | Mature ecosystem; extensive tooling | Teams already on AWS |
| Backblaze B2 | Low cost; S3-compatible | Large datasets, archival |
| Salad Simple Storage (S4) | Simple service; clear limits | Temporary assets (smaller files) |
Migrating Data from Lambda Filesystems
Before terminating your Lambda instances, copy data to cloud storage (e.g., with the aws CLI or rclone).

Accessing Cloud Storage from SaladCloud Containers
Pass storage credentials via environment variables in your container configuration.

Best Practices
Design for Interruptions
SaladCloud nodes are interruptible, similar to spot instances, but unlike AWS spot instances there is no advance warning notification. Your application should:
- Be stateless: pull inputs from cloud storage, push results back
- Checkpoint periodically: save progress to cloud storage for long-running tasks
- Use Job Queues for automatic retry on interruptions
- Run 3+ replicas minimum for production real-time services
Implement Health Probes
Configure health probes so SaladCloud can manage your container lifecycle:
- Startup probe: Runs during initialization; prevents premature traffic
- Readiness probe: Signals when your container is ready to accept work
- Liveness probe: Detects if your container has become unresponsive
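As a sketch, one simple endpoint can back all three probes; this stdlib-only example (the `/health` path and port are assumptions) returns 503 until the model has loaded:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Flip to True once startup work (model load, warmup) finishes
READY = {"model_loaded": False}


def health_status():
    # 200 tells the probe the replica can take traffic; 503 means still starting
    if READY["model_loaded"]:
        return 200, {"status": "ready"}
    return 503, {"status": "starting"}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        code, payload = health_status()
        body = json.dumps(payload).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # suppress probe noise in the logs


def serve(port: int = 8080):
    HTTPServer(("", port), HealthHandler).serve_forever()
```

Point the startup and readiness probes at `/health`; a liveness probe can reuse it, since a hung process stops answering entirely.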
Debugging Running Containers
Use External Logging
SaladCloud provides built-in container logs, but you can also integrate an external logging provider.

Optimize Container Images
- Use multi-stage Docker builds to minimize image size
- Remove unnecessary build artifacts
- Keep image size reasonable - larger images mean slower cold starts across the distributed fleet
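A multi-stage sketch of the idea (base image tags and file names are assumptions): install dependencies against the large devel image, then copy only the installed packages into the smaller runtime image:

```dockerfile
# Stage 1: build tools and compile-time dependencies only
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04 AS build
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
# Install into an isolated prefix we can copy out of this stage
RUN pip3 install --no-cache-dir --prefix=/install -r requirements.txt

# Stage 2: slim runtime image; the devel toolchain is left behind
FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends python3 \
    && rm -rf /var/lib/apt/lists/*
COPY --from=build /install /usr/local
WORKDIR /app
COPY . .
CMD ["python3", "main.py"]
```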
Migration Checklist
Pre-Migration
- Identify Lambda workloads that are candidates (inference, batch, rendering)
- Document current environment: GPU type, libraries, model sizes, storage usage
- Verify your model fits within your target GPU VRAM
- Create a SaladCloud account
- Set up cloud storage if needed (S3-compatible recommended)
Migration
- Write a Dockerfile capturing your Lambda environment (or deploy a recipe)
- Build and test locally if possible with docker run --gpus all
- Push image to a container registry
- Migrate data from Lambda filesystems to cloud storage
- Deploy on SaladCloud: select GPU class, configure replicas, enable gateway or job queues
- Configure health probes
Post-Migration
- Validate output quality against Lambda baseline
- Monitor performance: latency, throughput, error rates
- Tune replica count and priority tier based on usage
- Terminate Lambda instances once SaladCloud deployment is stable
Further Resources
- SaladCloud Quickstart
- Architectural Overview
- Container Gateway Guide
- Job Queue Documentation
- Autoscaling
- High-Performance Storage Solutions
- Migrate from RunPod
- Migrate from Vast.ai
- AWS Batch to SaladCloud Migration Guide