Segment Anything Model 3 (SAM3) is Meta's third-generation foundation model for image segmentation. With 848 million
parameters, SAM3 introduces Promptable Concept Segmentation (PCS), a capability that lets you segment objects using
natural language descriptions, not just coordinate prompts. Inference is powered by Ultralytics SAM3, which provides a
simple API for zero-shot image segmentation.
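Since inference goes through Ultralytics, a typical call is short. The sketch below is illustrative only: the `sam3.pt` weight name, the text-prompt keyword, and the opt-in environment flag are assumptions, so check the Ultralytics SAM documentation for the exact SAM3 interface.

```python
# Hedged sketch of zero-shot concept segmentation via the Ultralytics API.
# Assumptions: the "sam3.pt" weight name and the text-prompt argument are
# placeholders; consult the Ultralytics SAM docs for the real interface.
import os

try:
    from ultralytics import SAM
except ImportError:  # ultralytics may not be installed in this environment
    SAM = None

def segment_by_concept(image_path: str, concept: str):
    """Mask every instance in the image matching a text concept (PCS)."""
    if SAM is None:
        raise RuntimeError("ultralytics is not installed")
    model = SAM("sam3.pt")  # downloads weights on first use
    return model(image_path, prompt=concept)

# Only run the heavy call when explicitly requested (weights are large).
if SAM is not None and os.environ.get("RUN_SAM3_DEMO"):
    results = segment_by_concept("kitchen.jpg", "coffee mug")
```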
When deploying this recipe, you can optionally enable authentication in the container gateway. If you enable
authentication, all requests to your API must include your SaladCloud API key in the Salad-Api-Key header. See the
documentation for more information about authentication.
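An authenticated request then looks like the following. The endpoint URL and JSON body shape are placeholders for your own deployment; only the Salad-Api-Key header is the documented requirement.

```python
# Build an authenticated request to the deployed endpoint. The URL and
# payload fields are placeholder assumptions -- substitute your container
# gateway's access domain and your API's actual request schema.
import json
import urllib.request

API_KEY = "your-saladcloud-api-key"          # supplied by you
URL = "https://example.salad.cloud/segment"  # placeholder access domain

payload = json.dumps({
    "image_url": "https://example.com/cat.jpg",
    "prompt": "cat",
}).encode()

req = urllib.request.Request(
    URL,
    data=payload,
    headers={
        "Salad-Api-Key": API_KEY,            # required when auth is enabled
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it; omitted here.
```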
The recipe is configured for 3 replicas by default; we recommend at least 3 for testing and at least 5 for production
workloads. SaladCloud's distributed GPU cloud is powered by idle gaming PCs around the world, so every node must be
treated as interruptible without warning. You may therefore want to slightly over-provision the capacity you expect to
need in order to have adequate coverage during node reallocations. Don't worry: we only charge for instances that are
actually running.
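As a back-of-envelope way to size that over-provisioning: if you need a target number of replicas serving at all times and expect some fraction of nodes to be down or reallocating at any moment, divide by the expected availability and round up. The 10% interruption figure below is an illustrative assumption, not a SaladCloud statistic.

```python
# Rough over-provisioning estimate: provision ceil(target / (1 - interrupted))
# replicas so that the expected number of live instances stays at `target`.
# The default 10% interruption rate is an illustrative assumption.
import math

def replicas_to_provision(target: int, interrupted: float = 0.10) -> int:
    return math.ceil(target / (1.0 - interrupted))

print(replicas_to_provision(5))  # provision 6 to keep roughly 5 serving
```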
SaladCloud offers a simple built-in way to view logs from the portal to facilitate testing and development. For
production workloads, we highly recommend connecting an external logging source, such as Axiom; this can be done during
container group creation.
When you deploy the recipe, SaladCloud will find the desired number of qualified nodes and begin downloading the
container image and SAM3 model weights. Once at least one instance is running and passing health checks, the container
group is considered running. For production, wait until an adequate number of nodes are ready before moving traffic
over.
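That wait-until-ready step can be automated with a simple polling loop. SaladCloud exposes instance state through its portal and API, but the fetcher below is injected rather than hard-coded, since the exact endpoint and response schema are not shown here and would be assumptions.

```python
# Readiness-polling sketch: block until at least `minimum` instances report
# running. `count_running` is injected so this shows only the polling logic,
# without assuming a specific SaladCloud endpoint or response schema.
import time
from typing import Callable

def wait_for_ready(count_running: Callable[[], int],
                   minimum: int,
                   poll_seconds: float = 30.0,
                   timeout: float = 3600.0) -> bool:
    """Return True once enough instances are running, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if count_running() >= minimum:
            return True
        time.sleep(poll_seconds)
    return False

# Demo with a fake fetcher that "becomes ready" after a few polls:
states = iter([0, 1, 3])
assert wait_for_ready(lambda: next(states), minimum=3, poll_seconds=0.0)
```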
SAM3 requires approximately 4-8 GB of VRAM for inference. The default configuration uses RTX 4060 Ti GPUs with 16 GB of
VRAM, which provides ample headroom for the model. For high-throughput workloads, you may want to consider:

- Using GPUs with more VRAM for batch processing
- Increasing replica count for parallel request handling