Documentation Index
Fetch the complete documentation index at: https://docs.salad.com/llms.txt
Use this file to discover all available pages before exploring further.
Last updated: May 10, 2025
Overview
GPU performance can vary over time with factors like utilization and temperature. To ensure your application runs
smoothly, it’s important to monitor these metrics and take action if they exceed certain thresholds. This can be
accomplished easily using python, nvidia-smi and the psutil library. We recommend instrumenting this as a background
process, so that it runs independently of your application. This way, you can monitor the performance of your
application without affecting its performance.
Getting GPU Stats
To get GPU stats, you can use the nvidia-smi command which is available by default in all gpu instances. This command
provides a wealth of information about the GPU, including utilization, memory usage, temperature, and more. You can see
the complete list of options by running nvidia-smi --help-query-gpu.
To get this information in python, we can use the subprocess library to run the command and capture its output. Here’s
an example of how to do this:
import subprocess
import csv
def get_cuda_version():
try:
# Run nvidia-smi -q command
nvidia_output = subprocess.run(
["nvidia-smi", "-q"],
capture_output=True,
text=True
).stdout
# Filter for lines containing "CUDA Version"
cuda_lines = [line.strip() for line in nvidia_output.split('\n')
if "CUDA Version" in line]
if cuda_lines:
# Extract version from the first matching line
# Typically in format " CUDA Version : 12.6"
return cuda_lines[0].split(':')[1].strip()
else:
raise ValueError("CUDA version not found in nvidia-smi output")
except Exception as e:
return f"An error occurred: {str(e)}"
def get_gpu_stats(stats_to_query: list = []):
"""
Run nvidia-smi with JSON output format and return the parsed data
All options: `nvidia-smi --help-query-gpu`
"""
# Run nvidia-smi with csv output format
result = subprocess.run(
["nvidia-smi",
f"--query-gpu={','.join(stats_to_query)}", "--format=csv,nounits"],
capture_output=True,
text=True,
check=True
)
cuda_version = get_cuda_version()
# Process the CSV output
csv_data = result.stdout.strip().splitlines()
reader = csv.DictReader(csv_data)
stats = []
for row in reader:
data = {key.strip(): value.strip()
for key, value in row.items()}
data["cuda_version"] = cuda_version
for key, value in data.items():
# Convert numeric values to appropriate types
if value.isdigit():
data[key] = int(value)
else:
try:
data[key] = float(value)
except ValueError:
pass
stats.append(data)
return stats
# Full list of options available with `nvidia-smi --help-query-gpu`
gpu_stats_to_query = [
"timestamp",
"index",
"gpu_name",
"driver_version",
"memory.total",
"memory.free",
"memory.used",
"memory.reserved",
"compute_cap",
"utilization.gpu",
"utilization.memory",
"temperature.gpu",
"temperature.memory",
"power.draw",
"power.limit"
]
gpu_stats = get_gpu_stats(gpu_stats_to_query)
print(gpu_stats)
You will get an output like this:
[
{
"timestamp": "2025/03/31 15:21:49.377",
"index": 0,
"name": "NVIDIA GeForce RTX 3080 Ti Laptop GPU",
"driver_version": 561.19,
"memory.total [MiB]": 16384,
"memory.free [MiB]": 14804,
"memory.used [MiB]": 1372,
"memory.reserved [MiB]": 209,
"compute_cap": 8.6,
"utilization.gpu [%]": 4,
"utilization.memory [%]": 12,
"temperature.gpu": 49,
"temperature.memory": "N/A",
"power.draw [W]": 21.09,
"power.limit [W]": "[N/A]",
"cuda_version": 12.6
}
]
Getting System Stats
Other system information, such as CPU and Memory utilization, can be obtained using the psutil library. This library
is a cross-platform library for retrieving information on running processes and system utilization. It is not included
by default in the base python installation, so you will need to install it separately. You can do this using pip:
Once you have psutil installed, you can use it to get system stats like CPU and Memory utilization. Here’s an example
of how to do this:
import psutil
def get_system_stats():
cpu_percent = psutil.cpu_percent(interval=1, percpu=True)
cpu_freq = psutil.cpu_freq()
# Get memory stats
virtual_mem = psutil.virtual_memory()
swap_mem = psutil.swap_memory()
return {
"cpu_percent": cpu_percent,
"cpu_freq": cpu_freq,
"virtual_memory": {
"total": virtual_mem.total,
"available": virtual_mem.available,
"used": virtual_mem.used,
"free": virtual_mem.free,
"percent": virtual_mem.percent
},
"swap_memory": {
"total": swap_mem.total,
"used": swap_mem.used,
"free": swap_mem.free,
"percent": swap_mem.percent
}
}
system_stats = get_system_stats()
print(system_stats)
You will get an output like this:
{
"cpu_percent": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0],
"cpu_freq": [2918.3989999999994, 0.0, 0.0],
"virtual_memory": {
"total": 25329766400,
"available": 20256694272,
"used": 4660776960,
"free": 19216257024,
"percent": 20.0
},
"swap_memory": {
"total": 7516192768,
"used": 0,
"free": 7516192768,
"percent": 0.0
}
}
Combined example
import psutil
import subprocess
import csv
def get_cuda_version():
try:
# Run nvidia-smi -q command
nvidia_output = subprocess.run(
["nvidia-smi", "-q"],
capture_output=True,
text=True
).stdout
# Filter for lines containing "CUDA Version"
cuda_lines = [line.strip() for line in nvidia_output.split('\n')
if "CUDA Version" in line]
if cuda_lines:
# Extract version from the first matching line
# Typically in format " CUDA Version : 12.6"
return cuda_lines[0].split(':')[1].strip()
else:
raise ValueError("CUDA version not found in nvidia-smi output")
except Exception as e:
return f"An error occurred: {str(e)}"
def get_gpu_stats(stats_to_query: list = []):
"""
Run nvidia-smi with JSON output format and return the parsed data
All options: `nvidia-smi --help-query-gpu`
"""
# Run nvidia-smi with csv output format
result = subprocess.run(
["nvidia-smi",
f"--query-gpu={','.join(stats_to_query)}", "--format=csv,nounits"],
capture_output=True,
text=True,
check=True
)
cuda_version = get_cuda_version()
# Process the CSV output
csv_data = result.stdout.strip().splitlines()
reader = csv.DictReader(csv_data)
stats = []
for row in reader:
data = {key.strip(): value.strip()
for key, value in row.items()}
data["cuda_version"] = cuda_version
for key, value in data.items():
# Convert numeric values to appropriate types
if value.isdigit():
data[key] = int(value)
else:
try:
data[key] = float(value)
except ValueError:
pass
stats.append(data)
return stats
def get_system_stats():
cpu_percent = psutil.cpu_percent(interval=1, percpu=True)
cpu_freq = psutil.cpu_freq()
# Get memory stats
virtual_mem = psutil.virtual_memory()
swap_mem = psutil.swap_memory()
return {
"cpu_percent": cpu_percent,
"cpu_freq": cpu_freq,
"virtual_memory": {
"total": virtual_mem.total,
"available": virtual_mem.available,
"used": virtual_mem.used,
"free": virtual_mem.free,
"percent": virtual_mem.percent
},
"swap_memory": {
"total": swap_mem.total,
"used": swap_mem.used,
"free": swap_mem.free,
"percent": swap_mem.percent
}
}
def get_all_stats():
gpu_stats = get_gpu_stats([
"timestamp",
"index",
"gpu_name",
"driver_version",
"memory.total",
"memory.free",
"memory.used",
"memory.reserved",
"compute_cap",
"utilization.gpu",
"utilization.memory",
"temperature.gpu",
"temperature.memory",
"power.draw",
"power.limit"
])
system_stats = get_system_stats()
system_stats["gpu"] = gpu_stats
return system_stats
if __name__ == "__main__":
stats = get_all_stats()
print(stats)
You will get an output like this:
{
"cpu_percent": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0],
"cpu_freq": [2918.3989999999994, 0.0, 0.0],
"virtual_memory": {
"total": 25329766400,
"available": 20256694272,
"used": 4660776960,
"free": 19216257024,
"percent": 20.0
},
"swap_memory": {
"total": 7516192768,
"used": 0,
"free": 7516192768,
"percent": 0.0
},
"gpu": [
{
"timestamp": "2025/03/31 15:21:49.377",
"index": 0,
"name": "NVIDIA GeForce RTX 3080 Ti Laptop GPU",
"driver_version": 561.19,
"memory.total [MiB]": 16384,
"memory.free [MiB]": 14804,
"memory.used [MiB]": 1372,
"memory.reserved [MiB]": 209,
"compute_cap": 8.6,
"utilization.gpu [%]": 4,
"utilization.memory [%]": 12,
"temperature.gpu": 49,
"temperature.memory": "N/A",
"power.draw [W]": 21.09,
"power.limit [W]": "[N/A]",
"cuda_version": 12.6
}
]
}
Now that we know how to get the stats, we can use this information to reallocate under-performing nodes. This can be
done by checking the stats and if they exceed certain thresholds, we can reallocate the node using the
IMDS Reallocation Endpoint. You must provide a reason to the reallocation endpoint. We use
this data to continuously improve the quality of our network.
Use reallocation for requirements that matter to your workload, and set conservative thresholds. When you reallocate
away from a node, SaladCloud temporarily excludes that node from your allocation pool for 48 hours. Aggressive
reallocation logic can cycle through available nodes and leave replicas unable to allocate until capacity changes or
exclusions expire. For network-specific startup checks, see
Checking Network Bandwidth Before Startup.
Here’s an example of how to do this:
import requests
import os
max_gpu_temp = int(os.getenv("MAX_GPU_TEMP", "80"))
def reallocate_me(reason: str):
url = 'http://169.254.169.254/v1/reallocate'
headers = {
'Content-Type': 'application/json',
"Metadata": "true"
}
data = {
"reason": reason
}
response = requests.post(url, headers=headers, json=data)
if response.status_code == 200:
print("Reallocation successfully requested.")
else:
print(f"Reallocation failed: {response.status_code} - {response.text}")
# Example usage
stats = get_all_stats()
if stats["gpu"][0]["temperature.gpu"] > max_gpu_temp:
reallocate_me(f"GPU temperature exceeded threshold: {stats["gpu"][0]["temperature.gpu"]} / {max_gpu_temp}C")
Putting it All Together
You can put all of this together in a single script that runs continuously and checks the stats at regular intervals,
taking any required actions. We recommend taking all configuration values from environment variables, so that you can
adjust these values easily without rebuilding your container.
monitor.py
import subprocess
import csv
import psutil
import requests
import os
import time
max_gpu_temp = int(os.getenv("MAX_GPU_TEMP", "80"))
max_vram_usage = float(os.getenv("MAX_VRAM_USAGE", "95"))
check_interval = int(os.getenv("CHECK_INTERVAL", "60")) # in seconds
def get_cuda_version():
try:
# Run nvidia-smi -q command
nvidia_output = subprocess.run(
["nvidia-smi", "-q"],
capture_output=True,
text=True
).stdout
# Filter for lines containing "CUDA Version"
cuda_lines = [line.strip() for line in nvidia_output.split('\n')
if "CUDA Version" in line]
if cuda_lines:
# Extract version from the first matching line
# Typically in format " CUDA Version : 12.6"
return cuda_lines[0].split(':')[1].strip()
else:
raise ValueError("CUDA version not found in nvidia-smi output")
except Exception as e:
return f"An error occurred: {str(e)}"
def get_gpu_stats(stats_to_query: list = []):
"""
Run nvidia-smi with JSON output format and return the parsed data
All options: `nvidia-smi --help-query-gpu`
"""
# Run nvidia-smi with csv output format
result = subprocess.run(
["nvidia-smi",
f"--query-gpu={','.join(stats_to_query)}", "--format=csv,nounits"],
capture_output=True,
text=True,
check=True
)
cuda_version = get_cuda_version()
# Process the CSV output
csv_data = result.stdout.strip().splitlines()
reader = csv.DictReader(csv_data)
stats = []
for row in reader:
data = {key.strip(): value.strip()
for key, value in row.items()}
data["cuda_version"] = cuda_version
for key, value in data.items():
# Convert numeric values to appropriate types
if value.isdigit():
data[key] = int(value)
else:
try:
data[key] = float(value)
except ValueError:
pass
stats.append(data)
return stats
def get_system_stats():
cpu_percent = psutil.cpu_percent(interval=1, percpu=True)
cpu_freq = psutil.cpu_freq()
# Get memory stats
virtual_mem = psutil.virtual_memory()
swap_mem = psutil.swap_memory()
return {
"cpu_percent": cpu_percent,
"cpu_freq": cpu_freq,
"virtual_memory": {
"total": virtual_mem.total,
"available": virtual_mem.available,
"used": virtual_mem.used,
"free": virtual_mem.free,
"percent": virtual_mem.percent
},
"swap_memory": {
"total": swap_mem.total,
"used": swap_mem.used,
"free": swap_mem.free,
"percent": swap_mem.percent
}
}
def get_all_stats():
gpu_stats = get_gpu_stats([
"timestamp",
"index",
"gpu_name",
"driver_version",
"memory.total",
"memory.free",
"memory.used",
"memory.reserved",
"compute_cap",
"utilization.gpu",
"utilization.memory",
"temperature.gpu",
"temperature.memory",
"power.draw",
"power.limit"
])
system_stats = get_system_stats()
system_stats["gpu"] = gpu_stats
return system_stats
def reallocate_me(reason: str):
url = 'http://169.254.169.254/v1/reallocate'
headers = {
'Content-Type': 'application/json',
"Metadata": "true"
}
data = {
"reason": reason
}
response = requests.post(url, headers=headers, json=data)
if response.status_code == 200:
print("Reallocation successfully requested.")
else:
print(f"Reallocation failed: {response.status_code} - {response.text}")
if __name__ == "__main__":
while True:
stats = get_all_stats()
gpu_temp = stats["gpu"][0]["temperature.gpu"]
vram_usage = stats["gpu"][0]["utilization.memory [%]"]
if gpu_temp > max_gpu_temp:
reallocate_me(f"GPU temperature exceeded threshold: {gpu_temp} / {max_gpu_temp}C")
break
elif vram_usage > max_vram_usage:
reallocate_me(f"VRAM usage exceeded threshold: {vram_usage} / {max_vram_usage}%")
break
time.sleep(check_interval)
You can then run this script in the background and it will continuously check the stats and take action if any of the
thresholds are exceeded. A simple way to do this is to use & in your Dockerfile CMD to run the script in the
background
CMD ["bash", "-c", "python monitor.py & python my_app.py"]
Conclusion
In this tutorial, we learned how to implement performance monitoring in your application using nvidia-smi and the
psutil library. We also learned how to reallocate under-performing nodes using the IMDS Reallocation Endpoint. By
monitoring the performance of your application, you can ensure that it runs smoothly and efficiently, even in the face
of changing conditions. This is an important step in building a robust and reliable application.