
Whether you’re training deep learning models, running simulations, or just curious about your GPU’s performance, nvidia-smi is your go-to command-line tool. Short for NVIDIA System Management Interface, this utility provides essential real-time information about your NVIDIA GPU’s health, workload, and performance.
In this blog, we’ll explore what nvidia-smi is, how to use it, and walk through a real output from a system using an NVIDIA T1000 8GB GPU.
What is nvidia-smi?
nvidia-smi is a CLI utility bundled with the NVIDIA driver. It enables real-time monitoring of GPU utilization, memory, temperature, and power, as well as management of settings such as persistence and compute modes.
You can execute it using:
nvidia-smi
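If you script around GPU monitoring, the same command can be invoked programmatically. Below is a minimal sketch in Python; the helper name `run_nvidia_smi` is ours, and the code assumes the NVIDIA driver is installed, degrading gracefully when it is not:

```python
import shutil
import subprocess

def run_nvidia_smi(extra_args=None):
    """Run nvidia-smi and return its stdout, or None if the tool is absent."""
    # Locate the binary first so the sketch degrades gracefully on
    # machines without an NVIDIA driver installed.
    binary = shutil.which("nvidia-smi")
    if binary is None:
        return None
    result = subprocess.run(
        [binary] + (extra_args or []),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

output = run_nvidia_smi()
print(output if output is not None else "nvidia-smi not found on this system")
```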
Let’s use real-life output from a system with an NVIDIA T1000 8GB GPU to review each section in detail.
[nvidia-smi output table for the NVIDIA T1000 8GB GPU]
In this case, the GPU is mostly idle, used lightly by background processes.

G: Graphics process
C+G: Uses both compute and graphics
seed-version-...: Likely a custom or sandboxed job with a version tag
To investigate it further:
ps -fp 4535
ls -l /proc/4535/exe
watch -n 1 nvidia-smi
Or reset the GPU:
sudo nvidia-smi --gpu-reset -i 0
To query specific metrics in CSV format:
nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv
Useful for logging and dashboards.
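The CSV output of `--query-gpu` is easy to post-process for logging or dashboards. A minimal parsing sketch is below; the sample string is illustrative, not captured from the T1000 system in this post:

```python
def parse_query_gpu_csv(text):
    """Parse `nvidia-smi --query-gpu=... --format=csv` output into dicts."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    # First line is the header row, e.g. "utilization.gpu [%], memory.used [MiB]".
    headers = [h.strip() for h in lines[0].split(",")]
    rows = []
    for line in lines[1:]:
        values = [v.strip() for v in line.split(",")]
        rows.append(dict(zip(headers, values)))
    return rows

# Illustrative sample only, not real output from this system.
sample = """utilization.gpu [%], memory.used [MiB]
3 %, 441 MiB"""

rows = parse_query_gpu_csv(sample)
print(rows)  # → [{'utilization.gpu [%]': '3 %', 'memory.used [MiB]': '441 MiB'}]
```

For numeric-only values that are simpler to graph, `--format=csv,noheader,nounits` drops the header row and the `%`/`MiB` suffixes.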
Enable persistence mode:
sudo nvidia-smi -pm 1
Set the compute mode to exclusive-process:
sudo nvidia-smi -c EXCLUSIVE_PROCESS
Query clock information:
nvidia-smi -q -d CLOCK
With tools like nvidia-smi, you gain critical visibility into GPU usage and health. It’s an essential part of any ML or HPC workflow. We have developed the integrated GPU Dashboards in the Rafay Platform to provide the same information graphically. In addition, users do not require any form of privileged root access to visualize this critical data.
