Getting the GPU usage of NVIDIA cards with the Linux dstat tool

dstat is an awesome little tool that lets you get resource statistics for your Linux box. It has a modular architecture that allows you to develop additional plugins, and it's easy to use. Recently I was profiling a Deep Learning pipeline developed with Keras and TensorFlow and I needed detailed statistics about CPU, hard disk and GPU usage. The first two are available out of the box in dstat; however, as far as I know, there is no plugin for monitoring the GPU usage of NVIDIA graphics cards.

Thankfully, it is super easy to write a Python plugin for dstat. I have already sent a pull request to the official repo, but since new versions are released relatively rarely, here are some instructions on how to set up the dstat NVIDIA GPU usage plugin on your box.

Installation

The following commands were tested on Ubuntu 16.04. They install dstat, the Python NVIDIA Management Library and my dstat NVIDIA plugin:

sudo apt-get install dstat #install dstat
sudo pip install nvidia-ml-py #install Python NVIDIA Management Library
wget https://raw.githubusercontent.com/datumbox/dstat/master/plugins/dstat_nvidia_gpu.py
sudo mv dstat_nvidia_gpu.py /usr/share/dstat/ #move file to the plugins directory of dstat
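
If dstat later reports that the pynvml library is missing, it may help to confirm that the module is importable by the same Python interpreter that dstat runs under, since pip may have installed it for a different one. A minimal sketch follows; the helper name module_available is hypothetical, and the only assumption about the package is that nvidia-ml-py provides a module named pynvml:

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be imported by the current interpreter."""
    # find_spec returns None when no top-level module with that name is found.
    return importlib.util.find_spec(name) is not None


# Run this with the interpreter that dstat uses.
print("pynvml importable:", module_available("pynvml"))
```

If this prints False, installing nvidia-ml-py with the pip that belongs to that interpreter is the first thing to try.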

To get all the default statistics along with GPU usage (percentage) type the following command:

dstat -a --nvidia-gpu

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system-- gpu-u
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw |total
  2   1  96   0   0   0|5816k   15M|   0     0 |   0     0 |  45k   98k|   68
  0   1  98   0   0   0|  57M  128k| 104B  902B|   0     0 |  42k   85k|   50
  8   7  84   1   0   0| 152M    0 | 292B  448B|   0     0 |  52k   93k|   39
  1   1  97   1   0   0| 111M    0 |  52B  374B|   0     0 |  51k  116k|   62
  0   1  98   1   0   0| 129M    0 |  80B  416B|   0     0 |  43k   85k|   92
  0   2  98   0   0   0|   0     0 |  52B  374B|   0     0 |  41k   83k|   81

To get all the usage statistics for each GPU use the following command:

dstat --nvidia-gpu -f

-------------------------------------------gpu-usage-nvidia------------------------------------------
total  gpu0  gpu1  gpu2  gpu3  gpu4  gpu5  gpu6  gpu7  gpu8  gpu9 gpu10 gpu11 gpu12 gpu13 gpu14 gpu15
   19    23    22    21    21    20    22    23    25    15    18    16    16    16    18    16    14
   18    21    20    18    22    21    21    22    21    15    15    14    14    14    15    16    13
   10    14     9    13     8     9    11     9    12     9     9    10    10     8     7     9     9
   18    20    22    19    21    20    21    21    22    14    15    14    15    14    15    15    15
   20    24    22    23    24    25    22    22    22    16    16    16    16    16    16    18    16
   15    21    18    19    18    17    17    16    18    14    13    13    14    13    12    11    11
   20    24    22    22    24    25    23    24    22    16    18    16    14    17    17    17    15
   19    29    18    23    21    22    21    20    21    18    16    16    18    14    14    17    17

How it works

The plugin fetches the number of available GPUs on the system and samples the usage metric 10 times for each GPU. Sampling multiple times will hopefully return smoother metrics than a single measurement. It then averages the usage across all GPUs to produce the total and returns the results to the user. The source code of the plugin is available here.
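
The sampling idea can be sketched in a few lines of Python. This is not the plugin's actual code, just an illustration under stated assumptions: the function names sample_gpu_usage and nvml_reader are hypothetical, the averaging is a plain arithmetic mean, and the pynvml part needs an NVIDIA driver to run.

```python
def sample_gpu_usage(read_usage, gpu_count, samples=10):
    """Average `samples` readings per GPU.

    `read_usage(index)` is any callable returning the current usage
    (percentage) of GPU `index`. Returns (total, per_gpu), where total
    is the mean of the per-GPU averages.
    """
    per_gpu = []
    for index in range(gpu_count):
        readings = [read_usage(index) for _ in range(samples)]
        per_gpu.append(sum(readings) / float(len(readings)))
    total = sum(per_gpu) / float(len(per_gpu)) if per_gpu else 0.0
    return total, per_gpu


def nvml_reader():
    """Build (read_usage, gpu_count) on top of pynvml (hypothetical helper)."""
    import pynvml  # imported lazily so the sampler works without a GPU

    pynvml.nvmlInit()
    count = pynvml.nvmlDeviceGetCount()
    handles = [pynvml.nvmlDeviceGetHandleByIndex(i) for i in range(count)]

    def read_usage(index):
        # .gpu is the GPU utilization percentage reported by NVML
        return pynvml.nvmlDeviceGetUtilizationRates(handles[index]).gpu

    return read_usage, count


# Usage on a machine with an NVIDIA GPU:
#   read_usage, count = nvml_reader()
#   total, per_gpu = sample_gpu_usage(read_usage, count)
```

Keeping the reader separate from the averaging makes it possible to try the logic with a fake reader on a machine without a GPU.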

Hope you enjoy it, happy GPU programming! 🙂

About 

My name is Vasilis Vryniotis. I'm a Machine Learning Engineer and a Data Scientist.

Latest Comments
  1. Indu

    Module dstat_nvidia_gpu failed to load. (The “pynvml” library is missing from this system.)

  2. Ali. D

    Works perfectly, thanks.

  3. Dimitris Gkyrtis

    I get a bunch of errors and at last “pynvml.NVMLError_NotSupported: Not Supported”
    So, as I imagine, my GeForce GTX 460 is not supported, right?

    Thanks

  4. Chandan

    Are these expected to run on Ubuntu 18.04 as well? I get “pynvml” library is missing error, but pynvml is installed. Am I missing something?
    # dstat -a --nvidia-gpu
    Module dstat_nvidia_gpu failed to load. (The “pynvml” library is missing from this system.)
    --total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system--
    usr sys idl wai stl| read writ| recv send| in out | int csw
    0 0 100 0 0|2375k 368k| 0 0 | 0 0 |2298 4100

    $ pip list | grep -i pynvml
    pynvml 8.0.3

