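Using GPUs from Docker Containers with Nvidia Docker

I wanted to use Docker for deep learning workloads too, so I tried using a GPU from inside a Docker container with Nvidia Docker. The machine was an AWS p2.xlarge instance (which carries an NVIDIA K80), and the OS was Ubuntu 18.04.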

Installing and Configuring Docker

First, install Docker.

Installation

Docker must be version 1.12 or later.

The steps below follow the official Docker documentation.

$ sudo apt-get update
$ sudo apt-get install \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
$ sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io

Verification

Check that Docker was actually installed. If a message like "Hello from Docker!" appears, the installation succeeded.

$ sudo docker run hello-world
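The 1.12 minimum mentioned in the installation step can also be checked from a script. The following is a minimal sketch, assuming GNU sort (for the -V option) and assuming that `docker --version` prints a line of the form "Docker version X.Y.Z, build ...". The helper names docker_version_of and version_ge are made up for this example.

```shell
# docker_version_of LINE: extract "X.Y.Z" from a "Docker version X.Y.Z, build ..." line.
docker_version_of() {
    printf '%s\n' "$1" | sed -E 's/^Docker version ([^,]+),.*$/\1/'
}

# version_ge A B: succeed when version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Only query Docker when the client is actually installed.
if command -v docker >/dev/null 2>&1; then
    installed=$(docker_version_of "$(docker --version)")
    if version_ge "$installed" 1.12; then
        echo "Docker $installed satisfies the 1.12 minimum"
    else
        echo "Docker $installed is older than 1.12"
    fi
fi
```

Because `docker --version` only asks the client binary, this check does not need sudo or a running daemon.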

Permission Settings

Running every docker command as root gets tedious, so add the current user to the docker group.

$ sudo usermod -aG docker $USER

After logging out and back in, you can run docker commands without sudo.
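Whether the new group membership is already active in the current session can be checked before trying docker without sudo. This is a small sketch; the in_group helper is a name invented for this example, and it relies on `id -nG` listing the groups of the current login session.

```shell
# in_group GROUP LIST: succeed when GROUP appears in the space-separated LIST.
in_group() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# id -nG (no user argument) reports the groups of the running session,
# so it only shows "docker" after a fresh login.
if in_group docker "$(id -nG)"; then
    echo "docker group is active: sudo is no longer needed"
else
    echo "docker group not active yet: log out and back in"
fi
```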

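Installing the Nvidia CUDA Toolkit

Nvidia Docker needs the NVIDIA driver on the host. Here I install the CUDA Toolkit, whose installer also bundles the driver. The official Nvidia documentation is a useful reference.

Preparation

The installer needs gcc, make, and related tools, so install them.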

$ sudo apt-get install build-essential

Installation

Download the installer.

$ wget https://developer.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.105_418.39_linux.run

Compare the hash value with the checksum published by Nvidia.

$ md5sum cuda_10.1.105_418.39_linux.run
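Instead of comparing the hash by eye, the comparison can be scripted. The sketch below assumes GNU md5sum; verify_md5 is a hypothetical helper, and EXPECTED_MD5 is a placeholder that you must set yourself from Nvidia's published checksum list (no real hash value is given here).

```shell
# verify_md5 FILE EXPECTED: succeed when FILE's MD5 digest equals EXPECTED.
verify_md5() {
    # md5sum -c reads "<hash>  <path>" lines; here the line comes from stdin.
    printf '%s  %s\n' "$2" "$1" | md5sum -c - >/dev/null 2>&1
}

installer=cuda_10.1.105_418.39_linux.run
# Placeholder: export EXPECTED_MD5 with the value from Nvidia's checksum list.
expected="${EXPECTED_MD5:-}"

# Skip quietly when the installer or the expected hash is not available.
if [ -f "$installer" ] && [ -n "$expected" ]; then
    if verify_md5 "$installer" "$expected"; then
        echo "checksum OK"
    else
        echo "checksum mismatch: do not run the installer"
    fi
fi
```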

Run the installer.

$ sudo sh cuda_10.1.105_418.39_linux.run

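Partway through, you are asked to accept the license agreement. After that, the installer asks which components to install; selecting all of them should be fine.

Verification

Once the installation finishes, run the following command. If it displays the current state of the GPU, the installation succeeded.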

$ nvidia-smi
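For use in scripts, nvidia-smi can also print selected fields in CSV form instead of the full table. The following is a small sketch; gpu_summary is a name made up for this example, and the fallback message is only a guess at the most likely cause.

```shell
# gpu_summary: print name, driver version, and total memory for each GPU,
# or a hint when nvidia-smi is not on the PATH.
gpu_summary() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader
    else
        echo "nvidia-smi not found: the driver is probably not installed"
        return 1
    fi
}

# Do not abort the surrounding script when no GPU tooling is present.
gpu_summary || true
```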

Installing Nvidia Docker

Install Nvidia Docker with the following commands. That completes the setup.

$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
  sudo apt-key add -
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
$ sudo apt-get update
$ sudo apt-get install -y nvidia-docker2
$ sudo pkill -SIGHUP dockerd
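After the daemon reloads its configuration, it should list nvidia among its runtimes. The sketch below checks this by searching the output of `docker info`; it assumes that output contains a "Runtimes:" line naming each registered runtime, and has_nvidia_runtime is a hypothetical helper.

```shell
# has_nvidia_runtime TEXT: succeed when the "Runtimes:" line in TEXT lists nvidia.
has_nvidia_runtime() {
    printf '%s\n' "$1" | grep -i 'runtimes' | grep -qw 'nvidia'
}

# Only query the daemon when the docker client is installed.
if command -v docker >/dev/null 2>&1; then
    if has_nvidia_runtime "$(docker info 2>/dev/null)"; then
        echo "nvidia runtime is registered"
    else
        echo "nvidia runtime not found: check the nvidia-docker2 install"
    fi
fi
```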

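Trying It Out

Run the GPU computation example from the official TensorFlow documentation. It computes the sum of the elements of a 2-D tensor. The one-liner uses TensorFlow 1.x APIs (tf.enable_eager_execution and tf.random_normal), which is what the latest-gpu image provided at the time.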

$ docker run --runtime=nvidia -it --rm tensorflow/tensorflow:latest-gpu \
   python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"
2019-02-28 00:15:38.920366: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-02-28 00:15:41.032922: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-02-28 00:15:41.033785: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3f815f0 executing computations on platform CUDA. Devices:
2019-02-28 00:15:41.033815: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): Tesla K80, Compute Capability 3.7
2019-02-28 00:15:41.055212: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300025000 Hz
2019-02-28 00:15:41.055484: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x3fe90b0 executing computations on platform Host. Devices:
2019-02-28 00:15:41.055521: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-02-28 00:15:41.056190: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:00:1e.0
totalMemory: 11.17GiB freeMemory: 11.11GiB
2019-02-28 00:15:41.056223: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-02-28 00:15:41.057198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-28 00:15:41.057230: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0
2019-02-28 00:15:41.057255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N
2019-02-28 00:15:41.057853: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10805 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:00:1e.0, compute capability: 3.7)
tf.Tensor(54.97377, shape=(), dtype=float32)

The log shows the Tesla K80 being detected and the result tensor being printed, so it seems to have worked.