First encounter with GPU for ML

Lijo Jose
Jun 22, 2019 · 3 min read

I got a new laptop. What next? Check what's inside.

  • 9th Generation Intel® Core™ i5 processor
  • NVIDIA® GeForce® GTX 1650 Graphics (4 GB GDDR5 dedicated). The GTX 1650 has 896 CUDA cores.

Now, I want to test the GPU's ML capabilities. On Ubuntu 18.04, I installed python3, tensorflow, and tensorflow-gpu. How will I evaluate the machine? A benchmarking post came to my help.

I placed the code on the HDD and ran it, getting ~400 examples/sec. Judging by the numbers from the post above, it looked like the GPU was not being used. Moving the code to the SSD doubled the performance (~800 examples/sec).
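The throughput ratios in this post are easy to sanity-check. A tiny sketch (the `speedup` helper is mine, and the rates are just the figures quoted here, not output from any benchmark tool):

```python
# Hypothetical helper: ratio of two throughput figures (examples/sec).
def speedup(new_rate, baseline_rate):
    return new_rate / baseline_rate

# Figures from this post: moving the code from HDD to SSD roughly doubled throughput.
print(speedup(800, 400))   # 2.0
# The final GPU-enabled run was ~10x the initial ~400 examples/sec run,
# which would imply roughly 4000 examples/sec.
print(speedup(4000, 400))  # 10.0
```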

A few online searches gave me some ideas.

lspci | grep -i nvidia

Next, let's get the NVIDIA driver version: cat /proc/driver/nvidia/version. To my surprise, the file doesn't exist. What to do?
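The same check can be scripted. A minimal Python sketch (the `driver_version` helper name is mine; the path is the standard location the NVIDIA kernel driver exposes under /proc):

```python
from pathlib import Path

# Hypothetical helper: return the NVIDIA kernel driver info from /proc,
# or None when the driver is not loaded (the situation I hit here).
def driver_version(path="/proc/driver/nvidia/version"):
    p = Path(path)
    return p.read_text() if p.exists() else None

info = driver_version()
print(info if info else "No NVIDIA driver loaded")
```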

Solution: install the NVIDIA CUDA toolkit. As per the instructions I found, I installed CUDA 10.1. After this, reboot the machine and make sure that the right boot configuration is selected.

As of 2019-06-22, the GTX 1650 is not listed on NVIDIA's CUDA GPUs page. Will my effort go to waste after all this? Let's give it a try.

Next, I ran a basic test using Python:

import tensorflow as tf
# log_device_placement=True prints which device (CPU or GPU) each op is placed on
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

Library mismatch: Could not dlopen library ‘’; dlerror: cannot open shared object file: No such file or directory. TensorFlow 1.14.0 downloaded using pip3 expects CUDA 10.0.
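That dlerror just means the dynamic loader cannot find the CUDA runtime library the TensorFlow wheel was linked against. A quick way to probe library visibility from plain Python (the `cuda_lib_found` helper is mine, not a TensorFlow API):

```python
import ctypes.util

# Hypothetical helper: ask the dynamic loader whether it can locate a shared
# library -- the same lookup that fails with "Could not dlopen library ...".
def cuda_lib_found(name):
    return ctypes.util.find_library(name) is not None

# Stays False until the CUDA 10.0 runtime is installed and on the loader path.
print(cuda_lib_found("cudart"))
```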

I gave building TensorFlow from source a try, and spent 5 hours with no success.

What's the next option? Downgrade to CUDA 10.0.

As per the ‘CUDA 10.1 + Tensorflow 1.13’ discussions, I downgraded CUDA to 10.0:

sudo apt remove cuda
sudo apt autoremove
sudo apt update
sudo apt install cuda-10-0

Next? Missing cuDNN. I installed cuDNN as per the NVIDIA Deep Learning SDK documentation. The point to catch here was to get cuDNN 7.6.0 for CUDA 10.0. You have to register for the NVIDIA Developer Program to download cuDNN.

After all this, I got the cuDNN sample app working.

Coming back to the TensorFlow model example, I tried again. Another problem came up: Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR. I had to set config.gpu_options.allow_growth = True to solve this.
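For reference, the full session setup with that option looks like this (standard TF 1.x session configuration; by default TensorFlow grabs nearly all GPU memory up front, which can trip cuDNN initialization on a 4 GB card):

```python
import tensorflow as tf

config = tf.ConfigProto()
# Allocate GPU memory on demand instead of claiming it all at start-up;
# this works around the CUDNN_STATUS_INTERNAL_ERROR on memory-tight GPUs.
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```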


Finally, I got the example working, with a performance improvement of ~10x compared to the initial run.

Performance Improvement


  • Use python3.
  • Use tensorflow 1.14.0 (don't try to build from source as a first attempt).
  • Use NVIDIA CUDA 10.0, as the TensorFlow 1.14.0 release build supports only this version. (If that doesn't work directly, first install CUDA 10.1 along with the NVIDIA drivers and then downgrade to CUDA 10.0; I am using the NVIDIA drivers installed along with CUDA 10.1 (nvidia-drivers-418…).)
  • Use NVIDIA cuDNN 7.6.0 for CUDA 10.0.
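With that stack in place, a quick way to confirm TensorFlow actually sees the GPU (TF 1.x APIs; device_lib was the usual trick for listing devices at the time):

```python
import tensorflow as tf
from tensorflow.python.client import device_lib

# True only for a wheel built against CUDA (i.e., tensorflow-gpu).
print(tf.test.is_built_with_cuda())
# Should list /device:GPU:0 alongside the CPU once drivers, CUDA 10.0,
# and cuDNN 7.6.0 all line up.
print([d.name for d in device_lib.list_local_devices()])
```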

Final note

On NVIDIA® GeForce® GTX 1650 Graphics (4 GB GDDR5 dedicated), is a 10x speedup in line with expectations or not? Please share your views in the comments.

Diving into the ML world now….