{"id":4575,"date":"2024-08-14T09:08:51","date_gmt":"2024-08-14T09:08:51","guid":{"rendered":"https:\/\/truelogic.org\/wordpress\/?p=4575"},"modified":"2024-08-14T09:08:51","modified_gmt":"2024-08-14T09:08:51","slug":"installing-llama-cpp-on-ubuntu-with-an-nvidia-gpu","status":"publish","type":"post","link":"https:\/\/truelogic.org\/wordpress\/2024\/08\/14\/installing-llama-cpp-on-ubuntu-with-an-nvidia-gpu\/","title":{"rendered":"Installing Llama.cpp on Ubuntu with an NVIDIA GPU"},"content":{"rendered":"\n<p>1.Run <em>sudo apt update<\/em> to make sure all packages are updated to the latest versions<\/p>\n\n\n\n<p>2.Run <em>sudo apt install build-essential<\/em> to install the toolchain for building applications using C++<\/p>\n\n\n\n<p>3.Create a directory to setup llama.cpp:<br>   <em> mkdir \/var\/projects<\/em><br><em>    cd \/var\/projects<\/em><\/p>\n\n\n\n<p><br>4.Get the llama.cpp code from Github:<br>     <em>git clone https:\/\/github.com\/ggerganov\/llama.cpp<\/em><br>    <em> cd llama.cpp<\/em><\/p>\n\n\n\n<p><br>5.Verify that nvidia drivers are present in the system by typing the command:<br>  <em>  sudo ubuntu-drivers list<\/em>  OR <em>sudo ubuntu-drivers list &#8211;gpgpu<br><\/em>    You should get a listing similar to the output below:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: bash; title: ; notranslate\" title=\"\">\nnvidia-driver-390, (kernel modules provided by linux-modules-nvidia-390-generic)\nnvidia-driver-535, (kernel modules provided by linux-modules-nvidia-535-generic)\nnvidia-driver-470, (kernel modules provided by linux-modules-nvidia-470-generic)\nnvidia-driver-418-server, (kernel modules provided by nvidia-dkms-418-server)\nnvidia-driver-450-server, (kernel modules provided by linux-modules-nvidia-450-server-generic)\nnvidia-driver-535-server, (kernel modules provided by linux-modules-nvidia-535-server-generic)\nnvidia-driver-470-server, (kernel modules provided by 
linux-modules-nvidia-470-server-generic)\nnvidia-driver-545, (kernel modules provided by nvidia-dkms-545)\n<\/pre><\/div>\n\n\n<p>6. Update the NVIDIA drivers in the current Ubuntu installation:<br>    <em>sudo ubuntu-drivers install<\/em><\/p>\n\n\n\n<p>7. Reboot the system so the new drivers take effect:<br>    <em>sudo shutdown -r now<\/em><\/p>\n\n\n\n<p>8. After rebooting, verify that the NVIDIA driver is active with the command:<br>     <em>nvidia-smi<\/em><br>     You should see output similar to the example below:<br><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"812\" height=\"322\" src=\"https:\/\/truelogic.org\/wordpress\/wp-content\/uploads\/2024\/08\/nvidia-smi.png\" alt=\"\" class=\"wp-image-4577\" srcset=\"https:\/\/truelogic.org\/wordpress\/wp-content\/uploads\/2024\/08\/nvidia-smi.png 812w, https:\/\/truelogic.org\/wordpress\/wp-content\/uploads\/2024\/08\/nvidia-smi-620x246.png 620w, https:\/\/truelogic.org\/wordpress\/wp-content\/uploads\/2024\/08\/nvidia-smi-300x119.png 300w, https:\/\/truelogic.org\/wordpress\/wp-content\/uploads\/2024\/08\/nvidia-smi-768x305.png 768w\" sizes=\"auto, (max-width: 812px) 100vw, 812px\" \/><\/figure>\n\n\n\n<p>9. Install the gpustat utility:<br>     <em>sudo apt install gpustat<\/em><\/p>\n\n\n\n<p>10. Run gpustat. 
You should see output similar to the example below:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"510\" height=\"38\" src=\"https:\/\/truelogic.org\/wordpress\/wp-content\/uploads\/2024\/08\/gpustat.png\" alt=\"\" class=\"wp-image-4580\" srcset=\"https:\/\/truelogic.org\/wordpress\/wp-content\/uploads\/2024\/08\/gpustat.png 510w, https:\/\/truelogic.org\/wordpress\/wp-content\/uploads\/2024\/08\/gpustat-300x22.png 300w\" sizes=\"auto, (max-width: 510px) 100vw, 510px\" \/><\/figure>\n\n\n\n<p><br>11. Install the NVCC compiler with the command:<br>    <em>sudo apt install nvidia-cuda-toolkit<\/em><\/p>\n\n\n\n<p>12. Before we can build llama.cpp, we need to know the Compute Capability of the GPU:<br>     <em>nvidia-smi --query-gpu=compute_cap --format=csv<\/em><br>     This prints a single score, e.g. 3.0 or 5.2.<\/p>\n\n\n\n<p>13. Set the Compute Capability in the shell by typing:<br>      <em>export CUDA_DOCKER_ARCH=compute_XX<\/em>, where XX is the score with the decimal point removed, e.g. <em>export CUDA_DOCKER_ARCH=compute_35<\/em> if the score is 3.5.<\/p>\n\n\n\n<p>14. The next step is to build llama.cpp:<br>     <em>cd \/var\/projects\/llama.cpp<\/em><br>     <em>make GGML_CUDA=1<\/em><\/p>\n\n\n\n<p>15. This completes the build of llama.cpp. 
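<\/p>\n\n\n\n<p>As a side note, steps 12 and 13 can be combined into a small shell sketch. This is only an illustration: the <em>cap_to_arch<\/em> helper name is hypothetical, not part of llama.cpp.<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: bash; title: ; notranslate\" title=\"\">\n#!\/usr\/bin\/env bash\n# Turn a compute-capability score such as 8.6 into the\n# compute_86 form expected by CUDA_DOCKER_ARCH (decimal point removed).\ncap_to_arch() {\n  local cap=\"$1\"\n  echo \"compute_${cap\/\/.\/}\"\n}\n\n# On a real system the score comes from the GPU itself:\n# export CUDA_DOCKER_ARCH=$(cap_to_arch \"$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader)\")\n<\/pre><\/div>\n\n\n\n<p>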
Next we will run a quick test to see if it's working.<br>16. Download a small GGUF model into the models folder in llama.cpp:<br>      <em>cd models<\/em><br>      <em>wget https:\/\/huggingface.co\/afrideva\/Tiny-Vicuna-1B-GGUF\/resolve\/main\/tiny-vicuna-1b.q5_k_m.gguf<\/em><\/p>\n\n\n\n<p>17. Run a test query from the llama.cpp root folder:<br>      <em>.\/llama-cli -m models\/tiny-vicuna-1b.q5_k_m.gguf -p &quot;I believe the meaning of life is&quot; -n 128 --n-gpu-layers 6<\/em><\/p>\n\n\n\n<p>       You should see output similar to the example below:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"909\" height=\"276\" src=\"https:\/\/truelogic.org\/wordpress\/wp-content\/uploads\/2024\/08\/output.png\" alt=\"\" class=\"wp-image-4582\" srcset=\"https:\/\/truelogic.org\/wordpress\/wp-content\/uploads\/2024\/08\/output.png 909w, https:\/\/truelogic.org\/wordpress\/wp-content\/uploads\/2024\/08\/output-620x188.png 620w, https:\/\/truelogic.org\/wordpress\/wp-content\/uploads\/2024\/08\/output-300x91.png 300w, https:\/\/truelogic.org\/wordpress\/wp-content\/uploads\/2024\/08\/output-768x233.png 768w\" sizes=\"auto, (max-width: 909px) 100vw, 909px\" \/><\/figure>\n","protected":false},"excerpt":{"rendered":"<div class=\"mh-excerpt\"><p>1.Run sudo apt update to make sure all packages are updated to the latest versions 2.Run sudo apt install build-essential to install the toolchain for <a class=\"mh-excerpt-more\" href=\"https:\/\/truelogic.org\/wordpress\/2024\/08\/14\/installing-llama-cpp-on-ubuntu-with-an-nvidia-gpu\/\" title=\"Installing Llama.cpp on Ubuntu with an NVIDIA 
GPU\">[&#8230;]<\/a><\/p>\n<\/div>","protected":false},"author":1,"featured_media":4584,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[368],"tags":[],"class_list":["post-4575","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-gpu-and-ai"],"_links":{"self":[{"href":"https:\/\/truelogic.org\/wordpress\/wp-json\/wp\/v2\/posts\/4575","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/truelogic.org\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/truelogic.org\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/truelogic.org\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/truelogic.org\/wordpress\/wp-json\/wp\/v2\/comments?post=4575"}],"version-history":[{"count":5,"href":"https:\/\/truelogic.org\/wordpress\/wp-json\/wp\/v2\/posts\/4575\/revisions"}],"predecessor-version":[{"id":4583,"href":"https:\/\/truelogic.org\/wordpress\/wp-json\/wp\/v2\/posts\/4575\/revisions\/4583"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/truelogic.org\/wordpress\/wp-json\/wp\/v2\/media\/4584"}],"wp:attachment":[{"href":"https:\/\/truelogic.org\/wordpress\/wp-json\/wp\/v2\/media?parent=4575"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/truelogic.org\/wordpress\/wp-json\/wp\/v2\/categories?post=4575"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/truelogic.org\/wordpress\/wp-json\/wp\/v2\/tags?post=4575"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}