Building a "Lawyer GPT" for Your Blog - Part 2: GPU Setup & CUDA in C# (English)

Introduction

Welcome to Part 2! In Part 1, we laid out the vision for building a writing assistant that uses your blog as a knowledge base - like how lawyers use LLMs trained on case law to draft documents. Now it's time to get our hands dirty with the foundation: making sure your GPU is ready for AI workloads.

NOTE: This is part of my experiments with AI (assisted drafting) + my own editing. Same voice, same pragmatism; just faster fingers.

About my hardware: I'm using an NVIDIA RTX A4000 (16GB VRAM), AMD Ryzen 9 9950X, and 96GB DDR5 RAM. But you don't need this! As covered in Part 1, you can use any NVIDIA GPU with 8GB+ VRAM, or even run CPU-only (slower but functional). This part focuses on GPU setup, but I'll note CPU-only alternatives where relevant.

This part might seem basic if you're already familiar with CUDA, but trust me - I've wasted countless hours debugging mysterious errors that traced back to version mismatches, missing environment variables, or incorrect cuDNN installations. We'll do this right from the start.

Why GPU Acceleration Matters

Before we dive into installation, let's understand why we need GPU acceleration at all.

CPU vs GPU for AI Workloads

graph LR
    A[AI Workload] --> B{Type?}
    B -->|Matrix Operations| C[GPU: 100x+ faster]
    B -->|Sequential Logic| D[CPU: Better]

    C --> E[Embedding Generation]
    C --> F[LLM Inference]
    C --> G[Vector Search]

    D --> H[Application Logic]
    D --> I[File I/O]

    class C gpu
    class E,F,G aiTasks

    classDef gpu stroke:#333,stroke-width:4px
    classDef aiTasks stroke:#333

Why GPUs dominate for AI:

Parallelism - GPUs have thousands of cores vs CPUs' dozens
Matrix Operations - AI is mostly matrix math, which GPUs excel at
Memory Bandwidth - GPUs move data much faster
Specialized Hardware - Tensor cores accelerate AI-specific operations

Real-world example (on my hardware): Generating embeddings for a blog post:

CPU (Ryzen 9 9950X): ~5 seconds per post
GPU (A4000 16GB): ~0.3 seconds per post

That's 16x faster, and it compounds when processing hundreds of posts!

Performance on other GPUs (approximate):

RTX 4090 (24GB): Fastest, ~0.2s per post
RTX 3060 (12GB): ~0.5s per post
GTX 1070 Ti (8GB): ~0.8s per post
Still much faster than CPU!

Understanding Your Hardware

Before installing anything, let's understand what we're working with.

My GPU: NVIDIA RTX A4000

This is my specific GPU - a professional workstation card with:

16GB GDDR6 VRAM - Enough for 7B-13B parameter models
6144 CUDA cores - Parallel processing power
256 Tensor cores - Accelerated AI operations
CUDA Compute Capability 8.6 - Important for compatibility

Common GPU Options

Here's what various GPUs can handle:

GPU	VRAM	Max Model Size	Good For
RTX 4090	24GB	13B-30B	Overkill for this project
RTX 4070 Ti	12GB	7B-13B	Excellent choice
RTX 3060	12GB	7B	Budget-friendly
RTX 4060 Ti	16GB	7B-13B	Great value
A4000 (mine)	16GB	7B-13B	Workstation GPU
GTX 1070 Ti	8GB	7B (tight)	Minimum viable

What About Intel/AMD NPUs?

You might have heard about Intel Core Ultra or AMD Ryzen AI chips with built-in NPUs (Neural Processing Units). Can you use those instead of NVIDIA?

Short answer: Still not recommended for this use case (as of late 2025), but getting closer!

Current state (late 2025):

DirectML has matured: ONNX Runtime DirectML execution provider is now production-ready
NPU support improving: Intel's Core Ultra (Meteor Lake/Lunar Lake) and AMD's Ryzen AI can run inference via DirectML
Model format limitations: NPUs work best with ONNX models; GGUF models (used for LLMs) still target CUDA/CPU primarily
LLM performance: For large language models, NPUs are currently slower than discrete GPUs and often slower than modern CPUs
Best for: Small models, on-device inference, battery-powered scenarios

Why we still use NVIDIA CUDA:

Ecosystem maturity: CUDA has 15+ years of optimization for LLMs
LLamaSharp: Uses llama.cpp which targets CUDA/CPU, not DirectML/NPU
Performance: A mid-range NVIDIA GPU still outperforms NPUs significantly for our workload
Community support: Vastly more resources and troubleshooting available

If you have an NPU-equipped CPU:

For embedding generation (Part 3), you can use DirectML with ONNX Runtime - might work reasonably well
For LLM inference (Part 6), stick with CPU mode or get a discrete GPU
Watch for llama.cpp DirectML backend (experimental as of Nov 2025)

Future outlook: NPUs are improving rapidly. By early 2026, they may become viable for LLM inference, especially for smaller models (1B-3B parameters). Intel's upcoming Arrow Lake and AMD's next-gen XDNA 2 promise significant AI performance improvements.

This series focuses on NVIDIA CUDA because it delivers the best performance today, but the architecture concepts will translate to NPUs as the ecosystem matures!

Check Your GPU

First, verify Windows sees your GPU:

# PowerShell
wmic path win32_VideoController get name

You should see something like:

Name
NVIDIA RTX A4000

Or use NVIDIA Control Panel → System Information → Display tab.

Architecture Overview

Here's how the software stack works:

graph TB
    A[Your C# Application] --> B[ONNX Runtime / LLamaSharp]
    B --> C[CUDA Toolkit]
    C --> D[cuDNN Libraries]
    D --> E[NVIDIA Driver]
    E --> F[GPU Hardware]

    G[TensorRT] -.Optional.-> D

    class A,F endpoints
    class C,D,E install

    classDef endpoints stroke:#333,stroke-width:4px
    classDef install stroke:#333,stroke-width:2px

    subgraph "What We'll Install"
    C
    D
    E
    end

Layer breakdown:

NVIDIA Driver - Low-level GPU communication
CUDA Toolkit - APIs for GPU programming
cuDNN - Deep Neural Network library (optimized kernels)
ONNX Runtime / LLamaSharp - High-level ML frameworks
Your Application - Our C# code

Installation Roadmap

Here's the order we'll install things (order matters!):

graph LR
    A[1. NVIDIA Driver] --> B[2. CUDA Toolkit]
    B --> C[3. cuDNN]
    C --> D[4. Verify Install]
    D --> E[5. Test from C#]

    class A,B,C,D,E steps

    classDef steps stroke:#333,stroke-width:2px

Step 1: NVIDIA Driver

Check Current Driver

Open PowerShell:

nvidia-smi

You should see output like:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 546.33       Driver Version: 546.33       CUDA Version: 12.3    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A4000   WDDM  | 00000000:01:00.0 Off |                  Off |
| 41%   32C    P8    10W / 140W |    345MiB / 16376MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

Key info:

Driver Version: Should be 545.xx or newer
CUDA Version: This is the MAX CUDA version supported, not what's installed
Memory-Usage: Currently used / total VRAM

Update Driver if Needed

If nvidia-smi doesn't work or your driver is old:

Go to NVIDIA Driver Downloads
Select:
- Product Type: RTX/Quadro
- Product Series: RTX Axxxx Series
- Product: RTX A4000
- Operating System: Windows 11 (or your version)
- Download Type: Studio Driver (more stable than Game Ready)
Install and reboot
Verify again with nvidia-smi

Step 2: CUDA Toolkit

CUDA provides the programming interface for GPU acceleration.

Version Selection

Critical: We need CUDA 12.x for modern models. Specifically:

CUDA 12.1+ - Good balance of compatibility and features (at the time of writing)
Latest 12.x - Check compatibility with your libraries
NOT CUDA 11.x - Missing features newer models need

Download & Install

Go to CUDA Toolkit Archive
Select CUDA Toolkit 12.1 (or 12.3 if you're adventurous)
Choose: Windows → x86_64 → 11 → exe (local)
Download (~3GB)

Installation Options

Run the installer. When prompted:

Installation Type: Custom (Advanced)

Select Components:
✅ CUDA Toolkit
✅ CUDA Documentation
✅ CUDA Samples
✅ CUDA Visual Studio Integration
❌ GeForce Experience (not needed)
❌ NVIDIA Driver (already installed)

Why Custom?: We don't want to downgrade our driver or install gaming software.

Installation Path

Default path is fine:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1

But note this - we'll need it for environment variables!

Set Environment Variables

The installer usually sets these, but verify:

Check in PowerShell:

$env:CUDA_PATH
# Should show: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1

$env:PATH -split ';' | Select-String CUDA
# Should show CUDA bin and libnvvp paths

If missing, add manually:

Open Environment Variables:
- Right-click This PC → Properties → Advanced System Settings → Environment Variables
Under System Variables, verify/add:
- CUDA_PATH = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
- CUDA_PATH_V12_1 = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1

In Path variable, ensure these exist:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\libnvvp

Restart your terminal for changes to take effect

Verify CUDA Installation

# Check CUDA compiler
nvcc --version

# Should output:
# nvcc: NVIDIA (R) Cuda compiler driver
# Copyright (c) 2005-2023 NVIDIA Corporation
# Built on Tue_Feb__7_19:32:13_Pacific_Standard_Time_2023
# Cuda compilation tools, release 12.1, V12.1.66

# Check path resolves
where.exe nvcc
# Should show: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc.exe

Step 3: cuDNN (CUDA Deep Neural Network library)

cuDNN provides optimized implementations of deep learning operations.

Version Compatibility

Critical: cuDNN version must match CUDA version!

For CUDA 12.x, we need cuDNN 8.9+ for CUDA 12.x (at the time of writing, 8.9.7 or later)

Download cuDNN

Go to cuDNN Archive
You'll need an NVIDIA Developer account (free)
Find: Download cuDNN v8.9.7 (December 5th, 2023), for CUDA 12.x
Select: Local Installer for Windows (Zip)
Download (~800MB)

Extract and Install cuDNN

cuDNN is just a set of files you copy into your CUDA installation.

Extract the ZIP, you'll see:

cudnn-windows-x86_64-8.9.7.29_cuda12-archive\
    bin\
        cudnn64_8.dll
        cudnn_adv_infer64_8.dll
        cudnn_adv_train64_8.dll
        ... (more DLLs)
    include\
        cudnn.h
        ... (header files)
    lib\
        x64\
            cudnn.lib
            ... (lib files)

Copy files to CUDA installation:

# Assuming you extracted to Downloads and CUDA is in default location
# Run PowerShell as Administrator

$cudnnPath = "$env:USERPROFILE\Downloads\cudnn-windows-x86_64-8.9.7.29_cuda12-archive"
$cudaPath = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1"

# Copy DLLs
Copy-Item "$cudnnPath\bin\*.dll" -Destination "$cudaPath\bin\"

# Copy headers
Copy-Item "$cudnnPath\include\*.h" -Destination "$cudaPath\include\"

# Copy libs
Copy-Item "$cudnnPath\lib\x64\*.lib" -Destination "$cudaPath\lib\x64\"

Or do it manually:

Copy all files from bin\ to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\
Copy all files from include\ to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include\
Copy all files from lib\x64\ to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib\x64\

Verify cuDNN Installation

# Check DLLs exist
Test-Path "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\cudnn64_8.dll"
# Should return: True

# List all cuDNN DLLs
Get-ChildItem "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\cudnn*.dll"

Step 4: Complete Verification

Let's make sure everything works together.

Hardware Check

nvidia-smi

Should show your GPU with no errors.

CUDA Check

nvcc --version

Should show CUDA 12.1 (or whatever version you installed).

System Variables Check

# All these should return paths
$env:CUDA_PATH
$env:CUDA_PATH_V12_1

# Check PATH includes CUDA bin
$env:PATH -split ';' | Select-String CUDA

CUDA Samples Test (Optional but Recommended)

The CUDA Toolkit includes sample programs. Let's compile and run one.

Navigate to samples:

cd "C:\ProgramData\NVIDIA Corporation\CUDA Samples\v12.1"

Find deviceQuery:

cd "1_Utilities\deviceQuery"

Compile it (requires Visual Studio):

# If you have VS 2022
"C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Current\Bin\MSBuild.exe" deviceQuery_vs2022.vcxproj /p:Configuration=Release /p:Platform=x64

Run it:

.\x64\Release\deviceQuery.exe

You should see output like:

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA RTX A4000"
  CUDA Driver Version / Runtime Version          12.3 / 12.1
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 16376 MBytes (17174683648 bytes)
  (048) Multiprocessors, (128) CUDA Cores/MP:    6144 CUDA Cores
  GPU Max Clock rate:                            1560 MHz (1.56 GHz)
  Memory Clock rate:                             7001 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 4194304 bytes
  ...
  Result = PASS

The key line: Result = PASS

If you see this, your GPU + CUDA + cuDNN stack is working!

Step 5: Testing from C#

Now the fun part - let's actually use CUDA from C#!

Create Test Project

mkdir CudaTest
cd CudaTest
dotnet new console -n CudaTest
cd CudaTest

Add ONNX Runtime with CUDA

ONNX Runtime is the easiest way to use CUDA from C#.

dotnet add package Microsoft.ML.OnnxRuntime.Gpu  # Latest version

Why this package?

Microsoft.ML.OnnxRuntime - CPU only
Microsoft.ML.OnnxRuntime.Gpu - Includes CUDA support

Test Code: GPU Detection

Create Program.cs:

using Microsoft.ML.OnnxRuntime;
using System;
using System.Linq;

namespace CudaTest
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("=== CUDA Test from C# ===\n");

            // Test 1: Can we create a CUDA execution provider?
            Console.WriteLine("Test 1: CUDA Execution Provider");
            try
            {
                var cudaProviderOptions = new OrtCUDAProviderOptions();
                var sessionOptions = new SessionOptions();
                sessionOptions.AppendExecutionProvider_CUDA(cudaProviderOptions);

                Console.WriteLine("✅ CUDA execution provider created successfully");
                Console.WriteLine($"   Device ID: {cudaProviderOptions.DeviceId}");
            }
            catch (Exception ex)
            {
                Console.WriteLine($"❌ Failed to create CUDA provider: {ex.Message}");
                return;
            }

            // Test 2: Check available providers
            Console.WriteLine("\nTest 2: Available Execution Providers");
            var providers = OrtEnv.Instance().GetAvailableProviders();
            foreach (var provider in providers)
            {
                Console.WriteLine($"   - {provider}");
            }

            if (providers.Contains("CUDAExecutionProvider"))
            {
                Console.WriteLine("✅ CUDA provider is available");
            }
            else
            {
                Console.WriteLine("❌ CUDA provider NOT available");
            }

            // Test 3: Get CUDA device count and info
            Console.WriteLine("\nTest 3: CUDA Device Information");
            try
            {
                // ONNX Runtime doesn't expose deviceQuery directly,
                // but we can test by trying to create a session
                var opts = new SessionOptions();
                opts.AppendExecutionProvider_CUDA(0); // Device 0

                Console.WriteLine("✅ Successfully configured for CUDA device 0");
                Console.WriteLine("   (Full device info requires native CUDA calls)");
            }
            catch (Exception ex)
            {
                Console.WriteLine($"❌ CUDA device configuration failed: {ex.Message}");
            }

            Console.WriteLine("\n=== Test Complete ===");
        }
    }
}

Code breakdown:

OrtCUDAProviderOptions - Configures CUDA execution
- DeviceId - Which GPU to use (0 for first GPU)
- Can set memory limits, optimization level, etc.
SessionOptions.AppendExecutionProvider_CUDA() - Tells ONNX Runtime to use GPU
- If this succeeds, CUDA is working
- If it fails, something in the stack is broken
OrtEnv.Instance().GetAvailableProviders() - Lists all available execution providers
- Should include "CUDAExecutionProvider" if everything is installed correctly
- Also shows "CPUExecutionProvider" as fallback

Run the Test

dotnet run

Expected output:

=== CUDA Test from C# ===

Test 1: CUDA Execution Provider
✅ CUDA execution provider created successfully
   Device ID: 0

Test 2: Available Execution Providers
   - CUDAExecutionProvider
   - CPUExecutionProvider
✅ CUDA provider is available

Test 3: CUDA Device Information
✅ Successfully configured for CUDA device 0
   (Full device info requires native CUDA calls)

=== Test Complete ===

Troubleshooting Common Errors

Error: "DllNotFoundException: Unable to load DLL 'onnxruntime'"

Cause: ONNX Runtime can't find CUDA DLLs.

Fix:

# Make sure CUDA bin is in PATH
$env:PATH += ";C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin"

# Verify cudnn DLL exists
Test-Path "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\cudnn64_8.dll"

# Try running again
dotnet run

Error: "CUDAExecutionProvider not found"

Cause: Using wrong ONNX Runtime package (CPU-only).

Fix:

dotnet remove package Microsoft.ML.OnnxRuntime
dotnet add package Microsoft.ML.OnnxRuntime.Gpu --version 1.16.3

Error: "CUDA error code: 35 (CUDA driver version is insufficient)"

Cause: Driver is too old for CUDA 12.x.

Fix: Update NVIDIA driver to 545.xx or newer.

Advanced Test: Actual Inference

Let's do something real - run a tiny neural network on GPU vs CPU and compare speeds.

Download a Test Model

We'll use a simple MNIST digit recognition model (ONNX format).

# Download sample model
Invoke-WebRequest -Uri "https://github.com/onnx/models/raw/main/vision/classification/mnist/model/mnist-8.onnx" -OutFile "mnist.onnx"

Inference Test Code

Update Program.cs:

using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;
using System;
using System.Diagnostics;
using System.Linq;

namespace CudaTest
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("=== GPU vs CPU Inference Test ===\n");

            // Create dummy input (28x28 image flattened to 784 floats)
            var inputData = Enumerable.Range(0, 784).Select(i => (float)i / 784).ToArray();
            var tensor = new DenseTensor<float>(inputData, new[] { 1, 1, 28, 28 });
            var inputs = new List<NamedOnnxValue>
            {
                NamedOnnxValue.CreateFromTensor("Input3", tensor)
            };

            // Test 1: CPU Inference
            Console.WriteLine("Test 1: CPU Inference");
            var cpuTime = TestInference(inputs, useCuda: false, iterations: 100);
            Console.WriteLine($"   Average time: {cpuTime:F2}ms\n");

            // Test 2: GPU Inference
            Console.WriteLine("Test 2: GPU Inference");
            var gpuTime = TestInference(inputs, useCuda: true, iterations: 100);
            Console.WriteLine($"   Average time: {gpuTime:F2}ms\n");

            // Compare
            Console.WriteLine("Comparison:");
            Console.WriteLine($"   CPU: {cpuTime:F2}ms");
            Console.WriteLine($"   GPU: {gpuTime:F2}ms");
            Console.WriteLine($"   Speedup: {cpuTime / gpuTime:F2}x faster on GPU");
        }

        static double TestInference(List<NamedOnnxValue> inputs, bool useCuda, int iterations)
        {
            var options = new SessionOptions();

            if (useCuda)
            {
                options.AppendExecutionProvider_CUDA(0);
            }

            using var session = new InferenceSession("mnist.onnx", options);

            // Warmup run (first run is always slower)
            session.Run(inputs);

            // Timed runs
            var sw = Stopwatch.StartNew();

            for (int i = 0; i < iterations; i++)
            {
                using var results = session.Run(inputs);
                // Force evaluation
                var output = results.First().AsEnumerable<float>().ToArray();
            }

            sw.Stop();

            return sw.Elapsed.TotalMilliseconds / iterations;
        }
    }
}

Code explanation:

DenseTensor - ONNX Runtime's way of representing multi-dimensional arrays
- Shape [1, 1, 28, 28] = batch_size=1, channels=1, height=28, width=28
NamedOnnxValue - Binds a tensor to an input name
- "Input3" is the input name in the MNIST model
Warmup run - First inference is always slower (model loading, optimization)
- We run once before timing to get accurate measurements
Timing methodology - Average over 100 iterations for stable results

Run Performance Test

dotnet run

Expected output (your numbers will vary):

=== GPU vs CPU Inference Test ===

Test 1: CPU Inference
   Average time: 0.42ms

Test 2: GPU Inference
   Average time: 0.15ms

Comparison:
   CPU: 0.42ms
   GPU: 0.15ms
   Speedup: 2.80x faster on GPU

Why such a small speedup?

MNIST is tiny (only 100KB model)
Overhead of GPU data transfer dominates
For larger models (1GB+), you'll see 10-100x speedups

Key Takeaway

We've proven the entire stack works: ✅ Driver installed correctly ✅ CUDA Toolkit accessible ✅ cuDNN integrated ✅ ONNX Runtime finds CUDA ✅ C# can run GPU-accelerated inference

Understanding Performance

Memory Transfer Bottleneck

Here's what happens during inference:

sequenceDiagram
    participant App as C# Application
    participant CPU as CPU Memory
    participant GPU as GPU Memory
    participant Compute as GPU Cores

    App->>CPU: Create input tensor
    CPU->>GPU: Transfer input (PCIe)
    Note over GPU: Slow! ~16GB/s
    GPU->>Compute: Execute model
    Note over Compute: Fast! TFLOPS
    Compute->>GPU: Write output
    GPU->>CPU: Transfer output (PCIe)
    Note over GPU: Slow again!
    CPU->>App: Return results

Performance lessons:

Batch processing - Transfer many inputs at once to amortize overhead
Keep data on GPU - Chain operations to avoid transfers
Larger models benefit more - Fixed transfer cost, scaling compute

When to Use GPU vs CPU

graph TD
    A[Inference Request] --> B{Model Size}
    B -->|< 100MB| C{Batch Size}
    B -->|> 100MB| D[Use GPU]

    C -->|Single Item| E[Use CPU]
    C -->|Batch > 10| D

    D --> F[10-100x Faster]
    E --> G[Lower Latency for Single]

    class D,E choice

    classDef choice stroke:#333,stroke-width:2px

Rule of thumb:

GPU - Large models, batch processing, embedding generation
CPU - Small models, single requests, latency-critical

For our blog writing assistant:

Embeddings - Batch process 100+ chunks → GPU
LLM inference - Large model (7B params) → GPU
Vector search - Depends on DB choice, often CPU-bound

Summary

We've successfully:

✅ Installed NVIDIA driver (545.xx+)
✅ Installed CUDA Toolkit 12.1
✅ Integrated cuDNN 8.9.7
✅ Verified with nvidia-smi and nvcc
✅ Tested GPU access from C# with ONNX Runtime
✅ Ran performance comparison (GPU vs CPU)

Our development environment is now ready for AI workloads!

What's Next?

In Part 3, we'll dive deep into embeddings and vector databases:

What are embeddings and how do they enable semantic search?
Choosing between Qdrant, pgvector, and other options
Generating embeddings locally with ONNX Runtime
Storing and querying millions of vectors efficiently
Building our first semantic search prototype

We'll finally start working with actual blog content and seeing RAG in action!

Troubleshooting Reference

Quick Diagnostic Commands

# Check GPU
nvidia-smi

# Check CUDA
nvcc --version
where.exe nvcc

# Check cuDNN
Test-Path "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\cudnn64_8.dll"

# Check environment
$env:CUDA_PATH
$env:PATH -split ';' | Select-String CUDA

# Test from C#
dotnet run

Common Issues

Error	Cause	Fix
`nvidia-smi` not found	Driver not installed	Install NVIDIA driver
`nvcc` not found	CUDA not in PATH	Add CUDA bin to PATH
DLL load failed	cuDNN missing	Copy cuDNN files to CUDA dir
CUDA provider not found	Wrong NuGet package	Use `Microsoft.ML.OnnxRuntime.Gpu`
Driver version insufficient	Old driver	Update to 545.xx+

Version Compatibility Matrix

CUDA Version	cuDNN Version	ONNX Runtime	Driver Required
12.1	8.9.7	1.16.x	545.xx+
12.3	9.0.0	1.17.x	546.xx+
11.8	8.9.2	1.15.x	520.xx+

Part 1: Introduction & Architecture
Part 2: GPU Setup & CUDA in C# (this post)
Part 3: Understanding Embeddings & Vector Databases
Part 4: Building the Ingestion Pipeline
Part 5: The Windows Client
Part 6: Local LLM Integration
Part 7: Content Generation & Prompt Engineering
Part 8: Advanced Features & Production Deployment

Resources

See you in Part 3, where we finally start building the semantic search engine!