What LLM Can I Run Locally?

If you want to run a Large Language Model (LLM) on your own computer, you need to make sure your hardware can handle it. The VRAM Calculator at apxml.com/tools/vram-calculator helps you figure out which models will fit on your hardware and how to optimize your setup. Here’s a beginner-friendly guide to the choices you’ll see (a rough sketch of the underlying math follows the list):

1. Model Selection
2. Inference Quantization
3. KV Cache Quantization
4. Hardware Configuration
5. Number of GPUs
6. Batch Size
7. Sequence Length
8. Concurrent Users
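
The biggest single cost is usually the model weights themselves: roughly the parameter count multiplied by the bytes each parameter takes at your chosen quantization. As a rough sketch in Python (not the calculator's exact method, and ignoring framework overhead):

```python
# Approximate bytes per parameter at common quantization levels.
BYTES_PER_PARAM = {
    "fp16": 2.0,   # 16-bit floating point
    "int8": 1.0,   # 8-bit quantization
    "int4": 0.5,   # 4-bit quantization
}

def weight_vram_gb(params_billion: float, quant: str = "fp16") -> float:
    """Rough VRAM (in GB) needed just to hold the model weights."""
    total_bytes = params_billion * 1e9 * BYTES_PER_PARAM[quant]
    return total_bytes / 1e9  # bytes -> GB

# Example: a 7B-parameter model
print(weight_vram_gb(7, "fp16"))  # ~14 GB
print(weight_vram_gb(7, "int4"))  # ~3.5 GB
```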


How to Use the Calculator

  1. Enter your hardware details (GPU, VRAM).
  2. Pick a model and set quantization options.
  3. Adjust batch size, sequence length, and users as needed.
  4. The calculator shows whether your setup can run the model, how much VRAM it will use, and how fast it is likely to be (a rough sketch of the context-memory math follows these steps).
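
Beyond the weights, the KV cache (the model's memory of the context) grows with sequence length, batch size, and the number of concurrent users. A minimal sketch, assuming you look up your model's layer count and hidden size; the values below are merely typical of a 7B-class model, and the real calculator accounts for more detail (for example grouped-query attention, which shrinks the cache):

```python
def kv_cache_vram_gb(
    num_layers: int,
    hidden_size: int,
    seq_len: int,
    batch_size: int = 1,
    concurrent_users: int = 1,
    bytes_per_value: float = 2.0,  # 2 for an FP16/BF16 KV cache, 1 for 8-bit
) -> float:
    """Rough KV-cache VRAM in GB: keys and values for every layer,
    every token position, and every sequence served at once."""
    sequences = batch_size * concurrent_users
    num_values = 2 * num_layers * hidden_size * seq_len * sequences  # 2 = keys + values
    return num_values * bytes_per_value / 1e9  # bytes -> GB

# Hypothetical 7B-class dimensions, single user, 1024-token context:
print(kv_cache_vram_gb(num_layers=32, hidden_size=4096, seq_len=1024))  # ~0.5 GB
```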

What to Look For

The key outputs are whether the model fits in your available VRAM, how much of it will be used, and the estimated generation speed. If a model doesn’t fit, try a smaller model or a more aggressive quantization and check again.

Summary Table

| Choice | What It Does | Beginner Tip |
| --- | --- | --- |
| Model | Pick size & type of LLM | Start small, upgrade later |
| Quantization | Controls memory vs. quality | FP16 is a good balance |
| KV Cache Quantization | Context memory precision | FP16/BF16 is usually fine |
| GPU/VRAM | Your hardware limits | Check your GPU specs |
| Batch Size | Inputs per step | Use 1 for home use |
| Sequence Length | Max context window | 1024+ for chat, lower for Q&A |
| Concurrent Users | Simultaneous users | Usually 1 for personal use |
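
Putting the table’s beginner tips together (batch size 1, a single user, an FP16 KV cache, a 1024-token context), a quick back-of-envelope check might look like the sketch below; the 7B model and the 12 GB GPU are hypothetical examples, and real frameworks add some overhead on top of weights and cache:

```python
# Hypothetical setup: 7B-parameter model at INT4 quantization on a 12 GB GPU.
gpu_vram_gb = 12.0

weights_gb = 7e9 * 0.5 / 1e9            # 7B params * 0.5 bytes/param (INT4) = 3.5 GB
kv_cache_gb = (2 * 32 * 4096 * 1024     # keys+values * layers * hidden size * context length
               * 1 * 2.0) / 1e9         # 1 sequence, 2 bytes per value (FP16) ~= 0.5 GB
overhead_gb = 1.0                       # rough allowance for activations and framework overhead

total_gb = weights_gb + kv_cache_gb + overhead_gb
print(f"Estimated usage: {total_gb:.1f} GB of {gpu_vram_gb:.0f} GB")
print("Fits" if total_gb <= gpu_vram_gb else "Too big for this GPU")
```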

By understanding these options, you can pick the best LLM for your hardware and needs. The VRAM Calculator makes it easy to experiment and see what works before you download or run anything.