Choosing the Right Model
The model you choose dramatically affects training time, quality, and hardware requirements. This guide helps you make the right choice.
Model Size vs Hardware
The golden rule: loading a model in 16-bit precision takes roughly 2x its parameter count in GB of memory. Training takes anywhere from slightly more than that (LoRA) to about 8x more (full fine-tuning, once gradients and optimizer states are included). For a 7B model, that means ~112GB for full training, ~16GB with LoRA, or ~6GB with LoRA + int4 quantization.
Quick Reference
| Your Hardware | Max Model Size | Recommended Models |
|---|---|---|
| MacBook Air M1 (8GB) | 500M - 1B | google/gemma-3-270m |
| MacBook Pro M2 (16GB) | 1B - 3B | google/gemma-2-2b, Llama-3.2-1B |
| MacBook Pro M3 Max (36-64GB) | 7B - 13B | Llama-3.1-8B, Mistral-7B |
| RTX 3060/3070 (8-12GB) | 1B - 3B | gemma-2-2b, Llama-3.2-3B |
| RTX 3090/4090 (24GB) | 7B - 13B | Llama-3.1-8B, Mistral-7B |
| A100 (40-80GB) | 30B - 70B | Llama-3.1-70B with quantization |
Memory Estimation Formula
- Full training (~16 bytes per parameter): 7B × 16 = ~112GB (needs multi-GPU)
- With LoRA (~2 bytes per parameter, plus ~2GB overhead): 7B × 2 + 2GB = ~16GB
- With LoRA + int4 (~0.5 bytes per parameter, plus ~2GB overhead): 7B × 0.5 + 2GB = ~6GB
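These rules of thumb are easy to turn into a quick calculator. A minimal sketch (the bytes-per-parameter values and the ~2GB overhead constant are just the approximations from the list above, not exact measurements):

```python
def estimate_vram_gb(params_billions: float, method: str = "full") -> float:
    """Rough VRAM estimate in GB, using the rules of thumb above."""
    bytes_per_param = {
        "full": 16.0,   # fp16 weights + gradients + Adam optimizer states
        "lora": 2.0,    # frozen fp16 weights, small trainable adapter
        "qlora": 0.5,   # int4-quantized frozen weights
    }[method]
    overhead_gb = 0.0 if method == "full" else 2.0  # adapter + activations
    return params_billions * bytes_per_param + overhead_gb

for method in ("full", "lora", "qlora"):
    print(f"7B with {method}: ~{estimate_vram_gb(7, method):.0f}GB")
# 7B with full: ~112GB, lora: ~16GB, qlora: ~6GB
```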
Base vs Instruction-Tuned Models
This is one of the most important decisions you’ll make.
Base Models (Pretrained)
Examples: google/gemma-2-2b, meta-llama/Llama-3.2-1B
What they are: Trained on raw text to predict the next word. They know language but don’t know how to be helpful.
When to use:
- You have lots of training data (10k+ examples)
- You want full control over the model’s behavior
- You’re training for a specific format (not chat)
- You want to create your own instruction style
Instruction-Tuned Models (IT/Instruct)
Examples: google/gemma-2-2b-it, meta-llama/Llama-3.2-1B-Instruct
What they are: Base models that have already been trained to follow instructions and be helpful.
When to use:
- You have limited training data (100-5k examples)
- You want to refine existing helpful behavior
- You’re building a chatbot or assistant
- You want faster results with less data
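The difference also shows up in how you prompt the two kinds of model: a base model simply continues raw text, while an instruction-tuned model expects input wrapped in its chat template. A minimal sketch with Hugging Face transformers (gemma-2-2b-it is gated, so this assumes you have accepted its license and logged in with a Hugging Face token):

```python
from transformers import AutoTokenizer

# Base model: you hand it raw text and it predicts a continuation.
base_prompt = "The capital of France is"

# Instruction-tuned model: wrap the request in the model's chat template.
tok = AutoTokenizer.from_pretrained("google/gemma-2-2b-it")
chat_prompt = tok.apply_chat_template(
    [{"role": "user", "content": "What is the capital of France?"}],
    tokenize=False,
    add_generation_prompt=True,  # append the marker for the model's reply
)
print(chat_prompt)  # wrapped in <start_of_turn>user ... <end_of_turn> markers
```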
Decision Matrix
| Situation | Use Base | Use Instruction-Tuned |
|---|---|---|
| Less than 1k examples | | ✓ |
| 1k - 10k examples | Depends | ✓ |
| 10k+ examples | ✓ | |
| Chat/assistant use case | | ✓ |
| Custom format (not chat) | ✓ | |
| Domain-specific (medical, legal) | ✓ | ✓ (either works) |
| Code generation | ✓ | |
| Creative writing | ✓ | ✓ (either works) |
Model Families
Google Gemma
Versions: Gemma 2, Gemma 3
| Model | Size | Best For |
|---|---|---|
| google/gemma-3-270m | 270M | Testing, learning, CPU/Apple Silicon |
| google/gemma-2-2b | 2B | Consumer GPUs, good quality/speed balance |
| google/gemma-2-9b | 9B | High quality on good hardware |
| google/gemma-2-27b | 27B | Best Gemma quality, needs serious hardware |
Add the -it suffix for the instruction-tuned versions (e.g., google/gemma-2-2b-it).
Meta Llama
Versions: Llama 3.1, Llama 3.2
| Model | Size | Best For |
|---|---|---|
| meta-llama/Llama-3.2-1B | 1B | Mobile, edge devices |
| meta-llama/Llama-3.2-3B | 3B | Consumer hardware |
| meta-llama/Llama-3.1-8B | 8B | General purpose, excellent quality |
| meta-llama/Llama-3.1-70B | 70B | Production quality, needs cloud GPU |
Mistral
| Model | Size | Best For |
|---|---|---|
| mistralai/Mistral-7B-v0.3 | 7B | Great quality/efficiency ratio |
| mistralai/Mixtral-8x7B | 8x7B | MoE architecture, fast inference |
Qwen (Alibaba)
| Model | Size | Best For |
|---|---|---|
| Qwen/Qwen2.5-0.5B | 500M | Ultra-small, edge devices |
| Qwen/Qwen2.5-3B | 3B | Balanced for consumer hardware |
| Qwen/Qwen2.5-7B | 7B | Excellent multilingual, especially Chinese |
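Every ID in these tables is a Hugging Face Hub identifier, so they all load the same way. A minimal sketch with transformers (Qwen2.5-0.5B is used here because it is small and ungated; gated families like Gemma and Llama additionally require accepting their license and authenticating):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B"  # swap in any ID from the tables above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Fine-tuning works best when", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```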
Searching for Models
In the wizard, you can search for models and sort the results.
Sorting Options
| Option | When to Use |
|---|---|
| Trending | See what’s popular right now |
| Downloads | Most proven/used models |
| Likes | Community favorites |
| Recent | Newest releases |
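These sort orders mirror what the Hugging Face Hub API exposes, so you can preview the same listings outside the wizard. A sketch using huggingface_hub (mapping the wizard's labels onto Hub sort keys is an assumption on my part; "downloads", "likes", and "lastModified" are standard keys):

```python
from huggingface_hub import HfApi

api = HfApi()
# Downloads -> sort="downloads", Likes -> sort="likes",
# Recent -> sort="lastModified"; direction=-1 sorts descending.
for m in api.list_models(search="gemma", sort="downloads", direction=-1, limit=5):
    print(m.id, m.downloads)
```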
Tips for Choosing
Start small, scale up
Always start with a smaller model like gemma-3-270m. Get your pipeline working, verify your dataset is formatted correctly, then scale up to larger models.
Don't chase the biggest model
A well-trained 3B model often beats a poorly-trained 7B model. Focus on data quality first, then scale the model.
Match model to data
If you only have 500 examples, a 270M-1B model is plenty. Using a 7B model will just memorize your data instead of learning patterns.
Consider inference costs
If you’re deploying the model, remember: larger models cost more to run. A 1B model is roughly 7x cheaper to serve than a 7B model.
Try instruction-tuned first
Unless you have 10k+ high-quality examples, start with an instruction-tuned model. You’ll get better results faster.
Validating Your Choice
After you select a model, the wizard checks that it actually exists before continuing.
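If you want to run the same check yourself, huggingface_hub makes it a one-liner. A minimal sketch (the wizard's actual implementation may differ; this is just the obvious Hub call):

```python
from huggingface_hub import model_info
from huggingface_hub.utils import RepositoryNotFoundError

def model_exists(model_id: str) -> bool:
    """Return True if model_id resolves on the Hugging Face Hub."""
    try:
        model_info(model_id)
        return True
    except RepositoryNotFoundError:
        return False

print(model_exists("google/gemma-2-2b"))       # True
print(model_exists("google/gemma-2-2b-typo"))  # False
```

Next Steps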
- Dataset Guide: Prepare your training data
- LoRA for Large Models: Train bigger models on limited hardware