Hyperparameters
Hyperparameters control how your model learns. Think of them as the settings for your training run.

The Essential Three
Learning Rate
How big the steps are when updating the model.
- Too high (0.01): Model jumps around, never converges
- Too low (0.00001): Takes forever to train
- Just right (0.00002): Steady improvement
Typical ranges:
- Fine-tuning: 2e-5 to 5e-5
- Training from scratch: 1e-4 to 1e-3
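To make the step sizes concrete, here is a toy gradient-descent step on f(w) = w² (the function name is illustrative, not part of any trainer API):

```python
# Toy illustration: one gradient-descent step on f(w) = w^2 (gradient 2w).
# The learning rate scales how far each update moves the weight.
def sgd_step(w, grad, lr):
    return w - lr * grad

w = 1.0
grad = 2 * w  # gradient of w^2 at w = 1
print(sgd_step(w, grad, 0.01))     # large step: 0.98
print(sgd_step(w, grad, 0.00002))  # fine-tuning-sized step: 0.99996
```

The "too high" failure mode comes from steps like the first one repeatedly overshooting the minimum; the fine-tuning-sized step moves slowly but steadily.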
Batch Size
How many examples to process before updating weights.
- Small (8): More updates, less stable, needs less memory
- Large (128): Fewer updates, more stable, needs more memory
Typical sizes:
- Limited GPU: 8-16
- Good GPU: 32-64
- Multiple GPUs: 128+
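The updates-vs-stability tradeoff is simple arithmetic (illustrative numbers, assuming a 10,000-example dataset):

```python
import math

# Number of weight updates per epoch for a 10,000-example dataset.
# Smaller batches mean more (noisier) updates; larger batches mean
# fewer, more stable updates at a higher memory cost.
def updates_per_epoch(dataset_size, batch_size):
    return math.ceil(dataset_size / batch_size)

print(updates_per_epoch(10_000, 8))    # 1250 updates per epoch
print(updates_per_epoch(10_000, 128))  # 79 updates per epoch
```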
Epochs
How many times to go through your entire dataset.
- Too few (1): Underfitting, model hasn’t learned enough
- Too many (100): Overfitting, memorized training data
- Just right (3-10): Good balance
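One common way to land in the "just right" range is to cap the epoch count and watch a validation set. A hedged sketch with placeholder callables, not this platform's API:

```python
# Sketch: cap the epoch count and stop early once validation loss stops
# improving, guarding against the overfitting that too many epochs cause.
# train_one_epoch and eval_loss are placeholder callables for illustration.
def fit(train_one_epoch, eval_loss, max_epochs=10, patience=2):
    best, bad_epochs = float("inf"), 0
    for _ in range(max_epochs):
        train_one_epoch()
        val = eval_loss()
        if val < best:
            best, bad_epochs = val, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break  # validation loss no longer improving
    return best
```

This is why the "Use validation set" tip below matters: without it, you cannot tell epoch 3 from epoch 100.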
Secondary Settings
Warmup Steps
Gradually increase learning rate at the start.

Weight Decay
Regularization that prevents weights from getting too large.
- Default: 0.0 (for LLM fine-tuning)
- No regularization: 0
- Strong regularization: 0.1
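A minimal sketch of the two settings above, assuming linear warmup and decoupled AdamW-style decay (function names are illustrative, not a specific trainer's API):

```python
# Linear warmup sketch: ramp the learning rate from 0 to its target over
# the first warmup_steps updates, then hold it. (Real schedulers usually
# decay it afterwards; only the warmup phase is shown here.)
def warmup_lr(step, target_lr, warmup_steps):
    if step < warmup_steps:
        return target_lr * (step + 1) / warmup_steps
    return target_lr

# Decoupled (AdamW-style) weight decay: each update also shrinks the
# weight toward zero by lr * weight_decay, independent of the gradient.
# With weight_decay=0.0 -- the default noted above -- the term vanishes.
def decayed_step(w, grad, lr, weight_decay=0.0):
    return w - lr * grad - lr * weight_decay * w

print(warmup_lr(0, 2e-5, 100))            # first step: tiny rate, 2e-07
print(warmup_lr(500, 2e-5, 100))          # past warmup: full 2e-05
print(decayed_step(1.0, 0.0, 1e-3, 0.1))  # pure shrinkage: 0.9999
```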
Gradient Accumulation
Simulate larger batches on limited hardware.

Task-Specific Defaults
Text Classification
Language Model Fine-tuning
Image Classification
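The three tasks above might start from values like these. The language-model entry follows the ranges given earlier on this page; the two classification entries are common community defaults assumed here for illustration, not values taken from this page:

```python
# Illustrative per-task starting points. Only the language-model entry is
# grounded in the ranges above; the classification entries are ASSUMED
# common defaults, shown only to make the idea concrete.
TASK_DEFAULTS = {
    "text_classification":     {"learning_rate": 2e-5, "batch_size": 32, "epochs": 3},
    "language_model_finetune": {"learning_rate": 2e-5, "batch_size": 32, "epochs": 3},
    "image_classification":    {"learning_rate": 1e-3, "batch_size": 64, "epochs": 10},
}
print(TASK_DEFAULTS["language_model_finetune"])
```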
When to Adjust
Learning rate too high?
- Loss explodes or becomes NaN
- Accuracy jumps around wildly
- Never converges

Learning rate too low?
- Loss barely decreases
- Training takes forever
- Stuck at poor performance
Batch size issues?
- Out of memory → reduce batch size
- Training unstable → increase batch size
- Use gradient accumulation if memory limited
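The gradient-accumulation trick mentioned above can be sketched with scalars (a real trainer accumulates gradient tensors; the function name is illustrative):

```python
# Gradient accumulation sketch: average gradients from several small
# micro-batches, then apply a single update -- e.g. micro-batch 8 with
# 4 accumulation steps behaves roughly like batch size 32 memory-wise.
def accumulated_updates(micro_batch_grads, accum_steps=4):
    updates, total = [], 0.0
    for i, g in enumerate(micro_batch_grads, start=1):
        total += g                               # backward pass, no optimizer step
        if i % accum_steps == 0:
            updates.append(total / accum_steps)  # one "large batch" update
            total = 0.0
    return updates

print(accumulated_updates([1, 2, 3, 4, 5, 6, 7, 8]))  # [2.5, 6.5]
```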
Quick Start Values
Not sure where to start? Try these:

Evaluation Settings
Control when and how your model is evaluated during training:

| Parameter | Description | Default |
|---|---|---|
| eval_strategy | When to evaluate (epoch, steps, no) | epoch |
| eval_batch_size | Batch size for evaluation | 8 |
| use_enhanced_eval | Enable advanced metrics (BLEU, ROUGE, etc.) | False |
| eval_metrics | Metrics to compute (comma-separated) | perplexity |
| eval_save_predictions | Save model predictions | False |
| eval_benchmark | Run standard benchmark (mmlu, hellaswag, arc, truthfulqa) | None |
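Putting it together, one plausible starting configuration combining the quick-start guidance with the evaluation defaults from the table above (the plain-dict layout is illustrative, not this platform's API; warmup_steps is an assumed value, as the page gives no specific number):

```python
# One possible starting configuration. Training values follow the ranges
# discussed earlier; evaluation values are the table defaults above.
config = {
    "learning_rate": 2e-5,     # low end of the fine-tuning range
    "batch_size": 32,          # "good GPU" range
    "epochs": 3,               # low end of the 3-10 sweet spot
    "weight_decay": 0.0,       # default for LLM fine-tuning
    "warmup_steps": 100,       # ASSUMED value, for illustration only
    "eval_strategy": "epoch",
    "eval_batch_size": 8,
    "eval_metrics": "perplexity",
}
print(config["learning_rate"])
```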
Pro Tips
- Start with defaults - Don’t overthink initially
- Change one at a time - Easier to see what helps
- Log everything - Track what works for your data
- Use validation set - Monitor overfitting
Next Steps
- Evaluation Metrics: Measure your success
- How Training Works: Understand the process