LoRA & PEFT
Parameter-Efficient Fine-Tuning (PEFT) lets you train large models with much less memory.
What is LoRA?
LoRA (Low-Rank Adaptation) adds small trainable matrices to the model while keeping the base weights frozen. This dramatically reduces memory usage and training time.
Quick Start
Python API
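As a minimal sketch of what these settings mean, using the Hugging Face peft library (the model id is a placeholder, and the project's own API may differ), with the parameters from the table below mapped onto LoraConfig:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")  # placeholder model id

config = LoraConfig(
    task_type="CAUSAL_LM",
    r=16,                         # lora_r: adapter rank
    lora_alpha=32,                # scaling factor
    lora_dropout=0.05,            # dropout on adapter inputs
    target_modules="all-linear",  # adapt every linear layer
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only a small fraction of weights train
```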
Parameters
| Parameter | Description | Default |
|---|---|---|
| peft | Enable LoRA | False |
| lora_r | Rank (size of adapters) | 16 |
| lora_alpha | Scaling factor | 32 |
| lora_dropout | Dropout rate | 0.05 |
| target_modules | Modules to adapt | all-linear |
Rank (lora_r)
Higher rank = more parameters = more capacity:
| Rank | Use Case |
|---|---|
| 8 | Simple tasks, very memory constrained |
| 16 | Standard (recommended) |
| 32-64 | Complex tasks, more memory available |
| 128+ | Near full fine-tuning capacity |
Alpha
The alpha/rank ratio controls how strongly the adapter influences the model: LoRA scales its update by alpha / lora_r, so the defaults (alpha 32, rank 16) apply a factor of 2. A common heuristic is to keep alpha at 1-2x the rank.
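For reference, this is the standard LoRA forward pass, with frozen base weights $W_0$ and trainable low-rank factors $B$ and $A$:

$$h = W_0 x + \frac{\alpha}{r}\, B A x$$

Doubling alpha at a fixed rank doubles the adapter's contribution, which behaves much like raising its effective learning rate.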
Target Modules
By default, LoRA targets all linear layers (all-linear). You can customize which modules receive adapters.
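For illustration, restricting adapters to the attention projections with peft's LoraConfig (module names vary by architecture; these are typical for Llama-style models):

```python
from peft import LoraConfig

# Only the attention projection layers get LoRA adapters; all other
# linear layers stay frozen with no adapter attached.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```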
With Quantization
Combine LoRA with quantization for maximum memory savings.
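This combination is often called QLoRA: the frozen base weights sit in 4-bit while the adapters train in higher precision. A sketch with transformers, bitsandbytes, and peft (the model id is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Quantize the frozen base weights to 4-bit NF4; compute runs in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",  # placeholder model id
    quantization_config=bnb_config,
)
base = prepare_model_for_kbit_training(base)  # enables kbit-safe training hooks

# LoRA adapters are added in full precision on top of the 4-bit base.
model = get_peft_model(base, LoraConfig(r=16, lora_alpha=32, target_modules="all-linear"))
```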
Memory Comparison
| Model | Full Fine-tune | LoRA | LoRA + 4bit |
|---|---|---|---|
| 1B | 8 GB | 4 GB | 3 GB |
| 7B | 56 GB | 16 GB | 8 GB |
| 13B | 104 GB | 32 GB | 16 GB |
Merging Adapters
By default, LoRA adapters are automatically merged into the base model after training. This makes inference simpler: you get a single model file, ready to use.
Default Behavior (Merged)
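Since merging is the default, the training output loads like any ordinary checkpoint, with no peft dependency at inference. A sketch with a placeholder output path:

```python
from transformers import AutoModelForCausalLM

# The merged output is a standalone model: base weights with the LoRA
# deltas already folded in, so nothing adapter-specific is needed here.
model = AutoModelForCausalLM.from_pretrained("output/merged-model")  # placeholder path
```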
Save Adapters Only
To save only the adapter files (smaller, but requires the base model for inference), disable the automatic merge.
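How you disable the merge depends on the training configuration (check the API reference). Mechanically, saving a PeftModel, such as `model` from the quick start above, writes only the adapter files:

```python
# Writes just adapter_config.json and adapter_model.safetensors, typically
# a few MB to a few hundred MB instead of a multi-GB merged checkpoint.
model.save_pretrained("my-adapter")  # placeholder output path
```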
Manual Merge Later
You must specify either --output-folder to save locally or --push-to-hub to upload to Hugging Face Hub.
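The merge tool is a command-line utility whose flags are listed below. As an illustration of what the merge does, a minimal sketch with the Hugging Face peft library (all model ids and paths are placeholders):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the frozen base model, attach the trained adapter, fold the
# low-rank deltas into the base weights, and save one standalone model.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
model = PeftModel.from_pretrained(base, "my-adapter")
merged = model.merge_and_unload()
merged.save_pretrained("merged-model")
# merged.push_to_hub("user/merged-model")  # or upload to the Hub instead
```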
Merge Tool Parameters
| Parameter | Description | Required |
|---|---|---|
| --base-model-path | Base model to merge the adapter into | Yes |
| --adapter-path | Path to LoRA adapter | Yes |
| --output-folder | Local output directory | One of these two |
| --push-to-hub | Push merged model to Hugging Face Hub | One of these two |
| --token | Hugging Face token (for hub push) | No |
| --pad-to-multiple-of | Pad vocabulary size to a multiple of this value | No |
Convert to Kohya Format
Convert LoRA adapters to Kohya-compatible .safetensors format.
Loading Adapters
Use adapters without merging.
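A sketch with peft's PeftModel, using placeholder ids and paths. The base model stays untouched on disk; the adapter is applied at load time, so no merged copy is ever written:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# The adapter rides on top of the base weights in memory.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
model = PeftModel.from_pretrained(base, "my-adapter")
```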
Best Practices
Training
- Use a higher learning rate than for full fine-tuning (2e-4 to 1e-3)
- LoRA benefits from longer training
- Consider targeting all linear layers for complex tasks
Memory
- Start with lora_r=16
- Add quantization if needed
- Use gradient checkpointing (on by default)
Quality
- Higher rank generally = better quality
- Test on your specific task
- Compare with full fine-tuning if memory allows
Next Steps
- Quantization: further memory reduction
- DPO Training: preference optimization