LLM Training
The aitraining llm command trains large language models, with support for multiple trainers and techniques.
Quick Start
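A minimal run using only the basic parameters documented below; the model, data path, and output name are placeholders to adjust for your setup:

```bash
# Minimal supervised fine-tune: one epoch with the default SFT trainer.
# Model, data path, and project name are placeholders.
aitraining llm \
  --model google/gemma-3-270m \
  --data-path data \
  --project-name ./models/quickstart \
  --trainer sft \
  --epochs 1 \
  --batch-size 2
```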
Available Trainers
| Trainer | Description |
|---|---|
| default / sft / generic | Supervised fine-tuning |
| dpo | Direct Preference Optimization |
| orpo | Odds Ratio Preference Optimization |
| ppo | Proximal Policy Optimization |
| grpo | Group Relative Policy Optimization (custom environments) |
| reward | Reward model training |
| distillation | Knowledge distillation |
generic is an alias for default. All three (default, sft, generic) produce the same behavior.
Parameter Groups
Parameters are organized into logical groups:
Basic Parameters
| Parameter | Description | Default |
|---|---|---|
| --model | Base model to fine-tune | google/gemma-3-270m |
| --data-path | Path to training data | data |
| --project-name | Output directory name | project-name |
| --train-split | Training data split | train |
| --valid-split | Validation data split | None |
Always specify these parameters: While --model, --data-path, and --project-name have defaults, you should always set them explicitly for your use case. The --project-name parameter sets the output folder; use a path like --project-name ./models/my-experiment to control where the trained model is saved.
Training Configuration
| Parameter | Description | Default |
|---|---|---|
| --trainer | Training method | default |
| --epochs | Number of training epochs | 1 |
| --batch-size | Training batch size | 2 |
| --lr | Learning rate | 3e-5 |
| --mixed-precision | fp16/bf16/None | None |
| --gradient-accumulation | Accumulation steps | 4 |
| --warmup-ratio | Warmup ratio | 0.1 |
| --optimizer | Optimizer | adamw_torch |
| --scheduler | LR scheduler | linear |
| --weight-decay | Weight decay | 0.0 |
| --max-grad-norm | Max gradient norm | 1.0 |
| --seed | Random seed | 42 |
Checkpointing & Evaluation
| Parameter | Description | Default |
|---|---|---|
| --eval-strategy | When to evaluate (epoch, steps, no) | epoch |
| --save-strategy | When to save (epoch, steps, no) | epoch |
| --save-steps | Save every N steps (if save-strategy=steps) | 500 |
| --save-total-limit | Max checkpoints to keep | 1 |
| --logging-steps | Log every N steps (-1 for auto) | -1 |
| --resume-from-checkpoint | Resume from checkpoint path, or auto to detect latest | None |
Performance & Memory
| Parameter | Description | Default |
|---|---|---|
| --auto-find-batch-size | Automatically find optimal batch size | False |
| --disable-gradient-checkpointing | Disable memory optimization | False |
| --unsloth | Use Unsloth for faster training (SFT only, llama/mistral/gemma/qwen2) | False |
| --use-sharegpt-mapping | Use Unsloth’s ShareGPT mapping | False |
| --use-flash-attention-2 | Use Flash Attention 2 for faster training | False |
| --attn-implementation | Attention implementation (eager, sdpa, flash_attention_2) | None |
Unsloth Requirements: Unsloth only works with sft/default trainers and specific model architectures (llama, mistral, gemma, qwen2). See Unsloth Integration for details.
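For example, an Unsloth-accelerated SFT run might look like this (a sketch built only from the flags documented above):

```bash
# Unsloth speeds up SFT on supported architectures (llama, mistral, gemma, qwen2).
aitraining llm \
  --model google/gemma-3-270m \
  --data-path data \
  --project-name ./models/unsloth-sft \
  --trainer sft \
  --unsloth
```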
Backend & Distribution
| Parameter | Description | Default |
|---|---|---|
| --backend | Where to run (local, spaces) | local |
| --distributed-backend | Distribution backend (ddp, deepspeed) | None |
| --ddp-timeout | DDP/NCCL timeout in seconds | 7200 |
Multi-GPU Behavior: With multiple GPUs and --distributed-backend not set, DDP is used automatically. Set --distributed-backend deepspeed for DeepSpeed ZeRO-3 optimization. Training is launched via Accelerate.
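For example, a multi-GPU DeepSpeed run might look like this (a sketch using only the flags documented above):

```bash
# Multi-GPU training with DeepSpeed ZeRO-3, launched via Accelerate.
aitraining llm \
  --model google/gemma-3-270m \
  --data-path data \
  --project-name ./models/deepspeed-run \
  --distributed-backend deepspeed \
  --mixed-precision bf16
```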
PEFT/LoRA Parameters
| Parameter | Description | Default |
|---|---|---|
| --peft | Enable LoRA training | False |
| --lora-r | LoRA rank | 16 |
| --lora-alpha | LoRA alpha | 32 |
| --lora-dropout | LoRA dropout | 0.05 |
| --target-modules | Modules to target | all-linear |
| --quantization | int4/int8 quantization | None |
| --merge-adapter | Merge LoRA after training | True |
Data Processing
| Parameter | Description | Default |
|---|---|---|
| --text-column | Text column name | text |
| --block-size | Max sequence length | -1 (model default) |
| --model-max-length | Maximum model input length | Auto-detect from model |
| --padding | Padding side (left or right) | right |
| --add-eos-token | Append EOS token | True |
| --chat-template | Chat template to use | Auto by trainer |
| --packing | Enable sequence packing (requires flash attention) | None |
| --auto-convert-dataset | Auto-detect and convert dataset format | False |
| --max-samples | Limit dataset size for testing | None |
| --save-processed-data | Save processed data: auto, local, hub, both, none | auto |
Chat Template Auto-Selection: SFT/DPO/ORPO/Reward trainers default to tokenizer (the model’s built-in template). Use --chat-template none for plain-text training.
Processed Data Saving: By default (auto), processed data is saved locally to {project}/data_processed/. If the source dataset was from the Hub, it’s also pushed as a private dataset. Original columns are renamed to _original_* to prevent conflicts.
Training Examples
SFT with LoRA
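A sketch combining the SFT trainer with the PEFT/LoRA flags documented above; the hyperparameter values are illustrative:

```bash
# SFT with LoRA adapters; int4 quantization reduces memory, and the
# adapter is merged back into the base model after training (the default).
aitraining llm \
  --model google/gemma-3-270m \
  --data-path data \
  --project-name ./models/sft-lora \
  --trainer sft \
  --peft \
  --lora-r 16 \
  --lora-alpha 32 \
  --lora-dropout 0.05 \
  --quantization int4 \
  --epochs 3
```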
DPO Training
For DPO, you must specify the column names for prompt, chosen, and rejected responses:
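A sketch of a DPO run. The three column flag names below are assumptions (they are not listed in the parameter tables above); confirm the exact names via View All Parameters:

```bash
# DPO on preference data with prompt / chosen / rejected columns.
# NOTE: the *-column flag names are assumptions; verify them with the help output.
aitraining llm \
  --model google/gemma-3-270m \
  --data-path data \
  --project-name ./models/dpo-run \
  --trainer dpo \
  --prompt-text-column prompt \
  --text-column chosen \
  --rejected-text-column rejected
```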
ORPO Training
ORPO combines SFT and preference optimization:
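A sketch reusing the same (assumed) column flags as the DPO example; because ORPO trains in a single stage, no separate reference model is configured:

```bash
# ORPO: SFT and preference optimization in one pass over preference data.
# NOTE: column flag names are the same assumption as in the DPO example.
aitraining llm \
  --model google/gemma-3-270m \
  --data-path data \
  --project-name ./models/orpo-run \
  --trainer orpo \
  --prompt-text-column prompt \
  --text-column chosen \
  --rejected-text-column rejected
```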
GRPO Training
Train with Group Relative Policy Optimization using your own reward environment. GRPO generates multiple completions per prompt, scores them via your environment (0-1), and optimizes the policy. See GRPO Training for environment interface details.
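A sketch of a GRPO invocation. The --environment flag is a hypothetical name for pointing at your reward environment; the real interface is described on the GRPO Training page:

```bash
# GRPO with a custom reward environment scoring completions in [0, 1].
# NOTE: --environment is a hypothetical flag name used for illustration.
aitraining llm \
  --model google/gemma-3-270m \
  --data-path data \
  --project-name ./models/grpo-run \
  --trainer grpo \
  --environment my_reward_env.py
```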
Knowledge Distillation
Train a smaller model to mimic a larger one:
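A sketch of a distillation run. The --distill-teacher flag is an assumed name for selecting the teacher model (only the three distillation defaults below are documented); the student is the --model being trained:

```bash
# Knowledge distillation: a small student mimics a larger teacher.
# NOTE: --distill-teacher is a hypothetical flag name; the documented knobs
# are --distill-temperature, --distill-alpha, and --distill-max-teacher-length.
aitraining llm \
  --model google/gemma-3-270m \
  --data-path data \
  --project-name ./models/distilled \
  --trainer distillation \
  --distill-teacher google/gemma-3-27b-it \
  --distill-temperature 3.0 \
  --distill-alpha 0.7
```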
Distillation defaults: --distill-temperature 3.0, --distill-alpha 0.7, --distill-max-teacher-length 512.
Logging & Monitoring
Weights & Biases (Default)
W&B logging with the LEET visualizer is enabled by default. The LEET visualizer shows real-time training metrics directly in your terminal.
TensorBoard
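To log to TensorBoard instead, an invocation along these lines should work; the --log flag is an assumption (it does not appear in the parameter tables above), so confirm it via View All Parameters:

```bash
# Switch logging from W&B to TensorBoard.
# NOTE: --log is an assumed flag name; verify it in the parameter listing.
aitraining llm \
  --model google/gemma-3-270m \
  --data-path data \
  --project-name ./models/tb-run \
  --log tensorboard
```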
Push to Hugging Face Hub
Upload your trained model:
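A sketch using the Hub flags from the table below; the username and token are placeholders:

```bash
# Train, then push the result to the Hub as {username}/{project-name} (private).
aitraining llm \
  --model google/gemma-3-270m \
  --data-path data \
  --project-name my-experiment \
  --push-to-hub \
  --username your-hf-username \
  --token "$HF_TOKEN"
```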
The repository is created as private by default and named {username}/{project-name}.
Custom Repository Name or Organization
Use --repo-id to push to a specific repository, as in the sketch below. This is useful for:
- Pushing to an organization instead of your personal account
- Using a different repo name than your local project-name
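For example, to push under an organization (the org and model names are placeholders):

```bash
# Push to an organization repo instead of {username}/{project-name}.
aitraining llm \
  --model google/gemma-3-270m \
  --data-path data \
  --project-name my-experiment \
  --push-to-hub \
  --token "$HF_TOKEN" \
  --repo-id my-org/my-model-name
```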
| Parameter | Description | Default |
|---|---|---|
| --push-to-hub | Enable pushing to Hub | False |
| --hub-private / --no-hub-private | Create repo as private or public | True (private) |
| --username | HF username (for default repo naming) | None |
| --token | HF API token | None |
| --repo-id | Full repo ID (e.g., org/model-name) | {username}/{project-name} |
Advanced Options
Hyperparameter Sweeps
Enhanced Evaluation
View All Parameters
See all parameters for a specific trainer:
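The exact help invocation is an assumption based on common CLI conventions:

```bash
# List every flag for the llm command, including trainer-specific ones (assumed syntax).
aitraining llm --help
```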
Next Steps
- YAML Configs: Use configuration files
- DPO Training: Deep dive into DPO
- LoRA/PEFT: Efficient fine-tuning
- Distillation: Knowledge distillation
- GRPO Training: RL with custom environments