Batch Processing
Run multiple training experiments systematically.
Multiple Configurations
Sequential Runs
Run different configurations in sequence:
for config in configs/*.yaml; do
  echo "Running $config..."
  aitraining --config "$config"
done
Parallel Runs
Run on different GPUs simultaneously:
CUDA_VISIBLE_DEVICES=0 aitraining --config config1.yaml &
CUDA_VISIBLE_DEVICES=1 aitraining --config config2.yaml &
wait
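With more configurations than GPUs, you can process them in fixed-size batches. Below is a minimal Python sketch of that pattern; the GPU list and the configs/ directory are assumptions, so adapt both to your machine:
import os
import subprocess
from pathlib import Path

GPUS = [0, 1]
configs = sorted(Path("configs").glob("*.yaml"))

# Launch one run per GPU, wait for the whole batch, then start the next
for i in range(0, len(configs), len(GPUS)):
    batch = configs[i:i + len(GPUS)]
    procs = [
        subprocess.Popen(
            ["aitraining", "--config", str(cfg)],
            env=dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu)),
        )
        for gpu, cfg in zip(GPUS, batch)
    ]
    for p in procs:
        p.wait()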
Parameter Sweeps
Manual Sweep
for lr in 1e-5 2e-5 5e-5; do
  for bs in 4 8 16; do
    aitraining llm --train \
      --model google/gemma-3-270m \
      --data-path ./data \
      --project-name "exp-lr${lr}-bs${bs}" \
      --lr $lr \
      --batch-size $bs
  done
done
Built-in Sweeps
Use the hyperparameter sweep feature:
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name sweep-experiment \
  --use-sweep \
  --sweep-backend optuna \
  --sweep-n-trials 20
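Conceptually, a sweep backend repeatedly samples hyperparameters, launches a run, and scores it. For intuition, here is a rough, hypothetical sketch of that loop written directly against Optuna; the search ranges and the trainer_state.json location inside the project directory are assumptions, not the tool's documented behavior:
import json
import subprocess
import optuna

def objective(trial):
    # Sample hyperparameters for this trial
    lr = trial.suggest_float("lr", 1e-5, 1e-4, log=True)
    bs = trial.suggest_categorical("batch_size", [4, 8, 16])
    name = f"sweep-trial-{trial.number}"
    subprocess.run(
        ["aitraining", "llm", "--train",
         "--model", "google/gemma-3-270m",
         "--data-path", "./data",
         "--project-name", name,
         "--lr", str(lr),
         "--batch-size", str(bs)],
        check=True,
    )
    # Score the run from its saved training state (path is an assumption)
    with open(f"{name}/trainer_state.json") as f:
        return json.load(f)["best_metric"]

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print("Best params:", study.best_params)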
Experiment Scripts
Basic Script
#!/bin/bash
# experiments.sh
MODELS=(
  "google/gemma-3-270m"
  "google/gemma-2-2b"
)
TRAINERS=(
  "sft"
  "dpo"
)
for model in "${MODELS[@]}"; do
  for trainer in "${TRAINERS[@]}"; do
    name="$(basename "$model")-$trainer"
    aitraining llm --train \
      --model "$model" \
      --data-path ./data \
      --trainer "$trainer" \
      --project-name "$name"
  done
done
Script with Logging
#!/bin/bash
# run_experiments.sh
LOG_DIR="logs/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$LOG_DIR"
run_experiment() {
  local config="$1"
  local name
  name=$(basename "$config" .yaml)
  echo "[$(date)] Starting $name"
  aitraining --config "$config" 2>&1 | tee "$LOG_DIR/$name.log"
  echo "[$(date)] Finished $name"
}
for config in experiments/*.yaml; do
  run_experiment "$config"
done
echo "All experiments complete. Logs in $LOG_DIR"
Job Management
Background Jobs
# Start in background
nohup aitraining --config config.yaml > training.log 2>&1 &
echo $! > training.pid
# Check status
ps -p $(cat training.pid)
# Stop job
kill $(cat training.pid)
tmux Sessions
# Create session
tmux new-session -d -s training
# Run training
tmux send-keys -t training "aitraining --config config.yaml" Enter
# Attach to see output
tmux attach -t training
# Detach: Ctrl+B, D
Collecting Results
Aggregate Metrics
import json
from pathlib import Path

results = []
for exp_dir in Path("experiments").glob("*/"):
    # Training state is saved in trainer_state.json
    state_file = exp_dir / "trainer_state.json"
    if state_file.exists():
        with open(state_file) as f:
            state = json.load(f)
        results.append({
            "experiment": exp_dir.name,
            "best_metric": state.get("best_metric"),
            "global_step": state.get("global_step"),
            "epoch": state.get("epoch"),
        })

# Sort by best_metric (typically eval_loss, so lower is better)
results.sort(key=lambda x: x.get("best_metric") or float("inf"))

# Print best, guarding against an empty experiments directory
if results:
    print("Best experiment:", results[0]["experiment"])
When using --log wandb, all experiments are tracked. Set the W&B project via an environment variable:
# Set W&B project for all runs
export WANDB_PROJECT=my-experiments
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name exp-1 \
  --log wandb
View comparisons in the W&B dashboard.
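Runs can also be compared programmatically through the W&B API. A minimal sketch follows; the entity name and the eval/loss summary key are placeholders, so substitute whatever your runs actually log:
import wandb

api = wandb.Api()
runs = api.runs("my-entity/my-experiments")  # "<entity>/<project>"
# Sort finished runs by their logged eval loss, best first
for run in sorted(runs, key=lambda r: r.summary.get("eval/loss", float("inf"))):
    print(run.name, run.summary.get("eval/loss"))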
Next Steps
Pipeline Automation
Build training pipelines
Logging and Debugging
Monitor and debug training