Batch Processing
Run multiple training experiments systematically.
Multiple Configurations
Sequential Runs
Run different configurations in sequence:
for config in configs/*.yaml; do
  echo "Running $config..."
  aitraining --config "$config"
done
Parallel Runs
Run on different GPUs simultaneously:
CUDA_VISIBLE_DEVICES=0 aitraining --config config1.yaml &
CUDA_VISIBLE_DEVICES=1 aitraining --config config2.yaml &
wait
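With more configs than GPUs, a round-robin assignment maps config i to GPU i mod N. A minimal sketch (the training command is echoed rather than executed so the example is self-contained; in a real script, run each command in the background with & and wait at the end, as above):

```shell
#!/bin/bash
# Assign config i to GPU (i mod NUM_GPUS), round-robin.
# Echo stands in for the actual aitraining invocation.
NUM_GPUS=2
CONFIGS=(config1.yaml config2.yaml config3.yaml config4.yaml)

i=0
for config in "${CONFIGS[@]}"; do
  gpu=$(( i % NUM_GPUS ))
  echo "CUDA_VISIBLE_DEVICES=$gpu aitraining --config $config"
  i=$(( i + 1 ))
done
```

Note that this only round-robins the assignment; launching all jobs at once can still oversubscribe a GPU, so for long queues consider one sequential loop per GPU instead.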
Parameter Sweeps
Manual Sweep
for lr in 1e-5 2e-5 5e-5; do
  for bs in 4 8 16; do
    aitraining llm --train \
      --model google/gemma-3-270m \
      --data-path ./data \
      --project-name "exp-lr${lr}-bs${bs}" \
      --lr "$lr" \
      --batch-size "$bs"
  done
done
Built-in Sweeps
Use the built-in hyperparameter sweep feature:
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name sweep-experiment \
  --use-sweep \
  --sweep-backend optuna \
  --sweep-n-trials 20
Experiment Scripts
Basic Script
#!/bin/bash
# experiments.sh
MODELS=(
  "google/gemma-3-270m"
  "google/gemma-2-2b"
)
TRAINERS=(
  "sft"
  "dpo"
)

for model in "${MODELS[@]}"; do
  for trainer in "${TRAINERS[@]}"; do
    name="$(basename "$model")-$trainer"
    aitraining llm --train \
      --model "$model" \
      --data-path ./data \
      --trainer "$trainer" \
      --project-name "$name"
  done
done
With Logging
#!/bin/bash
# run_experiments.sh
LOG_DIR="logs/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$LOG_DIR"

run_experiment() {
  local config=$1
  local name=$(basename "$config" .yaml)
  echo "[$(date)] Starting $name"
  aitraining --config "$config" 2>&1 | tee "$LOG_DIR/$name.log"
  echo "[$(date)] Finished $name"
}

for config in experiments/*.yaml; do
  run_experiment "$config"
done
echo "All experiments complete. Logs in $LOG_DIR"
Job Management
Background Jobs
# Start in background
nohup aitraining --config config.yaml > training.log 2>&1 &
echo $! > training.pid
# Check status
ps -p $(cat training.pid)
# Stop job
kill $(cat training.pid)
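The PID-file steps above can be wrapped in small helpers (a sketch; sleep stands in for the long-running training process):

```shell
#!/bin/bash
# start_job launches any command in the background, detached from the
# terminal, and records its PID; job_running / stop_job use that PID file.
start_job() {
  nohup "$@" > training.log 2>&1 &
  echo $! > training.pid
}

job_running() {
  ps -p "$(cat training.pid)" > /dev/null 2>&1
}

stop_job() {
  kill "$(cat training.pid)" 2> /dev/null
  rm -f training.pid
}

# 'sleep 60' stands in for: start_job aitraining --config config.yaml
start_job sleep 60
job_running && echo "job is running"
stop_job
```

Removing the PID file in stop_job keeps a stale PID from matching an unrelated process later.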
tmux Sessions
# Create session
tmux new-session -d -s training
# Run training
tmux send-keys -t training "aitraining --config config.yaml" Enter
# Attach to see output
tmux attach -t training
# Detach: Ctrl+B, D
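The same pattern scales to one session per configuration. A sketch (the tmux command is echoed rather than executed, so the loop can be read and run in isolation):

```shell
#!/bin/bash
# Derive a session name from each config file and print the tmux
# command that would launch training for it in a detached session.
for config in config1.yaml config2.yaml; do
  name=$(basename "$config" .yaml)
  echo tmux new-session -d -s "train-$name" "aitraining --config $config"
done
```

Drop the echo to actually launch the sessions, then list them with tmux ls.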
Collecting Results
Aggregating Metrics
import json
from pathlib import Path

results = []
for exp_dir in Path("experiments").glob("*/"):
    # Training state is saved in trainer_state.json
    state_file = exp_dir / "trainer_state.json"
    if state_file.exists():
        with open(state_file) as f:
            state = json.load(f)
        results.append({
            "experiment": exp_dir.name,
            "best_metric": state.get("best_metric"),
            "global_step": state.get("global_step"),
            "epoch": state.get("epoch"),
        })

# Sort by best_metric (typically eval_loss)
results.sort(key=lambda x: x.get("best_metric") or float("inf"))

# Print best, if any experiments were found
if results:
    print("Best experiment:", results[0]["experiment"])
Comparing with W&B
With --log wandb, every experiment is tracked. Set the W&B project for all runs via an environment variable:
# Set W&B project for all runs
export WANDB_PROJECT=my-experiments
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name exp-1 \
  --log wandb
View the comparisons in the W&B dashboard.
Next Steps
Pipeline Automation
Build training pipelines
Logging and Debugging
Monitor and debug training