Batch Processing

Run multiple training experiments systematically.

Multiple Configurations

Sequential Runs

Run different configurations in sequence:
for config in configs/*.yaml; do
  echo "Running $config..."
  aitraining --config "$config"
done
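
The loop above moves on to the next config even if a run fails. A minimal variation that also records failures (failures.txt is a hypothetical path, used here only for illustration):

for config in configs/*.yaml; do
  echo "Running $config..."
  # Log the config name if aitraining exits non-zero, then continue
  if ! aitraining --config "$config"; then
    echo "$config" >> failures.txt
  fi
done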

Parallel Runs

Run on different GPUs simultaneously:
CUDA_VISIBLE_DEVICES=0 aitraining --config config1.yaml &
CUDA_VISIBLE_DEVICES=1 aitraining --config config2.yaml &
wait
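
A bare wait does not report which job failed. A sketch that waits on each PID individually to surface per-run exit codes:

# Capture each job's PID, then check its exit status separately
CUDA_VISIBLE_DEVICES=0 aitraining --config config1.yaml & pid1=$!
CUDA_VISIBLE_DEVICES=1 aitraining --config config2.yaml & pid2=$!
wait "$pid1" || echo "config1.yaml failed"
wait "$pid2" || echo "config2.yaml failed"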

Parameter Sweeps

Manual Sweep

for lr in 1e-5 2e-5 5e-5; do
  for bs in 4 8 16; do
    aitraining llm --train \
      --model google/gemma-3-270m \
      --data-path ./data \
      --project-name "exp-lr${lr}-bs${bs}" \
      --lr $lr \
      --batch-size $bs
  done
done
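
If GNU parallel is available, the same grid can run with a bounded number of concurrent jobs; a sketch (the -j 2 limit is an example, and note this does not pin jobs to specific GPUs):

# Run the 3x3 grid, at most two jobs at a time
parallel -j 2 aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name "exp-lr{1}-bs{2}" \
  --lr {1} --batch-size {2} \
  ::: 1e-5 2e-5 5e-5 ::: 4 8 16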

Built-in Sweeps

Use the built-in hyperparameter sweep feature:
aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name sweep-experiment \
  --use-sweep \
  --sweep-backend optuna \
  --sweep-n-trials 20

Experiment Scripts

Basic Script

#!/bin/bash
# experiments.sh

MODELS=(
  "google/gemma-3-270m"
  "google/gemma-2-2b"
)

TRAINERS=(
  "sft"
  "dpo"
)

for model in "${MODELS[@]}"; do
  for trainer in "${TRAINERS[@]}"; do
    name="$(basename "$model")-$trainer"
    aitraining llm --train \
      --model "$model" \
      --data-path ./data \
      --trainer "$trainer" \
      --project-name "$name"
  done
done

With Logging

#!/bin/bash
# run_experiments.sh

LOG_DIR="logs/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$LOG_DIR"

run_experiment() {
  local config=$1
  local name=$(basename "$config" .yaml)

  echo "[$(date)] Starting $name"
  aitraining --config "$config" 2>&1 | tee "$LOG_DIR/$name.log"
  echo "[$(date)] Finished $name"
}

for config in experiments/*.yaml; do
  run_experiment "$config"
done

echo "All experiments complete. Logs in $LOG_DIR"

Job Management

Background Jobs

# Start in background
nohup aitraining --config config.yaml > training.log 2>&1 &
echo $! > training.pid

# Check status
ps -p $(cat training.pid)
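
# Follow live output
tail -f training.log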

# Stop job
kill $(cat training.pid)

tmux Sessions

# Create session
tmux new-session -d -s training

# Run training
tmux send-keys -t training "aitraining --config config.yaml" Enter

# Attach to see output
tmux attach -t training

# Detach: Ctrl+B, D
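
# Kill the session when finished
tmux kill-session -t training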

Collecting Results

Aggregate Metrics

import json
from pathlib import Path

results = []
for exp_dir in Path("experiments").glob("*/"):
    # Training state is saved in trainer_state.json
    state_file = exp_dir / "trainer_state.json"
    if state_file.exists():
        with open(state_file) as f:
            state = json.load(f)
        results.append({
            "experiment": exp_dir.name,
            "best_metric": state.get("best_metric"),
            "global_step": state.get("global_step"),
            "epoch": state.get("epoch"),
        })

# Sort by best_metric (typically eval_loss)
results.sort(key=lambda x: x.get("best_metric") or float("inf"))

# Print best (guard against an empty experiments directory)
if results:
    print("Best experiment:", results[0]["experiment"])

Comparing with W&B

When using --log wandb, all experiments are tracked. Set the W&B project for all runs via an environment variable:
# Set W&B project for all runs
export WANDB_PROJECT=my-experiments

aitraining llm --train \
  --model google/gemma-3-270m \
  --data-path ./data \
  --project-name exp-1 \
  --log wandb
View comparisons in the W&B dashboard.

Next Steps

Pipeline Automation

Build training pipelines

Logging and Debugging

Monitor and debug training