Documentation Index
Fetch the complete documentation index at: https://docs.monostate.ai/llms.txt
Use this file to discover all available pages before exploring further.
Generation Parameters
Adjust these settings to control model output.
Key Parameters
Temperature
Controls randomness in responses.
| Value | Effect | Use Case |
|---|---|---|
| 0.0 - 0.3 | Very consistent, deterministic | Factual answers, code |
| 0.5 - 0.7 | Balanced | General conversation |
| 0.8 - 1.0 | More varied, creative | Creative writing |
| 1.0+ | Very random | Brainstorming |
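Conceptually, temperature divides the model's logits before the softmax, so low values sharpen the distribution and high values flatten it. A minimal sketch in plain Python (illustrative only, not this product's internals):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by temperature, then convert to probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.2)  # sharply peaked: near-deterministic
warm = softmax_with_temperature(logits, 1.5)  # flatter: more varied choices
```

With `temperature=0.2` the top token dominates; with `temperature=1.5` the probability mass spreads out, which is why higher values feel more "creative".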
Max Tokens
Maximum length of the response.
| Value | Typical Use |
|---|---|
| 50-100 | Short answers |
| 256 | Standard responses |
| 512-1024 | Detailed explanations |
| 2048+ | Long-form content |
A higher max_tokens cap allows longer outputs, and longer outputs take longer to generate.
Top-p (Nucleus Sampling)
Limits token selection to the smallest set of tokens whose cumulative probability reaches the threshold.
- 0.95 (UI default) - Consider tokens until 95% of the probability mass is covered
- 0.9 - Slightly more focused
- 0.5 - Very focused
Top-k
Limits selection to the k most likely tokens.
- 50 (default) - Consider top 50 tokens
- 10 - Very focused
- 100 - More variety
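Both filters can be sketched directly on a probability distribution. The following is an illustrative implementation of the two ideas, not the service's actual sampling code:

```python
def top_k_filter(probs, k):
    """Keep only the k highest-probability tokens, then renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(ranked[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability >= p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for i in ranked:
        keep.add(i)
        cum += probs[i]
        if cum >= p:
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

probs = [0.5, 0.3, 0.15, 0.05]
top_k_filter(probs, 2)    # only the two most likely tokens survive
top_p_filter(probs, 0.9)  # tokens are added until 90% of the mass is covered
```

Note the difference: top-k keeps a fixed count regardless of how probability is distributed, while top-p adapts the count to the shape of the distribution.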
Parameter Combinations
Factual Q&A
Creative Writing
Code Generation
Conversation
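The four use cases above map naturally onto parameter presets. The specific values below are assumptions chosen to match the guidance in the tables earlier on this page, not settings prescribed by this documentation:

```python
# Illustrative starting points only; tune against your own test prompts.
PRESETS = {
    "factual_qa":       {"temperature": 0.2, "top_p": 0.9,  "max_tokens": 256},
    "creative_writing": {"temperature": 0.9, "top_p": 0.95, "max_tokens": 1024},
    "code_generation":  {"temperature": 0.2, "top_p": 0.95, "max_tokens": 512},
    "conversation":     {"temperature": 0.7, "top_p": 0.95, "max_tokens": 256},
}
```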
Finding the Right Settings
Start with Defaults
Default settings work for most cases:
- temperature: 0.7
- max_tokens: 256
- top_p: 0.95
- top_k: 50
- do_sample: true
UI Slider Ranges
The chat interface provides these parameter ranges:
| Parameter | Min | Max | Step | Default |
|---|---|---|---|---|
| Temperature | 0 | 2 | 0.1 | 0.7 |
| Max Tokens | 50 | 2048 | 50 | 256 |
| Top P | 0 | 1 | 0.05 | 0.95 |
| Top K | 0 | 100 | 5 | 50 |
Adjust One at a Time
- If responses are too random → lower temperature
- If responses are too repetitive → raise temperature
- If responses are cut off → increase max_tokens
- If responses are too long → decrease max_tokens
Test Systematically
For important applications:
- Pick 5-10 test prompts
- Try each parameter setting
- Compare outputs
- Document what works
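The steps above can be sketched as a simple sweep loop. `generate` here is a stand-in for whichever client call you actually use; it is a hypothetical placeholder, not a real API:

```python
# Sketch of a systematic parameter sweep over a small prompt set.
def generate(prompt, temperature):
    """Placeholder for a real model call."""
    return f"[output for {prompt!r} at T={temperature}]"

prompts = ["Summarize HTTP caching.", "Write a haiku about rain."]
results = {}
for temperature in (0.2, 0.7, 1.0):  # vary one parameter at a time
    for prompt in prompts:
        results[(prompt, temperature)] = generate(prompt, temperature)
# Review the entries in `results` side by side and record which settings worked.
```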
Advanced Parameters
Repetition Penalty
Reduces repeated phrases.
- 1.0 - No penalty
- 1.1 - Mild penalty (recommended)
- 1.3+ - Strong penalty
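One common formulation (the CTRL-style penalty used by several open-source inference libraries; whether this product uses exactly this scheme is an assumption) rescales the logits of tokens that have already been generated:

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Make already-seen tokens less likely: positive logits are divided
    by the penalty, negative logits are multiplied by it."""
    out = list(logits)
    for t in set(generated_ids):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

logits = [2.0, -1.0, 0.5]
apply_repetition_penalty(logits, [0, 1], 1.3)  # tokens 0 and 1 become less likely
```

A penalty of 1.0 leaves the logits unchanged, which is why it means "no penalty".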
Stop Sequences
End generation when any of these sequences appears in the output.
- Useful for structured output
- Example:
["\n\n", "User:"]
Do Sample
Controls whether to use sampling or greedy decoding.
- true (default) - Use sampling with temperature/top-p/top-k
- false - Greedy decoding (always pick most likely token)
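The two decoding modes differ in how the next token is picked from the probability distribution. A minimal sketch of the distinction:

```python
import random

def greedy_pick(probs):
    """do_sample=False: always take the single most likely token."""
    return max(range(len(probs)), key=lambda i: probs[i])

def sample_pick(probs, rng):
    """do_sample=True: draw a token at random according to the distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

probs = [0.1, 0.7, 0.2]
greedy_pick(probs)                    # always 1
sample_pick(probs, random.Random(0))  # usually 1, sometimes 0 or 2
```

With `do_sample=false`, temperature, top-p, and top-k have no effect, since no random draw ever happens.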
System Prompt
Set a system message to guide model behavior. Available in the chat interface settings panel. Example system prompts:
- “You are a helpful coding assistant. Provide concise code examples.”
- “You are a creative writing partner. Be imaginative and descriptive.”
- “You are a technical documentation expert. Be precise and thorough.”
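Programmatically, a system prompt is typically sent as the first message in a chat-style request. Whether this product's Python API uses exactly this message shape is an assumption; check the Python API documentation linked below:

```python
# Common chat-message structure: the system message comes first and
# steers all subsequent turns.
messages = [
    {"role": "system",
     "content": "You are a helpful coding assistant. Provide concise code examples."},
    {"role": "user",
     "content": "Show me how to reverse a list in Python."},
]
```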
Parameter Effects Summary
| Parameter | Low Value | High Value |
|---|---|---|
| temperature | Consistent, focused | Random, creative |
| max_tokens | Short responses | Long responses |
| top_p | Focused | Varied |
| top_k | Very focused | More options |
| repetition_penalty | May repeat | Avoids repetition |
Next Steps
- CLI Training - Train models with the CLI
- Python API - Programmatic control