Documentation Index
Fetch the complete documentation index at: https://docs.monostate.ai/llms.txt
Use this file to discover all available pages before exploring further.
Generation Parameters
Adjust these settings to control model output.
Key Parameters
Temperature
Controls randomness in responses.
| Value | Effect | Use Case |
|---|---|---|
| 0.0 - 0.3 | Very consistent, deterministic | Factual answers, code |
| 0.5 - 0.7 | Balanced | General conversation |
| 0.8 - 1.0 | More varied, creative | Creative writing |
| 1.0+ | Very random | Brainstorming |
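Conceptually, temperature divides the model's logits before the softmax, so low values sharpen the distribution and high values flatten it. A minimal sketch in plain Python (illustrative only, not this product's internals):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by temperature, then convert to probabilities."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cold = softmax_with_temperature(logits, 0.2)  # sharply peaked: near-deterministic
warm = softmax_with_temperature(logits, 1.5)  # flatter: more varied choices
```

With `temperature=0.2` the top token dominates; with `temperature=1.5` the probability mass spreads out, which is why higher values feel more "creative".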
Max Tokens
Maximum length of the response.
| Value | Typical Use |
|---|---|
| 50-100 | Short answers |
| 256 | Standard responses |
| 512-1024 | Detailed explanations |
| 2048+ | Long-form content |
A higher max_tokens cap allows longer outputs, and longer outputs take longer to generate.
Top-p (Nucleus Sampling)
Limits token selection to the smallest set of tokens whose cumulative probability reaches the threshold.
- 0.95 (UI default) - Consider tokens until 95% of the probability mass is covered
- 0.9 - Slightly more focused
- 0.5 - Very focused
Top-k
Limits selection to the k most likely tokens.
- 50 (default) - Consider top 50 tokens
- 10 - Very focused
- 100 - More variety
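Both filters can be sketched directly on a probability distribution. The following is an illustrative implementation of the two ideas, not the service's actual sampling code:

```python
def top_k_filter(probs, k):
    """Keep only the k highest-probability tokens, then renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(ranked[:k])
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability >= p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for i in ranked:
        keep.add(i)
        cum += probs[i]
        if cum >= p:
            break
    kept = [q if i in keep else 0.0 for i, q in enumerate(probs)]
    total = sum(kept)
    return [q / total for q in kept]

probs = [0.5, 0.3, 0.15, 0.05]
top_k_filter(probs, 2)    # only the two most likely tokens survive
top_p_filter(probs, 0.9)  # tokens are added until 90% of the mass is covered
```

Note the difference: top-k keeps a fixed count regardless of how probability is distributed, while top-p adapts the count to the shape of the distribution.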
Parameter Combinations
Factual Q&A
Creative Writing
Code Generation
Conversation
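The four use cases above map naturally onto parameter presets. The specific values below are assumptions chosen to match the guidance in the tables earlier on this page, not settings prescribed by this documentation:

```python
# Illustrative starting points only; tune against your own test prompts.
PRESETS = {
    "factual_qa":       {"temperature": 0.2, "top_p": 0.9,  "max_tokens": 256},
    "creative_writing": {"temperature": 0.9, "top_p": 0.95, "max_tokens": 1024},
    "code_generation":  {"temperature": 0.2, "top_p": 0.95, "max_tokens": 512},
    "conversation":     {"temperature": 0.7, "top_p": 0.95, "max_tokens": 256},
}
```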
Finding the Right Settings
Start with Defaults
Default settings work for most cases:
- temperature: 0.7
- max_tokens: 256
- top_p: 0.95
- top_k: 50
- do_sample: true
UI Slider Ranges
The chat interface provides these parameter ranges:
| Parameter | Min | Max | Step | Default |
|---|---|---|---|---|
| Temperature | 0 | 2 | 0.1 | 0.7 |
| Max Tokens | 50 | 2048 | 50 | 256 |
| Top P | 0 | 1 | 0.05 | 0.95 |
| Top K | 0 | 100 | 5 | 50 |
Adjust One at a Time
- If responses are too random → lower temperature
- If responses are too repetitive → raise temperature
- If responses are cut off → increase max_tokens
- If responses are too long → decrease max_tokens
Test Systematically
For important applications:
- Pick 5-10 test prompts
- Try each parameter setting
- Compare outputs
- Document what works
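The steps above can be sketched as a simple sweep loop. `generate` here is a stand-in for whichever client call you actually use; it is a hypothetical placeholder, not a real API:

```python
# Sketch of a systematic parameter sweep over a small prompt set.
def generate(prompt, temperature):
    """Placeholder for a real model call."""
    return f"[output for {prompt!r} at T={temperature}]"

prompts = ["Summarize HTTP caching.", "Write a haiku about rain."]
results = {}
for temperature in (0.2, 0.7, 1.0):  # vary one parameter at a time
    for prompt in prompts:
        results[(prompt, temperature)] = generate(prompt, temperature)
# Review the entries in `results` side by side and record which settings worked.
```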
Advanced Parameters
Repetition Penalty
Reduces repeated phrases.
- 1.0 - No penalty
- 1.1 - Mild penalty (recommended)
- 1.3+ - Strong penalty
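One common formulation (the CTRL-style penalty used by several open-source inference libraries; whether this product uses exactly this scheme is an assumption) rescales the logits of tokens that have already been generated:

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Make already-seen tokens less likely: positive logits are divided
    by the penalty, negative logits are multiplied by it."""
    out = list(logits)
    for t in set(generated_ids):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

logits = [2.0, -1.0, 0.5]
apply_repetition_penalty(logits, [0, 1], 1.3)  # tokens 0 and 1 become less likely
```

A penalty of 1.0 leaves the logits unchanged, which is why it means "no penalty".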
Stop Sequences
End generation when any of these sequences appears in the output.
- Useful for structured output
- Example:
["\n\n", "User:"]
Do Sample
Controls whether to use sampling or greedy decoding.
- true (default) - Use sampling with temperature/top-p/top-k
- false - Greedy decoding (always pick most likely token)
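The two decoding modes differ in how the next token is picked from the probability distribution. A minimal sketch of the distinction:

```python
import random

def greedy_pick(probs):
    """do_sample=False: always take the single most likely token."""
    return max(range(len(probs)), key=lambda i: probs[i])

def sample_pick(probs, rng):
    """do_sample=True: draw a token at random according to the distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

probs = [0.1, 0.7, 0.2]
greedy_pick(probs)                    # always 1
sample_pick(probs, random.Random(0))  # usually 1, sometimes 0 or 2
```

With `do_sample=false`, temperature, top-p, and top-k have no effect, since no random draw ever happens.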
System Prompt
Set a system message to guide model behavior. Available in the chat interface settings panel. Example system prompts:
- “You are a helpful coding assistant. Provide concise code examples.”
- “You are a creative writing partner. Be imaginative and descriptive.”
- “You are a technical documentation expert. Be precise and thorough.”
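Programmatically, a system prompt is typically sent as the first message in a chat-style request. Whether this product's Python API uses exactly this message shape is an assumption; check the Python API documentation linked below:

```python
# Common chat-message structure: the system message comes first and
# steers all subsequent turns.
messages = [
    {"role": "system",
     "content": "You are a helpful coding assistant. Provide concise code examples."},
    {"role": "user",
     "content": "Show me how to reverse a list in Python."},
]
```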
Parameter Effects Summary
| Parameter | Low Value | High Value |
|---|---|---|
| temperature | Consistent, focused | Random, creative |
| max_tokens | Short responses | Long responses |
| top_p | Focused | Varied |
| top_k | Very focused | More options |
| repetition_penalty | May repeat | Avoids repetition |
Next Steps
- CLI Training - Train models with the CLI
- Python API - Programmatic control