Understanding AI Chat Settings
A Complete Guide to Large Language Model Parameters
A step-by-step guide to understanding and customizing AI chatbot settings for researchers and users.
Introduction
AI chatbots are highly configurable and can be customized for different applications. This guide explains each setting in **simple terms** to help users optimize responses for **accuracy, creativity, and efficiency**.
General Parameters
These settings control how the AI behaves when generating responses. They influence factors like response flow, creativity, consistency, and stopping conditions.
Stream Chat Response
This controls how the AI delivers its responses—either word-by-word in real time or as a fully-formed message.
ON: The AI **streams the answer token by token**, so text appears as it is generated.
OFF: The model **processes the full response silently** and then displays the entire answer all at once.
Function Calling
Some advanced LLMs can interact with **external functions or APIs** to fetch real-world data or execute commands.
Enabled: The AI can **call external tools or APIs** to fetch live data or perform actions.
Disabled: The AI only generates responses based on its internal knowledge, without external actions.
Seed
A **seed value** ensures that the model generates the **same response** for the same input, making results **predictable and reproducible**.
With a Seed: The same input produces the **same output** every time.
Without a Seed: The response may vary slightly each time.
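The idea can be sketched with Python's `random` module standing in for the model's sampler (the function name and tiny vocabulary here are illustrative, not part of any real API):

```python
import random

def sample_tokens(vocab, n, seed=None):
    """Draw n tokens from vocab; a fixed seed makes the draw reproducible."""
    rng = random.Random(seed)  # seeded generator -> deterministic choices
    return [rng.choice(vocab) for _ in range(n)]

vocab = ["the", "cat", "sat", "on", "mat"]
a = sample_tokens(vocab, 5, seed=42)  # same seed ...
b = sample_tokens(vocab, 5, seed=42)  # ... identical "response"
```

Two calls with the same seed yield identical sequences; omit the seed and results may vary between runs.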
Stop Sequence
Defines **special words or characters** that tell the AI when to **stop generating text**.
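A sketch of how a client might enforce stop sequences on already-generated text (the helper is hypothetical; real runtimes usually stop generation server-side as soon as the sequence appears):

```python
def apply_stop_sequences(text, stops):
    """Truncate text at the earliest occurrence of any stop sequence."""
    cut = len(text)
    for stop in stops:
        i = text.find(stop)
        if i != -1:
            cut = min(cut, i)  # keep only text before the first stop marker
    return text[:cut]

text = "Answer: 42\nUser: next question"
clean = apply_stop_sequences(text, ["\nUser:"])  # -> "Answer: 42"
```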
Temperature
Temperature controls **how creative or predictable** the AI’s responses are.
Low (e.g., 0.2): The AI **sticks to safe, factual** responses.
High (e.g., 0.8): The AI **takes more creative risks**, producing varied, imaginative answers.
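Under the hood, temperature divides the model's raw scores (logits) before they are turned into probabilities. A minimal sketch, with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, scaled by temperature."""
    scaled = [l / temperature for l in logits]  # low T sharpens, high T flattens
    m = max(scaled)                             # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.2)  # near-deterministic distribution
warm = softmax_with_temperature(logits, 1.0)  # more evenly spread
```

At low temperature the most likely word dominates; at higher temperature the probability mass spreads across alternatives.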
Reasoning Effort
This setting controls **how much time** the AI spends **thinking before responding**.
Lower effort: The AI **responds quickly**, but answers may be **simpler**.
Higher effort: The AI **deliberates longer**, giving more thorough answers at the cost of speed.
Mirostat Parameters (Balance Control)
Mirostat is an **advanced dynamic control mechanism** designed to **adjust randomness** in AI-generated responses. It helps maintain a balance between **coherent, predictable text** and **creative, varied responses**.
Mirostat
Mirostat dynamically adjusts **temperature and randomness** to **keep responses balanced**. It prevents the model from becoming **too random or too deterministic** during a conversation.
Enabled: The model **self-regulates randomness** as it generates, with little manual tuning needed.
Disabled: You need to manually control randomness using **Temperature, Top-K, and Top-P**.
Mirostat Eta
This parameter controls **how fast** the AI adapts to randomness and adjusts its response style.
Lower values (e.g., 0.1): The AI **adjusts gradually**, making slower changes.
Higher values (e.g., 1.0): The AI **adapts quickly**, reacting strongly to each sampled word.
Mirostat Tau (τ)
Mirostat Tau defines the **level of "surprise"** in the AI’s word selection. A lower value makes responses **more controlled**, while a higher value introduces **more variety**.
Lower values (e.g., 3.0): AI keeps word choice **more focused and controlled**.
Higher values (e.g., 8.0): AI allows for **more surprising and diverse** responses.
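The core of Mirostat is a feedback loop: after each sampled word, an internal "surprise ceiling" (often called mu) is nudged toward the target tau at a rate set by eta. A simplified sketch of that update step (variable names are illustrative):

```python
import math

def mirostat_update(mu, token_prob, tau, eta):
    """One feedback step: move the surprise ceiling mu toward the target tau."""
    surprise = -math.log2(token_prob)   # rare tokens have high surprise
    return mu - eta * (surprise - tau)  # error-correcting update

mu_up = mirostat_update(5.0, 0.5, tau=5.0, eta=0.1)      # likely token -> mu rises
mu_down = mirostat_update(5.0, 0.001, tau=5.0, eta=0.1)  # rare token -> mu falls
```

If recent words were too predictable, the ceiling rises to allow more variety; if they were too surprising, it drops to rein the model in.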
Word Choice Controls
These parameters control how the AI **chooses words** during text generation. They determine whether responses are **precise and focused** or **diverse and creative**.
Top-K
Top-K limits the **vocabulary pool** by selecting words from the **K most likely choices**. The lower the value, the **more deterministic** the AI’s response.
Lower K (e.g., 10): AI picks from a **small set** of likely words, giving focused, predictable wording.
Higher K (e.g., 100): AI picks from a **larger set**, giving more diverse but potentially random wording.
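Top-K can be sketched as a simple filter over a token-probability table (the tiny vocabulary is invented for illustration):

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens, renormalized to sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in ranked)
    return {tok: p / total for tok, p in ranked}

probs = {"the": 0.5, "cat": 0.3, "dog": 0.15, "zebra": 0.05}
narrowed = top_k_filter(probs, 2)  # only "the" and "cat" remain
```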
Top-P (Nucleus Sampling)
Instead of selecting a **fixed number** of words (like Top-K), Top-P selects words **whose cumulative probability adds up to a given threshold**.
Lower P (e.g., 0.5): AI sticks to the **most probable words**, keeping responses focused.
Higher P (e.g., 0.9): AI **considers a broader range of words**, giving richer, more unpredictable responses.
Min-P
Sets a **probability floor** relative to the most likely word: any word whose probability falls below that fraction of the top word's probability is **dropped** from consideration.
Lower Min-P: AI **lets more long-shot words through**, increasing variety.
Higher Min-P: AI **stays safe**, sticking to predictable words.
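Min-P is commonly implemented as a floor relative to the most likely token; a sketch assuming that definition:

```python
def min_p_filter(probs, min_p):
    """Drop tokens whose probability is below min_p times the top probability."""
    ceiling = max(probs.values())
    kept = {t: p for t, p in probs.items() if p >= min_p * ceiling}
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}

probs = {"the": 0.5, "cat": 0.3, "zzz": 0.01}
kept = min_p_filter(probs, 0.1)  # floor = 0.05, so "zzz" is dropped
```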
Frequency Penalty
Prevents AI from **repeating the same words too often**, promoting more **varied** responses.
Lower Value (e.g., 0.2): Minimal penalty; the AI **may reuse the same words more frequently**.
Higher Value (e.g., 1.5): Strong penalty; the AI **actively avoids repeating itself**, sometimes at the cost of natural phrasing.
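The mechanism can be sketched as subtracting a count-scaled penalty from each token's logit (the function and data are illustrative):

```python
from collections import Counter

def apply_frequency_penalty(logits, generated_tokens, penalty):
    """Subtract penalty * count from the logit of every token already generated."""
    counts = Counter(generated_tokens)  # missing tokens count as 0
    return {tok: logit - penalty * counts[tok] for tok, logit in logits.items()}

logits = {"the": 2.0, "cat": 1.0}
penalized = apply_frequency_penalty(logits, ["the", "the"], penalty=0.5)
# "the" appeared twice, so its logit drops by 2 * 0.5 = 1.0
```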
Repeat Last N
Limits how often AI **repeats recently used phrases** by remembering the last N words.
Higher N: AI **remembers more**, avoiding word-for-word repeats.
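Combined with a repeat penalty, the window can be sketched as penalizing only tokens seen in the last n outputs (names are illustrative):

```python
def repeat_penalty_window(logits, history, n, penalty):
    """Penalize only tokens that appear in the last n generated tokens."""
    recent = set(history[-n:]) if n > 0 else set()
    return {t: (l - penalty if t in recent else l) for t, l in logits.items()}

logits = {"hello": 1.0, "world": 1.0}
# With n=2, only the last two tokens ("there", "world") are penalized.
adjusted = repeat_penalty_window(logits, ["hello", "there", "world"], n=2, penalty=0.5)
```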
Tfs Z (Tail-Free Sampling)
Ensures that uncommon words **don’t sneak in too frequently** while still keeping responses **diverse**.
Lower Tfs Z: AI **trims more of the unlikely "tail"**, keeping word choice conventional.
Higher Tfs Z: AI allows **more rare words**, which can add depth or make responses sound odd.
Memory and Processing
These settings control how much **conversation history the AI remembers** and how it processes text. They impact the **quality, continuity, and efficiency** of AI-generated responses.
Context Length
Determines how many **previous words or messages** the AI remembers when generating responses. A longer context helps maintain continuity in long conversations.
Short Context: AI may **forget earlier parts of the conversation**, losing the thread in long chats.
Long Context (e.g., 4,000+ tokens): AI **remembers more conversation history**, improving coherence.
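When a conversation outgrows the window, clients typically drop the oldest messages first. A sketch using whitespace-separated words as a stand-in for real tokens:

```python
def trim_to_context(messages, max_tokens):
    """Drop the oldest messages until the (whitespace-token) total fits the window."""
    kept = list(messages)
    while kept and sum(len(m.split()) for m in kept) > max_tokens:
        kept.pop(0)  # forget the oldest message first
    return kept

history = ["system prompt here", "first question asked", "short reply"]
trimmed = trim_to_context(history, max_tokens=5)  # oldest message is dropped
```

Real runtimes count model-specific tokens rather than words, but the sliding-window idea is the same.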
Batch Size
Controls **how many tokens** the AI processes **at once** while reading the prompt.
Small Batch Size: AI uses **less memory**, but processes the prompt more slowly.
Large Batch Size (e.g., 16+): AI **processes more tokens at once**, improving **speed but consuming more memory**.
Tokens To Keep On Context Refresh
Defines how many **tokens (chunks of text, roughly word pieces)** are preserved when the AI **refreshes its memory**. Affects how well the AI maintains **long-term consistency** in responses.
High (e.g., 2,000+ tokens): AI **remembers key parts of past conversations**, improving continuity.
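A sketch of a refresh that preserves the first few tokens (e.g., the system prompt) plus the most recent ones; it assumes the keep count is smaller than the limit, and the names are illustrative:

```python
def refresh_context(tokens, limit, num_keep):
    """On overflow, keep the first num_keep tokens plus the most recent ones."""
    if len(tokens) <= limit:
        return tokens                     # still fits; nothing to do
    head = tokens[:num_keep]              # e.g., the system prompt
    tail = tokens[-(limit - num_keep):]   # the newest tokens
    return head + tail

tokens = list("abcdefgh")
kept = refresh_context(tokens, limit=5, num_keep=2)  # keeps a, b + f, g, h
```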
Max Tokens
Sets a **hard limit** on how many **tokens the AI can generate** in a single response.
Low Limit: AI gives **short answers**, which may be cut off mid-thought if the cap is reached.
High Limit (e.g., 500+ tokens): AI provides **longer, more detailed answers**.
Computer Performance Settings
These settings determine **how efficiently the AI model runs** on your machine. They help optimize memory usage, **CPU and GPU performance**, and overall system stability.
use_mmap
Enables **memory-mapped file access**, letting the model's weights be **paged in from disk on demand** instead of loaded fully into RAM. This helps systems with **low RAM capacity** run large models.
Enabled: The model file is **streamed from disk as needed**, reducing RAM pressure.
Disabled: Keeps everything in RAM, which may cause **out-of-memory crashes**.
use_mlock
Keeps the AI model **locked in RAM**, preventing the operating system from **swapping** it to disk. This can **improve response speed** but uses more memory.
Enabled: The model stays **pinned in RAM**, giving consistently fast access.
Disabled: The system may **swap data to disk**, slowing down performance.
num_thread
Sets how many **CPU threads** are used to process AI computations. A higher number improves **speed**, but excessive use may slow down **other tasks** on your computer.
Low: Leaves CPU headroom for other applications, at the cost of slower generation.
High (e.g., max available threads): **Faster AI**, but may cause **lag** in other applications.
num_gpu
Defines **how much of the model is offloaded to the GPU** (in llama.cpp-based runtimes, this is the number of model layers placed on the GPU). GPU offloading speeds up processing, but **only works if the model and runtime support GPU inference**.
0: Runs entirely on the **CPU**; slower, but works on any machine.
1+ (GPU-enabled): Runs AI on the **graphics card**, significantly increasing speed.
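Putting the settings together: the parameter names in this guide match the snake_case options accepted by Ollama-style local runtimes, so a request payload might look like the following (the exact values are illustrative, and your runtime's documentation is the authority on supported fields):

```python
# Illustrative options payload for an Ollama-style local LLM server.
options = {
    "seed": 42,                # reproducible outputs
    "temperature": 0.7,        # creativity vs. predictability
    "top_k": 40,               # vocabulary pool size
    "top_p": 0.9,              # nucleus sampling threshold
    "min_p": 0.05,             # probability floor vs. top token
    "frequency_penalty": 0.5,  # discourage word reuse
    "repeat_last_n": 64,       # repetition lookback window
    "mirostat": 0,             # 0 = disabled
    "mirostat_eta": 0.1,       # adaptation rate
    "mirostat_tau": 5.0,       # target surprise
    "num_ctx": 4096,           # context length
    "num_batch": 512,          # batch size
    "num_keep": 256,           # tokens kept on context refresh
    "num_predict": 512,        # max tokens per response
    "num_thread": 8,           # CPU threads
    "num_gpu": 1,              # GPU offload
    "use_mmap": True,          # stream weights from disk
    "use_mlock": False,        # don't pin weights in RAM
    "stop": ["\nUser:"],       # stop sequence
}
```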
Quick Tips for AI Optimization
- Lower **Temperature** (0.2-0.3): reduces randomness.
- Higher **Reasoning Effort**: the AI thinks deeper before answering.
- Moderate **Context Length**: retains past details without memory overload.
- Higher **Temperature** (0.7-1.0): allows for creative variations.
- Higher **Top-P** (0.9): ensures diverse vocabulary usage.
- Longer **Context Length**: helps the AI maintain storytelling consistency.
- Enable **Mirostat**: keeps randomness under control.
- Use **Stop Sequences**: prevents excessive text generation.
- Lower **Frequency Penalty**: allows the AI to reuse phrases naturally.
- Enable **use_mmap**: saves RAM by streaming models from disk.
- Enable **use_mlock**: keeps models in RAM for quick access.
- Adjust **num_thread** & **num_gpu** based on your hardware.