# LLM Modules
← Back to README | Architecture
## Overview
The LLM layer is entirely defined by `LLMInterface` (`src/interfaces/base_interfaces.py`). The brain calls `llm.chat()` and never knows which concrete provider is behind it. Providers are instantiated in `main.py` based on the `--llm-provider` flag or `config.json`.
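Since the brain only sees the interface, swapping providers is transparent. A minimal sketch of the call site (the `handle_turn()` function is hypothetical; only `LLMInterface` and `chat()` come from the codebase):

```python
from src.interfaces.base_interfaces import LLMInterface

def handle_turn(llm: LLMInterface, user_input: str, system_prompt: str, history: list):
    # Identical for Gemini, OpenAI, Groq, or GLM: the concrete class never leaks through.
    mood, message, metadata = llm.chat(user_input, system_prompt=system_prompt, history=history)
    return mood, message
```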
```
src/modules/llm/
├── gemini_llm.py   Google Gemini (google-genai SDK)
├── openai_llm.py   OpenAI / any OpenAI-compatible API
├── groq_llm.py     Groq (OpenAI-compatible, fast inference)
└── glm_llm.py      GLM-4.7 by Z.AI (OpenAI-compatible)
```
## Interface Contract
```python
from abc import ABC
from typing import Optional, Tuple

class LLMInterface(ABC):
    def chat(self, user_input: str, system_prompt: Optional[str] = None, history: Optional[list] = None) -> Tuple[str, str, dict]: ...
    def chat_audio(self, audio_path: str, system_prompt: Optional[str] = None, history: Optional[list] = None) -> Tuple[str, str, dict]: ...
    def generate_json(self, user_input: str, system_prompt: Optional[str] = None, history: Optional[list] = None) -> dict: ...
    def reload_config(self, config: "BrainConfig") -> None: ...
```
Every LLM must return a structured tuple:
- `mood` (str): one of the defined mood IDs (`normal`, `angry`, `bored`, `cry`, `ew`, `love`, `shock`)
- `message` (str): the spoken text response
- `metadata` (dict): any extra fields from the JSON response
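For illustration, a toy provider that satisfies this contract might look like the following (`EchoLLM` is hypothetical and not part of the repo):

```python
class EchoLLM(LLMInterface):
    """Toy provider: echoes input back with a neutral mood."""

    def chat(self, user_input, system_prompt=None, history=None):
        return "normal", f"You said: {user_input}", {}

    def chat_audio(self, audio_path, system_prompt=None, history=None):
        return "normal", "I heard your audio.", {}

    def generate_json(self, user_input, system_prompt=None, history=None):
        return {"mood": "normal", "message": user_input}

    def reload_config(self, config):
        pass  # nothing to hot-reload in a toy provider
```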
## Expected LLM Response Format
The system prompt instructs the AI to always reply in JSON:
```json
{
  "mood": "normal",
  "message": "The spoken response text."
}
```
All providers use `src/utils/llm_utils.parse_llm_json()` to robustly extract this JSON from the raw response, handling:

- Fenced code blocks (`` ```json ... ``` ``)
- Raw JSON strings
- Nested braces (finds the first balanced `{ }` block)

If parsing fails, mood defaults to `"normal"` and the raw string is used as the message.
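For reference, a simplified sketch of what such extraction can look like; the real `parse_llm_json()` may differ in its details:

```python
import json
import re

FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)\s*`{3}", re.DOTALL)

def parse_llm_json_sketch(raw: str) -> dict:
    """Illustrative only: pull the first balanced JSON object out of raw LLM output."""
    fence = FENCE_RE.search(raw)  # prefer the contents of a fenced code block
    text = fence.group(1) if fence else raw
    start = text.find("{")
    if start != -1:
        depth = 0
        for i, ch in enumerate(text[start:], start):
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break
    # Fallback mirrors the documented behavior: default mood, raw text as message.
    return {"mood": "normal", "message": raw}
```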
## Providers
### Gemini (`gemini_llm.py`)
- **SDK:** `google-genai`
- **Config key:** `gemini_model` (default: `gemini-3-flash-preview`)
- **Env var:** `GEMINI_API_KEY`
Supports multimodal input: it can accept both text and audio inline (base64 bytes). This is the only provider that natively supports `chat_audio()` without requiring a separate STT step.
```python
llm = GeminiLLM(api_key="...", model_name="gemini-3-flash-preview")
mood, message, meta = llm.chat("Hello!", system_prompt="...", history=[...])
```
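Because the audio goes inline to the model, `chat_audio()` is a direct call; for example (the file path here is a placeholder):

```python
# No separate STT pass: the audio bytes are sent to Gemini directly.
mood, message, meta = llm.chat_audio("path/to/input.wav", system_prompt="...")
```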
### OpenAI (`openai_llm.py`)
- **SDK:** `openai`
- **Config key:** `openai_model` (default: `gpt-5`)
- **Env var:** `OPENAI_API_KEY`
Standard OpenAI chat completions. `chat_audio()` is handled by first transcribing via the injected STT interface, then calling `chat()`. Also used by the Memory skill's `DiaryGenerator`.
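A sketch of that delegation, assuming the injected STT object exposes a `transcribe()` method (the method name and constructor shape are assumptions):

```python
class OpenAILLM(LLMInterface):
    def __init__(self, api_key: str, model_name: str, stt=None):
        self.stt = stt  # injected STT interface (assumed attribute name)
        ...

    def chat_audio(self, audio_path, system_prompt=None, history=None):
        # No native audio input: transcribe first, then reuse the text path.
        text = self.stt.transcribe(audio_path)  # assumed STT method name
        return self.chat(text, system_prompt=system_prompt, history=history)
```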
### Groq (`groq_llm.py`)
- **SDK:** `groq` (OpenAI-compatible)
- **Config key:** `groq_model` (default: `openai/gpt-oss-20b`)
- **Env var:** `GROQ_API_KEY`
High-speed inference. The Groq provider is also used by the STT module (Whisper), so a single key can power both speech recognition and language generation.
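Illustratively, both modules can be fed from the same environment variable (the class names here follow the `GeminiLLM` pattern and are assumptions; see the STT module for the actual class):

```python
import os

groq_key = os.environ["GROQ_API_KEY"]
llm = GroqLLM(api_key=groq_key, model_name="openai/gpt-oss-20b")
stt = GroqSTT(api_key=groq_key)  # hypothetical name for the Whisper-backed STT class
```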
### GLM-4.7 (`glm_llm.py`)
- **SDK:** OpenAI-compatible client pointed at Z.AI
- **Config key:** `glm_model` (default: `glm-4.7`)
- **Env var:** `GLM_API_KEY`
Connects to the Z.AI (Zhipu AI) API using the same interface as OpenAI. Useful as a cost-effective alternative.
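The usual pattern for OpenAI-compatible endpoints is to point the standard client at a different base URL. A sketch (the base URL below is a placeholder; use the one from Z.AI's documentation):

```python
import os
from openai import OpenAI

# Same client class as the OpenAI provider; only the endpoint and key change.
client = OpenAI(
    api_key=os.environ["GLM_API_KEY"],
    base_url="https://example-zai-endpoint/v1",  # placeholder URL
)
```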
## Hot Reload
Every provider implements `reload_config(config)`. When called:
- If the API key changed, the client is re-initialized.
- If the model name changed, it is updated in place.
No restart needed after a config update.
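Inside a provider this typically reduces to a few comparisons; a sketch with assumed attribute and config field names:

```python
def reload_config(self, config):
    # Re-create the client only when the key actually changed (field names assumed).
    if config.openai_key != self.api_key:
        self.api_key = config.openai_key
        self.client = OpenAI(api_key=self.api_key)
    # The model name can be swapped in place without touching the client.
    self.model_name = config.openai_model
```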
## Adding a New LLM Provider
- Create `src/modules/llm/my_llm.py` and extend `LLMInterface`.
- Implement `chat()`, `chat_audio()`, `generate_json()`, `reload_config()`.
- In `main.py`, add a branch to the provider selection block:
  ```python
  elif config.llm_provider == "my_provider":
      from src.modules.llm.my_llm import MyLLM
      llm = MyLLM(api_key=config.my_key, model_name=config.my_model)
  ```
- Add `"my_provider"` to the `--llm-provider` choices in `parse_args()`.