Model Issues

Configuring and using LLM (large language model) providers can fail in several ways. This page explains how to troubleshoot authentication errors, connection timeouts, rate limits, and other common failures.

Authentication Failed

Symptom: A 401 Unauthorized or Authentication failed error.

Cause: The API key is invalid, expired, or not configured correctly.

Solutions:

# 1. Re-run the interactive configuration wizard
hermes model

# 2. Check the API key manually (this prints it in plain text; do not share the output)
grep api_key ~/.hermes/config.yaml

# 3. Verify that the key format matches the provider
# OpenRouter:  sk-or-v1-xxxxx
# Anthropic:   sk-ant-xxxxx
# OpenAI:      sk-xxxxx
# DeepSeek:    sk-xxxxx
# Kimi:        sk-xxxxx
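OpenAI, DeepSeek and Kimi keys all share the plain sk- prefix, so a prefix check cannot identify them individually, but it does catch the most common mix-up: an Anthropic or OpenAI key pasted where an OpenRouter key is expected. A minimal shell sketch of such a check, with the prefixes taken from the list above; guess_key_provider and mask_key are hypothetical helpers, not part of Hermes.

```shell
# Hypothetical helpers (not part of Hermes) for sanity-checking an API key
# without printing it in full.

# Guess the provider from the key prefix. OpenAI, DeepSeek and Kimi all use
# a plain "sk-" prefix, so they cannot be told apart this way.
guess_key_provider() {
  case "$1" in
    sk-or-v1-*) echo "openrouter" ;;
    sk-ant-*)   echo "anthropic" ;;
    sk-*)       echo "openai/deepseek/kimi (ambiguous)" ;;
    *)          echo "unknown"; return 1 ;;
  esac
}

# Print only the first 8 characters so the full key never reaches the terminal.
mask_key() {
  printf '%s...\n' "$(printf '%s' "$1" | cut -c1-8)"
}

# Example:
#   guess_key_provider "$OPENROUTER_API_KEY"   # expect "openrouter"
#   mask_key "$OPENROUTER_API_KEY"
```

If guess_key_provider prints anthropic or the ambiguous sk- result for a key configured under provider: openrouter, the wrong key is in the config.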

If you use OpenRouter, make sure the configured key is an OpenRouter key, not a key issued directly by Anthropic or OpenAI:

llm:
  provider: openrouter
  model: anthropic/claude-sonnet-4-20250514
  api_key: sk-or-v1-...   # Must be an OpenRouter key

Create an OpenRouter API key at openrouter.ai/keys.

Connection Timeout

Symptom: Connection timeout or Network unreachable errors, or an error after a long period with no response.

Cause: The provider's API is unreachable from your network, a required proxy is not configured, or the provider's service is down.

Solutions:

# 1. Test network connectivity
curl -I https://openrouter.ai/api/v1/models
curl -I https://api.anthropic.com
curl -I https://api.openai.com

# 2. If you need a proxy (replace the address with your own proxy)
export HTTPS_PROXY=http://127.0.0.1:7890
export HTTP_PROXY=http://127.0.0.1:7890

# 3. Equivalent proxy setting in Windows PowerShell
# $env:HTTPS_PROXY="http://127.0.0.1:7890"

# 4. Switch to a provider that is reachable from your location
hermes model    # Choose DeepSeek / Kimi / SiliconFlow / Qwen
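A bare curl -I can wait a long time before giving up on a blocked route. A minimal sketch that adds explicit timeouts and reports the HTTP status for each endpoint from step 1: any status code, even 401 or 404, shows that the network path works, while 000 means no HTTP response arrived at all. curl also reads the HTTPS_PROXY variable from step 2, so the same check exercises the proxy. check_endpoint, check_all, and the 5 s / 15 s timeouts are hypothetical choices, not Hermes settings.

```shell
# Hypothetical helper: probe one endpoint with explicit timeouts.
# Prints "URL -> CODE"; CODE 000 means no HTTP response was received.
check_endpoint() {
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    --connect-timeout 5 --max-time 15 "$1")
  code=${code:-000}   # empty if curl itself could not run
  echo "$1 -> $code"
  [ "$code" != "000" ]
}

# Probe each provider endpoint and flag the unreachable ones.
check_all() {
  failed=0
  for url in https://openrouter.ai/api/v1/models \
             https://api.anthropic.com \
             https://api.openai.com; do
    if ! check_endpoint "$url"; then
      echo "  unreachable: check your network, proxy, or the provider's status page"
      failed=1
    fi
  done
  return "$failed"
}
```

Run check_all once without a proxy and once after exporting HTTPS_PROXY; if only the second run succeeds, the proxy must be configured for Hermes as well.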

Rate Limiting

Symptom: A 429 Too Many Requests or Rate limit exceeded error.

Cause: Too many requests were sent in a short time, or the free-tier quota is exhausted.

Solutions:

Wait and retry: requests can usually continue after about 60 seconds
Upgrade your plan: move to a plan with higher limits in the provider's console
Configure automatic retries and fallback models:

llm:
  provider: openrouter
  model: anthropic/claude-sonnet-4-20250514
  retry:
    max_retries: 3
    retry_delay: 5
  fallback_models:
    - google/gemini-2.5-pro
    - deepseek/deepseek-chat

Use OpenRouter: it aggregates 100+ models and can automatically route requests to an alternative when one hits its limits
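The retry block in the config above is handled inside Hermes. To illustrate the general technique at the command line, here is a minimal sketch of retrying with exponential backoff, where each failed attempt waits twice as long as the previous one so the provider's rate-limit window has time to reset. with_retry is a hypothetical helper, and whether Hermes doubles retry_delay or keeps it fixed is not stated on this page, so treat the doubling as an assumption.

```shell
# Hypothetical helper: run a command, retrying up to MAX_RETRIES times.
# The delay starts at DELAY seconds and doubles after every failure
# (exponential backoff). Any non-zero exit counts as a failure.
# Usage: with_retry MAX_RETRIES DELAY command [args...]
with_retry() {
  max_retries=$1
  delay=$2
  shift 2
  attempt=0
  while :; do
    "$@" && return 0
    attempt=$((attempt + 1))
    if [ "$attempt" -gt "$max_retries" ]; then
      echo "giving up after $max_retries retries" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
  done
}

# Example: curl -f exits non-zero on HTTP errors such as 429, so this
# waits 5, 10 and 20 seconds between attempts before giving up:
#   with_retry 3 5 curl -sf -o /dev/null \
#     -H "Authorization: Bearer $OPENROUTER_API_KEY" \
#     https://openrouter.ai/api/v1/models
```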

Model Unavailable

Symptom: A Model not found error, or the selected model is offline.

Cause: The model identifier is misspelled, the provider has discontinued the model, or the model name does not exist.

Solutions:

# 1. View the list of available models
hermes model    # Interactive selection, only shows available models

# 2. Query the OpenRouter API (requires an API key)
curl https://openrouter.ai/api/v1/models \
  -H "Authorization: Bearer $OPENROUTER_API_KEY"

# 3. Check the model ID format
# Models on OpenRouter need a provider prefix:
# ✅ anthropic/claude-sonnet-4-20250514
# ✅ google/gemini-2.5-pro
# ❌ claude-sonnet-4-20250514  (missing prefix)

Common model identifiers:

Provider	Model ID Example
Anthropic	`anthropic/claude-sonnet-4-20250514`
OpenAI	`openai/gpt-4o`
Google	`google/gemini-2.5-pro`
DeepSeek	`deepseek/deepseek-chat`
Meta	`meta-llama/llama-3.3-70b-instruct`
Mistral	`mistral/mistral-large-latest`
Kimi	`moonshot/moonshot-v1`
Qwen	`qwen/qwen-3-235b-a22b`
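A quick offline check catches the most common mistake from step 3, a missing provider prefix, before the ID goes into the config. The sketch below assumes the provider/model format shown in the table; has_provider_prefix and model_listed are hypothetical helpers, and model_listed additionally assumes jq is installed and that the OpenRouter model list is a JSON object whose data array holds objects with an id field.

```shell
# Hypothetical helper: succeed only for IDs shaped like "provider/model",
# e.g. anthropic/claude-sonnet-4-20250514.
has_provider_prefix() {
  case "$1" in
    ?*/?*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Hypothetical helper: check an ID against a saved copy of the model list,
# fetched as in step 2 above, e.g.
#   curl -s https://openrouter.ai/api/v1/models \
#     -H "Authorization: Bearer $OPENROUTER_API_KEY" -o models.json
# Assumes jq is installed and the JSON looks like {"data": [{"id": ...}, ...]}.
# Usage: model_listed MODEL_ID FILE
model_listed() {
  jq -e --arg id "$1" 'any(.data[]; .id == $id)' "$2" >/dev/null
}
```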

Poor Output Quality or Hallucinations

Symptom: The agent gives inaccurate answers, makes up information, or reasons incoherently.

Cause: The selected model is too weak for the task, the context is so long that earlier information gets lost, or the prompt is unclear.

Solutions:

Upgrade the model: use a stronger model such as Claude Sonnet 4 or GPT-4o
Improve your prompts: see the Prompt Guide
Keep the context short: use /clear regularly to clear the conversation history
Use a system prompt: add a role description to the configuration:

agent:
  system_prompt: |
    You are a precise, rigorous assistant.
    Confirm facts before answering, clearly state when uncertain.
    Respond in English.

Local Model (Ollama / vLLM) Issues

Symptom: Hermes cannot connect to a local model.

Cause: The local inference server is not running, the port is wrong, or the model has not been downloaded.

Solutions:

# Ollama
ollama serve                    # Start the Ollama service (runs in the foreground; use a second terminal)
ollama pull llama3.3            # Download the model
curl http://localhost:11434/api/tags  # Verify the service is running (lists downloaded models)

# vLLM
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.3-70B-Instruct \
  --port 8000

# Configure Hermes to connect to the local model
hermes model    # Select Ollama or vLLM

# Or configure it manually in ~/.hermes/config.yaml
llm:
  provider: ollama
  model: llama3.3
  base_url: http://localhost:11434
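Both servers need some time to start (vLLM in particular must load the model weights first), so a connection attempt made immediately can fail even when everything is configured correctly. A minimal sketch that polls a URL until it answers: for Ollama, the /api/tags URL from the verification step above; for vLLM's OpenAI-compatible server on port 8000, its /v1/models route. wait_for_url and its roughly one-second polling interval are hypothetical choices.

```shell
# Hypothetical helper: poll URL about once per second until it returns a
# successful HTTP response, giving up after TIMEOUT failed attempts.
# Usage: wait_for_url URL [TIMEOUT]
wait_for_url() {
  url=$1
  timeout=${2:-60}
  tries=0
  until curl -sf -o /dev/null --max-time 2 "$url"; do
    tries=$((tries + 1))
    if [ "$tries" -ge "$timeout" ]; then
      echo "no response from $url after $tries attempts" >&2
      return 1
    fi
    sleep 1
  done
  echo "$url is up"
}

# Examples (ports as used above):
#   wait_for_url http://localhost:11434/api/tags        # Ollama
#   wait_for_url http://localhost:8000/v1/models 600    # vLLM, slow model load
```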

Token Limit Exceeded

Symptom: A context_length_exceeded or maximum context length error.

Cause: The conversation history has grown longer than the model's context window.

Solutions:

# 1. Clear the current conversation (a slash command typed inside the chat)
/clear

# 2. Start a new session
hermes chat --new

# 3. Enable automatic summarization in the config

agent:
  auto_summarize: true
  summarize_threshold: 80000   # Token threshold
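To judge how close a conversation is to summarize_threshold, a common rule of thumb for English text is about 4 characters per token, so 80,000 tokens corresponds to roughly 320,000 characters. Real counts depend on the model's tokenizer and can differ considerably, especially for code and non-English text. The sketch below applies that heuristic to a plain-text file such as an exported transcript; estimate_tokens and near_threshold are hypothetical helpers, and the ratio is an approximation, not how Hermes counts tokens.

```shell
# Hypothetical helpers: estimate tokens with the ~4 characters/token heuristic.
# wc -c counts bytes, which equals characters only for plain ASCII text.
estimate_tokens() {
  bytes=$(wc -c < "$1" | tr -d ' ')
  echo $(( (bytes + 3) / 4 ))   # integer division, rounded up
}

# Succeed if FILE's estimated token count has reached THRESHOLD.
# Usage: near_threshold FILE THRESHOLD
near_threshold() {
  [ "$(estimate_tokens "$1")" -ge "$2" ]
}
```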

Related Pages

Common Errors: common runtime errors
Cost Optimization: strategies to reduce token consumption
Prompt Guide: how to write better prompts
hermes doctor: automatic diagnostic tool
