Model Issues
You may encounter various issues when configuring and using LLM (Large Language Model) providers. This page covers troubleshooting methods for authentication, connection, rate limiting, and other common failures.
Authentication Failed
Symptom: `401 Unauthorized` or `Authentication failed` error.
Cause: API Key invalid, expired, or not configured correctly.
Solutions:
```bash
# 1. Re-run the interactive configuration wizard
hermes model

# 2. Manually check the API key (be careful not to leak it)
grep api_key ~/.hermes/config.yaml

# 3. Verify the API key format
# OpenRouter: sk-or-v1-xxxxx
# Anthropic:  sk-ant-xxxxx
# OpenAI:     sk-xxxxx
# DeepSeek:   sk-xxxxx
# Kimi:       sk-xxxxx
```

If you use OpenRouter, confirm that you are using an OpenRouter key, not the original Anthropic or OpenAI key:
```yaml
llm:
  provider: openrouter
  model: anthropic/claude-sonnet-4-20250514
  api_key: sk-or-v1-...  # Must be an OpenRouter key
```

Get an OpenRouter API key at openrouter.ai/keys.
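Because several providers share the `sk-` prefix, it helps to check which kind of key is configured without printing the key itself. The function below is a minimal sketch, assuming the prefixes listed above are accurate; `guess_key_provider` is a hypothetical helper, not a Hermes command, and a prefix match says nothing about whether the key is still valid.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a Hermes command): classify an API key by its
# prefix so a provider mismatch shows up without echoing the key itself.
# Specific prefixes are matched before the generic "sk-", because
# OpenRouter and Anthropic keys also start with "sk-".
guess_key_provider() {
  case "$1" in
    "")         echo "missing" ;;
    sk-or-v1-*) echo "openrouter" ;;
    sk-ant-*)   echo "anthropic" ;;
    sk-*)       echo "generic sk- (OpenAI, DeepSeek, or Kimi)" ;;
    *)          echo "unknown" ;;
  esac
}
```

For example, `guess_key_provider "$OPENROUTER_API_KEY"` should print `openrouter` when `provider: openrouter` is configured; any other result points to a mismatched key.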
Connection Timeout
Symptom: `Connection timeout` or `Network unreachable` error, or an error after a long period with no response.
Cause: Network unreachable, proxy not set, or provider service outage.
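To tell these causes apart before changing any settings, curl's exit status is more telling than the response headers alone. The sketch below assumes curl is installed and maps a few of curl's documented exit codes to the causes above; `explain_curl_exit` and `probe` are hypothetical helper names. An HTTP error such as `401` still exits with status 0, which correctly counts as the network being reachable.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate curl's exit status into the likely cause.
# Without --fail, HTTP errors such as 401 still exit 0, meaning "reachable".
explain_curl_exit() {
  case "$1" in
    0)  echo "reachable" ;;
    5)  echo "proxy host not resolved: check HTTPS_PROXY" ;;
    6)  echo "DNS lookup failed: network or DNS problem" ;;
    7)  echo "connection failed: host unreachable or blocked, try a proxy" ;;
    28) echo "timed out: blocked network, missing proxy, or provider outage" ;;
    35) echo "TLS handshake failed: check proxy or network filtering" ;;
    *)  echo "curl exit code $1: see the EXIT CODES section of man curl" ;;
  esac
}

# probe URL: one request capped at 10 seconds, then a one-line diagnosis.
probe() {
  local code=0
  curl -s -o /dev/null --max-time 10 "$1" || code=$?
  echo "$1: $(explain_curl_exit "$code")"
}
```

Run it against the same endpoints, for example `probe https://api.anthropic.com`. A `timed out` result that disappears after exporting `HTTPS_PROXY` points to a missing proxy rather than a provider outage.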
Solutions:
```bash
# 1. Test network connectivity
curl -I https://openrouter.ai/api/v1/models
curl -I https://api.anthropic.com
curl -I https://api.openai.com

# 2. If a proxy is needed (macOS / Linux)
export HTTPS_PROXY=http://127.0.0.1:7890
export HTTP_PROXY=http://127.0.0.1:7890

# 3. Windows PowerShell proxy settings
# $env:HTTPS_PROXY="http://127.0.0.1:7890"

# 4. Switch to a provider reachable from your network
hermes model  # Choose DeepSeek / Kimi / SiliconFlow / Qwen
```

Rate Limiting
Symptom: `429 Too Many Requests` or `Rate limit exceeded` error.
Cause: Too many requests were sent in a short time, or the free-tier quota is exhausted.
Solutions:
- Wait and retry: requests can usually resume after about 60 seconds.
- Upgrade your plan: switch to a higher-limit plan in the provider's console.
- Configure auto-retry and fallback models:

  ```yaml
  llm:
    provider: openrouter
    model: anthropic/claude-sonnet-4-20250514
    retry:
      max_retries: 3
      retry_delay: 5
    fallback_models:
      - google/gemini-2.5-pro
      - deepseek/deepseek-chat
  ```

- Use OpenRouter: it aggregates 100+ models and can route requests to alternative providers or models when one hits its limits.
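The retry idea can be sketched outside Hermes to see how backoff behaves. This is a minimal sketch, assuming bash and a command that prints an HTTP status code; it borrows the `max_retries: 3` and `retry_delay: 5` values from the config above but applies exponential backoff (5, 10, 20 seconds), and whether Hermes' own `retry_delay` is fixed or grows between attempts is not stated on this page. `backoff_delay` and `with_retry` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not Hermes commands): exponential backoff on HTTP 429.

# backoff_delay ATTEMPT BASE: prints BASE * 2^(ATTEMPT-1) seconds, capped at 60.
backoff_delay() {
  local attempt=$1 base=$2
  local delay=$(( base * (1 << (attempt - 1)) ))
  if (( delay > 60 )); then delay=60; fi
  echo "$delay"
}

# with_retry MAX_RETRIES BASE CMD...: CMD must print an HTTP status code.
# Re-runs CMD while it reports 429, waiting longer after each attempt.
with_retry() {
  local max=$1 base=$2 attempt status
  shift 2
  for (( attempt = 1; attempt <= max + 1; attempt++ )); do
    status=$("$@")
    if [ "$status" != "429" ]; then
      echo "$status"
      return 0
    fi
    if (( attempt <= max )); then
      sleep "$(backoff_delay "$attempt" "$base")"
    fi
  done
  echo "$status"   # still 429 after all retries
  return 1
}
```

For example, `with_retry 3 5 curl -s -o /dev/null -w '%{http_code}' https://openrouter.ai/api/v1/models -H "Authorization: Bearer $OPENROUTER_API_KEY"` prints the final status code. Doubling the wait keeps retries from all landing inside the same rate-limit window.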
Model Unavailable
Symptom: `Model not found`, or the selected model is offline.
Cause: The model identifier is misspelled, the provider has discontinued the model, or the model name does not exist.
Solutions:
```bash
# 1. View the available model list
hermes model  # Interactive selection; only shows available models

# 2. Query the OpenRouter API (requires an API key)
curl https://openrouter.ai/api/v1/models \
  -H "Authorization: Bearer $OPENROUTER_API_KEY"

# 3. Note the model ID format
# Models on OpenRouter need a provider prefix:
# ✅ anthropic/claude-sonnet-4-20250514
# ✅ google/gemini-2.5-pro
# ❌ claude-sonnet-4-20250514 (missing prefix)
```

Common model identifiers:
| Provider | Model ID Example |
|---|---|
| Anthropic | anthropic/claude-sonnet-4-20250514 |
| OpenAI | openai/gpt-4o |
| Google | google/gemini-2.5-pro |
| DeepSeek | deepseek/deepseek-chat |
| Meta | meta-llama/llama-3.3-70b-instruct |
| Mistral | mistral/mistral-large-latest |
| Kimi | moonshot/moonshot-v1 |
| Qwen | qwen/qwen-3-235b-a22b |
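A quick format check catches the most common mistake shown above, a missing provider prefix. This is a minimal sketch; the allowed character set in the pattern is an assumption based on the IDs in the table, and `check_model_id` is a hypothetical helper. It validates the shape only, not whether the model is still offered.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a model ID has the "vendor/model" shape
# used by OpenRouter IDs, e.g. anthropic/claude-sonnet-4-20250514.
# Format check only; it cannot tell whether the model still exists.
check_model_id() {
  local pattern='^[a-z0-9][a-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._:-]*$'
  [[ $1 =~ $pattern ]]
}
```

To confirm that the model actually exists, you can search the list from step 2, assuming `jq` is installed and the response lists models under `data[].id`: `curl -s https://openrouter.ai/api/v1/models -H "Authorization: Bearer $OPENROUTER_API_KEY" | jq -r '.data[].id' | grep -Fx anthropic/claude-sonnet-4-20250514`.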
Poor Output Quality or Hallucinations
Symptom: The agent answers inaccurately, makes up information, or reasons incoherently.
Cause: A weak model is selected, the context is so long that information gets lost, or the prompt is unclear.
Solutions:
- Upgrade the model: use a stronger model such as Claude Sonnet 4 or GPT-4o.
- Optimize prompts: refer to the Prompt Guide.
- Control context length: run `/clear` regularly to clear the conversation history.
- Use system prompt guidance: add a role description in the configuration:
```yaml
agent:
  system_prompt: |
    You are a precise, rigorous assistant. Confirm facts before answering,
    clearly state when uncertain. Respond in English.
```

Local Model (Ollama / vLLM) Issues
Symptom: Connection to local model fails.
Cause: Local service not started, incorrect port, or model not downloaded.
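The first two causes can be checked with one request to the server's model-list endpoint. This is a minimal sketch, assuming Ollama answers on `/api/tags` (the endpoint used in the Ollama commands on this page) and vLLM's OpenAI-compatible server answers on `/v1/models`; `health_url` and `check_local` are hypothetical helpers, not Hermes commands.

```bash
#!/usr/bin/env bash
# health_url PROVIDER BASE_URL: prints an endpoint that answers when the server is up.
health_url() {
  local base="${2%/}"   # drop one trailing slash so paths join cleanly
  case "$1" in
    ollama) echo "$base/api/tags" ;;    # Ollama: list local models
    vllm)   echo "$base/v1/models" ;;   # vLLM OpenAI-compatible: list served models
    *)      return 1 ;;
  esac
}

# check_local PROVIDER BASE_URL: reports whether the local server responds.
check_local() {
  local url
  url=$(health_url "$1" "$2") || { echo "unsupported provider: $1"; return 2; }
  if curl -sf --max-time 5 -o /dev/null "$url"; then
    echo "$1 is up at $2"
  else
    echo "$1 not reachable at $2: is the server running on that port?"
    return 1
  fi
}
```

For example, `check_local ollama http://localhost:11434` or `check_local vllm http://localhost:8000`. If the server is up but Hermes still fails, `ollama list` shows whether the model was actually downloaded.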
Solutions:
```bash
# Ollama
ollama serve           # Start the Ollama service (runs in the foreground; use a second terminal)
ollama pull llama3.3   # Download the model
curl http://localhost:11434/api/tags  # Verify the service is running

# vLLM
python -m vllm.entrypoints.openai.api_server \
  --model meta-llama/Llama-3.3-70B-Instruct \
  --port 8000

# Configure Hermes to connect to the local model
hermes model  # Select Ollama or vLLM
```

Or configure it manually:

```yaml
llm:
  provider: ollama
  model: llama3.3
  base_url: http://localhost:11434
```

Token Limit Exceeded
Symptom: `context_length_exceeded` or `maximum context length` error.
Cause: Conversation history too long, exceeding model’s context window.
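To judge in advance whether a long file you plan to paste will push the conversation toward the model's window, a rough size estimate is often enough. This sketch uses the common rule of thumb of about 4 bytes per token for English text, an approximation rather than the model's real tokenizer; `estimate_tokens` and `over_threshold` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: rough token estimate for a text file, assuming
# about 4 bytes per token (a rule of thumb for English text). Real counts
# depend on the model's tokenizer; code and non-English text can differ a lot.
estimate_tokens() {
  local bytes
  bytes=$(wc -c < "$1")
  echo $(( (bytes + 3) / 4 ))   # round up
}

# over_threshold FILE THRESHOLD: succeeds when the estimate exceeds THRESHOLD.
over_threshold() {
  (( $(estimate_tokens "$1") > $2 ))
}
```

For example, `over_threshold report.md 80000 && echo "likely to trigger summarization"` compares a file (here a hypothetical `report.md`) against the `summarize_threshold` value used in the config on this page.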
Solutions:
```bash
# 1. Clear the current conversation (typed inside the chat session)
/clear

# 2. Start a new session
hermes chat --new
```

```yaml
# 3. Enable automatic summary compression in the config
agent:
  auto_summarize: true
  summarize_threshold: 80000  # Token threshold
```

Related Pages
- Common Errors: common runtime errors
- Cost Optimization: strategies to reduce token consumption
- Prompt Guide: write better prompts
- hermes doctor: automatic diagnostic tool