Codex use local openai api

From UVOO Tech Wiki

To point the Codex CLI at a local llama-server, override the environment variables that the tool uses to locate the OpenAI API.

Since the Codex CLI (and most OpenAI-compatible tools) follows standard OpenAI SDK conventions, you can redirect its traffic by pointing OPENAI_BASE_URL to your local endpoint.

1. Set the Environment Variables

Set these in your shell configuration (e.g., ~/.zshrc, ~/.bashrc) to make them persistent, or export them only in the session where you run the command.

  • OPENAI_BASE_URL: Set this to your llama-server address, ensuring you include the /v1 path.
  • OPENAI_API_KEY: Even if your llama-server doesn't require authentication, the CLI usually expects this variable to exist. You can set it to any dummy string (e.g., sk-not-needed).

Example for your shell configuration:

export OPENAI_BASE_URL="http://localhost:8080/v1"
export OPENAI_API_KEY="sk-not-needed"

After saving this, run source ~/.zshrc (or your relevant shell config file) to apply the changes.
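A common slip in this step is leaving off the /v1 suffix. The following is a minimal sketch of one way to guard against that. normalize_base_url is a hypothetical helper written for this page, not something Codex or llama.cpp provides, and the address is the same localhost:8080 example used above.

```shell
# normalize_base_url is a hypothetical helper (not part of Codex or llama.cpp).
# It prints the given server address with exactly one /v1 suffix.
normalize_base_url() {
  url="${1%/}"              # drop a single trailing slash, if present
  case "$url" in
    */v1) printf '%s\n' "$url" ;;       # already ends in /v1: keep as-is
    *)    printf '%s\n' "$url/v1" ;;    # otherwise append /v1
  esac
}

OPENAI_BASE_URL="$(normalize_base_url "http://localhost:8080")"
# Keep a real key if one is already set; otherwise fall back to a dummy value.
OPENAI_API_KEY="${OPENAI_API_KEY:-sk-not-needed}"
export OPENAI_BASE_URL OPENAI_API_KEY
```

To leave your normal OpenAI settings untouched, the same variables can instead be prefixed to a single command, for example OPENAI_BASE_URL="http://localhost:8080/v1" OPENAI_API_KEY="sk-not-needed" codex "hello", which applies them to that one invocation only.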


2. Configure via config.toml (Optional/Permanent)

If you prefer a more permanent configuration, or if you need to use a specific model name that llama-server might not report by default, you can define a provider profile in the Codex CLI configuration file, typically located at ~/.codex/config.toml.

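Add a custom provider section like this, and select it with the top-level model_provider key (defining a provider does not by itself make Codex use it):

# In ~/.codex/config.toml

# Select the provider defined below; top-level keys go before any [table] header.
model_provider = "local"

[model_providers.local]
name = "Llama Server"
base_url = "http://localhost:8080/v1"
env_key = "OPENAI_API_KEY"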
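If you would rather generate this file than edit it by hand, the sketch below writes a complete configuration that also pins a model name. write_codex_config is a hypothetical helper, and the model name you pass in is an assumption you must replace with whatever your llama-server actually reports. It refuses to overwrite an existing file, so it is safest to write to a scratch path and review the result first.

```shell
# write_codex_config is a hypothetical helper (not part of Codex).
# Arguments: output path, base URL (including /v1), model name.
write_codex_config() {
  out="$1"; base_url="$2"; model="$3"
  if [ -e "$out" ]; then
    echo "refusing to overwrite $out" >&2
    return 1
  fi
  cat > "$out" <<EOF
# Top-level keys must precede the first [table] header in TOML.
model_provider = "local"
model = "$model"

[model_providers.local]
name = "Llama Server"
base_url = "$base_url"
env_key = "OPENAI_API_KEY"
EOF
}
```

For example, write_codex_config /tmp/codex-config.toml "http://localhost:8080/v1" "my-local-model" produces a file you can inspect and then copy to ~/.codex/config.toml; "my-local-model" is a placeholder, not a real model name.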


3. Verify the Connection

With the environment variables (or the config.toml provider) in place, test the connection by running a simple command:

codex "What is the capital of Utah?"
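When the command above hangs or errors, it helps to know whether the server is reachable at all before suspecting Codex. Here is a minimal sketch of such a pre-flight check, assuming curl is installed. server_is_up and ask_local are hypothetical helper names, and the fallback address is the localhost:8080 example from this page.

```shell
# server_is_up is a hypothetical helper: it succeeds only if
# GET <base_url>/models completes with a non-error HTTP status within 3 seconds.
server_is_up() {
  curl --silent --fail --max-time 3 --output /dev/null "${1%/}/models"
}

# ask_local is likewise hypothetical: it runs the prompt through codex
# only when the server at OPENAI_BASE_URL answers.
ask_local() {
  base_url="${OPENAI_BASE_URL:-http://localhost:8080/v1}"
  if server_is_up "$base_url"; then
    codex "$1"
  else
    echo "no OpenAI-compatible server answering at $base_url" >&2
    return 1
  fi
}
```

Calling ask_local "What is the capital of Utah?" then either starts Codex or fails quickly with a message naming the address it tried.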

Important Tips for Local Models

  • Model Name: If the CLI complains that the "model is not found," specify the exact model name that llama-server reports, either with the --model flag or with the model key in config.toml. To see which models are available, query the /v1/models endpoint directly: curl http://localhost:8080/v1/models
  • Compatibility: Some "agentic" features of high-end CLI tools rely on OpenAI-style function calling or vision capabilities. Depending on the model you run in llama.cpp (e.g., a standard Llama 3 vs. a specialized coding model), advanced agentic behaviors may work less reliably than with a proprietary model like gpt-4o.
  • SSL/HTTPS: Ensure your llama-server is running on http (or that you handle any certificate issues if you've enabled HTTPS), as CLI tools often fail with self-signed local certificates.
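For the Model Name tip, the raw /v1/models reply is JSON and can be hard to scan. The sketch below pulls out just the model identifiers, assuming the reply follows the usual OpenAI list shape in which each entry carries an "id" field. list_model_ids is a hypothetical helper, and it matches text rather than parsing JSON, so it would also print any other "id" fields that happen to appear in the reply.

```shell
# list_model_ids is a hypothetical helper: it reads a /v1/models JSON reply
# on stdin and prints every "id" value on its own line.
# This is a text-matching sketch, not a real JSON parser.
list_model_ids() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' |
    sed 's/^"id"[[:space:]]*:[[:space:]]*"//; s/"$//'
}
```

A typical use is curl -s http://localhost:8080/v1/models | list_model_ids, after which one of the printed names can be passed to Codex with --model.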