Codex use local openai api
To point the Codex CLI to your local llama-server, you primarily need to override the environment variables that the tool uses to locate the OpenAI API.
Since the Codex CLI (and most OpenAI-compatible tools) follows standard OpenAI SDK conventions, you can redirect its traffic by pointing OPENAI_BASE_URL to your local endpoint.
1. Set the Environment Variables
You need to set these in your shell configuration (e.g., ~/.zshrc, ~/.bashrc) or for the specific session where you run the command.
OPENAI_BASE_URL: Set this to yourllama-serveraddress, ensuring you include the/v1path.OPENAI_API_KEY: Even if yourllama-serverdoesn't require authentication, the CLI usually expects this variable to exist. You can set it to any dummy string (e.g.,sk-no-key).
Example for your shell configuration:
export OPENAI_BASE_URL="http://localhost:8080/v1" export OPENAI_API_KEY="sk-not-needed"
After saving this, run source ~/.zshrc (or your relevant shell config file) to apply the changes.
2. Configure via config.toml (Optional/Permanent)
If you prefer a more permanent configuration—or if you need to use a specific model name that llama-server might not report by default—you can define a provider profile in the Codex CLI configuration file, typically located at ~/.codex/config.toml.
Add a custom provider section like this:
# In ~/.codex/config.toml [model_providers.local] name = "Llama Server" base_url = "http://localhost:8080/v1" env_key = "OPENAI_API_KEY"
3. Verify the Connection
Once the environment variables are set, test the connection by running a simple command:
codex "What is the capital of Utah?"
Important Tips for Local Models
- Model Name: If the CLI complains that the "model is not found," you may need to explicitly specify the model name that
llama-serveris reporting. You can check what models are available by hitting the/v1/modelsendpoint directly:curl http://localhost:8080/v1/models - Compatibility: Some "agentic" features of high-end CLI tools rely on OpenAI-specific function calling or vision capabilities. Depending on the model you are running in
llama.cpp(e.g., a standard Llama 3 vs. a specialized coding model), some advanced agentic behaviors might have varying success compared to using a proprietary model likegpt-4o. - SSL/HTTPS: Ensure your
llama-serveris running onhttp(or that you handle any certificate issues if you've enabled HTTPS), as CLI tools often fail with self-signed local certificates.