<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://tech.uvoo.io/index.php?action=history&amp;feed=atom&amp;title=Llama_1</id>
	<title>Llama 1 - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://tech.uvoo.io/index.php?action=history&amp;feed=atom&amp;title=Llama_1"/>
	<link rel="alternate" type="text/html" href="https://tech.uvoo.io/index.php?title=Llama_1&amp;action=history"/>
	<updated>2026-05-14T18:16:55Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.35.2</generator>
	<entry>
		<id>https://tech.uvoo.io/index.php?title=Llama_1&amp;diff=5689&amp;oldid=prev</id>
		<title>Busk: Created page with &quot;`model.gguf` was just a placeholder. You need to download an actual `.gguf` file, such as:  ```text gpt-oss-20b-mxfp4.gguf ```  For GPT-OSS 20B, the Hugging Face repo is:  ```...&quot;</title>
		<link rel="alternate" type="text/html" href="https://tech.uvoo.io/index.php?title=Llama_1&amp;diff=5689&amp;oldid=prev"/>
		<updated>2026-05-13T16:32:10Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;`model.gguf` was just a placeholder. You need to download an actual `.gguf` file, such as:  ```text gpt-oss-20b-mxfp4.gguf ```  For GPT-OSS 20B, the Hugging Face repo is:  ```...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;`model.gguf` was just a placeholder. You need to download an actual `.gguf` file, such as:&lt;br /&gt;
&lt;br /&gt;
```text&lt;br /&gt;
gpt-oss-20b-mxfp4.gguf&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
For GPT-OSS 20B, the Hugging Face repo is:&lt;br /&gt;
&lt;br /&gt;
```text&lt;br /&gt;
ggml-org/gpt-oss-20b-GGUF&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
The model page lists `gpt-oss-20b-mxfp4.gguf` as the file used by `llama.cpp` / `llama-cpp-python`, and the repo can also be loaded directly with `llama-server -hf ggml-org/gpt-oss-20b-GGUF`. ([Hugging Face][1])&lt;br /&gt;
&lt;br /&gt;
## Easiest: let `llama.cpp` download it&lt;br /&gt;
&lt;br /&gt;
From your `llama.cpp` directory:&lt;br /&gt;
&lt;br /&gt;
```bash&lt;br /&gt;
cd ~/llama.cpp&lt;br /&gt;
&lt;br /&gt;
./build/bin/llama-server \&lt;br /&gt;
  -hf ggml-org/gpt-oss-20b-GGUF \&lt;br /&gt;
  -ngl 999 \&lt;br /&gt;
  -c 4096 \&lt;br /&gt;
  --host 0.0.0.0 \&lt;br /&gt;
  --port 8080&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
The official quick start for this repo is basically:&lt;br /&gt;
&lt;br /&gt;
```bash&lt;br /&gt;
llama-server -hf ggml-org/gpt-oss-20b-GGUF -c 0 --jinja&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
([Hugging Face][2])&lt;br /&gt;
&lt;br /&gt;
For your P40, I would start with:&lt;br /&gt;
&lt;br /&gt;
```bash&lt;br /&gt;
-c 4096&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
rather than `-c 0` (which tells `llama.cpp` to use the model's full trained context), since the KV cache grows with the context window and can eat into the P40's 24 GB of VRAM.&lt;br /&gt;
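&lt;br /&gt;
To sanity-check memory headroom while the model loads, you can watch GPU usage (assuming NVIDIA drivers with `nvidia-smi` are installed):&lt;br /&gt;
&lt;br /&gt;
```bash&lt;br /&gt;
# Refresh GPU memory readings every 2 seconds&lt;br /&gt;
watch -n 2 nvidia-smi --query-gpu=memory.used,memory.total --format=csv&lt;br /&gt;
```&lt;br /&gt;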
&lt;br /&gt;
## Manual download to a real file&lt;br /&gt;
&lt;br /&gt;
Install the Hugging Face CLI:&lt;br /&gt;
&lt;br /&gt;
```bash&lt;br /&gt;
sudo apt install -y python3-pip&lt;br /&gt;
python3 -m pip install --user -U huggingface_hub&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
Make sure your user-local Python bin path is active:&lt;br /&gt;
&lt;br /&gt;
```bash&lt;br /&gt;
export PATH=&amp;quot;$HOME/.local/bin:$PATH&amp;quot;&lt;br /&gt;
```&lt;br /&gt;
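&lt;br /&gt;
To make that permanent for future shells (assuming bash) and confirm the CLI resolves:&lt;br /&gt;
&lt;br /&gt;
```bash&lt;br /&gt;
# Append the export to ~/.bashrc so new shells pick it up&lt;br /&gt;
echo 'export PATH=&amp;quot;$HOME/.local/bin:$PATH&amp;quot;' | tee -a ~/.bashrc&lt;br /&gt;
&lt;br /&gt;
# Should print the path to the executable&lt;br /&gt;
command -v huggingface-cli&lt;br /&gt;
```&lt;br /&gt;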
&lt;br /&gt;
Create a model directory:&lt;br /&gt;
&lt;br /&gt;
```bash&lt;br /&gt;
mkdir -p ~/models/gpt-oss-20b&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
Download the GGUF file (recent `huggingface_hub` releases write real files to `--local-dir` by default, so the deprecated `--local-dir-use-symlinks` flag is omitted here):&lt;br /&gt;
&lt;br /&gt;
```bash&lt;br /&gt;
huggingface-cli download ggml-org/gpt-oss-20b-GGUF \&lt;br /&gt;
  gpt-oss-20b-mxfp4.gguf \&lt;br /&gt;
  --local-dir ~/models/gpt-oss-20b&lt;br /&gt;
```&lt;br /&gt;
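&lt;br /&gt;
A quick check that the whole file arrived; the mxfp4 quant is on the order of 12 GB (approximate figure, verify against the repo page):&lt;br /&gt;
&lt;br /&gt;
```bash&lt;br /&gt;
ls -lh ~/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf&lt;br /&gt;
```&lt;br /&gt;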
&lt;br /&gt;
Then run it:&lt;br /&gt;
&lt;br /&gt;
```bash&lt;br /&gt;
cd ~/llama.cpp&lt;br /&gt;
&lt;br /&gt;
./build/bin/llama-server \&lt;br /&gt;
  -m ~/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf \&lt;br /&gt;
  -ngl 999 \&lt;br /&gt;
  -c 4096 \&lt;br /&gt;
  --host 0.0.0.0 \&lt;br /&gt;
  --port 8080&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
## Test it&lt;br /&gt;
&lt;br /&gt;
```bash&lt;br /&gt;
curl http://127.0.0.1:8080/v1/chat/completions \&lt;br /&gt;
  -H 'Content-Type: application/json' \&lt;br /&gt;
  -d '{&lt;br /&gt;
    &amp;quot;model&amp;quot;: &amp;quot;gpt-oss-20b&amp;quot;,&lt;br /&gt;
    &amp;quot;messages&amp;quot;: [&lt;br /&gt;
      {&lt;br /&gt;
        &amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;,&lt;br /&gt;
        &amp;quot;content&amp;quot;: &amp;quot;Write a minimal Go HTTP health check server.&amp;quot;&lt;br /&gt;
      }&lt;br /&gt;
    ],&lt;br /&gt;
    &amp;quot;temperature&amp;quot;: 0.2&lt;br /&gt;
  }'&lt;br /&gt;
```&lt;br /&gt;
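&lt;br /&gt;
`llama-server` also exposes a `/health` endpoint, which is handy for confirming the model has finished loading before you send chat requests:&lt;br /&gt;
&lt;br /&gt;
```bash&lt;br /&gt;
curl http://127.0.0.1:8080/health&lt;br /&gt;
```&lt;br /&gt;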
&lt;br /&gt;
## For Qwen later&lt;br /&gt;
&lt;br /&gt;
Same idea, but point at a Qwen GGUF repo instead; Hugging Face's docs cover serving GGUF models through `llama.cpp` in the same way, and a sketch follows below. ([Hugging Face][3])&lt;br /&gt;
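&lt;br /&gt;
For example, assuming Qwen's official `Qwen/Qwen2.5-7B-Instruct-GGUF` repo (verify the repo and pick the quant you want on its page):&lt;br /&gt;
&lt;br /&gt;
```bash&lt;br /&gt;
./build/bin/llama-server \&lt;br /&gt;
  -hf Qwen/Qwen2.5-7B-Instruct-GGUF \&lt;br /&gt;
  -ngl 999 \&lt;br /&gt;
  -c 4096 \&lt;br /&gt;
  --host 0.0.0.0 \&lt;br /&gt;
  --port 8080&lt;br /&gt;
```&lt;br /&gt;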
&lt;br /&gt;
For now, get GPT-OSS 20B working first with:&lt;br /&gt;
&lt;br /&gt;
```bash&lt;br /&gt;
-hf ggml-org/gpt-oss-20b-GGUF&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
or with the downloaded file:&lt;br /&gt;
&lt;br /&gt;
```bash&lt;br /&gt;
-m ~/models/gpt-oss-20b/gpt-oss-20b-mxfp4.gguf&lt;br /&gt;
```&lt;br /&gt;
&lt;br /&gt;
[1]: https://huggingface.co/ggml-org/gpt-oss-20b-GGUF &amp;quot;ggml-org/gpt-oss-20b-GGUF&amp;quot;&lt;br /&gt;
[2]: https://huggingface.co/ggml-org/gpt-oss-20b-GGUF/resolve/main/README.md?download=true &amp;quot;gpt-oss-20b-GGUF README&amp;quot;&lt;br /&gt;
[3]: https://huggingface.co/docs/inference-endpoints/engines/llama_cpp &amp;quot;llama.cpp&amp;quot;&lt;/div&gt;</summary>
		<author><name>Busk</name></author>
	</entry>
</feed>