# Claude-Compatible API Proxy for Ollama

This project provides an Anthropic Claude-compatible API server built in Go. It handles requests from the official Claude CLI (e.g., `claude -p`, `claude --repl`) and routes them to local Ollama models such as `codellama:34b-instruct`, `deepseek-coder`, and `starcoder2`.

---

## ✅ Features

- 🧠 Claude-style `/v1/messages` endpoint
- 🔀 Automatic routing to local Ollama models via `ollama run`
- 🧵 Streaming responses over Server-Sent Events (SSE)
- ⚙️ Configurable model per request (`model` param)
- 🐳 Compatible with an `/etc/hosts` override for `api.anthropic.com`

---

## 🔧 Requirements

- [Go](https://golang.org/dl/) 1.21+
- [Ollama](https://ollama.com/) installed and running locally
- 16–64 GB of RAM (or an NVIDIA GPU for the larger models)

---

## 🚀 Getting Started

### 1. Clone and build

```bash
git clone https://github.com/xlgmokha/claude-proxy
cd claude-proxy
go build -o claude-proxy
```

---

### 2. Update `/etc/hosts`

Override Claude's default API domain to point to your local server:

```bash
sudo vim /etc/hosts
```

Add:

```
127.0.0.1 api.anthropic.com
```
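
You can verify that the override resolves correctly (using `getent`, available on most Linux systems):

```shell
# With the entry above in place, this prints the loopback address
getent hosts api.anthropic.com
```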

---

### 3. Run the server

```bash
./claude-proxy
```

The server listens on `http://127.0.0.1:8080` and emulates the Claude Messages API.
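
Once it's up, you can smoke-test the endpoint with `curl` (a sketch; the request shape follows the example payload shown later in this README, and the exact response format depends on the proxy):

```shell
# Minimal non-streaming request against the local proxy
curl -s http://127.0.0.1:8080/v1/messages \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "codellama:34b-instruct",
        "messages": [{ "role": "user", "content": "Say hello." }],
        "stream": false
      }'
```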

---

### 4. Use the Claude CLI

Point the Claude CLI to your local proxy (no code modification needed):

```bash
claude --repl --model codellama:34b-instruct
```

> ๐Ÿ” You can also use prompts directly:
> ```bash
> claude -p "Refactor this Go function for better error handling."
> ```

---

## 🧠 Recommended Models (via Ollama)

| Model Name               | Purpose                       | Notes                              |
|--------------------------|-------------------------------|------------------------------------|
| `codellama:34b-instruct` | Best for code generation      | Large; may require a cloud GPU     |
| `deepseek-coder:33b`     | General-purpose coding        | Strong on reasoning                |
| `starcoder2:15b`         | Fast, lightweight alternative | Works well locally with 16 GB+ RAM |
| `wizardcoder-python:34b` | Python-focused dev work       | Excellent for backend/API tasks    |
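
Each model has to be pulled into Ollama before the proxy can route to it; for example:

```shell
# Download one of the models listed above
ollama pull starcoder2:15b

# Confirm it is available locally
ollama list
```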

---

## โš™๏ธ Environment Variables (Optional)

| Variable           | Description                         | Default       |
|--------------------|-------------------------------------|---------------|
| `PORT`             | Port to run the proxy server on     | `8080`        |
| `OLLAMA_MODEL`     | Default fallback model name         | `codellama`   |
| `OLLAMA_HOST`      | Base URL of the Ollama server       | `http://localhost:11434` |
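
These can be set inline when launching the binary; the values below are illustrative:

```shell
# Run the proxy on a different port with a lighter default model
PORT=9090 OLLAMA_MODEL=starcoder2:15b ./claude-proxy
```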

---

## 📦 Example Request Payload (Claude Messages API)

`POST /v1/messages`

```json
{
  "model": "codellama:34b-instruct",
  "messages": [
    { "role": "user", "content": "Write a Go function to reverse a string." }
  ],
  "stream": true
}
```
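
When scripting against the endpoint, it is safer to build this payload with `jq` than to hand-escape JSON in shell (a sketch; assumes `jq` is installed):

```shell
# Let jq handle quoting and escaping of the prompt text
payload=$(jq -n \
  --arg model "codellama:34b-instruct" \
  --arg prompt "Write a Go function to reverse a string." \
  '{model: $model, messages: [{role: "user", content: $prompt}], stream: true}')

echo "$payload"
```

The result can be piped straight into `curl -d @-`.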

---

## ๐Ÿ› ๏ธ To Do

- [ ] Add `/v1/complete` legacy support
- [ ] Add support for tools/functions
- [ ] Add model caching and parallel queueing
- [ ] Auth/token gate the proxy

---

## ๐Ÿ™ Credits

- Inspired by the Anthropic Claude API
- Powered by [Ollama](https://ollama.com) and [Go](https://golang.org)
- CodeLLama, DeepSeek, Starcoder โ€” thanks to open-source model developers

---

## 📜 License

MIT License