# Claude-Compatible API Proxy for Ollama

This project provides an Anthropic Claude-compatible API server built in Go, designed to handle requests from the official Claude CLI (e.g., `claude -p`, or the interactive `claude` REPL) and route them to local Ollama LLMs such as `codellama:34b-instruct`, `deepseek-coder`, `starcoder2`, and more.

---

## ✅ Features

- 🧠 Claude-style `/v1/messages` endpoint with SSE support
- 🔀 Automatic routing to local Ollama models via `ollama run`
- 🧵 Streaming responses over Server-Sent Events (SSE)
- ⚙️ Configurable base model per request (`model` param)
- 🐳 Compatible with `/etc/hosts` override for `api.anthropic.com`

---

## 🔧 Requirements

- [Go](https://golang.org/dl/) 1.21+
- [Ollama](https://ollama.com/) installed and running locally
- 16–64 GB RAM (or an NVIDIA GPU for the larger models)

---

## 🚀 Getting Started

### 1. Clone and build

```bash
git clone https://github.com/xlgmokha/claude-proxy
cd claude-proxy
go build -o claude-proxy
```

---

### 2. Update `/etc/hosts`

Override Claude's default API domain to point to your local server:

```bash
sudo vim /etc/hosts
```

Add:

```
127.0.0.1 api.anthropic.com
```

---

### 3. Run the server

```bash
./claude-proxy
```

The server listens on `http://127.0.0.1:8080` and emulates the Claude Messages API.

---

### 4. Use the Claude CLI

Point the Claude CLI at your local proxy (no code modification needed). Launch the interactive REPL with the model of your choice:

```bash
claude --model codellama:34b-instruct
```

> 🔁 You can also pass prompts directly:
> ```bash
> claude -p "Refactor this Go function for better error handling."
> ```

---

## 🧠 Recommended Models (via Ollama)

| Model Name               | Purpose                       | Notes                              |
|--------------------------|-------------------------------|------------------------------------|
| `codellama:34b-instruct` | Best for code generation      | Large; may require a cloud GPU     |
| `deepseek-coder:33b`     | General-purpose coding        | Strong on reasoning                |
| `starcoder2:15b`         | Fast, lightweight alternative | Works well locally with 16 GB+ RAM |
| `wizardcoder-python:34b` | Python-focused dev work       | Excellent for backend/API tasks    |

---

## ⚙️ Environment Variables (Optional)

| Variable       | Description                     | Default                  |
|----------------|---------------------------------|--------------------------|
| `PORT`         | Port to run the proxy server on | `8080`                   |
| `OLLAMA_MODEL` | Default fallback model name     | `codellama`              |
| `OLLAMA_HOST`  | Base URL of the Ollama server   | `http://localhost:11434` |

---

## 📦 Example Request Payload (Claude Messages API)

`POST /v1/messages`

```json
{
  "model": "codellama:34b-instruct",
  "messages": [
    { "role": "user", "content": "Write a Go function to reverse a string." }
  ],
  "stream": true
}
```

See the appendix below for a sketch of how the proxy can translate a payload like this into an Ollama call.

---

## 🛠️ To Do

- [ ] Add `/v1/complete` legacy support
- [ ] Add support for tools/functions
- [ ] Add model caching and parallel queueing
- [ ] Add auth/token gating for the proxy

---

## 🙏 Credits

- Inspired by the Anthropic Claude API
- Powered by [Ollama](https://ollama.com) and [Go](https://golang.org)
- Code Llama, DeepSeek, and StarCoder: thanks to the open-source model developers

---

## 📜 License

MIT License
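
---

## 🧪 Appendix: From Claude Request to Ollama (Sketch)

The following is a minimal, illustrative sketch of the translation the proxy performs, not the repository's actual implementation. It assumes the proxy talks to Ollama's HTTP `/api/chat` endpoint (rather than shelling out to `ollama run`), assumes plain string `content` in messages, and omits the `message_start`/`content_block_start`/`message_delta` bookkeeping a full Claude SSE stream includes. All type and function names here (`claudeRequest`, `handleMessages`, etc.) are hypothetical.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// claudeRequest mirrors the subset of the Claude Messages API this sketch handles.
// Real Claude CLI requests may send content as an array of blocks; a string is
// assumed here for brevity.
type claudeRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
	Stream   bool      `json:"stream"`
}

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// ollamaChunk is one line of the newline-delimited JSON stream Ollama's
// /api/chat endpoint returns.
type ollamaChunk struct {
	Message message `json:"message"`
	Done    bool    `json:"done"`
}

func handleMessages(w http.ResponseWriter, r *http.Request) {
	var req claudeRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	// Forward the conversation to Ollama's chat endpoint.
	body, _ := json.Marshal(map[string]any{
		"model":    req.Model,
		"messages": req.Messages,
		"stream":   true,
	})
	resp, err := http.Post("http://localhost:11434/api/chat", "application/json", bytes.NewReader(body))
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadGateway)
		return
	}
	defer resp.Body.Close()

	// Re-emit Ollama's NDJSON stream as Claude-style SSE text deltas.
	w.Header().Set("Content-Type", "text/event-stream")
	flusher, _ := w.(http.Flusher)

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		var chunk ollamaChunk
		if err := json.Unmarshal(scanner.Bytes(), &chunk); err != nil {
			continue
		}
		if chunk.Done {
			fmt.Fprint(w, "event: message_stop\ndata: {\"type\":\"message_stop\"}\n\n")
			break
		}
		delta, _ := json.Marshal(map[string]any{
			"type":  "content_block_delta",
			"index": 0,
			"delta": map[string]string{"type": "text_delta", "text": chunk.Message.Content},
		})
		fmt.Fprintf(w, "event: content_block_delta\ndata: %s\n\n", delta)
		if flusher != nil {
			flusher.Flush()
		}
	}
}

func main() {
	http.HandleFunc("/v1/messages", handleMessages)
	http.ListenAndServe(":8080", nil)
}
```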
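
A quick smoke test with `curl` against the example payload above (assuming the proxy is running on its default port, 8080):

```bash
curl -N http://127.0.0.1:8080/v1/messages \
  -H "content-type: application/json" \
  -d '{
        "model": "codellama:34b-instruct",
        "messages": [{ "role": "user", "content": "Write a Go function to reverse a string." }],
        "stream": true
      }'
```

The `-N` flag disables curl's output buffering so SSE events print as they arrive.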
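
### Configuration defaults (sketch)

One plausible way to wire up the environment variables from the table above, falling back to the documented defaults; `envOr` is a hypothetical helper, not part of the repo:

```go
package main

import (
	"fmt"
	"os"
)

// envOr returns the named environment variable's value,
// or fallback when it is unset or empty.
func envOr(key, fallback string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return fallback
}

func main() {
	addr := ":" + envOr("PORT", "8080")
	model := envOr("OLLAMA_MODEL", "codellama")
	host := envOr("OLLAMA_HOST", "http://localhost:11434")
	fmt.Printf("listening on %s, default model %s, ollama at %s\n", addr, model, host)
}
```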