# Maildir MCP Server Design
## Overview
The Maildir MCP Server provides secure access to email archives stored in maildir format. It enables AI assistants to search, analyze, and extract insights from email data while maintaining strict privacy controls and security boundaries.
## Problem Statement
Email archives contain valuable personal and professional context that could enhance AI assistant capabilities:
- **Communication Patterns**: Understanding relationships and interaction frequency
- **Context Retrieval**: Finding relevant past conversations for current tasks
- **Contact Management**: Extracting and organizing contact information
- **Content Analysis**: Analyzing communication styles, topics, and sentiment
- **Timeline Reconstruction**: Understanding project history through email threads
However, email data is highly sensitive and requires:
- Strong privacy controls and access restrictions
- Efficient parsing of various email formats (MIME, HTML, plain text)
- Respect for email threading and conversation structure
- Metadata preservation alongside content filtering
## Architecture
### Maildir Format Support
The server will support the standard maildir format:
```
/path/to/maildir/
├── INBOX/
│ ├── cur/ # Messages a mail client has already seen
│ ├── new/ # Newly delivered messages not yet seen by a client
│ └── tmp/ # Files still being delivered
├── Sent/
├── Drafts/
├── Trash/
└── [Custom Folders]/
```
**Filename Format**: `{timestamp}.{process_id}_{delivery_id}.{hostname},{unique_id}:2,{flags}`
- Flags: `S` (Seen), `R` (Replied), `F` (Flagged), `T` (Trashed), `D` (Draft), `P` (Passed, i.e. forwarded or resent)
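
As an illustration of the filename convention above, the following Go sketch reads the flag letters that follow `:2,`. The `Flags` type and `ParseFlags` function are hypothetical names introduced here, not part of the design; letters outside the five modeled flags are ignored.

```go
package main

import "strings"

// Flags models a subset of the maildir info flags that follow ":2,".
type Flags struct {
	Seen, Replied, Flagged, Trashed, Draft bool
}

// ParseFlags extracts the flags from a maildir filename.
// ok is false when the name has no ":2," info section, which is
// typical for files that still sit in new/.
func ParseFlags(name string) (f Flags, ok bool) {
	i := strings.LastIndex(name, ":2,")
	if i < 0 {
		return f, false
	}
	for _, c := range name[i+3:] {
		switch c {
		case 'S':
			f.Seen = true
		case 'R':
			f.Replied = true
		case 'F':
			f.Flagged = true
		case 'T':
			f.Trashed = true
		case 'D':
			f.Draft = true
		}
	}
	return f, true
}
```

A message would count as unread under this sketch when `ok` is false or `f.Seen` is false.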
### Core Components
#### 1. Maildir Scanner
- Recursively scan maildir directory structure
- Index folder hierarchy and message counts
- Track maildir state and changes
- Support for both Maildir and Maildir++ formats
#### 2. Email Parser
- Parse RFC 5322 email messages (RFC 5322 obsoletes RFC 2822)
- Extract headers (From, To, Subject, Date, Message-ID, etc.)
- Handle MIME multipart messages
- Extract plain text and HTML content
- Preserve thread relationships (In-Reply-To, References)
- Support various character encodings
#### 3. Content Processor
- Convert HTML to markdown for AI consumption
- Extract and clean plain text content
- Parse email signatures and quotes
- Identify forwarded messages and replies
- Extract attachment metadata (without content, for security)
#### 4. Search Engine
- Full-text search across message content
- Metadata filtering (date ranges, senders, folders)
- Thread-aware search results
- Fuzzy matching for contact names and subjects
- Boolean search operators
#### 5. Privacy Filter
- Configurable PII detection and masking
- Exclude sensitive folders (e.g., banking, legal)
- Content sanitization options
- Whitelist/blacklist for contact domains
## MCP Tools
### 1. `maildir_scan_folders`
**Description**: Scan and list available maildir folders with message counts.
**Input Schema**:
```json
{
"type": "object",
"properties": {
"maildir_path": {
"type": "string",
"description": "Path to the maildir root directory"
},
"include_counts": {
"type": "boolean",
"default": true,
"description": "Include message counts for each folder"
}
},
"required": ["maildir_path"]
}
```
**Output**: List of folders with metadata (path, message count, unread count)
### 2. `maildir_list_messages`
**Description**: List messages in a folder with pagination and filtering.
**Input Schema**:
```json
{
"type": "object",
"properties": {
"maildir_path": {"type": "string"},
"folder": {"type": "string", "default": "INBOX"},
"limit": {"type": "integer", "default": 50, "maximum": 200},
"offset": {"type": "integer", "default": 0},
"date_from": {"type": "string", "format": "date"},
"date_to": {"type": "string", "format": "date"},
"sender": {"type": "string"},
"subject_contains": {"type": "string"},
"unread_only": {"type": "boolean", "default": false}
},
"required": ["maildir_path"]
}
```
**Output**: Paginated list of message headers and metadata
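
The schema's defaults and the `limit` maximum could be enforced on the server roughly as follows. This Go sketch covers only a subset of the fields. `ListArgs`, `DecodeListArgs`, and `Page` are hypothetical names, and rejecting a `limit` below 1 or a negative `offset` is an assumption that the schema does not state.

```go
package main

import (
	"encoding/json"
	"errors"
)

// ListArgs mirrors part of the maildir_list_messages input schema.
type ListArgs struct {
	MaildirPath string `json:"maildir_path"`
	Folder      string `json:"folder"`
	Limit       int    `json:"limit"`
	Offset      int    `json:"offset"`
	UnreadOnly  bool   `json:"unread_only"`
}

// DecodeListArgs applies the schema defaults, then validates the result.
func DecodeListArgs(raw []byte) (ListArgs, error) {
	a := ListArgs{Folder: "INBOX", Limit: 50} // defaults from the schema
	if err := json.Unmarshal(raw, &a); err != nil {
		return a, err
	}
	if a.MaildirPath == "" {
		return a, errors.New("maildir_path is required")
	}
	if a.Limit < 1 || a.Limit > 200 {
		return a, errors.New("limit must be between 1 and 200")
	}
	if a.Offset < 0 {
		return a, errors.New("offset must not be negative")
	}
	return a, nil
}

// Page returns the slice bounds [start, end) for a result set of n items.
func (a ListArgs) Page(n int) (start, end int) {
	start = a.Offset
	if start > n {
		start = n
	}
	end = start + a.Limit
	if end > n {
		end = n
	}
	return start, end
}
```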
### 3. `maildir_read_message`
**Description**: Read full message content with optional content filtering.
**Input Schema**:
```json
{
"type": "object",
"properties": {
"maildir_path": {"type": "string"},
"message_id": {"type": "string"},
"include_html": {"type": "boolean", "default": false},
"include_headers": {"type": "boolean", "default": true},
"sanitize_content": {"type": "boolean", "default": true}
},
"required": ["maildir_path", "message_id"]
}
```
**Output**: Full message with headers, content, and metadata
### 4. `maildir_search_messages`
**Description**: Full-text search across email content with advanced filtering.
**Input Schema**:
```json
{
"type": "object",
"properties": {
"maildir_path": {"type": "string"},
"query": {"type": "string"},
"folders": {"type": "array", "items": {"type": "string"}},
"date_from": {"type": "string", "format": "date"},
"date_to": {"type": "string", "format": "date"},
"senders": {"type": "array", "items": {"type": "string"}},
"limit": {"type": "integer", "default": 50, "maximum": 200},
"sort_by": {"type": "string", "enum": ["date", "relevance"], "default": "relevance"}
},
"required": ["maildir_path", "query"]
}
```
**Output**: Ranked search results with snippets and relevance scores
### 5. `maildir_get_thread`
**Description**: Retrieve complete email thread/conversation.
**Input Schema**:
```json
{
"type": "object",
"properties": {
"maildir_path": {"type": "string"},
"message_id": {"type": "string"},
"max_depth": {"type": "integer", "default": 50}
},
"required": ["maildir_path", "message_id"]
}
```
**Output**: Thread structure with all related messages in chronological order
### 6. `maildir_analyze_contacts`
**Description**: Extract and analyze contact information and communication patterns.
**Input Schema**:
```json
{
"type": "object",
"properties": {
"maildir_path": {"type": "string"},
"date_from": {"type": "string", "format": "date"},
"date_to": {"type": "string", "format": "date"},
"min_messages": {"type": "integer", "default": 2},
"include_frequency": {"type": "boolean", "default": true}
},
"required": ["maildir_path"]
}
```
**Output**: Contact list with email frequency, last contact date, and relationship strength
### 7. `maildir_get_statistics`
**Description**: Generate email usage statistics and insights.
**Input Schema**:
```json
{
"type": "object",
"properties": {
"maildir_path": {"type": "string"},
"period": {"type": "string", "enum": ["week", "month", "year"], "default": "month"},
"include_charts": {"type": "boolean", "default": false}
},
"required": ["maildir_path"]
}
```
**Output**: Statistics on email volume, top contacts, response times, etc.
## Security & Privacy
### Access Control
- Restrict access to explicitly authorized maildir paths
- Validate all path operations to prevent directory traversal
- Support for read-only access mode
- Configurable folder exclusions
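
A path check along these lines is one way to reject directory traversal. The Go sketch below compares cleaned absolute paths lexically; `IsAllowed` is a hypothetical name. It does not resolve symlinks, so a real server would probably also call `filepath.EvalSymlinks` before comparing.

```go
package main

import (
	"path/filepath"
	"strings"
)

// IsAllowed reports whether path lies inside one of the allowed roots.
// Both sides are made absolute and cleaned, so "a/../../b" style
// segments are collapsed before the comparison.
func IsAllowed(path string, allowedRoots []string) bool {
	abs, err := filepath.Abs(path)
	if err != nil {
		return false
	}
	for _, root := range allowedRoots {
		rootAbs, err := filepath.Abs(root)
		if err != nil {
			continue
		}
		rel, err := filepath.Rel(rootAbs, abs)
		if err != nil {
			continue
		}
		// A relative path that starts with ".." points outside the root.
		if rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
			return true
		}
	}
	return false
}
```

Comparing with `filepath.Rel` rather than a plain string prefix avoids accepting a sibling such as `/home/user/.local/share/mailbox` for the root `/home/user/.local/share/mail`.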
### Content Filtering
- Optional PII detection and masking (phone numbers, SSNs, etc.)
- Email address anonymization options
- Subject line sanitization
- Attachment content exclusion (metadata only)
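
PII masking could start from simple pattern replacement, as in the Go sketch below. The two patterns assume US-style SSNs and phone numbers and are illustrative only; `MaskPII` and the placeholder strings are hypothetical, and real detection would need broader rules.

```go
package main

import "regexp"

var (
	// US Social Security number written as 123-45-6789.
	ssnRe = regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`)
	// US phone number such as (555) 123-4567, 555-123-4567 or 555.123.4567.
	phoneRe = regexp.MustCompile(`\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b`)
)

// MaskPII replaces matches with fixed placeholders.
func MaskPII(s string) string {
	s = ssnRe.ReplaceAllString(s, "[SSN]")
	return phoneRe.ReplaceAllString(s, "[PHONE]")
}
```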
### Configuration
- User-defined sensitivity levels
- Whitelist/blacklist for contact domains
- Excluded folder patterns
- Content filtering rules
### Example Security Config
```json
{
"allowed_paths": ["/home/user/.local/share/mail"],
"excluded_folders": ["Banking", "Legal", "Medical"],
"pii_masking": true,
"contact_anonymization": false,
"max_content_length": 10000,
"excluded_extensions": [".exe", ".zip", ".pdf"]
}
```
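
On the server side, the configuration above could map onto a Go struct like the one sketched here. The type and method names are hypothetical, and the case-insensitive match on excluded folder names is an assumption.

```go
package main

import (
	"encoding/json"
	"strings"
)

// SecurityConfig mirrors the example security config.
type SecurityConfig struct {
	AllowedPaths         []string `json:"allowed_paths"`
	ExcludedFolders      []string `json:"excluded_folders"`
	PIIMasking           bool     `json:"pii_masking"`
	ContactAnonymization bool     `json:"contact_anonymization"`
	MaxContentLength     int      `json:"max_content_length"`
	ExcludedExtensions   []string `json:"excluded_extensions"`
}

// LoadSecurityConfig decodes a JSON config document.
func LoadSecurityConfig(raw []byte) (SecurityConfig, error) {
	var c SecurityConfig
	err := json.Unmarshal(raw, &c)
	return c, err
}

// FolderExcluded reports whether folder matches an excluded name,
// ignoring case.
func (c SecurityConfig) FolderExcluded(folder string) bool {
	for _, f := range c.ExcludedFolders {
		if strings.EqualFold(f, folder) {
			return true
		}
	}
	return false
}
```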
## Implementation Details
### File System Operations
- Efficient directory traversal with caching
- Watch for maildir changes (new messages)
- Handle corrupted or malformed email files gracefully
- Support for compressed maildir archives
### Email Parsing
- Use Go's built-in `net/mail` package for basic parsing
- Additional MIME parsing for multipart messages
- Handle various character encodings (UTF-8, Latin-1, etc.)
- Extract metadata while preserving original structure
### Search Implementation
- In-memory inverted index for fast text search
- Bloom filters for efficient negative lookups
- Fuzzy string matching for contact names
- Regular expression support for advanced queries
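
The in-memory inverted index can be sketched in a few lines of Go. The version below maps each lowercased term to the set of message IDs that contain it and answers queries with boolean AND. `Index`, `Add`, and `Search` are hypothetical names, integer IDs stand in for whatever message key the server uses, and ranking, Bloom filters, and fuzzy matching are omitted.

```go
package main

import (
	"sort"
	"strings"
	"unicode"
)

// Index maps each term to the set of message IDs that contain it.
type Index struct {
	postings map[string]map[int]struct{}
}

func NewIndex() *Index {
	return &Index{postings: make(map[string]map[int]struct{})}
}

// tokenize lowercases text and splits it on anything that is not a
// letter or digit.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Add indexes the text of one message under id.
func (ix *Index) Add(id int, text string) {
	for _, t := range tokenize(text) {
		if ix.postings[t] == nil {
			ix.postings[t] = make(map[int]struct{})
		}
		ix.postings[t][id] = struct{}{}
	}
}

// Search returns the IDs of messages containing every query term,
// in ascending order.
func (ix *Index) Search(query string) []int {
	terms := tokenize(query)
	if len(terms) == 0 {
		return nil
	}
	var out []int
	for id := range ix.postings[terms[0]] {
		match := true
		for _, t := range terms[1:] {
			if _, ok := ix.postings[t][id]; !ok {
				match = false
				break
			}
		}
		if match {
			out = append(out, id)
		}
	}
	sort.Ints(out)
	return out
}
```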
### Threading Algorithm
- Parse References and In-Reply-To headers
- Subject line normalization (Re:, Fwd: removal)
- Handle broken threading gracefully
- Support for multiple threading strategies
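
Two building blocks of the threading steps above are sketched here in Go: stripping reply and forward prefixes from a subject, and picking a parent message ID. The names are hypothetical. Taking the last `References` entry as the parent, with `In-Reply-To` as the fallback, is a common convention and an assumption here rather than something this design specifies.

```go
package main

import (
	"regexp"
	"strings"
)

// prefixRe matches one or more leading "Re:", "Fw:" or "Fwd:" prefixes,
// in any letter case.
var prefixRe = regexp.MustCompile(`(?i)^\s*((re|fwd?)\s*:\s*)+`)

// NormalizeSubject removes reply/forward prefixes so that subjects of
// the same conversation compare equal.
func NormalizeSubject(s string) string {
	return strings.TrimSpace(prefixRe.ReplaceAllString(s, ""))
}

// ParentID returns the most likely parent message ID: the last entry
// of References, else the first entry of In-Reply-To, else "".
func ParentID(references, inReplyTo string) string {
	if refs := strings.Fields(references); len(refs) > 0 {
		return refs[len(refs)-1]
	}
	if ids := strings.Fields(inReplyTo); len(ids) > 0 {
		return ids[0]
	}
	return ""
}
```

Messages with an empty `ParentID` could then be grouped by `NormalizeSubject` as a fallback strategy for broken threads.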
## Performance Considerations
### Caching Strategy
- Cache folder structure and message counts
- Index commonly accessed messages
- Lazy loading of message content
- TTL-based cache invalidation
### Memory Management
- Stream large messages to avoid memory issues
- Pagination for large result sets
- Configurable limits on search result size
- Efficient string operations for content processing
### Scalability
- Support for maildir archives with millions of messages
- Incremental indexing for new messages
- Background processing for expensive operations
- Rate limiting for resource-intensive queries
## Error Handling
### Graceful Degradation
- Continue processing despite corrupted messages
- Handle permission errors gracefully
- Provide meaningful error messages for invalid queries
- Fallback options for unsupported email formats
### Logging & Monitoring
- Structured logging for all operations
- Performance metrics collection
- Error rate tracking
- Privacy-safe audit logging
## Testing Strategy
### Unit Tests
- Email parsing with various MIME types
- Maildir scanning with different folder structures
- Search functionality with edge cases
- Security validation for path traversal attempts
### Integration Tests
- Real maildir processing with sample data
- Performance testing with large archives
- Security testing with malicious inputs
- Cross-platform compatibility testing
### Test Data
- Synthetic email corpus for testing
- Various maildir layouts and formats
- Corrupted email samples
- Edge cases (empty folders, special characters)
## Future Enhancements
### Advanced Features
- Email sentiment analysis
- Automatic categorization and tagging
- Smart contact grouping
- Email scheduling analysis
- Conversation summarization
### Integration Options
- Export to various formats (JSON, CSV, mbox)
- Integration with external search engines
- Contact synchronization with address books
- Calendar event extraction from emails
### Machine Learning
- Spam/ham classification
- Important message detection
- Automatic reply suggestions
- Writing style analysis
## Compliance & Legal
### Data Protection
- GDPR compliance for EU users
- Data retention policies
- Right to be forgotten implementation
- Consent management for contact analysis
### Export/Import
- Standard mailbox format support (mbox, EML)
- Backup and restore functionality
- Cross-platform migration tools
- Format conversion utilities
This design provides a comprehensive, secure, and privacy-conscious approach to email analysis while maintaining the flexibility needed for AI assistant integration.