torrent-gateway/TECHNICAL_OVERVIEW.md
enki 76979d055b
Transcoding and Nip71 update
2025-08-21 19:32:26 -07:00


# BitTorrent Gateway - Technical Overview
This document provides a comprehensive technical overview of the BitTorrent Gateway architecture, implementation details, and system design decisions.
## System Architecture
### High-Level Architecture
The BitTorrent Gateway is built as a unified system with multiple specialized components working together to provide intelligent content distribution:
```
┌─────────────────────────────────────────────────────────────┐
│                     BitTorrent Gateway                      │
├─────────────────────┬─────────────────────┬─────────────────┤
│ Gateway Server      │ Blossom Server      │ DHT Node        │
│ (Port 9877)         │ (Port 8082)         │ (Port 6883)     │
│                     │                     │                 │
│ • HTTP API          │ • Blob Storage      │ • Peer Discovery│
│ • WebSeed           │ • Nostr Protocol    │ • DHT Protocol  │
│ • Rate Limiting     │ • Content Address   │ • Bootstrap     │
│ • Abuse Prevention  │ • LRU Caching       │ • Announce      │
│ • Video Transcoding │                     │                 │
└─────────────────────┴─────────────────────┴─────────────────┘
                  ┌────────────┴────────────┐
                  │    Built-in Tracker     │
                  │                         │
                  │ • Announce/Scrape       │
                  │ • Peer Management       │
                  │ • Client Compatibility  │
                  │ • Statistics Tracking   │
                  └─────────────────────────┘
                  ┌────────────┴────────────┐
                  │     P2P Coordinator     │
                  │                         │
                  │ • Unified Peer Discovery│
                  │ • Smart Peer Ranking    │
                  │ • Load Balancing        │
                  │ • Health Monitoring     │
                  └─────────────────────────┘
```
### Core Components
#### 1. Gateway HTTP Server (internal/api/)
**Purpose**: Main API server and WebSeed implementation
**Port**: 9877
**Key Features**:
- RESTful API for file operations
- WebSeed (BEP-19) implementation for BitTorrent clients
- Smart proxy for reassembling chunked content
- Advanced LRU caching system
- Rate limiting and abuse prevention
- Integrated video transcoding engine
**Implementation Details**:
- Built with Gorilla Mux router
- Comprehensive middleware stack (security, rate limiting, CORS)
- WebSeed with concurrent piece loading and caching
- Client-specific optimizations (qBittorrent, Transmission, etc.)
#### 2. Blossom Server (internal/blossom/)
**Purpose**: Content-addressed blob storage
**Port**: 8082
**Key Features**:
- Nostr-compatible blob storage protocol
- SHA-256 content addressing
- Direct storage for files <100MB
- Rate limiting and authentication
**Implementation Details**:
- Implements Blossom protocol specification
- Integration with gateway storage backend
- Efficient blob retrieval and caching
- Nostr event signing and verification
#### 3. DHT Node (internal/dht/)
**Purpose**: Distributed peer discovery
**Port**: 6883 (UDP)
**Key Features**:
- Full Kademlia DHT implementation
- Bootstrap connectivity to major DHT networks
- Automatic torrent announcement
- Peer discovery and sharing
**Implementation Details**:
- Custom DHT implementation with routing table management
- Integration with BitTorrent mainline DHT
- Bootstrap nodes include well-known public DHT routers
- Periodic maintenance and peer cleanup
#### 4. Built-in BitTorrent Tracker (internal/tracker/)
**Purpose**: BitTorrent announce/scrape server
**Key Features**:
- Full BitTorrent tracker protocol
- Peer management and statistics
- Client compatibility optimizations
- Abuse detection and prevention
**Implementation Details**:
- Standards-compliant announce/scrape handling
- Support for both compact and dictionary peer formats
- Client detection and protocol adjustments
- Geographic proximity-based peer selection
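
The compact peer format mentioned above packs each IPv4 peer into 6 bytes: 4 bytes of address followed by a 2-byte big-endian port. The following is a minimal sketch of that encoding, not the gateway's actual tracker code; the `peer` type and `compactPeers` function are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"net"
)

// peer is a hypothetical minimal peer record used only for this sketch.
type peer struct {
	IP   net.IP
	Port uint16
}

// compactPeers encodes IPv4 peers in the compact tracker format:
// 4 bytes of IP followed by 2 bytes of port, both in network byte order.
func compactPeers(peers []peer) []byte {
	out := make([]byte, 0, 6*len(peers))
	for _, p := range peers {
		ip4 := p.IP.To4()
		if ip4 == nil {
			continue // not an IPv4 address; IPv6 peers use a separate 18-byte format
		}
		out = append(out, ip4...)
		out = append(out, byte(p.Port>>8), byte(p.Port))
	}
	return out
}

func main() {
	peers := []peer{
		{IP: net.ParseIP("192.168.1.10"), Port: 6881},
		{IP: net.ParseIP("10.0.0.5"), Port: 51413},
	}
	fmt.Printf("%x\n", compactPeers(peers))
}
```

A client that requests the dictionary format would instead receive a bencoded list of `ip`/`port` dictionaries, which is larger but easier to inspect.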
#### 5. P2P Coordinator (internal/p2p/)
**Purpose**: Unified management of all P2P components
**Key Features**:
- Aggregates peers from tracker, DHT, and WebSeed
- Smart peer ranking algorithm
- Load balancing across peer sources
- Health monitoring and alerting
**Implementation Details**:
- Sophisticated peer scoring system
- Geographic proximity calculation
- Performance-based peer ranking
- Automatic failover and redundancy
#### 6. Video Transcoding Engine (internal/transcoding/)
**Purpose**: Automatic video conversion for web compatibility
**Key Features**:
- H.264/AAC MP4 conversion using FFmpeg
- Background processing with priority queuing
- Smart serving (transcoded when ready, original as fallback)
- Progress tracking and status API endpoints
- Configurable quality profiles and resource limits
**Implementation Details**:
- Queue-based job processing with worker pools
- Database tracking of transcoding status and progress
- File reconstruction for chunked torrents
- Intelligent priority system based on file size
- Error handling and retry mechanisms
## Storage Architecture
### Intelligent Storage Strategy
The system uses a dual-strategy approach based on file size:
```
File Upload → Size Analysis → Storage Decision → Video Processing
                                     │                   │
                             ┌───────┴───────┐           │
                             │               │           │
                          < 100MB         ≥ 100MB        │
                             │               │           │
                     ┌───────▼───────┐ ┌─────▼─────┐     │
                     │ Blob Storage  │ │ Chunked   │     │
                     │               │ │ Storage   │     │
                     │ • Direct blob │ │           │     │
                     │ • Immediate   │ │ • 2MB     │     │
                     │   access      │ │   chunks  │     │
                     │ • No P2P      │ │ • Torrent │     │
                     │   overhead    │ │   + DHT   │     │
                     └───────────────┘ └───────────┘     │
                                   │                     │
                            ┌──────┴─────────────────────▼──┐
                            │        Video Analysis         │
                            │                               │
                            │ • Format Detection            │
                            │ • Transcoding Queue           │
                            │ • Priority Assignment         │
                            │ • Background Processing       │
                            └───────────────────────────────┘
```
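
The size-based decision in the diagram can be sketched as a single comparison. This is an illustrative sketch only: `storageTypeFor` and `blobThreshold` are hypothetical names, and it assumes "100MB" means 100 × 1024 × 1024 bytes. The returned strings match the `storage_type` values used in the metadata schema.

```go
package main

import "fmt"

// blobThreshold mirrors the 100MB cutoff from the diagram.
// Assumption: MB is interpreted as 1024*1024 bytes.
const blobThreshold int64 = 100 * 1024 * 1024

// storageTypeFor is a hypothetical helper that picks the storage strategy
// for a file of the given size in bytes.
func storageTypeFor(size int64) string {
	if size < blobThreshold {
		return "blob" // stored directly, no P2P overhead
	}
	return "chunked" // split into pieces and published as a torrent
}

func main() {
	for _, size := range []int64{5 << 20, 100 << 20, 3 << 30} {
		fmt.Printf("%d bytes -> %s\n", size, storageTypeFor(size))
	}
}
```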
### Storage Backends
#### Metadata Database (SQLite)
```sql
-- File metadata
CREATE TABLE files (
    hash TEXT PRIMARY KEY,
    filename TEXT,
    size INTEGER,
    storage_type TEXT, -- 'blob' or 'chunked'
    created_at DATETIME,
    user_id TEXT
);

-- Torrent information
CREATE TABLE torrents (
    info_hash TEXT PRIMARY KEY,
    file_hash TEXT,
    piece_length INTEGER,
    pieces_count INTEGER,
    magnet_link TEXT,
    FOREIGN KEY(file_hash) REFERENCES files(hash)
);

-- Chunk mapping for large files
CREATE TABLE chunks (
    file_hash TEXT,
    chunk_index INTEGER,
    chunk_hash TEXT,
    chunk_size INTEGER,
    PRIMARY KEY(file_hash, chunk_index)
);

-- Transcoding job tracking
CREATE TABLE transcoding_status (
    file_hash TEXT PRIMARY KEY,
    status TEXT NOT NULL,
    error_message TEXT,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
```
#### Blob Storage
- Direct file storage in `./data/blobs/`
- SHA-256 content addressing
- Efficient for small files and frequently accessed content
- No P2P overhead - immediate availability
#### Chunk Storage
- Large files split into 2MB pieces in `./data/chunks/`
- BitTorrent-compatible piece structure
- Enables parallel downloads and partial file access
- Each chunk independently content-addressed
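
As a rough illustration of the chunking described above, the sketch below splits a byte slice into 2MB chunks and hashes each one. It assumes chunk addresses are SHA-256 digests, mirroring the blob addressing scheme; the piece hashes inside a BitTorrent v1 torrent are SHA-1 and would be computed separately. The `chunk` type and `splitIntoChunks` function are hypothetical, and the fields loosely follow the `chunks` table.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// chunkSize is the 2MB piece size described above (assumed to be 2*1024*1024 bytes).
const chunkSize = 2 * 1024 * 1024

// chunk loosely mirrors a row of the chunks table (hypothetical type).
type chunk struct {
	Index int
	Hash  string // hex-encoded SHA-256 of the chunk contents
	Size  int
}

// splitIntoChunks walks data in fixed-size steps; the final chunk may be shorter.
func splitIntoChunks(data []byte, size int) []chunk {
	if size <= 0 {
		return nil
	}
	var chunks []chunk
	for off, i := 0, 0; off < len(data); off, i = off+size, i+1 {
		end := off + size
		if end > len(data) {
			end = len(data)
		}
		sum := sha256.Sum256(data[off:end])
		chunks = append(chunks, chunk{Index: i, Hash: hex.EncodeToString(sum[:]), Size: end - off})
	}
	return chunks
}

func main() {
	data := make([]byte, 5*1024*1024) // 5MB of zero bytes as sample input
	for _, c := range splitIntoChunks(data, chunkSize) {
		fmt.Printf("chunk %d: %d bytes, sha256=%s...\n", c.Index, c.Size, c.Hash[:12])
	}
}
```

A real implementation would stream from disk rather than hold the whole file in memory; the in-memory slice keeps the example short.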
#### Transcoded Storage
- Processed video files stored in `./data/transcoded/`
- Organized by original file hash subdirectories
- H.264/AAC MP4 format for universal web compatibility
- Smart serving prioritizes transcoded versions when available
### Caching System
#### LRU Piece Cache
```go
// Uses container/list, sync, and time from the standard library.
type PieceCache struct {
	cache       map[string]*CacheEntry
	lru         *list.List
	mutex       sync.RWMutex
	maxSize     int64
	currentSize int64
}

type CacheEntry struct {
	Key        string
	Data       []byte
	Size       int64
	AccessTime time.Time
	Element    *list.Element
}
```
**Features**:
- Configurable cache size limits
- Least Recently Used eviction
- Concurrent access with read-write locks
- Cache hit ratio tracking and optimization
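
To make the eviction behavior concrete, here is a minimal size-bounded LRU sketch built on `container/list`. It is not the gateway's `PieceCache`: the `lruCache` and `entry` names are hypothetical, and it uses a plain `sync.Mutex` because `Get` reorders the list and therefore needs exclusive access even on reads.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// entry is the value stored in each list element (hypothetical type).
type entry struct {
	key  string
	data []byte
}

// lruCache is a size-bounded LRU cache; the list front is the most recently used entry.
type lruCache struct {
	mu      sync.Mutex
	items   map[string]*list.Element
	order   *list.List
	maxSize int64 // budget in bytes
	size    int64 // bytes currently held
}

func newLRUCache(maxSize int64) *lruCache {
	return &lruCache{items: make(map[string]*list.Element), order: list.New(), maxSize: maxSize}
}

// Get returns the cached bytes and marks the entry as most recently used.
func (c *lruCache) Get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).data, true
}

// Put stores data under key, then evicts least recently used entries until
// the cache fits within maxSize again.
func (c *lruCache) Put(key string, data []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[key]; ok {
		old := el.Value.(*entry)
		c.size += int64(len(data)) - int64(len(old.data))
		old.data = data
		c.order.MoveToFront(el)
	} else {
		c.items[key] = c.order.PushFront(&entry{key: key, data: data})
		c.size += int64(len(data))
	}
	for c.size > c.maxSize && c.order.Len() > 0 {
		back := c.order.Back()
		e := back.Value.(*entry)
		c.order.Remove(back)
		delete(c.items, e.key)
		c.size -= int64(len(e.data))
	}
}

func main() {
	c := newLRUCache(10)
	c.Put("a", []byte("12345"))
	c.Put("b", []byte("12345"))
	c.Get("a")                  // "a" becomes most recently used
	c.Put("c", []byte("12345")) // over budget, so the oldest entry ("b") is evicted
	_, okA := c.Get("a")
	_, okB := c.Get("b")
	fmt.Println(okA, okB)
}
```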
## Video Transcoding System
### Architecture Overview
The transcoding system provides automatic video conversion for web compatibility:
```go
type TranscodingEngine struct {
	// Core Components
	Transcoder *Transcoder // FFmpeg integration
	Manager    *Manager    // Job coordination
	WorkerPool chan Job    // Background processing
	Database   *sql.DB     // Status tracking

	// Configuration
	ConcurrentJobs  int       // Parallel workers
	WorkDirectory   string    // Processing workspace
	QualityProfiles []Quality // Output formats
}
```
### Processing Pipeline
1. **Upload Detection**: Video files automatically identified during upload
2. **Queue Decision**: Files meeting the 50MB size threshold are queued for transcoding, with priority based on file size
3. **File Reconstruction**: Chunked torrents reassembled into temporary files
4. **FFmpeg Processing**: H.264/AAC conversion with web optimization flags
5. **Smart Serving**: Transcoded versions served when ready, originals as fallback
### Transcoding Manager
```go
func (tm *Manager) QueueVideoForTranscoding(fileHash, fileName, filePath string, fileSize int64) {
	// Skip files that already have a transcoded version
	if tm.HasTranscodedVersion(fileHash) {
		return
	}

	// Analyze file format
	needsTranscoding, err := tm.transcoder.NeedsTranscoding(filePath)
	if err != nil {
		// Assumed handling: log the failure (standard library log package)
		// and leave the file unqueued rather than guessing its format.
		log.Printf("transcoding analysis failed for %s: %v", fileHash, err)
		return
	}
	if !needsTranscoding {
		tm.markAsWebCompatible(fileHash)
		return
	}

	// Create prioritized job
	job := Job{
		ID:        fmt.Sprintf("transcode_%s", fileHash),
		InputPath: filePath,
		OutputDir: filepath.Join(tm.transcoder.workDir, fileHash),
		Priority:  tm.calculatePriority(fileSize),
		Callback:  tm.jobCompletionHandler,
	}
	tm.transcoder.SubmitJob(job)
	tm.markTranscodingQueued(fileHash)
}
```
### Smart Priority System
- **High Priority** (8): Files < 500MB for faster user feedback
- **Medium Priority** (5): Standard processing queue
- **Low Priority** (2): Files > 5GB to prevent resource monopolization
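
The three tiers above map directly onto a size switch. The sketch below is one way `calculatePriority` could look, written as a free function for brevity; it assumes binary units (1MB = 2^20 bytes, 1GB = 2^30 bytes), which the source does not specify.

```go
package main

import "fmt"

// Size units; binary units are an assumption of this sketch.
const (
	mb = int64(1) << 20
	gb = int64(1) << 30
)

// calculatePriority returns the queue priority for a file of the given size:
// 8 for files under 500MB, 2 for files over 5GB, and 5 otherwise.
func calculatePriority(fileSize int64) int {
	switch {
	case fileSize < 500*mb:
		return 8 // small files finish quickly and give fast user feedback
	case fileSize > 5*gb:
		return 2 // very large files should not monopolize workers
	default:
		return 5
	}
}

func main() {
	for _, size := range []int64{100 * mb, 2 * gb, 8 * gb} {
		fmt.Printf("%d bytes -> priority %d\n", size, calculatePriority(size))
	}
}
```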
### Status API Integration
Users can track transcoding progress via authenticated endpoints:
- `/api/users/me/files/{hash}/transcoding-status` - Real-time status and progress
- Response includes job status, progress percentage, and transcoded file availability
## P2P Integration & Coordination
### Unified Peer Discovery
The P2P coordinator aggregates peers from multiple sources:
1. **BitTorrent Tracker**: Authoritative peer list from announces
2. **DHT Network**: Distributed peer discovery across the network
3. **WebSeed**: Gateway itself as a reliable seed source
### Smart Peer Ranking Algorithm
```go
func (pr *PeerRanker) RankPeers(peers []PeerInfo, clientLocation *Location) []RankedPeer {
	var ranked []RankedPeer
	for _, peer := range peers {
		score := pr.calculatePeerScore(peer, clientLocation)
		ranked = append(ranked, RankedPeer{
			Peer:   peer,
			Score:  score,
			Reason: pr.getScoreReason(peer, clientLocation),
		})
	}

	// Sort by score (highest first)
	sort.Slice(ranked, func(i, j int) bool {
		return ranked[i].Score > ranked[j].Score
	})
	return ranked
}
```
**Scoring Factors**:
- **Geographic Proximity** (30%): Distance-based scoring
- **Source Reliability** (25%): Tracker > DHT > WebSeed fallback
- **Historical Performance** (20%): Past connection success rates
- **Load Balancing** (15%): Distribute load across available peers
- **Freshness** (10%): Recently seen peers preferred
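
One plausible reading of the scoring factors is a weighted sum of normalized sub-scores. The sketch below assumes each factor has already been reduced to a value between 0 and 1; how the gateway actually derives those values is not described here, and the `peerFactors` type and `weightedScore` function are hypothetical.

```go
package main

import "fmt"

// peerFactors holds per-peer sub-scores, each assumed to be normalized to [0, 1].
type peerFactors struct {
	Proximity   float64
	Reliability float64
	Performance float64
	LoadBalance float64
	Freshness   float64
}

// weightedScore combines the sub-scores using the percentages listed above
// and scales the result to a 0-100 range.
func weightedScore(f peerFactors) float64 {
	return 100 * (0.30*f.Proximity +
		0.25*f.Reliability +
		0.20*f.Performance +
		0.15*f.LoadBalance +
		0.10*f.Freshness)
}

func main() {
	near := peerFactors{Proximity: 0.9, Reliability: 1.0, Performance: 0.8, LoadBalance: 0.5, Freshness: 1.0}
	far := peerFactors{Proximity: 0.2, Reliability: 1.0, Performance: 0.8, LoadBalance: 0.5, Freshness: 1.0}
	fmt.Printf("near=%.1f far=%.1f\n", weightedScore(near), weightedScore(far))
}
```

Because the weights sum to 100%, a peer that scores 1.0 on every factor reaches the maximum of 100, and two otherwise identical peers are separated only by the factor in which they differ.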
### Health Monitoring System
#### Component Health Scoring
```go
type HealthStatus struct {
	IsHealthy    bool                   `json:"is_healthy"`
	Score        int                    `json:"score"` // 0-100
	Issues       []string               `json:"issues"`
	LastChecked  time.Time              `json:"last_checked"`
	ResponseTime int64                  `json:"response_time"` // milliseconds
	Details      map[string]interface{} `json:"details"`
}
```
**Weighted Health Calculation**:
- WebSeed: 40% (most critical for availability)
- Tracker: 35% (important for peer discovery)
- DHT: 25% (supplemental peer source)
#### Automatic Alerting
- Health scores below configurable threshold trigger alerts
- Multiple alert mechanisms (logs, callbacks, future integrations)
- Component-specific and overall system health monitoring
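
The weighted calculation and the alert threshold can be sketched together. This is an illustrative example rather than the gateway's code: `componentScores`, `overallHealth`, and `needsAlert` are hypothetical names, and the threshold of 70 in `main` is an arbitrary sample value since the real one is configurable.

```go
package main

import "fmt"

// componentScores holds the 0-100 health score of each P2P component.
type componentScores struct {
	WebSeed int
	Tracker int
	DHT     int
}

// overallHealth applies the 40/35/25 weighting and rounds to the nearest integer.
func overallHealth(s componentScores) int {
	return (40*s.WebSeed + 35*s.Tracker + 25*s.DHT + 50) / 100
}

// needsAlert reports whether a score falls below the configured threshold.
func needsAlert(score, threshold int) bool {
	return score < threshold
}

func main() {
	s := componentScores{WebSeed: 100, Tracker: 80, DHT: 40}
	score := overallHealth(s)
	fmt.Printf("overall=%d alert=%v\n", score, needsAlert(score, 70))
}
```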
## WebSeed Implementation (BEP-19)
### Standards Compliance
The WebSeed implementation follows BEP-19 specification:
- **URL-based seeding**: BitTorrent clients can fetch pieces via HTTP
- **Range request support**: Efficient partial file downloads
- **Piece boundary alignment**: Proper handling of piece boundaries
- **Error handling**: Appropriate HTTP status codes for BitTorrent clients
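
Range request support and piece boundary alignment come down to mapping an inclusive HTTP byte range onto piece indices. The sketch below shows that arithmetic for the 2MB piece size used by chunked storage; `pieceSpan` and `piecesForRange` are hypothetical names, not part of the gateway's API.

```go
package main

import (
	"errors"
	"fmt"
)

// pieceLength is the 2MB piece size used for chunked files (assumed 2*1024*1024 bytes).
const pieceLength int64 = 2 * 1024 * 1024

// pieceSpan describes which pieces cover a requested byte range.
type pieceSpan struct {
	First  int64 // index of the first piece touched
	Last   int64 // index of the last piece touched (inclusive)
	Offset int64 // offset of the first requested byte inside the first piece
}

// piecesForRange maps an inclusive byte range, as used by the HTTP Range
// header, onto piece indices.
func piecesForRange(start, end, pieceLen int64) (pieceSpan, error) {
	if pieceLen <= 0 || start < 0 || end < start {
		return pieceSpan{}, errors.New("invalid byte range")
	}
	return pieceSpan{
		First:  start / pieceLen,
		Last:   end / pieceLen,
		Offset: start % pieceLen,
	}, nil
}

func main() {
	// Equivalent to "Range: bytes=3000000-5000000".
	span, err := piecesForRange(3_000_000, 5_000_000, pieceLength)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("pieces %d-%d, starting at offset %d\n", span.First, span.Last, span.Offset)
}
```

A handler would load pieces `First` through `Last`, skip `Offset` bytes of the first piece, and respond with `206 Partial Content`; an invalid range would typically map to `416 Range Not Satisfiable`.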
### Advanced Features
#### Concurrent Request Optimization
```go
type ConcurrentRequestTracker struct {
	activeRequests map[string]*RequestInfo
	mutex          sync.RWMutex
	maxConcurrent  int
}
```
- Prevents duplicate piece loads
- Manages concurrent request limits
- Request deduplication and waiting
#### Client-Specific Optimizations
```go
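func (h *Handler) detectClient(userAgent string) ClientType {
	// User-Agent strings are mixed case (for example "qBittorrent/..."),
	// so match against a lowercased copy.
	ua := strings.ToLower(userAgent)
	switch {
	case strings.Contains(ua, "qbittorrent"):
		return ClientQBittorrent
	case strings.Contains(ua, "transmission"):
		return ClientTransmission
	case strings.Contains(ua, "webtorrent"):
		return ClientWebTorrent
	// ... additional client detection
	default:
		// ClientUnknown is an assumed fallback constant; the function
		// needs a return on every path to compile.
		return ClientUnknown
	}
}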
```
**Per-Client Optimizations**:
- **qBittorrent**: Standard intervals, no special handling needed
- **Transmission**: Prefers shorter announce intervals (≤30 min)
- **WebTorrent**: Short intervals for web compatibility (≤5 min)
- **uTorrent**: Minimum interval enforcement to prevent spam
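
The per-client rules above can be expressed as caps and floors on the announce interval. The sketch below is illustrative only: the 30-minute and 5-minute caps come from the list, while `defaultInterval` (60 minutes) and `uTorrentMinInterval` (15 minutes) are assumed values that the source does not state, and all type and function names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// clientType is a hypothetical stand-in for the gateway's client enumeration.
type clientType int

const (
	clientUnknown clientType = iota
	clientQBittorrent
	clientTransmission
	clientWebTorrent
	clientUTorrent
)

// Assumed values for illustration; the source does not specify them.
const (
	defaultInterval     = 60 * time.Minute
	uTorrentMinInterval = 15 * time.Minute
)

// capAt returns d limited to at most limit.
func capAt(d, limit time.Duration) time.Duration {
	if d > limit {
		return limit
	}
	return d
}

// announceInterval adjusts a requested interval according to the client type.
func announceInterval(c clientType, requested time.Duration) time.Duration {
	switch c {
	case clientTransmission:
		return capAt(requested, 30*time.Minute)
	case clientWebTorrent:
		return capAt(requested, 5*time.Minute)
	case clientUTorrent:
		if requested < uTorrentMinInterval {
			return uTorrentMinInterval // enforce a floor to limit announce spam
		}
		return requested
	default:
		return requested // qBittorrent and unknown clients use the standard interval
	}
}

func main() {
	names := map[clientType]string{
		clientQBittorrent:  "qBittorrent",
		clientTransmission: "Transmission",
		clientWebTorrent:   "WebTorrent",
		clientUTorrent:     "uTorrent",
	}
	for _, c := range []clientType{clientQBittorrent, clientTransmission, clientWebTorrent, clientUTorrent} {
		fmt.Printf("%s: %v\n", names[c], announceInterval(c, defaultInterval))
	}
}
```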
## Nostr Integration
### Content Announcements
When files are uploaded, they're announced to configured Nostr relays:
```go
func (g *Gateway) announceToNostr(fileInfo *FileInfo, torrentInfo *TorrentInfo) error {
	event := nostr.Event{
		Kind:      2003, // NIP-35 torrent announcement kind
		Content:   fmt.Sprintf("New torrent: %s", fileInfo.Filename),
		CreatedAt: time.Now(),
		Tags: []nostr.Tag{
			{"magnet", torrentInfo.MagnetLink},
			{"size", fmt.Sprintf("%d", fileInfo.Size)},
			{"name", fileInfo.Filename},
			{"webseed", g.getWebSeedURL(fileInfo.Hash)},
		},
	}
	return g.nostrClient.PublishEvent(event)
}
```
### Decentralized Discovery
- Content announced to multiple Nostr relays for redundancy
- Other nodes can discover content via Nostr event subscriptions
- Enables fully decentralized content network
- No central authority or single point of failure
## Performance Optimizations
### Concurrent Processing
#### Parallel Piece Loading
```go
func (ws *WebSeedHandler) loadPieces(pieces []PieceRequest) error {
	const maxConcurrency = 10
	// Buffered channel used as a counting semaphore
	semaphore := make(chan struct{}, maxConcurrency)
	var wg sync.WaitGroup

	for _, piece := range pieces {
		wg.Add(1)
		go func(p PieceRequest) {
			defer wg.Done()
			semaphore <- struct{}{}        // Acquire
			defer func() { <-semaphore }() // Release
			ws.loadSinglePiece(p)
		}(piece)
	}

	// Block until every piece load has finished
	wg.Wait()
	return nil
}
```
#### Connection Pooling
- HTTP client connection reuse
- Database connection pooling
- BitTorrent connection management
- Resource cleanup and lifecycle management
## Monitoring & Observability
### Comprehensive Statistics
#### System Statistics
```go
type SystemStats struct {
	Files struct {
		Total     int64 `json:"total"`
		BlobFiles int64 `json:"blob_files"`
		Torrents  int64 `json:"torrents"`
		TotalSize int64 `json:"total_size"`
	} `json:"files"`
	P2P struct {
		TrackerPeers   int `json:"tracker_peers"`
		DHTNodes       int `json:"dht_nodes"`
		ActiveTorrents int `json:"active_torrents"`
	} `json:"p2p"`
	Performance struct {
		CacheHitRatio   float64 `json:"cache_hit_ratio"`
		AvgResponseTime int64   `json:"avg_response_time"`
		RequestsPerSec  float64 `json:"requests_per_sec"`
	} `json:"performance"`
}
```
### Diagnostic Endpoints
- `/api/stats` - Overall system statistics
- `/api/p2p/stats` - Detailed P2P statistics
- `/api/health` - Component health status
- `/api/diagnostics` - Comprehensive system diagnostics
- `/api/webseed/health` - WebSeed-specific health
- `/api/users/me/files/{hash}/transcoding-status` - Video transcoding progress
## Conclusion
The BitTorrent Gateway combines conventional HTTP hosting with peer-to-peer distribution and automatic video transcoding. Files under 100MB are served directly from blob storage, while larger files are chunked and distributed through the built-in tracker, the DHT node, and the WebSeed endpoint. Its modular architecture makes it suitable for both small-scale deployments and large-scale content distribution networks.
The system follows the relevant standards (BEP-19 WebSeed, Blossom, Nostr) and relies on unified peer discovery, LRU piece caching, background transcoding, and component health monitoring to keep content available without a central authority or single point of failure.