# Performance Tuning Guide
## Overview
This guide covers tuning Torrent Gateway performance across workloads and deployment sizes, including video transcoding.
## Database Optimization
### Indexes
The migration script applies performance indexes automatically:
```sql
-- File lookup optimization
CREATE INDEX idx_files_owner_pubkey ON files(owner_pubkey);
CREATE INDEX idx_files_storage_type ON files(storage_type);
CREATE INDEX idx_files_access_level ON files(access_level);
CREATE INDEX idx_files_size ON files(size);
CREATE INDEX idx_files_last_access ON files(last_access);
-- Chunk optimization
CREATE INDEX idx_chunks_chunk_hash ON chunks(chunk_hash);
-- User statistics
CREATE INDEX idx_users_storage_used ON users(storage_used);
-- Transcoding status optimization
CREATE INDEX idx_transcoding_status ON transcoding_status(status);
CREATE INDEX idx_transcoding_updated ON transcoding_status(updated_at);
```
### Database Maintenance
```bash
# Run regular maintenance
./scripts/migrate.sh
# Manual optimization
sqlite3 data/metadata.db "VACUUM;"
sqlite3 data/metadata.db "ANALYZE;"
```
### Connection Pooling
Configure connection limits in your application:
```go
// Production pool settings (database/sql)
db.SetMaxOpenConns(25)
db.SetMaxIdleConns(5)
db.SetConnMaxLifetime(300 * time.Second)
```
## Application Tuning
### Memory Management
**Go Runtime Settings:**
```bash
# Set garbage collection target
export GOGC=100
# Set memory limit
export GOMEMLIMIT=2GB
```
**Container Limits:**
```yaml
services:
  gateway:
    deploy:
      resources:
        limits:
          memory: 2G
        reservations:
          memory: 1G
```
### File Handling
**Large File Optimization:**
- Files >10MB use torrent storage (chunked)
- Files <10MB use blob storage (single file)
- Chunk size: 256KB (configurable)
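The size-based routing above can be sketched in Go (hypothetical helper names; the gateway's internal functions may differ):

```go
package main

import "fmt"

const (
	blobThreshold = 10 << 20  // 10 MiB: files at or below this use blob storage
	chunkSize     = 256 << 10 // 256 KiB default chunk size
)

// pickBackend mirrors the size threshold described above.
func pickBackend(size int64) string {
	if size > blobThreshold {
		return "torrent"
	}
	return "blob"
}

// chunkCount returns how many 256 KiB chunks a torrent-stored file needs.
func chunkCount(size int64) int64 {
	return (size + chunkSize - 1) / chunkSize
}

func main() {
	fmt.Println(pickBackend(5<<20))  // 5 MiB file → blob
	fmt.Println(chunkCount(50 << 20)) // 50 MiB file → 200 chunks
}
```

Note that chunk count grows linearly with file size, which is why torrent storage benefits from the bulk-HDD path below while the database and blobs stay on SSD.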
**Storage Path Optimization:**
```bash
# Use SSD for database and small files
ln -s /fast/ssd/path data/blobs
# Use HDD for large file chunks
ln -s /bulk/hdd/path data/chunks
```
## Network Performance
### Connection Limits
**Reverse Proxy (nginx):**
```nginx
upstream gateway {
    server 127.0.0.1:9876 max_fails=3 fail_timeout=30s;
    keepalive 32;
}
server {
    location / {
        proxy_pass http://gateway;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_buffering off;
    }
}
```
### Rate Limiting
Configure rate limits based on usage patterns:
```yaml
# In docker-compose.prod.yml
environment:
  - RATE_LIMIT_UPLOAD=10/minute
  - RATE_LIMIT_DOWNLOAD=100/minute
  - RATE_LIMIT_API=1000/minute
```
## Storage Performance
### Storage Backend Selection
**Blob Storage (< 10MB files):**
- Best for: Documents, images, small media
- Performance: Direct file system access
- Scaling: Limited by file system performance
**Torrent Storage (> 10MB files):**
- Best for: Large media, archives, datasets
- Performance: Parallel chunk processing
- Scaling: Horizontal scaling via chunk distribution
### File System Tuning
**For Linux ext4:**
```bash
# Optimize for many small files
tune2fs -o journal_data_writeback /dev/sdb1
mount -o noatime,data=writeback /dev/sdb1 /data
```
**For ZFS:**
```bash
# Optimize for mixed workload
zfs set compression=lz4 tank/data
zfs set atime=off tank/data
zfs set recordsize=64K tank/data
```
## Monitoring and Metrics
### Key Metrics to Watch
**Application Metrics:**
- Request rate and latency
- Error rates by endpoint
- Active connections
- File upload/download rates
- Storage usage growth
**System Metrics:**
- CPU utilization
- Memory usage
- Disk I/O and space
- Network throughput
### Prometheus Queries
**Request Rate:**
```promql
rate(http_requests_total[5m])
```
**95th Percentile Latency:**
```promql
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
```
**Error Rate:**
```promql
rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m])
```
**Storage Growth:**
```promql
increase(storage_bytes_total[24h])
```
### Alert Thresholds
**Critical Alerts:**
- Error rate > 5%
- Response time > 5s
- Disk usage > 90%
- Memory usage > 85%
**Warning Alerts:**
- Error rate > 1%
- Response time > 2s
- Disk usage > 80%
- Memory usage > 70%
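The critical thresholds above map onto Prometheus alerting rules roughly as follows (a sketch reusing the metric names from the queries above; adjust to your actual rule files):

```yaml
groups:
  - name: gateway-performance
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) > 0.05
        for: 5m
        labels:
          severity: critical
      - alert: HighLatency
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 5
        for: 5m
        labels:
          severity: critical
```

The warning tier uses the same expressions with the lower thresholds (0.01 and 2) and `severity: warning`.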
## Load Testing
### Running Load Tests
```bash
# Start with integration load test
go test -v -tags=integration ./test/... -run TestLoadTesting -timeout 15m

# Custom load test with specific parameters
go test -v -tags=integration ./test/... -run TestLoadTesting \
  -concurrent-users=100 \
  -test-duration=300s \
  -timeout 20m
```
### Interpreting Results
**Good Performance Indicators:**
- 95th percentile response time < 1s
- Error rate < 0.1%
- Throughput > 100 requests/second
- Memory usage stable over time
**Performance Bottlenecks:**
- High database response times → Add indexes or scale database
- High CPU usage → Scale horizontally or optimize code
- High memory usage → Check for memory leaks or add limits
- High disk I/O → Use faster storage or optimize queries
## Scaling Strategies
### Vertical Scaling
**Increase Resources:**
```yaml
services:
  gateway:
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
```
### Horizontal Scaling
**Multiple Gateway Instances:**
```bash
# Scale to 3 instances
docker-compose -f docker-compose.prod.yml up -d --scale gateway=3
```
**Load Balancer Configuration:**
```nginx
upstream gateway_cluster {
    server 127.0.0.1:9876;
    server 127.0.0.1:9877;
    server 127.0.0.1:9878;
}
```
### Database Scaling
**Read Replicas:**
- Implement read-only database replicas
- Route read queries to replicas
- Use primary for writes only
**Sharding Strategy:**
- Shard by user pubkey hash
- Distribute across multiple databases
- Implement shard-aware routing
## Caching Strategies
### Application-Level Caching
**Redis Configuration:**
```yaml
redis:
  image: redis:7-alpine
  command: redis-server --maxmemory 1gb --maxmemory-policy allkeys-lru
```
**Cache Patterns:**
- User session data (TTL: 24h)
- File metadata (TTL: 1h)
- API responses (TTL: 5m)
- Authentication challenges (TTL: 10m)
### CDN Integration
For public files, consider CDN integration:
- CloudFlare for global distribution
- AWS CloudFront for AWS deployments
- Custom edge servers for private deployments
## Configuration Tuning
### Environment Variables
**Production Settings:**
```bash
# Application tuning
export MAX_UPLOAD_SIZE=1GB
export CHUNK_SIZE=256KB
export MAX_CONCURRENT_UPLOADS=10
export DATABASE_TIMEOUT=30s
# Performance tuning
export GOMAXPROCS=4
export GOGC=100
export GOMEMLIMIT=2GB
# Logging
export LOG_LEVEL=info
export LOG_FORMAT=json
```
### Docker Compose Optimization
```yaml
services:
  gateway:
    # Use host networking for better performance
    network_mode: host
    # Optimize logging
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
    # Resource reservations
    deploy:
      resources:
        reservations:
          memory: 512M
          cpus: '0.5'
```
## Benchmarking
### Baseline Performance Tests
```bash
# API performance
ab -n 1000 -c 10 http://localhost:9876/api/health

# Upload performance
for i in {1..10}; do
  time curl -X POST -F "file=@test/testdata/small.txt" http://localhost:9876/api/upload
done

# Download performance
time curl -O http://localhost:9876/api/download/[hash]
```
### Continuous Performance Monitoring
**Set up automated benchmarks:**
```bash
# Add to cron
0 2 * * * /path/to/performance_benchmark.sh
```
**Track performance metrics over time:**
- Response time trends
- Throughput capacity
- Resource utilization patterns
- Error rate trends
## Optimization Checklist
### Application Level
- [ ] Database indexes applied
- [ ] Connection pooling configured
- [ ] Caching strategy implemented
- [ ] Resource limits set
- [ ] Garbage collection tuned
### Infrastructure Level
- [ ] Fast storage for database
- [ ] Adequate RAM allocated
- [ ] Network bandwidth sufficient
- [ ] Load balancer configured
- [ ] CDN setup for static content
### Monitoring Level
- [ ] Performance alerts configured
- [ ] Baseline metrics established
- [ ] Regular load testing scheduled
- [ ] Capacity planning reviewed
- [ ] Performance dashboards created
## Video Transcoding Performance
### Hardware Requirements
**CPU:**
- 4+ cores recommended for concurrent transcoding
- Modern CPU with hardware encoding support (Intel QuickSync, AMD VCE)
- Higher core count = more concurrent jobs
**Memory:**
- 2GB+ RAM per concurrent transcoding job
- Additional 1GB+ for temporary file storage
- Consider SSD swap for large files
**Storage:**
- Fast SSD for work directory (`transcoding.work_dir`)
- Separate from main storage to avoid I/O contention
- Plan for 2-3x the video file size in temporary space
### Configuration Optimization
```yaml
transcoding:
  enabled: true
  concurrent_jobs: 4                 # Match CPU cores
  work_dir: "/fast/ssd/transcoding"  # Use fastest storage
  max_cpu_percent: 80                # Limit CPU usage
  nice_level: 10                     # Lower priority than main service
  min_file_size: 100MB               # Skip small files
```
### Performance Monitoring
**Key Metrics:**
- Queue depth and processing time
- CPU usage during transcoding
- Storage I/O patterns
- Memory consumption per job
- Failed job retry rates
**Alerts:**
- Queue backlog > 50 jobs
- Average processing time > 5 minutes per GB
- Failed job rate > 10%
- Storage space < 20% free
### Optimization Strategies
1. **Priority System**: Smaller files processed first for user feedback
2. **Resource Limits**: Prevent transcoding from affecting main service
3. **Smart Serving**: Original files served while transcoding in progress
4. **Batch Processing**: Group similar formats for efficiency
5. **Hardware Acceleration**: Use GPU encoding when available
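The smallest-first priority system (strategy 1) can be sketched in Go (hypothetical `job` shape; the real queue entry carries more fields):

```go
package main

import (
	"fmt"
	"sort"
)

// job is a queued transcoding request (illustrative shape only).
type job struct {
	fileHash string
	size     int64
}

// prioritize orders the queue smallest-first: short jobs finish
// quickly, giving users early feedback while large files wait.
func prioritize(queue []job) {
	sort.Slice(queue, func(i, j int) bool { return queue[i].size < queue[j].size })
}

func main() {
	queue := []job{
		{"aaa", 4 << 30},   // 4 GiB archive
		{"bbb", 150 << 20}, // 150 MiB clip
		{"ccc", 1 << 30},   // 1 GiB episode
	}
	prioritize(queue)
	for _, j := range queue {
		fmt.Println(j.fileHash)
	}
}
```

Combined with the `min_file_size` setting above, this keeps the queue responsive: files under the threshold skip transcoding entirely, and the rest are served in ascending size order.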