# Performance Tuning Guide

## Overview

This guide covers optimizing Torrent Gateway performance for different workloads and deployment sizes, including video transcoding workloads.

## Database Optimization

### Indexes

The migration script applies performance indexes automatically:

```sql
-- File lookup optimization
CREATE INDEX idx_files_owner_pubkey ON files(owner_pubkey);
CREATE INDEX idx_files_storage_type ON files(storage_type);
CREATE INDEX idx_files_access_level ON files(access_level);
CREATE INDEX idx_files_size ON files(size);
CREATE INDEX idx_files_last_access ON files(last_access);

-- Chunk optimization
CREATE INDEX idx_chunks_chunk_hash ON chunks(chunk_hash);

-- User statistics
CREATE INDEX idx_users_storage_used ON users(storage_used);

-- Transcoding status optimization
CREATE INDEX idx_transcoding_status ON transcoding_status(status);
CREATE INDEX idx_transcoding_updated ON transcoding_status(updated_at);
```

### Database Maintenance

```bash
# Run regular maintenance
./scripts/migrate.sh

# Manual optimization
sqlite3 data/metadata.db "VACUUM;"
sqlite3 data/metadata.db "ANALYZE;"
```

### Connection Pooling

Configure connection limits on the database handle in your application:

```go
// Production settings for the *sql.DB handle
db.SetMaxOpenConns(25)
db.SetMaxIdleConns(5)
db.SetConnMaxLifetime(300 * time.Second)
```
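
As a usage sketch, these calls sit right after the handle is opened. The driver below (mattn/go-sqlite3) is an assumption based on the SQLite database shown earlier, not a statement about the gateway's actual import:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/mattn/go-sqlite3" // assumed SQLite driver; swap for the one the gateway registers
)

func main() {
	db, err := sql.Open("sqlite3", "data/metadata.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Pool limits from the production config above.
	db.SetMaxOpenConns(25)
	db.SetMaxIdleConns(5)
	db.SetConnMaxLifetime(300 * time.Second)
}
```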

## Application Tuning

### Memory Management

**Go Runtime Settings:**

```bash
# Set garbage collection target
export GOGC=100

# Set memory limit
export GOMEMLIMIT=2GB
```
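
If environment variables are awkward to manage in your deployment, the same limits can be set from inside the process; a minimal sketch using the standard runtime/debug package (Go 1.19+ for the memory limit):

```go
package main

import "runtime/debug"

func main() {
	// Equivalent to GOGC=100: run GC once the heap grows 100% past live data.
	debug.SetGCPercent(100)

	// Equivalent to GOMEMLIMIT=2GB: soft runtime memory limit, in bytes.
	debug.SetMemoryLimit(2 << 30)

	// ... start the gateway as usual ...
}
```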

**Container Limits:**

```yaml
services:
  gateway:
    deploy:
      resources:
        limits:
          memory: 2G
        reservations:
          memory: 1G
```

### File Handling

**Large File Optimization:**

- Files >10MB use torrent storage (chunked)
- Files <10MB use blob storage (single file)
- Chunk size: 256KB (configurable); the routing rule is sketched below
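
The actual routing logic lives inside the gateway and is not shown here; the following is only an illustrative sketch of the size rule above, with hypothetical names:

```go
package storage

const (
	blobThreshold = 10 << 20  // 10 MiB: files at or below this go to blob storage
	chunkSize     = 256 << 10 // 256 KiB default chunk size for torrent storage
)

// chooseBackend is a hypothetical helper, not the gateway's real API.
func chooseBackend(fileSize int64) string {
	if fileSize > blobThreshold {
		return "torrent" // chunked, processed in parallel
	}
	return "blob" // stored as a single file
}
```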

**Storage Path Optimization:**

```bash
# Use SSD for database and small files
ln -s /fast/ssd/path data/blobs

# Use HDD for large file chunks
ln -s /bulk/hdd/path data/chunks
```

## Network Performance

### Connection Limits

**Reverse Proxy (nginx):**

```nginx
upstream gateway {
    server 127.0.0.1:9876 max_fails=3 fail_timeout=30s;
    keepalive 32;
}

server {
    location / {
        proxy_pass http://gateway;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_buffering off;
    }
}
```

### Rate Limiting

Configure rate limits based on usage patterns:

```yaml
# In docker-compose.prod.yml
environment:
  - RATE_LIMIT_UPLOAD=10/minute
  - RATE_LIMIT_DOWNLOAD=100/minute
  - RATE_LIMIT_API=1000/minute
```
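
How the gateway enforces these values internally is not documented here; as one possible shape, a token-bucket middleware built on golang.org/x/time/rate that matches the 10/minute upload budget (the handler names are illustrative):

```go
package middleware

import (
	"net/http"
	"time"

	"golang.org/x/time/rate"
)

// uploadLimiter refills one token every 6 seconds, i.e. 10 uploads per minute,
// with a burst of 10.
var uploadLimiter = rate.NewLimiter(rate.Every(time.Minute/10), 10)

// LimitUploads wraps an upload handler; illustrative only.
func LimitUploads(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !uploadLimiter.Allow() {
			http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```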

## Storage Performance

### Storage Backend Selection

**Blob Storage (< 10MB files):**

- Best for: Documents, images, small media
- Performance: Direct file system access
- Scaling: Limited by file system performance

**Torrent Storage (> 10MB files):**

- Best for: Large media, archives, datasets
- Performance: Parallel chunk processing
- Scaling: Horizontal scaling via chunk distribution

### File System Tuning

**For Linux ext4:**

```bash
# Optimize for many small files
tune2fs -o journal_data_writeback /dev/sdb1
mount -o noatime,data=writeback /dev/sdb1 /data
```

**For ZFS:**

```bash
# Optimize for mixed workload
zfs set compression=lz4 tank/data
zfs set atime=off tank/data
zfs set recordsize=64K tank/data
```

## Monitoring and Metrics

### Key Metrics to Watch

**Application Metrics:**

- Request rate and latency
- Error rates by endpoint
- Active connections
- File upload/download rates
- Storage usage growth

**System Metrics:**

- CPU utilization
- Memory usage
- Disk I/O and space
- Network throughput

### Prometheus Queries

**Request Rate:**

```promql
rate(http_requests_total[5m])
```

**95th Percentile Latency:**

```promql
histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
```

**Error Rate:**

```promql
rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m])
```

**Storage Growth:**

```promql
increase(storage_bytes_total[24h])
```
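
The queries above assume counters and histograms with these names are exported by the gateway. If they are not already exposed, a minimal sketch with prometheus/client_golang (an assumed choice of instrumentation library) looks like:

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// Counter behind rate(http_requests_total[5m]) and the error-rate query.
	HTTPRequests = promauto.NewCounterVec(prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "HTTP requests by handler and status code.",
	}, []string{"handler", "status"})

	// Histogram behind the 95th-percentile latency query.
	RequestDuration = promauto.NewHistogramVec(prometheus.HistogramOpts{
		Name:    "http_request_duration_seconds",
		Help:    "HTTP request latency in seconds.",
		Buckets: prometheus.DefBuckets,
	}, []string{"handler"})
)
```

The default registry can then be served on a `/metrics` endpoint with `promhttp.Handler()` so Prometheus can scrape the gateway.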

### Alert Thresholds

**Critical Alerts:**

- Error rate > 5%
- Response time > 5s
- Disk usage > 90%
- Memory usage > 85%

**Warning Alerts:**

- Error rate > 1%
- Response time > 2s
- Disk usage > 80%
- Memory usage > 70%

## Load Testing

### Running Load Tests

```bash
# Start with integration load test
go test -v -tags=integration ./test/... -run TestLoadTesting -timeout 15m

# Custom load test with specific parameters
go test -v -tags=integration ./test/... -run TestLoadTesting \
  -concurrent-users=100 \
  -test-duration=300s \
  -timeout 20m
```
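
The `-concurrent-users` and `-test-duration` flags are defined by the test package itself; a hedged sketch of how such flags are typically wired up (the real harness may differ):

```go
package test

import (
	"flag"
	"time"
)

// Hypothetical flag wiring for the custom load-test parameters above.
var (
	concurrentUsers = flag.Int("concurrent-users", 50, "number of simulated users")
	testDuration    = flag.Duration("test-duration", 60*time.Second, "how long to apply load")
)
```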

### Interpreting Results

**Good Performance Indicators:**

- 95th percentile response time < 1s
- Error rate < 0.1%
- Throughput > 100 requests/second
- Memory usage stable over time

**Performance Bottlenecks:**

- High database response times → Add indexes or scale database
- High CPU usage → Scale horizontally or optimize code
- High memory usage → Check for memory leaks or add limits
- High disk I/O → Use faster storage or optimize queries

## Scaling Strategies

### Vertical Scaling

**Increase Resources:**

```yaml
services:
  gateway:
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 4G
```

### Horizontal Scaling

**Multiple Gateway Instances:**

```bash
# Scale to 3 instances
docker-compose -f docker-compose.prod.yml up -d --scale gateway=3
```

**Load Balancer Configuration:**

```nginx
upstream gateway_cluster {
    server 127.0.0.1:9876;
    server 127.0.0.1:9877;
    server 127.0.0.1:9878;
}
```

### Database Scaling

**Read Replicas:**

- Implement read-only database replicas
- Route read queries to replicas
- Use the primary for writes only

**Sharding Strategy:**

- Shard by user pubkey hash
- Distribute across multiple databases
- Implement shard-aware routing (see the sketch after this list)
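
A minimal sketch of the pubkey-hash routing idea; the function name and shard count are illustrative, not the gateway's real scheme:

```go
package sharding

import "hash/fnv"

// shardFor maps a user's pubkey onto one of n metadata databases.
func shardFor(pubkey string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(pubkey))
	return h.Sum32() % n
}
```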

## Caching Strategies

### Application-Level Caching

**Redis Configuration:**

```yaml
redis:
  image: redis:7-alpine
  command: redis-server --maxmemory 1gb --maxmemory-policy allkeys-lru
```

**Cache Patterns:**

- User session data (TTL: 24h)
- File metadata (TTL: 1h, sketched after this list)
- API responses (TTL: 5m)
- Authentication challenges (TTL: 10m)
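
Assuming a go-redis client (the gateway's real cache layer may differ), the file-metadata pattern above could look like:

```go
package cache

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// CacheFileMetadata stores serialized metadata under the file hash with a 1h TTL.
func CacheFileMetadata(ctx context.Context, rdb *redis.Client, fileHash string, metaJSON []byte) error {
	return rdb.Set(ctx, "filemeta:"+fileHash, metaJSON, time.Hour).Err()
}
```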

### CDN Integration

For public files, consider CDN integration:

- Cloudflare for global distribution
- AWS CloudFront for AWS deployments
- Custom edge servers for private deployments

## Configuration Tuning

### Environment Variables

**Production Settings:**

```bash
# Application tuning
export MAX_UPLOAD_SIZE=1GB
export CHUNK_SIZE=256KB
export MAX_CONCURRENT_UPLOADS=10
export DATABASE_TIMEOUT=30s

# Performance tuning
export GOMAXPROCS=4
export GOGC=100
export GOMEMLIMIT=2GB

# Logging
export LOG_LEVEL=info
export LOG_FORMAT=json
```
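
How the gateway parses these variables is internal to it; a hedged sketch of reading the duration- and count-style settings with defaults (helper names are illustrative):

```go
package config

import (
	"os"
	"strconv"
	"time"
)

// envDuration reads a time.ParseDuration-style value such as DATABASE_TIMEOUT=30s.
func envDuration(key string, def time.Duration) time.Duration {
	if v := os.Getenv(key); v != "" {
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	return def
}

// envInt reads an integer value such as MAX_CONCURRENT_UPLOADS=10.
func envInt(key string, def int) int {
	if v := os.Getenv(key); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return def
}
```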

### Docker Compose Optimization

```yaml
services:
  gateway:
    # Use host networking for better performance
    network_mode: host

    # Optimize logging
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"

    # Resource reservations
    deploy:
      resources:
        reservations:
          memory: 512M
          cpus: '0.5'
```

## Benchmarking

### Baseline Performance Tests

```bash
# API performance
ab -n 1000 -c 10 http://localhost:9876/api/health

# Upload performance
for i in {1..10}; do
  time curl -X POST -F "file=@test/testdata/small.txt" http://localhost:9876/api/upload
done

# Download performance
time curl -O http://localhost:9876/api/download/[hash]
```

### Continuous Performance Monitoring

**Set up automated benchmarks:**

```bash
# Add to cron
0 2 * * * /path/to/performance_benchmark.sh
```

**Track performance metrics over time:**

- Response time trends
- Throughput capacity
- Resource utilization patterns
- Error rate trends

## Optimization Checklist

### Application Level

- [ ] Database indexes applied
- [ ] Connection pooling configured
- [ ] Caching strategy implemented
- [ ] Resource limits set
- [ ] Garbage collection tuned

### Infrastructure Level

- [ ] Fast storage for database
- [ ] Adequate RAM allocated
- [ ] Network bandwidth sufficient
- [ ] Load balancer configured
- [ ] CDN setup for static content

### Monitoring Level

- [ ] Performance alerts configured
- [ ] Baseline metrics established
- [ ] Regular load testing scheduled
- [ ] Capacity planning reviewed
- [ ] Performance dashboards created

## Video Transcoding Performance

### Hardware Requirements

**CPU:**

- 4+ cores recommended for concurrent transcoding
- Modern CPU with hardware encoding support (Intel QuickSync, AMD VCE)
- Higher core counts allow more concurrent jobs

**Memory:**

- 2GB+ RAM per concurrent transcoding job
- Additional 1GB+ for temporary file storage
- Consider SSD swap for large files

**Storage:**

- Fast SSD for the work directory (`transcoding.work_dir`)
- Separate from main storage to avoid I/O contention
- Plan for 2-3x the video file size in temporary space

### Configuration Optimization

```yaml
transcoding:
  enabled: true
  concurrent_jobs: 4                 # Match CPU cores
  work_dir: "/fast/ssd/transcoding"  # Use fastest storage
  max_cpu_percent: 80                # Limit CPU usage
  nice_level: 10                     # Lower priority than main service
  min_file_size: 100MB               # Skip small files
```
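
The transcoder's worker loop is not part of this guide; a minimal sketch of how a `concurrent_jobs` cap is commonly enforced with a buffered-channel semaphore (names are illustrative):

```go
package transcoding

// jobSlots caps concurrent transcodes at concurrent_jobs (4 in the config above).
var jobSlots = make(chan struct{}, 4)

// runJob blocks until a slot is free, runs the transcode, then releases the slot.
func runJob(transcode func() error) error {
	jobSlots <- struct{}{}
	defer func() { <-jobSlots }()
	return transcode()
}
```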

### Performance Monitoring

**Key Metrics:**

- Queue depth and processing time
- CPU usage during transcoding
- Storage I/O patterns
- Memory consumption per job
- Failed job retry rates

**Alerts:**

- Queue backlog > 50 jobs
- Average processing time > 5 minutes per GB
- Failed job rate > 10%
- Storage space < 20% free

### Optimization Strategies

1. **Priority System**: Smaller files processed first for user feedback (see the sketch after this list)
2. **Resource Limits**: Prevent transcoding from affecting the main service
3. **Smart Serving**: Original files served while transcoding is in progress
4. **Batch Processing**: Group similar formats for efficiency
5. **Hardware Acceleration**: Use GPU encoding when available
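
A minimal sketch of the smallest-first ordering from item 1, built on the standard container/heap package; the types are illustrative, not the gateway's real queue:

```go
package transcoding

import "container/heap"

type job struct {
	FileHash string
	Size     int64
}

// jobQueue implements heap.Interface, ordering pending transcodes smallest-first
// so short jobs give users feedback quickly.
type jobQueue []job

func (q jobQueue) Len() int           { return len(q) }
func (q jobQueue) Less(i, j int) bool { return q[i].Size < q[j].Size }
func (q jobQueue) Swap(i, j int)      { q[i], q[j] = q[j], q[i] }
func (q *jobQueue) Push(x any)        { *q = append(*q, x.(job)) }
func (q *jobQueue) Pop() any {
	old := *q
	item := old[len(old)-1]
	*q = old[:len(old)-1]
	return item
}

// Usage: heap.Init(&queue) once, heap.Push(&queue, job{...}) to enqueue,
// and heap.Pop(&queue).(job) to take the smallest pending file.
var _ heap.Interface = (*jobQueue)(nil)
```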