enki/torrent-gateway

enki 76979d055b

Transcoding and Nip71 update

2025-08-21 19:32:26 -07:00

Troubleshooting Guide

Common Issues and Solutions

Service Startup Issues

Gateway Won't Start

Symptoms: Container exits immediately or health checks fail

Diagnostic Steps:

# Check container logs
docker-compose -f docker-compose.prod.yml logs gateway

# Check database file
ls -la data/metadata.db

# Test database connection
sqlite3 data/metadata.db "SELECT COUNT(*) FROM files;"

Common Causes & Solutions:

Database permissions:

sudo chown -R $USER:$USER data/
chmod -R 755 data/

Port conflicts:

# Check what's using port 9876
sudo netstat -tulpn | grep 9876
# Kill conflicting process or change port

Insufficient disk space:
```
df -h
# Free up space or add storage
```
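
The disk-space check above can also be scripted so that a deploy step fails early instead of relying on someone reading `df -h`. The following is a minimal sketch: the function name `check_free_space` and the 1024 MB threshold in the usage line are illustrative assumptions, not part of the gateway's scripts.

```shell
# check_free_space DIR MIN_MB
# Succeeds when the file system holding DIR has at least MIN_MB megabytes free.
# Hypothetical helper; the body runs in a subshell so its variables stay local.
check_free_space() (
  dir=$1
  min_mb=$2
  # df -P forces single-line POSIX output, -k reports 1024-byte blocks;
  # column 4 of the second line is the available space.
  avail_kb=$(df -Pk "$dir" 2>/dev/null | awk 'NR==2 {print $4}')
  [ -n "$avail_kb" ] || exit 1
  [ "$((avail_kb / 1024))" -ge "$min_mb" ]
)

# Example (assumed threshold): refuse to start with less than 1 GB free.
# check_free_space data/ 1024 || echo "Not enough free space in data/"
```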

Redis Connection Issues

Symptoms: Gateway logs show Redis connection errors

Solutions:

# Check Redis container
docker-compose -f docker-compose.prod.yml logs redis

# Test Redis connection
docker-compose -f docker-compose.prod.yml exec redis redis-cli ping

# Restart Redis
docker-compose -f docker-compose.prod.yml restart redis
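
After a restart, Redis may need a few seconds before it answers. A small retry wrapper avoids false alarms from checking too early. This is a sketch only: `retry` is a hypothetical helper, and the attempt count and delay in the usage line are arbitrary.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS tries have been made.
# Hypothetical helper; the body runs in a subshell so its variables stay local.
retry() (
  attempts=$1
  delay=$2
  shift 2
  i=1
  while :; do
    "$@" && exit 0                      # success: stop retrying
    [ "$i" -ge "$attempts" ] && exit 1  # out of attempts: report failure
    i=$((i + 1))
    sleep "$delay"
  done
)

# Example: wait up to roughly 20 seconds for Redis to respond to PING.
# retry 10 2 docker-compose -f docker-compose.prod.yml exec -T redis redis-cli ping
```

The `-T` flag disables pseudo-TTY allocation, which is usually what you want when the command runs from a script rather than an interactive shell.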

Performance Issues

High CPU Usage

Diagnostic:

# Check container resource usage
docker stats

# Check system resources
top
htop

Solutions:

Scale gateway instances:

docker-compose -f docker-compose.prod.yml up -d --scale gateway=2

Optimize database:

./scripts/migrate.sh  # Runs VACUUM and ANALYZE

Add resource limits:

services:
  gateway:
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 1G

High Memory Usage

Diagnostic:

# Check memory usage by container
docker stats --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}"

# Check for memory leaks in logs
docker-compose -f docker-compose.prod.yml logs gateway | grep -i "memory\|leak\|oom"

Solutions:

Restart affected containers:

docker-compose -f docker-compose.prod.yml restart gateway

Implement memory limits:

services:
  gateway:
    deploy:
      resources:
        limits:
          memory: 2G

Slow Response Times

Diagnostic:

# Test API response time
curl -w "@curl-format.txt" -o /dev/null -s http://localhost:9876/api/health

# Check database performance
sqlite3 data/metadata.db "EXPLAIN QUERY PLAN SELECT * FROM files LIMIT 10;"
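
The `curl -w "@curl-format.txt"` command above reads its output template from a file in the current directory. If that file does not exist yet, something like the following creates one. The field selection here is an assumption about what is useful to see; the `%{...}` names are standard curl write-out variables.

```shell
# Create a write-out template for curl's -w option.
# The quoted 'EOF' keeps %{...} and \n literal so curl can interpret them.
cat > curl-format.txt <<'EOF'
   namelookup: %{time_namelookup}s\n
      connect: %{time_connect}s\n
starttransfer: %{time_starttransfer}s\n
        total: %{time_total}s\n
    http_code: %{http_code}\n
EOF
```

A large gap between `connect` and `starttransfer` suggests the time is spent inside the gateway rather than on the network.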

Solutions:

Add database indexes:

./scripts/migrate.sh  # Applies performance indexes

Optimize storage:
```
# Check storage I/O
iostat -x 1 5
```

Database Issues

Database Corruption

Symptoms: SQLite errors, integrity check failures

Diagnostic:

# Check database integrity
sqlite3 data/metadata.db "PRAGMA integrity_check;"

# Check database size and structure
sqlite3 data/metadata.db ".schema"
ls -lh data/metadata.db

Recovery:

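# Attempt repair (copy the file first; VACUUM rebuilds the database but may fail on a corrupted file)
sqlite3 data/metadata.db "VACUUM;"

# If repair fails, restore from the most recent backup
./scripts/restore.sh $(ls backups/ | grep gateway_backup | tail -1 | sed 's/gateway_backup_\(.*\)\.tar\.gz/\1/')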
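
The restore one-liner above is easier to read and reuse as a function. The sketch below assumes backups are named `gateway_backup_<timestamp>.tar.gz`, as the command above implies, and that timestamps sort lexically in chronological order (for example `YYYYMMDD_HHMMSS`). The function name `latest_backup_timestamp` is hypothetical.

```shell
# latest_backup_timestamp [DIR]
# Prints the timestamp of the newest gateway backup in DIR (default: backups).
# Prints nothing if DIR is missing or contains no matching archive.
latest_backup_timestamp() (
  dir=${1:-backups}
  ls "$dir" 2>/dev/null |
    # Keep only matching archive names and reduce each to its timestamp.
    sed -n 's/^gateway_backup_\(.*\)\.tar\.gz$/\1/p' |
    sort |
    tail -n 1
)

# Example:
# ts=$(latest_backup_timestamp backups)
# [ -n "$ts" ] && ./scripts/restore.sh "$ts"
```

Checking that the result is non-empty before calling `restore.sh` avoids running a restore with no argument when the backup directory is empty.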

Database Lock Issues

Symptoms: "database is locked" errors

Solutions:

# Find processes using database
lsof data/metadata.db

# Force unlock (dangerous: stop the gateway first; deleting the WAL file discards any transactions not yet checkpointed)
docker-compose -f docker-compose.prod.yml stop gateway
rm -f data/metadata.db-wal data/metadata.db-shm

Storage Issues

Disk Space Full

Diagnostic:

# Check disk usage
df -h
du -sh data/*

# Find large files
find data/ -type f -size +100M -exec ls -lh {} \;

Solutions:

Clean up old files:

# Remove files older than 30 days
find data/blobs/ -type f -mtime +30 -delete
find data/chunks/ -type f -mtime +30 -delete

Clean up orphaned data:

./scripts/migrate.sh  # Removes orphaned chunks
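
Because `find ... -delete` is irreversible, it can help to preview what would be removed first. A minimal dry-run sketch follows; `list_old_files` is a hypothetical helper, and the 30-day default mirrors the cleanup commands above.

```shell
# list_old_files DIR [DAYS]
# Prints regular files under DIR last modified more than DAYS days ago
# (default 30). Nothing is deleted.
list_old_files() (
  dir=$1
  days=${2:-30}
  find "$dir" -type f -mtime +"$days" -print
)

# Example: review the candidates, then delete only if the list looks right.
# list_old_files data/blobs 30
# list_old_files data/chunks 30 | wc -l
```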

Storage Corruption

Symptoms: File integrity check failures

Diagnostic:

# Run E2E tests to verify storage
./test/e2e/run_all_tests.sh

# Check file system (unmount it first; running fsck on a mounted file system can damage it)
fsck /dev/disk/by-label/data

Network Issues

API Timeouts

Diagnostic:

# Test network connectivity
curl -v http://localhost:9876/api/health

# Check Docker network
docker network ls
docker network inspect torrentgateway_default

Solutions:

# Restart networking
docker-compose -f docker-compose.prod.yml down
docker-compose -f docker-compose.prod.yml up -d

# Increase timeouts in client
curl --connect-timeout 30 --max-time 60 http://localhost:9876/api/health

Port Binding Issues

Symptoms: "Port already in use" errors

Diagnostic:

# Check port usage
sudo netstat -tulpn | grep :9876
sudo lsof -i :9876

Solutions:

# Kill conflicting process
sudo kill $(sudo lsof -t -i:9876)

# Or change port in docker-compose.yml

Monitoring Issues

Prometheus Not Scraping

Diagnostic:

# Check Prometheus targets
curl -s http://localhost:9090/api/v1/targets

# Check metrics endpoint
curl -s http://localhost:9876/metrics

Solutions:

# Restart Prometheus
docker-compose -f docker-compose.prod.yml restart prometheus

# Check configuration
docker-compose -f docker-compose.prod.yml exec prometheus cat /etc/prometheus/prometheus.yml
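
The targets endpoint returns a large JSON document, so a quick count of unhealthy targets is often more useful than reading it raw. The sketch below assumes the compact JSON that the Prometheus HTTP API normally emits, where each active target carries a `"health":"up"` or `"health":"down"` field. `count_down_targets` is a hypothetical helper, and a JSON-aware tool would be more robust than `grep`.

```shell
# count_down_targets
# Reads a Prometheus /api/v1/targets response on stdin and prints how many
# targets report health "down". Assumes compact JSON (no spaces after colons).
count_down_targets() {
  grep -o '"health":"down"' | wc -l | tr -d ' '
}

# Example:
# curl -s http://localhost:9090/api/v1/targets | count_down_targets
```

A result of `0` while dashboards are still empty suggests the gateway is not listed as a target at all, which points to the scrape configuration rather than the gateway.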

Grafana Dashboard Issues

Common Problems:

No data in dashboards:
- Check Prometheus data source configuration
- Verify metrics are being collected

Dashboard import failures:
- Check JSON syntax
- Verify dashboard version compatibility

Log Analysis

Finding Specific Errors

# Gateway application logs
docker-compose -f docker-compose.prod.yml logs gateway | grep -i error

# System logs with timestamps
docker-compose -f docker-compose.prod.yml logs --timestamps

# Follow logs in real-time
docker-compose -f docker-compose.prod.yml logs -f gateway
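
When several services log at once, a per-service error count shows where to look first. This sketch relies on the `service_name | message` prefix that `docker-compose logs` puts on each line when its output is piped. `errors_by_service` is a hypothetical helper.

```shell
# errors_by_service
# Reads docker-compose log output on stdin and prints the number of lines
# containing "error" (case-insensitive) per service, highest count first.
errors_by_service() {
  grep -i 'error' |
    awk -F'|' '
      { gsub(/[ \t]+$/, "", $1); count[$1]++ }   # strip padding after the service name
      END { for (s in count) print count[s], s }
    ' |
    sort -rn
}

# Example:
# docker-compose -f docker-compose.prod.yml logs | errors_by_service
```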

Log Rotation Issues

# Check log sizes
docker-compose -f docker-compose.prod.yml exec gateway ls -lh /app/logs/

# Manually rotate logs (-f forces rotation even when none is due)
docker-compose -f docker-compose.prod.yml exec gateway logrotate -f /etc/logrotate.conf

Emergency Procedures

Complete Service Failure

Stop all services:

docker-compose -f docker-compose.prod.yml down

Check system resources:

```
df -h
free -h
top
```

Restore from backup:

```
./scripts/restore.sh <timestamp>
```

Data Recovery

Create immediate backup:
```
./scripts/backup.sh emergency
```

Assess data integrity:

sqlite3 data/metadata.db "PRAGMA integrity_check;"

Restore if necessary:

./scripts/restore.sh <last_good_backup>

Getting Help

Log Collection

Before reporting issues, collect relevant logs:

# Create diagnostics package
mkdir -p diagnostics
docker-compose -f docker-compose.prod.yml logs > diagnostics/service_logs.txt
./scripts/health_check.sh > diagnostics/health_check.txt 2>&1
cp data/metadata.db diagnostics/ 2>/dev/null || echo "Database not accessible"
tar -czf diagnostics_$(date +%Y%m%d_%H%M%S).tar.gz diagnostics/

Health Check Output

Always include health check results:

./scripts/health_check.sh | tee health_status.txt

System Information

# Collect system info
echo "Docker version: $(docker --version)" > system_info.txt
echo "Docker Compose version: $(docker-compose --version)" >> system_info.txt
echo "System: $(uname -a)" >> system_info.txt
echo "Memory: $(free -h)" >> system_info.txt
echo "Disk: $(df -h)" >> system_info.txt
echo "FFmpeg: $(command -v ffmpeg >/dev/null 2>&1 && ffmpeg -version | head -1 || echo 'Not installed')" >> system_info.txt

Video Transcoding Issues

FFmpeg Not Found

Symptoms: Transcoding fails with "ffmpeg not found" errors

Solution:

# Install FFmpeg
sudo apt install ffmpeg  # Ubuntu/Debian
sudo yum install ffmpeg  # CentOS/RHEL (usually needs a third-party repo such as RPM Fusion)
brew install ffmpeg      # macOS

# Verify installation
ffmpeg -version

Transcoding Jobs Stuck

Symptoms: Videos remain in "queued" or "processing" status

Diagnostic Steps:

# Check transcoding status
curl -H "Authorization: Bearer $TOKEN" \
  http://localhost:9877/api/users/me/files/$HASH/transcoding-status

# Check process resources
ps aux | grep '[f]fmpeg'
top -p "$(pgrep -d, ffmpeg)"

Common Causes:

- Insufficient disk space in work directory
- Memory limits exceeded
- Invalid video format
- Corrupted source file

High Resource Usage

Symptoms: System slow during transcoding, high CPU/memory usage

Solutions:

# Set any of the following under the transcoding key
transcoding:
  concurrent_jobs: 2        # Reduce concurrent jobs (lower from 4)
  max_cpu_percent: 50       # Limit CPU usage (reduce from 80)
  nice_level: 15            # Lower process priority (increase from 10)
  min_file_size: 200MB      # Raise the size threshold to skip more small files

Failed Transcoding Jobs

Symptoms: Jobs marked as "failed" in status API

Diagnostic Steps:

# Check transcoding logs
grep "transcoding" /var/log/torrent-gateway.log

# Check FFmpeg error output
journalctl -u torrent-gateway | grep ffmpeg

Common Solutions:

- Verify source file is not corrupted
- Check available disk space
- Ensure FFmpeg supports input format
- Review resource limits
