# Troubleshooting Guide

## Common Issues and Solutions

### Service Startup Issues

#### Gateway Won't Start

**Symptoms:** Container exits immediately or health checks fail

**Diagnostic Steps:**

```bash
# Check container logs
docker-compose -f docker-compose.prod.yml logs gateway

# Check database file
ls -la data/metadata.db

# Test database connection
sqlite3 data/metadata.db "SELECT COUNT(*) FROM files;"
```

**Common Causes & Solutions:**

1. **Database permissions:**
   ```bash
   sudo chown -R "$USER:$USER" data/
   chmod -R 755 data/
   ```

2. **Port conflicts:**
   ```bash
   # Check what's using port 9876
   sudo netstat -tulpn | grep 9876
   # Kill conflicting process or change port
   ```

3. **Insufficient disk space:**
   ```bash
   df -h
   # Free up space or add storage
   ```

#### Redis Connection Issues

**Symptoms:** Gateway logs show Redis connection errors

**Solutions:**

```bash
# Check Redis container
docker-compose -f docker-compose.prod.yml logs redis

# Test Redis connection
docker exec -it torrentgateway_redis_1 redis-cli ping

# Restart Redis
docker-compose -f docker-compose.prod.yml restart redis
```

### Performance Issues

#### High CPU Usage

**Diagnostic:**

```bash
# Check container resource usage
docker stats

# Check system resources
top
htop
```

**Solutions:**

1. **Scale gateway instances:**
   ```bash
   docker-compose -f docker-compose.prod.yml up -d --scale gateway=2
   ```

2. **Optimize database:**
   ```bash
   ./scripts/migrate.sh  # Runs VACUUM and ANALYZE
   ```

3. **Add resource limits:**
   ```yaml
   services:
     gateway:
       deploy:
         resources:
           limits:
             cpus: '1.0'
             memory: 1G
   ```

#### High Memory Usage

**Diagnostic:**

```bash
# Check memory usage by container
docker stats --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}"

# Check for memory leaks in logs
docker-compose -f docker-compose.prod.yml logs gateway | grep -i "memory\|leak\|oom"
```

**Solutions:**

1. **Restart affected containers:**
   ```bash
   docker-compose -f docker-compose.prod.yml restart gateway
   ```

2. **Implement memory limits:**
   ```yaml
   services:
     gateway:
       deploy:
         resources:
           limits:
             memory: 2G
   ```

#### Slow Response Times

**Diagnostic:**

```bash
# Test API response time
curl -w "@curl-format.txt" -o /dev/null -s http://localhost:9876/api/health

# Check database performance
sqlite3 data/metadata.db "EXPLAIN QUERY PLAN SELECT * FROM files LIMIT 10;"
```

**Solutions:**

1. **Add database indexes:**
   ```bash
   ./scripts/migrate.sh  # Applies performance indexes
   ```

2. **Optimize storage:**
   ```bash
   # Check storage I/O
   iostat -x 1 5
   ```

### Database Issues

#### Database Corruption

**Symptoms:** SQLite errors, integrity check failures

**Diagnostic:**

```bash
# Check database integrity
sqlite3 data/metadata.db "PRAGMA integrity_check;"

# Check database size and structure
sqlite3 data/metadata.db ".schema"
ls -lh data/metadata.db
```

**Recovery:**

```bash
# Attempt repair (copy the database file first; VACUUM rebuilds the file
# but cannot fix every kind of corruption)
sqlite3 data/metadata.db "VACUUM;"

# If repair fails, restore from the most recent backup
./scripts/restore.sh "$(ls backups/ | grep gateway_backup | tail -1 | sed 's/gateway_backup_\(.*\)\.tar\.gz/\1/')"
```

#### Database Lock Issues

**Symptoms:** "database is locked" errors

**Solutions:**

```bash
# Find processes using database
lsof data/metadata.db

# Force unlock (dangerous - stop gateway first)
# Deleting the -wal file discards any transactions not yet checkpointed
docker-compose -f docker-compose.prod.yml stop gateway
rm -f data/metadata.db-wal data/metadata.db-shm
```

### Storage Issues

#### Disk Space Full

**Diagnostic:**

```bash
# Check disk usage
df -h
du -sh data/*

# Find large files
find data/ -type f -size +100M -exec ls -lh {} \;
```

**Solutions:**

1. **Clean up old files:**
   ```bash
   # Remove files older than 30 days
   find data/blobs/ -type f -mtime +30 -delete
   find data/chunks/ -type f -mtime +30 -delete
   ```

2. **Clean up orphaned data:**
   ```bash
   ./scripts/migrate.sh  # Removes orphaned chunks
   ```

#### Storage Corruption

**Symptoms:** File integrity check failures

**Diagnostic:**

```bash
# Run E2E tests to verify storage
./test/e2e/run_all_tests.sh

# Check file system (unmount the filesystem first)
fsck /dev/disk/by-label/data
```

### Network Issues

#### API Timeouts

**Diagnostic:**

```bash
# Test network connectivity
curl -v http://localhost:9876/api/health

# Check Docker network
docker network ls
docker network inspect torrentgateway_default
```

**Solutions:**

```bash
# Restart networking
docker-compose -f docker-compose.prod.yml down
docker-compose -f docker-compose.prod.yml up -d

# Increase timeouts in client
curl --connect-timeout 30 --max-time 60 http://localhost:9876/api/health
```

#### Port Binding Issues

**Symptoms:** "Port already in use" errors

**Diagnostic:**

```bash
# Check port usage
sudo netstat -tulpn | grep :9876
sudo lsof -i :9876
```

**Solutions:**

```bash
# Kill conflicting process
sudo kill $(sudo lsof -t -i:9876)

# Or change port in docker-compose.yml
```

### Monitoring Issues

#### Prometheus Not Scraping

**Diagnostic:**

```bash
# Check Prometheus targets
curl -s http://localhost:9090/api/v1/targets

# Check metrics endpoint
curl -s http://localhost:9876/metrics
```

**Solutions:**

```bash
# Restart Prometheus
docker-compose -f docker-compose.prod.yml restart prometheus

# Check configuration
docker-compose -f docker-compose.prod.yml exec prometheus cat /etc/prometheus/prometheus.yml
```

#### Grafana Dashboard Issues

**Common Problems:**

1. **No data in dashboards:**
   - Check Prometheus data source configuration
   - Verify metrics are being collected

2. **Dashboard import failures:**
   - Check JSON syntax
   - Verify dashboard version compatibility

### Log Analysis

#### Finding Specific Errors

```bash
# Gateway application logs
docker-compose -f docker-compose.prod.yml logs gateway | grep -i error

# Logs from all services, with timestamps
docker-compose -f docker-compose.prod.yml logs --timestamps

# Follow logs in real-time
docker-compose -f docker-compose.prod.yml logs -f gateway
```

#### Log Rotation Issues

```bash
# Check log sizes
docker-compose -f docker-compose.prod.yml exec gateway ls -lh /app/logs/

# Manually rotate logs
docker-compose -f docker-compose.prod.yml exec gateway logrotate /etc/logrotate.conf
```

## Emergency Procedures

### Complete Service Failure

1. **Stop all services:**
   ```bash
   docker-compose -f docker-compose.prod.yml down
   ```

2. **Check system resources:**
   ```bash
   df -h
   free -h
   top
   ```

3. **Restore from backup:**
   ```bash
   ./scripts/restore.sh
   ```

### Data Recovery

1. **Create immediate backup:**
   ```bash
   ./scripts/backup.sh emergency
   ```

2. **Assess data integrity:**
   ```bash
   sqlite3 data/metadata.db "PRAGMA integrity_check;"
   ```

3. **Restore if necessary:**
   ```bash
   ./scripts/restore.sh
   ```

## Getting Help

### Log Collection

Before reporting issues, collect relevant logs:

```bash
# Create diagnostics package
mkdir -p diagnostics
docker-compose -f docker-compose.prod.yml logs > diagnostics/service_logs.txt
./scripts/health_check.sh > diagnostics/health_check.txt 2>&1
cp data/metadata.db diagnostics/ 2>/dev/null || echo "Database not accessible"
tar -czf diagnostics_$(date +%Y%m%d_%H%M%S).tar.gz diagnostics/
```

### Health Check Output

Always include health check results:

```bash
./scripts/health_check.sh | tee health_status.txt
```

### System Information

```bash
# Collect system info
echo "Docker version: $(docker --version)" > system_info.txt
echo "Docker Compose version: $(docker-compose --version)" >> system_info.txt
echo "System: $(uname -a)" >> system_info.txt
echo "Memory: $(free -h)" >> system_info.txt
echo "Disk: $(df -h)" >> system_info.txt
```
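
The system information commands write `system_info.txt` to the current directory, so it is not part of the `diagnostics/` archive built under Log Collection. A minimal sketch of one way to fold the two together is shown here. The function names `run_or_note` and `collect_system_info` are hypothetical helpers, not part of the project's scripts, and the section-header layout of the output file is an assumption.

```bash
#!/usr/bin/env bash

# Hypothetical helper: run a command if it is installed; otherwise note that
# it is missing, so one absent tool does not abort the whole collection.
run_or_note() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "$1: exited with status $?"
    else
        echo "$1: not installed"
    fi
}

# Hypothetical helper: write system details to <dir>/system_info.txt.
# The directory defaults to "diagnostics"; the function prints the file path.
collect_system_info() {
    local out_dir="${1:-diagnostics}"
    local out_file="$out_dir/system_info.txt"
    mkdir -p "$out_dir" || return 1

    {
        echo "== Docker version ==";         run_or_note docker --version
        echo "== Docker Compose version =="; run_or_note docker-compose --version
        echo "== System ==";                 run_or_note uname -a
        echo "== Memory ==";                 run_or_note free -h
        echo "== Disk ==";                   run_or_note df -h
    } > "$out_file"

    echo "$out_file"
}
```

Calling `collect_system_info diagnostics` before the `tar` step would place the file alongside the service logs. Review the output before sharing it, since `uname -a` and `df -h` can reveal hostnames and mount paths.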