# Troubleshooting Guide

## Common Issues and Solutions

### Service Startup Issues

#### Gateway Won't Start

**Symptoms:** Container exits immediately or health checks fail

**Diagnostic Steps:**
```bash
# Check container logs
docker-compose -f docker-compose.prod.yml logs gateway

# Check database file
ls -la data/metadata.db

# Test database connection
sqlite3 data/metadata.db "SELECT COUNT(*) FROM files;"
```
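The file-level checks above can be bundled into one pre-flight function that reports the first failing condition. This is a minimal sketch: `check_db_file` is a hypothetical helper name, not part of the project's scripts, and it only inspects the file without opening the database.

```bash
# check_db_file PATH
# Hypothetical helper: prints the first problem found with the database
# file and returns non-zero; prints "ok: PATH" and returns 0 otherwise.
check_db_file() {
    db_path="$1"
    if [ ! -e "$db_path" ]; then
        echo "missing: $db_path"
        return 1
    fi
    if [ ! -r "$db_path" ] || [ ! -w "$db_path" ]; then
        echo "not readable/writable by $(id -un): $db_path"
        return 2
    fi
    if [ ! -s "$db_path" ]; then
        echo "empty file: $db_path"
        return 3
    fi
    echo "ok: $db_path"
}
```

Calling `check_db_file data/metadata.db` before starting the gateway distinguishes a missing file from a permissions problem, which the container logs may report less clearly.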

**Common Causes & Solutions:**

1. **Database permissions:**
   ```bash
   sudo chown -R $USER:$USER data/
   chmod -R 755 data/
   ```

2. **Port conflicts:**
   ```bash
   # Check what's using port 9876
   sudo netstat -tulpn | grep 9876
   # Kill conflicting process or change port
   ```

3. **Insufficient disk space:**
   ```bash
   df -h
   # Free up space or add storage
   ```
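Reading `df -h` by eye works for a one-off check. For a repeatable one, the used percentage can be compared against a threshold. The sketch below is an illustration only: the function names are hypothetical, and the threshold in the usage example is an arbitrary assumption.

```bash
# disk_usage_pct PATH
# Prints the used-space percentage (digits only) of the filesystem holding PATH.
disk_usage_pct() {
    # -P forces the single-line POSIX output format; column 5 is "Capacity".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# warn_if_full PATH THRESHOLD
# Returns 1 and prints a warning when usage is at or above THRESHOLD percent.
warn_if_full() {
    used=$(disk_usage_pct "$1")
    if [ "$used" -ge "$2" ]; then
        echo "WARNING: $1 is ${used}% full (threshold $2%)"
        return 1
    fi
    echo "OK: $1 is ${used}% full"
}
```

For example, `warn_if_full data 90` checks the filesystem that holds the `data/` directory.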

#### Redis Connection Issues

**Symptoms:** Gateway logs show Redis connection errors

**Solutions:**
```bash
# Check Redis container
docker-compose -f docker-compose.prod.yml logs redis

# Test Redis connection
docker exec -it torrentgateway_redis_1 redis-cli ping

# Restart Redis
docker-compose -f docker-compose.prod.yml restart redis
```
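After a restart, Redis may take a moment to accept connections, so a single `ping` can fail spuriously. A small retry loop avoids that. This is a generic sketch: `wait_for` is a hypothetical helper, and the attempt count and delay in the usage example are assumptions.

```bash
# wait_for ATTEMPTS DELAY_SECONDS COMMAND...
# Re-runs COMMAND until it exits 0 or ATTEMPTS is exhausted.
wait_for() {
    attempts="$1"; delay="$2"; shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            echo "succeeded on attempt $i"
            return 0
        fi
        # Pause between attempts, but not after the final one.
        [ "$i" -lt "$attempts" ] && sleep "$delay"
        i=$((i + 1))
    done
    echo "gave up after $attempts attempts"
    return 1
}
```

A possible invocation is `wait_for 10 2 docker-compose -f docker-compose.prod.yml exec -T redis redis-cli ping`. It relies on `redis-cli` exiting non-zero while the server is unreachable.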

### Performance Issues

#### High CPU Usage

**Diagnostic:**
```bash
# Check container resource usage
docker stats

# Check system resources
top
htop
```

**Solutions:**
1. **Scale gateway instances:**
   ```bash
   docker-compose -f docker-compose.prod.yml up -d --scale gateway=2
   ```

2. **Optimize database:**
   ```bash
   ./scripts/migrate.sh  # Runs VACUUM and ANALYZE
   ```

3. **Add resource limits:**
   ```yaml
   services:
     gateway:
       deploy:
         resources:
           limits:
             cpus: '1.0'
             memory: 1G
   ```

#### High Memory Usage

**Diagnostic:**
```bash
# Check memory usage by container
docker stats --format "table {{.Container}}\t{{.CPUPerc}}\t{{.MemUsage}}"

# Check for memory leaks in logs
docker-compose -f docker-compose.prod.yml logs gateway | grep -i "memory\|leak\|oom"
```

**Solutions:**
1. **Restart affected containers:**
   ```bash
   docker-compose -f docker-compose.prod.yml restart gateway
   ```

2. **Implement memory limits:**
   ```yaml
   services:
     gateway:
       deploy:
         resources:
           limits:
             memory: 2G
   ```

#### Slow Response Times

**Diagnostic:**
```bash
# Test API response time
curl -w "@curl-format.txt" -o /dev/null -s http://localhost:9876/api/health

# Check database performance
sqlite3 data/metadata.db "EXPLAIN QUERY PLAN SELECT * FROM files LIMIT 10;"
```
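The `curl` command above reads its output template from `curl-format.txt`, which this guide does not include. One possible version is sketched below using curl's standard `--write-out` variables. The choice of fields and their labels is an assumption.

```bash
# Create a write-out template for curl in the current directory.
# The quoted 'EOF' keeps the shell from expanding %{...} or backslashes.
cat > curl-format.txt <<'EOF'
    dns_lookup:  %{time_namelookup}s\n
   tcp_connect:  %{time_connect}s\n
    first_byte:  %{time_starttransfer}s\n
         total:  %{time_total}s\n
     http_code:  %{http_code}\n
EOF
```

A large gap between `tcp_connect` and `first_byte` points at time spent inside the gateway, while a slow `dns_lookup` or `tcp_connect` points at the network.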

**Solutions:**
1. **Add database indexes:**
   ```bash
   ./scripts/migrate.sh  # Applies performance indexes
   ```

2. **Optimize storage:**
   ```bash
   # Check storage I/O
   iostat -x 1 5
   ```

### Database Issues

#### Database Corruption

**Symptoms:** SQLite errors, integrity check failures

**Diagnostic:**
```bash
# Check database integrity
sqlite3 data/metadata.db "PRAGMA integrity_check;"

# Check database size and structure
sqlite3 data/metadata.db ".schema"
ls -lh data/metadata.db
```

**Recovery:**
```bash
# Attempt repair (VACUUM rebuilds the file but cannot fix every kind of corruption)
sqlite3 data/metadata.db "VACUUM;"

# If repair fails, restore from the most recent backup
./scripts/restore.sh $(ls backups/ | grep gateway_backup | tail -1 | sed 's/gateway_backup_\(.*\)\.tar\.gz/\1/')
```
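Before running any repair, it helps to keep a copy of the damaged file so a failed attempt can be rolled back. The sketch below adds that step and a salvage pass with the `sqlite3` shell's `.recover` command, which needs SQLite 3.29 or newer. Both function names are hypothetical and not part of `./scripts`.

```bash
# snapshot_file PATH
# Copies PATH to PATH.<timestamp>.bak and prints the name of the copy.
snapshot_file() {
    copy="$1.$(date +%Y%m%d_%H%M%S).bak"
    cp -p "$1" "$copy" && echo "$copy"
}

# recover_db SRC DEST
# Salvages whatever rows are still readable from SRC into a new database
# DEST. DEST must not exist yet; SRC is left untouched.
recover_db() {
    sqlite3 "$1" ".recover" | sqlite3 "$2"
}
```

With the gateway stopped, a cautious sequence would be `snapshot_file data/metadata.db`, then `recover_db data/metadata.db data/metadata.recovered.db`, then `PRAGMA integrity_check;` on the recovered file before swapping it in.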

#### Database Lock Issues

**Symptoms:** "database is locked" errors

**Solutions:**
```bash
# Find processes using database
lsof data/metadata.db

# Force unlock (dangerous - stop gateway first)
# Deleting the -wal file discards any transactions not yet checkpointed
# into metadata.db, so back up data/ before running the rm below
docker-compose -f docker-compose.prod.yml stop gateway
rm -f data/metadata.db-wal data/metadata.db-shm
```
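A gentler alternative to deleting the `-wal` and `-shm` files is to let SQLite fold the write-ahead log back into the main file once the gateway has stopped. The sketch below assumes the database runs in WAL mode, which the presence of those files suggests. `checkpoint_wal` is a hypothetical helper name.

```bash
# checkpoint_wal DB_PATH
# Flushes the write-ahead log into the main database file and truncates it.
# Run only after the gateway has stopped, so no other process holds the file.
checkpoint_wal() {
    if [ ! -f "$1" ]; then
        echo "no such database: $1" >&2
        return 1
    fi
    sqlite3 "$1" "PRAGMA wal_checkpoint(TRUNCATE);"
}
```

After `docker-compose -f docker-compose.prod.yml stop gateway`, running `checkpoint_wal data/metadata.db` keeps committed data that removing the WAL file by hand would lose.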

### Storage Issues

#### Disk Space Full

**Diagnostic:**
```bash
# Check disk usage
df -h
du -sh data/*

# Find large files
find data/ -type f -size +100M -exec ls -lh {} \;
```

**Solutions:**
1. **Clean up old files:**
   ```bash
   # Remove files older than 30 days
   find data/blobs/ -type f -mtime +30 -delete
   find data/chunks/ -type f -mtime +30 -delete
   ```

2. **Cleanup orphaned data:**
   ```bash
   ./scripts/migrate.sh  # Removes orphaned chunks
   ```
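Because `find ... -delete` cannot be undone, it is worth listing what would be removed first. The sketch below performs the same selection as the cleanup commands above but only prints the matches. `list_old_files` is a hypothetical helper name.

```bash
# list_old_files DIR DAYS
# Prints files under DIR last modified more than DAYS days ago,
# without deleting anything.
list_old_files() {
    find "$1" -type f -mtime +"$2" -print
}
```

For instance, `list_old_files data/blobs 30 | wc -l` shows how many files the cleanup would touch. Age alone does not show that a blob is no longer referenced by the metadata database, so treat the list as candidates to review.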

#### Storage Corruption

**Symptoms:** File integrity check failures

**Diagnostic:**
```bash
# Run E2E tests to verify storage
./test/e2e/run_all_tests.sh

# Check file system (unmount the volume first; fsck on a mounted filesystem can damage it)
fsck /dev/disk/by-label/data
```

### Network Issues

#### API Timeouts

**Diagnostic:**
```bash
# Test network connectivity
curl -v http://localhost:9876/api/health

# Check Docker network
docker network ls
docker network inspect torrentgateway_default
```

**Solutions:**
```bash
# Restart networking
docker-compose -f docker-compose.prod.yml down
docker-compose -f docker-compose.prod.yml up -d

# Increase timeouts in client
curl --connect-timeout 30 --max-time 60 http://localhost:9876/api/health
```

#### Port Binding Issues

**Symptoms:** "Port already in use" errors

**Diagnostic:**
```bash
# Check port usage
sudo netstat -tulpn | grep :9876
sudo lsof -i :9876
```

**Solutions:**
```bash
# Kill conflicting process
sudo kill $(sudo lsof -t -i:9876)

# Or change port in docker-compose.yml
```

### Monitoring Issues

#### Prometheus Not Scraping

**Diagnostic:**
```bash
# Check Prometheus targets
curl -s http://localhost:9090/api/v1/targets

# Check metrics endpoint
curl -s http://localhost:9876/metrics
```
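The targets endpoint returns a large JSON document, and the field that matters here is each target's `health` value. The sketch below tallies those values with `grep` so that it works without `jq`. It assumes the compact JSON Prometheus normally emits, with no spaces around the colon. `summarize_target_health` is a hypothetical helper name.

```bash
# summarize_target_health
# Reads the /api/v1/targets JSON on stdin and prints how many targets
# report each health state.
summarize_target_health() {
    grep -o '"health":"[a-z]*"' | sort | uniq -c
}
```

Piping the first command above into it, as in `curl -s http://localhost:9090/api/v1/targets | summarize_target_health`, gives a quick count of targets per state. Any target not reporting `up` is worth inspecting in the full output for its `lastError` field.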

**Solutions:**
```bash
# Restart Prometheus
docker-compose -f docker-compose.prod.yml restart prometheus

# Check configuration
docker-compose -f docker-compose.prod.yml exec prometheus cat /etc/prometheus/prometheus.yml
```

#### Grafana Dashboard Issues

**Common Problems:**
1. **No data in dashboards:**
   - Check Prometheus data source configuration
   - Verify metrics are being collected

2. **Dashboard import failures:**
   - Check JSON syntax
   - Verify dashboard version compatibility

### Log Analysis

#### Finding Specific Errors

```bash
# Gateway application logs
docker-compose -f docker-compose.prod.yml logs gateway | grep -i error

# System logs with timestamps
docker-compose -f docker-compose.prod.yml logs --timestamps

# Follow logs in real-time
docker-compose -f docker-compose.prod.yml logs -f gateway
```
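When the log is long, a count per keyword is often more useful than the raw matches, for example to compare error volume before and after a restart. The helper below is an illustrative sketch with a hypothetical name. It counts matching lines, not individual occurrences.

```bash
# count_matches PATTERN
# Reads log lines on stdin and prints how many contain PATTERN,
# ignoring case. Prints 0 and still exits 0 when nothing matches.
count_matches() {
    grep -ci -- "$1" || true
}
```

Typical use is `docker-compose -f docker-compose.prod.yml logs gateway | count_matches error`.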

#### Log Rotation Issues

```bash
# Check log sizes
docker-compose -f docker-compose.prod.yml exec gateway ls -lh /app/logs/

# Manually rotate logs
docker-compose -f docker-compose.prod.yml exec gateway logrotate /etc/logrotate.conf
```

## Emergency Procedures

### Complete Service Failure

1. **Stop all services:**
   ```bash
   docker-compose -f docker-compose.prod.yml down
   ```

2. **Check system resources:**
   ```bash
   df -h
   free -h
   top
   ```

3. **Restore from backup:**
   ```bash
   ./scripts/restore.sh <timestamp>
   ```
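The `<timestamp>` argument can be derived from the backup file names instead of typed by hand. The sketch below assumes archives are named `gateway_backup_<timestamp>.tar.gz`, as the Database Corruption recovery command implies, and that the timestamps sort lexicographically. `latest_backup_timestamp` is a hypothetical helper name.

```bash
# latest_backup_timestamp DIR
# Prints the timestamp part of the newest gateway_backup_<timestamp>.tar.gz
# in DIR, or nothing if no such archive exists.
latest_backup_timestamp() {
    ls "$1" 2>/dev/null \
        | sed -n 's/^gateway_backup_\(.*\)\.tar\.gz$/\1/p' \
        | sort \
        | tail -n 1
}
```

It could then be combined with the restore script as `./scripts/restore.sh "$(latest_backup_timestamp backups)"`. Check that the printed value is non-empty before restoring.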

### Data Recovery

1. **Create immediate backup:**
   ```bash
   ./scripts/backup.sh emergency
   ```

2. **Assess data integrity:**
   ```bash
   sqlite3 data/metadata.db "PRAGMA integrity_check;"
   ```

3. **Restore if necessary:**
   ```bash
   ./scripts/restore.sh <last_good_backup>
   ```

## Getting Help

### Log Collection

Before reporting issues, collect relevant logs:

```bash
# Create diagnostics package
mkdir -p diagnostics
docker-compose -f docker-compose.prod.yml logs > diagnostics/service_logs.txt
./scripts/health_check.sh > diagnostics/health_check.txt 2>&1
cp data/metadata.db diagnostics/ 2>/dev/null || echo "Database not accessible"
tar -czf diagnostics_$(date +%Y%m%d_%H%M%S).tar.gz diagnostics/
```

### Health Check Output

Always include health check results:
```bash
./scripts/health_check.sh | tee health_status.txt
```

### System Information

```bash
# Collect system info
echo "Docker version: $(docker --version)" > system_info.txt
echo "Docker Compose version: $(docker-compose --version)" >> system_info.txt
echo "System: $(uname -a)" >> system_info.txt
echo "Memory: $(free -h)" >> system_info.txt
echo "Disk: $(df -h)" >> system_info.txt
```