Strfry Nostr Spam Filter

A spam filter plugin for strfry Nostr relays. It combines reputation scoring, rate limiting, and content checks to block spam while keeping the relay usable for legitimate users.

Features

  • Reputation-based filtering system
  • Per-user rate limiting
  • Content similarity detection
  • New account protection
  • Bot behavior detection
  • Progressive penalty system
  • Automatic recovery mechanism
  • Kind-specific rate limits
  • Configurable thresholds

Installation

  1. Clone this repository:
git clone https://github.com/yourusername/strfry-spam-filter.git
  2. Make the filter executable:
chmod +x relay-spam-filter.js
  3. Configure your strfry relay to use the filter:
# Add to strfry.conf (merge into the existing relay section)
relay {
    writePolicy {
        plugin = "/path/to/relay-spam-filter.js"
    }
}
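
For orientation, here is a minimal sketch of the exchange between strfry and a write-policy plugin, based on strfry's documented plugin interface: one JSON request per line on stdin, one JSON response per line on stdout. The `isSpam` function is a hypothetical placeholder, not the filter's actual logic.

```javascript
// A real script also needs `#!/usr/bin/env node` as its first line
// so that strfry can execute it directly.
const readline = require('node:readline');

// Hypothetical placeholder for the filter's real checks: accepts everything.
function isSpam(event) {
  return false;
}

// Turn one request line from strfry into one response line.
function handleLine(line) {
  const req = JSON.parse(line);
  const res = { id: req.event.id, action: 'accept' };
  if (req.type === 'new' && isSpam(req.event)) {
    res.action = 'reject';
    res.msg = 'blocked: spam filter';
  }
  return JSON.stringify(res);
}

// Read JSON lines from stdin and answer each one on stdout.
function main() {
  const rl = readline.createInterface({ input: process.stdin, terminal: false });
  rl.on('line', (line) => console.log(handleLine(line)));
}

// main(); // uncomment when the script is run by strfry
```

Because stdout carries the responses, any diagnostic output from a plugin like this should go to stderr.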

Configuration

The filter includes many configurable options. Here are the key settings:

Rate Limits

maxRepliesPerMinute: 50,        // Maximum replies per minute per user
maxRepliesPerHour: 400,         // Maximum replies per hour per user
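
One way such limits can be enforced is a sliding window over recent timestamps. The sketch below is an illustration under that assumption; the `replyTimes` store and the `allowReply` function are hypothetical names, not taken from the filter.

```javascript
const limits = { maxRepliesPerMinute: 50, maxRepliesPerHour: 400 };

// pubkey -> array of reply timestamps in milliseconds (hypothetical store)
const replyTimes = new Map();

// Returns true and records the reply if the user is under both limits.
function allowReply(pubkey, now = Date.now()) {
  const hourAgo = now - 60 * 60 * 1000;
  const minuteAgo = now - 60 * 1000;
  // Drop timestamps that have left the one-hour window.
  const recent = (replyTimes.get(pubkey) || []).filter((t) => t > hourAgo);
  const lastMinute = recent.filter((t) => t > minuteAgo).length;
  const allowed =
    lastMinute < limits.maxRepliesPerMinute &&
    recent.length < limits.maxRepliesPerHour;
  if (allowed) recent.push(now);
  replyTimes.set(pubkey, recent);
  return allowed;
}
```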

Event-Type Specific Limits

kindSpecificLimits: {
    0: { maxPerHour: 10 },      // Profile updates
    3: { maxPerHour: 5 },       // Contact list updates
    1: {                        // Regular posts
        maxPerMinute: 30,
        maxPerHour: 200
    }
}
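
As a rough sketch of how a table like this might be consulted, the function below compares already-computed counts against the entry for an event kind. The `exceedsKindLimit` name and the `counts` shape are assumptions made for illustration, as is the choice to treat a missing entry or field as "no limit".

```javascript
const kindSpecificLimits = {
  0: { maxPerHour: 10 },
  3: { maxPerHour: 5 },
  1: { maxPerMinute: 30, maxPerHour: 200 },
};

// counts holds how many events of this kind the user already sent
// in the last minute and hour. A missing entry or field means
// "no kind-specific limit" (assumption).
function exceedsKindLimit(kind, counts) {
  const limit = kindSpecificLimits[kind];
  if (!limit) return false;
  const { maxPerMinute = Infinity, maxPerHour = Infinity } = limit;
  return counts.perMinute >= maxPerMinute || counts.perHour >= maxPerHour;
}
```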

Reputation System

reputationConfig: {
    initialScore: 100,          // Starting reputation for new users
    goodEventBonus: 1,          // Points gained for good events
    spamPenalty: -15,           // Base penalty for spam
    recoveryRate: 3,            // Points recovered per hour
    minScore: -100,             // Minimum possible reputation
    maxScore: 1000,             // Maximum possible reputation
    blockThreshold: -50,        // Users blocked at this score
    blockRecoveryThreshold: -25 // Must recover to this to post again
}
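
The two block thresholds form a hysteresis band: a user blocked at -50 is not unblocked at -49. The sketch below shows one possible way to apply a score change with clamping and that band. The `newUser` and `applyDelta` names are hypothetical, and treating both thresholds as inclusive is an assumption.

```javascript
const reputationConfig = {
  initialScore: 100,
  minScore: -100,
  maxScore: 1000,
  blockThreshold: -50,
  blockRecoveryThreshold: -25,
};

function newUser() {
  return { score: reputationConfig.initialScore, blocked: false };
}

// Apply a score change, clamp it to [minScore, maxScore],
// and update the blocked flag.
function applyDelta(user, delta) {
  const c = reputationConfig;
  user.score = Math.min(c.maxScore, Math.max(c.minScore, user.score + delta));
  if (user.score <= c.blockThreshold) user.blocked = true;
  else if (user.score >= c.blockRecoveryThreshold) user.blocked = false;
  // Between the two thresholds the previous blocked state is kept.
  return user;
}
```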

New User Protection

newPubkeyReplyThreshold: 60,    // Seconds new users must wait to reply
newPubkeyMaxPostsIn5Min: 10     // Maximum posts in first 5 minutes
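
A possible reading of these two settings, sketched as a single check. The `checkNewPubkey` function, its argument shape, and the rejection messages are hypothetical; the filter may track first-seen times differently.

```javascript
const config = { newPubkeyReplyThreshold: 60, newPubkeyMaxPostsIn5Min: 10 };

// firstSeen and now are Unix timestamps in seconds; postsSoFar counts
// events already accepted from this pubkey.
// Returns a rejection reason, or null if the event may pass.
function checkNewPubkey({ firstSeen, postsSoFar, isReply }, now) {
  const age = now - firstSeen;
  if (isReply && age < config.newPubkeyReplyThreshold) {
    return 'new pubkey: wait before replying';
  }
  if (age < 5 * 60 && postsSoFar >= config.newPubkeyMaxPostsIn5Min) {
    return 'new pubkey: too many posts in first 5 minutes';
  }
  return null;
}
```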

Content Analysis

contentSimilarityThreshold: 0.8, // 80% similarity triggers spam detection
fastReplyThreshold: 30,          // Minimum seconds between reply and original post
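
This README does not say which similarity measure the filter uses. As an illustration only, the sketch below uses Jaccard similarity over lowercase word sets, which yields a value between 0 and 1 that can be compared against `contentSimilarityThreshold`. The function names are hypothetical.

```javascript
const contentSimilarityThreshold = 0.8;

// Split text into a set of lowercase words.
function wordSet(text) {
  return new Set(text.toLowerCase().split(/\s+/).filter(Boolean));
}

// Jaccard similarity: shared words divided by all distinct words.
function similarity(a, b) {
  const setA = wordSet(a);
  const setB = wordSet(b);
  if (setA.size === 0 && setB.size === 0) return 1;
  let shared = 0;
  for (const word of setA) if (setB.has(word)) shared += 1;
  return shared / (setA.size + setB.size - shared);
}

function isSimilar(a, b) {
  return similarity(a, b) >= contentSimilarityThreshold;
}
```

With this measure, two posts that differ only in letter case score 1, while two four-word posts that share three words score 0.6 and fall under the threshold.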

How It Works

Reputation System

The filter maintains a reputation score for each user:

  1. New users start at 100 points
  2. Good behavior slowly increases reputation
  3. Spam behavior results in penalties that grow as reputation falls:
    • -15 points while reputation is above 0
    • -20 points while reputation is below -25
    • -25 points while reputation is below -50
  4. Reputation recovers over time, more slowly at lower scores:
    • 3 points/hour above 0
    • 2 points/hour between -25 and 0
    • 1 point/hour below -50
  5. Users are blocked when they reach -50 and stay blocked until they recover to -25
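
The tiers above can be sketched as two lookup functions. The list leaves two ranges unspecified (the penalty between -25 and 0, and the recovery rate between -50 and -25); the sketch fills them with the neighbouring values purely as an assumption, and all function names are hypothetical.

```javascript
// Penalty for one spam event, by current score.
function spamPenalty(score) {
  if (score < -50) return -25;
  if (score < -25) return -20;
  // Assumption: the base penalty also applies between -25 and 0,
  // a range the list above does not cover.
  return -15;
}

// Points recovered per hour, by current score.
function recoveryPerHour(score) {
  if (score > 0) return 3;
  if (score >= -25) return 2;
  // Assumption: the slowest rate also applies between -50 and -25,
  // a range the list above does not cover.
  return 1;
}

// Hours without new penalties needed to climb from one score to another,
// stepping one hour at a time.
function hoursToRecover(from, to) {
  let score = from;
  let hours = 0;
  while (score < to) {
    score += recoveryPerHour(score);
    hours += 1;
  }
  return hours;
}
```

Under these assumptions, `hoursToRecover(-50, -25)` returns 25, so a user blocked at -50 would wait roughly a day before posting again.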

Spam Detection

The filter checks for several types of spam behavior:

  1. Rate Limiting

    • Monitors post frequency per user
    • Different limits for different event types
    • Limits scale with reputation
  2. Content Analysis

    • Checks for duplicate content across users
    • Detects suspiciously fast replies
    • Identifies bot-like behavior patterns
  3. New Account Protection

    • Stricter limits for new accounts
    • Longer waiting periods for replies
    • Limited initial posting rate
  4. Bot Detection

    • Identifies automated posting patterns
    • Checks for relay URL spam
    • Monitors reply timing patterns
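
As one example of these checks, fast-reply detection could compare an event's `created_at` with that of the event it references through an `e` tag (the standard Nostr event fields). The `seenEvents` cache and the function names below are hypothetical, and the sketch only covers parents the filter has already seen.

```javascript
const fastReplyThreshold = 30; // seconds

// event id -> created_at of events already seen (hypothetical cache)
const seenEvents = new Map();

function remember(event) {
  seenEvents.set(event.id, event.created_at);
}

// True if the event references a known event (via an "e" tag) that was
// created less than fastReplyThreshold seconds before it.
function isSuspiciouslyFastReply(event) {
  return event.tags
    .filter((tag) => tag[0] === 'e' && seenEvents.has(tag[1]))
    .some((tag) => event.created_at - seenEvents.get(tag[1]) < fastReplyThreshold);
}
```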

Progressive Enforcement

The system becomes progressively stricter with problematic behavior:

  1. Initial warnings and small penalties
  2. Increased penalties for continued violations
  3. Stricter rate limits for low-reputation users
  4. Eventual blocking for persistent offenders
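
Step 3 could be implemented by scaling a base limit with the user's reputation. The multipliers in this sketch are illustrative only, since the actual scaling is not stated here, and both function names are hypothetical.

```javascript
// Illustrative multipliers only; the real scaling may differ.
function limitMultiplier(score) {
  if (score >= 0) return 1;
  if (score >= -25) return 0.5;
  return 0.25;
}

// Scale a base rate limit by reputation, keeping at least
// one event per window.
function effectiveLimit(baseLimit, score) {
  return Math.max(1, Math.floor(baseLimit * limitMultiplier(score)));
}
```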

Bypass Rules

Certain event kinds can bypass specific checks:

allowedKinds: [3, 5, 10001, 10002, 30311],  // Bypass content checks
bypassAllChecksKinds: [38383]                // Bypass all checks
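
A minimal sketch of how these two lists might gate the checks. The `checksFor` function and its return shape are hypothetical. It assumes kinds in `allowedKinds` still go through rate limiting, which fits kind 3 also appearing in `kindSpecificLimits`.

```javascript
const allowedKinds = [3, 5, 10001, 10002, 30311];
const bypassAllChecksKinds = [38383];

// Decide which groups of checks apply to an event kind.
function checksFor(kind) {
  if (bypassAllChecksKinds.includes(kind)) {
    return { rateLimits: false, content: false };
  }
  return { rateLimits: true, content: !allowedKinds.includes(kind) };
}
```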

Monitoring

The filter provides detailed logging:

[2024-12-19T10:15:30.123Z] Event abc123 from pubkey xyz789 (reputation: 85.50)
[2024-12-19T10:15:30.124Z] Rate limit exceeded for pubkey xyz789
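
Lines in this format can be produced with an ISO 8601 timestamp prefix, as in the sketch below. The helper names are hypothetical. In a strfry plugin, stdout is reserved for responses to the relay, so log lines like these would normally be written to stderr, for example with `console.error`.

```javascript
// Prefix a message with an ISO 8601 timestamp in square brackets.
function logLine(message, date = new Date()) {
  return `[${date.toISOString()}] ${message}`;
}

// Describe an incoming event with the sender's reputation to two decimals.
function describeEvent(event, reputation) {
  return `Event ${event.id} from pubkey ${event.pubkey} (reputation: ${reputation.toFixed(2)})`;
}
```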

Performance Considerations

  • Memory usage is managed through periodic cleanup
  • Old events and stats are automatically purged
  • Reputation data is kept until a user has been inactive for 24 hours, then discarded
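
The cleanup described above could look roughly like this sketch, which removes entries that have been idle for more than 24 hours. The `reputations` store, its entry shape, and the hourly schedule in the final comment are assumptions.

```javascript
const INACTIVITY_MS = 24 * 60 * 60 * 1000;

// pubkey -> { score, lastSeen } (hypothetical store; lastSeen in milliseconds)
const reputations = new Map();

// Remove entries not seen for more than 24 hours.
// Returns how many entries were removed.
function purgeInactive(now = Date.now()) {
  let removed = 0;
  for (const [pubkey, entry] of reputations) {
    if (now - entry.lastSeen > INACTIVITY_MS) {
      reputations.delete(pubkey);
      removed += 1;
    }
  }
  return removed;
}

// In a long-running plugin this could be scheduled, for example:
// setInterval(() => purgeInactive(), 60 * 60 * 1000).unref();
```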

Contributing

Contributions are welcome!

License

MIT License - See LICENSE file for details