A technique for filtering spam

This technique uses bogofilter on a mail server to classify email as spam or ham, and uses spamassassin plus manual classification on a client computer to train bogofilter. It assumes that the administrator uses the client computer to review spamassassin's decisions and to mark as spam any messages it missed. Any unread email in spam-samples was placed there by automatic classification (spamassassin or bogofilter); read email was either classified by hand or manually reviewed. The technique works only to the extent that the administrator's spam resembles each user's spam.
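
The read/unread convention can be checked from the shell. The sketch below is an illustration rather than part of the original setup: it assumes spam-samples is an mbox file in which the mail reader records a read message with an R in its Status: header (as mutt does for mbox folders), and the function name count_unreviewed is hypothetical.

```shell
# Hypothetical helper: count the messages in an mbox file that have not been
# marked as read, i.e. whose header block has no Status: line containing R.
count_unreviewed() {
    awk '
        # Close the current message and tally it if it was never marked read.
        function finish() { if (seen && !is_read) unread++ }

        # A "From " line outside a header block starts a new message.
        /^From / && !in_headers { finish(); seen = 1; is_read = 0; in_headers = 1; next }

        # The first blank line ends the header block.
        in_headers && /^$/ { in_headers = 0; next }

        # Assumption: an R in the Status: header means the message was read.
        in_headers && /^Status:.*R/ { is_read = 1 }

        END { finish(); print unread + 0 }
    ' "$1"
}
```

Running `count_unreviewed ~/mail/spam-samples` would then print how many automatically classified messages still await review.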

  1. Install and configure Postfix and bogofilter on your mail server.
  2. Use spamassassin and mutt on a client machine to continuously train bogofilter:
    1. Configure spamassassin:
      # required_hits is the older, deprecated name for this setting.
      required_score 3.5
      report_safe 0
      
    2. Configure procmail to file mail already flagged by bogofilter in the folder spam-samples, to filter the remaining mail through spamassassin, and to move anything spamassassin classifies as spam to spam-samples as well:
      :0:
      * ^X-Bogosity: (Spam|Yes)
      $MAILDIR/spam-samples
      
      # Process with spamassassin unless too big.
      :0fw: spamassassin.lock
      * < 1048576
      | spamassassin

      # Dump spamassassin spam in spam-samples.
      :0:
      * ^X-Spam-Status: Yes
      $MAILDIR/spam-samples
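
To see which of the two delivery conditions a saved message would satisfy, the same header patterns can be tried by hand. This is a rough sketch under stated assumptions, not a substitute for running procmail: the function name matching_recipe is hypothetical, the message is assumed to be a single message in its own file, and the spamassassin header is only found if the message has already passed through the filter.

```shell
# Hypothetical helper: report which recipe condition a saved message matches.
matching_recipe() {
    # procmail conditions examine only the header block by default,
    # so stop reading at the first blank line.
    headers=$(sed '/^$/q' "$1")

    # procmail matches case-insensitively by default, hence grep -i.
    if printf '%s\n' "$headers" | grep -Eiq '^X-Bogosity: (Spam|Yes)'; then
        echo bogofilter
    elif printf '%s\n' "$headers" | grep -iq '^X-Spam-Status: Yes'; then
        echo spamassassin
    else
        echo none
    fi
}
```

For example, `matching_recipe message.txt` prints bogofilter, spamassassin, or none; the first two both lead to delivery in spam-samples.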
      
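    3. Add a cron job that rebuilds wordlist.db from spam-samples and ham-samples using bogofilter and installs the result on the mail server (cron requires the whole entry on one line; it is wrapped here for readability):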
      0	0	*	*	*	rm -f ~/mail/wordlist.db
      	&& grep -av '\(^X-Spam[^ ]*:\|^X-Bogosity:\)' ~/mail/spam-samples | bogofilter -d ~/mail -M -s
      	&& grep -av '\(^X-Spam[^ ]*:\|^X-Bogosity:\)' ~/mail/ham-samples  | bogofilter -d ~/mail -M -n
      	&& scp ~/mail/wordlist.db root@example.com:/etc/bogofilter/
      
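    4. Configure mutt to highlight spam in the index and to provide hotkeys for manually classifying email as spam or ham (S saves a message to spam-samples and marks it read, H copies it to ham-samples, and B rebuilds and pushes the word list):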
      color index black brightred '~h "X-Spam-Flag: YES"'    # Spamassassin.
      color index black brightyellow '~h "X-Bogosity: Spam"' # Bogofilter.
      
      macro index S "\
      <enter-command>set resolve=no<enter>\
      <clear-flag>N\
      <enter-command>set resolve=yes<enter>\
      <save-message>=spam-samples<enter><enter>" "Save to spam-samples"
      
      macro pager S "\
      <enter-command>set resolve=no<enter>\
      <clear-flag>N\
      <enter-command>set resolve=yes<enter>\
      <save-message>=spam-samples<enter><enter>" "Save to spam-samples"
      
      macro index H "\
      <enter-command>set my_resolve=\$resolve resolve=no<enter>\
      <copy-message>=ham-samples<enter><enter>\
      <enter-command>set resolve=\$my_resolve<enter>" "Copy to ham-samples"
      
      macro pager H "\
      <enter-command>set my_resolve=\$resolve resolve=no<enter>\
      <copy-message>=ham-samples<enter><enter>\
      <enter-command>set resolve=\$my_resolve<enter>" "Copy to ham-samples"
      
      macro index B "<shell-escape>rm -f ~/mail/wordlist.db\
      && grep -av '\\(\^X-Spam[^ ]*:\\|\^X-Bogosity:\\)' ~/mail/spam-samples | bogofilter -d ~/mail -M -s\
      && grep -av '\\(\^X-Spam[^ ]*:\\|\^X-Bogosity:\\)' ~/mail/ham-samples  | bogofilter -d ~/mail -M -n\
      && scp ~/mail/wordlist.db root@example.com:/etc/bogofilter/<enter>" "Push bogofilter samples"
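
The cron entry and the B macro run the same pipeline, so it may be easier to maintain as one script that both call. The following is a minimal sketch, not the author's setup: the function names, the MAIL_DIR and WORDLIST_DEST variables, and the script itself are hypothetical, while the commands, options, and paths are the ones used above.

```shell
#!/bin/sh
# Sketch only: function and variable names are illustrative assumptions.
MAIL_DIR="${MAIL_DIR:-$HOME/mail}"
WORDLIST_DEST="${WORDLIST_DEST:-root@example.com:/etc/bogofilter/}"

# Print an mbox file minus every line that starts with an X-Spam-* or
# X-Bogosity header name, so bogofilter does not learn from earlier verdicts.
strip_filter_headers() {
    grep -av '\(^X-Spam[^ ]*:\|^X-Bogosity:\)' "$1"
}

# Rebuild wordlist.db from scratch: register spam-samples as spam (-s)
# and ham-samples as ham (-n), reading both as mbox files (-M).
retrain() {
    rm -f "$MAIL_DIR/wordlist.db" &&
    strip_filter_headers "$MAIL_DIR/spam-samples" | bogofilter -d "$MAIL_DIR" -M -s &&
    strip_filter_headers "$MAIL_DIR/ham-samples" | bogofilter -d "$MAIL_DIR" -M -n
}

# Copy the rebuilt word list to the mail server.
push_wordlist() {
    scp "$MAIL_DIR/wordlist.db" "$WORDLIST_DEST"
}
```

Ending the script with `retrain && push_wordlist` would let both the cron entry and the mutt macro shrink to a single call of that script, so the grep pattern needs escaping in only one place.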
      
Email: www@flyn.org — ✉ 6110 Campfire Court; Columbia, Maryland 21045; USA