
pydspam

Bayesian Message Filtering for Python
or
Integrating Python with DSPAM

by Stuart D. Gathman
This web page is written by Stuart D. Gathman and sponsored by Business Management Systems, Inc.
Last updated Oct 04, 2011

This project provides Python support for fast, sophisticated Bayesian message filtering. It is based on the excellent DSPAM project provided by Jonathan A. Zdziarski. I have moved RPMs for dspam to a separate project. Neither BMS nor Stuart Gathman is affiliated with Jonathan Zdziarski or Network Dweebs, except as enthusiastic users of their free product. DSPAM was chosen because it provides a library with a C API in addition to a complete MDA-based spam filtering application. Python applications use the C API through an extension module, which is faster than a pure Python Bayesian filter.

What is DSPAM? Here is an excerpt from the DSPAM project README:

DSPAM is an open-source, freely available anti-spam solution designed to combat unsolicited commercial email using Bayes' theorem of combined probabilities. The result is an administratively maintenance-free system capable of learning each user's email behaviors with very few false positives.
DSPAM can be implemented in one of two ways:

The DSPAM mailer-agent provides server-side spam filtering, quarantine box, and a mechanism for forwarding spams into the system to be automatically analyzed.
Developers may link their projects to the dspam core engine (libdspam) in accordance with the GPL license agreement. This enables developers to incorporate libdspam as a "drop-in" for instant spam filtering within their applications - such as mail clients, other anti-spam tools, and so on.
Many of the ideas incorporated into this agent were contributed by Paul Graham's excellent white paper on combatting SPAM. Many new approaches have also been implemented by DSPAM.
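To make "Bayes' theorem of combined probabilities" concrete, here is a minimal sketch of the token scoring described in Paul Graham's "A Plan for Spam": each token gets a spam probability from its spam and ham counts, and the most significant tokens are combined assuming independence. This is Graham's formulation written for illustration, not libdspam's own implementation or the pydspam API; the function names are hypothetical.

```python
from math import prod

def token_probability(spam_count, ham_count, n_spam, n_ham):
    """Graham-style estimate that a message containing a token is spam."""
    g = 2 * ham_count  # ham occurrences are doubled to bias against false positives
    b = spam_count
    if g + b < 5:
        return 0.4  # too rare to judge: treat like a never-seen token
    g_freq = min(1.0, g / n_ham) if n_ham else 0.0
    b_freq = min(1.0, b / n_spam) if n_spam else 0.0
    if g_freq + b_freq == 0:
        return 0.4
    # Clamp so that no single token can decide the result on its own.
    return max(0.01, min(0.99, b_freq / (g_freq + b_freq)))

def combined_probability(probs, n_interesting=15):
    """Combine per-token probabilities with Bayes' rule, assuming independence."""
    # Keep only the tokens whose probabilities are farthest from neutral (0.5).
    top = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n_interesting]
    spam = prod(top)
    ham = prod(1 - p for p in top)
    return spam / (spam + ham)
```

The doubled ham counts, the 0.01 to 0.99 clamp, and the 15 "interesting" tokens are Graham's tuning choices; DSPAM implements its own variations on these ideas.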

DSPAM RPM support for Python

In dspam-2.6 RPMs, the pydspam project is included as the dspam-python sub-package, which is built from the pydspam source for pydspam versions prior to 1.1.5. If you don't wish to build the Python package, set the build_python macro to 0 at the top of the RPM spec file in the source RPM.

Beginning with pydspam-1.1.5, pydspam is its own RPM, which obsoletes dspam-python. The dspam-python or pydspam binary RPM provides a Python module which wraps the dspam core engine (libdspam). Some of the dspam command line tools are reimplemented in Python to illustrate use of the library; the RPM installs them as documentation. A new tool, pydspam_anal.py, shows the contribution each token of a message makes to the total DSPAM score.
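One way to express "the contribution each token makes to the total score" is in log-odds: with Graham-style combining, the total score is the logistic function of the summed per-token log-odds, log(p / (1 - p)). The sketch below produces that kind of per-token report from a plain dict of token probabilities. It is a hypothetical illustration, not the code of pydspam_anal.py, and it does not call the pydspam module.

```python
import math

def contributions(token_probs, n=15):
    """Rank tokens by how strongly they push the score toward spam or ham.

    Each token's contribution is its log-odds, log(p / (1 - p)): positive
    values push toward spam, negative values toward ham.
    """
    eps = 1e-6  # keep p strictly inside (0, 1) so the logarithm is defined
    rows = []
    for token, p in token_probs.items():
        p = min(max(p, eps), 1 - eps)
        rows.append((token, p, math.log(p / (1 - p))))
    rows.sort(key=lambda row: abs(row[2]), reverse=True)
    return rows[:n]

def total_score(rows):
    """Logistic of the summed log-odds, equal to prod(p) / (prod(p) + prod(1 - p))."""
    return 1 / (1 + math.exp(-sum(log_odds for _, _, log_odds in rows)))

def report(token_probs, n=15):
    """Print one line per significant token, then the combined score."""
    rows = contributions(token_probs, n)
    for token, p, log_odds in rows:
        print(f"{token:<20} p={p:.3f} contribution={log_odds:+.2f}")
    print(f"{'total':<20} score={total_score(rows):.3f}")
```

For example, report({"viagra": 0.99, "meeting": 0.05, "the": 0.5}) lists "viagra" first and "the" last, since a neutral token contributes nothing.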

In dspam-2.8, pydspam has its own RPM.

Header Triage with Dspam and Python Milter

For a really powerful mail filtering system, combine the DSPAM Python module with sendmail and Python Milter. For instance, here is a simple change to milter-0.5.5 I am testing: Patch to bms.py from milter-0.5.5.

The dictionary is the one maintained by the dspam delivery agent installed with the dspam package. Scanning the headers in the milter allows us to REJECT common spams without a lot of processing.
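A minimal sketch of header-only triage follows. It assumes a hypothetical tokenizer that tags each header word with its header name (for example "subject*viagra"), looks the tokens up in a plain dict of spam probabilities standing in for the dspam dictionary, and rejects above a hypothetical 0.9 threshold. In a milter, the same decision would be made once all headers have arrived, before the body is transferred; the milter wiring and the pydspam calls are deliberately left out.

```python
import re
from email import message_from_string
from math import prod

# Hypothetical tokenizer: runs of letters, digits and a few punctuation marks.
WORD_RE = re.compile(r"[A-Za-z0-9$'!-]+")

def header_tokens(raw_message):
    """Return tokens such as 'subject*viagra' taken from the headers only."""
    msg = message_from_string(raw_message)
    tokens = set()
    for name, value in msg.items():
        for word in WORD_RE.findall(str(value)):
            tokens.add(f"{name.lower()}*{word.lower()}")
    return tokens

def header_score(raw_message, token_probs, unknown=0.4, n_interesting=15):
    """Combine the most significant header-token probabilities."""
    probs = [token_probs.get(tok, unknown) for tok in header_tokens(raw_message)]
    top = sorted(probs, key=lambda p: abs(p - 0.5), reverse=True)[:n_interesting]
    spam, ham = prod(top), prod(1 - p for p in top)
    return spam / (spam + ham)

def triage(raw_message, token_probs, threshold=0.9):
    """Reject obvious spam at header time; pass everything else on to the MDA."""
    return "REJECT" if header_score(raw_message, token_probs) >= threshold else "ACCEPT"
```

Because only headers are scanned, a clear spam can be refused cheaply, while borderline mail still gets the full treatment from the dspam delivery agent.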

To show just how bad the spam problem is, here are statistics for our domain with just 6 users. The addresses of two users (including me) are published on the web with HTML encoding. I also use my real email address when posting to newsgroups. Because my address is accessible, I receive welcome email from fellow techies all over the world.

Statistics for Jul 15
1139	Messages from known spamming domains refused by sendmail.
160	Messages REJECTED by milter because of banned keywords like 'viagra'.
169	Messages REJECTED by milter because of high Dspam scores for headers.
261	Messages quarantined by Dspam mail delivery agent.
40	Actual email received for 6 users.
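The share of unwanted mail in these figures takes a little arithmetic to see. This sketch simply totals the table; it treats every refused, rejected, or quarantined message as stopped spam and ignores any false positives among them.

```python
# Counts for Jul 15, copied from the statistics above.
stopped = {
    "refused by sendmail (known spamming domains)": 1139,
    "rejected by milter (banned keywords)": 160,
    "rejected by milter (Dspam header score)": 169,
    "quarantined by the Dspam MDA": 261,
}
legitimate = 40
users = 6

spam_total = sum(stopped.values())
total = spam_total + legitimate
spam_share = spam_total / total

print(f"{spam_total} of {total} messages were stopped ({spam_share:.1%})")
print(f"per user: {spam_total / users:.0f} stopped vs {legitimate / users:.1f} delivered")
```

It prints "1729 of 1769 messages were stopped (97.7%)" and "per user: 288 stopped vs 6.7 delivered".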

We do not use a black hole list for known spamming IPs or domains, because some of our customers use blacklisted ISPs that are the only broadband available in their area. Black hole lists tend to blacklist entire ISPs, including innocent customers who have no other choice (other than dialup) for connectivity. With a little Python programming to collect data, DSPAM will allow us to automate building the list of banned IPs and domains.
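A minimal sketch of that data collection, assuming a hypothetical log of (sending domain, is_spam) verdicts taken from DSPAM's classification of each message. Both thresholds are hypothetical: a domain is banned only after enough traffic to judge, and only when nearly all of its mail is spam, so that a shared ISP with legitimate customers stays off the list.

```python
from collections import Counter

def banned_domains(verdicts, min_messages=20, min_spam_ratio=0.95):
    """Return sending domains whose mail is overwhelmingly spam.

    verdicts is an iterable of (domain, is_spam) pairs.
    """
    total = Counter()
    spam = Counter()
    for domain, is_spam in verdicts:
        total[domain] += 1
        if is_spam:
            spam[domain] += 1
    return sorted(
        domain
        for domain, count in total.items()
        # Require enough volume to judge, and a spam ratio high enough that
        # legitimate customers of the same provider keep the domain off the list.
        if count >= min_messages and spam[domain] / count >= min_spam_ratio
    )
```

For example, a domain with 25 messages that are all spam is listed, while an ISP with 20 spams and 5 legitimate messages (an 80% ratio) is not.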

The header triage feature will be in milter-0.5.6. I envision a complete milter based implementation of dspam which appoints selected email destinations as 'moderators'. The MDA approach currently used by dspam requires all users to diligently classify their email to train the filter. In the new approach, moderators will do this work, and the resulting dspam dictionary used to filter mail for other users in their group.
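The moderator idea can be sketched as a shared token dictionary per group that accepts training only from that group's moderators, while its probabilities are used to filter everyone's mail. The class, its methods, and the simple frequency-ratio estimate below are hypothetical illustrations of the planned design, not an existing pydspam or milter interface.

```python
from collections import Counter

class GroupDictionary:
    """Hypothetical shared dictionary for one group, trained only by its moderators."""

    def __init__(self, moderators):
        self.moderators = set(moderators)
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.n_spam = 0
        self.n_ham = 0

    def train(self, reporter, tokens, is_spam):
        """Record a verdict if the reporter is a moderator; return whether it was used."""
        if reporter not in self.moderators:
            return False  # ordinary users don't have to classify their mail
        if is_spam:
            self.spam_tokens.update(set(tokens))
            self.n_spam += 1
        else:
            self.ham_tokens.update(set(tokens))
            self.n_ham += 1
        return True

    def token_probability(self, token, unknown=0.4):
        """Spam probability for a token, shared by every user in the group."""
        b = self.spam_tokens[token] / self.n_spam if self.n_spam else 0.0
        g = self.ham_tokens[token] / self.n_ham if self.n_ham else 0.0
        if b + g == 0:
            return unknown
        return max(0.01, min(0.99, b / (b + g)))
```

Classifications from a non-moderator are simply ignored, so only the moderators' judgement shapes the dictionary their group is filtered with.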

Downloads

Source and binaries have been moved to the Pymilter project on Sourceforge. Older binaries can still be found below.

DSPAM RPMs

RPMs for dspam have been moved to the libdspam project. I am working on a pydspam RPM for dspam-2.10. Currently, pydspam RPMs require dspam-2.6.5.2.

Activating the DSPAM CGI script

The RPM installs the CGI interface in the /var/www/cgi-bin/dspam directory. A wrapper script is installed as /var/www/cgi-bin/pydspam.cgi. The wrapper script runs the DSPAM CGI interface as the dspam user - which is also a member of the mail group.

To enable the CGI interface, you need to add an authorization entry to /etc/httpd/conf/httpd.conf. For example,

    ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"

    #
    # "/var/www/cgi-bin" should be changed to whatever your ScriptAliased
    # CGI directory exists, if you have that configured.
    #
    <Directory "/var/www/cgi-bin">
	AuthName Dspam
	AuthType Basic
	AuthUserFile /etc/httpd/conf/passwd
	AuthGroupFile /etc/httpd/conf/group
	Require group dspam
        AllowOverride None
        Options None FollowSymLinks
        Order allow,deny
        Allow from all
    </Directory>

You must also modify the script at /var/www/cgi-bin/dspam/dspamcgi.py to change the DOMAIN configuration to your domain at a minimum.
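The AuthGroupFile named in the configuration above is a plain text file with one line per group: the group name, a colon, and its members separated by spaces. The password file itself is normally maintained with Apache's htpasswd utility. As a small illustration, this hypothetical helper renders the group file contents, for example for a dspam group whose members may use the CGI interface.

```python
def group_file_text(groups):
    """Render Apache AuthGroupFile contents: one 'group: user1 user2' line per group."""
    lines = []
    for group, users in groups.items():
        for name in [group, *users]:
            # Whitespace and colons would corrupt the line format.
            if not name or ":" in name or any(c.isspace() for c in name):
                raise ValueError(f"invalid group or user name: {name!r}")
        lines.append(f"{group}: {' '.join(users)}\n")
    return "".join(lines)
```

Writing group_file_text({"dspam": ["alice", "bob"]}) to /etc/httpd/conf/group would let users alice and bob, once they have passwords in /etc/httpd/conf/passwd, satisfy "Require group dspam".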

pydspam Binary RPM

Binary RPMs are compiled for python2.3. Go to the SourceForge site for pydspam-1.1.8 compiled for python2.4.

pydspam-1.1.7-1.i386.rpm Binary RPM for RH7.2.

pydspam-1.1.7-1.i386.rpm Binary RPM for RH7.3.

pydspam-1.1.6-1.i386.rpm Binary RPM for RH7.2.

pydspam-1.1.6-1.i386.rpm Binary RPM for RH7.3.

pydspam-1.1.5-1.i386.rpm Binary RPM for RH7.2.

pydspam Source RPM

pydspam-1.1.7-1.src.rpm Source RPM for RH7.x.

pydspam-1.1.6-1.src.rpm Source RPM for RH7.x.

pydspam-1.1.5-1.src.rpm Source RPM for RH7.x.

Sources

pydspam-1.1.7.tar.gz Python interface to libdspam and some dspam utilities in python.

pydspam-1.1.6.tar.gz Python interface to libdspam and some dspam utilities in python.

pydspam-1.1.5.tar.gz Python interface to libdspam and some dspam utilities in python.

pydspam-1.1.4.tar.gz Python interface to libdspam and some dspam utilities in python.

pydspam-1.1.3.tar.gz Python interface to libdspam and some dspam utilities in python.

pydspam-1.1.2.tar.gz Python interface to libdspam and some dspam utilities in python.

pydspam-1.1.1.tar.gz Python interface to libdspam and some dspam utilities in python.

pydspam-1.0.tar.gz Python interface to libdspam and some dspam utilities in python.
