
Friday, August 13, 2010

E-mail relay: emailrelay

Imagine you work in a company with dozens of employees who e-mail each other, e-mail clients, and so on.
  1. You'd like to sample correspondence from time to time, which you achieve by storing the messages and attachments in a database (NoSQL or SQL-based);
  2. ... but you don't want the hassle of setting up a full MTA (mail transfer agent) with full-blown policies, such as postfix;
  3. ... you also want outgoing e-mails scanned for spam and viruses, to avoid getting blacklisted by other ISPs (in case there's a malware infection somewhere in the local network);
  4. ... you'd like to get jiggy with new open protocols like xmpp and transfer internal e-mails over xmpp as well, attachments included;
  5. ... you want to recognize internal e-mails and reroute them, bypassing the ISP's smtp server and sending them to the Exchange server or what have you;
  6. ... you want to mount the database which stores the e-mails and apply some clustering algorithm to it, gathering data while e-mails are in transit... in other words, online learning;
  7. ... you also have VOIP over SIP, which you would like to be stored with the e-mails.

Relaying e-mail


How would you do all that?

Personally, as a programmer, my initial reaction was: I'll roll it myself (using erlang)! I don't have the time to actually do it, of course. The most I can do in the time I have is write some bash scripts to glue existing projects together.

I googled a bit and found a nice project called emailrelay. This wee project does what it advertises. Even better, it's licensed under GPL 3.0.

Why reinvent the wheel when all the components are already here? Spamassassin, clamav, cron, ejabberd, etc.

So what does it do exactly? It can relay e-mails directly, or do it in two steps: collect and relay. Why is this relevant? Because the two-step mode lets emailrelay hand the e-mails to third-party software that manipulates them in transit.

For instance, during the collect phase we can defer the decision whether to relay a message to spamassassin and clamav. During the relay phase we can compare the recipient address to the local addresses and, for internal mail, bypass the ISP's MTA by delivering directly to the recipient (over some protocol, e.g. smtp, xmpp, what have you).
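The recipient check described above might look roughly like the sketch below. It is only an illustration: LOCAL_DOMAINS, is_local_recipient and route_for are made-up names, the domain list is a placeholder, and how the recipient address is obtained from a spooled message is left out.

```shell
#!/bin/sh
# Hypothetical list of domains that count as "internal" (assumption).
LOCAL_DOMAINS="example.com intranet.example.com"

# Succeeds (exit status 0) if the address in $1 belongs to a local domain.
is_local_recipient() {
  case "$1" in
    *@*) ;;              # looks like an address, carry on
    *) return 1 ;;       # no "@" at all: never treat it as local
  esac
  # Keep only the part after the last "@" and lower-case it.
  domain=$(printf '%s' "${1##*@}" | tr 'A-Z' 'a-z')
  for d in $LOCAL_DOMAINS; do
    if [ "$domain" = "$d" ]; then
      return 0
    fi
  done
  return 1
}

# Prints "local" or "isp" so the calling script can pick a delivery path.
route_for() {
  if is_local_recipient "$1"; then
    echo "local"
  else
    echo "isp"
  fi
}
```

A relay-phase script could then call `route_for "$recipient"` and hand the message either to the internal server or to the ISP's smtp server.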

Another example: imagine you want to send yourself a TODO item via emailrelay and you want this item synchronized with your existing TODO lists. All you need to do is grab some API/library for your favourite scripting language and voila, you're almost done. You still need to write the glue code though.
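As a hint of what that glue code could start from, here is a small sketch that pulls the Subject header out of a message file, to be used as the TODO title. The function name subject_of is made up, and the actual synchronization with a TODO service is not shown, since it depends entirely on the API you pick.

```shell
#!/bin/sh
# Print the Subject header of the message file given as $1.
# Folded (continued) header lines are joined, and reading stops at the
# blank line that separates the headers from the body.
subject_of() {
  awk '
    { sub(/\r$/, "") }                     # tolerate CRLF line endings
    /^$/ { exit }                          # end of headers
    tolower($0) ~ /^subject:/ {            # header names are case-insensitive
      sub(/^[^:]*:[ \t]*/, ""); s = $0; on = 1; next
    }
    on && /^[ \t]/ {                       # continuation of the Subject line
      sub(/^[ \t]+/, " "); s = s $0; next
    }
    { on = 0 }                             # any other header ends the Subject
    END { if (s != "") print s }
  ' "$1"
}
```

Inside a filter script this would be called as `subject_of "$1"`, with the result passed on to whatever TODO client you use.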

You can pick your favourite programming language (e.g. bash, java, python, ruby...) and let emailrelay pass a parameter to your application or script (is there a difference?) and compose whatever behaviour or policy you want during the collect or relay phase.


Some benefits of this approach in contrast to a traditional full-blown MTA:
  • You can test each piece of software at any level;
  • You can adapt to any network topology depending on where the bottlenecks are and how your software pieces operate best (e.g. serial or parallel, on one or multiple computers, multiple processes or a single process, VM guest or real machine...);
  • You can mix and match pieces of technology to get what you want. See this example of spamassassin.

The drawbacks of this approach:
  • You have to write your own (standard) policies and (standard) behaviour;
  • No monitoring of crashing components (supervisor a la erlang, anyone?) between the cooperating processes.

Non-monolithic composable software. Ain't it grand?
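The missing supervision can be approximated, very crudely, with a restart loop in the shell. The sketch below is nowhere near an erlang supervisor: supervise and RESTART_DELAY are made-up names, and it only watches one foreground process at a time.

```shell
#!/bin/sh
# supervise MAX_RESTARTS CMD [ARGS...]
# Runs CMD in the foreground and restarts it whenever it exits with a
# non-zero status, giving up after MAX_RESTARTS restarts.
# Returns 0 if CMD eventually exits cleanly, else CMD's last exit status.
supervise() {
  max=$1
  shift
  restarts=0
  while :; do
    status=0
    "$@" || status=$?
    if [ "$status" -eq 0 ]; then
      return 0
    fi
    if [ "$restarts" -ge "$max" ]; then
      echo "giving up on '$1' after $restarts restarts" >&2
      return "$status"
    fi
    restarts=$((restarts + 1))
    echo "'$1' exited with status $status; restart #$restarts" >&2
    sleep "${RESTART_DELAY:-1}"   # pause between restarts (seconds)
  done
}
```

For this to work the supervised program has to stay in the foreground; for emailrelay that would mean running it with --no-daemon, e.g. `supervise 5 emailrelay --no-daemon ...` with the rest of your options filled in.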

We use couchdb to store the e-mails and attachments, and to replicate them to other nodes.

Here's some code to get you started.

Daemon initialization code:

#!/bin/bash
set -e -x

ROOT=/home/administrator/email
/usr/local/etc/init.d/couchdb restart

# stop any previous instances; "|| true" keeps a stale pid file from aborting the script under set -e
if [ -e "$ROOT/scripts/pid-emailrelay-server.pid" ]; then
  kill "$(cat "$ROOT/scripts/pid-emailrelay-server.pid")" || true
fi
if [ -e "$ROOT/scripts/pid-emailrelay-client.pid" ]; then
  kill "$(cat "$ROOT/scripts/pid-emailrelay-client.pid")" || true
fi

# server: accepts mail on port 2022, runs the filter and stores messages in the spool directory
emailrelay --as-server --port 2022 --spool-dir "$ROOT/spool" \
  --filter "$ROOT/scripts/emailrelay-server-filter.sh" --filter-timeout 60 \
  --pid-file "$ROOT/scripts/pid-emailrelay-server.pid" --verbose

# client: no smtp listener; polls the spool every 60 seconds and forwards to the ISP's smtp server
emailrelay --as-server --no-smtp --forward --forward-to smtp.somewhere.org:smtp \
  --spool-dir "$ROOT/spool" --poll 60 \
  --pid-file "$ROOT/scripts/pid-emailrelay-client.pid" --verbose

## shorthand options, for reference:
## --as-server (-d) = --log --close-stderr
##                  = -l    -e
## --as-client (-q) = --log --no-syslog --no-daemon --dont-serve --forward --forward-to <host:port>
##                  = -l    -n          -t          -x           -f        -o           <host:port>

Filter script:

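#!/bin/sh

set -e -x

ROOT=/home/administrator/email
# $1 is the path of the message file handed over by emailrelay

# store the message in couchdb; couch.py is our couchdb python client script,
# its first parameter ("email") is the couchdb db
python "$ROOT/scripts/couch.py" email "$1"

# virus scan; with set -e a non-zero exit from clamdscan (virus found or scan error)
# aborts the filter here, before the "exit 0" at the end
clamdscan --no-summary --move="$ROOT/infected" - < "$1"

# let spamassassin rewrite the message, then replace the original with the result
spamassassin "$1" > "$1.tmp"
mv "$1.tmp" "$1"

exit 0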
