Another grumpy sysadmin


Fri, 19 Feb 2021

Spam Filtering on Postfix/Dovecot, pt 1.

I've been starting to get an annoying amount of spam here (with a big thank you to UK Buy-to-Lets and UK Property Offers, you utter spam artists), so it's time to set up some spam filtering. In particular, SpamAssassin.

I've used it on other servers before, but they've been running the Exim mail server. The nice thing about Exim is that it has a SpamAssassin plugin: you install it, set a single configuration option, and everything automagically Just Works™.

This system is running Postfix, which does not have any special integration with SpamAssassin. However, it does have the ability to run incoming emails through external filters in a couple of different ways, and SpamAssassin can be used as such a filter, so we can work with it that way. Fortunately there are a few tutorials on how to set this up out there, including the SpamAssassin Debian Wiki page.

The most straightforward way to integrate SpamAssassin into Postfix seems to be setting it up as a "content filter".

spamc and spamd.

Looking at the Debian wiki page, the recommended setup is to run the spamd daemon and have Postfix call spamc, which connects to the daemon to check for spamminess.

I've had a look at the spamd man page and the README, and the reason for the spamc/spamd setup is performance. SpamAssassin takes a long time to start up compared to the time it takes to classify an email, so by running a daemon which does all the startup ahead of time, the smaller client can connect to it and ask it to classify an email much more quickly.

However, because there ain't no such thing as a free lunch, this comes with a couple of costs. The first is that the daemon takes up a certain amount of memory all the time, not just when emails are being classified. The second is that there are some small but non-zero security concerns involved (see the spamd README linked above for more info).

Further, email is not an instant messaging system. Email is a system designed to tolerate delays and bad connections, with mails sometimes being held on one system, waiting for a time when they can be forwarded to another. Therefore, email delivery is not a time-sensitive operation.

As a result of the above, because I don't receive that much email on this system, and because of a general desire on my part to only run services that are strictly necessary, I would prefer to accept a long startup time in exchange for not running the spamd service all the time. I therefore want to configure Postfix to call spamassassin directly instead of spamc.

Running spamassassin directly.

Except. Looking at the spamassassin man page, it says:

Please note that SpamAssassin is not designed to scan large messages. Don't feed messages larger than about 500 KB to SpamAssassin, as this will consume a huge amount of memory.

Hmmm... how do I limit the size of the messages that spamassassin processes then? Looking through all the other documentation I can find... I can't. SpamAssassin apparently has no option to say "Do not scan messages above a given maximum size", and Postfix apparently has no way to send messages of different sizes to different content filters. How does the recommended configuration handle this? Well, the spamc man page shows that spamc has a -s/--max-size option to limit the maximum message size to pass to spamd. So spamc is the way to limit the maximum size of a message that is passed to spamassassin.

OK, does spamc have to access spamassassin through spamd? Is there any way to make it run spamassassin directly, if all the other criteria are met?

...No. (At least, not that I've found.)

So, despite the spamd README saying:

Then, configure your system to run spamd in the background, and where your mailer invokes 'spamassassin' instead invoke 'spamc'.

it appears that the original default setup, "where your mailer invokes 'spamassassin'", is no longer feasible if you expect to routinely receive emails larger than 500kB. Which you really should, in this day and age. But given that the author of that README did their performance tests on a 400MHz K6-2 CPU (which was discontinued in 2003) with emails between around 10kB and 100kB in size, I guess I can't really blame the advice for not being particularly relevant 20 years later.

Go with the flow.

After all that, I am left with no option but to enable the spamd daemon and connect with spamc in the recommended fashion.⁰

In some ways, this is a good thing. The recommended configuration is the one that most users will be running (duh!), so it gets the most testing and will have the fewest surprises. Also, if I do run into a problem and need to ask around, I'm more likely to find someone who has had the same problem and has already been told how to fix it (e.g. on Server Fault), or, if my problem is new, someone with a similar setup to help me troubleshoot.

Fortunately, when I did that and sent myself an email with the Generic Test for Unsolicited Bulk Email (GTUBE) text in it, it came through with the following headers:

    X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on localhost
    X-Spam-Flag: YES
    X-Spam-Level: **************************************************
    X-Spam-Status: Yes, score=999.0 required=5.0 tests=ALL_TRUSTED,GTUBE
     autolearn=no autolearn_force=no version=3.4.2

which is sufficient to set up filters that would put spam in users' Spam folders, instead of their Inbox.
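
As a rough illustration of what such a filter has to decide, here is a minimal Python sketch that reads the X-Spam-Flag header and picks a destination folder. The function name and the folder names "Spam" and "INBOX" are assumptions made for the example; on a real Postfix/Dovecot system this decision would normally be made by the delivery agent's own filtering rules rather than by a script like this.

```python
from email import message_from_string


def target_folder(raw_message: str) -> str:
    """Return the (hypothetical) folder name a message should be filed in."""
    msg = message_from_string(raw_message)
    # Treat anything other than an explicit "YES" as not spam,
    # including messages where the header is missing entirely.
    flag = (msg.get("X-Spam-Flag") or "").strip().upper()
    return "Spam" if flag == "YES" else "INBOX"


# Headers modelled on the GTUBE test result shown above.
sample = (
    "X-Spam-Flag: YES\n"
    "X-Spam-Status: Yes, score=999.0 required=5.0 tests=ALL_TRUSTED,GTUBE\n"
    " autolearn=no autolearn_force=no version=3.4.2\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
print(target_folder(sample))
```

Matching on X-Spam-Flag rather than parsing the score out of X-Spam-Status keeps the check to a single exact comparison.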

That's great! Unfortunately, it's not quite good enough. To be continued in part 2...


⁰ Actually, that's not entirely true. I could write my own helper program that reads its input and only invokes spamassassin if the input is small enough, or passes it straight to the next stage of processing if it's too large. It wouldn't even be that complicated. Probably. But adding my own programs into the email processing pipeline is taking customisation a bit too far for my liking at this stage.
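
For the curious, the size gate described in this footnote might look something like the Python sketch below. It is an illustration under stated assumptions, not something that was deployed here: the names `filter_message`, `run_spamassassin` and `MAX_SCAN_BYTES` are invented for the example, the threshold simply mirrors the "about 500 KB" advice from the man page, and the only thing assumed about spamassassin itself is that it reads a message on standard input and writes the marked-up message to standard output.

```python
import subprocess
import sys

# Assumption: threshold chosen to match the "about 500 KB" man page advice.
MAX_SCAN_BYTES = 500 * 1024


def run_spamassassin(message: bytes) -> bytes:
    """Pipe a message through spamassassin and return its marked-up output."""
    result = subprocess.run(
        ["spamassassin"], input=message, stdout=subprocess.PIPE, check=True
    )
    return result.stdout


def filter_message(message: bytes, scan=run_spamassassin,
                   max_bytes: int = MAX_SCAN_BYTES) -> bytes:
    """Scan small messages; pass oversized ones through untouched."""
    if len(message) > max_bytes:
        return message
    return scan(message)


def main() -> None:
    # A real filter script would call main(): stdin in, stdout out.
    sys.stdout.buffer.write(filter_message(sys.stdin.buffer.read()))


# Demonstration with a stand-in scanner, so nothing here needs SpamAssassin.
# "X-Spam-Checked" is a made-up header used only to mark scanned messages.
def fake_scan(message: bytes) -> bytes:
    return b"X-Spam-Checked: yes\n" + message


small = b"Subject: hi\n\nhello\n"
large = b"x" * (MAX_SCAN_BYTES + 1)
print(filter_message(small, scan=fake_scan).startswith(b"X-Spam-Checked"))
print(filter_message(large, scan=fake_scan) == large)
```

Note that this reads the whole message into memory before checking its size, and that a helper used as a Postfix content filter would also have to hand the result back to the mail system afterwards, which this sketch leaves out.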

posted at: 14:11
