In Part 2, having rejected using a Postfix Content Filter to check for
spam because it can't reject emails that are sufficiently spammy, I
set up spam filtering with SpamAssassin/spamass-milter.
However, that doesn't tell the whole story, and I ended up there by
more of a roundabout route than that post lets on.
Spam filtering round 2 - Reject with spamass-milter?
I did get to the point of setting up spamass-milter. I was most of
the way through configuring it when I re-checked the Postfix Milter
documentation and came across the following limitation:
When you use [some other Postfix feature], Milter applications
have access only to the SMTP command information; they have no
access to the message header or body, and cannot make modifications
to the message or to the envelope.
...which kind of breaks spam filtering.
I mean, sure, you can do some spam detection on the SMTP command
information, such as the sender address, but the data that's best
for determining whether an email is spam is the content of the email
itself. Without that, your spam classification just isn't going to
be very good at all.
Combining that problem with the "2 daemons
connected by another program" setup that I already wasn't very
happy with, I decided to look into another approach.
Spam filtering round 3 - Before-Queue Filter with spampd
Looking further through the Postfix documentation, I realised
that I wanted a Before-Queue
Content Filter proxy. This is a program that Postfix passes the
email to, which can filter or transform it as it wants, and then
either tell Postfix to reject the email, or pass it back into the
Postfix queue.
Fortunately, the Spam Proxy Daemon
spampd exists, which uses SpamAssassin to do exactly that.
Also:
spampd was initially designed as a content filter
mechanism for use with the Postfix MTA.
...which is good. But also, from the spampd man page, under
Installation, it says:
Note that spampd replaces
spamd from the SpamAssassin distribution in
function. You do not need to run spamd in order for
spampd to work.
...and that's great! I installed it, got it up and running, and
started to configure it appropriately to reject unwanted emails.
But then, looking further through the spampd man page, right at the
bottom, under To Do, it says:
Add configurable option for rejecting mail outright based on
spam score.
So, while Postfix makes it possible for a proxy to reject
emails, this particular one can't. Which is the only thing I wanted
it for.
...well, that's just fantastic. *sigh*
Spam filtering round 4 - Before-Queue Filter with custom proxy
a.k.a. Fine, I'll do it myself
Having familiarised myself with how the proxy setup works with
spampd, I thought I might be able to create one myself without too
much effort.
According to my reading of the section "How Postfix talks to the
before-queue content filter", it sounded to me like Postfix just
opens a connection to the proxy, and passes all the SMTP commands
and the message to it in one long stream of data. Then it expects a
single response, and for the proxy to pass the whole stream back
into Postfix as-is if the email is accepted. That didn't actually
sound too hard.
In particular, Linux has programs which mean you don't have to
deal with the complexity of handling network connections between
Postfix and the proxy yourself. There are some which will listen
for incoming connections for you, and pass the data as if it were
coming from a file. Traditionally this was done with one of the
many implementations/replacements of inetd, but more recently
this functionality is part of the systemd service manager, with systemd.socket
activation. Then, there are programs that will accept data as
if you were writing it to a file, and send it on to a network
connection instead, such as the many implementations of netcat.
The plan was: Read the whole email command stream and message
from Postfix, pass it to SpamAssassin, and then based on the spam
score either reject it, or accept it and pass the whole thing back
into Postfix again.
That's about 12 lines of shell script. No
problem.
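The decision at the heart of that plan is small enough to sketch. This is a minimal Python illustration, not the shell script described here. It assumes SpamAssassin's `spamc` client talking to an already-running `spamd`, and uses `spamc -c`, which prints its verdict as `score/threshold` (e.g. `7.3/5.0`). The `REJECT_AT` cut-off and all the function names are hypothetical.

```python
import subprocess

# Hypothetical cut-off: reject outright at or above this score. It can be set
# higher than SpamAssassin's own "is spam" threshold, so that only the
# sufficiently spammy mail is refused at SMTP time.
REJECT_AT = 10.0


def parse_spamc_check(output: str) -> tuple[float, float]:
    """Parse the 'score/threshold' text printed by `spamc -c`."""
    score, threshold = output.strip().split("/")
    return float(score), float(threshold)


def should_reject(score: float, reject_at: float = REJECT_AT) -> bool:
    """True if the message is spammy enough to reject outright."""
    return score >= reject_at


def spam_score(message: bytes) -> float:
    """Score a raw message with spamc (assumes spamd is running and reachable).

    `spamc -c` exits non-zero for messages it considers spam, so the exit
    status is deliberately not treated as an error here.
    """
    result = subprocess.run(["spamc", "-c"], input=message,
                            capture_output=True, check=False)
    score, _threshold = parse_spamc_check(result.stdout.decode())
    return score
```

In a proxy, `should_reject(spam_score(raw_message))` would then choose between sending a 5xx rejection and passing the message back into Postfix.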
It always takes longer than you expect...
...even when you take into account Hofstadter's Law
-- Hofstadter's Law
Having got a (roughly) 12-line shell script prototype up and
running, I got Postfix to connect to it, but it wasn't receiving
any data. Playing around with some ideas, I tried writing the
response immediately, and I started to be able to read the command
stream. If I wrote more data back to the program that was sending
it, I got more of the command stream.
I realised that Postfix actually wanted to talk the full SMTP
protocol to the proxy.
This means sending a "service ready" message as soon as Postfix
connects, and then acknowledging each command Postfix sends before
it will send the next one, until it finally sends the email data.
This isn't really that hard (hurrah for text-based protocols!), but
it was more complex than I was expecting. If I'd realised this at
the start, I wouldn't have picked shell scripting as the language to
do it in.
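For anyone curious what talking "the full SMTP protocol" involves, here is a minimal Python sketch of the server side of that dialogue. It is not spamprox, which was a shell script. It writes the 220 greeting first (without it, the client sends nothing), answers each command with a reply code, collects the message after DATA until the lone "." line, and then accepts (250) or rejects (550) it. Assumptions: the `smtp_session` name, the `decide` callback, the `proxy.example` hostname and the reply texts are hypothetical; only a handful of commands are handled; and it does not relay accepted mail back into Postfix, which a real before-queue proxy has to do before replying 250.

```python
from typing import Callable, List, TextIO


def smtp_session(rfile: TextIO, wfile: TextIO,
                 decide: Callable[[List[str]], bool]) -> None:
    """Answer one SMTP session on rfile/wfile (text streams, CRLF lines).

    decide(message_lines) returns True to accept the message, False to reject.
    """

    def reply(text: str) -> None:
        # Flush every reply: the client waits for it before its next command.
        wfile.write(text + "\r\n")
        wfile.flush()

    # The server speaks first; the client sends nothing until it sees this.
    reply("220 proxy.example ESMTP ready")
    in_data = False
    message: List[str] = []
    for raw in rfile:
        line = raw.rstrip("\r\n")
        if in_data:
            if line == ".":
                # End of message: this is where the spam decision happens.
                in_data = False
                if decide(message):
                    reply("250 2.0.0 Ok: queued")
                else:
                    reply("550 5.7.1 Message rejected as spam")
                message = []
            else:
                # Undo dot-stuffing (RFC 5321, section 4.5.2).
                message.append(line[1:] if line.startswith(".") else line)
            continue
        verb = line.split(" ", 1)[0].upper()
        if verb in ("EHLO", "HELO"):
            reply("250 proxy.example")
        elif verb in ("MAIL", "RCPT", "RSET", "NOOP"):
            reply("250 2.0.0 Ok")
        elif verb == "DATA":
            in_data = True
            reply("354 End data with <CR><LF>.<CR><LF>")
        elif verb == "QUIT":
            reply("221 2.0.0 Bye")
            return
        else:
            reply("502 5.5.2 Command not implemented")
```

Under inetd, or systemd socket activation with `Accept=yes` and the service's standard input/output connected to the socket (e.g. `StandardInput=socket`), it could be run as `smtp_session(sys.stdin, sys.stdout, decide)`.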
But by the time I'd figured all this out, I'd written the first
90% of the proxy anyway. Given that, I thought I might as well
finish up the second
90% of the work.
The result was spamprox - the barely SMTP-capable spam proxy. It's
kind of a bodge job,
doesn't really understand what it's doing, and would probably fail
badly if it had to deal with anything other than the exact commands
Postfix normally sends. On the plus side, it doesn't have to deal
with anything else, and for the one role it has, it works!
All the ways that other people had come up with to do spam
filtering weren't good enough, but I had managed to create one that
was.
A haughty spirit before a fall
Having got through all that, I started to write everything up.
After lots of writing and editing and changing tone, as I was
getting close to finishing, I went back through all the
documentation so I could link to anything relevant and include
quotes where appropriate. I dug up the Postfix Milter documentation
to get the exact text of the "you can't rely on having the email
contents in milters" limitation and found:
When you use the before-queue content filter for
incoming SMTP mail (see SMTPD_PROXY_README), Milter
applications have access only to the SMTP command
information...
Wait, what?
The "[some other Postfix feature]" that I'd dismissed as being an
internal filter or feature I might want to enable at some point
turned out to be "an external proxy" which you might use for
something like... a spam filter. i.e. the only reason a milter
wouldn't have the data it needed for spam filtering would be that I
had also set up a proxy to do spam filtering? But I'd never need to
set up both, so my fear about milters not working was completely
unfounded. I just didn't have the knowledge I needed to understand
that properly when I first read the docs.
I still wasn't entirely happy with the "2 daemons connected by
another program" milter setup, but given that the alternative was a
home-grown bodge job, written in a language that turned out not to
be well-suited to the complexity of the task, I thought that maybe
I ought to reconsider my approach.
The benefit of going on a wild goose chase...
...is that at least you get plenty of exercise.
-- Author unknown, paraphrased
Once I'd got past my reluctance to throw away the work I'd put into
spamprox, and remembered that I'd prefer not to have to maintain my
own custom parts in my email setup, that's when I decided to go back
to the spamass-milter setup, as described in Part 2.
That decision also involved abandoning a lot of the blog
write-up I'd already done for that work. The tone I'd gone with, of
getting increasingly exasperated with the failure of one approach
after another to try and make existing components work in what
seemed to be the most obvious way, didn't make sense anymore. The
run-up to getting the milter approach working (i.e. Part 2 of the
series) didn't have enough failures to make the point, and once I'd
realised that the chain of failures was due to my own
misunderstanding, I needed to rewrite everything that came after
(i.e. this post) to be a lot more humble.
Still, I learned quite a bit. So it wasn't a complete waste.
Anyway, going through 4 different approaches to spam filtering
(including writing my own proxy for one of them), writing nearly all
of it up before discovering my mistake, wavering over the decision
to abandon that and go back to a previously dismissed approach, and
then finding the motivation to rewrite the write-up, together
account for a reasonable proportion of the delay between Part 1 and
Part 2.
Hopefully Part 3 will be completed much more promptly. See you
then...