In part 3, I tried to get Dovecot with the dovecot-antispam plugin to enable per-user adaptive spam filtering with CRM114.
So, sometimes I have an idea about how I want to solve a problem, which is apparently different from how everyone else does it.
That tends to be awkward, because all the documentation and write-ups other people have done aren't geared to my use case.
This can be because I don't know what I'm doing, but when I figure that out, the way everyone else does it makes sense and I end up doing that.
This time I tried not to have a strong opinion on how to set up dovecot-antispam, and instead let the documentation and examples guide me to the typical use case.
Unfortunately, that didn't work out.
Now I have to pick an adaptive filtering system that isn't CRM114, and make that work instead.
Looking through the Debian package repository for spam filters, it turns out that there are quite a few. I don't know much about most of them.
Fortunately, Debian has the Popularity Contest. It's an opt-in feature that allows Debian users to have their systems report which packages they have installed, so that data can be aggregated to get an idea for how popular each package is. For each spam filter I could find, I had a look at the popcon numbers, and which versions are present in the last three releases of Debian to get an idea of whether they're still under active development:
Package | Popcon | Bullseye | Buster | Jessie | Notes |
---|---|---|---|---|---|
bmf | 6 | 0.9.4 | 0.9.4 | 0.9.4 | |
bogofilter | 43,823 | 1.2.5 | 1.2.4 | 1.2.4 | Dependency of evolution-plugin-bogofilter (popcon 31,690), which is recommended by evolution (popcon 44,217) |
bsfilter | 494 | 1.0.19 | 1.0.19 | 1.0.19 | |
crm114 | 128 | 20100106 | 20100106 | 20100106 | |
ifile | 8 | 1.3.9 | 1.3.9 | 1.3.9 | |
mailfilter | 30 | 0.8.7 | 0.8.6 | 0.8.6 | |
qsf | 8 | - | 1.2.7 | 1.2.7 | |
scmail | 4 | 1.3.4 | 1.3.4 | 1.3.4 | |
spamassassin | 7,496 | 3.4.6 | 3.4.2 | 3.4.2 | |
spambayes | 95 | - | 1.1 | 1.1 | |
spamoracle | 13 | 1.6 | 1.4 | 1.4 | |
spamprobe | 426 | 1.4 | 1.4 | 1.4 | |
sylfilter | 286 | 0.8 | 0.8 | 0.8 | |
bogofilter tops the popularity list by a huge margin. So much so in fact that I had to look into why that might have been, and discovered that it works as a plugin with a couple of email clients. Notably, it is installed by default with Evolution, which is very popular and included as part of some of the Desktop Environments shipped by Debian, which accounts for a large majority of its popcon numbers.
As far as non-email-client installs go, bogofilter and spamassassin seem to be in roughly the same ballpark of popularity as each other, and far ahead of all the other adaptive spam filters - including being 50× more popular than CRM114.
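As a rough check on that comparison, the arithmetic can be sketched in Python. The counts are copied from the table above. Subtracting the evolution-plugin-bogofilter count from the bogofilter total is an assumption made here to estimate the non-email-client installs; it is not an official popcon metric, and the function names are made up for illustration.

```python
# Popcon install counts, copied from the table above.
POPCON = {
    "bogofilter": 43_823,
    "evolution-plugin-bogofilter": 31_690,
    "spamassassin": 7_496,
    "crm114": 128,
}


def standalone_bogofilter(counts):
    """Estimate bogofilter installs that were not pulled in by Evolution.

    Assumption: each evolution-plugin-bogofilter install accounts for
    exactly one bogofilter install.
    """
    return counts["bogofilter"] - counts["evolution-plugin-bogofilter"]


def times_more_popular(installs, baseline):
    """Return how many times larger `installs` is than `baseline`."""
    return installs / baseline


candidates = {
    "bogofilter (estimated standalone)": standalone_bogofilter(POPCON),
    "spamassassin": POPCON["spamassassin"],
}
for name, installs in candidates.items():
    ratio = times_more_popular(installs, POPCON["crm114"])
    print(f"{name}: {installs} installs, about {ratio:.0f}x crm114")
```

Under that assumption both filters land well above the 50× mark relative to crm114, with the estimated standalone bogofilter count somewhat ahead of spamassassin.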
They have also both received an update for the recent Debian bullseye release.
And I already have spamassassin installed.
dovecot-antispam with SpamAssassin's sa-learn.

Although the pipe backend for dovecot-antispam is generic and needs to be told how to run the relevant training program, it's not actually that hard to do so.
It also helps that there's a sample of all the required options in the man page, under Configuration.
The entire configuration file I ended up with (/etc/dovecot/90-plugin-antispam) is:
    plugin {
      ## Generic options
      antispam_backend = pipe
      antispam_trash = Trash
      antispam_spam = Junk

      ## Pipe plugin options
      antispam_pipe_program = /usr/bin/sa-learn
      antispam_pipe_program_args = --siteconfigpath=/etc/spamassassin/delivery;--local;--username=%u
      antispam_pipe_program_spam_arg = --spam
      antispam_pipe_program_notspam_arg = --ham
      antispam_pipe_tmpdir = /tmp
    }
The only part that might need a bit of explanation there is the --siteconfigpath=/etc/spamassassin/delivery - because I want the local delivery spam options to be different from the global prefilter options.
In particular, I want the global prefilter to have adaptive filtering disabled, whereas that's the whole point of the local delivery filter.
But also, while I want the global prefilter to make use of online spam tests (AskDNS, SPF, etc...), I need --local to have the local delivery filter use local tests only - and you can't specify that in the config file!
This post originally had --configpath=/etc/spamassassin/delivery in the above - it should have been --siteconfigpath.
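To make the effect of those pipe options more concrete, here is a minimal Python sketch of how the settings could turn into an sa-learn command line. It is not the plugin's code. It assumes that the extra arguments are split on semicolons, that %u is replaced with the user name, and that the spam or not-spam argument is appended last; the function name and the user "alice" are hypothetical.

```python
import shlex

# The pipe-related settings from the configuration above.
SETTINGS = {
    "antispam_pipe_program": "/usr/bin/sa-learn",
    "antispam_pipe_program_args": (
        "--siteconfigpath=/etc/spamassassin/delivery;--local;--username=%u"
    ),
    "antispam_pipe_program_spam_arg": "--spam",
    "antispam_pipe_program_notspam_arg": "--ham",
}


def build_training_argv(settings, user, is_spam):
    """Assemble the training command for one message (illustrative only).

    Assumptions: arguments are semicolon-separated, %u expands to the
    user name, and the spam/ham argument goes at the end.
    """
    argv = [settings["antispam_pipe_program"]]
    extra = settings.get("antispam_pipe_program_args", "")
    argv += [arg.replace("%u", user) for arg in extra.split(";") if arg]
    key = (
        "antispam_pipe_program_spam_arg"
        if is_spam
        else "antispam_pipe_program_notspam_arg"
    )
    argv.append(settings[key])
    return argv


# A message moved into Junk by the hypothetical user "alice":
print(shlex.join(build_training_argv(SETTINGS, "alice", is_spam=True)))
# The same message moved back out of Junk:
print(shlex.join(build_training_argv(SETTINGS, "alice", is_spam=False)))
```

The message itself would be fed to that command on standard input, which is something sa-learn accepts when no file is named.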
There's still one piece missing from the puzzle.
The dovecot-antispam plugin checks if email is moved into or out of the Junk folder, and runs the training program accordingly.
But how does it check if incoming email is spam, and move it into the Junk folder?
It doesn't. You have to set that up separately.
sieve
Second part first - moving incoming spam into the Junk folder.
This isn't too hard, if you already have sieve filtering enabled.
You can configure a sieve_after directory and drop a sieve file in there to look for the SpamAssassin spam header and move the email appropriately.
My /etc/dovecot/sieve-after/99-spamfilter.sieve script is just:
    require ["fileinto","regex"];
    if header :regex "X-Spam-Status" "^Yes" {
      fileinto "Junk";
      stop;
    }
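For anyone who doesn't read sieve, the decision that script makes can be approximated in Python as below. This is only a sketch of the logic, not how Dovecot evaluates sieve. The function name and the "INBOX" default are assumptions, and the case-insensitive match reflects my understanding of sieve's default comparator rather than anything stated above.

```python
import re
from email import message_from_string


def target_folder(raw_message):
    """Return the folder the sieve rule above would choose (approximation).

    Any X-Spam-Status header whose value starts with "Yes" sends the
    message to Junk; everything else falls through to normal delivery,
    represented here by "INBOX".
    """
    msg = message_from_string(raw_message)
    for value in msg.get_all("X-Spam-Status", []):
        # Assumption: sieve's default comparator ignores ASCII case.
        if re.match(r"Yes", value.strip(), re.IGNORECASE):
            return "Junk"
    return "INBOX"


spam = "X-Spam-Status: Yes, score=7.1 required=5.0\nSubject: offer\n\nbuy now\n"
ham = "X-Spam-Status: No, score=0.2 required=5.0\nSubject: lunch\n\nnoon?\n"
print(target_folder(spam))
print(target_folder(ham))
```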
spamc/spamd, again.

The first part, checking if incoming email is spam with SpamAssassin, is trickier. Because of course it is.
The first wrinkle is that emails might (still) be larger than SpamAssassin's recommended limit of 500kB.
Therefore you can't pass the emails to spamassassin, but have to pass them to spamc instead.
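The size cutoff itself is simple. As a sketch of the idea: spamc's --max-size option returns anything over the limit unprocessed, which amounts to a check like the one below. The constant assumes "500kB" means 500 × 1024 bytes, and the function name is made up for illustration.

```python
# Assumption: the 500kB limit is taken to be 500 * 1024 bytes.
MAX_SCAN_SIZE = 500 * 1024


def should_scan(message, max_size=MAX_SCAN_SIZE):
    """Return True if the raw message is small enough to be spam-checked."""
    return len(message) <= max_size


small = b"Subject: hi\n\nhello\n"
large = b"x" * (MAX_SCAN_SIZE + 1)
print(should_scan(small), should_scan(large))
```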
The wrinkle turns into an actual problem when you realise that spamc has no way of passing configuration options (or a configuration path) to spamd.
That means there's no way of getting spamc for per-user adaptive filtering to use different options than spamc for global rejection filtering.
The solution?
You have to run two spamd daemons, with different configurations, and get the different invocations of spamc to talk to the right one.
As someone who didn't want to run one spamd daemon, this is getting annoying.
But I'm far enough down this rabbit hole that I've kind of stopped caring, and don't really have the energy to go back and start evaluating yet another spam filter.
So I set up a second daemon, with a different configuration file, and different command-line options (again, because some options can't be specified in the config file), listening on a different port, using a different PID file (because we really do want to run two instances at the same time), and that seems to be working.
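The differences between the two daemons can be sketched as two command lines, built here in Python so they are easy to compare. These are not my actual service definitions. The delivery config path and port 785 come from elsewhere in this post; the spamd binary path, both PID file paths, the global config directory and port 783 for the first daemon are assumptions for illustration.

```python
def spamd_argv(siteconfigpath, port, pidfile, local_only):
    """Build a spamd command line for one instance (illustrative only)."""
    argv = [
        "/usr/sbin/spamd",  # assumed install location
        "--daemonize",
        f"--siteconfigpath={siteconfigpath}",
        f"--port={port}",
        f"--pidfile={pidfile}",
    ]
    if local_only:
        # Local tests only; this has no config file equivalent.
        argv.append("--local")
    return argv


# Global prefilter: network tests allowed (paths and port are assumed).
global_spamd = spamd_argv(
    "/etc/spamassassin", 783, "/run/spamd.pid", local_only=False
)
# Per-user delivery filter: separate config, port and PID file.
delivery_spamd = spamd_argv(
    "/etc/spamassassin/delivery", 785, "/run/spamd-delivery.pid", local_only=True
)

print(" ".join(global_spamd))
print(" ".join(delivery_spamd))
```

Each spamc invocation then only needs the matching port number to reach the right daemon.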
The last part of the last problem is that Dovecot's delivery agent doesn't seem to have a way of passing incoming emails to another program like spamc to actually perform the spam check.
And I only want to pass emails to spamc if they're about to be delivered to a local user - not if they're just passing through the email system.
Fortunately, I can still fix that on the Postfix side by changing the delivery command from
mailbox_command = /usr/lib/dovecot/dovecot-lda -f "$SENDER" -a "$RECIPIENT"
to
mailbox_command = /usr/bin/spamc -p 785 -u "$USER" -e /usr/lib/dovecot/dovecot-lda -f "$SENDER" -a "$RECIPIENT"
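The -e option is what makes this work: spamc sends the message to the daemon on port 785, then pipes the result into the command that follows -e. A rough Python model of that flow is below. It is a sketch of the shape of the pipeline, not of spamc itself; the function name is hypothetical, and the fallback to the original message on failure is modelled on spamc's default safe-fallback behaviour.

```python
import subprocess
import sys


def filter_then_deliver(message, filter_argv, deliver_argv):
    """Pipe a message through a filter, then into a delivery command.

    The filter reads the message on stdin and its stdout becomes the
    stdin of the delivery command. If the filter fails, the original
    message is delivered unchanged. Returns the delivery exit status.
    """
    try:
        result = subprocess.run(
            filter_argv, input=message, stdout=subprocess.PIPE, check=True
        )
        to_deliver = result.stdout
    except (OSError, subprocess.CalledProcessError):
        to_deliver = message
    return subprocess.run(deliver_argv, input=to_deliver).returncode


if __name__ == "__main__":
    # Stand-ins for spamc and dovecot-lda so the sketch runs anywhere.
    tagger = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.write('X-Spam-Status: No\\n' + sys.stdin.read())",
    ]
    sink = [sys.executable, "-c", "import sys; sys.stdin.read()"]
    status = filter_then_deliver(b"Subject: hi\n\nhello\n", tagger, sink)
    print("delivery exit status:", status)
```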
Pretty much. Yeah. I think I've finally got a setup that does what I originally wanted to do.
This has felt like a tortuous and torturous path. Part of me thinks that this shouldn't be as hard as it has been. Spam filtering is a really important part of dealing with emails these days - and has been for a couple of decades. So why hasn't dealing with it been made more streamlined by now?
Part of me wonders if I'm just doing it completely wrong. That maybe there is a really straightforward way of setting this up that I've just missed entirely.
Even so, I have read some of the documentation for the tools I've worked with here, notably SpamAssassin, and I still have some big-picture questions:
If spamc is meant to be a drop-in replacement for spamassassin, why can't spamc take the same command-line options as spamassassin? Why doesn't spamc have e.g. a --local option?
If spamc is meant to be a drop-in replacement for spamassassin, why can't spamassassin take the same command-line options as spamc? Why doesn't spamassassin have e.g. a --max-size option?
If spamassassin has been written, by the developers, to be able to work with multiple predefined configurations, as evidenced by the existence of the --siteconfigpath option, why can't spamd work with multiple predefined configurations?
Why can't all options be specified (or at least have their defaults set) in predefined configuration files? For example, there appears to be no config file equivalent of spamd's --local option to use local tests only, or its --nouser-config to prevent trying to load a per-user configuration.
Why hasn't the slow startup time been fixed? Python has __pycache__. Emacs has dump/unexec (for its sins). Most OpenGL implementations have a shader cache.
What's with sa-compile?
sa-compile uses "re2c" to compile the site-wide parts of the SpamAssassin ruleset. No part of user_prefs or any files included from user_prefs can be built into the compiled set.
This compiled set is then used by the "Mail::SpamAssassin::Plugin::Rule2XSBody" plugin to speed up SpamAssassin's operation
Additionally, "sa-compile" will not restart "spamd"
Congratulations, you've added the complexity of a cache, but without actually fixing the problem enough that you can eliminate any of the other awkward workarounds that are in place.
Look, I'm a software developer. I know that features don't just happen and someone has to write them, and that every proposed feature starts with minus 100 points. But still, SpamAssassin is 20 years old, and it's not like these are new features that no-one is quite sure how they'd work. These are features that either already exist, or that people have already put a fair amount of work into improving. They're just not consistently available, or quite finished yet.
How long does that last layer of polish take though?
posted at: 14:31