… or Find the False Positive.
Anyone sending a lot of email has complained about spam filters and false positives at some point. But most people haven’t run a mailbox with no spam filtering in front of it in recent years, so they don’t have much of a feel for what an unfiltered mailbox looks like, how important filters are, or how difficult their job is.
I run no transaction-level filters in front of my mailbox, just content filters that route mail to one of several inboxes or to a junk folder, so if I want to, I can see what unfiltered email looks like. I took all the mail that was sent to me yesterday and put it in a format that shows the problem filters face, especially the difficulty of spotting which messages in the junk folder are false positives.
An inbox with no filters looks like this.
Running a spam filter against it, which simply categorizes each email as spam (pink) or not-spam (green), gives this.
Even with the messages categorized as spam vs not-spam it’s hard to work out which messages are important and which aren’t, let alone where the false positives might be.
If I sort the categories by hand, I get this, and you can see that out of 1,200 or so emails, about three quarters were spam. Of the three false positives, two were bulk email I didn’t mind missing, and only one was email I considered important.
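To put those approximate numbers in perspective, here is a small back-of-the-envelope sketch. It uses the rough figures above (about 1,200 messages, about three quarters spam, three false positives) and adds one assumption the post does not state: that the filter let no spam through (zero false negatives), so every spam message ended up in the junk folder.

```python
# Rough counts from the post: "1200 or so" mails, "about three quarters" spam,
# three false positives (legitimate mail misfiled as spam).
TOTAL = 1200
SPAM = TOTAL * 3 // 4        # ~900 spam messages
FALSE_POSITIVES = 3

# Assumption, not from the post: no spam slipped into the inboxes.
FALSE_NEGATIVES = 0

legitimate = TOTAL - SPAM                                  # ~300 wanted messages
junk_folder = SPAM - FALSE_NEGATIVES + FALSE_POSITIVES     # everything filed as spam

fp_rate = FALSE_POSITIVES / legitimate                     # share of good mail lost
fp_share_of_junk = FALSE_POSITIVES / junk_folder           # needles in the haystack

print(f"legitimate mail: {legitimate}")
print(f"junk folder size: {junk_folder}")
print(f"false positive rate: {fp_rate:.1%}")
print(f"false positives as share of junk: {fp_share_of_junk:.2%}")
```

Under these assumptions, the filter lost about 1% of legitimate mail, yet those three messages make up only about a third of a percent of the junk folder, which is why finding them by eye is so hard.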