Author Archive for steve

Delivery Haikus

As we mentioned earlier, Habeas is being bought out by ReturnPath.

While they’ve not actually used it for several years, the thing that Habeas will be remembered for is their introduction of the Haiku form of poetry into email headers:

winter into spring
brightly anticipated
like Habeas SWE ™

How better to commemorate that than with some email themed Haiku?

Some email delivery folks have provided these to start you off:

Goodbye Habeas.
What have you left? Just footprints
in snow as spring comes.

Commercial Email,
Confirmed and opted-in,
Clicked and opened.

Creative content,
If not blocked or bulked,
Then inbox-receiv’d.

spam is really dumb
it makes our lives really hard
it will never leave

spammers are dumber
especially if client
need to be fired

MickC has one too, a “bye-ku”.

Can you do any better? Bonus points for a 5, 7, 5 syllable pattern and some reference to a time of year…


Why do ISPs limit emails per connection?

A few years ago it was “common knowledge” that if you were sending large amounts of email to an ISP the most polite way to do that, the way that would put the least load on the receiving mailserver, was to open a single SMTP session to the mailserver and then to send all the mail for that ISP down that single connection.

That’s because the receiving mailserver is concerned about two main resources when handling inbound email: the pool of “slots”, assigned one per inbound SMTP session, and the bandwidth (network and disk, and related resources such as memory and CPU) consumed by the inbound mail. With this approach the sender uses only one slot, and the receiving mailserver can control the bandwidth used simply by accepting data on that one connection at a given rate. It also amortizes all the connection setup costs over multiple emails. It’s a beautiful thing - it just doesn’t get any more efficient than that.

That seems perfect for the receiving ISP - but ISPs don’t encourage bulk senders to do this. Instead many of them have been moving from “one connection, lots of mail through it” to “multiple connections, a few messages through each”. They’re even limiting the number of deliveries permitted over a single connection. Why would that be?

This is driven by three things. One is that the number of simultaneous inbound SMTP sessions that a mailserver can handle is quite tightly limited by the architecture of most mailservers. Another is that the amount of mail that’s being sent to large ISP mailservers keeps going up and up - so there are sometimes more inbound SMTP sessions asking for access than the mailserver can handle. The third is that ISPs know that there are different categories of email being sent to their users: 1:1 mail from their friends that they want to see as soon as possible, wanted bulk mail that their users want to see when it arrives, and spam - lots and lots of spam.

So ISPs want to be able to do things like accept 1:1 mail all the time, while deferring bulk mail and spam to allow them to shed traffic at times of peak load. But they can only make decisions about whether to accept or defer delivery in an efficient way at SMTP connection time - they pick and choose amongst the horde of inbound connection attempts to prioritize some and defer others, letting them keep within the number of inbound sessions that they can handle simultaneously.
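As a rough illustration of that connection-time decision, here is a minimal sketch in Python. The sender classes, the load thresholds and the `Server` class are all assumptions made up for this example; a real ISP would base the decision on reputation data and its own capacity planning, not on hard-coded labels.

```python
from dataclasses import dataclass

# Assumed policy: as load rises, shed spam first, then bulk mail,
# and keep accepting 1:1 mail until every slot is in use.
# These thresholds are illustrative, not taken from any real ISP.
LOAD_LIMITS = {"one_to_one": 1.0, "wanted_bulk": 0.8, "spam": 0.5}


@dataclass
class Server:
    max_sessions: int
    active_sessions: int = 0

    def on_connect(self, sender_class: str) -> str:
        """Decide at connection time whether to accept or defer a sender."""
        load = self.active_sessions / self.max_sessions
        if load < LOAD_LIMITS[sender_class]:
            self.active_sessions += 1  # the sender now holds one slot
            return "220 ready"
        # A 4xx reply asks the sender to come back and try again later.
        return "421 try again later"
```

With ten slots and six already busy, this policy defers a connection classed as spam while still accepting bulk and 1:1 mail.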

But once the ISP lets a bulk mailer connect to deliver their mail, they lose most of the ability to further control that delivery as the sender might send thousands of emails down that connection. (Even if the ISP has the ability to throttle bandwidth - as some do to control obvious spam - that just means that the sender would tie up an expensive inbound delivery slot for longer).

So, in order to allow them to prioritize inbound connections effectively the ISP needs to terminate the session after a few deliveries, and then make that sender start competing with other senders for a connection again.
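From the sender’s side, living with a per-connection limit mostly means splitting the queue for an ISP into batches and delivering each batch over its own SMTP session. A minimal sketch, assuming a hypothetical `max_per_connection` value that the sender has chosen or learned for that ISP:

```python
from typing import Iterator, List


def batches(messages: List[str], max_per_connection: int) -> Iterator[List[str]]:
    """Split a queue of messages into per-connection batches.

    Each batch would be delivered over one SMTP session; the sender then
    disconnects and competes for a new connection before sending the next.
    """
    if max_per_connection < 1:
        raise ValueError("max_per_connection must be at least 1")
    for start in range(0, len(messages), max_per_connection):
        yield messages[start:start + max_per_connection]
```

For example, `list(batches(["a", "b", "c", "d", "e"], 2))` gives three batches: two full ones and a final batch holding the one remaining message.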

So ISPs aren’t limiting the number of deliveries per SMTP connection to make things difficult for senders, or because they don’t understand how mail works. They’re doing it because that lets them prioritize wanted email to their users. The same is true when they defer your mail with a 4xx response.

It might be annoying to have to deal with these limits on delivery, but for legitimate bulk mail senders all this throttling and prioritization is a good thing. Your mail may be given less priority than 1:1 mail - but, if you maintain a good reputation, you’re given higher priority than all the spam, higher priority than all the email-borne viruses, higher priority than all the junk email, higher priority than the 419 spams. And higher priority than mail from those of your competitors who have a worse reputation than yours.


Why does everyone tell you to avoid .biz in your emails?

… or Why do spam filters sometimes have some very strange ideas?

It’s been dogma for a long time that if you’re doing email marketing you should avoid using a .biz domain in your mails. Even if your main website was in .biz, you should use something different in your messages, perhaps a website you buy solely for use in email that redirects to your real .biz website. Last year I looked at why that was, and what could be done about it.

One main reason for avoiding it has been resolved (so if you’ve been avoiding using .biz URLs in your mail, now might be a good time to re-test that decision). And enough time has gone by that I can share, without upsetting everyone, the ugly reasons why .biz was for so long considered a sure sign of spam without good reason.

The simple reason was SpamAssassin. SpamAssassin is very widely used to filter mail, both in its open source version and buried anonymously deep inside countless commercial spam filters and filtering appliances. Not only that, but SpamAssassin is readily available, so most people looking to do pre-mailing content checks or looking at why content-based filters are objecting to a particular email will use SpamAssassin as their model. It’s very widely deployed, and influential far beyond the size of its deployed base.

SpamAssassin is a score-based spam filter - it checks an email against hundreds of rules, adds up the scores of each rule that matches and, in typical setups, decides the mail is spam if the total score is five or more. Pretty reasonable, but here are a few of the rules and scores (from the 2006 version of SpamAssassin):

  • 1.392 Advance Fee Fraud (Nigerian 419)
  • 0.493 Refers to an erectile drug
  • 1.995 Subject contains G a p p y T e x t
  • 0.496 Message is 40-50% HTML
  • 2.100 From: domain has a series of 7 consonants
  • 1.635 Possible porn - Hardcore Porn
  • 2.013 Contains a URL in the BIZ top level domain
  • 1.273 Contains a URL in the INFO top level domain

You can’t quite treat the scores as SpamAssassin’s measure of the “spamminess” of a message (“a .biz URL is 23% spammier than hardcore porn” … “The URL microsoft.biz is about as spammy as From: Ignatious T. Aardvark <success@sdfghjkl.com>”) but it’s pretty clear that using a .biz domain in your mail had a huge effect on your SpamAssassin score, and was a bad risk to take if you could easily avoid it.
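To make the arithmetic concrete, here is a small sketch of score-based filtering using the eight scores listed above and the typical threshold of five. The rule labels are shorthand invented for this example, not SpamAssassin’s real rule names, and the code is not how SpamAssassin itself is implemented.

```python
# Scores from the 2006 list above; the keys are made-up labels.
RULE_SCORES = {
    "advance_fee_fraud": 1.392,
    "erectile_drug": 0.493,
    "gappy_subject": 1.995,
    "html_40_50_percent": 0.496,
    "from_7_consonants": 2.100,
    "hardcore_porn": 1.635,
    "url_in_biz": 2.013,
    "url_in_info": 1.273,
}

SPAM_THRESHOLD = 5.0  # "five or more" in typical setups


def total_score(matched_rules):
    """Add up the scores of every rule that matched the message."""
    return sum(RULE_SCORES[rule] for rule in matched_rules)


def is_spam(matched_rules):
    return total_score(matched_rules) >= SPAM_THRESHOLD


# Ratio behind the "23% spammier than hardcore porn" quip.
biz_vs_porn = RULE_SCORES["url_in_biz"] / RULE_SCORES["hardcore_porn"]
```

A message matching only the .biz rule scores 2.013, so it starts two fifths of the way to the threshold before anything else about it is considered. Add a gappy subject line and a .info URL and the total is 5.281, which is over the line.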

So, was .biz really that spam-ridden? I recall it being pretty bad when it first launched, so it’s reasonable that SpamAssassin has that rule, but was it still bad by 2006? Bad enough to merit a score quite that high? That’s hard to measure, but a reasonable metric is the percentage of domains in each top level domain (.com, .net, .biz etc) that had been spotted as a definite spam sign by the folks at SURBL.

Percentage of domains listed in SURBL

So .biz looks just fine - comparable with .com or .net, and certainly a lot better than .info. Why was SpamAssassin still treating it as so spammy?

SpamAssassin developers measure and develop their scores based on several corpuses of recently received email, hand categorised into spam mail and non-spam (“ham”) mail. Like many other spam filters, they stay fairly vague about where exactly these corpuses come from (to avoid people gaming the system) but they seem to be based mostly on the personal mailboxes of developers. Of the five corpuses SpamAssassin were using in 2006, four saw almost no .biz spam, but one saw quite a lot (graph of .biz URLs in spam). More importantly, though, none of them saw more than a tiny number of .biz URLs in non-spam (graph of .biz URLs in non-spam).

The algorithm that SpamAssassin uses to assign scores to the rules is complex, but loosely speaking if a rule helps to correctly classify one of the mails in the spam corpus as spam, then the score of that rule will tend to be increased, while if a rule helps to wrongly classify non-spam as spam then the score for that rule will tend to be decreased. In the test corpuses used, .biz URLs hardly ever appear in non-spam, so there’s no pressure to reduce the score assigned to that rule.
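The “loosely speaking” version of that can be sketched in a few lines. This is a deliberately crude stand-in, not SpamAssassin’s actual score-assignment algorithm: the function name, the fixed `step` size and the rule labels are all assumptions for illustration.

```python
def adjust_scores(scores, corpus, step=0.1):
    """Nudge rule scores using a hand-classified corpus.

    corpus is a list of (matched_rules, is_spam) pairs. A rule that fires
    on spam is pushed up; a rule that fires on non-spam is pushed down.
    """
    updated = dict(scores)
    for matched_rules, is_spam in corpus:
        for rule in matched_rules:
            updated[rule] += step if is_spam else -step
    return updated


# A toy corpus where .biz URLs show up in spam but never in non-spam,
# while the HTML rule fires on both kinds of mail.
corpus = [
    (["url_in_biz"], True),
    (["url_in_biz"], True),
    (["url_in_biz"], True),
    (["html_40_50_percent"], True),
    (["html_40_50_percent"], False),
]
result = adjust_scores({"url_in_biz": 1.0, "html_40_50_percent": 1.0}, corpus)
```

In this toy run the HTML rule ends up where it started, because the hit on non-spam cancels the hit on spam. The .biz rule only ever moves upwards, because nothing in the non-spam corpus pulls it back.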

So the final answer to the question in the title is:

  1. Long, long ago when .biz was new it was used by a lot of spammers (because it was new, so a lot of good domains were easily available).
  2. SpamAssassin added a rule to recognize .biz URLs, and increase the spam score of mails containing them.
  3. SpamAssassin is very influential, even more so than its wide deployment would suggest.
  4. Legitimate mailers saw that SpamAssassin would punish them for using a .biz URL, so they pretty much all avoid using .biz URLs in their email.
  5. With effectively no legitimate bulk mail using .biz URLs, there’s nothing to keep the SpamAssassin score for the “contains a .biz URL” rule from creeping up, and being even more punitive to use of .biz URLs.
  6. Go to step 4

This leads to a vicious circle where legitimate mailers don’t use .biz as SpamAssassin would punish them for doing so, and SpamAssassin continues to punish anyone using .biz URLs because they’re not used by legitimate mailers. SpamAssassin eventually broke this particular circle by removing the rule from their latest release, but not until it had had a major effect on use of .biz URLs that still persists.

The .biz issue has since been resolved, but there’s a broader deliverability conclusion to draw from this story. While on a branding and image level you want your messages to stand out from all your competitors’ messages, on a technical level you want your mails to be similar to those of other legitimate mailers. That way, if there’s an oddity in a content filter that makes it classify your mail as spam it’ll likely be classifying lots of other legitimate mail as spam too, and be fixed fairly quickly (probably before it’s deployed into production).

That includes things like the way you use HTML and MIME, the way you register the domain names you use and the way you use them as URLs in messages and a bunch of other things. Being aware of the sort of things that content-filters like SpamAssassin look at is a good place to start.


DKIM “i=” vs “d=” and Reputation

This really should be part seven of a twelve part series or some such as it deals with an aspect of DKIM that’s really important, but is way down in the details of implementation. (dkim.org is a reasonable place to start for a general overview of DKIM).

There’s an apparently endless thread on the DKIM-SSP spec development mailing list at the moment about the differences between two fields in a DKIM signature, either of which could be used as the thing to tie a sender’s reputation to. Several ESP delivery folks asked me to explain what everyone was talking about, and this post is a first cut at that.

“i=” vs “d=”

There are two possible fields in a DKIM signature that could be used to identify the sender of a message, and so to tie a sender history and reputation record to. They are the so-called “i=” and “d=” fields, named for the syntax used to include them in the signature.
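To see where those two fields live, here is a minimal sketch that pulls the tag=value pairs out of a DKIM-Signature header value. The signature shown is a made-up example with placeholder hash and signature data, and the parser is simplified (it does not handle folded header lines or validate anything).

```python
def parse_dkim_tags(header_value: str) -> dict:
    """Split a DKIM-Signature header value into its tag=value pairs."""
    tags = {}
    for part in header_value.split(";"):
        part = part.strip()
        if not part:
            continue
        # Split on the first "=" only, so "=" padding in base64 values survives.
        name, _, value = part.partition("=")
        tags[name.strip()] = value.strip()
    return tags


# Hypothetical signature; bh= and b= hold placeholders, not real data.
example = (
    "v=1; a=rsa-sha256; d=example.com; i=newsletter@mail.example.com; "
    "s=selector1; h=from:to:subject; bh=PLACEHOLDER=; b=PLACEHOLDER=="
)
tags = parse_dkim_tags(example)
signing_domain = tags["d"]   # the "d=" field
identity = tags["i"]         # the "i=" field
```

In this example “d=” is example.com, while “i=” is an address at mail.example.com, a subdomain of it.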

Continue reading ‘DKIM “i=” vs “d=” and Reputation’
