<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Word to the Wise &#187; Technical</title>
	<atom:link href="http://blog.wordtothewise.com/tag/technical/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.wordtothewise.com</link>
	<description>Email, Delivery, Spam and more</description>
	<lastBuildDate>Tue, 07 Feb 2012 23:24:18 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.3</generator>
		<item>
		<title>Who leaked my address, and when?</title>
		<link>http://blog.wordtothewise.com/2011/07/who-leaked-when/</link>
		<comments>http://blog.wordtothewise.com/2011/07/who-leaked-when/#comments</comments>
		<pubDate>Thu, 07 Jul 2011 01:41:59 +0000</pubDate>
		<dc:creator>steve</dc:creator>
				<category><![CDATA[Industry]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[breaches]]></category>
		<category><![CDATA[Forensics]]></category>
		<category><![CDATA[iContact]]></category>
		<category><![CDATA[script]]></category>

		<guid isPermaLink="false">http://blog.wordtothewise.com/?p=3136</guid>
		<description><![CDATA[Providing tagged email addresses to vendors is fascinating, and at the same time disturbing. It lets me track what a particular email address is used for, but also to see where and when they&#8217;ve leaked to spammers. I&#8217;d really like to know who leaked an email address, and when. All my inbound mail is sorted [...]]]></description>
			<content:encoded><![CDATA[<p>Providing <a title="Tagged Email Addresses" href="http://blog.wordtothewise.com/2010/07/tagged-email-addresses/" target="_blank">tagged email addresses</a> to vendors is fascinating, and at the same time disturbing. It lets me track what a particular email address is used for, but also to see where and when they&#8217;ve leaked to spammers.</p>
<p>I&#8217;d really like to know who leaked an email address, and when.</p>
<p>All my inbound mail is sorted into &#8220;spam&#8221; and &#8220;not-spam&#8221; by a combination of SpamAssassin, some static sieve rules and a learning spam filter in my mail client. That makes it fairly easy for me to look at my &#8220;recent spam&#8221;. That&#8217;s a huge amount of data, though, something like 40,000 pieces of spam a month.</p>
<p>Finding the needle of interesting data in that haystack is going to take some automation. As I&#8217;ve <a title="Analysing a data breach - CheetahMail" href="http://blog.wordtothewise.com/2011/05/analysing-a-breach/" target="_blank">mentioned before</a> you can do quite a lot of useful work with a mix of some little perl scripts and some commandline tools.</p>
<p>I&#8217;m interested in the first time a tagged address started receiving spam, so I start off with a perl script that will take a directory full of emails, one per file, find the ones that were sent to a tagged address and print out that address and the time I received the email. I can&#8217;t rely on the Date: header, as that&#8217;s under the control of the spammer, and often bogus. But I can rely on the timestamp my server adds when it receives the email &#8211; and it records that in the first Received: header in the message.</p>

<div class="wp_syntax"><div class="code"><pre class="perl" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">#!/usr/bin/perl</span>
<span style="color: #000000; font-weight: bold;">use</span> strict<span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">use</span> Date<span style="color: #339933;">::</span><span style="color: #006600;">Parse</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #b1b100;">foreach</span> <span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$file</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">@</span><span style="color: #000000; font-weight: bold;">ARGV</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000066;">open</span> IF<span style="color: #339933;">,</span> <span style="color: #0000ff;">$file</span> <span style="color: #b1b100;">or</span> <span style="color: #000066;">die</span> <span style="color: #ff0000;">&quot;Failed to open '$file': $!<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">my</span> <span style="color: #0000ff;">@headers</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">while</span><span style="color: #009900;">&#40;</span><span style="color: #009999;">&lt;IF&gt;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #009966; font-style: italic;">s/[\r\n]//g</span><span style="color: #339933;">;</span>
        <span style="color: #b1b100;">last</span> <span style="color: #b1b100;">if</span> <span style="color: #009966; font-style: italic;">/^$/</span><span style="color: #339933;">;</span>
	<span style="color: #000066;">push</span> <span style="color: #0000ff;">@headers</span><span style="color: #339933;">,</span> <span style="color: #ff0000;">''</span> <span style="color: #b1b100;">unless</span> <span style="color: #009966; font-style: italic;">/^\s/</span><span style="color: #339933;">;</span>
	<span style="color: #0000ff;">$headers</span><span style="color: #009900;">&#91;</span><span style="color: #0000ff;">$#headers</span><span style="color: #009900;">&#93;</span> <span style="color: #339933;">.=</span> <span style="color: #0000ff;">$_</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$date</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$timestamp</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">foreach</span> <span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$header</span> <span style="color: #009900;">&#40;</span><span style="color: #0000ff;">@headers</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	<span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">$header</span> <span style="color: #339933;">=~</span> <span style="color: #009966; font-style: italic;">/^Received:.*;([^(]+)/</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	    <span style="color: #0000ff;">$date</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">$1</span><span style="color: #339933;">;</span>
	    <span style="color: #0000ff;">$timestamp</span> <span style="color: #339933;">=</span> str2time<span style="color: #009900;">&#40;</span><span style="color: #0000ff;">$1</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
	    <span style="color: #b1b100;">last</span><span style="color: #339933;">;</span>
	<span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span>
    <span style="color: #666666; font-style: italic;"># Replace this regex with something that </span>
    <span style="color: #666666; font-style: italic;"># matches your tagged addresses</span>
    <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #000066;">join</span><span style="color: #009900;">&#40;</span><span style="color: #ff0000;">' '</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">@headers</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">=~</span> 
                 <span style="color: #009966; font-style: italic;">/(foo\+[a-z0-9]+\@[a-z.-]+)/</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	<span style="color: #000066;">print</span> <span style="color: #ff0000;">&quot;$timestamp $1 $date<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Dates and times are annoying to work with on the command line, so I also use the perl Date::Parse module to convert the timestamp in the received header into epoch time &#8211; the number of seconds since January 1st, 1970. I use some unix commandline magic to run this against my two spam mailboxes and dump the results in a file.</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">find</span> spamassassin<span style="color: #000000; font-weight: bold;">/</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">xargs</span> stamp-address.pl <span style="color: #000000; font-weight: bold;">&gt;&gt;</span>junk.txt
<span style="color: #c20cb9; font-weight: bold;">find</span> junk<span style="color: #000000; font-weight: bold;">/</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">xargs</span> stamp-address.pl <span style="color: #000000; font-weight: bold;">&gt;&gt;</span> junk.txt</pre></div></div>

<p>The end result is one line per email, with the epoch time, the tagged email address and the original format of the date and time. Something like this:</p>

<div class="wp_syntax"><div class="code"><pre class="none" style="font-family:monospace;">1300731078 cpan-tag@addr  Mon, 21 Mar 2011 11:11:18 -0700 
1300731122 vmware-tag@addr Mon, 21 Mar 2011 11:12:02 -0700 
1300731122 vmware-tag@addr Mon, 21 Mar 2011 11:12:02 -0700 
1300732902 unicorn-tag@addr Mon, 21 Mar 2011 11:41:42 -0700</pre></div></div>

<p>Next, I want to find the first occurrence of each tagged address.</p>

<div class="wp_syntax"><div class="code"><pre class="perl" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">#!/usr/bin/perl</span>
<span style="color: #000000; font-weight: bold;">use</span> strict<span style="color: #339933;">;</span>
&nbsp;
<span style="color: #b1b100;">my</span> <span style="color: #0000ff;">%seen</span><span style="color: #339933;">;</span>
<span style="color: #b1b100;">while</span><span style="color: #009900;">&#40;</span><span style="color: #339933;">&lt;&gt;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000066;">chomp</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">my</span> <span style="color: #009900;">&#40;</span><span style="color: #0000ff;">$stamp</span><span style="color: #339933;">,</span> <span style="color: #0000ff;">$address</span><span style="color: #009900;">&#41;</span> <span style="color: #339933;">=</span> <span style="color: #000066;">split</span> <span style="color: #009966; font-style: italic;">/ /</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">unless</span><span style="color: #009900;">&#40;</span><span style="color: #000066;">exists</span> <span style="color: #0000ff;">$seen</span><span style="color: #009900;">&#123;</span><span style="color: #0000ff;">$address</span><span style="color: #009900;">&#125;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
	<span style="color: #000066;">print</span> <span style="color: #ff0000;">&quot;$_<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span>
	<span style="color: #0000ff;">$seen</span><span style="color: #009900;">&#123;</span><span style="color: #0000ff;">$address</span><span style="color: #009900;">&#125;</span> <span style="color: #339933;">=</span> <span style="color: #cc66cc;">1</span><span style="color: #339933;">;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>I sort the list of addresses numerically, then use this script to display the first time each email address received spam:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;"><span style="color: #c20cb9; font-weight: bold;">sort</span> <span style="color: #660033;">-n</span> <span style="color: #000000; font-weight: bold;">&lt;</span>junk.txt <span style="color: #000000; font-weight: bold;">|</span> findfirst.pl</pre></div></div>
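<p>For comparison, the same sort-then-keep-first step can be sketched as a single pipeline with awk instead of a separate perl script. This is only an illustration: the function name first_seen is made up here, and it assumes the same line format as junk.txt (epoch time, then address, separated by whitespace).</p>

```shell
#!/bin/sh
# Keep only the earliest line for each address (field 2).
# sort -n orders lines by the leading epoch time; the awk filter then
# prints a line only the first time its address is seen.
first_seen() {
    sort -n | awk '!seen[$2]++'
}

# Sample lines in the same format as junk.txt, deliberately out of order.
first_seen <<'EOF'
1300731122 vmware-tag@addr Mon, 21 Mar 2011 11:12:02 -0700
1300731078 cpan-tag@addr Mon, 21 Mar 2011 11:11:18 -0700
1300732902 unicorn-tag@addr Mon, 21 Mar 2011 11:41:42 -0700
1300731122 vmware-tag@addr Mon, 21 Mar 2011 11:12:02 -0700
EOF
```

<p>Run against the sample it should print one line each for cpan, vmware and unicorn, in time order.</p>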

<p>That reduces the amount of data enough that I can look at it by hand. What did I find? Several interesting things, but I&#8217;m just going to mention one here.</p>

<div class="wp_syntax"><div class="code"><pre class="none" style="font-family:monospace;">1299111914 casemate-tag@addr Wed, 2 Mar 2011 16:25:14 -0800 
1307104954 dell-tag@addr Fri, 3 Jun 2011 05:42:34 -0700 
1307104986 codefast-tag@addr Fri, 3 Jun 2011 05:43:06 -0700</pre></div></div>

<p>Casemate and Codefast have only ever mailed me via iContact, so given <a href="http://blog.wordtothewise.com/2011/06/new-security-focused-services/" target="_blank">iContact&#8217;s history</a> it seems likely that those leaks were via iContact.</p>
<p>Dell, on the other hand, have mailed me directly and through several ESPs &#8211; and I don&#8217;t recall them using iContact. Looking at the timestamps (and the content of the spams) it&#8217;s clear that the Dell and Codefast tagged addresses were both sent spam for the first time as part of the same spamrun &#8211; so it&#8217;s almost certain that they leaked at the same time.</p>
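<p>Spotting addresses that were first hit in the same spam run can also be roughed out mechanically. The sketch below puts a blank line between first-seen entries that are more than a given number of seconds apart. The function name group_runs and the 300 second window are assumptions made for illustration, not something taken from the analysis above, and timestamps alone say nothing about the content of the spam.</p>

```shell
#!/bin/sh
# Print a blank line wherever consecutive timestamps (field 1) are more
# than "window" seconds apart, so each burst forms its own group.
# Input must already be sorted by time. The default of 300 seconds is
# an arbitrary assumption.
group_runs() {
    awk -v window="${1:-300}" '
        NR > 1 && $1 - prev > window { print "" }
        { print; prev = $1 }
    '
}

# The three first-seen lines shown above.
group_runs 300 <<'EOF'
1299111914 casemate-tag@addr Wed, 2 Mar 2011 16:25:14 -0800
1307104954 dell-tag@addr Fri, 3 Jun 2011 05:42:34 -0700
1307104986 codefast-tag@addr Fri, 3 Jun 2011 05:43:06 -0700
EOF
```

<p>With this input the Dell and Codefast lines, which are 32 seconds apart, end up in one group and Casemate in another.</p>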
<p>Looking for iContact&#8217;s bounce domain (icpbounce.com) in my mailbox I do find that Dell used them briefly, on May 4th. So that&#8217;s pretty compelling evidence that iContact leaked all three addresses. (Which means my <a href="http://blog.wordtothewise.com/2011/06/the-real-stor/" target="_blank">previous theory</a> about Dell customer addresses leaking, based on misleading statements from Intervision, was wrong.)</p>
<p>There&#8217;s another thing that&#8217;s interesting&#8230; iContact has had a history of email breaches. The data I have here (and it&#8217;s matched by a couple of older data points, if I recall correctly) shows spam being sent to newly leaked addresses on the 2nd or 3rd of the month.</p>
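<p>A quick way to look for that kind of day-of-month pattern is to tally the first-seen lines by the day in the Received date. This is a minimal sketch under the assumption that every line looks like the output above, so that the day of the month is the fourth whitespace-separated field. The function name tally_days is hypothetical, and three data points are obviously far too few to prove anything.</p>

```shell
#!/bin/sh
# Count first-seen lines by day of month. In lines formatted as
# "epoch address Wed, 2 Mar 2011 ..." the day is field 4.
# Output is "day count", sorted by day.
tally_days() {
    awk '{ count[$4]++ } END { for (day in count) print day, count[day] }' | sort -n
}

tally_days <<'EOF'
1299111914 casemate-tag@addr Wed, 2 Mar 2011 16:25:14 -0800
1307104954 dell-tag@addr Fri, 3 Jun 2011 05:42:34 -0700
1307104986 codefast-tag@addr Fri, 3 Jun 2011 05:43:06 -0700
EOF
```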
<p>I wonder if iContact does a batch export to a subcontractor, or an offsite backup or something similar on the first of each month?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.wordtothewise.com/2011/07/who-leaked-when/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Analysing a data breach &#8211; CheetahMail</title>
		<link>http://blog.wordtothewise.com/2011/05/analysing-a-breach/</link>
		<comments>http://blog.wordtothewise.com/2011/05/analysing-a-breach/#comments</comments>
		<pubDate>Thu, 05 May 2011 22:33:41 +0000</pubDate>
		<dc:creator>steve</dc:creator>
				<category><![CDATA[Industry]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[breach]]></category>
		<category><![CDATA[Forensics]]></category>

		<guid isPermaLink="false">http://blog.wordtothewise.com/?p=2885</guid>
		<description><![CDATA[I often find myself having to analyze volumes of email, looking for common factors, source addresses, URLs and so on as part of some &#8220;forensics&#8221; work, analyzing leaked emails or received spam for use as evidence in a case. For large volumes of mail where I might want to dig down in a lot of [...]]]></description>
			<content:encoded><![CDATA[<p>I often find myself having to analyze volumes of email, looking for common factors, source addresses, URLs and so on as part of some &#8220;forensics&#8221; work, analyzing leaked emails or received spam for use as evidence in a case.</p>
<p>For large volumes of mail where I might want to dig down in a lot of detail or generate graphical or statistical reports I tend to use <a title="Abacus" href="http://wordtothewise.com/products/abacus.html" target="_blank">Abacus</a> to slurp in and analyze all the emails, store them in a SQL database in an easy to handle format and then do the ad-hoc work from a SQL commandline. For smaller work, though, you can get a long way with unix commandline tools and some basic perl scripting.</p>
<p>This morning I received Ukrainian bride spam to a <a title="Tagged Email Addresses" href="http://blog.wordtothewise.com/2010/07/tagged-email-addresses/" target="_blank">tagged address</a> that I&#8217;d only given to one vendor, RedEnvelope, so that address has leaked to criminal spammers from somewhere. Looking at a couple of RedEnvelope&#8217;s emails I see they&#8217;re sending from a number of sources, so I decided to dig a little deeper.</p>
<p>I started by searching for all emails to that tagged address in my mail client, then copied all the matching emails to a newly created folder. Then I took a copy of that folder and split it into one file per email using a shell one-liner:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">formail <span style="color: #660033;">-ds</span> <span style="color: #c20cb9; font-weight: bold;">sh</span> <span style="color: #660033;">-c</span> <span style="color: #ff0000;">'cat &gt;msg.$FILENO'</span> <span style="color: #000000; font-weight: bold;">&lt;</span>mbox</pre></div></div>

<p>I&#8217;m interested in the IP address they were sent from, so I write a tiny perl script, getips.pl, that&#8217;ll look for the first Received line, and print out the IP address:</p>

<div class="wp_syntax"><div class="code"><pre class="perl" style="font-family:monospace;"><span style="color: #666666; font-style: italic;">#!/usr/bin/perl</span>
&nbsp;
<span style="color: #b1b100;">foreach</span> <span style="color: #b1b100;">my</span> <span style="color: #0000ff;">$file</span> <span style="color: #009900;">&#40;</span><span style="color: #339933;">@</span><span style="color: #000000; font-weight: bold;">ARGV</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
    <span style="color: #000066;">open</span> IF<span style="color: #339933;">,</span> <span style="color: #0000ff;">$file</span> <span style="color: #b1b100;">or</span> <span style="color: #000066;">die</span><span style="color: #339933;">;</span>
    <span style="color: #b1b100;">while</span><span style="color: #009900;">&#40;</span><span style="color: #009999;">&lt;IF&gt;</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
        <span style="color: #000066;">chomp</span><span style="color: #339933;">;</span>
        <span style="color: #b1b100;">last</span> <span style="color: #b1b100;">if</span> <span style="color: #009966; font-style: italic;">/^$/</span><span style="color: #339933;">;</span>
        <span style="color: #b1b100;">if</span><span style="color: #009900;">&#40;</span><span style="color: #009966; font-style: italic;">/^Received:.*\[(\d+\.\d+\.\d+\.\d+)\]/</span><span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
            <span style="color: #000066;">print</span> <span style="color: #ff0000;">&quot;$1<span style="color: #000099; font-weight: bold;">\n</span>&quot;</span><span style="color: #339933;">;</span>
            <span style="color: #b1b100;">last</span><span style="color: #339933;">;</span>
        <span style="color: #009900;">&#125;</span>
    <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>I use it, along with the standard tools &#8220;sort&#8221; and &#8220;uniq&#8221; to summarize the sending IP addresses:</p>

<div class="wp_syntax"><div class="code"><pre class="bash" style="font-family:monospace;">.<span style="color: #000000; font-weight: bold;">/</span>getips.pl msg.<span style="color: #000000; font-weight: bold;">*</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">sort</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">uniq</span> <span style="color: #660033;">-c</span> <span style="color: #000000; font-weight: bold;">|</span> <span style="color: #c20cb9; font-weight: bold;">sort</span> <span style="color: #660033;">-nr</span></pre></div></div>

<p>That takes the list of IP addresses generated by the perl script, then sorts them (so identical IP addresses are adjacent to each other), then counts how many times each distinct IP address is found, then sorts them most to fewest. If you want to see how it does that, play around with the command line, removing commands off the end one-by-one to see the intermediate data it produces.</p>
<p>The result looks like this:</p>
<pre>
  27 208.49.63.243
  20 208.49.63.242
  20 208.49.63.240
  15 208.49.63.245
  15 208.49.63.241
  11 208.49.63.244
   3 38.107.108.146
   2 38.107.108.149
   2 209.112.253.83
   1 89.230.132.139
   1 38.107.108.144
   1 209.112.253.90
   1 209.112.253.85
</pre>
<p>The IP address beginning with 89 is the Ukrainian bride spam itself. Of the remainder it&#8217;s easy to see that they come from three main groups of addresses &#8211; 208.49.63.0/24, 38.107.108.0/24 and 209.112.253.0/24.</p>
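<p>With only thirteen lines the grouping is easy to do by eye, but for a longer list it could be done with awk. The sketch below takes the &#8220;count address&#8221; lines from the summary and adds up the counts for each /24. The function name sum_by_24 is made up for this example, and it assumes plain dotted-quad IPv4 addresses.</p>

```shell
#!/bin/sh
# Sum the per-address counts into per-/24 totals.
# Input: "count ip" lines. Output: "total network/24", largest first.
sum_by_24() {
    awk '{
        split($2, octet, ".")
        net = octet[1] "." octet[2] "." octet[3] ".0/24"
        total[net] += $1
    } END {
        for (net in total) print total[net], net
    }' | sort -nr
}

# The summary shown above.
sum_by_24 <<'EOF'
27 208.49.63.243
20 208.49.63.242
20 208.49.63.240
15 208.49.63.245
15 208.49.63.241
11 208.49.63.244
3 38.107.108.146
2 38.107.108.149
2 209.112.253.83
1 89.230.132.139
1 38.107.108.144
1 209.112.253.90
1 209.112.253.85
EOF
```

<p>For this data it should report 108 messages from 208.49.63.0/24, 6 from 38.107.108.0/24, 4 from 209.112.253.0/24 and the single message from 89.230.132.0/24.</p>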
<p>The 208 and 209 ranges are both IP space owned by RedEnvelope, so that was mail sent by them directly (both transactional and advertising mail). The 38 range is space owned by the ESP CheetahMail. (I&#8217;ll go into how I worked all that out in a future post.)</p>
<p>What does this mean? It means that either a Ukrainian bride spammer just happened to guess the email address I&#8217;d created solely to give to RedEnvelope (incredibly unlikely in this particular case, due to how I handle tagged addresses) or they stole it from somewhere. There are four places they could plausibly have stolen it from:</p>
<ol>
<li>My mail client, by compromising my laptop</li>
<li>My mail server, by compromising the machine or the staff who run it</li>
<li>RedEnvelope, by compromising their servers, employees or employee machines</li>
<li>CheetahMail, by compromising their servers, employees or employee machines</li>
</ol>
<p>I run my own mailserver, so I know exactly which email addresses it handles. There are many hundreds or thousands of email addresses which it handles, and to which mail has been sent legitimately (as well as countless billions of addresses that it would accept email to if you made them up, but which have never been used). If it had been compromised in any way, I would expect many of those email addresses to be sent spam as part of this spam run (it&#8217;s not at all unusual for 40 or 50 of my email addresses to receive copies of any given spam). But only the RedEnvelope-specific email address received the Ukrainian bride spam.</p>
<p>Similarly for my laptop. It has hundreds of email addresses in its mailboxes. If it had been compromised, I&#8217;d have expected to see this spam sent to many of those email addresses, and I don&#8217;t. If someone had stolen multiple email addresses of mine, I&#8217;d expect them to be sending spam to all of them, unless they were doing something clever and deceptive like spear-phishing &#8211; and Ukrainian bride spam isn&#8217;t clever, subtle or targeted.</p>
<p>That leaves just RedEnvelope or CheetahMail as likely sources of the stolen address. Conveniently, Laura also has an account with RedEnvelope, and also uses a tagged address with them. She&#8217;s seen no spam at all to her RedEnvelope-specific address. Doing the same analysis with the legitimate RedEnvelope mail she&#8217;s received to that address I get this:</p>
<pre>
   2 209.112.253.90
   2 209.112.253.85
   2 208.49.63.242
   2 208.49.63.241
   1 209.112.253.83
   1 208.49.63.245
</pre>
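<p>Comparing the two summaries can be done with the standard &#8220;comm&#8221; tool as well. This sketch reduces each list to its /24 prefixes and prints the prefixes that only show up in my mail. The helper name prefixes and the temporary file names are assumptions for the example.</p>

```shell
#!/bin/sh
# Reduce "count ip" lines to a sorted, de-duplicated list of the first
# three octets. LC_ALL=C keeps the sort order consistent for comm.
prefixes() {
    awk '{ print $2 }' | cut -d. -f1-3 | LC_ALL=C sort -u
}

tmp=$(mktemp -d)

# My summary.
prefixes > "$tmp/mine" <<'EOF'
27 208.49.63.243
20 208.49.63.242
20 208.49.63.240
15 208.49.63.245
15 208.49.63.241
11 208.49.63.244
3 38.107.108.146
2 38.107.108.149
2 209.112.253.83
1 89.230.132.139
1 38.107.108.144
1 209.112.253.90
1 209.112.253.85
EOF

# Laura's summary.
prefixes > "$tmp/hers" <<'EOF'
2 209.112.253.90
2 209.112.253.85
2 208.49.63.242
2 208.49.63.241
1 209.112.253.83
1 208.49.63.245
EOF

# comm -23 prints only the lines that are unique to the first file.
only_mine=$(LC_ALL=C comm -23 "$tmp/mine" "$tmp/hers")
echo "$only_mine"

rm -rf "$tmp"
```

<p>The two prefixes it prints, 38.107.108 and 89.230.132, are the CheetahMail range and the spam itself.</p>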
<p>There are only three significant differences between Laura&#8217;s account and mine. Mine was created in June 2010, while hers was created in December. Mine has been emailed via CheetahMail, hers hasn&#8217;t. And mine received Ukrainian bride spam, hers didn&#8217;t.</p>
<p>One possibility is that RedEnvelope were compromised prior to December, so only my address was taken. But if that were the case I&#8217;d have expected to see that address misused before today. It&#8217;s possible, but not the most likely explanation.</p>
<p>More likely is that CheetahMail were compromised some time in the past few days, and the email address was stolen from there.</p>
<p>This isn&#8217;t conclusive proof by any means, but if I were RedEnvelope or CheetahMail I&#8217;d be looking very closely at other reports of stolen addresses, to see if there are patterns of theft from RedEnvelope lists sent across multiple ESPs or compromises of data from multiple CheetahMail customers.</p>
<p>&nbsp;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.wordtothewise.com/2011/05/analysing-a-breach/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Character encoding</title>
		<link>http://blog.wordtothewise.com/2011/05/character-encoding/</link>
		<comments>http://blog.wordtothewise.com/2011/05/character-encoding/#comments</comments>
		<pubDate>Tue, 03 May 2011 23:36:58 +0000</pubDate>
		<dc:creator>steve</dc:creator>
				<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[mime]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://blog.wordtothewise.com/?p=2866</guid>
		<description><![CDATA[This morning, someone asked an interesting question. Last time I worked with the actual HTML design of emails (a long time ago), &#60;head&#62; was not really needed. Is this still true for the most part? Any reason why you still want to include &#60;head&#62; + meta, title tags in emails nowadays? There are several bits [...]]]></description>
			<content:encoded><![CDATA[<p>This morning, someone asked an interesting question.</p>
<blockquote><p>Last time I worked with the actual HTML design of emails (a long time ago), &lt;head&gt; was not really needed. Is this still true for the most part? Any reason why you still want to include &lt;head&gt; + meta, title tags in emails nowadays?</p></blockquote>
<p>There are several bits of information in the &lt;head&gt; part of an HTML document that can affect the rendering of it &#8211; there&#8217;s the doctype, which will control the html rendering model, there&#8217;s often some css which will control the styling, and there&#8217;s often a meta tag that states what character set is used in the document.</p>
<p>That last one is interesting in the case of a piece of HTML that&#8217;s being sent as part of a MIME email &#8211; as MIME already has a perfectly good way of specifying the character set a message has, as part of the Content-Type header. I looked at a few bulk messages I&#8217;d received recently and, sure enough, most of them include the &lt;head&gt; section, and have a meta tag in there that defines the character set. All of them have a character set defined in the Content-Type header. Sometimes those character sets didn&#8217;t match:</p>
<blockquote><p>Content-Type: text/html; charset=us-ascii<br />
Content-Transfer-Encoding: 7Bit</p>
<p>&lt;html&gt;<br />
&lt;head&gt;<br />
&lt;title&gt;&lt;/title&gt;<br />
&lt;meta http-equiv=&quot;Content-Type&quot; content=&quot;text/html; charset=windows-1252&quot;&gt;<br />
&lt;meta name=&quot;title&quot; content=&quot;New CS5.5 Web Premium&quot; /&gt;<cite>a snippet from this morning&#8217;s email</cite></p></blockquote>
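<p>A mismatch like the one in that snippet can be found with a couple of sed one-liners. This is a rough sketch, not a real MIME parser: it assumes a single-part message stored in one file, a Content-Type header on one unfolded line, and header and tag names written in exactly this case. The function names mime_charset and meta_charset are invented for the example.</p>

```shell
#!/bin/sh
# Charset from the MIME Content-Type header. Only the header block
# (everything up to the first blank line) is searched.
mime_charset() {
    sed -n '1,/^$/ s/^Content-Type:.*charset="\{0,1\}\([^;"[:space:]]*\).*/\1/p' "$1"
}

# Charset from an HTML meta tag anywhere in the file.
meta_charset() {
    sed -n 's/.*<meta[^>]*charset="\{0,1\}\([^">[:space:]]*\).*/\1/p' "$1"
}

# A cut-down copy of the snippet quoted above.
msg=$(mktemp)
cat > "$msg" <<'EOF'
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 7Bit

<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
</head>
EOF

mime=$(mime_charset "$msg")
meta=$(meta_charset "$msg")
if [ "$mime" != "$meta" ]; then
    echo "mismatch: MIME says $mime, HTML meta says $meta"
fi

rm -f "$msg"
```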
<p>What happens when they don&#8217;t match? I don&#8217;t think it&#8217;s defined anywhere. Time for some empirical testing.</p>
<p><strong>Testing! For Science!</strong></p>
<p>I needed to create some test emails which would be visibly different depending on which character set the mail client decided to use. I picked out two character sets &#8211; ISO-8859-15 and ISO-8859-16, as they differ from each other and from ISO-8859-1 enough that I could differentiate them just by the way two characters were rendered.</p>
<p>The byte 0xfd renders as e-with-a-tail (&#281;) in ISO-8859-16 and as y-acute (&yacute;) in the other two character sets, while 0xa4 renders as the generic currency symbol (&curren;) in ISO-8859-1 and as a euro symbol (&euro;) in the other two. I included the characters in two different ways in each test message &#8211; once as a raw character in the body of the message (=a4 or =fd in quoted-printable format), and once as a numeric HTML entity (&amp;#164; or &amp;#253;).</p>
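<p>Those byte values can be checked from the command line with iconv, assuming the local iconv knows all three character sets (not every build includes ISO-8859-16). The sketch below converts a single raw byte to UTF-8 and prints the resulting bytes in hex, so the answer does not depend on the terminal&#8217;s own encoding. The function name decode_byte is made up, and the byte is passed in octal because that is the escape form POSIX printf supports.</p>

```shell
#!/bin/sh
# Decode one raw byte (given as three octal digits) from the named
# character set and print the UTF-8 encoding of the result in hex.
decode_byte() {
    printf "\\$1" | iconv -f "$2" -t UTF-8 | od -An -tx1 | tr -d ' \n'
}

# 0xa4 is octal 244, 0xfd is octal 375.
for cs in ISO-8859-1 ISO-8859-15 ISO-8859-16; do
    echo "$cs: 0xa4 -> $(decode_byte 244 "$cs"), 0xfd -> $(decode_byte 375 "$cs")"
done
```

<p>In the output c2a4 is the currency symbol, e282ac is the euro symbol, c3bd is y-acute and c499 is the e-with-a-tail.</p>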
<p>This is what I found:</p>
<table>
<tbody>
<tr>
<th>Mail client</th>
<th>Mime charset</th>
<th>HTML meta charset</th>
<th>Raw character</th>
<th>HTML entity</th>
</tr>
<tr>
<td>Mail.app</td>
<td>-15</td>
<td>-16</td>
<td>-15</td>
<td>-1</td>
</tr>
<tr>
<td>Gmail</td>
<td>-15</td>
<td>-16</td>
<td>-15</td>
<td>-1</td>
</tr>
<tr>
<td>Mail.app</td>
<td>-16</td>
<td>-15</td>
<td>-16</td>
<td>-1</td>
</tr>
<tr>
<td>Gmail</td>
<td>-16</td>
<td>-15</td>
<td>-1</td>
<td>-1</td>
</tr>
<tr>
<td>Mail.app</td>
<td>-15</td>
<td>none</td>
<td>-15</td>
<td>-1</td>
</tr>
<tr>
<td>Gmail</td>
<td>-15</td>
<td>none</td>
<td>-1</td>
<td>-1</td>
</tr>
<tr>
<td>Mail.app</td>
<td>none</td>
<td>-16</td>
<td>broke</td>
<td>-1</td>
</tr>
<tr>
<td>Gmail</td>
<td>none</td>
<td>-16</td>
<td>-1</td>
<td>-1</td>
</tr>
<tr>
<td>Mail.app</td>
<td>us-ascii</td>
<td>-16</td>
<td>broke</td>
<td>-1</td>
</tr>
<tr>
<td>Gmail</td>
<td>us-ascii</td>
<td>-16</td>
<td>-1</td>
<td>-1</td>
</tr>
</tbody>
</table>
<p>&nbsp;</p>
<p>There are several things to see from this data. The simple one first &#8211; regardless of which character set I declared, and where I declared it, both mail clients rendered characters written as HTML numeric entities (&#8220;&amp;#164;&#8221;) consistently as if they were ISO-8859-1. (This isn&#8217;t really a surprise, as the HTML specs define numeric entities as Unicode code points, and the first 256 of those match ISO-8859-1.)</p>
<p>Raw characters were much less consistent. Mail.app consistently used the character set declared in the MIME Content-Type header when it was set to something reasonable, and ignored the encoding in the HTML meta tag. Giving it an unreasonable character set in the Content-Type header caused it to render 0xfd as a double dagger (‡), which makes no sense at all in any character set I can find. Gmail managed to render the raw character in ISO-8859-15 correctly, but gave up and fell back to using ISO-8859-1 for everything else.</p>
<p><strong>Conclusions</strong></p>
<p>There are a few things we can conclude from this, I think, even though it really needs some comparisons with different mail clients, and some testing with other character sets (including Unicode and some of the Asian sets).</p>
<ol>
<li>Don&#8217;t bother with putting HTML meta content-type tags in your HTML</li>
<li>Send your text/html parts as plain 7-bit ASCII, using HTML entities for non-ASCII characters</li>
<li>It might be less confusing to use named entities such as &amp;copy; rather than numeric ones such as &amp;#169;</li>
<li>If you&#8217;re generating numeric entities from user-generated input, be wary of input that&#8217;s not ISO-8859-1 or Windows-1252</li>
<li>Character set conversion is hard, let&#8217;s go Unicode</li>
</ol>
<p>I&#8217;ve made the test emails I used <a href="http://wordtothewise.com/files/charsetemails.zip">available for download</a>. From a unix prompt, with <a title="swaks" href="http://jetmore.org/john/code/swaks/" target="_blank">swaks</a> installed, you can send them like this:</p>
<p><tt>for i in charset*.eml ; do swaks --to your@email.address --from your@email.address --server your.email.server --data - &lt; "$i"; done</tt></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.wordtothewise.com/2011/05/character-encoding/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Defending against the hackers of 1995</title>
		<link>http://blog.wordtothewise.com/2011/04/defending-against-the-hackers-of-1995/</link>
		<comments>http://blog.wordtothewise.com/2011/04/defending-against-the-hackers-of-1995/#comments</comments>
		<pubDate>Fri, 29 Apr 2011 17:06:36 +0000</pubDate>
		<dc:creator>steve</dc:creator>
				<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Industry]]></category>
		<category><![CDATA[2fa]]></category>
		<category><![CDATA[Authentication]]></category>
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://blog.wordtothewise.com/?p=2850</guid>
		<description><![CDATA[Passwords are convenient for the end user, but it&#8217;s too easy to lose control of them. People share them with other people. People write them down, where they can be read. People send them in email, and that email is easily intercepted. People&#8217;s web browsers store the passwords, so they can log in automatically. Worst [...]]]></description>
			<content:encoded><![CDATA[<p>Passwords are convenient for the end user, but it&#8217;s too easy to lose control of them. People share them with other people. People write them down, where they can be read. People send them in email, and that email is easily intercepted. People&#8217;s web browsers store the passwords, so they can log in automatically. Worst of all, perhaps, people tend to use the same username and password at many different websites. If just one of those websites is compromised (or even run as a password collecting scam) then those passwords can be used to attack accounts at all of the others.</p>
<p>Two factor authentication that uses an uncopyable physical device (such as a cellphone or a security token) as a second factor mitigates most of these threats very effectively. Weaker two factor authentication using digital certificates is a little easier to misuse (as the user can share the certificate with others, or have it copied without them noticing) but still a lot better than a password.</p>
<p><strong>Security problems solved, then?</strong></p>
<blockquote><p>Two-factor authentication isn&#8217;t our savior. It won&#8217;t defend against phishing. It&#8217;s not going to prevent identity theft. It&#8217;s not going to secure online accounts from fraudulent transactions. It solves the security problems we had 10 years ago, not the security problems we have today.<cite><a href="http://www.schneier.com/essay-083.html">Bruce Schneier, April 2005</a></cite></p></blockquote>
<p>Password stealing attacks are still a risk &#8211; especially use of the same password on different services &#8211; but they&#8217;re not the main thrust of modern attacks, and haven&#8217;t been for years. Rather we&#8217;re seeing <a href="http://en.wikipedia.org/wiki/Man-in-the-middle_attack" target="_blank">man-in-the-middle attacks</a> and trojan attacks &#8211; these can be used very effectively as part of a targeted attack initiated by phishing or social engineering.</p>
<p>One form of a man-in-the-middle attack is to create a fake website that looks like your real website, and then to entice one of your users to go to the fake website instead of the real one. Your user then enters their password and the second factor from their securid fob, and the attacker uses that to log in to your website. Done well, the user will never notice &#8211; the attacker either gives them a fake error message and redirects them to your real login page or tunnels their transactions through to your website while also piggybacking their own transactions at the same time.</p>
<p>A trojan attack is similar, but the man-in-the-middle is hostile code actually running on the user&#8217;s computer.</p>
<p><strong>Not just a theoretical attack</strong></p>
<p>This isn&#8217;t just a theoretical attack. It&#8217;s fairly widespread, and probably underreported. One example from a couple of years ago is use of a trojan to <a href="http://www.technologyreview.com/computing/23488/?a=f" target="_blank">steal half a million dollars</a> from a local company, despite their bank&#8217;s use of one-time-password, securid style two factor authentication. <a href="http://blog.washingtonpost.com/securityfix/2006/07/citibank_phish_spoofs_2factor_1.html" target="_blank">Here&#8217;s another</a>.</p>
<p>The accounts an ESP is protecting likely aren&#8217;t worth half a million dollars, so maybe bank-grade two factor authentication is good enough for them?</p>
<p>Another heavy user of two factor authentication is the online game World of Warcraft. They use a physical security fob or a smartphone app to generate one time passwords.</p>
<p><a href="http://blog.wordtothewise.com/wp-content/uploads/2011/04/wowsmart.png"><img class="aligncenter size-full wp-image-2852" title="wowsmart" src="http://blog.wordtothewise.com/wp-content/uploads/2011/04/wowsmart.png" alt="" width="200" height="230" /></a>As we&#8217;ve <a title="Targeted attacks via email – phishing for WoW gold" href="http://blog.wordtothewise.com/2011/04/targeted-attacks-phishing/" target="_blank">mentioned before</a> there&#8217;s a black market in stolen World of Warcraft accounts. They&#8217;re typically worth $8-$10 in bulk. And they&#8217;re being <a href="http://www.incgamers.com/News/21240/first-blizzard-authenticator-hack-confirmed" target="_blank">targeted by a key-logging trojan</a> that intercepts the authentication data and passes it to the attacker, who then can take control of the account until they log out.</p>
<p>That means it can be cost-effective for an attacker to use a reasonably sophisticated keylogger trojan to take control of an account worth $10 for a couple of hours, which is bad news if you&#8217;re relying on your customers&#8217; accounts not being that high value a target.</p>
<p><strong>What value does 2FA have, then?</strong></p>
<blockquote><p>it won&#8217;t work for remote authentication over the Internet. I predict that banks and other financial institutions will spend millions outfitting their users with two-factor authentication tokens. Early adopters of this technology may very well experience a significant drop in fraud for a while as attackers move to easier targets, but in the end there will be a negligible drop in the amount of fraud and identity theft.<cite><a href="http://www.schneier.com/blog/archives/2005/03/the_failure_of.html">Bruce Schneier, 2005</a></cite></p></blockquote>
<p>2FA is a decent way to improve password security. It&#8217;s easier and cheaper to require some form of 2FA than it is to train your users to use good passwords, and not to reuse passwords. And it can be part of a decent security approach &#8211; though the inconvenience and support overhead might exceed its value. But focusing on 2FA as a security solution won&#8217;t protect you from most current attack vectors, and can distract you and consume resources you could better spend on more effective approaches.</p>
<blockquote><p>By concentrating on authenticating the individual rather than authenticating the transaction, banks are forced to defend against criminal tactics rather than the crime itself.<cite><a href="http://www.schneier.com/blog/archives/2005/04/more_on_twofact.html">Bruce Schneier, 2005</a></cite></p></blockquote>
<p>But two factor authentication is a <em>great</em> way to deal with some non-security related business problems, such as sharing of &#8220;flat fee&#8221; accounts by multiple users.</p>
<p>Two factor authentication is <em>not</em> a magic bullet for ESP security, and if it distracts you from implementing more effective (behaviour-based, rather than authentication based) security approaches then that narrow focus risks making your overall security worse.</p>
<p>Unless, that is, you&#8217;re defending solely against security threats from 1995.</p>
<p><a href="http://www.imdb.com/title/tt0113243/"><img class="aligncenter size-full wp-image-2854" title="Hackers" src="http://blog.wordtothewise.com/wp-content/uploads/2011/04/hackers.png" alt="" width="380" height="253" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.wordtothewise.com/2011/04/defending-against-the-hackers-of-1995/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>What is Two Factor Authentication?</title>
		<link>http://blog.wordtothewise.com/2011/04/what-is-two-factor-authentication/</link>
		<comments>http://blog.wordtothewise.com/2011/04/what-is-two-factor-authentication/#comments</comments>
		<pubDate>Thu, 28 Apr 2011 20:53:36 +0000</pubDate>
		<dc:creator>steve</dc:creator>
				<category><![CDATA[Industry]]></category>
		<category><![CDATA[2fa]]></category>
		<category><![CDATA[Authentication]]></category>
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://blog.wordtothewise.com/?p=2840</guid>
		<description><![CDATA[Two factor authentication, or the snappy acronym 2FA, is something that you&#8217;re going to be hearing a lot about over the next year or so, both for use by ESP employees (in an attempt to reduce the risks of data theft) and by ESP customers (attempting to reduce the chance of an account being misused [...]]]></description>
			<content:encoded><![CDATA[<p>Two factor authentication, or the snappy acronym 2FA, is something that you&#8217;re going to be hearing a lot about over the next year or so, both for use by ESP employees (in an attempt to reduce the risks of data theft) and by ESP customers (attempting to reduce the chance of an account being misused to send spam).</p>
<p><a href="http://blog.wordtothewise.com/wp-content/uploads/2011/04/rsa-securid-tokens.jpg"><img class="aligncenter size-medium wp-image-2844" title="rsa-securid-tokens" src="http://blog.wordtothewise.com/wp-content/uploads/2011/04/rsa-securid-tokens-300x119.jpg" alt="" width="300" height="119" /></a></p>
<p><strong>What is Authentication?</strong></p>
<p>In computer security terms authentication is proving who you are &#8211; when you enter a username and a password to access your email account you&#8217;re authenticating yourself to the system using a password that only you know.</p>
<p><em>Authentication</em> (&#8220;who you are&#8221;) is the most visible part of computer access control, but it&#8217;s usually combined with two other A&#8217;s &#8211; <em>authorization</em> (&#8220;what you are allowed to do&#8221;) and <em>accounting</em> (&#8220;who did what&#8221;) to form an access control system.</p>
<p><strong>And what are the two factors?</strong></p>
<p>Two factor authentication means using two independent sources of evidence to demonstrate who you are. The idea behind it is that an attacker needs to steal two quite different bits of information, with different weaknesses and attack vectors, in order to gain access. This makes the attack scenario much more complex and difficult for an attacker to carry out.</p>
<p>It&#8217;s important that the different factors are independent &#8211; requiring two passwords doesn&#8217;t count as 2FA, as an attack that can get the first password can just as easily get the second password. Generally 2FA requires the user to demonstrate their identity via two out of three broad ways:</p>
<ol>
<li>Something the user <em>knows</em> &#8211; a password or a PIN</li>
<li>Something the user <em>has</em> &#8211; a key, an ID card, a phone number, a digital certificate or a physical token</li>
<li>Something the user <em>is</em> &#8211; such as a fingerprint</li>
</ol>
<p>An everyday example of 2FA is using a cash machine or ATM. You insert your ATM card (something you <em>have</em>) and enter your PIN (something you <em>know</em>) to get access to your bank account. An attacker would have to both steal or copy your card <em>and</em> know your PIN to access your account. While a crooked waiter might be able to copy your card and someone could look over your shoulder to see your PIN, it&#8217;s much more difficult for an attacker to get both.</p>
<p>Most deployed 2FA systems work in much the same way. They require you to enter a password you know, and then to demonstrate that you have something in your possession &#8211; by having your computer present a digital certificate, or having you enter a number from a security token like those pictured above, or respond to an SMS message.</p>
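<p>To make the &#8220;number from a security token&#8221; case a bit more concrete, here&#8217;s a minimal Python sketch of the open HOTP and TOTP one-time-password algorithms (RFC 4226 and RFC 6238), which many smartphone authenticator apps are built on. This is a stand-in for illustration: SecurID fobs use their own proprietary algorithm, and the shared secret below is just the test value from the RFCs, not anything you&#8217;d use in production.</p>

```python
import hashlib
import hmac
import struct
import time


def hotp(secret, counter, digits=6):
    """RFC 4226: HMAC-SHA1 over an 8 byte counter, then dynamic truncation."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # low nibble of the last byte picks the window
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret, now=None, step=30, digits=6):
    """RFC 6238: HOTP where the counter is the number of time steps so far."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now) // step, digits)


if __name__ == "__main__":
    shared_secret = b"12345678901234567890"  # RFC test secret, illustration only
    print("code for this 30 second window:", totp(shared_secret))
```

<p>The server holds the same secret and runs the same calculation, so a matching code shows that the user has the device holding the secret (something they <em>have</em>) without the secret itself ever being typed in.</p>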
<p><strong>Security problems solved, then?</strong></p>
<p>I&#8217;ll look at that tomorrow.</p>
<p>(Spoiler: <em>No</em>)</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.wordtothewise.com/2011/04/what-is-two-factor-authentication/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Real. Or. Phish?</title>
		<link>http://blog.wordtothewise.com/2011/04/real-or-phish/</link>
		<comments>http://blog.wordtothewise.com/2011/04/real-or-phish/#comments</comments>
		<pubDate>Wed, 06 Apr 2011 23:05:57 +0000</pubDate>
		<dc:creator>steve</dc:creator>
				<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[phishing]]></category>
		<category><![CDATA[redsnapper]]></category>
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://blog.wordtothewise.com/?p=2756</guid>
		<description><![CDATA[After Epsilon lost a bunch of customer lists last week, I&#8217;ve been keeping an eye open to see if any of the vendors I work with had any of my email addresses stolen &#8211; not least because it&#8217;ll be interesting to see where this data ends up. Yesterday I got mail from Marriott, telling me [...]]]></description>
			<content:encoded><![CDATA[<p>After <a title="Epsilon Breach" href="http://www.magillreport.com/Epsilon-Valdez-How-Bad-Might-it-Get/" target="_blank">Epsilon lost a bunch of customer lists</a> last week, I&#8217;ve been keeping an eye open to see if any of the vendors I work with had any of my email addresses stolen &#8211; not least because it&#8217;ll be interesting to see where this data ends up.</p>
<p>Yesterday I got mail from Marriott, telling me that &#8220;unauthorized third party gained access to a number of Epsilon&#8217;s accounts including Marriott&#8217;s email list.&#8221;. Great! Let&#8217;s start looking for spam to my Marriott tagged address, or for phishing targeted at Marriott customers.</p>
<p>I hit what looks like paydirt this morning. Plausible looking mail with Marriott branding, nothing specific to me other than name and (<a title="Tagged Email Addresses" href="http://blog.wordtothewise.com/2010/07/tagged-email-addresses/" target="_blank">tagged</a>) email address.</p>
<p>It&#8217;s time to play <a title="Wheel of Fish" href="http://www.youtube.com/watch?v=I_FoUcDSFxs" target="_blank">Real. Or. Phish?</a></p>
<p><b>1. Branding and spelling are all good. It&#8217;s using decent stock photos, and what looks like a real Marriott logo.</b></p>
<p><i>All very easy to fake, but if it&#8217;s a phish it&#8217;s pretty well done. Then again, phishes often steal real content and just change out the links.</i></p>
<p>Conclusion? Real. Maybe.</p>
<p><b>2. The mail wasn&#8217;t sent from marriott.com, or any domain related to it. Instead, it came from &#8220;Marriott@marriott-email.com&#8221;.</b></p>
<p><i>This is classic phish behaviour &#8211; using a lookalike domain such as &#8220;paypal-billing.com&#8221; or &#8220;aolsecurity.com&#8221; so as to look as though you&#8217;re associated with a company, yet to be able to use a domain name you have full control of, so as to be able to host websites, receive email, sign with DKIM, all that sort of thing.</i></p>
<p>Conclusion? Phish.</p>
<p><b>3. SPF pass</b></p>
<p><i>Given that the mail was sent &#8220;from&#8221; marriott-email.com, and not from marriott.com, this is pretty meaningless. But it did pass an SPF check.</i></p>
<p>Conclusion? Neutral.</p>
<p><b>4. DKIM fail</b></p>
<p><tt>Authentication-Results: m.wordtothewise.com; dkim=fail (verification failed; insecure key) header.i=@marriott-email.com;</tt></p>
<p><i>As the mail was sent &#8220;from&#8221; marriott-email.com it should have been possible for the owner of that domain (presumably the phisher) to sign it with DKIM. That they didn&#8217;t isn&#8217;t a good sign at all.</i></p>
<p>Conclusion? Phish.</p>
<p><b>5. Badly obfuscated headers</b></p>
<p><tt>From: =?iso-8859-1?B?TWFycmlvdHQgUmV3YXJkcw==?= &lt;Marriott@marriott-email.com&gt;<br />
Subject:  =?iso-8859-1?B?WW91ciBBY2NvdW50IJYgVXAgdG8gJDEwMCBjb3Vwb24=?=</tt></p>
<p><i>Base 64 encoding of headers is an old spammer trick used to make them more difficult for naive spam filters to handle. That doesn&#8217;t work well with more modern spam filters, but spammers and phishers still tend to do it so as to make it harder for abuse desks to read the content of phishes forwarded to them with complaints. There&#8217;s no legitimate reason to encode plain ascii fields in this way. Spamassassin didn&#8217;t like the message because of this.</i></p>
<p>Conclusion? Phish.</p>
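<p>If you&#8217;re on the receiving end of one of these, decoding the headers is straightforward. Here&#8217;s a small Python sketch using the standard library&#8217;s email.header module on the two headers quoted above; the helper name is mine.</p>

```python
from email.header import decode_header, make_header


def decode_mime_header(value):
    """Turn RFC 2047 encoded-words (=?charset?B?...?=) into readable text."""
    return str(make_header(decode_header(value)))


FROM = "=?iso-8859-1?B?TWFycmlvdHQgUmV3YXJkcw==?= <Marriott@marriott-email.com>"
SUBJECT = "=?iso-8859-1?B?WW91ciBBY2NvdW50IJYgVXAgdG8gJDEwMCBjb3Vwb24=?="

if __name__ == "__main__":
    print(decode_mime_header(FROM))
    # repr() so that any non-printing characters in the subject are visible
    print(repr(decode_mime_header(SUBJECT)))
    # The raw bytes, before any charset is applied
    print(decode_header(SUBJECT)[0][0])
```

<p>The From display name decodes to plain &#8220;Marriott Rewards&#8221;. The Subject decodes to &#8220;Your Account&#8221;, a single byte 0x96, then &#8220;Up to $100 coupon&#8221;. That byte is an en dash in Windows-1252 but a control character in the ISO-8859-1 the header declares, which looks like the sort of character set confusion a mail generator can introduce.</p>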
<p><b>6. Well-crafted multipart/alternative mail, with valid, well-encoded (quoted-printable) plain text and html parts</b></p>
<p><i>Just like the branding and spelling, this is very well done for a phish. But again, it&#8217;s commonly something that&#8217;s stolen from legitimate email and modified slightly.</i></p>
<p>Conclusion? Real, probably.</p>
<p><b>7. Typical content links in the email</b></p>
<p><i>Most of the content links in the email are to things like &#8220;http://marriott-email.com/16433acf1layfousiaey2oniaaaaaalfqkc4qmz76deyaaaaa&#8221;, which is consistent with the from address, at least. This isn&#8217;t the sort of URL a real company website tends to use, but it&#8217;s not that unusual for click tracking software to do something like this.</i></p>
<p>Conclusion? Neutral</p>
<p><b>8. Atypical content links in the email</b></p>
<p>We also have other links:</p>
<ul>
<li>http://bp.specificclick.net?pixid=99015955
<li>http://ad.yieldmanager.com/pixel?id=550897&#038;id=95457&#038;id=102672&#038;id=515007&#038;t=2
<li>http://ad.doubleclick.net/activity;src=3286198;type=mari1;cat=rwdemls;ord=1; num=[Random Number]?
<li>http://ib.adnxs.com/seg?add=1519
<li>http://action.mathtag.com/mm//MARI//red?nm=rwdemls&amp;s0=&amp;s1=&amp; s2=&amp;v0=&amp;v1=&#038;ampv2=&amp;ri=[Random Number]
<li>http://media.fastclick.net/w/tre?ad_id=26033;evt=13686;cat1=14501;cat2=14505
<li>http://images.bfi0.com/creative/spacer.gif
</ul>
<p>(Those &#8220;[Random Number]&#8221; bits aren&#8217;t me hiding things. That&#8217;s literally what is in the email.)</p>
<p><i>That&#8217;s an awful lot of other servers this mail is going to try and contact when you read it. I&#8217;m pretty sure that most of those are tracking links (but how many legitimate emails that advertise a single company and which are sent directly by that company, need to use half a dozen independent affiliate tracking links?).</i></p>
<p>Conclusion? Doesn&#8217;t look terribly honest. Maybe some sort of affiliate scam rather than a phish, though.</p>
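<p>Counting the distinct servers a message will talk to is easy to automate. This is a minimal Python sketch, assuming you&#8217;ve already pulled the URLs out of the HTML part; the function name and the idea of a &#8220;first party&#8221; domain list are mine, and the sample URLs are a few of the ones listed above.</p>

```python
from urllib.parse import urlsplit


def third_party_hosts(urls, first_party_domains):
    """Return the sorted set of hostnames that aren't under a first-party domain."""
    hosts = set()
    for url in urls:
        host = urlsplit(url).hostname
        if host is None:
            continue  # not an absolute URL, nothing to contact
        is_first_party = any(
            host == domain or host.endswith("." + domain)
            for domain in first_party_domains
        )
        if not is_first_party:
            hosts.add(host)
    return sorted(hosts)


if __name__ == "__main__":
    links = [
        "http://marriott-email.com/16433acf1layfousiaey2oniaaaaaalfqkc4qmz76deyaaaaa",
        "http://bp.specificclick.net?pixid=99015955",
        "http://ib.adnxs.com/seg?add=1519",
        "http://media.fastclick.net/w/tre?ad_id=26033;evt=13686;cat1=14501;cat2=14505",
        "http://images.bfi0.com/creative/spacer.gif",
    ]
    for host in third_party_hosts(links, ["marriott.com", "marriott-email.com"]):
        print(host)
```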
<p><b>9. Most of the links in the email go to marriott-email.com, but then immediately redirect to marriott.com.</b></p>
<p><i>This shows someone is tracking clicks, which is pretty common for mail sent via ESPs, so as to make click tracking information available to the client without the client having to do any work to capture data on their website.</i></p>
<p>Conclusion? Real.</p>
<p><b>10. The unsubscription link goes to a terrible page with a set of checkboxes, rather than providing a simple unsubscription button.</b></p>
<p>Conclusion? Sadly, that&#8217;s a sign that it&#8217;s real.</p>
<p><b>11. Sending network configuration</b></p>
<p>It was sent from a machine with reverse DNS of dmailer0112.dmx1.bfi0.com, but which claimed to be called dmx1.bfi0.com, not a valid hostname for the IP address it came from.</p>
<p><i>This is a pretty common misconfiguration of the network that happens at larger ESPs with complex outbound smarthost farms. I&#8217;d expect a phisher not to have that sort of mistake if they were sending from their own machine or through a botnet. And while &#8220;dmx1.bfi0.com&#8221; could be an obscure end-user DSL, the reverse DNS of dmailer0112 looks like it&#8217;s a system intended to send email, not a botnet.</i></p>
<p>Conclusion? Real.</p>
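<p>The check I&#8217;m describing here, whether the name a server announces actually resolves back to the IP address it connected from, can be sketched in a few lines of Python. This is a simplified, IPv4-only illustration with a made-up function name, and the address in the example is a documentation placeholder rather than the real sending IP.</p>

```python
import socket


def helo_matches_ip(helo_name, client_ip, resolve=socket.gethostbyname_ex):
    """True if the announced hostname resolves to the connecting IP address."""
    try:
        _name, _aliases, addresses = resolve(helo_name)
    except OSError:
        # Covers socket.gaierror: the name doesn't resolve at all.
        return False
    return client_ip in addresses


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder from the documentation range.
    print(helo_matches_ip("dmx1.bfi0.com", "192.0.2.10"))
```

<p>The resolver is passed in as a parameter so the logic can be exercised without touching live DNS.</p>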
<p><b>Final Conclusion</b></p>
<p>You&#8217;ve probably guessed by now. It&#8217;s real email, sent on behalf of Marriott Rewards through one of their ESPs. But if it takes me several minutes of groveling through the mail before I convince myself it&#8217;s real, what chance does a typical consumer have of telling the difference between a well targeted phishing email and a typical piece of commercial email?</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.wordtothewise.com/2011/04/real-or-phish/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Multipart MIME cheat sheet</title>
		<link>http://blog.wordtothewise.com/2011/03/multipart-mime-cheat-sheet/</link>
		<comments>http://blog.wordtothewise.com/2011/03/multipart-mime-cheat-sheet/#comments</comments>
		<pubDate>Mon, 14 Mar 2011 17:53:37 +0000</pubDate>
		<dc:creator>steve</dc:creator>
				<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[cheat]]></category>
		<category><![CDATA[crib]]></category>
		<category><![CDATA[mime]]></category>
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://blog.wordtothewise.com/?p=2645</guid>
		<description><![CDATA[I&#8217;ve had a couple of people ask me about MIME structure recently, especially how you create multipart messages, when you should use them and which variant of multipart you use for different things. (And I&#8217;m working on a MIME parser / generator for Abacus at the moment, so it&#8217;s all fresh in my mind) So [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve had a couple of people ask me about MIME structure recently, especially how you create multipart messages, when you should use them and which variant of multipart you use for different things. (And I&#8217;m working on a MIME parser / generator for <a title="Abacus" href="http://wordtothewise.com/products/abacus.html">Abacus</a> at the moment, so it&#8217;s all fresh in my mind)</p>
<p>So I&#8217;ve put together a quick cheat sheet, showing the structure of four common types of email, and how their MIME structure looks.</p>
<p><b>Simple plain text</b></p>
<div style="border-radius: 10px; -moz-border-radius: 10px; border: 1px solid #00f; background: #ccf; margin: 10px; padding: 15px">
text/plain
</div>
<p><b>Plain text with a PDF attachment</b></p>
<div style="border-radius: 10px; -moz-border-radius: 10px; border: 1px solid #00f; background: #ccf; margin: 10px; padding: 15px">
multipart/mixed
<div style="border-radius: 10px; -moz-border-radius: 10px; border: 1px solid #f0f; background: #fcf; margin: 10px; padding: 15px">
text/plain
</div>
<div style="border-radius: 10px; -moz-border-radius: 10px; border: 1px solid #f0f; background: #fcf; margin: 10px; padding: 15px">
application/pdf
</div>
</div>
<p><b>HTML with a plain text fallback</b></p>
<div style="border-radius: 10px; -moz-border-radius: 10px; border: 1px solid #00f; background: #ccf; margin: 10px; padding: 15px">
multipart/alternative
<div style="border-radius: 10px; -moz-border-radius: 10px; border: 1px solid #f0f; background: #fcf; margin: 10px; padding: 15px">
text/plain
</div>
<div style="border-radius: 10px; -moz-border-radius: 10px; border: 1px solid #f0f; background: #fcf; margin: 10px; padding: 15px">
text/html
</div>
</div>
<p><b>HTML with embedded images and plain text fallback</b></p>
<div style="border-radius: 10px; -moz-border-radius: 10px; border: 1px solid #00f; background: #ccf; margin: 10px; padding: 15px">
multipart/alternative
<div style="border-radius: 10px; -moz-border-radius: 10px; border: 1px solid #f0f; background: #fcf; margin: 10px; padding: 15px">
text/plain
</div>
<div style="border-radius: 10px; -moz-border-radius: 10px; border: 1px solid #f0f; background: #fcf; margin: 10px; padding: 15px">
multipart/related
<div style="border-radius: 10px; -moz-border-radius: 10px; border: 1px solid #ff0; background: #ffc; margin: 10px; padding: 15px">
text/html
</div>
<div style="border-radius: 10px; -moz-border-radius: 10px; border: 1px solid #ff0; background: #ffc; margin: 10px; padding: 15px">
image/gif
</div>
<div style="border-radius: 10px; -moz-border-radius: 10px; border: 1px solid #ff0; background: #ffc; margin: 10px; padding: 15px">
image/gif
</div>
<div style="border-radius: 10px; -moz-border-radius: 10px; border: 1px solid #ff0; background: #ffc; margin: 10px; padding: 15px">
image/jpeg
</div>
</div>
</div>
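<p>If you&#8217;d rather see the same structures built in code, here&#8217;s a short sketch using Python&#8217;s standard email.message.EmailMessage. It&#8217;s just one way of generating these layouts; the helper names, the filename and the Content-ID are made up for the example, and the attachment bytes are dummies.</p>

```python
from email.message import EmailMessage


def plain_with_pdf(text, pdf_bytes):
    """multipart/mixed: a text/plain body plus an application/pdf attachment."""
    msg = EmailMessage()
    msg.set_content(text)
    msg.add_attachment(pdf_bytes, maintype="application", subtype="pdf",
                       filename="report.pdf")
    return msg


def html_with_image(text, html, gif_bytes, cid):
    """multipart/alternative holding text/plain and a multipart/related part."""
    msg = EmailMessage()
    msg.set_content(text)                      # the plain text fallback
    msg.add_alternative(html, subtype="html")  # message becomes multipart/alternative
    # Attaching an image to the HTML part wraps it in multipart/related.
    msg.get_payload()[1].add_related(gif_bytes, "image", "gif", cid=cid)
    return msg


def structure(msg):
    """List the content type of every part, outermost first."""
    return [part.get_content_type() for part in msg.walk()]


if __name__ == "__main__":
    print(structure(plain_with_pdf("See attached.", b"%PDF-1.4 dummy")))
    print(structure(html_with_image(
        "Hello in plain text",
        '<p>Hello <img src="cid:logo@example.invalid"></p>',
        b"GIF89a dummy",
        "<logo@example.invalid>",
    )))
```

<p>The order matters in multipart/alternative: the plain text part goes first and the richest version last, since clients are expected to show the last alternative they can handle.</p>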
]]></content:encoded>
			<wfw:commentRss>http://blog.wordtothewise.com/2011/03/multipart-mime-cheat-sheet/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Yes, we have no IP addresses, we have no addresses today</title>
		<link>http://blog.wordtothewise.com/2011/01/yes-we-have-no-ipv4/</link>
		<comments>http://blog.wordtothewise.com/2011/01/yes-we-have-no-ipv4/#comments</comments>
		<pubDate>Tue, 01 Feb 2011 00:37:13 +0000</pubDate>
		<dc:creator>steve</dc:creator>
				<category><![CDATA[Industry]]></category>
		<category><![CDATA[ip addresses]]></category>
		<category><![CDATA[ipv6]]></category>
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://blog.wordtothewise.com/?p=2561</guid>
		<description><![CDATA[We&#8217;ve just about run out of the Internet equivalent of a natural resource &#8211; IP addresses. ICANN allocated the last couple of blocks of general usage IPv4 addresses to APNIC earlier today. There are just five usable blocks of addresses left, and they&#8217;re reserved by IANA policy for the final phase of IPv4 exhaustion, one [...]]]></description>
			<content:encoded><![CDATA[<p>We&#8217;ve just about run out of the Internet equivalent of a natural resource &#8211; IP addresses.</p>
<p><iframe width="132px" height="431px" frameborder="0" scrolling="no" src="http://ipv6.he.net/v4ex/sidebar/?" allowtransparency="true" style="margin-bottom: 10px; margin-right: 10px;" align="left"></iframe></p>
<p>ICANN allocated the last couple of blocks of general usage IPv4 addresses to APNIC earlier today.</p>
<p>There are just five usable blocks of addresses left, and they&#8217;re <a href="http://www.icann.org/en/general/allocation-remaining-ipv4-space.htm" target="_blank">reserved by IANA policy</a> for the final phase of IPv4 exhaustion, one for each RIR.</p>
<p>Like any other resource that&#8217;s been strip-mined to depletion that doesn&#8217;t mean that IP addresses are completely unavailable &#8211; there are still some in the delivery pipeline (the <a href="http://en.wikipedia.org/wiki/Regional_Internet_registry" target="_blank">regional internet registries</a> who handle allocation of addresses), ISPs have stockpiled addresses they don&#8217;t really need yet in anticipation of the exhaustion, and we can still recycle the addresses we&#8217;ve already got.</p>
<p>But there won&#8217;t be any new IPv4 addresses available.</p>
<p>What does that mean to you?</p>
<p>It&#8217;s going to affect your business in lots of ways over the next few years. And delaying thinking about it will just make it more expensive and more painful once you&#8217;re forced to pay attention.</p>
<div style="clear:both"></div>
<p><strong>IPv4 addresses are increasing in price</strong></p>
<p>You&#8217;re going to find it increasingly difficult to get assigned new ones. It&#8217;s long past time to stop thinking of them as &#8220;effectively free&#8221;.</p>
<p>You&#8217;re going to need IPv6 production deployments fairly soon, and we&#8217;re well past the point of &#8220;it&#8217;s cheaper to wait, so that our vendors can do the IPv6 work that&#8217;s needed&#8221;. Delaying putting IPv6 prototypes into the field is going to get increasingly painful and expensive.</p>
<p>If your CTO and CFO are not concerned about this yet, they should be.</p>
<p><strong>Stop wasting addresses</strong></p>
<p>Dedicating even a single IP address to a customer is wasteful, if you don&#8217;t need that to handle the volume of email sent. You should be signing all your mail with DKIM <em>today</em> and building domain based reputation for your customers, as IPv4 based per-customer reputation is going away.</p>
<p>As you grow and gain more customers you&#8217;re going to have to start sharing v4 addresses between customers, which will share their IPv4-based reputation.</p>
<p>And fairly soon, you&#8217;ll want to start sending mail over IPv6 to some destinations. Odds are good that per-customer IPv6 reputation isn&#8217;t going to happen much. There&#8217;ll be some broad IPv6 based reputation to distinguish your outbounds from spammers and botnets, sure, but IPv6 based reputation down to a per-customer level isn&#8217;t something you&#8217;re likely to see much benefit from.</p>
<p>Assigning multiple IPv4 addresses to a single customer is ridiculously wasteful. You&#8217;re going to want to engineer away from any need to do that &#8211; and have your sales staff stop promising it &#8211; so that you can recycle those precious, precious IPv4 addresses for use elsewhere by other customers.</p>
<p><strong>Your recipients are moving to IPv6</strong></p>
<p>Big consumer access ISPs are some of the biggest consumers of IP addresses. They&#8217;re likely going to be moving to native IPv6 for end-users, combined with some sort of v6-to-v4 translation before anyone else.</p>
<p>That means that your web users, from home and mobile devices, will be trying to reach you via IPv6 sooner rather than later. Native v6 will mostly work better than v6-to-v4. So you want to be thinking about v6 access soon.</p>
<p>Have you got v6 space assigned from your ISP(s) yet? No? Ask them for it this week.</p>
<p>Do you have plans for making your image and click-tracking webservers v6-aware?</p>
<p><strong>Everything you do with IPv4 you need to be able to do with IPv6</strong></p>
<p>Does your address list management support tracking signups and confirmations from IPv6 users as well as IPv4?</p>
<p>Can you track opens and click throughs from IPv6 users?</p>
<p>How about reporting? Do your database and reporting engine support IPv6 addresses? Can you do geolocation for IPv6 addresses?</p>
<p>Does your smarthost vendor support preferentially routing mail via IPv6?</p>
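<p>On the database side, one common approach is to store every address in a single 16 byte form, with IPv4 addresses held as IPv4-mapped IPv6 (::ffff:a.b.c.d). Here&#8217;s a minimal Python sketch of that idea using the standard ipaddress module. The function names are mine, and whether this layout suits your schema is an assumption you&#8217;d need to check.</p>

```python
import ipaddress


def storage_key(text):
    """Return a fixed 16 byte key for an IPv4 or IPv6 address string."""
    addr = ipaddress.ip_address(text.strip())
    if addr.version == 4:
        # Store IPv4 as IPv4-mapped IPv6 so one column holds both families.
        addr = ipaddress.IPv6Address((0xFFFF << 32) | int(addr))
    return addr.packed


def display(key):
    """Turn a stored key back into the form a report should show."""
    addr = ipaddress.IPv6Address(key)
    mapped = addr.ipv4_mapped
    return str(mapped if mapped is not None else addr)


if __name__ == "__main__":
    for signup_ip in ["192.0.2.1", "2001:DB8::1"]:
        key = storage_key(signup_ip)
        print(signup_ip, "->", key.hex(), "->", display(key))
```

<p>Because every key is the same length, signups, opens and clicks from v4 and v6 users can share the same columns and indexes, and the original form is recoverable for reporting.</p>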
<p><strong>IPv6 is an opportunity</strong></p>
<p>What would you do if someone were to offer you dedicated MX machines at Yahoo, Google and Hotmail? Machines that were much the same as their primary MXes, but lightly loaded with much more available capacity. How much would you be prepared to pay for access to them?</p>
<p>Sooner or later there&#8217;ll be IPv6-only MXes at those, and other ISPs. Will you be ready to use them?</p>
<p>Do your competitors offer an IPv6-ready system yet? How much of a business disadvantage will you be at once they do?</p>
<p><strong>And finally</strong></p>
<p>None of this is <em>new</em>. You&#8217;ve known for years that you need to be more frugal with IPv4 addresses and have dual-stack IPv6-capable services.</p>
<p>But it&#8217;s getting <em>urgent</em>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.wordtothewise.com/2011/01/yes-we-have-no-ipv4/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Clicktracking 2: Electric Boogaloo</title>
		<link>http://blog.wordtothewise.com/2010/10/clicktracking-2-electric-boogaloo/</link>
		<comments>http://blog.wordtothewise.com/2010/10/clicktracking-2-electric-boogaloo/#comments</comments>
		<pubDate>Thu, 21 Oct 2010 23:51:47 +0000</pubDate>
		<dc:creator>steve</dc:creator>
				<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[Industry]]></category>
		<category><![CDATA[click through]]></category>
		<category><![CDATA[malware]]></category>
		<category><![CDATA[Technical]]></category>

		<guid isPermaLink="false">http://blog.wordtothewise.com/?p=2145</guid>
		<description><![CDATA[A week or so back I talked about clicktracking links, and how to put them together to avoid abuse and blocking issues. Since then I&#8217;ve come across another issue with click tracking links that&#8217;s not terribly obvious, and that you&#8217;re not that likely to come across, but if you do get hit by it could [...]]]></description>
			<content:encoded><![CDATA[<p>A week or so back I <a href="http://blog.wordtothewise.com/2010/10/clicktracking-link-abuse/" target="_blank">talked about clicktracking links</a>, and how to put them together to avoid abuse and blocking issues.</p>
<p>Since then I&#8217;ve come across another issue with click tracking links that&#8217;s not terribly obvious, and that you&#8217;re not that likely to come across, but if you do get hit by it could be very painful &#8211; phishing and malware filters in web browsers.</p>
<p><a href="http://blog.wordtothewise.com/wp-content/uploads/2010/10/malwarepopup.png"><img class="aligncenter size-full wp-image-2147" title="Malware Popup" src="http://blog.wordtothewise.com/wp-content/uploads/2010/10/malwarepopup.png" alt="Visiting this site may harm your computer" width="590" height="284" /></a></p>
<p>First, some background about how a lot of malware is distributed, what&#8217;s known as &#8220;drive-by malware&#8221;. This is where the hostile code infects the victim&#8217;s machine without them taking any action to download and run it; they just visit a hostile website and that website silently infects their computer.</p>
<p>The malware authors get people to visit the hostile website in quite a few different ways &#8211; email spam, blog comment spam, web forum spam, banner ads purchased on legitimate websites and compromised legitimate websites, amongst others.</p>
<p>That last one, compromised legitimate websites, is the type we&#8217;re interested in. The sites compromised aren&#8217;t usually a single, high-profile website. Rather, they tend to be a whole bunch of websites that are running some vulnerable web application &#8211; if there&#8217;s a security flaw in, for example, WordPress blog software then a malware author can compromise thousands of little blog sites, and embed malware code in each of them. Anyone visiting any of those sites risks being infected, and becoming part of a botnet.</p>
<p>Because the vulnerable websites are all compromised mechanically in the same way, the URLs of the infected pages tend to look much the same, just with different hostnames &#8211; <em>http://example.com/foo/bar/baz.html</em>, <em>http://www.somewhereelse.invalid/foo/bar/baz.html</em> and <em>http://a.net/foo/bar/baz.html</em> &#8211; and they serve up just the same malware (or, just as often, redirect the user to a site in Russia or China that serves up the malware that infects their machine).</p>
<p>A malware filter operator might receive a report about <em>http://example.com/foo/bar/baz.html</em> and decide that it was infected with malware, adding <em>example.com</em> to a blacklist. A smart filter operator might decide that this might be just one example of a widespread compromise, and go looking for the same malware elsewhere. If it goes to <em>http://a.net/foo/bar/baz.html</em> and finds the exact same content, it&#8217;ll know that that&#8217;s another instance of the infection, and add <em>a.net</em> to the blacklist.</p>
<p>What does this have to do with clickthrough links?</p>
<p>Well, an obvious way to implement clickthrough links is to use a custom hostname for each customer (&#8220;<em>click.customer.com</em>&#8221;), and have all those pointing at a single clickthrough webserver. It&#8217;s tedious to set up the webserver to respond to each hostname as you add a new customer, though, so you decide to have the webserver ignore the hostname. That&#8217;ll work fine &#8211; if you have customer1 using a clickthrough link like <em>http://click.customer1.com/123/456/789.html</em> you&#8217;d have the webserver ignore &#8220;<em>click.customer1.com</em>&#8221; and just read the information it needs from &#8220;<em>123/456/789.html</em>&#8221; and send the redirect.</p>
<p>But that means that if you also have customer2, using the hostname <em>click.customer2.com</em>, then the URL <em>http://click.customer2.com/123/456/789.html</em> will redirect to customer1&#8217;s content.</p>
<p>If a malware filter decides that <em>http://click.customer1.com/123/456/789.html</em> redirects to a phishing site or a malware download &#8211; either due to a false report, or due to the customer&#8217;s page actually being infected &#8211; then they&#8217;ll add <em>click.customer1.com</em> to their blacklist, meaning no <em>http://click.customer1.com/</em> URLs will work. So far, this isn&#8217;t a big problem.</p>
<p>But if they then go and check <em>http://click.customer2.com/123/456/789.html</em> and find the same redirect, they&#8217;ll blacklist <em>click.customer2.com</em>, and so on for all the clickthrough hostnames of yours they know about. That&#8217;ll cause any click on any URL in any email a lot of your customers send out to go to a &#8220;This site may harm your computer!&#8221; warning &#8211; which will end up a nightmare even if you spot the problem and get the filter operators to remove all those hostnames from the blacklist within a few hours or a day.</p>
<p>Don&#8217;t let this happen to you. Make sure your clickthrough webserver pays attention to the hostname as well as the path of the URL.</p>
<p>Use different hostnames for different customers&#8217; clickthrough links. And if you pick a link from mail sent by Customer A, and change the hostname of that link to the clickthrough hostname of Customer B, then that link should fail with an error rather than displaying Customer A&#8217;s content.</p>
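<p>A minimal sketch of that check, in Python, might look like the following. It assumes that the first path segment of the link identifies the customer and that you keep a table of which clickthrough hostname belongs to which customer; the table contents and the function name <em>customer_for_click</em> are hypothetical.</p>

```python
# Hypothetical mapping of clickthrough hostname -> customer id
HOST_TO_CUSTOMER = {
    "click.customer1.com": "123",
    "click.customer2.com": "124",
}

def customer_for_click(host, path):
    """Return the customer id if hostname and path agree, else None.

    A caller would answer None with an error page, not a redirect.
    """
    host = host.lower().split(":")[0]        # ignore case and any port
    expected = HOST_TO_CUSTOMER.get(host)
    first_segment = path.strip("/").split("/")[0]
    if expected is None or first_segment != expected:
        return None
    return expected
```

<p>With this in place, a customer1 link replayed against <em>click.customer2.com</em> gets an error instead of customer1&#8217;s redirect, so a filter probing the second hostname doesn&#8217;t find the same content there.</p>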
]]></content:encoded>
			<wfw:commentRss>http://blog.wordtothewise.com/2010/10/clicktracking-2-electric-boogaloo/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Clicktracking link abuse</title>
		<link>http://blog.wordtothewise.com/2010/10/clicktracking-link-abuse/</link>
		<comments>http://blog.wordtothewise.com/2010/10/clicktracking-link-abuse/#comments</comments>
		<pubDate>Mon, 11 Oct 2010 22:26:24 +0000</pubDate>
		<dc:creator>steve</dc:creator>
				<category><![CDATA[Best Practices]]></category>
		<category><![CDATA[click tracking]]></category>
		<category><![CDATA[cryptography]]></category>
		<category><![CDATA[Technical]]></category>
		<category><![CDATA[unsubscribe]]></category>
		<category><![CDATA[VERP]]></category>

		<guid isPermaLink="false">http://blog.wordtothewise.com/?p=2095</guid>
		<description><![CDATA[If you use redirection links in the emails you send out, where a click on the link goes to your server &#8211; so you can record that someone clicked &#8211; before redirecting to the real destination, then you&#8217;ve probably already thought about how they can be abused. Redirection links are simple in concept &#8211; you [...]]]></description>
			<content:encoded><![CDATA[<p>If you use redirection links in the emails you send out, where a click on the link goes to your server &#8211; so you can record that someone clicked &#8211; before redirecting to the real destination, then you&#8217;ve probably already thought about how they can be abused.</p>
<p>Redirection links are simple in concept &#8211; you include a link that points to your webserver in email that you send out, then when recipients click on it they end up at your webserver. Instead of displaying a page, though, your webserver sends what&#8217;s called a &#8220;302 redirect&#8221; to send the recipient&#8217;s web browser on to the real destination. How does your webserver know where to redirect to? There are several different ways, with different tradeoffs:</p>
<p><span id="more-2095"></span></p>
<p><strong>The simplest approach</strong></p>
<p>The simplest sort of redirection link includes the final destination in the link itself &#8211; something like <em>http://click.example.com/cnn.com/WORLD/</em>. The webserver at click.example.com would simply strip off the first part of the link, and redirect to the remainder &#8211; <em>cnn.com/WORLD/</em>.</p>
<p>This is nice, because it&#8217;s fairly transparent to the recipient &#8211; when they hover over the link in their mail client or webmail it&#8217;ll be fairly clear where it&#8217;s going.</p>
<p>But it has several limitations. One is that you can&#8217;t really record very much data about the click &#8211; you know where it was redirecting to, but almost nothing else.</p>
<p>The bigger problem is that it&#8217;s very easy for a spammer to abuse &#8211; they can send out spam that has the link <em>http://click.example.com/onlinepharmacy.ru/order.html</em>, to hide their real link from spam filters, and your webserver will happily redirect recipients to go there. Or, worse, that can be used to redirect to a website hosting viruses. That can cause all sorts of problems for your reputation, up to and including having your redirection webserver blacklisted by antivirus and antiphishing organizations, meaning it&#8217;ll be blocked by many web browsers.</p>
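<p>To make the weakness concrete, here is roughly what the simplest approach does, sketched in Python. The function name and the decision to always prepend <em>http://</em> are assumptions for illustration; the point is that nothing in it checks who created the link.</p>

```python
from urllib.parse import urlsplit

def naive_redirect_target(click_url):
    """Strip the clickthrough host and redirect to whatever follows (unsafe)."""
    path = urlsplit(click_url).path      # e.g. "/cnn.com/WORLD/"
    # Assumption: destinations are always plain http
    return "http://" + path.lstrip("/")
```

<p>The legitimate link and the spammer&#8217;s link are handled identically, which is exactly the problem.</p>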
<p><strong>Add some metadata</strong></p>
<p>Some of the things you might want to be able to record about a click would be which customer&#8217;s mail it was found in, which mailing campaign and which recipient it was sent to. This would let you do more sensible reporting and click-tracking, and also let you spot when a link is misused in some way (for example, thousands of clicks on a URL that was sent to just one recipient).</p>
<p>That might look like <em>http://click.example.com/123/456/789/cnn.com/WORLD/</em>. Your webserver would strip off the first four parts, recording a click for customer <em>123</em>, campaign <em>456</em> and recipient <em>789</em>, then redirect to the remainder &#8211; <em>cnn.com/WORLD/</em>.</p>
<p>This lets you do better reporting and is still fairly transparent to the recipient, but can still be abused in the same way.</p>
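<p>Pulling the metadata out of a link in that layout takes very little code. A rough Python sketch, with a hypothetical function name and dictionary keys, and assuming the path always has the three IDs first:</p>

```python
def parse_click_path(path):
    """Split "/123/456/789/cnn.com/WORLD/" into tracking ids and destination."""
    # maxsplit=3 keeps any slashes inside the destination intact
    customer, campaign, recipient, destination = path.lstrip("/").split("/", 3)
    return {
        "customer": customer,
        "campaign": campaign,
        "recipient": recipient,
        "destination": destination,
    }
```

<p>A path with fewer than four parts raises <em>ValueError</em> here, which a real handler would turn into an error page.</p>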
<p><strong>Use a database</strong></p>
<p>If you stored every link you wanted to redirect to in a database you could simply store a unique key for each link &#8211; so you might record that key <em>2718</em> means <em>http://cnn.com/WORLD/</em>. Then the redirection URL might look like <em>http://click.example.com/123/456/789/2718</em>.</p>
<p>This lets you do good reporting and is much more difficult for spammers to abuse (but not impossible &#8211; if the spammer signs up for a free or demo account on your system, then sends a test email to themselves, they can then reuse the links that they received in that mail).</p>
<p>But it&#8217;s fairly opaque to the recipient &#8211; they have no idea where the link will go. And it requires maintaining a database of every link you&#8217;ve ever used, for as long as it&#8217;s valuable (which could easily be several years if a recipient goes back to an old newsletter) and requires a database lookup for every click &#8211; which adds a fair bit of infrastructure you need to keep working 24/7 just to make links work.</p>
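<p>As a sketch of the lookup, the Python below uses an in-memory SQLite table as a stand-in for whatever database you actually run. The table name, column names and function name are all hypothetical.</p>

```python
import sqlite3

# Stand-in for the real link database (assumption: one row per link key)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE links (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
db.execute("INSERT INTO links (id, url) VALUES (?, ?)", (2718, "http://cnn.com/WORLD/"))

def lookup_destination(path):
    """Resolve "/123/456/789/2718" to its stored URL, or None if unknown."""
    parts = path.strip("/").split("/")
    key = parts[3] if len(parts) >= 4 else ""
    if not key.isdecimal():
        return None
    row = db.execute("SELECT url FROM links WHERE id = ?", (int(key),)).fetchone()
    return row[0] if row else None
```

<p>Only keys you have stored can ever produce a redirect, which is what makes this harder to abuse, at the cost of one query per click.</p>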
<p><strong>Use a database and a cosmetic link</strong></p>
<p>You could take the database format and add the final destination link on the end &#8211; like this <em>http://click.example.com/123/456/789/2718/cnn.com/WORLD/</em> &#8211; and then just ignore everything after the URL key (<em>2718</em>). That&#8217;ll work exactly the same way, but the final destination will be fairly transparent to the recipient.</p>
<p>The cosmetic part gives spammers nothing extra to abuse: if they try to use <em>http://click.example.com/123/456/789/2718/mypharmacy.ru</em>, it&#8217;ll still just redirect to http://cnn.com/WORLD/, as the only meaningful bit of the redirection link is the &#8220;<em>2718</em>&#8221;.</p>
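<p>In code, &#8220;ignore everything after the key&#8221; can be as small as this Python sketch (the function name is hypothetical, and it assumes the key is always the fourth path segment):</p>

```python
from urllib.parse import urlsplit

def link_key(click_url):
    """Return the database key (4th path segment), ignoring any cosmetic suffix."""
    parts = urlsplit(click_url).path.strip("/").split("/")
    return parts[3] if len(parts) >= 4 else None
```

<p>Both the genuine link and a spammer&#8217;s edited copy yield the same key, so both resolve to whatever destination is stored for it.</p>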
<p><strong>Cryptographically sign your links</strong></p>
<p>A different approach is to record all the information you need in the link and to also add a cryptographic signature to prevent people from misusing it. This is much simpler than the word &#8220;cryptography&#8221; suggests: you just need to use a magic word (we&#8217;ll use &#8220;albatross&#8221;) and know about the md5() function.</p>
<p>You start off with the same destination string we used in <strong>Add some metadata</strong> &#8211; &#8220;<em>/123/456/789/cnn.com/WORLD/</em>&#8221;. Then you add the magic word on the end, to give &#8220;<em>/123/456/789/cnn.com/WORLD/albatross</em>&#8221;, and take the md5 &#8220;hash&#8221; of that. That&#8217;s some cryptographic black magic that&#8217;ll give you a string of letters and numbers that&#8217;s a &#8220;fingerprint&#8221; of that string. It&#8217;ll look something like &#8220;<em>609a78b941bdf9f045cadcfa2e09d54c</em>&#8221;. Then you combine that with the destination string to look like this:</p>
<p><em>http://click.example.com/609a78b941bdf9f045cadcfa2e09d54c/123/456/789/cnn.com/WORLD/</em></p>
<p>Then, when your webserver sees this link it splits it into the hash (<em>609a78b941bdf9f045cadcfa2e09d54c</em>) and destination string (<em>/123/456/789/cnn.com/WORLD/</em>). It then does exactly the same thing you did when you created the link &#8211; appends the magic word to the destination string to give &#8220;<em>/123/456/789/cnn.com/WORLD/albatross</em>&#8221; and takes the md5 hash of that string. If the result of that matches the hash in the link, it knows it&#8217;s a valid redirection link and it can record the click-tracking data and forward to the destination link. If the result doesn&#8217;t match it knows that the link has been tampered with, and can return an error page.</p>
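<p>Generating the link in PHP would look something like this:</p>
<pre>// Everything the click handler needs: tracking IDs followed by the destination
$destination = "/$customerid/$campaignid/$recipientid/$link";
// Prefix the md5 of the destination plus the magic word so tampering can be detected
$clicktrack = 'http://click.example.com/' . md5($destination . 'albatross') . $destination;</pre>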
<p>This is much cheaper to generate and validate than using a database, even a typical in-memory database.</p>
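<p>The validation side isn&#8217;t shown above, so here is a rough sketch of both halves in Python rather than PHP. The function names are hypothetical, the secret is the example magic word from this post, and the use of <em>hmac.compare_digest</em> for the comparison is an addition of this sketch, not part of the scheme as described.</p>

```python
import hashlib
import hmac

SECRET = "albatross"   # example magic word; use a long random value in practice

def _fingerprint(destination):
    # md5 of the destination string with the magic word appended
    return hashlib.md5((destination + SECRET).encode("utf-8")).hexdigest()

def sign_path(destination):
    """Return "/<md5 hex>" followed by the destination string."""
    return "/" + _fingerprint(destination) + destination

def verify_path(path):
    """Return the destination string if the hash matches, else None."""
    digest, sep, rest = path.lstrip("/").partition("/")
    destination = sep + rest
    expected = _fingerprint(destination)
    # compare_digest avoids leaking how many leading characters matched
    ok = hmac.compare_digest(digest.encode("utf-8"), expected.encode("utf-8"))
    return destination if ok else None
```

<p>Any change to the IDs or the destination changes the expected hash, so an edited link fails validation. If you are building this fresh, a keyed construction such as HMAC-SHA256 is generally considered a more conservative choice than md5 with an appended secret; that is a suggestion from this sketch, and the overall structure stays the same.</p>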
<p><strong>Which to use?</strong></p>
<p>Don&#8217;t use the simple approach &#8211; it&#8217;ll get abused sooner or later and you&#8217;ll regret it. Any of the database or cryptographic approaches work just fine, though the cryptographic approach may be easier to scale up and maintain. The database approaches make it easier to disable a link, or direct it to somewhere else at a later point, in case of abuse or some other need.</p>
<p><strong>What else is it good for?</strong></p>
<p>You can use the same sort of approach to validate unsubscription links and VERP return paths for bounce handling. You can also do &#8220;open tracking&#8221; by using this sort of link for image URLs, if you find that a useful metric to offer.</p>
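<p>As one illustration of the VERP case, the Python sketch below tags a per-recipient return path and checks the tag when a bounce arrives. Everything specific here is an assumption: the <em>bounce-ID-TAG@domain</em> layout, the bounce domain, the function names, recipient IDs that contain no hyphens, and the use of a truncated HMAC-SHA256 tag in place of the md5 scheme described above.</p>

```python
import hashlib
import hmac

SECRET = b"albatross"                   # example key only
BOUNCE_DOMAIN = "bounces.example.com"   # hypothetical bounce domain

def _tag(recipient_id):
    # Truncated HMAC-SHA256 keeps the local part short (assumption)
    mac = hmac.new(SECRET, recipient_id.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:16]

def make_return_path(recipient_id):
    """Build a signed VERP-style envelope sender for one recipient."""
    return "bounce-{}-{}@{}".format(recipient_id, _tag(recipient_id), BOUNCE_DOMAIN)

def recipient_from_bounce(address):
    """Return the recipient id if the address carries a valid tag, else None."""
    local, _, domain = address.partition("@")
    parts = local.split("-")
    if len(parts) != 3 or parts[0] != "bounce" or domain.lower() != BOUNCE_DOMAIN:
        return None
    recipient_id, tag = parts[1], parts[2]
    ok = hmac.compare_digest(tag.encode("utf-8"), _tag(recipient_id).encode("utf-8"))
    return recipient_id if ok else None
```

<p>Bounces sent to a made-up address then fail the check, so forged bounces can&#8217;t be used to knock other recipients off a list.</p>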
]]></content:encoded>
			<wfw:commentRss>http://blog.wordtothewise.com/2010/10/clicktracking-link-abuse/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

