[display-names] Initial Thoughts on Display Name Defenses

Wed Mar 27 10:13:12 PDT 2013

Murray - Thanks for setting up this list.

Display Name Defenders -

As we know, defending against display name abuse is a tricky subject.
RFC 5322 clearly permits arbitrary text in the "display-name" part of
the "From" field.  So it's possible (and even reasonable) to send a
message like:

-----
| To: "Jane Smith" <jane.smith@emailaddress.com>
| From: "Customer Service @Company.com" <customer.service@company.com>
-----

Unfortunately, this also means there's nothing to stop someone from
sending a message like:

-----
| To: "John Doe" <john.doe@emailaddress.com>
| From: "legitimate@brand.com" <attacker@spoofer.com>
-----

Many email clients will happily display "legitimate@brand.com" as the
sender, while hiding the "addr-spec" part of the "From" field.  The
result is that John Doe can be forgiven for thinking that the mail is
legitimate.

Spoofed messages like this will look even more legitimate to the
receiver if the attacker sets up an SPF record, signs the mail using
DKIM, and publishes a DMARC record (assuming alignment with the
"spoofer.com" domain).

I would like to explore whether it would be reasonable to define a means
of detecting when the display-name part of the From field includes what
looks like an email address.  If it does, there may be value in comparing
it (even if only the registered domain name) to the address in the
addr-spec part.  If they are not equal, the mail could be treated as
(highly) suspect, if not rejected outright.
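
As a rough sketch of the comparison described above (not a proposed
implementation), something like the following Python could work.  The
names EMAIL_LIKE, naive_registered_domain and classify_from are made up
for illustration, and treating the last two labels as the "registered
domain" is an assumption that fails for suffixes such as co.uk; a real
check would probably need the Public Suffix List.

```python
import re
from email.utils import parseaddr

# Deliberately rudimentary: an optional local part, then "@", then a
# dotted domain. The domain is captured in group 1.
EMAIL_LIKE = re.compile(r"[A-Za-z0-9._%+-]*@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def naive_registered_domain(domain):
    # Assumption: the last two labels stand in for the registered domain.
    # This is wrong for suffixes like co.uk.
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def classify_from(from_header):
    """Compare a domain shown in the display-name with the addr-spec domain."""
    display_name, addr_spec = parseaddr(from_header)
    match = EMAIL_LIKE.search(display_name)
    if match is None:
        return "no-address-in-display-name"
    shown = naive_registered_domain(match.group(1))
    actual = naive_registered_domain(addr_spec.rpartition("@")[2])
    return "match" if shown == actual else "mismatch"


if __name__ == "__main__":
    print(classify_from('"legitimate@brand.com" <attacker@spoofer.com>'))
```

Whether a "mismatch" should mean "suspect" or "reject" is a policy
question this sketch does not try to answer.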

I'm aware that there are a number of ways by which a determined attacker
could try to fool such a system (e.g., using left-to-right overrides).
But setting that aside, and before we get too far ahead of ourselves
dreaming up solutions, I'd like to see if we could build a data-driven
analysis of usage patterns in the wild.

For example, those who have access to a large corpus of mail could
potentially mine their data to see how often a rudimentary regex turns
up an email address in the display-name that doesn't match the one in
the addr-spec.  Then, by evaluating those hits, we may be able to
determine how often such a case represents legitimate mail.  My
hypothesis is that the number of legitimate cases like this will be very
small, likely along the lines of:

-----
| To: "Bill Jones" <bill.jones@emailaddress.com>
| From: "surveys@company.com" <company.surveys@marketing.com>
-----
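
To make the corpus-mining idea a bit more concrete, here is a minimal
Python sketch of such a tally.  It assumes the From header values have
already been extracted into an iterable of strings; the names AT_DOMAIN
and tally_mismatches are hypothetical, and it compares full domains
rather than registered domains to keep things short.  The sample values
are just the three From lines used as examples in this message, not real
data.

```python
import re
from collections import Counter
from email.utils import parseaddr

# Rudimentary: "@" followed by a dotted domain, captured in group 1.
AT_DOMAIN = re.compile(r"@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)")


def tally_mismatches(from_headers):
    """Count display-names showing a domain that differs from the addr-spec."""
    counts = Counter()
    review_queue = []  # (shown domain, actual domain) pairs for manual review
    for header in from_headers:
        display_name, addr_spec = parseaddr(header)
        counts["total"] += 1
        match = AT_DOMAIN.search(display_name)
        if match is None:
            continue
        counts["address_in_display_name"] += 1
        shown = match.group(1).lower()
        actual = addr_spec.rpartition("@")[2].lower()
        if shown != actual:
            counts["mismatch"] += 1
            review_queue.append((shown, actual))
    return counts, review_queue


# Illustrative input only: the From examples from this message.
sample = [
    '"Customer Service @Company.com" <customer.service@company.com>',
    '"legitimate@brand.com" <attacker@spoofer.com>',
    '"surveys@company.com" <company.surveys@marketing.com>',
]
counts, review_queue = tally_mismatches(sample)
```

The review queue is the interesting part: someone would still have to
look at those pairs by hand to judge which ones are legitimate.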

Once we have the data, though, we can build an understanding of how the
practice is used.  With that we can begin to consider possible solutions.

Anyway, does this approach sound like a reasonable path forward to begin
to wade into the waters?

- Trent

-- 
J. Trent Adams

Profile: http://www.mediaslate.org/jtrentadams/
LinkedIn: http://www.linkedin.com/in/jtrentadams
Twitter: http://twitter.com/jtrentadams