While trying to catch up with a huge backlog of recent email and potential blog topics, I came across this article that Graham Cluley posted a few days ago on Internationalized Domain Name (IDN) homograph attacks. This is the kind of spoofing attack where a site address looks legitimate but is not what it seems, because one or more characters have been deceptively substituted (a technique very commonly used in phishing). The example central to Graham's article is the work of security researcher Paul Moore, who registered the domain IIoydsbank.co.uk and then invited Twitter users to spot the difference. Just to make things more interesting, Moore bought a TLS certificate from Cloudflare for his site, so that it showed the green padlock that is meant to reassure us that all is secure.

[Image: green padlock]

Clearly we shouldn't overestimate the security conferred by that little graphic and the magic 'https' at the start of URLs. Apart from the fact that phishing sites and pop-ups have often made use of fake padlock graphics, the fact that traffic is encrypted doesn't mean that the traffic can't be malicious. However, Lloyds' customers are not in danger from IIoydsbank.co.uk, since Moore has apparently transferred the domain to the real Lloyds.

In fact, this is a pretty basic example of a long-known attack. Sorry, but I'm going to quote myself (from an earlier article here), starting with a 'simplistic example':

'… like IIoydsbank.com, where I’ve substituted a capital I [i] for each of the two Ls at the beginning.' A common variation today is to use a homoglyph: in the Unicode character set there are many characters that look to the casual eye (at least in some fonts) very much like others, but that are completely different for the purposes of identifying a web address.
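The capital-I trick relies entirely on how the name is displayed. Hostnames are case-insensitive, so folding the name to lower case shows what will actually be looked up. A minimal Python sketch (the variable names are mine, and `genuine` is simply the name I'm assuming is being imitated):

```python
# Hostnames are case-insensitive, so lowercasing a displayed name
# shows the name that will actually be resolved.
displayed = "IIoydsbank.co.uk"  # two capital I's, as in Moore's demonstration
genuine = "lloydsbank.co.uk"    # two lowercase l's: the name being imitated (assumption)

resolved = displayed.lower()

print(resolved)             # iioydsbank.co.uk
print(resolved == genuine)  # False
```

Once the name is in lower case, the substitution is obvious in almost any font.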

(It seems to have become a convention in security that we can't look at fake bank sites without mentioning Lloyds, in much the same way that we can't talk about public key cryptography without using Alice and Bob as examples.)

To take another example from the same blog, one that I expanded on in a later article for ITSecurity: in the right font, a lower-case 'o' is indistinguishable from an omicron. The example below is a screenshot from Notepad, where I used an omicron instead of an 'o' in one of the two versions of welivesecurity.com.

[Screenshot: two versions of welivesecurity.com in Notepad, one with an omicron in place of the 'o']

The first one is the 'fake', in case you were wondering.

Since I don't think there's such a thing as a '.com' top-level domain where the middle character is an omicron, that might not be the best example. But I suspect that there are plenty of financial institutions whose names include a spoofable 'o', and there are plenty of other usable homoglyphs. Fortunately, they work better in some fonts than others, so in some contexts they are much easier to distinguish.
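Whatever a font makes of them, the two strings in the Notepad example are different sequences of code points, and software can tell them apart even when the eye can't. Here's a minimal Python sketch, assuming the omicron sits in the middle of 'com' as described above (the function and variable names are mine, purely for illustration):

```python
import unicodedata

genuine = "welivesecurity.com"
fake = "welivesecurity.c\u03bfm"  # U+03BF (Greek omicron) in place of the Latin 'o'


def differing_characters(a, b):
    """Return (index, name_in_a, name_in_b) for each position where a and b differ."""
    return [
        (i, unicodedata.name(x), unicodedata.name(y))
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


print(genuine == fake)  # False
for index, expected, found in differing_characters(genuine, fake):
    print(index, expected, "->", found)
# 16 LATIN SMALL LETTER O -> GREEK SMALL LETTER OMICRON
```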

Only when I came upon Graham's article as cited above did I realize that he'd actually looked at this issue in a WeLiveSecurity article before I did, else I'd have drawn attention to it in my own more recent articles, as it's a pretty neat summary.

However, as Martijn Grooten pointed out in a Virus Bulletin article last year:

In practice, hardly any homoglyph attacks have been seen in the wild, despite the technology having been widely implemented.

But that doesn't mean there isn't an issue here, that it doesn't happen, or that it couldn't happen more often. And while Paul Moore is far from the first person to raise the topic, he's certainly brought it to the attention of a good few more potential victims.

Fortunately, you can reduce the risk drastically and painlessly. Don't be too eager to trust links in email and other messages, or on web sites where you didn't expect to find yourself. If your bank needs you to log in somewhere, use links and pages you already know to be kosher. There are many ways in which to disguise a malicious link as a legitimate address: this is just one of them.

It has to be said that some service providers are sufficiently aware of the issue to implement countermeasures. Google, for instance, though it has supported non-Latin characters in Gmail since summer 2014, has also taken advantage of the Unicode Consortium's Identifier Profiles for Restriction-Level Detection as a guide for rejecting 'suspicious combinations' of characters. However, even implementation of the strictest level of restriction available, ASCII-Only, would not have triggered an automatic alarm in this case, since there is no restriction on mixed case that I'm aware of. On the other hand, as Graham pointed out, the certificate provider that supplied Paul Moore with a TLS certificate could at least have looked more carefully at an application from a site whose name included the word 'bank'. Perhaps DNS registration services could take the same precaution: that doesn't seem too hard to automate.
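To give a rough idea of what that sort of automation might look like, here's a sketch in Python. It is not the Unicode Consortium's restriction-level algorithm, and I don't know what checks any certificate provider or registrar actually runs. The lookalike table, the watch list, the keyword list and the function names are all my own hypothetical, hand-picked illustrations, and using the first word of a character's Unicode name as its 'script' is a crude stand-in for real script data.

```python
import unicodedata

# Hypothetical, hand-picked lookalikes (applied after lowercasing). A real check
# would use a full confusables data set rather than this toy table.
LOOKALIKES = {"i": "l", "1": "l", "0": "o", "\u03bf": "o", "\u043e": "o"}

PROTECTED = {"lloydsbank"}  # hypothetical watch list of brand labels
KEYWORDS = ("bank",)        # hypothetical words that warrant a closer look


def skeleton(label):
    """Lowercase the label and fold each lookalike onto one representative."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in label.lower())


def scripts(label):
    """Crude script guess: the first word of each letter's Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}


def screen(domain):
    """Return a list of reasons why a human might want to review this name."""
    label = domain.split(".")[0]
    reasons = []
    if len(scripts(label)) > 1:
        reasons.append("mixed scripts")
    skel = skeleton(label)
    for brand in sorted(PROTECTED):
        if label.lower() != brand and skel == skeleton(brand):
            reasons.append(f"lookalike of {brand}")
    if any(word in skel for word in KEYWORDS):
        reasons.append("contains sensitive keyword")
    return reasons


print("IIoydsbank".isascii())      # True: an ASCII-only rule alone raises no alarm
print(screen("IIoydsbank.co.uk"))  # ['lookalike of lloydsbank', 'contains sensitive keyword']
print(screen("ll\u03bfydsbank.co.uk"))
# ['mixed scripts', 'lookalike of lloydsbank', 'contains sensitive keyword']
print(screen("example.org"))       # []
```

Note that the keyword check would flag the genuine name too. That's deliberate in this sketch: the idea is to route such applications to a person for a closer look, not to reject them automatically.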