Phish, Phowl, and Passwords

I spend a lot of time defending educational as opposed to purely technical solutions to security. Not that I don't believe in the usefulness of technical solutions: that is, after all, ESET's basic business. However, there are many people in the security business who believe that education is a waste of time because it isn't 100% effective. Unfortunately, you can make the very same argument against any technological solution. Randy Abrams and I discussed that conflict of ideas at some length in a paper for AVAR: see People Patching: Is User Education Of Any Use At All? And Robert Slade made some excellent points more recently in a blog about Security unawareness.

Static passwords are a pretty good example of a technology that's proved to be less than 100% effective time and time again, yet is considered effective enough to remain the authentication mainstay of many a web service. Well, I could argue that it's not so much about effectiveness, as a trade-off between effectiveness in terms of privacy, and the cost of implementing better authentication mechanisms. But that's a discussion for another time.

There's a proverb to the effect that if you give a man a fish, you feed him for a day: teach him to fish, and you feed him for a lifetime. While the provenance of that saw is obscure, it's worth examining more closely in the context of security, though in that context it might be better recast as If you show a man a phish, you prevent him from falling for that one: if you teach him to recognize phishing, you save yourself and him a lot of hassle. And, in fact we've written quite a lot about phishing in the past:

Quite a Few Pairs of Breaches

However, right now I'd like to apply that thought to password practice, an area of security (or, more accurately, privacy) that's probably of more immediate concern to many of us. In a year that's so far been most notable for the number of major password breaches. On more than one occasion I've quoted Mark Burnett's top 500 and one or two similar lists of the most-overused passwords, and recently I've noted quite a few journalists citing their own lists, but what does this teach the man-in-the-street (especially if he's doing his on-line banking on his smartphone as he wanders down to the pub) about password choices?

Well, it isn't entirely useless, or I wouldn't have bothered in the first place (or, to be precise, the second place, i.e. at the time of the Yahoo! debacle. Sometimes a service uses the three strikes and out' approach to regulating password or passcode authentication, suspending an account after three failed attempts to supply the correct password, so avoiding the top 25   (say) most over-used passwords may be good enough to secure the account from an opportunistic attack using common passwords, and even where the suspension is automatically lifted after a preset time, that does at least reduce the potential effectiveness of a dictionary or guessing attack. But simply listing the top umpteen bad passwords isn't really teaching anyone anything about password selection except to avoid a tiny handful of the billions of possible passwords and passphrases.

Horrific Heuristic

And in fact, that tiny handful, whether it's 25 or 10,000, remains tiny even when you measure it against the millions of combinations that will be tried in a determined dictionary attack. In a recent Securiteam blog, I compared the two approaches in these terms. If you simply offer a list of bad passwords ordered by prevalence, you are effectively offering a series of micro-heuristics like this:

Don't use a'
Don't use aa'
Don't use aaa'
…
Don't use aaaaaaaaaaaaaaaaaaaaaaa'
Don't use b'
Don't use bb'

Valid heuristics, yes, but it saves an awful lot of typing just to say: Don't use any password consisting of a single character repeated N times.  Or even password is a really, really bad choice of password: it's so obvious that everyone uses it, and letmein isn't much better.

So let's look (for the third and probably last time in this blog, at any rate) at that list of 25 passwords again. But rather than ranking them by how commonly they're used (the sorted by prevalence column) let's look at the alphanumeric order and see if that enables us to extract any heuristics more useful than don't use any of these 25 strings'.

Sort of Sorted

Original Ranking Sorted by prevalence Sorted alphanumerically
1 password 111111
2 123456 1234
3 12345678 12345
4 1234 123456
5 qwerty 1234567
6 12345 12345678
7 dragon 2000
8 pussy 696969
9 baseball abc123
10 football baseball
11 letmein dragon
12 monkey football
13 696969 harley
14 abc123 jennifer
15 mustang jordan
16 michael letmein
17 shadow master
18 master michael
19 jennifer monkey
20 111111 mustang
21 2000 password
22 jordan pussy
23 superman qwerty
24 harley shadow
25 1234567 superman

Well, that's interesting and maybe a little unexpected. In fact, it demonstrates the dangers of (1) using too small a dataset and (2) making assumptions about how applicable those data are in different contexts. Having done some analysis on purely numeric data as well as with larger password datasets, I know that the rule I mentioned above Don't use any password consisting of a single character repeated N times is pretty sound in the context of both alphanumeric passwords and purely numeric strings (especially PINs Personal Identification Numbers: see Hearing a PIN drop and PIN Holes: Passcode Selection Strategies), but that heuristic isn't specifically supported by this small dataset, where only one such password 111111 is represented. So you'll have to take my word for it that in larger datasets, other single character passwords (numeric and alphabetical) are indeed (over-)used and therefore a bad choice.

Rules are Rules

A rule that does hold, however, is that passwords consisting of an ascending series of numbers starting at 1 are not a great or unique and original idea. The following all appear in our list above, all but one being in the top 6.

  • 1234
  • 12345
  • 123456
  • 1234567
  • 12345678

Curiously, 1234567 comes in at number 25. That may be related to the fact that many authentication mechanisms enforce (or used to enforce) a minimum of only six characters: people who take this easy route to selecting a password are not likely to go to seven characters if they only need six. A seven-character minimum is pretty unusual. However, when services started to get more password-conscious (or entropy-conscious), many started to use an eight-character minimum, which probably explains why 12345678 ranks so highly. 1234 is also very highly ranked in PIN prevalence data, by the way.

There may actually be two reasons why people favour this group of numeric strings.

  1. It's not difficult to remember a simple increment-by-one series like this: all you have to do is remember when to stop.
  2. But you hardly need to remember the series at all: all you have to do on most computer keyboards is finger-step your way along the appropriate row of the keyboard. Which certainly also explains the presence of QWERTY, the first six alphabetical characters on the next row down on a standard keyboard. And yes, people do user QWERTYUIOP or a subset thereof when they need a longer password. In countries that use a slightly different layout on that row AZERTY, for example we see reports of the modified string or substring being used instead of a QWERTYUIOP substring. (See PIN Holes: Passcode Selection Strategies.)

What about 2000? Well, that's too popular to be a good choice, of course. But why 2000? Probably because people quite often use memorable dates, even just a year where they can get away with 4 digits, as in the context of many PINs. However, it's pretty safe to assume that memorable years (1066, 1492, 2000, 2001, any recent Olympic year) will be high on a password guesser's list, and where an automated attack can be implemented, it doesn't take long to cycle through all the possible 4-digit combinations.

Then there's 696969. I have a theory about why that one is so popular, and while the popularity of pussy (which is also in this top 25) is no doubt because cat lovers need passwords too, there are several other words and phrases likely to be sex related - including four-letter words - that aren't in this list, but do turn up in several others. I'm not particularly prudish myself, but I would suggest that if you think that no-one else ever used an obscenity or a word related to sexual practices as a password, you should think again.

There is just one mixed alphanumeric string in this list abc123 but there are several others that turn up in other lists, including such venerable items as NCC1701, better known as the USS Enterprise. Well, you might want to avoid those two.

Back to the Drawing Board

So we have several sport-related passwords: clearly baseball and football are too popular to be a good idea, but you'll find that other popular sports also make over-popular passwords (Michael and Jordan? Hmm...). But then, any word you're likely to find in a dictionary is going to be guessed eventually (i.e. sooner rather than later) in an automated attack. We could look at the psychology behind the other choices of dictionary word that make up the rest of this list, but there doesn't seem to be a lot of point to it. Clearly, there isn't much potential for useful heuristics in a top 25. So for the next blog in this series, I'm going to abandon the Top Umpteen approach altogether and start again from the basics of sound password selection. If you'd like to try a more flippant approach, though, you might want to take a look at A Torrent of Abuse for an attempt at password advice through parody.

Remember, though, that any password is only as good as the service to which it gives access: it doesn't matter how hard to guess it is, if the service provider is incapable of providing competent security to keep a competent password secure.

A Teasing Conclusion

So here's a quick summary of the little that we can learn from this top 25:

  • Avoiding the most popular passwords is safer than using one of them, especially the top three. But avoiding even the top 100, 1000, or 10,000 is only good enough if the authentication mechanism is well-implemented and your passwords are well-protected by the provider on its own systems.
  • Passwords, passphrases and PINs consisting of a single character repeated are very, very unsafe.
  • Any numeric or digital series ascending in increments of one or more is vulnerable to a guessing attack, a dictionary attack, or an algorithmic attack. So any substring of 0123456789 or abcdefghijklmnopqrstuvwxyz is likely to fail pretty quickly.
  • Any password or passphrase that can be found in a dictionary is easily crackable if the authentication mechanism allows a dictionary attack.
  • Passwords with a sexual connotation or using swearwords are very widely used, and therefore highly vulnerable to a guessing or dictionary attack.

In addition, a decent password manager saves you a lot of thinking in terms of generating a hard-to-crack password and reduces the temptation to re-use passwords and risk a cascade of breaches when one of your providers slips up, as so many have done recently. I'm looking at password management software at the moment, and while I'm reluctant to make too-specific recommendations, I'll be trying to give you some idea of what to look for in password management in another forthcoming article.

David Harley CITP FBCS CISSP
ESET Senior Research Fellow