Last chapter got pretty intense with the technical talk, but it was important in order to understand the fundamentals of security. Now that that’s out of the way, we can take a huge leap forward from the basics and talk about the more practical dimensions of security and how to stay safe during day-to-day computer usage.

HTTPS

The first point of business is to put all of that encryption we just learned about to work to secure the Internet. Recall that all information on the Internet can pass through several intermediate computers before reaching its destination. Without some extra work, any information sent over the Internet is susceptible to interception by a third party. In other words, how can you ensure that no one other than the intended recipient is receiving your data?

This problem can be broken down into two parts:

  1. How can you ensure that your packets are being routed to the correct destination and not to some other site (which could be owned by scammers)?
  2. If your packets are being routed correctly, how can you ensure that intermediate computers aren’t reading their contents before passing them on?

For example, if you’re trying to do your online banking, scammers might redirect your packets to their own site which pretends to be the site of your bank. Or maybe you’re intentionally connecting to a website that looks like it’s owned by your bank but actually isn’t! Or maybe you are connected to your bank’s actual site, but some computer on the Internet is recording all of your account information as it routes your packets. In terms of the types of security mentioned in the last chapter, our concerns here are mainly confidentiality and authenticity.

Both of these concerns are resolved by using an application layer protocol called HTTPS (rather than HTTP).

HTTP Secure (HTTPS) is a protocol that wraps HTTP in an additional protocol layer called Transport Layer Security (TLS).

In case you’ve lost track of all the protocols, here’s what all of the protocol layers would look like when you use HTTPS:

IP

TCP

TLS

HTTP

Certificates

The first thing that TLS does when connecting to a website is request the site’s “digital certificate”. The certificate is some information that your computer can use to verify that the site is the one you intended to connect to. This verification works using a concept I call “inherited trust”.

Inherited trust is the idea that if Alice trusts Bob and Bob trusts Charlie, Charlie inherits Alice’s trust of Bob, so Alice trusts Charlie.

So how can you trust the authenticity of the site you’re connecting to using inherited trust? The solution is to introduce a neutral third party.

A certificate authority (CA) is a trustworthy, neutral third party responsible for verifying the trustworthiness and identity of websites.

When the owner of a website wants people to be able to connect to their site using HTTPS, they contact a CA to verify their identity. If the CA trusts them, it issues them a certificate encoding the site’s identity and the CA’s identity (among other things). When your web browser receives a site’s certificate via TLS, it checks to see if the site matches the certificate and if the certificate was issued by one of the CAs that the browser trusts. If it trusts the CA that issued it, the connection protocol proceeds. Otherwise your browser should pop up a scary warning.

You (hopefully) trust your browser, so if your browser trusts the CA, and the CA trusts the site, you should trust the site.

In order for the certificate to be trusted, it must be impossible for the website owner to forge it. This is achieved using a combination of hashing (described later) and a twist on a public key cipher. The twist is that the encryption key is private and the decryption key is public (this is reversed from the last chapter). The CA encrypts every certificate using their secret key. Everyone can read the certificates since the decryption key is public, but since only the CA knows the encryption key this verifies that they created the certificate.

Encryption

After the identity of the website has been verified, your browser extracts a public encryption key from the certificate and the rest of the communication occurs under the confidentiality of encryption (using a key exchange as described previously).

An important thing to notice here is that the encryption (TLS) is inside the Internet and transport layers. That means that even if two computers are using HTTPS to hide the contents of their communication, they can’t hide the fact that they’re communicating in the first place. Every message they send will show the IP addresses of the sender/receiver in the open.

Hiding the IP addresses of the sender and receiver is still possible, but it’s much more complicated (and is rarely needed). One method of doing so is called onion routing.

Whenever you’re doing something on the Internet that you wouldn’t want someone spying on (such as making purchases are checking e-mail), make sure that you’re using HTTPS. You can tell what protocol your browser is using from the beginning of the address: it should be http:// or https://.

Passwords

Probably the most common security feature that users bump up against is passwords. Everyone understands the concept: choose a secret phrase initially, repeat that phrase later to gain access. Not quite so many people intuitively know that passwords like “12345” and “password1” aren’t very secure. And far fewer people still actually know why such passwords aren’t secure.

To understand this topic, we will imagine ourselves as the administrators of some website that uses passwords to secure users’ accounts.

Like most websites, we will assume that we have some interface where a user can choose a password and it is sent to us over the Internet. Later, when that user wants to gain access to their account, they can type in a password and again send it to us over the Internet. It’s then up to us to make sure that that password matches the original.

So how should we handle our users’ passwords?

Plaintext

Of course the simplest way would be to just store their original password. Then when they try to log in later, we can simply compare their password attempt to the stored version. Easy! And awful.

Why is this so awful? Because it presents a single massive point of failure for our website: suppose a hacker or a disgruntled employee got access to our store of passwords—they could then access every single account! And it can get even worse. Have you ever reused your username and password on different sites? Well if you did and someone steals your plaintext password on one site, they’ll now have a much easier time getting access to your account on other sites.

Password Safety Tip #1: choose unique passwords for each site. If you reuse passwords and any of the sites store your password insecurely, it’s as if every site stores it insecurely.

Have you ever used one of those “I forgot my password” things where they e-mail you to help you get access to your account? Did they help you get into your account by e-mailing you the password you chose (rather than making you create a new password)? If so, be worried! That is a dead giveaway that the site is storing your password in the clear—practically begging for some hacker to steal it. Avoid trusting such sites with sensitive information.

Hashes

So plaintext passwords are out. How can we remember a user’s password information so that we can compare it later, while not leaving the password unprotected? If you thought of encryption, you’re on the right track. But encryption isn’t quite the solution we need: if we encrypt the passwords, we still need a key to decrypt them. That would simply move the single point of failure from the collection of passwords to the decryption key.

If you have a really good memory, you might recall that we had a similar problem before: when Alice was sending Bob a book via the worst postal service ever, we were concerned that the book Bob received might not match the version Alice sent. In that case, we solved our problem using a checksum: a small bit of data that acted like a summary of the data being transmitted. If the summaries didn’t match, it was pretty unlikely that the book matched the original, and vice versa. So maybe a checksum is the tool we’re looking for. When a user chooses their password, we compute a checksum of the password and store that. Then when they access their account later, we again compute the checksum and compare.

This is getting close to being secure, but again it’s not quite what we’re looking for. A checksum, as it’s name implies, is more about checking that data are correct. What we need is something that provides security.

First, let’s take a step back. A checksum is actually a member of a broader class of functions called hashes.

A hash is a function which takes an arbitrary number of bytes as input and outputs a fixed number of bytes. The act of putting data in a hash function to get output is called hashing the data. The output of a hash function is called a digest or sometimes—confusingly—a hash.

So whether you give your hash a gigabyte of input or 0 bytes of input, it always produces some fixed number of bytes of output. If you look back at our definition of checksum, you’ll see that a checksum is just a special kind of hash. In order to protect our passwords we’ll need a different kind of hash.

A cryptographic hash is a hash where:

  • given an output, it is difficult to find an input that produces it
  • given an input, it is difficult to find another input that produces the same output
  • it is difficult to find two different inputs that produce the same output
  • two nearly identical inputs will produce significantly different outputs

Again, “difficult” here means that it should take so long for a computer to find the solution that the information would be practically useless to the attacker once they find it.

Our first example of a checksum—adding up the input bytes like positive integers to get a single byte of output—is decidedly not a cryptographic hash:

With a cryptographic hash, our process is to pass the user’s chosen password through the hash and store the output. When they log in later, we apply the same hash to their password attempt and compare the result. But now, the features of cryptographic hashes guard us against attackers. If someone steals the hash outputs, to access anyone’s account they’ll need to find inputs that match the outputs they have. Our hash is designed to make that really difficult. Furthermore, they have almost no information about the relationships between people’s passwords. If Alice chooses “password1” and Bob chooses “password2”, those might produce the two hash outputs:

10b222970537b97919db36ec757370d2
f1f16683f3e0208131b46d37a79c8921

They’re totally different! Even if the attackers figure out Alice’s password, they’ll have no idea how close they are to getting Bob’s.

You might think we’re all set now, but there are still a few flaws with this method. For one, what if Alice and Bob both choose “password1”? The hash might slow down the attackers, but as soon as they figure out Alice’s password they’ll have Bob’s too.

Password Safety Tip #2: try to choose really unique passwords—ones that are unlikely to be chosen by any of the thousands of other users of the websites you frequent. If you use a really common password, you risk compromising your security when someone else with the same password is compromised.

Add Salt

So the last problem is that while hashes do a great job of protecting users who have unique passwords, they spill the beans about users who have the same password. After all, a hash is a function: if you give it the same input, you always get the same output.

This problem is exacerbated by another feature of cryptographic hashes: they’re really hard to design. They’re so hard to make that pretty much everyone uses the same three or four publicly-available cryptographic hash functions. Now imagine that a bunch of websites all use the same hash function to store a common password that is shared by a bunch of users. If that password is compromised on any one of those sites, suddenly the attackers know who uses that password on all of the sites. Terrible!

The solution to this problem is pretty simple, however. We add some extra data to the hash input.

A password salt is a piece of random information which is added to the password before it is hashed. The salt does not need to be kept secret and is usually stored alongside the hash.

For example, suppose Alice and Bob both choose the terrible “password1” as their password. For each user, our site generates a random salt:

Alice: 2884b55c3236fcf1b9a6
Bob:   9fc9bddc4efe8e2dd99d

Then we add these to the passwords and hash them

password12884b55c3236fcf1b9a6 → hash → dd42a730500e5bf784b9fe5973181328
password19fc9bddc4efe8e2dd99d → hash → c0ac4c8d37d6298310d0477e23670f9b

Then for each user we store the hash output along with the salt

Alice: dd42a730500e5bf784b9fe5973181328, 2884b55c3236fcf1b9a6
Bob:   c0ac4c8d37d6298310d0477e23670f9b, 9fc9bddc4efe8e2dd99d

So now even if lots of people reuse passwords on the same site or across many sites, the attackers won’t know. This is currently the best way we know of to secure passwords.

Add Entropy

But there’s still one problem and there’s still one thing I haven’t explained yet: why is “password1” such a bad password? Well even if your password storage is completely secured with a fancy cryptographic hash and random salts, there’s nothing to stop a hacker from simply guessing passwords until they get one right.

The only reason why password guessing is a feasible strategy for hackers is because so many computers users are lazy: in one of the largest breaches of an improperly secured password database, hackers found that well over 2 million users had “123456”, “123456789”, or “password” as their password. That may not be useful information for breaking into a specific person’s account, but attackers who just want access to a lot of accounts (whoever may own them) can usually succeed by programming a computer to try every user name against the most common known passwords.

Even if you manage to avoid using a really common password, there is still danger in the form of using a really common password pattern. Many people create passwords using simple patterns such as number sequences like “13579”, keyboard patterns like “qwerty”, or regular English words. Because these patterns are so simple, computers can blaze through every possibility for the pattern in a matter of seconds. For example, suppose you make a password by choosing some English word, adding two random digits on the end, and possibly doing some simple letter substitutions like “@” for “a” or “$” for “s”, etc. Well there are about 173,000 English words, 10×10 2-digit numbers, and—being really generous here—let’s say about 20 different ways to substitute letters in the word you chose. That means there are about 173,000×10×10×20=346,000,000 passwords you can make with this method. That’s not even 30 bits of information; most computers could crank through every possible password of that pattern in no time at all!

Password Safety Tip #3: whatever password you use, make sure it’s at least 12 characters long. Computers these days are so fast that they can check every possible shorter password in a reasonably short amount of time.

And so we come to the real challenge: how do you make a password that is hard for a hacker to guess but still easy for you to remember?

I find it helpful to consider this question:

If a hacker knew how I was coming up with my password, how long would it take for them to guess it?

For example, if your system for creating passwords is to choose a word from the dictionary, the answer is “not very long”. There is no silver bullet for coming up with a good password, but one method that I like is to first think of some quote or phrase that you will easily remember:

Joseph’s face was black as night
And the pale yellow moon shone in his eyes

and choose a character for each word

Joseph's face was black as night and the pale yellow moon shone in his eyes
J        f    w   b     a  n     &   t   p    Y      0    s     i  h   I

The more creative, the better. Notice “&” for “and”, “0” for “moon” and “I” for “eyes”. So this gives me the password

Jfwban&tpY0sihI

Now even if an attacker knows the sort of method I’m using, there are way too many English phrases and letter substitutions for them to guess quickly. Another interesting approach is XKCD’s four random words.

Password Safety Tip #4: try to use passwords that would be hard to guess even if an attacker knew how you were choosing them. For example, “wPE7T%+G” may look like a random, secure password, but if your passwords on other sites are “wPE8T%+G” and “wPE9T%+G”, you’re kinda shooting yourself in the foot.

Alternatively, you could use extremely secure but unmemorable passwords like long strings of random characters and use a tool like Password Safe to help you keep track of them.

And now that you know this, be extra cautious about so-called “password strength” meters on websites. By definition they cannot measure the actual strength of your passwords—only a human adversary can do that. For every password strength meter there are false positives and negatives, weak passwords that look strong and strong passwords that look weak.

Malware

If you have ever used the Internet, hopefully you have heard about (and have worried about) malware. If not, maybe you are more familiar with one of the many types of malware: worms, Trojans, viruses, adware, spyware, backdoors, rootkits, etc. The specific kinds don’t really matter to us since our goal is always the same: to keep them off of our computers.

Malware (short for malicious software) is a program that does something bad to your computer. Most malware tries to do so secretly.

This definition is intentionally vague because there are a lot of “bad” things that a program can do to your computer. It could try to steal your personal information, monitor your behavior, take control of your computer, damage your computer, use your computer for illegal activities, or simply spread itself to other computers. There are also a lot of ways that malware tries to conceal itself. It may simply try to lay low and run in a way that you won’t notice or it may try to disguise itself as a program that you recognize and trust.

Getting malware off of a computer is a notoriously difficulty problem, and there are lots of companies that will try to sell you “antivirus” software that they claim can identify and remove malware. I’m not writing this book to debate the relative merits of the various antivirus programs out there, but there are a few points to keep in mind about them:

Consequently, it is a very bad idea to rely entirely on an antivirus program to automatically handle your malware problems. The best defense against malware is to educate yourself about malware so that you can avoid getting it on your computer in the first place. But having some sort of antivirus as a backup can be a good idea too.

There are generally two ways that malware gets onto your computer:

  1. By exploiting software that you use
  2. By tricking you into putting it on your computer

Exploits

It is nearly impossible to create a computer program that works correctly 100% of the time. If any software developer tells you that their software is totally bug-free, they are lying (possibly to themselves). Such bugs can often be exploited by hackers to gain access to programs or computers that they should not be able to. The National Vulnerability Database maintains a list of such bugs, which stands at about 67 thousand at the time of writing. When these bugs are in the software that you use and trust every day—your operating system, web browser, e-mail program, etc.—they can become avenues for hackers to install malware on your computer.

Unfortunately there is very little that you can do to protect yourself against exploits. Your best bet is to hope that the software developers are aware of these vulnerabilities and will release newer versions that are fixed. Thus you should strive to keep all of your software up-to-date.

If a known vulnerability goes unfixed by the developers, your only real option is to get rid of the vulnerable software in order to protect yourself.

User Error

As scary as exploits are, they are actually pretty uncommon sources of malware infection when it comes to everyday computer use. That is because many exploits can be technically difficult for a hacker to take advantage of. And why bother with a difficult solution when it’s so much easier to trick a user into installing malware themselves?

This is what superusers would call “problem exists between keyboard and chair” (PEBKAC, for short). The malware isn’t the real issue and removing it won’t solve anything, because the real problem is that the user is installing the malware and will probably continue to do so unintentionally.

Here’s how a typical PEBKAC situation might unfold:

  1. Bob needs to make some changes to a PDF he received from a client
  2. Bob discovers that he doesn’t have any software on his computer for changing a PDF file
  3. So Bob Googles for “free PDF editor download” and clicks a link that looks promising
  4. He downloads a program and starts it, which opens an “Installer”. He clicks “Next” a few times to get through some screens with buttons and checkboxes, but finally the PDF editor opens and Bob gets his work done

Did Bob just install some malware on his computer? I would be willing to bet that he did. Why? Because Bob just downloaded some random software from the Internet without looking into who he was getting it from. Even worse, it was free.

Remember that phrase “there’s no such thing as a free lunch”? That is generally true for free software as well. But there are a few kinds of “free” when it comes to software, and each carries a different level of risk.

Open source software is software that is not only free, but where the software developers openly invite members of the public to contribute to the development of the software (or at least let them see how the software is made).

Shareware is software that is free, but is limited in some way unless you pay the developers for the “full version”. For example shareware might stop working after a certain amount of time, or certain key features might be disabled.

Freeware is free software (which is not open source).

Open source software is generally very trustworthy and should be preferred when possible. You can trust open source because it is very hard to hide malware in a program that the public can watch the development of. However it is also the least common type of free software on the Internet. Shareware is much more common, and is also generally trustworthy (though a little less so since you don’t get to see how it’s made). However shareware can be pretty frustrating due to the limitations designed to entice you to pay for it.

Freeware, however, is extremely problematic. It’s not open source, so you can’t verify that it’s safe, and the developer’s motives for making it free can be questionable since they have nothing obvious to gain. Many freeware programs will sneak in malware that collects your personal information or monitors your web browsing activity so that the developer can sell it to advertisers. Others may even take complete control of your computer and hold it ransom until you pay the developers. You should always be extremely cautious when downloading freeware.

That being said, there is some wonderful freeware on the Internet that has earned people’s trust through years of malware-free service. It’s not a simple matter of avoiding freeware entirely.

And of course that doesn’t mean that non-free software is guaranteed to be malware-free. There are a few known instances of companies trying to sneak malware into the software they sell.

Download Responsibly

Clearly, avoiding malware is a complicated issue. There are no hard rules, no strict steps to follow. Ultimately you will need to develop “street smarts”: an intuition for when some software seems “sketchy” or unsafe.

When looking to download software from the Internet, there are a few precautions you can take that should help to avoid malware:

The Internet isn’t the only way that malware can spread. Portable USB thumb drives are also a common way for malware to get from computer to computer. Really, any method of transferring data from one computer to another could be used to transfer malware and should fall under the same scrutiny as a download from the Internet.

Exercises

  1. Find out how your browser indicates that a site’s identity is either unverified, verified, or invalid. Many sites let you connect with HTTP or HTTPS so you can see the difference. For example, http://www.wikipedia.org and https://www.wikipedia.org. To see how your browser handles invalid certificates, this site (at the time of writing) has a certificate that doesn’t match its identity. For extra credit, try to find the list of CAs that your browser trusts.
  2. Connect to a site with HTTPS and see what information your browser can tell you about its certificate. At minimum, you should be able to see which CA issued the certificate and when it expires.
  3. In the section on hashes, I briefly mentioned why our example checksum is not a cryptographic hash function. Demonstrate this with concrete examples for each of the four features of cryptographic hashes.
  4. Come up with a couple different password generation methods and try to count their strength in terms of the number of possible password choices. You may have to estimate a few counts (read the section on making passwords again to see how to count password possibilities). Which methods seem like a good trade-off between strength/memorability?
  5. Try to find the “official” download sites for the following software. Also determine if the software is open source, shareware, freeware, or none of the above.
    • Paint.NET
    • Photoshop
    • The GIMP