Encoding, Hashing, and Encryption: What’s the difference?
Encoding, hashing, and encryption come up constantly when trying to secure data, and they are often confused with one another. Many vendors claim to use strong encryption methods and standards, but a security team still needs to assess whether the technique a vendor actually uses is encryption at all, and whether it is appropriate for the data being protected.
Let’s take a look at the differences between and proper usage of encoding, hashing, and encryption.
Encoding
To encode something is to represent it in a format that the receiver will understand. There are many encoding standards, including Base64, UTF-8, and ASCII, to name a few. Each standard has a purpose, and applications using a given standard expect to receive data that complies with it.
An easy comparison is human language. The words, syntax, and rules differ from one language to the next. An English speaker may not be able to read or speak Japanese, but with a translator they can decode Japanese into English and encode their English into Japanese.
The following two lines represent the same data:
- Plaintext: This line has secret data you really should encrypt.
- base64: VGhpcyBsaW5lIGhhcyBzZWNyZXQgZGF0YSB5b3UgcmVhbGx5IHNob3VsZCBlbmNyeXB0Lg==
To human eyes, these two lines look unrelated. To an application that understands Base64, they are the same data.
If you rely on encoding alone to secure data, anyone who intercepts it effectively has the real data. Because encoding standards are public, the encoding in use is easy to detect and decoding is a trivial task.
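To make the point concrete, here is a minimal sketch using Python's standard `base64` module and the example line from above. It is only an illustration that encoding and decoding require no key or secret of any kind.

```python
import base64

plaintext = "This line has secret data you really should encrypt."

# Encoding: no key is involved, only a publicly documented standard.
encoded = base64.b64encode(plaintext.encode("utf-8")).decode("ascii")

# Anyone who intercepts the encoded text can reverse it the same way.
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)
print(decoded == plaintext)
```

The decode step needs nothing the interceptor does not already have, which is why encoding offers no confidentiality.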
Any vendor that claims to use strong base64 encryption in their solution is suspect.
Hashing
Hashing is an integrity validation method. The problem hashing sets out to solve is not to secure data from being read, but rather to validate that the data in question has not been changed.
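Hashing algorithms are one-way functions which take an input of any size and compute a fixed-size output. The same input always produces the same output, and a good algorithm makes it infeasible to find two different inputs with the same output, although such collisions must exist in principle. Hashing algorithms are designed such that you cannot take a hash value and work the algorithm back to get the original input. An attacker would instead have to generate many potential inputs, hash each one, and compare the result to the hash value they hold to see if a guess was correct.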
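Hashing algorithms include SHA-1, SHA-256, and SHAKE256, among others. Older algorithms such as MD5 and SHA-1 are no longer considered collision-resistant and should be avoided in new designs.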
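The properties described above can be sketched with Python's standard `hashlib` module. SHA-256 is used here as one example algorithm; the helper name `sha256_hex` and the list of candidate inputs are assumptions made up for this illustration.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

message = "This line has secret data you really should encrypt."
digest = sha256_hex(message)

# The same input always yields the same digest.
same_input_same_digest = sha256_hex(message) == digest

# Changing the input at all yields a different digest.
tampered_digest = sha256_hex(message + "!")

# There is no "unhash" function. An attacker can only guess inputs,
# hash each guess, and compare it with the digest they hold.
candidate_inputs = ["hello", "password", message, "aardvark"]  # hypothetical guesses
recovered = next((c for c in candidate_inputs if sha256_hex(c) == digest), None)

print(len(digest), same_input_same_digest, tampered_digest != digest)
```

The guessing loop only succeeds because the original input happens to be in the candidate list, which is exactly the situation an attacker hopes for with weak or common inputs.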
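The following lines show a plaintext and three hashes computed from it. Bcrypt mixes in a random salt, so hashing the same plaintext again would produce a different Bcrypt string.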
- Plaintext: This line has secret data you really should encrypt.
- MD5: 59b9465b41f4c1979c728012e6440a5d
- SHA1: abb5afbc365744c60488be5ecc4da920218f3b3f
- Bcrypt: $2a$08$/H6H76E3g2eD1QqRNxrhz.2HUKuFxcfkphAy0t0W.8tcnjSuPmj/O
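A hash is never meant to be reversed, so a receiver cannot recover the original message from it. Hashing should therefore not be used to secure communications from being read by unauthorized parties.
A common use of hashing is to secure the storage of passwords. When a user creates a password, the password is put through a hashing algorithm and only the resulting hash is stored. When the user later attempts to authenticate, the submitted password is hashed again and checked against the stored hash. Password storage calls for a salted, deliberately slow algorithm such as Bcrypt rather than a fast general-purpose hash, since speed only helps an attacker test guesses.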
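The store-then-compare flow for passwords might look like the sketch below. Bcrypt is not part of Python's standard library, so this example swaps in PBKDF2-HMAC-SHA256 from `hashlib`; the function names, the 16-byte random salt, and the iteration count are assumptions chosen for illustration, not recommendations from this article.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune for your own hardware

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) to store in place of the password itself."""
    salt = secrets.token_bytes(16)  # random per-password salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Hash the login attempt with the stored salt and compare the digests."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    # compare_digest takes the same time whether or not the values match.
    return hmac.compare_digest(candidate, stored_digest)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("aardvark", salt, stored))
```

At no point is the original password recovered from the stored value. The check works only by hashing the new attempt and comparing the results.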
Again, hashing is an integrity check. It is not encryption.
Encryption
Encryption makes data unreadable to anyone except those who hold the secret key. The secret can be a single shared key (symmetric encryption), or it can be a pair of keys (asymmetric encryption): one kept private and one given to other parties.
Say Alice and Bob want to securely communicate with each other. They decide to use a shared key of “aardvark” for their secret. Alice enters the key to encrypt a message, and Bob enters the same key to decrypt and read it.
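The shared-key idea can be illustrated with a toy example. Python's standard library has no production cipher, so the sketch below uses a one-time pad (XOR with a random key as long as the message) purely to show that the same key both encrypts and decrypts. It is an assumption made for teaching, not how Alice and Bob's passphrase would really be used; an actual system should rely on a vetted library and an algorithm such as AES. The names `xor_bytes`, `shared_key`, and `wrong_key` are hypothetical.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of data with the matching byte of key."""
    if len(key) != len(data):
        raise ValueError("a one-time pad key must be as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"Meet me at noon."

# Alice and Bob agree on this key in advance and use it only once.
shared_key = secrets.token_bytes(len(message))

ciphertext = xor_bytes(message, shared_key)    # Alice encrypts
recovered = xor_bytes(ciphertext, shared_key)  # Bob decrypts with the same key

# A third party without the key gets nothing useful from a wrong guess.
wrong_key = bytes(b ^ 0xFF for b in shared_key)  # hypothetical bad guess
garbage = xor_bytes(ciphertext, wrong_key)

print(recovered == message, garbage == message)
```

Unlike the Base64 example, reversing the transformation here depends entirely on holding the key.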
Alice and Bob, though, are worried about the secret message being read by a third party who might be able to guess or obtain that key. They decide to use a two-key system (asymmetric key encryption) instead. Alice wants to send a message that only Bob can read, so she encrypts it with a key (the public key) Bob gave her beforehand. Bob receives the encrypted message and uses his other key (the private key) to decrypt and read it. Since only Bob knows his private key and has never shared it with anyone, no one can decrypt the message except Bob.
Bob can also use his private key when sending a message to Alice, and Alice can use the public key Bob gave her to check it. This is known as signing. Signing does not keep the message secret, since anyone holding the public key can perform the same check. What it provides is authenticity: because the check succeeds with Bob’s public key, Alice can safely assume Bob sent the message, as only Bob has the other key of that pair.
Common encryption algorithms include AES and Blowfish (symmetric) and RSA (asymmetric). Blowfish’s 64-bit block size makes it a poor choice for new designs; AES is the usual default for symmetric encryption.
Encoding, hashing, and encryption can be used together. A Base64-encoded message to an application may be hashed so that the receiver can verify its integrity. The message and its hash may then be encrypted and sent to the receiver, who decrypts the message, recomputes its hash, and compares the result against the hash value received from the sender to make sure the message was not tampered with.
Understanding the difference between these concepts helps when considering security design and architecture, especially when it comes to procurement or application review. Mistaking encoding for encryption can be very dangerous to an organization, because the transmissions are then real data that can be easily decoded by anyone.
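A rough sketch of the encode-hash-verify part of that flow follows. The encryption step is left out because Python's standard library does not ship a cipher, so the functions `package` and `verify_and_decode` are hypothetical helpers that cover only encoding and the integrity check. On its own, a plain hash detects tampering only if an attacker cannot also replace the hash, which is the job the encryption step does in the description above.

```python
import base64
import hashlib
import hmac

def package(message: str) -> tuple[str, str]:
    """Sender side: Base64-encode the message and hash the encoded form."""
    encoded = base64.b64encode(message.encode("utf-8")).decode("ascii")
    digest = hashlib.sha256(encoded.encode("ascii")).hexdigest()
    return encoded, digest

def verify_and_decode(encoded: str, digest: str) -> str:
    """Receiver side: recompute the hash, compare it, then decode."""
    actual = hashlib.sha256(encoded.encode("ascii")).hexdigest()
    if not hmac.compare_digest(actual, digest):
        raise ValueError("message hash mismatch: possible tampering")
    return base64.b64decode(encoded).decode("utf-8")

encoded, digest = package("Transfer 10 units to Bob")
print(verify_and_decode(encoded, digest))
```

If the encoded message is altered in transit, the recomputed hash no longer matches and the receiver rejects it instead of decoding it.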
Kristen Norris