What is a message digest? The short answer: a message digest is the output of a cryptographic hash function. If the term is still not clear, let's make it even simpler: a message digest is a digital summary of a given piece of information. It is the 'fingerprint' of the message, i.e. for practical purposes the message digest can be used to identify a message uniquely.
I used the term 'cryptographic' when explaining what a message digest is. How does cryptography come into the picture, and what does it mean? Well, cryptography is simply the scrambling of data, i.e. representing the data in such a way that no one except the intended recipient can make sense of it. For example, 'Monday' can be represented as 'npoebz' (here the scrambling shifts every letter one place to the right in the alphabet, e.g. 'a' is replaced by 'b', 'b' is replaced by 'c', and so on). When we use both terms together, cryptographic + hashing, things gradually start making sense.
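The letter-shifting example above can be sketched in a few lines of Python. This is only an illustration of the scrambling idea; the function name `shift_letters` is made up for this sketch, and it lower-cases its input for simplicity.

```python
import string

def shift_letters(text: str, shift: int = 1) -> str:
    """Replace each letter with the one `shift` places later, wrapping z -> a."""
    lower = string.ascii_lowercase
    offset = shift % 26
    shifted = lower[offset:] + lower[:offset]
    table = str.maketrans(lower, shifted)
    # Non-letters are left untouched by translate()
    return text.lower().translate(table)

print(shift_letters("Monday"))      # npoebz
print(shift_letters("npoebz", -1))  # monday
```

Note that this kind of scrambling can be undone by anyone who knows the shift, whereas a hash is meant to be one-way.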
Let's get a little more technical.
Defining: Cryptographic Hashing
MD5 stands for Message Digest algorithm 5, and was invented by US cryptographer Professor Ronald Rivest in 1991 to replace the older MD4 algorithm. MD5 is simply the name of one particular cryptographic hash function that Ron came up with, way back in ’91.
The idea behind cryptographic hashing is to take an arbitrary block of data and return a fixed-size “hash” value. The input can be any data of any size, but the hash value will always have the same length.
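As a quick illustration of the fixed-size property, the sketch below hashes inputs of very different sizes with MD5 from Python's standard `hashlib` module. The sample inputs are arbitrary and chosen only for this example.

```python
import hashlib

# Inputs of very different sizes (arbitrary sample data for illustration)
samples = [b"", b"a", b"hello world", b"x" * 1_000_000]

for data in samples:
    digest = hashlib.md5(data).hexdigest()
    # Every MD5 digest is 128 bits, i.e. 32 hexadecimal digits
    print(f"{len(data):>8} bytes in -> {len(digest)} hex digits out")
```

Whatever the input length, each line should report 32 hex digits.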
The ideal cryptographic hash function has four main properties:
- it is easy to compute the hash value (but not necessarily quick) for any given message
- it is infeasible to generate a message that has a given hash
- it is infeasible to modify a message without changing the hash
- it is infeasible to find two different messages with the same hash
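The third property in the list can be seen with a small experiment: change one character of a message and compare the two hashes. The messages below are invented for this sketch, and `md5_hex` is just a helper name used here.

```python
import hashlib

def md5_hex(message: str) -> str:
    """Return the MD5 digest of a text message as a hex string."""
    return hashlib.md5(message.encode("utf-8")).hexdigest()

original = md5_hex("Transfer 100 dollars to Alice")
tampered = md5_hex("Transfer 900 dollars to Alice")

# Count how many of the 32 hex digits changed after a one-character edit
differing = sum(a != b for a, b in zip(original, tampered))

print(original)
print(tampered)
print(f"{differing} of {len(original)} hex digits differ")
```

With a sound hash function, a tiny change in the input should alter most of the output digits.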
Cryptographic hashing has a number of uses, and there are a vast number of algorithms (other than MD5) designed to do a similar job. One of the main uses for cryptographic hashing is for verifying the contents of a message or file after transfer.
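For example, consider the following case: the sender computes the hash of a message and sends it along with the message, and the receiver recomputes the hash of whatever arrived. If the two hashes are equal, the hash verifies that the message received matches the message sent.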
On a very basic level, if you and a friend each have a large file and wish to verify that the two are exactly the same without the hefty transfer, you can each compute the file's hash and compare the two short values instead.
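A minimal sketch of that file comparison might look like the following, assuming both people run the same function on their own copy and then exchange only the resulting hex strings. The names `file_md5` and `same_contents` are invented for this example.

```python
import hashlib

def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so a large file never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_contents(my_hash: str, friends_hash: str) -> bool:
    """Compare two hex digests, ignoring letter case."""
    return my_hash.strip().lower() == friends_hash.strip().lower()
```

For anything where deliberate tampering is a concern, `hashlib.sha256()` can be swapped in for `hashlib.md5()` without changing the rest of the code.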
Hashing algorithms also play a part in data or file identification. A good example of this is peer-to-peer file sharing networks, such as eDonkey2000. The system used a variant of the MD4 algorithm, combined with the file's size, to quickly point to files on the network.
A related idea, hashing data so that it can be found quickly, underlies hash tables, a method commonly used by search engines, although hash tables generally rely on faster, non-cryptographic hash functions.
Another use for hashes is in the storage of passwords. Storing passwords as clear text is a bad idea for obvious reasons, so instead they are converted to hash values. When a user inputs a password, it is converted to a hash value and checked against the stored hash. As hashing is a one-way process, provided the algorithm is sound there is theoretically little chance of the original password being recovered from the hash.
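The store-and-compare flow described above can be sketched as follows. This is a bare-bones illustration rather than a recommended design: it uses SHA-256 instead of MD5, it has no salt, and the function names and sample passwords are invented for the example.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    """One-way conversion of a password to a hex digest (SHA-256 here)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# What the database keeps: the hash, never the clear-text password
stored_hash = hash_password("correct horse battery staple")

def check_password(attempt: str, stored: str) -> bool:
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(hash_password(attempt), stored)

print(check_password("correct horse battery staple", stored_hash))  # True
print(check_password("letmein", stored_hash))                       # False
```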
Cryptographic hashing is also often used in the generation of passwords, and derivative passwords from a single phrase.
Message Digest algorithm 5
The MD5 function produces a 128-bit value, usually written as a 32-digit hexadecimal number. If we were to turn ‘makeuseof.com’ into an MD5 hash value then it would look like: 64399513b7d734ca90181b27a62134dc. It was built upon a method called the Merkle-Damgård construction, which is used to build what are known as “collision-proof” (more precisely, collision-resistant) hash functions.
No security is everything-proof, however, and in 1996 potential flaws were found within the MD5 hashing algorithm. At the time these were not seen as fatal, and MD5 continued to be used. In 2004 a far more serious problem was discovered after a group of researchers described how to make two separate files share the same MD5 hash value. This was the first instance of a collision attack being used against the MD5 hashing algorithm. A collision attack attempts to find two arbitrary inputs which produce the same hash value; hence, a collision (two files existing with the same value).
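To get a feel for what a collision is, the toy sketch below searches for two different messages that agree on only the first two bytes of their MD5 digest. This is a deliberately weakened hash, not a real attack on full MD5; the real attacks exploit weaknesses in the algorithm, while this one simply tries inputs until two of them match. The names `short_hash` and `find_collision` are invented for the example.

```python
import hashlib
from typing import Dict, Tuple

def short_hash(data: bytes, n_bytes: int = 2) -> bytes:
    """First n_bytes of the MD5 digest: a deliberately weakened toy hash."""
    return hashlib.md5(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 2) -> Tuple[bytes, bytes]:
    """Try message-0, message-1, ... until two share the same short hash."""
    seen: Dict[bytes, bytes] = {}
    counter = 0
    while True:
        message = f"message-{counter}".encode()
        tag = short_hash(message, n_bytes)
        if tag in seen:
            return seen[tag], message
        seen[tag] = message
        counter += 1

first, second = find_collision()
print(first, second, short_hash(first).hex())
```

With only 65,536 possible two-byte values the search finishes almost instantly, which is why real digests are so much longer.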
Over the next few years, attempts to find further security problems within MD5 took place, and in 2008 another research group managed to use the collision attack method to fake SSL certificate validity. This could dupe users into thinking they are browsing securely when they are not. The US Department of Homeland Security announced that: “users should avoid using the MD5 algorithm in any capacity. As previous research has demonstrated, it should be considered cryptographically broken and unsuitable for further use”.
Despite the government warning, many services still use MD5 and as such are technically at risk. It is, however, possible to “salt” passwords to hinder potential attackers using dictionary attacks (testing known words) against the system. If a hacker has a list of often-used passwords and your user account database, they can hash every password on the list and check the results against the hashes in the database. A salt is a random string that is combined with the password (or with its existing hash) and then hashed again. The salt value and the resulting hash are then stored in the database.
If a hacker wanted to find out your users’ passwords, he would need to attack the salted hashes one account at a time, because a list of hashes computed in advance no longer matches anything in the database. This renders a precomputed dictionary attack pretty useless. Salt does not make the password itself any stronger, so you must always choose a hard-to-guess password.
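A minimal sketch of salted password storage is shown below. It is an assumption-laden illustration, not the scheme any particular service uses: instead of MD5 it uses PBKDF2 with SHA-256 from Python's standard `hashlib` module, the iteration count is an arbitrary example value, and the names `new_record` and `verify` are made up for this sketch.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 100_000  # illustrative work factor, not a recommendation

def new_record(password: str) -> Tuple[bytes, bytes]:
    """Create the (salt, hash) pair that would be stored in the database."""
    salt = os.urandom(16)  # fresh random salt for every account
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare the results."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

alice = new_record("password123")
bob = new_record("password123")

# Same password, different salts, so the stored hashes differ
print(alice[1] != bob[1])
print(verify("password123", *alice))
print(verify("wrong guess", *alice))
```

Because the two accounts end up with different stored hashes for the same password, one precomputed list of hashes cannot be matched against the whole database.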
Conclusion
MD5 is one of many different methods of identifying, securing and verifying data. Cryptographic hashing is a vital chapter in the history of security and of keeping things hidden. As with many things designed with security in mind, someone’s gone and broken MD5.