Introduction

Cryptography Cryptography is not the simply the act of encoding a message and then sending it. The attacker finds important the problems of managing and exchanging secrets, the binding actions to identities and understanding the risks, as well as the benefits of cryptography. Most often the effectiveness of an attack against cryptography is not direct but rather will be against a weakness that was exposed because the entire system was not designed to managed properly.

Whether or not it is decided to use encryption on data, the following things must be carefully considered.

  • Should the key is lost or corrupted, are you willing to lose your data?
  • Are both the data in motion and the data at rest being considered?
  • How does the allowing of the use of encryption impact our monitoring controls?
  • How much encryption strength is needed knowing there are performance needs to consider?
  • What is the cost of operating a cryptosystem over time and is the responsibility of a real key management process being considered?

Cryptosystems can be difficult to do correctly. There is always the possibility of a mistake being made that an attacker is looking for.

Concepts of Cryptography

Introduction to Cryptography

Cryptography defined simply means to "hide" and "write" (Crypt = Hide, Graph = Write,) and goes back as far as the act of writing itself.

Cryptic messages were typlically hidden to all but those who possessed the key to unlocking them. Also, by sharing this secret key, it was possible to authenticate that the sender of a message was the trusted party since only trusted personnel would be told the secret key. However, a system of sharing secrets can become pretty cumbersome as the number of trusted parties grow. This makes the management of keys the biggest problem.

The PAIN (Privacy, Authentication, Integrity and Non-repudiation) model of cryptography, created by Larry Greenblatt, drives the design and risk behind implementing all cryptographic systems.

Algorithm Classes

The basic three classes of cryptographic algorithms are:

  • Symmetric (shared keys, secret keys)
  • Asymmetric (public key)
  • Hashing Algorithms (one-way)

Symmetric (shared keys, secret keys)

Very popular today is the use of a shared key system and is widely accepted that all cryptographic systems in the past were based on some form of shared key encryption. In the shared key system, the key is used to both encrypt and decrypt the message.

There are a couple of challenges in using the symmetric system. The first is how to initially share the secret key. For example, if a user is required to know the password, how could the user get the password if it is contained in an email that requires the password to accesses the email?

The second challenge is that of non-repudiation. For example, should two people share a key to a file cabinet. Both can access the cabinet and place whatever they want into it. If one should place a "malicious" file into the file cabinet, it would be a case of one word over another as to who placed the "malicious" file in the cabinet as both parties possessed a key to the cabinet.

Asymmetric (public key)

Introduced in 1796, asymmetric encryption is based on a key pair, one key set to private and the other is set to public. The concept behind this is that of their relationship. Both keys are required; one to encrypt and the other to decrypt. The public key is opened to everyone while the private key is just that, private. Should an unauthorized person gain access to a private key, they could use it in conjunction with the public key to access the secure information.

One major disadvantage to the asymmetric algorithms is the expense of the computational efforts needed with each of the calculations.

Hashing Algorithms (one-way)

By using either symmetric or asymmetric algorithms secrecy, authenticity and non-repudiation is provided, but how is it known if the message itself was not tampered with?

Parity and checksums are the most popular methods for accidental corruption, and these methods in digital communication are quite simple. However, for deliberate corruption, they are not dependable. It is quite easy to modify the message and to determine how to spoof the parity check or checksum.

By the use of cryptographic mathematics, a system has been devised to provide the checksums that are not nearly as easy to spoof. Formulas known as hashing algorithms and the resulting checksums are more commonly known as message digests.

For example, if two people create a message digest using a hashing algorithm, these two people can compare the message digest. If there is a discrepancy and they do not match, then something has happened to the message. If the message digest, also known as hash values or just hash are equal and alike, this will show that the message was not corrupted along the way.

Key Management

Whether the objective is privacy or authentication, key management is probably the most challenging operational issue with cryptography.

As an example, licenses are not self-issued to individuals as they could easily be counterfeited. They are issued through a third party, such in the case of a driver's license. The driver cannot issue their own, but has to obtain one through the recognized third party agency. Certificate Authorities are used as a third party issuer of keys.

If a user loses their private key, all data encrypted with the corresponding public key would be lost. A copy of the private key could be made, however, we would then lose the trust model granted to us by asymmetric key systems once this key was no longer private. This could cause an impersonation of the user.

One way to prevent this is by generating symmetric keys that are used only once. They are exchanged by using a properly implemented asymmetric system. Accounting for how the integrity checks through message digests need to be signed as well and the whole sequence becomes quite complicated.

Attacks Against Encryption

The science of deducing the key from available resources is known as cryptanalysis. An attacker can figure out the key by analyzing the encrypted message to see if there were any meaningful patterns found in the encrypted text.

For example, say all of the letters of the alphebet are shifted by three characters, A becoming D, B becomes F, C becomes F and so on, for instance. This system is easy to analyze. The most common letter used in the English language is the letter E. Find the letter that shows in the message most commonly, and it will probably be the letter E.

Steps are taken to ensure that a good encryption algorithm does not reveal any patterns. However, an attacker can simply guess the key, try all possible keys providing the keys were short or even try random values if given enough time. This type of an attack is known as a Brute Force attack.

A brute force attack works if the key is limited in size. The key size affects the entropy, which is the lack of order or predictability. As an example, if a key had a space of 2, just by adding one more space, (number), makes the guessing of the key harder. A typical 4-digit PIN, (0-9, or 104 has a key space of 10,000. A key that is 128 bits long, or 16 bytes, has over 340 trillion, trillion, gazillion possible keys. Now, consider a key size of 4096 bits. The possibilities are literately astronomical.

The longer and more random the bits are in a key makes it more difficult to decrypt. This is because of the disorder that is introduced into the system. The objective is to make sure that the key cannot be guessed, that patterns do not exist in output and that brute force becomes impractical as well. However, this is getting harder and harder to do as processor speeds increase. The requirement to make up longer keys makes it harder for the user to remember and the longer the key means it increases the time to encrypt and decrypt messages.

Regardless the sophistication of an encryption technique is, there is often a weakness that can be introduced by a user. Given the difficulty of cryptanalysis, mistakes more commonly present opportunities for the opportunity. Systems implement all three major cryptographic techniques to minimize this risk. Checks and balances ensures that everything must work correctly or not at all. Detection points and non-repudiation controls are possible as well.

Symmetric Encryptions

Substitution and Transposition

Substitution is a system of substituting letters with other characters. For a simple example, the process of substituting letters with numbers, such as A = 1, B = 2, C = 3 and so on. This system is easily defeated by just looking at the encoded text and making a few guesses.

This is the weakness of substitution cyphers, but historically, this is how cyphering worked. The attacker would look for frequencies within the cypher. For example, in the English language, the letter 'E' is the most frequently used. This is also known as the Caesar Cypher. By looking for the frequency of use, a little deductive reasoning and guessing, the cypher could be decrypted. This type of attack is better known as Frequency Analysis.

To mitigate frequency analysis, a system of adding a Key Word was devised. Depending on the number of characters in the key word, there would be as may substituted alphabets. For example, say the keyword was "Battle." The first character would use a substitution set where the alphabet was shifted or rotated up 22 characters, meaning A = V, B = X and so on. The second character would be shifted or rotated by 21 characters so that A = U. The third character would be shifted or rotated by 12 characters so that A = L.

Let's use the word 'Hello Katherine' to demonstrate.

The first character, H becomes D. The next character E, becomes A and so on. Since there are only six letters in the key, the seventh character, A would be substituted with the first set B. The effect is that this is only six times stronger than the Caesar Cypher since we only have six different alphabets. Multiple substitution sets are better known as Polyalphabetic Substitution.

Another way to encrypt messages is by scrambling them. This simply involves changing the positions of the characters in a text. this process is commonly called Transposition or Permutation depending on the mathematical processes used.

Most commonly used is a combination of substitution and transposition achieving what is called "Confusion and Diffusion."

Stream and Block Ciphers

Data can be encrypted in a serial stream of bits moving as a packet across a network host or as a block of bytes that can be loaded into computer memory and processed in multiple ways. The nature of the way plain text would compare eventually to its cipher text representation resides in the concepts of substitution, transposition, confusion and diffusion.

It is a matter of both design choice and situational necessity if encoding is done as a stream or block. The ultimate goal is to not reveal any details about either the key used in the algorithm or the original plain text.

Some cyphers are still in use that are purely substitute. These are stream cyphers like E0 that is used in Bluetooth and RC4 that can be found in many cryptography systems like WEP, SSL/TLS, TKIP and more. These are computational cheap and easy to implement.

Block ciphers are more secure than stream ciphers, however they are more difficult to compute. Stream ciphers, however, are much faster. There must be a balance between availability and confidentiality to be met. The cost-to-reward ratio is always a consideration in cryptography.

The process of off loading the encryption to a dedicated processor or Application Specific Integrated Circuit (ASIC) that can speed things up considerably is known as Off Loading. Creating an ASIC stream cipher is a much simpler process for stream ciphers and block ciphers. This has changed however as the cost of ASICs have gone down quite a bit. The cost of a flash drive with AES ASICs is now for less than five dollars.

Stream Ciphers

XOR is a Boolean logic equation. When comparing the two binary values 0 or 1, and using the XOR function, only one input must be 1 to return an output of 1. Boolean logic functions are expressed in Truth Tables. The truth table for an XOR is like this:

Inputs Results
0 0 0
0 1 1
1 0 1
1 1 0

This process can be reversed. For example.

Plain Text 00100001
Key Value 10100011
Cipher Text 10000010

If the cipher text is XORed with the key value, it would return original plain text.

Stream ciphers use a Pseudo Random Number Generator PRNG to create a long number this is XORed with the plain text to create an encrypted stream. The PRNG is seeded, which is where a number is used as a starting point, known as a seed value, or just seed, with a secret value that only the sender and receiver know about. This ensures that that only the intended receiver can return the plain text.

Block Ciphers

Block ciphers are also XOR plain text. They also change the positions of the text to achieve the goal of confusion and diffusion. One of the first widespread block ciphers, it was very strong and the thought it was ever really cracked was misleading. It was never really found to be vulnerable to any type of pattern matching but was a victim of simple brute forcing.

The key size allowed in DES was 56 bits. There were eight sub-keys, each being seven bits long and applied according to a key schedule. The key schedule was an additional bit per key used for parity. Today, systems can guess all of the possible values in a 56 bit key space in minutes or even seconds.

Often used in block ciphers are substitution boxes (S-Boxes) that are logical means to infuse diffusion and confusion. After combining all of the "ingredients" together, these combinations are the modes that block ciphers support. The idea is to create so much variation that tracing the cipher text backwards toward the plain text is impossible, unless you possess the key.

The steps and combinations of flipping bits is determined by the key. Diffusuion makes sure bit flipping is spread out and that one bit affects may others without regularitu. Ensuring no patterns exist in the result is the job of confusion. Initialization Vectors (IVs) are part of the algorithm as well enhancing the confusion properties of the encryption sequence. Both the sender and receiver must know the shared secret key and how the algorithm works in order to communicate.

Symmetric Ciphers

In the case of Wired Equivalent Privacy (WEP) the key is shared outside of band, which is shared prior to any relationship and the IV changes are transmitted with the data. In other cases the IV is generated in the common fashion.

Symmetric ciphers are more effective in regards to the key size, such as in memory and processor, when compared to asymmetric ciphers, however they do have some problems such as the delivery of a key to from the issuer to the user. For instance, if the key is sent in plain text, a MiTM attacker could see it. If the key is sent encrypted, the user would not be able to decipher it. Again, this is a catch-22 scenario.

Scalability is yet another problem to contend with. If a four users wanted to send encrypted messages to each other, the users would have to send a key to each of the other users and this number would grow exponentially. There is an actual formula for this:

N(N-1)/2. N = The number of participants.

Since one person does not need to share the key with them self, the number is multiplied by N-1. The keys are shared between the participants, so we divide this number by two. Here is a mathematical representation as an example. We have four senders. The formula is 4x3/2, which is 6. If we add a sender, the formula becomes 5x4/2, which is 10. The second number in the multiplication represents the original sender; remember the formula N-1.

Another problem is with repudiation. If more than one person possesses the key, there is no actual proof as to who sent the message since the key is not unique to a single user. One could claim that user A sent the message to user B, however if both A & B possess the same key, user B could claim that they did not send the message.

To summarize, symmetric encryption is faster than asymmetric encryption but is limited in use. This limitation comes from challenges of sharing the initial key and the requirement of lots of keys as we add more users. Symmetric encryption can be used to both hide and authenticate but cannot be used to provide non-repudiation. Symmetric encryption requires two parties to share secret keys, symmetric encryption is also known as shared secret, or even shared secret key systems.

Asymmetric Encryption

The Role of Asymmetric Encryption

Symmetric encryption has been around as long as written messages have. The concept of asymmetric is new however. The Diffie-Hellman algorithm was created to solve the first problem with symmetric encryption and that is how to share the initial secret key. Diffie-Hellman is a Key Agreement.

Other asymmetric algorithms, RSA and ECC are used to encrypt has values to authenticate the hash besides being used to share the initial symmetric key. The overhead of the larger keys required in asymmetric systems make these algorithms too slow for encryption large amounts of text or bulk data.

Authentication and Non-Repudiation

Though there are many methods or mathematical processes to provide asymmetric services, the concepts are essentially the same; users do not share secret keys. Separate users have their own key pair and each is made up of mathematically related numbers.

A pair will be created for a user. This user will keep one key for them self, the private key, and will give out the other key, the public key, to whoever wants to communicate with them. One key is used to encrypt the message while the other key is used to decrypt the message. This process will provide three services depending on which key is used to encrypt the message. These services are:

  • Open Message Format
  • Secure Message Format
  • Mutual Authentication

Open Message Format

Open Message Format is when a user wants to prove that they are the sender and uses their private key to encrypt the message, which is usually the hash value of a larger message. Since anyone can obtain this user's public key, there is no secrecy, but all receivers would know that this user sent the message, or at a minimum that his private key was used.

Known as digitally signing the document, a symmetric key, or message digest can also be encrypted with this user's private key. Since the private key is not shared with anyone else, there is no denial on the part of the user that this key was used to encrypt the message. Everyone knows this message came from this user and this is how asymmetric algorithms provide non-repudiation.

Secure Message Format

Secure Message Format is when 'A' wants to send 'B' a message that only 'B' could read. 'A' uses their public key to encrypt the message. While only 'B' could decrypt the message, they would not know who sent the message, only that they were the intended recipient.

Mutual Authentication

Mutual Authentication is where both the sender and receiver are clearly identified. For example, 'A' sends 'B' a message that is encrypted twice, once with the private key belonging to 'A' and again with the public key belonging to 'B'. At this point 'B' could not only read the message but would also know that it came from 'A'.

MiTM (Man in the Middle) Attacks

Man in the Middle Attacks involves an attacker inserting themselves in between two parties communicating with each other and are essentially eavesdropping attacks.

An example goes like this. You send a letter to a friend in another city. While the letter is in transit, someone who handles the mail takes your letter, opens it and reads the contents of your letter. They then reseal the letter and send it on it's way to your friend. You friend receives the letter and does not have any clue that the letter has been opened and read during transit.

Additionally, the person who is opening the letter while it is in transit could modify the letter somewhat, just enough so that you or your friend in another city wouldn't have any clue that the modifications were not part of the original message.

Asymmetric Encryption Summarized

Asymmetric encryption, though too slow for large amounts of data, solves a problem of symmetric encryption, and that is the sharing of the initial symmetric key.

Asymmetric encryption can be used to share the initial secret and provide a form of authenticity that can not be easily repudiated.

Another feature is scalability. Unlike the symmetric system formula N(N-1)/2, the use of only one key pair for each participant is needed. Should there be 1,000 participants, only 1,000 key pars or 2,000 keys would be needed. On the flip side, introduced is a new problem of authentication of the public keys to prevent someone from spoofing a public key to impersonate someone else. Key management systems are used to address these problems.

Hashing Algorithms

Message Integrity

Message Integrity is the validity of a transmitted message. This means that a message has not been tampered with or altered. Hashing, the most common approach, is to use a function that combines all the bytes in the message with a secret key and produces a message digest that is difficult to reverse. If even one bit of a 128 bit message digest would change, it would alter the entire message greatly. Hash functions are also known as Message Digest.

The sender creates a hash on their message before sending it to the recipient. The recipient also computes a has on the message when they receive it. The recipient then compares the two hash values. If they are different, the recipient can determine that the message has been corrupted, however if they are the same, she shouldn't necessarily feel that the message is safe. To feel safe, the recipient needs to authenticate the hash, or message digest. The reasoning behind this is that a Man in the Middle could have changed the message and created a new counterfeit hash.

Salting is the process of adding random data that is used as an additional input to a one-way function that hashes data, a password or pass-phrase. The primary function of salts is to defend against dictionary attacks or against its hashed equivalent, a rainbow table attack.

If the sender runs an MD5 on a message, the new keyed hash is created. This keyed hash was formally known as a Message Authentication Code (MAC). The sender can now send the message only to the recipient, not the key. The key will have to be secretly shared between the sender and recipient, as with all symmetric systems.

Once the recipient has both the message and key, they can salt the message with this key and calculate the MAC. If the recipient's and sender's MAC agree, then there is a reasonable belief that the message wasn't altered and is genuine. This process, however, does not guarantee non-repudiation since the symmetric key or salt was known to both the sender and the recipient. A MAC provides the integrity and authenticity services.

If the recipient wants the sender to only send messages that can be traced to the sender alone, the asymmetric system must be employed. This time the sender creates a message digest of the message, the encrypted digest and their public key. When the recipient receives the message, they calculate the message digest on the message and decrypts the digest the sender sent with the public key provided. If the recipient's hash value is the same as the sender's, then the message can reasonably be trusted. This process is known as Digital Signing and has a has value that has been encrypted with the sender's private key which is known as a Digital Signature.

By using symmetric algorithms to make large messages private, hashing algorithms can provide integrity and asymmetric systems will allow both to agree on the secret key. Authenticating the has means strong authentication that cannot be denied or repudiated. When mixing and matching these algorithms, we can gain Privacy, Authentication, Integrity and Non-Repudiation (PAIN) services of cryptography.

One-Way Hashes

Hashing is an efficient way to go when it is necessary to compare two files. All you have to do is to calculate the hashes and comparing them. If they are the same, then the files are identical as well. A change of about 60% can occur if even there is a one bit difference in a 700Mb file. The bit flip also creates cascading changes known as diffusion throughout the resulting hash as the calculation runs through its permutations. There would be no question that the files are different.

Hashes are used in digital forensics to validate the integrity of evidence. After the collection of a hard drive, an image of the hard drive is created and both are hashed. They will match. This indicates that the image evidence that the investigator will be analyzing is an exact bit-for-bit copy of the original. Sometimes every file is hashed and every physical sector of the drive is also hashed just to be certain.

Similar to when a notary public uses their seal to vouch that a document is original, when authenticating a document such as a digital certificate, a message digest of the file is taken and then encrypted with the private key of a trusted party. In a digital certificate though, if there has been any modification or corruption, the message digest that was signed will not match the new calculation of the hash. Fortunately, attackers cannot simply replace the hash because they don't have the trusted party's private key to sign it. The receiver of the digital document will only use the public key of the trusted third party to check the signature.

As a way to store representations of passwords without actually storing anything, hashes are used in the authentication process. A password is selected by a user and then the hash is stored in the account database of the system. The next time the user identifies themselves to the system, the has that corresponds to their account is looked up and placed in memory. A hash of what they give as input is looked up during the authentication step is compared and if they match, they are allowed to log in.

Attacking One-Way Hashes

Though the possible inputs to a hash function are infinite, the fixed length characteristic of the hash limits the possible hashes that can result. A Collision is the result of two different data producing the same hash. This is the primary weakness of most hashing algorithms.

Though a captured hash cannot be used to determine the data, an attacker can still try different combinations of input until the same hash is produced. If the maximum size of the possible input data is known, then a brute force attack is a lot easier. If passphrases are used, their length could be anything from four to 40 characters or more in length or if the hashing function used produces a longer hash, such as SHA-384. This makes a brute force attack impractical.

It might be worth trying a guess, a dictionary or brute force attack. There is a remote possibility that a short combination of characters will collide with the passphrase actually is. In this case, if the authentication system accepts the hash, it does not matter if the data is the same or not.

To make finding collisions more difficult, challenges, salts and other forms of nonce values can be added to the hashing calculations. This can ensure that if the same data were hashed several times it would not produce the same result unless the nonce was also added.

Cryptosystems and Key Management

Cryptosystems

The use of all three algorithm classes, symmetric, asymmetric and hashing are used in a Cryptosystem. Some examples include PGP and S/MIME for email, SSH/TLS for Web-based applications, SSH for securing administration channels or IPSec (VPNs) for remote access.

Key Management

Though the man behind the encryption algorithm is pretty strong, or seems so, the hardest part of using encryption is the key management. Passwords not only need to be strong, but they must be handled properly. An extremely strong password is no protection if the password is written onto a sticky-note and placed on the PC monitor or under the keyboard. Also, little protection is provided by weak passwords that are easily remembered or guessed.

There are mechanisms like password lockers that can be used to protect credentials, however these lockers themselves must be encrypted and its key property stowed. Also needed is a way to trust the authenticity of keys so that only the correct identities have access to the data.

One of the biggest challenges is how a recipient of a public key can really trust the authenticity of the key. One way is with the strongest algorithms for authenticity by using asymmetric a private / public key system.

Public Key Infrastructure

No one can be trusted to validate their own public key, so trusted third parties are used to validate public keys. These third parties come in two types, pubic certificate authorities or another person who is trusted by both parties.

The first type is the Public Key Infrastructure (PKI) system. In Pretty Good Privacy (PGP) we can use the second. The difference between the two systems is why we trust these third parties to begin with. With PKI we have internationally trusted agencies. In PGP, someone you know becomes the validating third party. This is also sometimes referred to as A Web of Trust; not always what you know, but who you know.

PKI and PGP systems basically work the same way in that a trusted third party will digitally sign the public keys of the users. When a trusted third party signs the key, the threats of attacks that include counterfeit keys can be mitigated.

The most common way to exchange and store collected public keys are with digital certificates. It is simply a text file that contains three items: information about the identity of whose key this is, the public key itself and a digitally signed message digest that authenticates the certificate file. The certificate is downloaded into a supporting client such as an email application or Web browser. The signature of the trusted third party is validated and the certificate either gets stored until it expires or is revoked. The revoking of a certificate is tricky, but it is necessary if the private key half of the pair is every compromised or if the identity of the party who the key belongs ever changes.

Advanced Attacks Against Cryptography

Cryptanalysis

The ultimate goal of the cryptanalyst is to translate the cipher text into corresponidng plain text. The most effective way to do this is to determine the encryption key. Should there be an absence of this knowledge, there might be other means to attack the system. Denial of Service is always an option of last resort if the victim go for an extortion scheme.

Brute Force Attacks

Say you have a combination lock. It has four wheels and each wheel has ten digits, (0 - 9). Doing the math of the possible combinations, 104, you have 10,000 possible keys. The term for this is Entropy, which is the measure of chaos or randomness and describes how many possible keys to guess. Say you have a way of guessing 1,000 guesses per hour. After five hours the odds are 50/50 that the correct combination would be guessed. This is known as the work factor, meaning that it would take about five hours to crack such a system.

Pattern Analysis

Plain and simple, Pattern Analysis is the act of finding patterns in data. For instance, using the lock example, say the wheels on the lock feel "looser" when a correct number on the dial is located. With each correct choice, the wheels become more and more loose. This is a pattern, and through careful analysis, such patterns can help when an attacker is attempting to break a system.

Four common pattern analysis attacks in Cryptanalysis include:

  • Cipher Text Only
  • Known Plain Text
  • Chosen Cipher Text
  • Chosen Plain Text

Cipher Text Only

While Brute Forcing the numeric key combinations on our lock, an attacker might find a useful pattern. The analogy to this is like putting a jigsaw puzzle together by looking for similar color combinations, edges and cut lines.

Known Plain Text

This process is easier than the Cipher Text Only method. It is similar to, using the jigsaw puzzle analogy, having a picture of the completed puzzle makes putting the pieces together much easier.

Chosen Cipher Text

In the Chosen Cipher Text, the attacker can ask what a the victim what part of the encrypted text means. With this information, the attacker can base their guesses on this known cipher text.

Chosen Plain Text

Out of the four attacks, the Chosen Plain Text is the most powerful. The attacker chooses a deliberate set of words to see how they get encrypted. This is done in one of two ways. One, by batch, where a very long message is created and then accessing the resulting cipher text and look for a pattern. The second is by Differential Analysis where the attacker can make a message, see how it is encrypted and then alter the input slightly to analyze how this affects the cipher text. These attacks have become more difficult for attackers as the math involved in today's systems.