# Linear cryptanalysis for dummies

- Tutorial

Hi% username%!

Many people know that the DES algorithm was considered the default standard in the field of symmetric encryption for a long time. The first successful attack on this indestructible algorithm was published in 1993, 16 years after its adoption as a standard. The method, which the author called linear cryptanalysis, in the presence of 2

^{47}pairs of open / encrypted texts, allows you to open the secret key of the DES cipher in 2

^{43}operations.

Under the cut, I will try to summarize the main points of this attack.

#### Linear cryptanalysis

Linear cryptanalysis is a special type of attack on symmetric ciphers aimed at recovering an unknown encryption key from known open messages and their corresponding ciphertexts.

In the general case, an attack based on linear cryptanalysis reduces to the following conditions. An attacker has a large number of open / encrypted text pairs obtained using the same encryption key K. The attacker's goal is to recover partially or fully the key K.

First, the attacker examines the cipher and finds the so-called. statistical counterpart, i.e. an equation of the following form, executed with probability P ≠ 1/2 for an arbitrary pair of open / closed text and a fixed key:

P

_{I1}⊕ P

_{I2}⊕ ... ⊕ P

_{Ia}⊕ C

_{I1}⊕ C

_{I2}⊕ ... ⊕ C

_{Ib}= K

_{I1}⊕ K

_{I2}⊕ ... ⊕ K

_{Ic }

**(1)**,

where P

_{n}, C

_{n}, K

_{n}are the nth bits of text, ciphertext and key.

After a similar equation is found, the attacker can recover 1 bit of key information using the following algorithm

**Algorithm 1**

Let T be the number of texts for which the left side of equation (1) is 0, then

If T> N / 2, where N is the number famous plaintexts.

Assume that K

_{I1}⊕ K

_{I2}⊕ ... ⊕ K

_{Ic}= 0 (when P> 1/2) or 1 (when P <1/2).

Otherwise,

Assume that K

_{I1}⊕ K

_{I2}⊕ ... ⊕ K

_{Ic}= 1 (when P> 1/2) or 0 (when P <1/2).

Obviously, the success of the algorithm directly depends on the value of | P-1/2 | and from the number of available open / closed text pairs N. The greater the probability P of equality (1) differs from 1/2, the less the number of open texts N is needed for the attack.

There are two problems that need to be solved in order to successfully implement the attack:

- How to find an effective equation of the form (1).
- How to use this equation to get more than one bit of key information.

Let us consider the solution to these issues using the DES cipher as an example.

#### Description DES

But first, we briefly describe the operation of the algorithm. About DES has already been said enough. A full description of the cipher can be found on Wikipedia . However, to further explain the attack, we need a number of definitions that are best entered in advance.

So, DES is a block cipher based on the Feistel network . The cipher has a block size of 64 bits and a key size of 56 bits. Consider the encryption scheme of the DES algorithm.

As can be seen from the figure, when encrypting the text, the following operations are performed:

- Initial permutation of bits. At this point, the bits of the input block are mixed in a specific order.
- After that, the mixed bits are divided into two halves, which are fed to the input of the Feistel function. For standard DES, Feistel's network includes 16 rounds, but there are other variations of the algorithm.
- Two blocks received in the last round of conversion are combined and another permutation is performed on the received block.

On each round of the Feistel network, the 32 least significant bits of the message pass through the function f:

Consider the operations performed at this stage:

- The input block passes through the expansion function E, which converts a 32-bit block into a 48-bit block.
- The resulting block is added with the round key K
_{i}. - The result of the previous step is divided into 8 blocks of 6 bits each.
- Each of the received blocks B
_{i}passes through the substitution function S-Box_{i}, which replaces the 6-bit sequence, 4-bit block. - The resulting 32-bit block goes through the permutation P and returns as a result of the function f.

Of greatest interest, from the point of view of cryptanalysis of the cipher, for us are S blocks designed to hide the connection between the input and output data of the function f. For a successful attack on DES, we first construct statistical analogues for each of the S-blocks, and then extend them to the entire cipher.

#### S block analysis

Each S-block receives a 6-bit sequence as input, and a fixed 4-bit value is returned for each such sequence. Those. There are a total of 64 input and output data options. Our task is to show the relationship between the input and output data of S blocks. For example, for the third S-block of DES cipher, the 3rd bit of the input sequence is equal to the 3rd bit of the output sequence in 38 cases out of 64. Therefore, we found the following statistical analogue for the third S-block:

S

_{3}(x) [3 ] = x [3], which is executed with probability P = 38/64.

Both parts of the equation represent 1 bit of information. Therefore, if the left and right sides were independent of each other, the equation would have to be executed with a probability equal to 1/2. Thus, we just demonstrated the relationship between the input and output data of the 3rd S-block of the DES algorithm.

Consider how to find a statistical analogue of the S-block in the general case.

For the S-block S

_{a}, 1 ≤ α ≤ 63 and 1 ≤ β ≤ 15, the value NS

_{a}(α, β) describes how many times out of 64 possible XOR input bits S

_{a}superimposed on bits α are equal to XOR output bits superimposed on bits β, i.e.:

where the symbol • is logical I.

The values of α and β, for which NS

_{a}(α, β) is most different from 32, they describe the most effective statistical analogue of the S-block S

_{a}.

The most effective analogue was found in the 5th S-block of the DES cipher for α = 16 and β = 15 NS

_{5}(16, 15) = 12. This means that the following equation holds: Z [2] = Y [1] ⊕ Y [2] ⊕ Y [3] ⊕ Y [4], where Z is the input sequence of the S-block and Y is the output sequence.

Or, given that in the DES algorithm, before entering the S-block, the data are added modulo 2 with a round key, i.e. Z [2] = X [2] ⊕ K [2] we get

X [2] ⊕ Y [1] ⊕ Y [2] ⊕ Y [3] ⊕ Y [4] = K [2], where X and Y are input and output data of function f without taking into account permutations.

The resulting equation is satisfied on all rounds of the DES algorithm with the same probability P = 12/64.

The following table lists the effective ones, i.e. having the largest deviation from P = 1/2, statistical analogues for each s-block of the DES algorithm.

#### Building statistical analogues for several rounds of DES

Let us now show how we can combine the statistical analogues of several DES rounds and finally obtain a statistical analog for the entire cipher.

To do this, consider a three-round version of the algorithm:

We use an effective statistical analogue of the 5th s-block to calculate certain bits of the value of X (2).

We know that with probability 12/64 in the f-function the equality

*X [2] ⊕ Y [1] ⊕ Y [2] ⊕ Y [3] ⊕ Y [4] = K [2],*where X [2] - the second input bit of the 5th S-block, it is essentially the 26th bit of the sequence obtained after the extension of the input bits. Analyzing the expansion function, we can establish that in place of 26 bits is the 17th bit of the sequence X (1).

Similarly, Y [1], ..., Y [4] are essentially the 17th, 18th, 19th and 20th bits of the sequence received before the permutation P. Having examined the permutation P, we get that the bits Y [1] , ..., Y [4] are actually bits Y (1) [3], Y (1) [8], Y (1) [14], Y (1) [25].

The key bit K [2] involved in the equations is the 26 bit subkey of the first round K1 and then the statistical analogue takes the following form:

*X (1) [17] ⊕ Y (1) [3] ⊕ Y (1) [8] ⊕ Y1 [ 14] ⊕ Y (1) [25] = K1 [26]*.

Therefore,

*X (1) [17] ⊕ K1 [26] = Y (1) [3] ⊕ Y (1) [8] ⊕ Y (1) [14] ⊕ Y (1) [25]*

**(2)**s probability P = 12/64.

Knowing 3, 8, 14, 25 bits of the sequence Y (1), you can find 3, 8, 14, 25 bits of the sequence X (2):

*X (2) [3] ⊕ X (2) [8] ⊕ X (2) [14] ⊕ X (2) [25] = PL [3] ⊕ PL [8] ⊕ PL [14] ⊕ PL [25 ] ⊕ Y (1) [3] ⊕ Y (1) [8] ⊕ Y (1) [14] ⊕ Y (1) [25]*or taking into account equation (2)

*X (2) [3] ⊕ X ( 2) [8] ⊕ X (2) [14] ⊕ X (2) [25] = PL [3] ⊕ PL [8] ⊕ PL [14] ⊕ PL [25] ⊕ X (1) [17] ⊕ K1 [26]*with a probability of 12/64.

**(3)**Find a similar expression using the last round. This time we have the equation

*X (3) [17] ⊕ K3 [26] = Y (3) [3] ⊕ Y (3) [8] ⊕ Y (3) [14] ⊕ Y (3) [25]*.

Since

*X (2) [3] ⊕ X (2) [8] ⊕ X (2) [14] ⊕ X (2) [25] = CL [3] ⊕ CL [8] ⊕ CL [14] ⊕ CL [25] ⊕ Y (3) [3] ⊕ Y (3) [8] ⊕ Y (3) [14] ⊕ Y (3) [25]*

we get that

*X (2) [3] ⊕ X (2) [8] ⊕ X (2) [14] ⊕ X (2) [25] = CL [3] ⊕ СL [8] ⊕ СL [14] ⊕ СL [25] ⊕ X (3) [17] ⊕ K3 [ 26]*

**(4)**with a probability of 12/64.

Equating the right-hand sides of equations (3) and (4) we obtain

*CL [3] ⊕ CL [8] ⊕ CL [14] ⊕ CL [25] ⊕ X (3) [17] ⊕ K3 [26] = PL [3] ⊕ PL [8] ⊕ PL [14] ⊕ PL [25] ⊕ X (1) [17] ⊕ K1 [26]*with probability (12/64)

^{2}+ (1-12 / 64)

^{2}.

Given that X (1) = PR and X (3) = CR, we obtain a statistical analogue of

*CL [3, 8, 14, 25] ⊕ CR [17] ⊕ PL [3, 8, 14, 25] ⊕ PR [ 17] = K1 [26] ⊕ K3 [26]*

**(5)**,

which holds with probability (12/64)

^{2}+ (1-12 / 64)

^{2}= 0.7.

The statistical analogue described above can be represented graphically as follows (the bits in the figure are numbered from right to left and starting from zero):

All bits on the left side of the equation are known to the attacker, therefore he can apply algorithm 1 and find out the value K1 [17]] K3 [17]. We show how to use this statistical analogue to open not 1, but 12 bits of the encryption key K.

#### DES attack with famous plaintext

Here is a method by which you can expand the attack and get 6 bits of the first round subkey right away.

When compiling equation (5), we took into account the fact that we do not know the value of F1 (PR, K1) [3, 8, 14, 25]. Therefore, we used its statistical counterpart K1 [26] ⊕ PR [17].

Instead of the expression K1 [26] ⊕ PR [17], we return the value F1 (PR, K1) [3, 8, 14, 25] and obtain the following equation:

*CL [3, 8, 14, 25] ⊕ CR [17] ⊕ PL [3, 8, 14, 25] ⊕ F1 (PR, K1) [3, 8, 14, 25] = K3 [26]*

**(6)**, which will be executed with a probability of 12/64. The probability has changed since we left only the statistical analogue from the third round, all other values are fixed.

We have already determined above that the value of F1 (PR, K1) [3, 8, 14, 25] is influenced by the input bits of the 5th S-block, namely the key bits K1 [25 ~ 30] and the bits of the PR block [16 ~ 21]. Let us show how, having only a set of open / closed texts, you can restore the value of K1 [25 ~ 30]. For this we use algorithm 2.

**Algorithm 2**

Let N be the number of open / closed text pairs known before the attack. Then, to open the key, you must do the following steps.

For (i = 0; i <64; i ++) do

{

For (j = 0; j

if (CL

_{j}[3, 8, 14, 25] ⊕ CR

_{j}[17] ⊕ PL

_{j}[3, 8, 14, 25] ⊕ F1 (PR

_{j}, i) [3, 8, 14, 25] = 0) then

T

_{i}= T

_{i}+1

}

}

As the probable sequence K1 [25 ~ 30], the value i is taken for which the expression | T

_{i}-N / 2 | has maximum value.

With a sufficient number of known plaintexts, the algorithm will most likely return the correct value of six bits of the subkey of the first round K1 [25 ~ 30]. This is explained by the fact that if the variable i is not equal to K1 [25 ~ 30], then the value of the function F1 (PR

_{j}, K) [3, 8, 14, 25] will be random and the number of equations for such a value i, for which the left-hand side is equal to zero, will tend to N / 2. If the subkey is correctly guessed, the left part will be 12/64 with probability equal to the fixed bit K3 [26]. Those. a significant deviation from N / 2 will be observed.

Having received 6 bits of K1 subkey, you can similarly open 6 bits of K3 subkey. All that is needed is to replace C in P in equation (6) and K1 in K3:

*PL [3, 8, 14, 25] ⊕ PR [17] ⊕ CL [3, 8, 14, 25] ⊕ F3 ( CR, K3) [3, 8, 14, 25] = K1 [26]*.

Algorithm 2 will return the correct value K3 [25 ~ 30] because the decryption process of the DES algorithm is identical to the encryption process, just the sequence of keys is reversed. So, in the first round of decryption, the key K3 is used, and in the last round, the key is K1.

Having received 6 bits of K1 and K3 subkeys, an attacker recovers 12 bits of the common cipher key K, because round keys are the usual permutation of the key K. The number of clear texts necessary for a successful attack depends on the probability of a statistical counterpart. To open 12 bits of the key of a 3-round DES, 100 pairs of open / closed texts are enough. Opening up 12 bits of a 16-round DES key will require about 2

^{44}pairs of texts. The remaining 44 bits of the key are opened by normal enumeration.

#### References

Mitsuru Matsui - Linear cryptanalysis method of DES cipher

Tutorial on Linear and Differential Cryptanalysis