A High Efficiency Hardware Implementation of S-Boxes Based on Composite Field for Advanced Encryption Standard

Abstract

The SubBytes (S-box) transformation is the most crucial operation in the AES algorithm, significantly impacting the implementation performance of AES chips. To design a high-performance S-box, a segmented optimization implementation of the S-box is proposed based on the composite field inverse operation in this paper. This proposed S-box implementation is modeled using Verilog language and synthesized using Design Complier software under the premise of ensuring the correctness of the simulation result. The synthesis results show that, compared to several current S-box implementation schemes, the proposed implementation of the S-box significantly reduces the area overhead and critical path delay, then gets higher hardware efficiency. This provides strong support for realizing efficient and compact S-box ASIC designs.

Share and Cite:

Wang, Y. , Bin, S. , Zhu, S. and Hu, X. (2024) A High Efficiency Hardware Implementation of S-Boxes Based on Composite Field for Advanced Encryption Standard. Journal of Computer and Communications, 12, 228-246. doi: 10.4236/jcc.2024.124016.

1. Introduction

Advanced Encryption Standard (AES) is one of the most important block encryption algorithms at present and has been widely applied in various fields, such as communication, network security, e-commerce, and so on. In this algorithm, the plaintext is fixed at 128 bits, while its key length can vary between 128,192 and 256 bits. This algorithm arranges the 128-bit plaintext into a 4 ´ 4 state matrix by byte, and then encrypts the plaintext through multiple iterations of the round function which includes SubBytes (S-box), ShiftRow, and MixColumnand AddRoundKey operations. The number of iterations of encryption is determined by the length of the key 10 iterations for the 128-bit key, 12 iterations for a 192-bit key, and 14 iterations for a 256-bit key. Among these operations in the AES algorithm, S-box transformation is the only nonlinear transformation that plays a crucial role in the security of the encryption algorithm [1] . It makes cryptanalysis more difficult by adding confusion and diffusion and ensuring the security of the AES algorithm. However, its complexity largely impacts AES implementation performance.

Researchers have focused on improving the performance of S-box implementation through two methods: the look-up table method (LUT) [2] [3] and the algebraic method [4] [5] . The LUT method can transform complex nonlinear operations into simple look-up table operations, reducing the complexity and difficulty of implementation, but it has drawbacks like high area overhead and vulnerability to attack. In the algebraic method, the S-box is implemented based on the logical expression between inputs and outputs. Compared with the LUT method, the algebraic method does not need to calculate and store 256 predefined values of S-Box in ROM, resulting in a smaller area overhead. However, this kind of method involves the complex computation of the multiplicative inversion in finite field GF(28), which leads to low processing speed.

To achieve a better tradeoff between area overhead and speed for S-box implementation, the research has explored various strategies. Wang et al. [6] modified the affine transformation part of the S-box by using a pipeline, and proposed an area optimized combinational logic S-box implementation of AES. QIN et al. [7] obtained the logic expressions of the S-box and inverse S-box in the AES algorithm through the improved Q-M simplifying method, which reduced the delay, area, and power of the circuit. However, due to the bottleneck of multiplication inversion operation, this kind of improvement is limited. To address the bottleneck, a method based on composite fields was proposed [8] [9] . Using this method, one can convert the multiplicative inversion in GF(28) to low-order fields such as GF(24), GF(22), and GF(2) through isomorphic mapping, thereby reducing the complexity of multiplicative inversion and improving the efficiency of S-box processes. However, the implementation performance of S-box based on composite fields is related to the selection of low-order fields and their representation basis. To address this issue, several scholars have conducted extensive research on the selection of isomorphic matrix or polynomial basis and have proposed various optimization schemes of S-box [10] [11] [12] [13] . R. Ueno et al. [14] used different polynomial basis and Galois field arithmetic to optimize the inversion circuit, and proposed an efficient and compact S-box architecture. Some structural optimization schemes have also been proposed to improve the performance of the S-box. A. Reyhani-Masoleh et al. [15] mapped the finite field GF(28) to tower fields and proposed the structure of S-box and inverse S-box combination. Y.-T. TENG et al. [16] adopted a pipeline architecture based on composite fields and combined S-box and inverse S-box to achieve optimization effects. In [17] [18] , an optimization scheme of the S-box is proposed by changing the order and structure of affine transformation and multiplication inversion. In addition, the method of combining combinational logic simplification technique with composite fields is often used to optimize the S-box implementation. A. Nakashima et al. [19] optimized the isomorphic mapping logic by combining multiplicative and exponential offsets. S. -H LIN et al. [20] proposed a S-box structure of a seven-stage hardware pipeline system with high throughput based on composite fields, logic optimization technology and pipeline-flow technology.

In this paper, we propose a low-area, high-performance S-box ASIC (Application Specific Integration Circuit) implementation by combining composite fields, the Karnaugh map simplification technique, the Input Variable Bypass Technique (IVB) [21] and other combinational logic simplification techniques. The main contributions of this paper are as follows:

1) We merged the inverse isomorphism and affine transformation into one matrix transform, which simplifies the implementation of the S-box and reduces the area overhead.

2) Based on composite fields, we map the multiplicative inversion in GF(28), GF(24) and GF(22) to GF((24)2), GF((22)2) and GF(2) respectively. This approach reduces the complexity of multiplicative inversion.

3) We proposed a high efficiency of S-box architecture based on segmented optimization by combing different logic simplification techniques and exploring different segment combinations.

The organization of the remaining sections of this paper is as follows: Section 2 introduces basic theoretical background of this research topic in this paper. Section 3 describes the segmented optimization S-box structure proposed in this paper. Section 4 presents the logic derivation and optimization of each module in the proposed structure. Section 5 provides the experiment result and performance evaluation. Finally, in Section 6, this paper is concluded.

2. Preliminaries

2.1. GF(28) and Its Operations

GF(28) is a finite field with 256 elements. These elements can be represented in 8-bit binary or as polynomials of degrees less than 8 with coefficients in GF(2). Suppose an element a GF ( 2 8 ) and its binary form is a 7 a 6 a 5 a 4 a 3 a 2 a 1 a 0 , then its corresponding polynomial representation is shown in Equation (1).

a ( x ) = a 7 x 7 + a 6 x 6 + + a 1 x + a 0 ( a i GF ( 2 ) ) (1)

The operations in GF(28) include addition/subtraction, multiplication, inversion, and other operations. The addition/subtraction operation here ( ) is a bitwise XOR operation which, in polynomial representation, corresponds to the XOR of coefficients of the same terms.

The multiplication operation is modular multiplication ( ). It involves two steps: polynomial multiplication and reduction modulo an irreducible polynomial m ( x ) . Let a ( x ) and b ( x ) be elements in GF(28) and c ( x ) be their product. Then c ( x ) = a ( x ) × b ( x ) mod m ( x ) .

The multiplicative inverse a 1 ( x ) of a ( x ) in GF(28) is defined as the element that satisfies a ( x ) × b ( x ) mod m ( x ) = 1 .

Note that, in the AES algorithm, irreducible polynomial in GF(28) is specified as m ( x ) = x 8 + x 4 + x 3 + x + 1 .

2.2. The Fundamental Construction Principle of the S-Box Based on Composite Fields

The original S-box transformation in the AES algorithm consists of two parts: multiplicative inversion and affine transformation in GF(28). Its algebraic expression is shown as Equation (2):

S b [ x ] = A x 1 63 H (2)

where x , S b [ x ] GF ( 2 8 ) are the 8-bit input and output of the S-box respectively. x 1 is the inversion of x, and A is affine transformation matrix.

A = [ 1 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 0 0 0 1 ] .

As mentioned above, in the S-box transformation, the operation to find the inversion element of x is the most complex and time-consuming. Traditionally, the extended Euclidean algorithm is employed for inversion in GF(28), but its hardware implementation efficiency is not high. Therefore, the method of solving inversion based on composite fields was proposed. The key idea of this method is to transform the inversion operation in high-order fields into low-order fields, thus reducing the complexity of the inverse algorithm. Specifically, the implementation of the S-box based on composite fields can be divided into four steps:

1) Mapping an element q in GF(28) to q in the composite field GF(24)2 by using an isomorphic matrix so that q can be represented as b x + c in GF(24)2 where b , c GF ( 2 4 ) .

2) Using Equation (3) to find the multiplicative inverse of q in GF((24)2) [22] .

q 1 = ( b x + c ) 1 = b ( b 2 λ b c β c 2 ) 1 x + ( c b β ) ( b 2 λ b c β c 2 ) 1 (3)

3) Converting q 1 in GF(24)2 into q 1 in GF(28) by using the inverse of isomorphic matrix.

4) Performing the operation A q 1 63 H to get final the output of S-box S b [ q ] .

Note that β and l in Equation (3) are the coefficients of the irreducible polynomial x 2 + β x + λ chosen for multiplication when mapping GF(28) to GF((24)2).

3. Proposed S-Box Architecture Based on Segmented Optimization

This section presents a segmented optimized S-box architecture. As shown in Figure 1, this architecture consists of an isomorphic mapping module, an inversion module in composite fields and a merged module of inverse isomorphism and affine transformation. The functions of each module are described as follows:

The isomorphic mapping module: To convert an 8-bit element q (be regarded as a column vector) in GF(28) into q in GF((24)2) by multiplying q by the isomorphic matrix d. That is q = δ q .

Figure 1. Proposed S-box architecture based on segmented optimization.

The inversion module in composite fields: To calculate the inversion of q in GF((24)2). The inversion operation is completed by three steps F1, F2, F3. All operations in submodules F1, F2, F3 are 4-bit operations in the low-order field GF(24) which largely reduces the computational complexity.

The merged module of inverse isomorphism and affine transformation: To complete the operation a = A δ 1 q 1 63 H equivalently by replacing A δ 1 with a merged matrix B and optimizing XOR constant operation where δ 1 represent the inverse isomorphism matrix.

4. Logic Derivation and Optimizations

In this section, we present the logic derivation for three modules and corresponding sub-segments in each module shown in Figure 1 in Section 3.

4.1. The Isomorphic Mapping Module

When calculating multiplicative inverses in GF(28) based on composite fields, the first step is to map the elements in GF(28) to the composite field GF((24)2) using by multiplying an isomorphic matrix, but these isomorphic matrix is not fixed because the elements in GF((24)2) can be generated by different irreducible polynomials. It implies that there exists diversity in the isomorphic mapping matrix from elements in GF(28) to GF((24)2). To optimize the implementation performance of the isomorphic mapping module in S-box, it is necessary to carefully to select appropriate irreducible polynomials to get the best isomorphic matrix, thereby reducing critical path delay and area overhead. Refer to [22] , we choose the isomorphic matrix δ as shown in Equation (4).

δ = [ 1 0 1 0 0 0 0 0 1 1 0 1 1 1 1 0 1 0 1 0 1 1 0 0 1 0 1 0 1 1 1 0 1 1 0 0 0 1 1 0 1 0 0 1 1 1 1 0 0 1 0 1 0 0 1 0 0 1 0 0 0 0 1 1 ] (4)

Assuming that the input of the S-box is q = [ q 7 , q 6 , q 5 , q 4 , q 3 , q 2 , q 1 , q 0 ] T , the expression for the output of the isomorphic mapping module is shown as Equation (5).

δ × q = q = [ b 3 b 2 b 1 b 0 c 3 c 2 c 1 c 0 ] = [ q 7 q 5 q 7 q 6 q 4 q 3 q 2 q 1 q 7 q 5 q 3 q 2 q 7 q 5 q 3 q 2 q 1 q 7 q 6 q 2 q 1 q 7 q 4 q 3 q 2 q 1 q 6 q 4 q 1 q 6 q 1 q 0 ] (5)

where b3b2b1b0 represents the high 4 bits of output q , and c3c2c1c0 represents the low 4 bits of output q .

4.2. The Inversion Module in Composite Fields

To obtain the multiplicative inverse of q in GF((24)2) expressed as Equation (3), we decompose Equation (3) into F1, F2 and F3 as shown in Equation (6), (7) and (8), and use the recursive deduction method to get the logical expressions of F1, F2 and F3.

F 1 = b 2 λ b c β c 2 (6)

F 2 = ( F 1 ) 1 (7)

F 3 = b F 2 | | ( c b β ) F 2 (8)

where the symbol “||” in Equation (8) represents concatenation.

In order to optimize the implementation, we select β = 1, λ = 1100 b to be the coefficients of the generating polynomial m 1 ( x ) = x 2 + β x + λ of the composite field GF((24)2), then Equations (6), (7) and (8) are simplified to (9), (10), (11).

F 1 = b 2 λ b c c 2 (9)

F 2 = ( F 1 ) 1 (10)

F 3 = b F 2 | | ( c b ) F 2 (11)

Note that, when calculating inversion in composite fields, three types of irreducible polynomials are required to complete the operation conversion from the high-order field to the low-order field. To simplify the following expression, we list the three types of polynomials as shown in Table 1.

4.2.1. F1 Module

Figure 2 shows the calculation process of F1. Next, we will deduce and optimize the logic expression for each step in Figure 2.

Table 1. Three types of irreducible polynomials.

Figure 2. F1 module logic operation diagram.

1) b2 operation in GF(24)

As can be seen from Figure 2, we let k = b 2 = k 3 k 2 k 1 k 0 = ( b 3 b 2 b 1 b 0 ) 2 , then this operation can be expressed as the form in GF((22)2) shown in Equation (12).

k = b 2 = ( b H x + b L ) 2 mod m 2 ( x ) = b H 2 x + ( b H 2 φ b L 2 ) = k H x + k L (12)

where k H = k 3 k 2 , k L = k 1 k 0 , b H = b 3 b 2 , b L = b 1 b 0 , m 2 ( x ) (see Table 1) is an irreducible generating polynomial for GF((22)2). Obviously,

k H = b H 2 ; k L = b H 2 φ b L 2 . (13)

Then one can further decompose the operation b H 2 and b L 2 in GF(22) to GF(2) to get Equation (14):

{ b H 2 = ( b 3 x + b 2 ) 2 mod m 3 ( x ) = b 3 x + ( b 3 b 2 ) b L 2 = ( b 1 x + b 0 ) 2 mod m 3 ( x ) = b 1 x + ( b 1 b 0 ) (14)

where m 3 ( x ) (see Table 1) is an irreducible generating polynomial in GF(2).

By combining Equation (13), (14), we can obtain Equation (15).

{ k H = b 3 x + ( b 3 b 2 ) = k 3 x + k 2 k L = ( b 1 b 2 ) x + ( b 0 b 1 b 3 ) = k 1 x + k 0 (15)

According to Equation (15), we can get the expression of k as shown in equation (16).

{ k 3 = b 3 k 2 = b 2 b 3 k 1 = b 1 b 2 k 0 = b 0 b 1 b 3 (16)

2) t = b 2 λ operation

As mentioned above, in order to optimize the expression, we choose λ = { 1100 } 2 and let

t = b 2 λ = k λ = t 3 t 2 t 1 t 0 = ( k 3 k 2 k 1 k 0 ) λ .

Then, the polynomial representation of t can be written as Equation (17).

t = ( k H x + k L ) ( λ H x + λ L ) mod m 2 ( x ) = ( k H λ H x 2 + k H λ L x + k L λ H x + k L λ L ) mod m 2 ( x ) = ( k L λ H k H λ H ) x + k H λ H φ = t H x + t L (17)

where t H = t 3 t 2 , t L = t 1 t 0 , λ H = 11 , λ L = 00 .

Using similar derivation method used for k, t H and t L can be expressed as Equation (18).

{ t H = k L λ H k H λ H = ( ( k 1 x + k 0 ) ( x + 1 ) + ( k 3 x + k 2 ) ( x + 1 ) ) mod m 3 ( x ) = ( k 0 k 2 ) x + ( k 0 k 1 k 2 k 3 ) = t 3 x + t 2 t L = k H λ H φ = [ ( k 3 x + k 2 ) ( x + 1 ) x ] mod m 3 ( x ) = k 3 x + k 2 = t 1 x + t 0 (18)

Then we can get the expression of t as Equation (19).

{ t 3 = k 0 k 2 t 2 = k 0 k 1 k 2 k 3 t 1 = k 3 t 0 = k 2 (19)

3) s = c ( b c ) operation

For s, we let m i = b i c i , then the expression of s can be written as Equation (20).

{ s 3 = c 3 m 2 c 2 m 3 c 3 m 0 c 0 m 3 c 2 m 1 c 1 m 2 c 3 m 3 c 3 m 1 c 1 m 3 s 2 = c 2 m 2 c 2 m 0 c 0 m 2 c 3 m 3 c 3 m 1 c 1 m 3 s 1 = c 1 m 0 c 0 m 1 c 2 m 2 c 1 m 1 c 3 m 2 c 2 m 3 s 0 = c 0 m 0 c 1 m 1 c 3 m 2 c 2 m 3 c 3 m 3 (20)

By combining Equations (16), (19), (20) and simplifying them, we can obtain the logic expression of e as Equation (21).

{ e 3 = b 0 c 3 ¯ b 1 c 2 ¯ b 2 c 1 ¯ c 3 b 2 c 2 b 3 c 0 b 3 c 3 b 3 ¯ c 3 b 1 c 1 b 3 e 2 = c 2 ¯ b 0 c 1 ¯ b 3 c 2 b 2 ¯ c 0 b 2 c 3 b 3 ¯ c 3 b 1 e 1 = c 1 b 0 c 0 b 1 c 2 b 2 ¯ c 1 b 1 ¯ c 3 b 2 c 2 ¯ b 3 e 0 = c 0 b 0 ¯ c 1 b 1 ¯ c 3 ¯ b 2 c 2 ¯ b 3 c 3 b 3 ¯ (21)

4.2.2. F2 Module

The F2 module is used to calculate the inversion in GF(24). To derive and simplify the output expression of F2, we first utilize the idea of tower fields to partition the operation in F2 into three parts: G1, G2 and G3, as depicted in Figure 3. Subsequently, after obtaining the logic expression of G1, G2 and G3, we merge them into an expression. Finally, we employ Karnaugh map simplification technique, input variable bypass technique [21] and other logic simplification technique to further simplify and obtain the output of F2.

Specifically, according to the idea of tower fields, the G1, G2 and G3 can be expressed as Equation (22).

{ G 1 = e H 2 φ e L ( e H e L ) G 2 = ( G 1 ) 1 G 3 = e H G 2 | | ( e H e L ) G 2 (22)

where e H = e 3 e 2 , e L = e 1 e 0 .

Figure 3. F2 module structure diagram.

For G1, we first get Equation (23), then obtain the expression of G1 as shown in equation (24) by combining Equations (22) and (23) and simplifying them.

{ e H 2 = ( e 3 , e 2 e 3 ) e H 2 φ = ( e 2 , e 3 ) e L ( e H e L ) = ( e 1 e 3 ¯ e 0 e 3 e 1 e 2 , e 1 e 3 ¯ e 0 e 2 ¯ ) (23)

G 1 = ( h 1 , h 0 ) = ( e 1 e 3 ¯ e 0 e 3 e 2 e 1 ¯ , e 3 e 1 e 3 ¯ e 0 e 2 ¯ ) (24)

For G2, we directly use the simplification method of Karnaugh map to obtain its output expression as shown in Equation (25).

{ h 1 ' = h 1 h 0 ' = h 1 h 0 (25)

Finally, by combining with Equation (24) and (25) and G3 in Equation (22) and simplifying the expression combined, we obtain the expression of F2 as shown in Equation (26).

{ y 3 = e 0 ¯ e 3 e 1 e 2 e 3 ¯ e 1 ¯ e 2 y 2 = e 0 e 2 ¯ e 3 e 1 e 2 e 3 e 1 ¯ e 2 y 1 = e 1 e 0 e 1 e 2 ¯ e 3 e 1 e 2 e 3 ¯ e 0 ¯ e 1 ¯ e 2 e 0 e 1 e 3 ¯ y 0 = e 1 e 2 e 1 e 2 ¯ e 3 ¯ e 0 ¯ e 1 ¯ e 2 e 0 e 1 e 3 e 0 e 2 ¯ e 3 ¯ (26)

4.2.3. F3 Module

As shown in Figure 1, b, c and y are the inputs of F3, then we can derive to get the output expression of F3 q 7 1 q 6 1 q 5 1 q 4 1 q 3 1 q 2 1 q 1 1 q 0 1 as shown in Equation (27).

{ q 7 1 = y 3 ( b 0 b 1 b 2 b 3 ) y 2 ( b 1 b 3 ) y 1 ( b 2 b 3 ) y 0 b 3 q 6 1 = y 3 ( b 1 b 3 ) y 2 ( b 0 b 2 ) y 1 b 3 y 0 b 2 q 5 1 = y 3 b 2 y 2 ( b 2 b 3 ) y 1 ( b 0 b 1 ) y 0 b 1 q 4 1 = y 3 ( b 2 b 3 ) y 2 b 3 y 1 b 1 y 0 b 0 q 3 1 = y 3 ( b 0 c 0 b 1 c 1 b 2 c 2 b 3 c 3 ) y 2 ( b 1 c 1 b 3 c 3 ) y 1 ( b 2 c 2 b 3 c 3 ) y 0 ( b 3 c 3 ) q 2 1 = y 3 ( b 1 c 1 b 3 c 3 ) y 2 ( b 0 c 0 b 2 c 2 ) y 1 ( b 3 c 3 ) y 0 ( b 2 c 2 ) q 1 1 = y 3 ( b 2 c 2 ) y 2 ( b 2 c 2 b 3 c 3 ) y 1 ( b 0 c 0 b 1 c 1 ) y 0 ( b 1 c 1 ) q 0 1 = y 3 ( b 2 c 2 b 3 c 3 ) y 2 ( b 3 c 3 ) y 1 ( b 1 c 1 ) y 0 ( b 0 c 0 ) (27)

To optimize the implementation of Equation (27) in the hardware circuit, we extract the common part of 8 sub-expressions of Equation (27) and define intermediate variable d i as shown in Equation (28). Then the expression can be optimized into Equation (29).

{ d 0 = b 0 b 1 d 1 = b 0 b 2 d 2 = b 1 b 3 d 3 = b 2 b 3 d 4 = b 3 c 3 d 5 = b 2 c 2 d 6 = b 1 c 1 d 7 = b 0 c 0 d 8 = d 0 d 3 d 9 = d 4 d 5 d 6 d 7 d 10 = d 4 d 6 d 11 = d 4 d 5 d 12 = d 5 d 7 d 13 = d 6 d 7 (28)

{ q 7 1 = y 3 d 8 y 2 d 2 y 1 d 3 y 0 b 3 q 6 1 = y 3 d 2 y 2 d 1 y 1 b 3 y 0 b 2 q 5 1 = y 3 b 2 y 2 d 3 y 1 d 0 y 0 b 1 q 4 1 = y 3 d 3 y 2 b 3 y 1 b 1 y 0 b 0 q 3 1 = y 3 d 9 y 2 d 10 y 1 d 11 y 0 d 4 q 2 1 = y 3 d 10 y 2 d 12 y 1 d 4 y 0 d 5 q 1 1 = y 3 d 5 y 2 d 11 y 1 d 13 y 0 d 6 q 0 1 = y 3 d 11 y 2 d 4 y 1 d 6 y 0 d 7 (29)

4.3. The Merged Module of Inverse Isomorphism and Affine Transformation

This module completes the operation a = ( A δ 1 q 1 ) 63 H . For this module, we use two methods to optimize the logic to reduce the number of logic gates and shorten the critical path delay: 1) To merge the inverse isomorphic mapping and affine transformation into a matrix transformation. 2) To simplify the logic expression for XOR operation with 63H by using NOT gate to replace XOR.

Specifically, we let B = A × δ 1 , q 1 and q to be the input and output of the merged module of inverse isomorphism and affine transformation respectively. Then the expression of the original expression a = ( A δ 1 q 1 ) 63 H can be reduced to Equation (30).

a = B q 1 63 H (30)

where the inverse transformation matrix δ 1 , affine transformation matrix A and merged matrix B corresponds to Equations (31), (32) and (33) respectively.

δ 1 = [ 1 1 1 0 0 0 1 0 0 1 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 1 1 1 0 1 1 0 0 0 1 1 1 1 1 0 1 0 0 1 1 1 1 0 0 0 1 1 0 0 0 0 0 1 1 1 0 1 0 1 ] (31)

A = [ 1 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 0 0 0 1 ] (32)

B = A δ 1 = [ 1 0 0 0 1 1 0 0 1 1 1 1 0 0 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 0 1 1 0 0 0 0 0 1 1 1 0 1 1 1 1 1 0 1 1 0 0 0 0 0 0 1 1 1 0 0 0 1 1 1 ] (33)

Then by combining the Equations (30) and (33), we can get the expression of output of S-box a as shown in Equation (34).

{ a 7 = q 7 1 q 3 1 q 2 1 0 a 6 = q 7 1 q 6 1 q 5 1 q 4 1 1 a 5 = q 7 1 q 2 1 1 a 4 = q 7 1 q 4 1 q 1 1 q 0 1 0 a 3 = q 2 1 q 1 1 q 0 1 0 a 2 = q 6 1 q 5 1 q 4 1 q 3 1 q 2 1 q 0 1 0 a 1 = q 7 1 q 0 1 1 a 0 = q 7 1 q 6 1 q 2 1 q 1 1 q 0 1 1 (34)

Observing the expression in Equation (34), we find that expressions of a 6 , a 5 , a 1 , a 0 includes constant term 1 and they are common term q 7 1 . Thus we can use q 7 1 ¯ to substitute the q 7 1 1 and convert Equation (34) into Equation (35), which simplify the expression and reduce the number of XOR gate.

{ a 7 = q 7 1 q 3 1 q 2 1 a 6 = q 7 1 ¯ q 6 1 q 5 1 q 4 1 a 5 = q 7 1 ¯ q 2 1 a 4 = q 7 1 q 4 1 q 1 1 q 0 1 a 3 = q 2 1 q 1 1 q 0 1 a 2 = q 6 1 q 5 1 q 4 1 q 3 1 q 2 1 q 0 1 a 1 = q 7 1 ¯ q 0 1 a 0 = q 7 1 ¯ q 6 1 q 2 1 q 1 1 q 0 1 (35)

4.4. Example

To test the correctness of the output of each segment, taking the input 11110000 as an example and referring to equations (5), (21), (26), (28), (29) and (35), we calculate and obtain the operation results of each segment in Figure 1 and the final S-box output as shown in Table 2. It is observed that the output of S-box of proposed S-box is consistent with the output of AES standard, which proves that the expression of each segment is correct.

5. Experiment Result and Performance Evaluation

In order to evaluate the hardware implementation performance of the proposed architecture, we modelled and simulated the proposed design using Verilog language and Modelsim respectively. Design Complier (DC) was employed to complete ASIC synthesis under TSMC 90nm CMOS Technology library.

Table 2. The segmented calculation results of S-box in this paper and the comparison with AES standard output.

Figure 4 shows the simulation results of two kinds of S-box. In Figure 4, s_box_1_out represents the output result of the standard AES S-box, and s_box is that of the S-box proposed in this paper. As shown in Figure 4, the output of the S-box in this paper is consistent with that of the standard S-box for different inputs, which proves that the scheme in this paper is correct.

Table 3 shows the DC synthesis results and the theoretical estimation results of Critical Path Delay (CPD). Note that the delay equivalent relationship between logic gates used to estimate CPD and equivalent CPD in Table 3 refers to the standard in [13] . The specific equivalent relationship is listed in Table 4. Note that, in Table 4, XOR stands for two-input XOR gate, AND two-input AND gate, AND3X1 three-input AND gate, OR two-input OR gate, MUX two-input multiplexer.

According to the DC synthesis results in Table 3, we first compared the synthesis results of our proposed S-box with [16] synthesized using the same TSMC 90 nm CMOS standard cell library.

Figure 4. Modelsim simulation results of the proposed S-box.

Table 3. DC synthesis results of different S-boxes and theoretical estimation results of critical path delay.

Table 4. The delay equivalent relationship.

The results indicate that the S-box implementation proposed in this paper only needs 1593.24 μm2 area overhead. In comparison with the implementation of the pipeline architecture proposed in [16] , although the implementation in this paper reduces the frequency and throughput by approximately 13.54% and 13.55% respectively, the area overhead is reduced by about 42.47%, leading to an increase in hardware efficiency (throughput/area) by about 50.29%. Obviously, the scheme proposed in this paper offers significant advantages in terms of area overhead and hardware efficiency.

In addition, in terms of CPD, the S-box implementation proposed in this paper has a CPD of 15 XOR + 2 AND + 1 AND3X1 + 2 INV. The CPD is calculated based on Equations (5), (21), (26), (28), (29), (35) which is corresponding to five segments in Figure 1 in Section 3. Specifically, the CPD is 3 XOR for Equation (5), 4 XOR + 1 AND + 1 INV for Equation (21), and 3 XOR + 1 AND3X1 + 1 INV for Equation (26) and (28) (calculated in parallel). Then the CPD of Equation (29) and (35) are 2 XOR + 1 AND and 3 XOR, respectively. Comparatively, the CPD in [16] is 19 XOR + 3 AND + 1 OR + 2 MUX. After being converted equivalently according to the relationship shown in Table 4, the CPD of this paper is 51 NAND + 4 INV, whereas that of [16] is 65 NAND + 8 INV. It is evident that the CPD of the S-box proposed in this paper saves the delay of 14 NAND and 4 INV logic gates compared with [16] . Even ignoring the difference in the number of INV logic gates, the CPD of the S-box in this paper is about 21.54% higher than that in [16] , which is a relatively valuable improvement. However, it should be noted that the DC synthesis results in [16] show that its working clock frequency is higher than that in this paper. This is mainly because the synthesis results given in [16] are for the S-box of 5-level pipeline architecture in which its clock frequency is determined by the CPD of some sub-stage, but not entire S-box.

Secondly, we compared our proposed implementation with [14] and [20] which were synthesized using TSMC 65 nm and 40 nm CMOS standard cell library respectively. For the convenience of comparison with related data in [14] , we calculated that the frequency (frequency = 1/Time) and throughput (throughput = 8 bit/Time) in [14] are approximately 328.95 Mhz and 2.632 Gbps respectively based on the Time given in [14] and made a comparison with corresponding data in this paper. The comparison results show that the Time of the proposed implementation in this paper is reduced by approximately 68.42% compared to [14] , which indicates that the speed of S-box implementation proposed in this paper significantly surpasses that reported in [14] . Furthermore, despite the larger area of the S-box proposed in this paper compared to [14] , the S-box in this paper exhibits a lower area-time product (GE*Time) reduced by approximately 28.34%. It implied that our design achieves a superior balance between area and speed.

Reference [20] is the latest known paper focusing on S-box implementation. To facilitate comparison, we first calculated that the equivalent GE in [20] is 1223 two-input NAND Gates (the equivalent GE under TSMC40 nm technology library = area under TSMC40 nm/0.68 where 0.68 is the area of the two-input NAND in the TSMC 40 nm technology library). The comparison results show that the S-box implementation proposed in this paper can save GE and the product of GE and Time by approximately 53.80% and 44.56% respectively and improve 80.59% hardware efficiency compared to that in [20] .

Overall, the S-box implementation proposed in this paper has obvious advantages over the existing schemes in terms of area overhead, critical path delay and hardware efficiency. It can provide favorable support for implementing efficient and compact AES S-box ASIC design.

6. Conclusions and Future Work

Aiming at the shortcomings of low processing speed and computational complexity in multiplicative inversion over the finite field, this paper explores the optimization of AES S-box and proposes a segmented optimized S-box architecture by using the idea of calculating inverse elements in tower fields and combining with Karnaugh map simplification and IVB technique. This architecture was divided into three modules: an isomorphic mapping module, an inversion module in composite fields and a merged module of inverse isomorphism and affine transformation. In this way, we simplify the logic of the S-box implementation.

In the isomorphic mapping module, based on the concept of composite fields, we map elements in the finite field to the composite field, thus reducing the complexity of calculations. In the inversion module in composite fields, we use the recursive method to optimize the inversion part. The experimental results indicate that compared with the traditional algebraic method, our design effectively reduces the computational complexity and area overhead. In the merged module of inverse isomorphism and affine transformation, we combine the matrix of inverse isomorphic mapping and affine transformation into one to further simplify the logic expression of the S-box.

In conclusion, the segmented optimized S-box architecture proposed in this paper has the characteristics of small area, low critical path delay and high hardware efficiency which is very valuable for area-limited applications.

While we have made some progress in the implementation of S-boxes, we only consider the forward AES S-box in this paper because many operations in the block encryption algorithm only involve encryption. The inverse S-box and the forward S-box have many similar parts, so how to design the low-area inverse S-box and how to combine the S-box and the inverse S-box is the work we need to study in the future. At the same time, although the AES encryption algorithm has higher security performance, the security of AES hardware implementation is seriously threatened by the development of hardware attack technology. Therefore, to improve the security of the AES encryption algorithm, adding fault detection on the basis of the proposed S-box architecture is also our future work.

Conflicts of Interest

The authors declare no conflicts of interest regarding the publication of this paper.

References

[1] Xu, Y.T., Lyu, Z.G., Huang, Y.G. and Li, X.Y. (2020) Optimization and Implementation of AES Algorithm Based on STM32 MCU. Process Automation Instrumentation, 41, 56-60.
https://chn.oversea.cnki.net/KCMS/detail/detail.aspx?dbcode=CJFD&dbname=CJFDLAST2020&filename=ZDYB202007013&uniplatform=OVERSEA&v=FBrazSYzB-bBRjeM4cilrp85xrbI9lm56klGQJewY_FfxAjpFv1g49uZwQfYYS3u
[2] Manjith, B.C. (2019) Improving Overall Parallelism in AES Accelerator Using BRAM and Multiple Input Blocks. 2019 Innovations in Power and Advanced Computing Technologies (i-PACT), Vellore, 22-23 March 2019, 1-5.
https://doi.org/10.1109/i-PACT44901.2019.8960016
[3] Arul Murugan, C., Karthigaikumar, P. and Priya, S.S. (2020) FPGA Implementation of Hardware Architecture with AES Encryptor Using Sub-Pipelined S-Box Techniques for Compact Applications. Automatika, 61, 682-693.
https://doi.org/10.1080/00051144.2020.1816388
[4] Rachh, R.R. and Ananda Mohan, P.V. (2008) Implementation of AES S-Boxes Using Combinational Logic. 2008 IEEE International Symposium on Circuits and Systems (ISCAS), Seattle, 18-21 May 2008, 3294-3297.
https://doi.org/10.1109/ISCAS.2008.4542162
[5] Ahmad, N., Rezaul Hasanand, R. and Jubadi, W.M. (2010) Design of AES S-Box Using Combinational Logic Optimization. 2010 IEEE Symposium on Industrial Electronics & Applications (ISIEA), Penang, 3-5 October 2010, 696-699.
https://doi.org/10.1109/ISIEA.2010.5679375
[6] Wang, Q., Liang, J. and Qi, Y. (2010) The Area Optimized Implementation of S-box in AES Algorithm. Chinese Journal of Electronics, 38, 939-942.
https://kns.cnki.net/kcms2/article/abstract?v=smPsKIJgVaAXklosraIEqnytl0tPMLltpr9WZoEcUceHoSuINcBb4nLePRzxH-SSJdJ5qxypin17TJWVPQy99V8N47WNPrJaAIT7SexfzEdyNjgiXFuNLuhitAQEjLJSuxZ3wNLE8p8=&uniplatform=NZKPT&language=CHS
[7] Qin, X.C. and Li, S.G. (2014) An Expression Method to Implement S-Box and Inverse S-Box Substitution for AES Algorithm. Microelectronics & Computer, 31, 112-115.
https://kns.cnki.net/kcms2/article/abstract?v=smPsKIJgVaAXklosraIEqnytl0tPMLltpr9WZoEcUceHoSuINcBb4nLePRzxH-SSJdJ5qxypin17TJWVPQy99V8N47WNPrJaAIT7SexfzEdyNjgiXFuNLuhitAQEjLJSuxZ3wNLE8p8=&uniplatform=NZKPT&language=CHS
[8] Satoh, A., Morioka, S., Takano, K. and Munetoh, S. (2001) A Compact Rijndael Hardware Architecture with S-Box Optimization. In: Boyd, C., Ed., ASIACRYPT 2001: Advances in CryptologyASIACRYPT 2001, Springer, Berlin, 239-254.
https://doi.org/10.1007/3-540-45682-1_15
[9] Canright, D. (2005) A Very Compact S-Box for AES. In: Rao, J.R. and Sunar, B., Eds., CHES 2005: Cryptographic Hardware and Embedded SystemsCHES 2005, Springer, Berlin, 441-455.
https://doi.org/10.1007/11545262_32
[10] Reyhani-Masoleh, A., Taha, M. and Ashmawy, D. (2018) Smashing the Implementation Records of AES S-Box. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2018, 298-336.
https://doi.org/10.46586/tches.v2018.i2.298-336
[11] Ashmawy, D. and Reyhani-Masoleh, A. (2021) A Faster Hardware Implementation of the AES S-Box. 2021 IEEE 28th Symposium on Computer Arithmetic (ARITH), Lyngby, 14-16 June 2021, 123-130.
https://doi.org/10.1109/ARITH51176.2021.00034
[12] Qin, P.Y., Zhou, F., Wu, N. and Xian, F.C. (2021) A Compact Implementation of AES S-Box Based on Dual Basis. 2021 IEEE 4th International Conference on Electronics Technology (ICET), Chengdu, 7-10 May 2021, 118-122.
https://doi.org/10.1109/ICET51757.2021.9451103
[13] Li, Y.J., Zhang, W.G., Ge, Y.D., Huang, Y.T. and Huo, S.S. (2023) Optimized Realization of AES-Like Algorithm S-Box. Journal of Cryptologic Research, 10, 531-538.
https://chn.oversea.cnki.net/KCMS/detail/detail.aspx?dbcode=CJFD&dbname=CJFDLAST2023&filename=MMXB202303007&uniplatform=OVERSEA&v=3krAWpsgW0pQFgxPqPGHNXUu9KOVM3LR-yRj4uPmzaytoGewPl1MMY0QmJe5iVlK
[14] Ueno, R., Homma, N., Nogami, Y. and Aoki, T. (2018) Highly Efficient GF(28) Inversion Circuit Based on Hybrid GF Representations. Journal of Cryptographic Engineering, 9, 101-113.
https://doi.org/10.1007/s13389-018-0187-8
[15] Reyhani-Masoleh, A., Taha, M. and Ashmawy, D. (2020) New Low-Area Designs for the AES Forward, Inverse and Combined S-Boxes. IEEE Transactions on Computers, 69, 1757-1773.
https://doi.org/10.1109/TC.2019.2922601
[16] Teng, Y.T., Chin, W.L., Chang, D.K., Chen, P.Y. and Chen, P.W. (2022) VLSI Architecture of S-Box with High Area Efficiency Based on Composite Field Arithmetic. IEEE Access, 10, 2721-2728.
https://doi.org/10.1109/ACCESS.2021.3139040
[17] Zhong, X.L. and Wu, X.C. (2023) Improved Scheme for AES S-Box and Its Hardware Design. Application Research of Computers, 40, 3784-3788.
https://chn.oversea.cnki.net/KCMS/detail/detail.aspx?dbcode=CJFD&dbname=CJFDLAST2024&filename=JSYJ202312041&uniplatform=OVERSEA&v=6V31X-ZMm4dw3aUv6LWSfloYqa4O0wQBBQfFsOG5q8ozVts-sEJtUdkPlBfClBfe
[18] Shen, X.C. and Han, M. (2018) Improved S-box Based on Strict Avalanche Distance Criterion. Microelectronics & Computer, 35, 92-96.
https://chn.oversea.cnki.net/KCMS/detail/detail.aspx?dbcode=CJFD&dbname=CJFDLAST2018&filename=WXYJ201806020&uniplatform=OVERSEA&v=y8jwcNYZOk4NEvpQNzq689lZkTI8P8tEKPKjl4d94PoJ4RAsb8iS50lWFfAulS1X
[19] Nakashima, A., Ueno, R. and Homma, N. (2022) AES S-Box Hardware with Efficiency Improvement Based on Linear Mapping Optimization. IEEE Transactions on Circuits and Systems II: Express Briefs, 69, 3978-3982.
https://doi.org/10.1109/TCSII.2022.3185632
[20] Lin, S.H., Lee, J.Y., Chuang, C.C., Lee, N.Y., Chen, P.Y. and Chin, W.L. (2023) Hardware Implementation of High-Throughput S-Box in AES for Information Security. IEEE Access, 11, 59049-59058.
https://doi.org/10.1109/ACCESS.2023.3284142
[21] Maity, H., Kundu, P., Bhowmik, A. and Barik, A.K. (2023) Input Variable Bypass or IVB Technique for Logic Functions Simplification. 2023 IEEE Devices for Integrated Circuit (DevIC), Kalyani, 7-8 April 2023, 1-4.
https://doi.org/10.1109/DevIC57758.2023.10135020
[22] Mui, E.N., Custom, R. and Engineer, D. (2007) Practical Implementation of Rijndael S-Box Using Combinational Logic. Custom R&D Engineer Texco Enterprise Pvt. Ltd.
http://www.geocities.ws/dariuskrail20/Practical_Implementation_of_Rijndael_S-Box_Using_Combinational_Logic.pdf

Copyright © 2024 by authors and Scientific Research Publishing Inc.

Creative Commons License

This work and the related PDF file are licensed under a Creative Commons Attribution 4.0 International License.