High-Performance Design of a 4-Bit Carry Look-Ahead Adder in Static CMOS Logic

ABSTRACT


INTRODUCTION
High-speed, energy-efficient and area-efficient arithmetic has become a foremost requirement for present-day microprocessors. Among the arithmetic operations, binary addition plays the most prominent role, since many other arithmetic operations rely on addition [1]. Moreover, addition is critical in crypto-processing, parallel processing and digital signal processing, so adder designs with strong performance parameters have attracted considerable research interest [2].
In the traditional Ripple Carry Adder (RCA) implementation of a wide adder, each stage must wait for the carry-out bit of the previous stage [3]. Among high-speed wide adder topologies, the Carry Look-Ahead (CLA) process is dominant because it reduces the carry-propagation delay by computing several stages in parallel [4]. The architecture of the 4-bit CLA process plays an important role in wide adder design, since 4-bit adders are used as fundamental units [5]. Hence, an optimized high-performance 4-bit CLA design will improve the performance of wide adder blocks as a whole [6].
Although various CLA techniques focused mainly on CLA logic interpretation and algorithms have been developed, only a handful of research works address the transistor-level representation of the 4-bit CLA. In this work, a transistor level static CMOS logic based CLA process for generating carry-out

CONVENTIONAL 4-BIT CLA PROCESS IN STATIC CMOS LOGIC
The conventional CLA adder uses carry-generate ($G_i$) and carry-propagate ($P_i$) terms [7][8]. If the input bits are denoted as $A_i$ and $B_i$, then the $G_i$ and $P_i$ terms can be expressed by the following equations.
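The equations referred to here are not reproduced in this text. Assuming the standard CLA definitions, which agree with the AND and XOR gates of Fig. 3, they would take the form sketched below. The grouping and ordering are an assumption and may not match the original numbering (1)-(7).

```latex
\begin{align*}
G_i     &= A_i B_i \\
P_i     &= A_i \oplus B_i \\
C_{i+1} &= G_i + P_i C_i \\
C_1     &= G_0 + P_0 C_0 \\
C_2     &= G_1 + P_1 G_0 + P_1 P_0 C_0 \\
C_3     &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_0 \\
C_4     &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_0
\end{align*}
```

Each expanded carry follows from substituting the recurrence $C_{i+1} = G_i + P_i C_i$ into itself, which is what allows all four carries to be computed in parallel from $G_i$, $P_i$ and $C_0$.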
As per the above equations and the description of the conventional CLA process in reference [10], the gate-level representation of the 4-bit CLA adder is shown in Fig. 1. Using equations (4), (5), (6) and (7), the switch-level (transistor-level) CLA circuits for $C_1$, $C_2$, $C_3$ and $C_4$ in Fig. 1 have been presented in references [4], [11]-[13]. The conventional CLA circuits for computing the carry-out terms are shown in Fig. 2. To produce the input signals ($P_i$ and $G_i$) of the CLA circuits in Fig. 2, the conventional design uses the AND and XOR gates depicted in Fig. 3.
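The conventional data flow (input bits, then $G_i$ and $P_i$, then carries, then sums) can be illustrated with a small behavioural model. This is only a logic-level sketch under the standard expanded look-ahead equations, not a model of the transistor circuits in Fig. 2; the function names `conventional_cla`, `to_bits` and `check_all` are hypothetical.

```python
from itertools import product


def conventional_cla(a, b, c0):
    """Behavioural 4-bit CLA using generate/propagate terms.

    a, b: lists of four bits, least significant bit first.
    c0:   carry-in bit.
    Returns (sum_bits, [C1, C2, C3, C4]).
    """
    g = [ai & bi for ai, bi in zip(a, b)]  # G_i = A_i AND B_i
    p = [ai ^ bi for ai, bi in zip(a, b)]  # P_i = A_i XOR B_i

    # Expanded look-ahead carries: each depends only on G, P and C0.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))

    carries_in = [c0, c1, c2, c3]
    s = [p[i] ^ carries_in[i] for i in range(4)]  # S_i = P_i XOR C_i
    return s, [c1, c2, c3, c4]


def to_bits(x, n=4):
    """Integer to list of n bits, least significant bit first."""
    return [(x >> i) & 1 for i in range(n)]


def check_all():
    """Compare the model with integer addition for every input combination."""
    for x, y, c0 in product(range(16), range(16), (0, 1)):
        s, carries = conventional_cla(to_bits(x), to_bits(y), c0)
        total = sum(bit << i for i, bit in enumerate(s)) + (carries[3] << 4)
        if total != x + y + c0:
            return False
    return True
```

Calling `check_all()` exercises all 512 combinations of $A$, $B$ and $C_0$, which is a convenient sanity check before moving to a transistor-level description.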

PROPOSED 4-BIT CARRY LOOK-AHEAD ADDER IN STATIC CMOS LOGIC
The proposed CLA technique, shown in Fig. 4, computes the carry-out bits without the $P_i$ and $G_i$ terms. In the proposed design, all input bits are used directly as inputs to the CLA circuits for carry generation. The sum generation part is exactly the same as in the conventional design. To design the CLA circuits in terms of the input bits ($A_i$, $B_i$ and $C_0$), it is necessary to derive the simplified Boolean equations in terms of those bits. The simplified Boolean equations for the proposed design are:

$C_1 = A_0 B_0 + C_0 (A_0 + B_0)$ (8)
$C_2 = A_1 B_1 + C_1 (A_1 + B_1)$ (9)
$C_3 = A_2 B_2 + C_2 (A_2 + B_2)$ (10)
$C_4 = A_3 B_3 + C_3 (A_3 + B_3)$ (11)

Hence, in general, the carry-out bits can be written as:

$C_{i+1} = A_i B_i + C_i (A_i + B_i)$ (12)

Although equations (8)-(12) are similar to the equations of a 4-bit Ripple Carry Adder (RCA), the transistor-level design methodology presented in this section transforms the RCA equations into a CLA process. Close inspection of equations (8)-(12) shows that the circuit for carry-out bit $C_i$ serves as the fundamental building unit for the next carry-out bit $C_{i+1}$. Therefore, the first step is to implement $C_1$.

In an N-channel MOS (NMOS) network, an AND operation ($A_0 B_0$) requires a series connection of NMOS transistors and an OR operation ($A_0 + B_0$) requires a parallel connection [14]. Equation (8) shows that $A_0 + B_0$ is ANDed with $C_0$. Therefore, $C_0$ has been placed in series with the $A_0 + B_0$ network to implement $C_0 (A_0 + B_0)$. Equation (8) further shows that $C_0 (A_0 + B_0)$ is ORed with $A_0 B_0$. Hence, the networks for $A_0 B_0$ and $C_0 (A_0 + B_0)$ are connected in parallel (denoted by circuit 1 in Fig. 5).

Circuit 1 is then used as the base for the subsequent NMOS network for $C_2$. From equation (9), $A_1 + B_1$ is placed in series with the NMOS network of circuit 1, which implements $C_1 (A_1 + B_1)$. The $A_1 B_1$ network is then added in parallel with $C_1 (A_1 + B_1)$ to construct the NMOS network required for $C_2$ (denoted by circuit 2 in Fig. 5).
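The construction described above can be checked at the logic level with a short sketch. It treats a series connection as AND and a parallel connection as OR, nests the network for each stage inside the next, and compares the result with the generate/propagate recurrence $C_{i+1} = G_i + P_i C_i$. The helper names (`series`, `parallel`, `nmos_network_conducts`, `networks_agree`) are hypothetical, and the model ignores all electrical behaviour such as sizing, delay and charge sharing.

```python
from itertools import product


def series(*switches):
    """A series chain conducts only when every switch is on (AND)."""
    return all(switches)


def parallel(*switches):
    """Parallel branches conduct when any branch is on (OR)."""
    return any(switches)


def proposed_carries(a, b, c0):
    """Carries from C_{i+1} = A_i B_i + C_i (A_i + B_i), bits LSB first."""
    carries, c = [], c0
    for ai, bi in zip(a, b):
        c = (ai & bi) | (c & (ai | bi))
        carries.append(c)
    return carries


def conventional_carries(a, b, c0):
    """Carries from C_{i+1} = G_i + P_i C_i, for comparison."""
    carries, c = [], c0
    for ai, bi in zip(a, b):
        g, p = ai & bi, ai ^ bi
        c = g | (p & c)
        carries.append(c)
    return carries


def nmos_network_conducts(a, b, c0, stage):
    """True when the pull-down network for C_stage conducts (stage = 1..4).

    Each stage places (A_i parallel B_i) in series with the previous
    network and adds (A_i series B_i) in parallel with that chain.
    """
    path = bool(c0)
    for i in range(stage):
        path = parallel(series(a[i], b[i]),
                        series(parallel(a[i], b[i]), path))
    return path


def to_bits(x, n=4):
    return [(x >> i) & 1 for i in range(n)]


def networks_agree():
    """Check all 512 input combinations for all four carry stages."""
    for x, y, c0 in product(range(16), range(16), (0, 1)):
        a, b = to_bits(x), to_bits(y)
        new = proposed_carries(a, b, c0)
        if new != conventional_carries(a, b, c0):
            return False
        for stage in range(1, 5):
            if int(nmos_network_conducts(a, b, c0, stage)) != new[stage - 1]:
                return False
    return True
```

In a static CMOS gate a conducting pull-down network drives the internal node low, so the node carries the complement of $C_i$ and an output inverter restores the true carry; the sketch compares the conduction condition itself with the carry value.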
Using exactly the same procedure, the NMOS networks required for $C_3$ (denoted by circuit 3 in Fig. 5) and $C_4$ (denoted by circuit 4 in Fig. 5) have been constructed from equations (10) and (11) respectively. Next, to provide full-swing output, a pull-up network of P-channel MOS (PMOS) transistors must be constructed. Since the number of input combinations that produce '0' equals the number that produce '1' in the carry-generation block of adder circuits, the PMOS pull-up network can be a mirror replica of the NMOS network [7,15]. Because the PMOS and NMOS networks are mirror replicas of each other, the overall circuit is symmetric. The complete schematic of the carry-generation circuits is shown in Fig. 6. Since static CMOS logic provides a complementary output, the logic level must be inverted to obtain the desired output [7]. Therefore, inverters have been added to the design to produce the carry-out signals.

For sum generation, the input signals $A_i$ and $B_i$ are applied to the XOR gate depicted in Fig. 3, which provides the $P_i$ signals. The $P_i$ and $C_i$ signals are then applied to another set of XOR gates to compute the sum signals ($S_i$). This sum generation process in Fig. 4 is exactly equivalent to the process in the conventional design in Fig. 1. Hence, the proposed design changes only the design methodology of the CLA circuits, whereas the sum generation process is kept exactly the same as in the conventional design.
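The mirror argument can be made concrete with a small check. The property usually cited for mirrored pull-up and pull-down networks is self-duality: complementing every input complements the output. A self-dual function necessarily produces '1' for exactly half of its input combinations, which is the balance noted in the text. The sketch below verifies both properties for each carry stage; `stage_carry`, `is_self_dual` and `count_ones` are hypothetical helper names.

```python
from itertools import product


def stage_carry(stage):
    """Return (f, n): the carry C_stage as a function of n input bits.

    Inputs are ordered (C0, A0, B0, A1, B1, ...), so n = 1 + 2 * stage.
    """
    def f(*bits):
        c = bits[0]
        for i in range(stage):
            a, b = bits[1 + 2 * i], bits[2 + 2 * i]
            c = (a & b) | (c & (a | b))
        return c
    return f, 1 + 2 * stage


def is_self_dual(f, n):
    """True if inverting all n inputs always inverts the output."""
    return all(f(*[1 - x for x in bits]) == 1 - f(*bits)
               for bits in product((0, 1), repeat=n))


def count_ones(f, n):
    """Number of input combinations for which f evaluates to 1."""
    return sum(f(*bits) for bits in product((0, 1), repeat=n))
```

For every stage from 1 to 4, `is_self_dual(*stage_carry(stage))` holds and `count_ones` returns half of the $2^n$ input combinations, which is consistent with using the same network topology for the pull-up and pull-down sides.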

SIMULATION RESULT ANALYSIS AND COMPARISON
To quantify the improvement offered by the proposed 4-bit CLA adder, both the proposed and the conventional designs were implemented, simulated and compared. The simulations were carried out with Cadence design and simulation tools at the 45 nm technology node. The supply voltage is 1 V. The transistor sizes used to implement the VLSI circuits in Fig. 2, Fig. 3 and Fig. 6 are listed in Table 1. Post-layout simulation results for the adders are discussed in the following paragraphs.
Three key performance parameters have been evaluated and compared: average power, propagation delay and power delay product (PDP). To calculate average power, every combination of the input signals was applied to the 4-bit adder designs and the power consumption was computed for each case. The mean of the power values obtained over all simulation runs was then taken as the average power. As per Table 2, the proposed design achieved a 4.84 % improvement in power consumption. Dynamic power, which arises from switching transitions of logic gates, is the major contributor to the overall power consumption of an integrated circuit [16][17][18]. The proposed 4-bit CLA adder reduces dynamic power by completely eliminating the AND gates ($G_i$ terms) from the design. With fewer logic gates, the proposed design undergoes fewer transitions, which lowers its dynamic power consumption and therefore its overall power.
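The averaging procedure can be sketched as follows. The sketch assumes that the input combinations include the carry-in $C_0$ along with both 4-bit operands, giving 512 vectors, and that a per-vector power figure is available from the simulator through a caller-supplied function. Both `all_input_vectors` and `average_power` are hypothetical names, and no simulator interface is implied.

```python
from itertools import product
from statistics import mean


def all_input_vectors(n_bits=4):
    """Yield (a, b, c0) for every input combination of an n-bit adder.

    Assumption: the carry-in C0 is swept together with operands A and B.
    """
    for a, b, c0 in product(range(2 ** n_bits), range(2 ** n_bits), (0, 1)):
        yield a, b, c0


def average_power(measure_power, n_bits=4):
    """Mean of measure_power(a, b, c0) over every input combination.

    measure_power is a caller-supplied callable, for example a lookup
    into power values exported from a circuit simulator.
    """
    return mean(measure_power(a, b, c0)
                for a, b, c0 in all_input_vectors(n_bits))
```

Running the same function over the power figures of both designs yields the two averages whose ratio gives the relative improvement.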
In addition to the improvement in average power, the proposed 4-bit CLA adder achieved a 34.53 % improvement in propagation delay. A VLSI circuit has two types of critical path: the rise-time critical path and the fall-time critical path. The rise-time critical path is the transistor path that produces the maximum rise-time delay, and the fall-time critical path is the transistor path that produces the maximum fall-time delay. The critical-path rise-time and fall-time delay graphs for the proposed and conventional designs are shown in Fig. 7. The rise and fall propagation delays were measured from 50 % of the input signal swing to 50 % of the output signal swing, and their average was taken as the final propagation delay. A comparison of Fig. 1 and Fig. 4 shows that the inputs to the CLA circuits in the conventional design come from AND and XOR gates, whereas the inputs to the proposed CLA circuits are the input bits ($A_i$ and $B_i$) themselves. Since the AND and XOR gates have non-negligible delays of their own, the CLA circuits in the conventional design receive their input signals ($P_i$ and $G_i$) later than the proposed CLA circuits receive theirs. This enables the proposed CLA circuits to compute faster than the conventional ones. The improvements in power and delay resulted in a 37.696 % improvement in PDP. Thus, the proposed 4-bit CLA adder can be considered a strong substitute for the conventional one owing to its better speed and power.

Table 1. Transistor sizes of devices in Fig. 2, Fig. 3 and Fig. 6

In modern high-speed microprocessors, 4-bit adder architectures are used as the basic building block for implementing higher-order adders (for example 16-bit, 32-bit, 64-bit, etc.) [5].
Therefore, if the 4-bit base circuit has good performance parameters, using it as the base of higher-order adders will bring an overall performance improvement. Thus, the main goal of this research was to design a 4-bit adder block for use as a building block in higher-order adders. The Full Adder (FA) based 32-bit Ripple Carry Adder (RCA) in [2] has a high delay because of its long carry-propagation chain. To overcome this, parallel adder methodologies have evolved, among which CLA is quite popular. The carry-generation circuit of the 4-bit CLA in [4] is identical to the carry-generation circuit of the conventional 4-bit CLA in Fig. 2. Hence, this research compares the proposed 4-bit CLA only with the conventional design. Since the proposed 4-bit design shows better performance, using it as a base for higher-order adders (16-bit, 32-bit, 64-bit, etc.) is expected to improve their performance as well.
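The delay and PDP figures reported in this section can be cross-checked with a little arithmetic. Since PDP is the product of power and delay, the fractional improvements combine multiplicatively rather than additively. The sketch below assumes the improvements are expressed as percentage reductions relative to the conventional design; `propagation_delay` and `combined_pdp_improvement` are hypothetical helper names.

```python
def propagation_delay(t_rise, t_fall):
    """Final propagation delay: mean of the 50 %-to-50 % rise and fall delays."""
    return (t_rise + t_fall) / 2


def combined_pdp_improvement(power_gain_pct, delay_gain_pct):
    """Percentage reduction in PDP implied by separate power and delay reductions.

    PDP = power * delay, so the remaining fraction of PDP is the product of
    the remaining fractions of power and delay.
    """
    remaining = (1 - power_gain_pct / 100) * (1 - delay_gain_pct / 100)
    return (1 - remaining) * 100


# Reported improvements: 4.84 % in power, 34.53 % in delay.
pdp_gain = combined_pdp_improvement(4.84, 34.53)
```

With the reported inputs, `pdp_gain` comes out at roughly 37.70 %, in line with the reported 37.696 %; the small difference is what would be expected from rounding of the published percentages.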

CONCLUSION
A 4-bit CLA process in static CMOS logic has been developed in this research work. The CLA process generates the carry-out terms without the $P_i$ and $G_i$ terms used in the conventional design. The feasibility and effectiveness of the proposed design have been evaluated by comparing it with the conventional 4-bit CLA adder in static CMOS logic. The improvements in average power, propagation delay and PDP are 4.84 %, 34.53 % and 37.696 % respectively, which makes the proposed design well suited to modern high-performance integrated circuit design.