Design of Bypassing Multiplier with Different Adders

Multiplication is one of the essential operations in Digital Signal Processing (DSP) applications such as the Fast Fourier Transform (FFT) and digital filters. A multiplier is designed by considering the tradeoff between low power and high speed. The bypassing multiplier is an improvement over the Braun multiplier, which is a parallel array multiplier. The dynamic power and delay of bypassing multipliers can be reduced by using different adders. This paper presents a comparative study of 1-dimensional and 2-dimensional bypassing multipliers using different adders on the basis of delay, area and power for 4x4, 8x8 and 16x16 bits. Delay is evaluated on a Spartan-3E FPGA using Xilinx ISE 12.4, and area and power are evaluated using Synopsys.


Introduction
Low power design has become a major concern in VLSI design in recent years. There is a strong need to investigate techniques for lowering the energy dissipation of devices such as Digital Signal Processors (DSPs). Digital multipliers are essential arithmetic blocks for many DSP applications: filtering, convolution, DCT, Fourier transform, etc. The multiplier consumes almost two-thirds of the total power. As a result, optimizing multipliers for energy is important.
In static CMOS, transition activity dominates the total energy dissipation because of the charging and discharging of capacitances. Many prior digital multipliers therefore aimed at reducing transitions (switching) in order to reduce power dissipation. A Leapfrog multiplier was proposed in [2], using a hardware bypassing approach that avoids redundant computations by disabling the adder units whose partial product is zero. Many other low-power multiplier designs can be found in the literature. A straightforward approach is to design a multiplier that consumes less power [1] as well as less hardware. Another way is to modify the structure of the full adder at circuit level, depending on the number of inputs [4]. Sung, Ciou, and Wang [3] proposed enhancing efficiency by disabling most adders through 2-dimensional bypassing. Because an adder forms the last stage of the multiplier, selecting this adder is an important part of the design. Efforts have been focused on comparative study and improvement of adder and multiplier designs [11].
For the multiplication of two unsigned n-bit numbers, the multiplicand A = a_{n-1} a_{n-2} ... a_0 and the multiplier B = b_{n-1} b_{n-2} ... b_0, the product P = P_{2n-1} P_{2n-2} ... P_0 can be represented by the following equation:
$$P = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} a_i b_j 2^{i+j} \qquad (1)$$
where i and j index the bits of the multiplicand and the multiplier, respectively. For example, the output bit P_1 is the result of adding a_1 b_0 and a_0 b_1. The multiplication is shown in Figure 1. The terms generated are called partial products, and these partial products are then added to obtain the final multiplier output.
One parallel array multiplier widely used to achieve high throughput in DSP applications is the Braun multiplier. In the 4x4 Braun multiplier, shown in Figure 2 [8], the multiplier array consists of 3 rows of carry-save adders (CSAs), in which each row contains 3 full adders (FAs), and 3 FAs in the last row that form a 3-bit ripple-carry adder. The advantage of the Braun array multiplier is its regular and compact structure, which can be realized as a parallel structure. Its limitation is the larger amount of hardware, which leads to higher power consumption. The power dissipation can be reduced through architectural modification: row bypassing, column bypassing, or a low-power 2-dimensional bypassing technique that combines row and column bypassing.
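The partial-product formulation above can be illustrated with a minimal behavioural sketch in Python. It is not the authors' Verilog design; the function names `partial_products` and `array_multiply` are hypothetical, and the sketch assumes the standard definition of unsigned binary multiplication, in which partial product a_i b_j carries weight 2^(i+j).

```python
def partial_products(a: int, b: int, n: int) -> list[list[int]]:
    """Return the n x n grid pp[j][i] = a_i AND b_j for n-bit unsigned a and b."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]
    return [[a_bits[i] & b_bits[j] for i in range(n)] for j in range(n)]


def array_multiply(a: int, b: int, n: int) -> int:
    """Add the weighted partial products: sum of a_i * b_j * 2^(i+j)."""
    pp = partial_products(a, b, n)
    return sum(pp[j][i] << (i + j) for j in range(n) for i in range(n))
```

For instance, `array_multiply(13, 11, 4)` adds the sixteen partial products of a 4x4 multiplication. An array multiplier performs the same additions in hardware, one row of adders per multiplier bit.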

One Dimensional Bypassing
Consider a multiplier with multiplier bits b and multiplicand bits a, as shown in Figure 3. A simple way to improve performance is row bypassing: as soon as b_j is found to be zero, i.e., all partial products a_i b_j, 0 ≤ i ≤ n-1, are zero, the complete row is bypassed to avoid triggering the adding units in that row. Two multiplexers are required in each adding unit to realize the bypassing operation. To eliminate the redundant signal transitions, the adders whose partial product is zero are disabled, while the outputs of the previous adder row are shifted and bypassed to the next row of adders. Thus, the outputs from the (j-1)-th row are fed to the (j+1)-th row of CSAs without affecting the multiplication result.
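As a rough behavioural illustration of row bypassing, the following Python sketch skips every row whose multiplier bit is zero and reports how many rows were actually exercised. It models only the arithmetic effect, not the multiplexers or the CSA structure. The name `row_bypass_multiply` is hypothetical, and treating each multiplier bit as one row is a simplifying assumption.

```python
def row_bypass_multiply(a: int, b: int, n: int) -> tuple[int, int]:
    """Return (product, active_rows) for n-bit unsigned a and b.

    Row j is bypassed when b_j == 0: the running result is forwarded unchanged.
    """
    acc = 0
    active_rows = 0
    for j in range(n):
        if (b >> j) & 1 == 0:
            continue  # bypass: all partial products a_i * b_j in this row are zero
        acc += a << j  # row j adds the partial products a_i * b_j, weighted by 2^j
        active_rows += 1
    return acc, active_rows
```

With b = 1001 in binary, only two of the four rows are active, while the product is unchanged.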
For a column-bypassing multiplier [1], shown in Figure 4, the addition operations in the (i+1)-th column can be bypassed if the bit a_i of the multiplicand is 0, i.e., all partial products a_i b_j, 0 ≤ j ≤ n-1, are zero. There are two advantages to this approach. First, it eliminates the extra correcting circuit needed in the row-bypassing multiplier. Second, the modified FA is simpler than that used in the row-bypassing multiplier: each modified FA in the CSA array has only two tri-state buffers and one 2-to-1 multiplexer attached.

Two Dimensional Bypassing
Prior designs reduce power by examining either the multiplicand bits or the multiplier bits alone. The 2-dimensional technique instead detects the bitwise nullity of the multiplicand in the vertical direction as well as of the partial product in the horizontal direction of the array multiplier, and removes the unnecessary operations that would otherwise take place in the corresponding adding cells. The advantage of this design is lower power consumption owing to reduced switching activity.
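To give a feel for why detecting zeros in both operands reduces switching, the Python sketch below counts how many positions stay active under each scheme. It is an idealized estimate under stated assumptions: the array is treated as an n x n grid of partial-product positions, carry propagation between cells is ignored, and the function name `active_cells` is hypothetical.

```python
def active_cells(x: int, y: int, n: int) -> dict[str, int]:
    """Count grid positions left active under each bypassing scheme.

    x: multiplier (bit X_i selects row i); y: multiplicand (bit Y_j selects column j).
    """
    mask = (1 << n) - 1
    ones_x = bin(x & mask).count("1")  # rows whose multiplier bit is 1
    ones_y = bin(y & mask).count("1")  # columns whose multiplicand bit is 1
    return {
        "none": n * n,                      # no bypassing: every position switches
        "row": ones_x * n,                  # rows with X_i = 0 are skipped
        "column": n * ones_y,               # columns with Y_j = 0 are skipped
        "two_dimensional": ones_x * ones_y  # only positions with X_i = Y_j = 1 remain
    }
```

For x = 1001 and y = 0110 in binary, the count drops from 16 positions without bypassing to 8 with either 1-dimensional scheme and to 4 with both combined.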
Consider a multiplier [3] with multiplier bits X and multiplicand bits Y. 2-dimensional bypassing detects the bitwise nullity of the multiplicand bits, Y_j, in addition to the state of the multiplier bits, X_i. However, a conflict appears when an adding cell, AC_ij, encounters the scenario X_i = Y_j = 0.
For instance, assume i = 2, j = 1 and X_2 = Y_1 = 0 in Figure 5. If the carry out of the adding cell AC_12 is "1", it should be propagated to the carry in of AC_31 and then to its carry out. However, the carry bit will be lost if AC_31 is bypassed because Y_1 = 0. Consequently, an error occurs, since the carry out of AC_31 will be zero. Bypass logic (BL) is therefore included in certain adding cells.
A. Adding Cell with and without Bypass Logic
According to the illustrative example, a simple rule is: if and only if X_i is not equal to "0" and the carry in is "1", the adding cell AC_ij cannot be bypassed. Hence, an adding cell with bypass logic is proposed in Figure 6. It is represented by a gray box in Figure 5. To save more bypass logic area, the adding cell AC_30 can be further simplified such that a single NAND gate replaces the adding cell with bypass logic. Not every adding cell needs the bypass logic. For n = 4, AC_31 is the only unit that requires bypass logic. By similar induction, for any n x n multiplier with n ≥ 4, all adding cells AC_ij with n − 1 ≥ i ≥ 3 and n − 3 ≥ j ≥ 1 must contain the bypass logic to execute the multiplication correctly. The other cells, without bypass logic, have the structure shown in Figure 7. Therefore, the following rule is concluded: a total of (n − 3)^2 adding cells with bypass logic are required to constitute a 2-dimensional bypassing multiplier, for all n > 3.
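The index ranges stated above can be enumerated directly to check the (n − 3)^2 count. The Python sketch below lists the cells AC_ij that the rule says must carry bypass logic. The function name `bypass_logic_cells` is hypothetical, and the code encodes only the index ranges given in the text.

```python
def bypass_logic_cells(n: int) -> list[tuple[int, int]]:
    """List the (i, j) indices of adding cells AC_ij that need bypass logic.

    Encodes the ranges 3 <= i <= n - 1 and 1 <= j <= n - 3, stated for n >= 4.
    """
    if n < 4:
        raise ValueError("the rule is stated for n >= 4")
    return [(i, j) for i in range(3, n) for j in range(1, n - 2)]
```

For the three sizes studied here, this gives 1 cell for 4x4, 25 cells for 8x8 and 169 cells for 16x16.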

Ripple Carry Adder
A ripple carry adder (RCA) is built by cascading full adders in series, i.e., the carry out of each full adder is connected to the carry input of the next stage. In an RCA, the critical path runs from the least significant input x_0 or y_0 to the last sum bit s_n. The major limitation of the ripple carry adder is that the delay increases as the bit length increases. Although the delay is large, the area requirement is small compared to other adders.
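The cascading of full adders can be sketched bit by bit in Python. This is a behavioural illustration rather than a gate-level netlist, and the names `full_adder` and `ripple_carry_add` are hypothetical.

```python
def full_adder(x: int, y: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: return (sum, carry_out)."""
    s = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return s, cout


def ripple_carry_add(x: int, y: int, n: int, cin: int = 0) -> tuple[int, int]:
    """Add two n-bit numbers; return (n-bit sum, final carry out)."""
    carry = cin
    total = 0
    for i in range(n):
        # Stage i cannot start until the carry from stage i - 1 is known,
        # which is why the delay grows with the bit length.
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry
```

The loop-carried dependency on `carry` mirrors the critical path from the least significant inputs to the most significant output.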

Carry Look Ahead Adder
Carry look-ahead logic uses the concepts of generating and propagating carries. The generate and propagate signals can be expressed as

G_i = A_i · B_i and P_i = A_i + B_i
P_i and G_i depend only on A_i and B_i, the bits of A and B, which are immediately available. Each carry is given by c_{i+1} = G_i + P_i · c_i, which expands into an expression in the G and P signals and c_0 alone, and c_0 is also available as an input. We therefore do not have to wait for carries to ripple through the adder to perform this computation. Figure 8 illustrates the functionality of the carry look-ahead adder.
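The following Python sketch illustrates the look-ahead idea by computing every carry from the generate and propagate signals and c0 only, never from a previously computed carry. It is a behavioural model under stated assumptions: the function names are hypothetical, the propagate signal is the OR form given above, and the sum bits are formed separately as A_i XOR B_i XOR c_i, which is the standard full-adder sum.

```python
def all_ones(bits: list[int]) -> int:
    """Return 1 if every bit in the list is 1 (an empty list counts as 1)."""
    return int(all(bits))


def lookahead_carry(g: list[int], p: list[int], c0: int, i: int) -> int:
    """Carry into bit i + 1, expanded in terms of g, p and c0 only."""
    carry = c0 & all_ones(p[0:i + 1])  # c0 propagated through bits 0..i
    for k in range(i + 1):
        # A carry generated at bit k and propagated through bits k+1..i.
        carry |= g[k] & all_ones(p[k + 1:i + 1])
    return carry


def cla_add(a: int, b: int, n: int, c0: int = 0) -> tuple[int, int]:
    """Add two n-bit numbers; return (n-bit sum, carry out)."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    g = [x & y for x, y in zip(a_bits, b_bits)]  # generate
    p = [x | y for x, y in zip(a_bits, b_bits)]  # propagate (OR form)
    c = [c0] + [lookahead_carry(g, p, c0, i) for i in range(n)]
    total = sum((a_bits[i] ^ b_bits[i] ^ c[i]) << i for i in range(n))
    return total, c[n]
```

Because each call to `lookahead_carry` reads only `g`, `p` and `c0`, all carries could be evaluated in parallel in hardware, at the cost of wider gates.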

Carry Select Adder
The essence of the carry select adder is the realization that two numbers can be added without waiting for the carry signal to be available. The numbers are simply added twice: once assuming Cin = 0 and once assuming Cin = 1. The conditionally produced result is then selected through a 2:1 multiplexer, as seen in Figure 9, which shows the block diagram of the carry select adder.
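A minimal Python sketch of this scheme follows. Each block is added under both carry-in assumptions, and the incoming carry then acts as the multiplexer select. The function names and the default block width of 4 bits are assumptions made for illustration, not details taken from the designs compared in this paper.

```python
def add_block(x: int, y: int, width: int, cin: int) -> tuple[int, int]:
    """Add two width-bit values with carry in; return (sum, carry_out)."""
    total = x + y + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(x: int, y: int, n: int, block: int = 4, cin: int = 0) -> tuple[int, int]:
    """Add two n-bit numbers block by block; return (n-bit sum, carry out)."""
    result, carry = 0, cin
    for lo in range(0, n, block):
        width = min(block, n - lo)
        mask = (1 << width) - 1
        xb, yb = (x >> lo) & mask, (y >> lo) & mask
        # Both candidates are computed without knowing the real carry in.
        sum0, cout0 = add_block(xb, yb, width, 0)
        sum1, cout1 = add_block(xb, yb, width, 1)
        # 2:1 multiplexer: the incoming carry picks one candidate.
        s, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= s << lo
    return result, carry
```

In hardware the two candidate sums of every block are produced in parallel, so only the multiplexer selection waits on the carry. This is the source of the speed advantage and of the extra area.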

Experimental Results
The standard row bypassing, column bypassing, and row-and-column bypassing multipliers with different adders (ripple carry, carry look-ahead, carry select) for 4x4, 8x8 and 16x16 bits are described in Verilog HDL, simulated and synthesized. Table 1 compares the designs in terms of maximum combinational delay, implemented in Xilinx ISE 12.4 targeting the Spartan-3E (xc3s500e-4fg320) FPGA. Using Synopsys Design Vision, the designs are synthesized for dynamic power and area, as listed in Tables 2 and 3. Figures 10 and 11 are graphical representations of Tables 1 and 2, respectively. Figure 12 shows the dynamic power for different adders of the 8x8 multiplier with different bypassing techniques. The area of the ripple carry, carry look-ahead, and carry select adders for one-dimensional row bypassing and for 2-dimensional bypassing is shown in Figures 13 and 14, respectively.

Conclusions
In this paper we have presented the implementation of bypassing multipliers with different adders using Xilinx ISE targeting the Spartan-3E (xc3s500e-4fg320) and Synopsys Design Vision. The use of different adders in the last stage, such as the carry look-ahead adder (CLA) and the carry select adder in addition to the ripple carry adder (RCA), helped to improve the efficiency in terms of delay and power.
The results show the highest speed for column bypassing implemented with a carry select adder, compared with the other bypassing multipliers. Two-dimensional bypassing with a ripple carry adder consumes the least dynamic power and area compared to row bypassing and column bypassing. Thus, we conclude that column bypassing with a carry select adder gives the least delay, and that the two-dimensional bypassing technique has the lowest dynamic power consumption.