A CMOS Based Self Repair Fault Tolerant Adder for Low Power Biomedical Systems

In recent years, processor cores for biomedical signal processing have come to play a vital role in portable devices. The processing unit in such a device performs acquisition, feature detection and decision making. The computing hardware comprises adders, multipliers, registers and buffers. Adders are the basic building blocks of multipliers, filters and feature extraction units. The Discrete Wavelet Transform (DWT) is one of the important methods used to filter and extract information from biomedical signals, which can be one-dimensional or multidimensional. Fault avoidance and fault tolerance have been used in the past to achieve efficient and reliable systems, and prevention and tolerance through redundancy can avoid failures. Redundancy can be applied to information, hardware, software and time. Fault tolerant circuits can improve the efficiency of biomedical systems such as DWT cores. The main objective is to design a reconfigurable full adder with self-checking and self-repairing capability for the DWT architecture. This paper presents an FPGA based fault detection and repair circuit that also identifies the fault location. The existing method uses an on-chip adder that is complex, less efficient and not suitable for reprogramming. In the proposed design, the DWT architecture is built from a reconfigurable multiplier, the fault tolerant adder and flip-flops. The implementation was carried out in the Quartus tool for different FPGA kits built in 90nm and 65nm CMOS technology. Parameters such as LUT count, power dissipation and delay are investigated. The proposed approach is suitable for portable devices.


Introduction
Biomedical systems in recent years use integrated circuits (ICs) designed with LSI, VLSI and ULSI technology. Today a greater number of transistors are combined into blocks to manage the space on the chip. These blocks form the basic building units of Application Specific Integrated Circuits (ASICs) for biomedical applications. In the past, microprocessors and microcontrollers were used in the design of medical devices. They were part of the data acquisition unit, which acquires biomedical signals, filters them and processes them. FPGA based implementation has made biomedical devices compact and reliable. Compared to processors, FPGAs are costlier but more efficient. In ASICs using gate level implementation, the area of the circuit can be reduced. The power consumption can be reduced by using CMOS devices instead of BJT and other FET devices. Gates form the basic element of any circuit, and different types of arithmetic and logic operations are implemented using gates. The adder forms the basic building block of several operations and units, such as subtraction, division, multiplication, signal processing units and compression units. The multiplier circuit determines the critical delay of the system. Since adders are the building blocks of multipliers, faster adders can reduce this delay. Several adders, such as the carry skip adder, ripple carry adder and carry save adder, are used for multiplier implementation. The blocks of digital signal processing algorithms need adders as functional blocks. All real time applications that handle signals with varying frequencies use blocks such as the Fast Fourier Transform and decimation, which need adders. Results can be obtained at a faster rate if the complex adder block is modified in its structure or in its carry propagation (Dhanasekar et al, 2019).
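To make the role of carry propagation concrete, the following is a minimal Python sketch of a ripple carry adder built from 1-bit full adders. It is a bit-level software model for illustration only, not the hardware design of this paper; the function names `full_adder` and `ripple_carry_add` and the 8-bit default width are assumptions introduced here.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out) for input bits a, b, cin."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(x, y, n=8, cin=0):
    """Add two n-bit unsigned integers one bit at a time.

    Returns (n-bit sum, carry_out). The carry ripples from bit 0 upward,
    so stage i cannot settle before stage i - 1 has produced its carry.
    """
    carry = cin
    total = 0
    for i in range(n):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


if __name__ == "__main__":
    # 200 + 100 = 300 = 256 + 44, so the 8-bit sum is 44 with carry out 1.
    print(ripple_carry_add(200, 100))
```

Each stage waits for the carry of the stage before it, which is why the delay of this structure grows with the word length and why structures such as carry skip and carry save adders are considered for multipliers.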

Background Methodology
Fault avoidance and fault tolerance have been used in the past to achieve efficient and reliable systems. Prevention and tolerance through redundancy can avoid failures. Redundancy can be applied to information, hardware, software and time. Failures normally arise from faults, whether identified or unidentified. In complex megasystems, fault avoidance is hardest to achieve. Hence the module reliability of the system can be improved by fault tolerance, validation and error correction. Validation measures the system performance during the construction process in order to reduce errors. Error correction is one class of approaches for reducing failures and latent errors, which can be avoided or tolerated through redundancy. The system fails if a latent error prevents maintenance. Effective error processing tries to correct an error after it becomes effective, or to mask it. Hardware redundancy in two effective forms is used by Sonal Gupta et al (2018): in parallel computation, triple modular redundancy and double modular redundancy are used for fault identification. The tradeoff is the additional components required, which increase the area. Software redundancy is the most significant challenge in the area of fault tolerance. Software errors differ from hardware errors: a hardware error does not recur once it has been discovered and corrected, whereas correcting a programming error can create a new error. Software redundancy is therefore a more complex and less mature art than hardware design. Time redundancy checks a result by repeating the computation several times on the same module instead of running copies in parallel, so it does not require extra hardware. Figure 1 shows the self-checking adder designed by Vasudevan et al (2007).
It contains multiplexers, gates, a checker and adders. Faults are detected online by combining the outputs. The design cannot locate which adder is faulty, and the other major problem is fault propagation. The previous methods are SoC based, whereas the proposed design is reconfigurable. In the existing method, the self-repairing adder uses two identical adders, which is a hardware redundancy method. The architecture is large and fails if both adders fail (Muhammad Ali Akbar et al,). The method does not affect the performance, but the area and power increase. The area overhead can be avoided by using time redundancy, but the penalty paid is delay. Information redundancy needs circuits that work error free, and these are not compatible with memory circuits. When software redundancy is used, fault detection can be handled by adding program lines (Tsai, 1998).
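The difference between double and triple modular redundancy described above can be sketched in a few lines of Python. This is an illustrative software model under assumed names (`dmr_add`, `tmr_add`, `majority`, and a hypothetical `faulty_adder` whose sum output is always inverted); it is not taken from the cited works.

```python
def full_adder(a, b, cin):
    """Fault-free 1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def faulty_adder(a, b, cin):
    """Hypothetical faulty copy: the sum output is always inverted."""
    s, cout = full_adder(a, b, cin)
    return s ^ 1, cout


def majority(x, y, z):
    """2-out-of-3 majority vote on single bits."""
    return (x & y) | (y & z) | (x & z)


def dmr_add(a, b, cin, copy0, copy1):
    """Double modular redundancy: run two copies and compare them.

    Returns (result of copy0, mismatch flag). A mismatch shows that a
    fault exists but not which copy is wrong.
    """
    r0 = copy0(a, b, cin)
    r1 = copy1(a, b, cin)
    return r0, r0 != r1


def tmr_add(a, b, cin, copy0, copy1, copy2):
    """Triple modular redundancy: vote bitwise over three copies."""
    outs = [copy(a, b, cin) for copy in (copy0, copy1, copy2)]
    s = majority(outs[0][0], outs[1][0], outs[2][0])
    cout = majority(outs[0][1], outs[1][1], outs[2][1])
    return s, cout


if __name__ == "__main__":
    print(dmr_add(1, 0, 1, full_adder, faulty_adder))              # mismatch flagged
    print(tmr_add(1, 0, 1, full_adder, faulty_adder, full_adder))  # fault masked
```

In this model DMR only detects the disagreement, while TMR masks a single faulty copy at the cost of a third adder and a voter, which is the area tradeoff noted above.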

On-Chip Methodology of Fault Tolerant Full Adder
Pankaj Kumar et al (2017) present an adder system that repairs single and double faults on both outputs without disturbing normal operation. Compared to the ripple carry adder (RCA), the carry select adder (CSA) has better computation time and minimum hardware cost. In that work the authors check online and detect faults using the sum and carry. Figure 4 shows the self-checking adder structure, in which the XNOR gates perform the fault detection. The fault is detected by comparing (A'B'C + ABC'), (XNOR-1) and (XNOR-5). The operations are managed by the control signals.
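One way to see how sum and carry can be used for online checking is the following property of a fault-free full adder: Sum and Carry are equal when all three inputs are equal, and complementary otherwise. The Python sketch below models a checker based on that property together with a switch to a spare adder. It is an illustrative assumption, not the circuit of Figure 4 or of the cited works; the names `self_check`, `self_repairing_add` and the stuck-at fault model `stuck_sum_at_0` are hypothetical.

```python
def full_adder(a, b, cin):
    """Fault-free 1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def stuck_sum_at_0(a, b, cin):
    """Hypothetical faulty adder whose sum output is stuck at 0."""
    _, cout = full_adder(a, b, cin)
    return 0, cout


def self_check(a, b, cin, s, cout):
    """Return 1 if the outputs violate the sum/carry relationship.

    Fault-free behaviour: s == cout exactly when a == b == cin.
    """
    inputs_equal = int(a == b == cin)
    outputs_equal = int(s == cout)
    return inputs_equal ^ outputs_equal


def self_repairing_add(a, b, cin, primary, spare):
    """Use the primary adder; fall back to the spare if the check fails.

    Returns (sum, carry_out, repaired_flag).
    """
    s, cout = primary(a, b, cin)
    if self_check(a, b, cin, s, cout):
        s, cout = spare(a, b, cin)
        return s, cout, True
    return s, cout, False


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                print(a, b, cin,
                      self_repairing_add(a, b, cin, stuck_sum_at_0, full_adder))
```

A check of this kind flags any fault that corrupts exactly one of the two outputs. A fault that inverts both outputs at once leaves the relationship intact and is not caught by this check alone.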

Proposed DWT Architecture with Adder
Biomedical systems need efficient methods for noise removal and information retrieval using Very Large Scale Integration technology. The hardware design can be of ASIC or FPGA type. Most biomedical signal and image processing applications use the wavelet transform, so its hardware implementation is required. The wavelet transform is implemented using filter banks. The structures used include the direct form structure, lattice structure and poly-phase decomposition. The problems of the convolution based DWT architecture, such as larger area and more processing elements, are rectified by the lifting based architecture. The block diagram of the lifting scheme is shown in Figure 6 and the different blocks in Figure 7. The architecture contains adders, multipliers and flip-flops. Both the proposed lifting based design and the convolution based design have components such as self repair fault tolerant adders, multipliers and delay elements. This paper does not address the design of memory elements, which is left for future work. The memory is used to store the intermediate data of the predict and update blocks. The extra area overhead is the only issue in the proposed architecture, and the design has less computation and higher speed. The speed is tested with SPICE tools, and the circuits can operate up to 4GHz. The error checking adder is also used in the convolution based architecture, but that architecture occupies more memory for storing the filter coefficients. The proposed lifting based DWT architecture instead uses the self repair fault tolerant adder in the predict and update blocks. The different blocks of the DWT architecture are shown in Table 1.
The work is extended to the convolution based design. Direct convolution methods are inefficient, so the poly-phase structure is chosen. Here the input and the filter coefficients are split into even and odd samples and convolved. A DWT architecture for biomedical systems should be efficient, and in this work the efficiency is improved by using the adder in the computational unit, which comprises the adder, multiplier, multiplexer and D flip-flop. Since the poly-phase structure (Figure 8(a)) uses fewer multipliers than other structures, the additional fault detecting and correcting features of the adder will enhance its performance. These structures are suitable for biomedical signal processing. The 3-tap broadcast FIR filter with lattice structure is shown in Figure 8(b).
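The predict and update steps of a lifting scheme can be sketched as follows. The paper does not state which wavelet is used, so the integer Haar wavelet is assumed here purely because it is the shortest example; the function names `haar_lifting_forward` and `haar_lifting_inverse` are hypothetical. The sketch needs only additions, subtractions and a one-bit shift, which is where the adders of the predict and update blocks would sit.

```python
def haar_lifting_forward(x):
    """One level of integer Haar DWT by lifting (len(x) must be even).

    Returns (approximation, detail) coefficient lists.
    """
    if len(x) % 2:
        raise ValueError("input length must be even")
    even, odd = x[0::2], x[1::2]                            # split
    detail = [o - e for e, o in zip(even, odd)]             # predict
    approx = [e + (d >> 1) for e, d in zip(even, detail)]   # update
    return approx, detail


def haar_lifting_inverse(approx, detail):
    """Undo the lifting steps in reverse order to recover the input."""
    even = [s - (d >> 1) for s, d in zip(approx, detail)]   # undo update
    odd = [d + e for e, d in zip(even, detail)]             # undo predict
    x = [0] * (2 * len(even))
    x[0::2] = even                                          # merge
    x[1::2] = odd
    return x


if __name__ == "__main__":
    samples = [4, 6, 10, 12, 7, 3, -2, 5]
    approx, detail = haar_lifting_forward(samples)
    print(approx, detail)
    print(haar_lifting_inverse(approx, detail) == samples)
```

Because every lifting step is undone exactly by its mirror step, reconstruction is exact in integer arithmetic, and the intermediate `even`, `detail` and `approx` values correspond to the data that the text says must be stored between the predict and update blocks.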
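The even/odd split of the poly-phase structure can be checked against direct filtering followed by downsampling by two. The Python sketch below is an assumption-labelled illustration, not the architecture of Figure 8: the function names `causal_fir`, `direct_decimate` and `polyphase_decimate`, the zero initial conditions and the example coefficients are all introduced here.

```python
def causal_fir(x, h):
    """Causal FIR filter with zero initial state; output has len(x) samples."""
    return [
        sum(h[k] * x[m - k] for k in range(len(h)) if 0 <= m - k < len(x))
        for m in range(len(x))
    ]


def direct_decimate(x, h):
    """Filter at the full rate, then keep every second output sample."""
    return causal_fir(x, h)[0::2]


def polyphase_decimate(x, h):
    """Same result using two half-rate branches.

    y[n] = sum_m h[2m] * x[2n - 2m] + sum_m h[2m + 1] * x[2n - 2m - 1]
    """
    n_out = (len(x) + 1) // 2
    x_even = [x[2 * n] for n in range(n_out)]
    # Odd samples delayed by one input sample: x[2n - 1], with x[-1] = 0.
    x_odd_delayed = [x[2 * n - 1] if n >= 1 else 0 for n in range(n_out)]
    y_even = causal_fir(x_even, h[0::2])
    y_odd = causal_fir(x_odd_delayed, h[1::2])
    return [a + b for a, b in zip(y_even, y_odd)]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    h = [1, 2, 3]  # hypothetical 3-tap filter
    print(direct_decimate(x, h))
    print(polyphase_decimate(x, h))
```

In the direct form every multiplication runs at the input rate and half of the results are discarded; in the poly-phase form each branch runs at half the rate, so no discarded output is ever computed.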

Multipliers
The computational unit of the processor core in the DWT architecture contains the multiplier as its main block. This block, along with the adder, forms the MAC unit. For the convolution based DWT the MAC is the main unit, while for the lifting scheme shift registers and adders form the basic units. The proposed DWT architecture uses the Braun multiplier, the fault tolerant adder, the D flip-flop and the MAC unit. The conventional array multiplier is easy to design but consumes more area and power. Effective multiplication through a parallel architecture can be achieved using the Braun multiplier. In this work the optimization of the multiplier and flip-flops is addressed less. This optimization is left for future research, in which gating principles are to be designed to reduce power further. The remaining blocks are the coefficient units, which are to be implemented using RAM cells. In the proposed FPGA based DWT structure the filter coefficients are programmable, while the adder, multiplier and other blocks follow the proposed design. These units can also be made reconfigurable for future enhancements if required. The proposed reconfigurable architecture can also be extended to the design of an IDWT processing architecture, which comprises multiplexer, multiplier, flip-flop, adder and sampling blocks. The up-samplers are also implemented using flip-flops. For reconstruction of the input, the coefficients of both the HPF and the LPF are selected and combined using the multiplier and adder.
The area overhead increases because additional transistors are used for error checking and correction.
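The way a Braun multiplier is assembled from AND gates and full adders can be sketched at bit level in Python. This is a software model of the usual textbook array (carry-save rows followed by a final ripple row) for unsigned operands, not the implementation evaluated in this paper; the function name `braun_multiply` and the 4-bit default width are assumptions. Each call to `full_adder` marks a cell where a fault tolerant adder could be substituted.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def braun_multiply(x, y, n=4):
    """Multiply two n-bit unsigned integers with a Braun-style adder array."""
    a = [(x >> i) & 1 for i in range(n)]
    b = [(y >> j) & 1 for j in range(n)]
    # Partial products from AND gates: pp[j][i] = a[i] & b[j], weight 2**(i + j).
    pp = [[a[i] & b[j] for i in range(n)] for j in range(n)]
    product = [0] * (2 * n)

    sums = pp[0][:]            # row 0 is passed straight to the first adder row
    carries = [0] * (n - 1)
    product[0] = sums[0]

    # Carry-save rows: carries move down to the next row, not sideways.
    for j in range(1, n):
        new_sums = [0] * n
        new_carries = [0] * (n - 1)
        for i in range(n - 1):
            new_sums[i], new_carries[i] = full_adder(
                pp[j][i], sums[i + 1], carries[i]
            )
        new_sums[n - 1] = pp[j][n - 1]   # top bit enters the next row directly
        sums, carries = new_sums, new_carries
        product[j] = sums[0]

    # Final row: ripple carry adder merges the remaining sums and carries.
    ripple = 0
    for i in range(n - 1):
        product[n + i], ripple = full_adder(sums[i + 1], carries[i], ripple)
    product[2 * n - 1] = ripple

    return sum(bit << k for k, bit in enumerate(product))


if __name__ == "__main__":
    print(braun_multiply(13, 11))  # 4-bit operands
```

In this model an n-bit multiplier uses n(n - 1) adder cells, so any per-adder checking and repair logic is replicated that many times, which is consistent with the area overhead noted above.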

Results and Discussion
The existing methods are mainly implemented on system-on-chip architectures using various devices. The results are given below. The results of the existing methods depend on the device used, whereas in this work the results are computed on a reconfigurable architecture, which differs from the ASIC technology used in the existing methods. The adder, multiplier, multiplexer and flip-flop blocks are implemented using Quartus software for different kits. For the different technologies (Table 2 and Table 3), the adders are evaluated using LUT count, power and delay, which are measured and tabulated. Similarly, Tables 4, 5 and 6 show the performance of the multiplier, D flip-flop and multiplexer for the different technologies. 90nm and 65nm CMOS technologies were used for implementation. The implementation in the various technologies shows the difference in power consumption of the different blocks of the DWT architectures. There are variations in power consumption, area and delay across the blocks. The implementation using the Cyclone III kit, i.e. 65nm CMOS technology, has optimized performance compared to the other devices for the different blocks.

Conclusions
The paper presents the FPGA implementation of a faster, lower-area, self-checking and self-repairing fault tolerant adder circuit for the DWT architecture. The circuit provides low power consumption. The full adder design is suitable for portable biomedical devices since it is fault tolerant and provides high efficiency. The checking and repairing adders, multiplier, D-FF and multiplexer of the DWT architecture were designed, and the implementation was done on Cyclone and Stratix kits using the Quartus tool.
The performance is evaluated for different CMOS technologies. In future the work will be extended to optimizing the DWT structure and blocks such as flip-flops and samplers.