



IN APPLIED SCIENCE & ENGINEERING TECHNOLOGY

Volume: 12 Issue: III Month of publication: March 2024 DOI: https://doi.org/10.22214/ijraset.2024.58844

www.ijraset.com




# Low Power Design of Efficient Wireless Sensor Node

Vighneshkumar Uttarkar<sup>1</sup>, Poornima G<sup>2</sup> Department of Electronics and Communication, BMS College of Engineering

Abstract: The growing demand for fast, efficient System on Chip (SoC) designs has intensified the need for low-power solutions, particularly in battery-powered wireless sensor nodes. This work uses clock gating to reduce dynamic power consumption in an 8-bit Arithmetic and Logic Unit (ALU). The ALU uses a Han-Carlson adder for addition and a 4x4 approximate multiplier, and it processes data from a humidity sensor module. Simulations in Xilinx Vivado show a 65–73% reduction in power consumption at higher frequencies. The resulting low-power ALU is a viable processing element for energy-sensitive sensor node applications, where it helps extend operating lifetime.

Keywords: Han-Carlson Adder (HCA), Parallel Prefix Adder (PPA), Latch based Clock gate, Power reduction.

### I. INTRODUCTION

Applications of wireless sensor networks (WSN) include medical monitoring, industrial inspection, military surveillance, and environmental parameter sensing. A WSN node basically consists of sensors, a radio, and a microprocessor powered by a small energy source such as a storage battery. Because radio communication consumes a large share of the energy, the total energy required by the node should be kept to a minimum to extend its lifespan. The ratio of communication energy cost to computation energy cost ranges between 100 and 3000.

The scope of this work is the low-power ALU design for the processing element of the wireless sensor node. Clock gating is used to reduce dynamic power consumption by cutting switching power. To further reduce computation time, a parallel prefix adder, the Han-Carlson adder, is included. The approximate multiplier also reduces the required area, which offsets the extra area taken by the clock gating circuits. The design can be used in many applications, such as radio communication systems.

Radio modules on wireless sensor nodes allow for communication. If two nodes are able to send and receive data back and forth, they are directly connected. A mathematical model that measures the direct connectedness between sensor nodes is called a sensor communication model, sometimes also referred to as a transmission model. Such a model is useful in applications such as key distribution, location estimation, processor sensor web, sensor network query, wireless power line sensor, and telemetry. The methodology of the project is shown in the proposed design blocks of Fig. 1.



Fig. 1 Proposed Designed Block



A clock-gated Arithmetic and Logic Unit is designed for low power, with clock gating as the main technique for achieving the desired power savings. Han-Carlson Adder: It is a parallel prefix adder. A parallel prefix adder is used because it is one of the high-speed adder families.

4 X 4 Approximate Multiplier: Two novel 4x4 approximate multiplier designs offer minimal power requirements and competitive error performance. The multiplier uses the shift-and-add algorithm; a 4x4 multiplier of this kind uses 16 AND gates, 4 half adders, and 8 full adders, for a total of 12 adders.
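The shift-and-add structure can be sketched as a small bit-level reference model. The sketch below is an exact (not approximate) 4x4 multiplier written in Python for illustration only; the function name `shift_add_multiply_4x4` is an assumption and does not come from the design files. Each of the 16 AND operations corresponds to one partial-product gate, and the accumulation stands in for the half-adder and full-adder array.

```python
def shift_add_multiply_4x4(a, b):
    """Exact 4x4 unsigned multiply built from 16 AND-gate partial products.

    Illustrative model only: the hardware sums the partial products with
    half adders and full adders, which the integer addition stands in for.
    """
    if not (0 <= a < 16 and 0 <= b < 16):
        raise ValueError("operands must be 4-bit unsigned values")
    product = 0
    for j in range(4):                      # one row per multiplier bit b_j
        b_j = (b >> j) & 1
        for i in range(4):                  # one AND gate per (a_i, b_j) pair
            a_i = (a >> i) & 1
            partial = a_i & b_j
            product += partial << (i + j)   # column weight is 2^(i+j)
    return product
```

Such an exact model is useful as a baseline against which the error of an approximate multiplier can be measured.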

Clock gating circuit: Clock gating means activating the clocks in a logic block only when there is work to be done. The clock is gated by a latch-based clock gating circuit that generates an enable signal according to the opcodes.

### II. LITERATURE SURVEY

- 1) Comparative Analysis of Adders Parallel-Prefix Adder for Their Area, Delay and Power Consumption: This paper introduces Parallel Prefix Adders (PPA) as a faster alternative to ripple carry adders (RCA). The study compares four 8-bit PPAs, namely the Ladner-Fischer (LFA), Kogge-Stone (KSA), Brent-Kung (BKA), and Han-Carlson (HCA) adders, and assesses their area, delay, and power. The findings show that, in terms of area, latency, and power, the Han-Carlson adder is superior to the other three types of parallel-prefix adders.
- 2) Approximate Recursive Multipliers Using Low Power Building Blocks: This paper explores approximate computing for power-efficient binary multipliers, presenting two 4x4 approximate multipliers with carry manipulation. The 8x8 designs composed from them offer varied error-performance trade-offs and outperform state-of-the-art designs, reducing power consumption and silicon area by 46% while maintaining 81% higher accuracy.
- 3) Simulation of Enhanced Pulse Triggered Flip Flop with High Performance Application: Flip-flops, which are crucial in System on Chip (SoC) designs, contribute significantly to power consumption. This paper proposes an Enhanced Pulse Triggered Flip Flop (D-FF) for applications such as a 4-bit PIPO, a 4-bit SISO, and a 3-bit asynchronous ripple counter, aiming to minimize power dissipation. Performance evaluations are conducted using TSMC 180nm technology.
- 4) Development of Processor Engine for FPGA Based Clock Gating and Performing Power Analysis: Addressing the challenges that Moore's Law scaling poses for high-performance, low-power processors, this paper focuses on reducing dynamic power dissipation through latch-free clock gating. Applied to a 32-bit Arithmetic and Logic Unit, the functional-unit-enabled approach demonstrates an average power reduction of 65.35% and 52.24% for lower and higher frequencies, respectively.
- 5) Clock Gating A Power Optimizing Technique for VLSI Circuits: Clock gating, a power-saving method used in the Pentium 4 and subsequent processors, activates the clocks in a logic block only when needed. This paper explores various clock gating techniques for power optimization in VLSI circuits at the RTL level and addresses the challenges of applying these techniques effectively.

### III. METHODOLOGY

A low-power ALU is designed using clock gating. The Arithmetic and Logic Unit, a set of registers, a control unit, and other components make up the bulk of the basic processor design.

### A. 4 X 4 Approximate Multiplier

One of the blocks of the ALU is the multiplier (Fig. 1). The approximate 4 X 4 designs offer low power consumption and competitive error performance. Carry truncation and compensation algorithms are used to calculate the output. To build 8 x 8 multipliers, these designs are combined with an accurate 4 X 4 multiplier and an OR-based 4 X 4 multiplier. The circuit in Fig. 2 serves as the foundation for further simplification in the proposed architecture.

In the 4 x 4 approximate multiplier, two HAs are used to compute the three most significant outputs, while OR gates are used to approximate the five least significant outputs. This is the initial architectural blueprint for the proposed N2 design.

The first step is to replace the half-adder's XOR gate in column 5 with an OR gate:

$$Y[5] = a_3b_2 + a_2b_3$$

The carry of the same half adder is:

$$C56_1 = a_3b_2 \cdot a_2b_3$$

Final approximation:

[Figure: partial products of the final approximation arranged in Column 7 through Column 0]



The final design, denoted N2, is shown in Fig. 3. Compared with the OR-based version, this very simple design has just two additional AND gates. Higher-order multipliers should use the recommended design because of its notably improved performance. The design has two 4-bit inputs and one 8-bit product output. The 4 X 4 approximate multiplication is performed using the N2 method.
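The effect of the column-5 substitution can be checked with a short Python sketch. It models only the single half adder whose XOR gate is replaced by an OR gate, not the complete N2 multiplier, whose full gate-level structure is not reproduced here. The function names are illustrative assumptions.

```python
def exact_half_adder(x, y):
    """Reference half adder: sum is XOR, carry is AND."""
    return x ^ y, x & y


def or_half_adder(x, y):
    """Approximate half adder: the XOR in the sum path is replaced by OR."""
    return x | y, x & y


def column5_inputs(a, b):
    """Partial products a3*b2 and a2*b3 feeding the column-5 half adder."""
    a3b2 = ((a >> 3) & 1) & ((b >> 2) & 1)
    a2b3 = ((a >> 2) & 1) & ((b >> 3) & 1)
    return a3b2, a2b3


def count_sum_mismatches():
    """Count 4-bit operand pairs for which the OR-based sum bit is wrong."""
    mismatches = 0
    for a in range(16):
        for b in range(16):
            x, y = column5_inputs(a, b)
            if exact_half_adder(x, y)[0] != or_half_adder(x, y)[0]:
                mismatches += 1
    return mismatches
```

The two half adders disagree only when both partial products are 1, so the sum-bit error is confined to operand pairs in which the two upper bits of both operands are set.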

### B. Han-Carlson Adder

The Han-Carlson adder offers a favourable trade-off between fan-out, the number of logic levels, and the number of black cells. As a result, it can run at high speed while using less area and power, which makes it attractive as the basis for a speculative adder. Accordingly, the final rows of the Kogge-Stone portion of the parallel prefix adder were removed, creating a Han-Carlson speculative prefix-processing stage. Every binary adder is built from the full adder, a basic cell that adds three single bits. Its sum and carry formulas are

$$\begin{aligned} SUM &= A \oplus B \oplus C \\ COUT &= A \cdot B + A \cdot C + B \cdot C \end{aligned}$$
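The two formulas can be verified exhaustively over the eight input combinations. The following Python sketch is a behavioural check only; the function name `full_adder` is chosen for illustration.

```python
def full_adder(a, b, c):
    """One-bit full adder following the SUM and COUT formulas above."""
    total = a ^ b ^ c                        # SUM = A xor B xor C
    carry = (a & b) | (a & c) | (b & c)      # COUT = A.B + A.C + B.C
    return total, carry


def check_full_adder():
    """Return True if SUM + 2*COUT equals A + B + C for every input."""
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                s, cout = full_adder(a, b, c)
                if s + 2 * cout != a + b + c:
                    return False
    return True
```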

The pseudo-code of the Kogge-Stone adder can be readily modified to create the Han-Carlson adder. Below is an illustration of the creation of an 8-bit Han-Carlson PPA.








The Han-Carlson tree is a hybrid of the Kogge-Stone (K-S) and Brent-Kung adders and can be viewed as a sparser variant of Kogge-Stone. It first performs the carry-merge operations on every second bit position and handles the remaining positions in a final stage, combining the odd and even carry signals to generate the true carry bits.

The second, third, and fourth stages of the Han-Carlson adder mirror the Kogge-Stone structure. Using shorter wires and fewer cells reduces logic complexity compared with Kogge-Stone adders. An additional stage is introduced for the carry-merge path, but the benefit is lower logic complexity.

The resulting design optimizes performance through sparsity while keeping logic complexity low in specific applications. Fig. 4 illustrates how the 8-bit Han-Carlson PPA was built.



Fig. 5 Han-Carlson Adder

Generating and Propagating Carry Signals: The equations below are used to compute the generate and propagate signals for each bit, combine them into group carries, and produce the sum bits:

$$\begin{aligned} G_i &= A_i \cdot B_i \\ P_i &= A_i \oplus B_i \end{aligned}$$

All carry signals are computed:

$$\begin{aligned} G_{i:j} &= G_{i:k} + P_{i:k} \cdot G_{k-1:j} \\ P_{i:j} &= P_{i:k} \cdot P_{k-1:j} \end{aligned}$$

Evaluation of the total:

$$S_i = P_i \oplus G_{i-1:0}$$

The Han-Carlson adder (HCA) used here is an 8-bit parallel prefix adder that combines the Brent-Kung and Kogge-Stone structures. The sparse Kogge-Stone adder forms the pre-processing part of the Han-Carlson adder, while the carry-generation stage and the Brent-Kung adder form its post-processing stage.
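A bit-level sketch of the prefix computation may help make the stage structure concrete. The Python model below follows the generate, propagate, and group-carry equations above and uses a common textbook arrangement of the Han-Carlson tree: one Brent-Kung style stage on the odd bit positions, Kogge-Stone stages on the odd positions only, and a final stage that fills in the even positions. The function name, the handling of the carry-in, and the exact stage ordering are assumptions for illustration and are not taken from the Verilog design.

```python
def han_carlson_add(a, b, cin=0, width=8):
    """Behavioural Han-Carlson prefix adder; returns (sum, carry_out)."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # P_i = A_i xor B_i
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]   # G_i = A_i . B_i

    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & cin)          # fold the carry-in into bit 0

    # Stage 1 (Brent-Kung style): odd positions merge with their right neighbour.
    for i in range(1, width, 2):
        G[i], P[i] = G[i] | (P[i] & G[i - 1]), P[i] & P[i - 1]

    # Kogge-Stone stages restricted to odd positions, distances 2, 4, ...
    d = 2
    while d < width:
        prev_g, prev_p = G[:], P[:]
        for i in range(d + 1, width, 2):
            G[i] = prev_g[i] | (prev_p[i] & prev_g[i - d])
            P[i] = prev_p[i] & prev_p[i - d]
        d *= 2

    # Final stage: even positions take the carry from the odd position below.
    for i in range(2, width, 2):
        G[i] = G[i] | (P[i] & G[i - 1])

    # S_i = P_i xor G_{i-1:0}, with the carry-in feeding bit 0.
    carries = [cin] + G[:-1]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, G[width - 1]
```

For example, `han_carlson_add(0b00001100, 0b00000010, cin=1)` can be compared against ordinary integer addition to confirm the model.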

### C. Clock Gating

Clock gating is a widely used technique in recent SoC designs that effectively reduces excess dynamic power consumption by switching the clock of a particular block from an active to an inactive state when the block is not in use. In SoC architectures, clock gating can be applied in the following ways:

1) Clock gating at RTL stops clocks for unused blocks, saving dynamic power.

2) Basic clock gating uses logical "AND" to selectively disable block clocks.

3) Synthesis tools identify FF groups with shared enable signals for selective clock control.

In order to do this, the enable signal needs to be constant during the rising or active edge of the clock signal.



ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538 Volume 12 Issue III Mar 2024- Available at www.ijraset.com



Fig. 6 Latch Based Clock Gating
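The benefit of the latch in Fig. 6 can be illustrated with a small behavioural sketch. The Python model below assumes the usual arrangement in which a level-sensitive latch is transparent while the clock is low and its output is ANDed with the clock; this arrangement and the function names are assumptions for illustration. A plain AND gate is modelled alongside for comparison.

```python
def and_clock_gate(samples):
    """Latch-free gating: gated clock is simply clk AND enable."""
    return [clk & en for clk, en in samples]


def latch_based_clock_gate(samples):
    """Latch-based gating: enable is sampled only while the clock is low.

    `samples` is a list of (clk, enable) pairs taken at successive instants.
    """
    latched_enable = 0
    gated = []
    for clk, en in samples:
        if clk == 0:
            latched_enable = en          # latch transparent during clock-low
        gated.append(clk & latched_enable)
    return gated


# Enable changes while the clock is high at the third and sixth samples.
EXAMPLE = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0), (1, 1), (0, 1), (1, 1)]
```

In this example the plain AND gate truncates one clock pulse and creates a partial pulse when the enable changes during the high phase, whereas the latch-based gate passes only complete pulses.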

Frequent problems encountered when implementing clock gating include the following:

- The clock waveform should only be turned on or off by the clock gating circuit.
- As with any other violations, the set-up and hold time violations introduced by the clock gating could be addressed during the physical design process (backend design's Timing Closure phase).
- Buffering in the data path close to the endpoint or clock skewing are two techniques that can be employed as a fix for hold violations.
- The clock gating should be examined by the designer to see if it is attempting to divide the original clock. If so, the clock gating signal's phase needs to be carefully considered by the designer.
- Glitches in the design could be observed if suitable clock gating cannot be implemented.
- More functional issues would arise if the gating signal was not effectively managed.

### D. Humidity Sensor Module

The humidity sensors are electronic devices that are used in air, soil, or small spaces to measure and report on the temperature and moisture content of the surrounding environment. The concentration of water vapour in the air is indicated by humidity measurements.



Fig. 7 Humidity Sensor Module

### E. Memory Based Parallel In Parallel Out Flip Flop

The output data of the proposed design is stored in a parallel-in parallel-out (PIPO) register. The shift towards mobile computing drives low-power research: as desktops give way to compact systems, innovative solutions are needed to extend battery life. In a parallel-in parallel-out shift register, all data bits appear on the parallel outputs immediately after they are entered simultaneously. Fig. 8 shows the circuit of an 8-bit parallel-in parallel-out shift register.



Fig. 8 8-bit Parallel In parallel Out (PIPO)
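The behaviour of the register in Fig. 8 can be sketched as follows. The Python class below assumes positive-edge-triggered flip-flops and a reset value of zero; both are illustrative assumptions, as are the class and method names.

```python
class PipoRegister:
    """Behavioural model of a parallel-in parallel-out register."""

    def __init__(self, width=8):
        self.width = width
        self.q = 0               # parallel outputs, assumed reset to zero
        self._prev_clk = 0

    def tick(self, clk, d):
        """Apply one clock sample; load all bits of d on a rising edge."""
        if clk == 1 and self._prev_clk == 0:
            self.q = d & ((1 << self.width) - 1)
        self._prev_clk = clk
        return self.q
```

All bits are captured in the same clock edge and are available on the outputs immediately afterwards, which is the property described above.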

### F. Barrel Shifter

The barrel shifter designs are covered in this section. Following a description of the fundamental shifter and rotator designs, discussion follows on Mux-based data-reversal barrel shifters, mask-based data-reversal barrel shifters, mask-based two's complement barrel shifters, and mask-based one's complement barrel shifters.



### Shifters and Rotators

An n-bit logarithmic barrel shifter uses $\log_2(n)$ stages [1, 2]. Each bit of the shift amount, B, controls a distinct stage of the shifter: if $b_k = 1$, the data entering the stage controlled by $b_k$ is shifted by $2^k$ bits. For example, an 8-bit logical right shifter uses three stages that shift by 4 bits, 2 bits, and 1 bit. The design can be made more efficient by replacing each multiplexor that has '0' as one of its inputs with a 2-input gate whose inputs are the data bit and $b_k$. In a rotator, connecting lines carrying zeros are no longer needed because the rotator routes all of the input bits to the output. Instead, connection lines are added so that the $2^k$ low-order data bits are sent to the $2^k$ high-order multiplexors in the stage controlled by $b_k$.

There is no difference in the theoretical area or latency when switching from a non-optimized shifter to a rotator. However, the longer rotator connecting links have the potential to increase both area and delay.
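The stage-by-stage operation of the logarithmic shifter and rotator can be sketched in Python as below. Each loop iteration models one multiplexor stage controlled by one bit of the shift amount. The function name and the `rotate` flag are assumptions for illustration.

```python
def barrel_right(a, b, n=8, rotate=False):
    """Logarithmic right shifter/rotator with log2(n) stages.

    n is assumed to be a power of two and b to lie in the range 0..n-1.
    """
    mask = (1 << n) - 1
    data = a & mask
    stages = n.bit_length() - 1             # log2(n) when n is a power of two
    for k in reversed(range(stages)):       # stages shift by 4, 2, 1 for n = 8
        if (b >> k) & 1:                    # stage k is controlled by bit b_k
            amount = 1 << k                 # shift distance 2^k
            shifted = data >> amount
            if rotate:
                # route the 2^k low-order bits to the 2^k high-order positions
                shifted |= (data << (n - amount)) & mask
            data = shifted
    return data
```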



A row of n multiplexors can be added before and after a right shifter to enable it to do left shift operations as well. These multiplexors reverse the data going into and coming out of the right shifter during a left shift operation. The data entering and leaving the shifter are not altered when a right shift operation is carried out. Figure 8 displays an 8-bit data-reversal logical shifter.
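The data-reversal idea can be illustrated with a short Python sketch: a single right-shifter core is reused for left shifts by reversing the bit order before and after it. The function names and the `left` flag are illustrative assumptions.

```python
def reverse_bits(x, n=8):
    """Reverse the order of the n low-order bits of x."""
    return int(format(x & ((1 << n) - 1), "0{}b".format(n))[::-1], 2)


def data_reversal_shift(a, b, left, n=8):
    """Logical shift by b bits using one right-shifter core.

    For a left shift the data is reversed on the way in and on the way
    out; for a right shift it passes through unchanged.
    """
    data = reverse_bits(a, n) if left else a & ((1 << n) - 1)
    data >>= b                               # shared right-shifter core
    return reverse_bits(data, n) if left else data
```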

### IV. IMPLEMENTATION DESIGN

### A. Han-Carlson Adder

The Han-Carlson tree combines the Brent-Kung and Kogge-Stone structures. The sparse Kogge-Stone adder forms part of the Han-Carlson adder. These adders use a distinctive technique to calculate the sum and carry bits.

Evaluation of the total sum:







The schematic shows the 8-bit Han-Carlson adder (HCA), a parallel prefix adder that combines the Brent-Kung and Kogge-Stone structures. The sparse Kogge-Stone adder forms the pre-processing part of the Han-Carlson adder.

| Operand         | [7] | [6] | [5] | [4] | [3] | [2] | [1] | [0] |
|-----------------|-----|-----|-----|-----|-----|-----|-----|-----|
| a               | 0   | 0   | 0   | 0   | 1   | 1   | 0   | 0   |
| b               | 0   | 0   | 0   | 0   | 0   | 0   | 1   | 0   |
| a + b           | 0   | 0   | 0   | 0   | 1   | 1   | 1   | 0   |
| Cin             |     |     |     |     |     |     |     | 1   |
| s = a + b + Cin | 0   | 0   | 0   | 0   | 1   | 1   | 1   | 1   |

The carry-generation stage and the Brent-Kung adder form the post-processing stage of the Han-Carlson adder. The design in Fig. 10 follows this structure.

### B. Barrel Shifter

Let A be the input operand, B the shift or rotation amount, and Y the result of the shift or rotation. A is defined as an n-bit value, where n is an integer power of two. B is therefore a $\log_2(n)$-bit integer that ranges from 0 to $n-1$. Table 1 provides an illustration, in which the bit vector of A is written as $a_7 a_6 a_5 a_4 a_3 a_2 a_1 a_0$ and B is the shift or rotation amount in bits. The operations in Table 1 are described below.

| Operation                    | Y                                         |
|------------------------------|-------------------------------------------|
| 3-bit shift right logical    | $0\ 0\ 0\ a_7\ a_6\ a_5\ a_4\ a_3$        |
| 3-bit shift right arithmetic | $a_7\ a_7\ a_7\ a_7\ a_6\ a_5\ a_4\ a_3$  |
| 3-bit rotate right           | $a_2\ a_1\ a_0\ a_7\ a_6\ a_5\ a_4\ a_3$  |
| 3-bit shift left logical     | $a_4\ a_3\ a_2\ a_1\ a_0\ 0\ 0\ 0$        |
| 3-bit shift left arithmetic  | $a_7\ a_3\ a_2\ a_1\ a_0\ 0\ 0\ 0$        |
| 3-bit rotate left            | $a_4\ a_3\ a_2\ a_1\ a_0\ a_7\ a_6\ a_5$  |

TABLE I SHIFT AND ROTATE EXAMPLES FOR $A = a_7 a_6 a_5 a_4 a_3 a_2 a_1 a_0$ AND $B = 3$

- 1) A B-bit shift right logical operation performs a B-bit right shift and sets the upper B bits of the result to zero.
- 2) A B-bit shift right arithmetic operation performs a B-bit right shift and sets the upper B bits of the result to $a_{n-1}$, which corresponds to the sign bit of A.
- 3) A B-bit rotate right operation performs a B-bit right shift and sets the upper B bits of the result to the lower B bits of A.
- 4) A B-bit shift left logical operation performs a B-bit left shift and sets the lower B bits of the result to zero.
- 5) A B-bit shift left arithmetic operation performs a B-bit left shift and sets the lower B bits of the result to zero. The sign bit of the result is set to $a_{n-1}$.
- 6) A B-bit rotate left operation performs a B-bit left shift and sets the lower B bits of the result to the upper B bits of A.
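The six operations can be summarized in a compact reference model. The Python sketch below follows the definitions in the list and the bit patterns of Table I; the operation mnemonics (`srl`, `sra`, `ror`, `sll`, `sla`, `rol`) and the function name are assumptions introduced for illustration.

```python
def shift_rotate(a, b, op, n=8):
    """Reference semantics for the six shift and rotate operations.

    a is an n-bit value and b is the shift amount in the range 0..n-1.
    """
    mask = (1 << n) - 1
    a &= mask
    sign = a >> (n - 1)                          # a_{n-1}, the sign bit of A
    if op == "srl":                              # shift right logical
        return a >> b
    if op == "sra":                              # shift right arithmetic
        fill = (mask << (n - b)) & mask if sign else 0
        return (a >> b) | fill
    if op == "ror":                              # rotate right
        return ((a >> b) | (a << (n - b))) & mask
    if op == "sll":                              # shift left logical
        return (a << b) & mask
    if op == "sla":                              # shift left arithmetic
        return ((a << b) & (mask >> 1)) | (sign << (n - 1))
    if op == "rol":                              # rotate left
        return ((a << b) | (a >> (n - b))) & mask
    raise ValueError("unknown operation: " + op)
```

Substituting a concrete value for A with B = 3 reproduces the bit patterns listed in Table I.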

### C. Control Unit

In computing, a control unit is a combinational digital circuit that performs gate operations such as OR, NOR, AND, NAND, XOR, XNOR, and NOT. The control unit block is merged with the ALU block; there is no separate control unit block.



### D. Memory Based Parallel In Parallel Out Flip Flop

Shift registers are a form of sequential logic circuit used primarily for storing digital data. They consist of flip-flops linked in a chain, with the output of one flip-flop serving as the input of the next. The majority of registers lack a distinctive internal state sequence. The storage capacity of a register is the total number of bits (1 or 0) of digital data that it can store. In a shift register, each stage (flip-flop) corresponds to one bit of storage. In a parallel-in parallel-out shift register, all data bits appear on the parallel outputs immediately after they are entered simultaneously. The following circuit implements an 8-bit parallel-in parallel-out shift register.


### E. 4 X 4 Approximate Multiplier

The approximate $4 \times 4$ designs offer low power consumption and competitive error performance. Carry truncation and compensation algorithms are used to calculate the output. To build $8 \times 8$ multipliers, these designs are combined with an accurate $4 \times 4$ multiplier and an OR-based $4 \times 4$ multiplier.

A similar conclusion may be made about power dissipation, where N2 clearly has an advantage. N2 is the quickest design in terms of speed, although N1 is also among the fastest.



Fig. 11 Schematic Diagram for 4 X 4 Approximate Multiplier

The implemented design is designated N2. Compared with the OR-based version, this very simple design has just two additional AND gates. Higher-order multipliers should use the recommended design because of its notably improved performance.



The design has two 4-bit inputs and one 8-bit product output. The 4 X 4 approximate multiplication is performed using the N2 method, as explained.

### F. Clock Gating

Clock gating is a widely used technique in recent SoC designs that effectively reduces excess dynamic power consumption by switching the clock of a particular block from an active to an inactive state when the block is not in use. In SoC architectures, clock gating can be applied in the following ways:

- 1) Clock gating is integrated into the SoC as part of the RTL functionality at the RTL level. A block's clocks are stopped while it is not in use, which disables all of its functions. This saves a lot of dynamic power because fewer logic blocks are switching all the time.
- 2) The most basic and widely used kind of clock gating uses the logical "AND" function with control signals to selectively disable the clock for particular blocks.
- 3) The tools used during synthesis identify groups of FFs that share a single enable control signal, allowing the clocks for those groups of flops to be selectively turned off.

In order to do this, the enable signal needs to be constant during the rising or active edge of the clock signal.



Fig. 12 Schematic Diagram for Latch based clock gate

The design is implemented as per Fig. 12. It adds a level-sensitive latch to the basic latch-free clock gate. The latch captures the state of the enable signal and holds it until the entire clock pulse has passed.

### G. Humidity Sensor Module

The module has the features of a relative humidity sensor. It is two-point calibrated with a capacitor-type sensor and offers excellent performance. Its frequency-type output can be easily integrated with the user application system. Power consumption is very low, and no extra components are needed.



Fig. 13 Schematic design for the Humidity Sensor Module




### V. RESULTS

The 8-bit ALU is designed in Verilog HDL with and without clock gating, and the design is simulated using Xilinx Vivado 18.2. Figures 14 and 15 show the simulated design and waveform.

Figure 14 shows the simulated design of the ALU block. The ALU contains blocks such as the Han-Carlson adder, the 4 X 4 approximate multiplier, the logic unit, division, and subtraction. Latch-based clock gating is added to these blocks, the sensor provides the input to the ALU, and the ALU output is stored using the PIPO flip-flop.



Fig. 14 ALU Simulated Design

[Waveform: signals clk, select[2:0], IN1[7:0], IN2[7:0], OUT_Y[7:0], MEMORY[7:0], OUT1[7:0], OUT2[7:0], OUT3[7:0], OUT4[7:0], and load[7:0] over simulation time]

Fig. 15 ALU Simulated Waveform

The ALU waveform is simulated for the implemented design. The output of each block is brought out as a separate waveform for verification. The ALU's output is stored in memory. Both the inputs and the output of the design are 8 bits wide.

1) Power analysis without clock gating

The Xilinx Power Analyzer is used to perform power analysis. All ALU units receive a clock input, and the inputs are sampled at each positive edge of the clock signal. The results of the power analysis are tabulated in Table 2.

The table demonstrates that clock power and IO power make up the majority of dynamic power. The power consumed by the clock signal is known as clock power, and the power created by the design block's IO signals is known as IO power.



Even though only one operation is carried out at any given time in an ALU without clock gating, clock signals and IO signals are loaded into all design blocks. This increases the use of needless power.

TABLE 2 WITHOUT CLOCK GATING

| Freq (KHz) | Clk pwr (mW) | Logic pwr (mW) | Signal pwr (mW) | IO pwr (mW) | Dynamic Pwr (mW) | Static Pwr (mW) | Total pwr (mW) |
|------------|--------------|----------------|-----------------|-------------|------------------|-----------------|----------------|
| 100        | 0.001        | 0.001          | 0.001           | 0.018       | 0.019            | 0.207           | 0.225          |
| 200        | 0.001        | 0.001          | 0.001           | 0.001       | 0.001            | 0.206           | 0.219          |
| 500        | 0.001        | 0.001          | 0.001           | 0.001       | 0.001            | 0.206           | 0.207          |
| 1000       | 0.001        | 0.001          | 0.001           | 0.002       | 0.002            | 0.206           | 0.208          |
| 5000       | 0.001        | 0.001          | 0.001           | 0.009       | 0.009            | 0.207           | 0.216          |
| 10000      | 0.001        | 0.001          | 0.001           | 0.018       | 0.019            | 0.207           | 0.225          |

2) Power analysis using the functional-unit-enabled clock gating technique

The clock power is greatly decreased by applying a gated clock to each individual functional unit of the ALU. Table 3 lists the power used at various clock frequencies. In this clock gating scheme, the clock signal and IO signals are fed to a functional unit, such as the adder or the multiplier, only when that unit is selected by the opcode. As a result, Table 3 shows a larger power reduction.
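The opcode-driven enable generation can be sketched as a one-hot decoder. In the Python sketch below, the list of functional units and the opcode assignment are hypothetical and are not taken from the Verilog design; the count of clocked units is only a rough proxy for clock load, not a power estimate.

```python
# Hypothetical opcode order; the actual encoding of the ALU is not given here.
UNITS = ("add", "sub", "mul", "div", "logic")


def unit_enables(opcode):
    """One-hot clock enables: only the unit selected by the opcode is enabled."""
    return {name: int(index == opcode) for index, name in enumerate(UNITS)}


def clocked_units(opcodes, gated):
    """Count how many functional units receive a clock over a run of opcodes."""
    total = 0
    for opcode in opcodes:
        if gated:
            total += sum(unit_enables(opcode).values())   # one unit per cycle
        else:
            total += len(UNITS)                           # every unit, every cycle
    return total
```

With gating, each cycle clocks a single functional unit instead of all of them, which is the mechanism behind the reduced clock and IO activity described above.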

TABLE 3: WITH CLOCK GATING

| Freq (kHz) | Clk Pwr (mW) | Logic Pwr (mW) | Signal Pwr (mW) | IO Pwr (mW) | Dynamic Pwr (mW) | Static Pwr (mW) | Total Pwr (mW) |
|------------|--------------|----------------|-----------------|-------------|------------------|-----------------|----------------|
| 100        | 0.001        | 0.001          | 0.001           | 0.001       | 0.0012           | 0.060           | 0.061          |
| 200        | 0.001        | 0.001          | 0.001           | 0.001       | 0.0009           | 0.060           | 0.061          |
| 500        | 0.001        | 0.001          | 0.001           | 0.001       | 0.0016           | 0.060           | 0.061          |
| 1000       | 0.001        | 0.001          | 0.001           | 0.002       | 0.0021           | 0.060           | 0.062          |
| 5000       | 0.001        | 0.001          | 0.001           | 0.009       | 0.0134           | 0.060           | 0.070          |
| 10000      | 0.001        | 0.001          | 0.001           | 0.018       | 0.0102           | 0.060           | 0.080          |
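The functional-unit-enabled gating described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the opcode encoding and the list of functional units in `OPCODE_TO_UNIT` are hypothetical and are not taken from the ALU design. The model simply counts how many clock edges reach each unit when every unit is clocked on every cycle, and when only the unit selected by the opcode is clocked.

```python
from collections import Counter

# Hypothetical opcode map (an assumption for illustration only):
# each opcode selects exactly one functional unit of the ALU.
OPCODE_TO_UNIT = {
    0b00: "adder",
    0b01: "multiplier",
    0b10: "logic",
    0b11: "shifter",
}


def clock_edges_delivered(opcodes, gated):
    """Count the clock edges that reach each functional unit.

    Each entry in `opcodes` stands for one clock cycle.
    Without gating, every unit is clocked on every cycle.
    With gating, only the unit selected by the opcode is clocked.
    """
    edges = Counter({unit: 0 for unit in OPCODE_TO_UNIT.values()})
    for op in opcodes:
        selected = OPCODE_TO_UNIT[op]
        for unit in edges:
            if not gated or unit == selected:
                edges[unit] += 1
    return edges


# A short, made-up instruction stream of six cycles.
program = [0b00, 0b00, 0b01, 0b10, 0b00, 0b01]

ungated = clock_edges_delivered(program, gated=False)
gated = clock_edges_delivered(program, gated=True)

print("clock edges without gating:", sum(ungated.values()))  # 24
print("clock edges with gating:   ", sum(gated.values()))    # 6
```

In this toy model the number of clocked unit-cycles falls from 24 to 6, because idle units no longer receive the clock. The model counts switching activity only. It does not estimate power in mW, which in this work comes from the Vivado power analysis.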

The power reduction percentage can be computed using the formula below, where CG denotes clock gating.

Power Reduction % = (Power without CG - Power with CG) / (Power without CG) × 100

TABLE 4: POWER REDUCTION IN PERCENTAGE

| Frequency (kHz) | Power without Clock Gate (mW) | Power with Clock Gate (mW) | Reduction % |
|-----------------|-------------------------------|----------------------------|-------------|
| 100             | 0.225                         | 0.061                      | 73%         |
| 200             | 0.219                         | 0.061                      | 72%         |
| 500             | 0.207                         | 0.061                      | 70%         |
| 1000            | 0.208                         | 0.062                      | 71%         |
| 5000            | 0.216                         | 0.070                      | 72%         |
| 10000           | 0.225                         | 0.080                      | 65%         |

Table 4 indicates that, with clock gating, the dynamic power decreases dramatically at higher frequencies. Because the clock and IO signals are no longer loaded unnecessarily at higher frequencies, a 65–73% saving in power can be expected with functional-unit-enabled clock gating.
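As a worked illustration of the power reduction formula, the following minimal sketch applies it to the total-power figures listed in Table 4. The function name `power_reduction_percent` and the dictionary `TABLE_4` are illustrative names, not part of the original work. Because the script prints unrounded results, the value for a given row may not match the rounded percentage in the table exactly.

```python
def power_reduction_percent(p_without_cg, p_with_cg):
    """Return the percentage power saving obtained with clock gating (CG).

    Implements: (P_without_CG - P_with_CG) / P_without_CG * 100
    """
    if p_without_cg <= 0:
        raise ValueError("power without clock gating must be positive")
    return (p_without_cg - p_with_cg) / p_without_cg * 100.0


# Total power in mW copied from Table 4:
# frequency in kHz -> (without clock gate, with clock gate)
TABLE_4 = {
    100: (0.225, 0.061),
    200: (0.219, 0.061),
    500: (0.207, 0.061),
    1000: (0.208, 0.062),
    5000: (0.216, 0.070),
    10000: (0.225, 0.080),
}

for freq_khz, (p_without, p_with) in TABLE_4.items():
    saving = power_reduction_percent(p_without, p_with)
    print(f"{freq_khz:>6} kHz: {saving:.1f}% reduction")
```

The same function can be reused for any pair of measurements, for example to compare only the dynamic power columns of the two power tables.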



ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538 Volume 12 Issue III Mar 2024- Available at www.ijraset.com

### VI. CONCLUSION

An 8-bit ALU is equipped with a clock gating system, its waveforms are simulated, and power analysis is carried out using the Xilinx Vivado Power Analyzer 18.2 Suite. The ALU block, together with parallel-in parallel-out memory storage for the output data and the HH10D humidity sensor, makes up a large portion of the full on-chip design. The multiplier in the ALU uses the 4x4 approximation. The Han-Carlson parallel prefix adder is employed for the adder block, with the adding logic control unit and division, to achieve minimal area and low power consumption. Two configurations are compared: no clock gating and functional-unit-enabled clock gating. The power drop is seen more clearly with the latter technique.

Across the simulated clock frequencies of 100 kHz to 10 MHz, the power reduction ranges between 65% and 73% (Table 4), which is a large saving. This is because the superfluous loading of clock and IO signals, which accounted for the majority of dynamic power consumption, is considerably reduced. The larger power decrease offsets the additional space needed for the clock gating circuitry. Because of these large power savings, clock-gated ALUs are recommended for wireless sensor applications where power conservation is a major concern.

#### A. Future Scope

The main motivation for the proposed study is ALU power reduction. This ALU can be combined with a processor element to create a low-power processor. When designing a more efficient processor architecture, the speed of computation must be taken into account in addition to power. The speed can be increased by designing significantly faster adders and multipliers.

The clock gating circuits take up additional space. To compensate, the multiplier was built with a folded tree architecture that consumes little space. However, this comes at the cost of reduced multiplier speed. This work can therefore be improved by designing a fast multiplier that also uses less space.

#### REFERENCES

- [1] Sharath, M., Poornima, G. (2020). Design of Energy Efficient ALU Using Clock Gating for a Sensor Node. In: Auer, M., Ram B., K. (eds) Cyber-physical Systems and Digital Twins. REV2019 2019.
- [2] C. Walravens and W. Dehaene, "Low-Power Digital Signal Processor Architecture for Wireless Sensor Nodes," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 2,
- [3] C. S. Jasmin and A. P. Mathew, "Power efficient comparator architecture for wireless sensor nodes," 2016 International Conference on Emerging Technological Trends (ICETT), Kollam, India, 2016,
- [4] A. Dhanunjaya Reddy, "Area Efficient Speculative Han-Carlson Adder", International Journal of Scientific Research in Science and Technology (IJSRST), Volume 3, Issue 7, 2017.
- [5] E. Zacharelos, I. Nunziata, G. Saggese, A. G. M. Strollo and E. Napoli, "Approximate Recursive Multipliers Using Low Power Building Blocks," in IEEE Transactions on Emerging Topics in Computing, vol. 10, no. 3, 2022.
- [6] D. -H. Le, N. Sugii, S. Kamohara, X. -T. Nguyen, K. Ishibashi and C. -K. Pham, "Design of a low-power fixed-point 16-bit digital signal processor using 65nm SOTB process," 2015 International Conference on IC Design & Technology (ICICDT), Leuven, Belgium, 2015.
- [7] J. Shinde and S. S. Salankar, "Clock gating A power optimizing technique for VLSI circuits," 2011 Annual IEEE India Conference, Hyderabad, India, 2011.
- [8] C., Bhaskar & Jathar, Vikas. (2016). Development of processor engine for FPGA based clock gating and performing power analysis.
- [9] E. Zacharelos, I. Nunziata, G. Saggese, A. G. M. Strollo and E. Napoli, "Approximate Recursive Multipliers Using Low Power Building Blocks," in IEEE Transactions on Emerging Topics in Computing, vol. 10, no. 3 2022.
- [10] Dr. V. Sidharthan, M. Prasannakumar, "Comparative Analysis of Adders Parallel-Prefix Adder for Their Area, Delay and Power Consumption", International Journal of Scientific Research in Science and Technology (IJSRST), Volume 4, Issue 5, 2018.
- [11] Nehru, Kamla et al. "Design of 64-bit low power parallel prefix VLSI adder for high speed arithmetic circuits." 2012 International Conference on Computing, Communication and Applications (2012): 1-4.
- [12] Nguyen, Xuan-Thuan & Bui, Trong-Tu & Huynh Huu, Thuan & Pham, Cong-Kha & Le, Duc Hung. (2013). An ASIC Implementation of 16-bit Fixed-point Digital Signal Processor. Journal of Science and Technology (Special Issue). 51. 282-289.
- [13] R. Kulkarni and S. Y. Kulkarni, "Power analysis and comparison of clock gated techniques implemented on a 16-bit ALU," International Conference on Circuits, Communication, Control and Computing, Bangalore, India, 2014, pp. 416-420, doi:10.1109/CIMCA.2014.7057835.
- [14] G. Shrivastava and S. Singh, "Power Optimization of Sequential Circuit Based ALU Using Gated Clock & Pulse Enable Logic," 2014 International Conference on Computational Intelligence and Communication Networks, Bhopal, India, 2014, pp.1006-1010, doi: 10.1109/CICN.2014.212.
- [15] Singh, Maneesh & Kumar, Rajeev. (2015). Simulation of enhanced pulse triggered flip flop with high performance applications. International Journal of Engineering Sciences & Research Technology. 489-493.
- [16] C., Bhaskar & Jathar, Vikas. (2016). Development of processor engine for FPGA based clock gating and performing power analysis. 1-5.10.1109/ICCUBEA.2016.7860113.
- [17] E. Zacharelos, I. Nunziata, G. Saggese, A. G. M. Strollo and E. Napoli,"Approximate Recursive Multipliers Using Low Power Building Blocks," in IEEE Transactions on Emerging Topics in Computing, vol. 10, no. 3, pp. 1315-1330, 1 July-Sept. 2022, doi: 10.1109/TETC.2022.3186240.




- [18] Dr. V. Sidharthan, M. Prasannakumar, "Comparative Analysis of Adders Parallel-Prefix Adder for Their Area, Delay and Power Consumption", International Journal of Scientific Research in Science and Technology(IJSRST), Print ISSN :2395-6011, Online ISSN : 2395-602X, Volume 4, Issue 5, pp.353-357, March-April-2018.
- [19] https://docs.xilinx.com/v/u/2018.2-English/ug893-vivado-ide
- [20] https://worldofdatasheets.com/










