# INTERNATIONAL JOURNAL FOR RESEARCH IN APPLIED SCIENCE & ENGINEERING TECHNOLOGY (IJRASET)

Volume: 2 Issue: V Month of publication: May 2014

www.ijraset.com

## Hierarchical Power and Activity Analysis of a Clock Gated ALU

Roopa Kulkarni<sup>1</sup>, Dr. S.Y.Kulkarni<sup>2</sup>

<sup>1</sup>Research Scholar, Department of Electronics & Communication Engg., M.S. Ramaiah Institute of Technology, Bangalore, Karnataka, India.

Abstract: System-level or architectural-level power analysis is an important phase of SoC or NoC design, used to estimate and evaluate power at an early stage of the design phase. Power at the hierarchical level depends on the lower-level design modules, their signal rates, their average activity, and the fanout of the modules. In this paper, the power of the low-level modules, which in turn is reflected at the top level, is analysed for a 16-bit ALU. The paper also analyses the average fanout and average activity of the blocks of the ALU. These parameters reflect the performance of the ALU. The fanout report and the activity report are obtained for the devices or cells utilized by the ALU with and without the clock gating technique. It is observed that the activity rate with clock gating is higher than without clock gating for the arithmetic module, while for the logic module it is lower.

Keywords: Fanout, average activity, clock gating, dynamic power, hierarchical power.

#### I. INTRODUCTION

In today's era there is a need for high-performance, low-power designs. A higher speed of operation requires a higher frequency, which increases the power dissipation. Power management is a major criterion in the design of complex devices such as routers, network processors and communication systems, all put together on a System-on-Chip.
In all these systems, the processing element (the processor or, in general, the ALU) is the component that must operate at high speed and that dissipates the most power. To address this, power management, which includes estimation and analysis, is carried out at every stage of the design phase. This helps the design to be optimal in terms of power budgeting.

Power dissipation consists of two components: static power and dynamic power. Dynamic power, also called switching power, is dissipated due to switching activity. Dynamic power is defined as

$$P_{\text{dynamic}} = \alpha \, c \, v^{2} f \tag{1}$$

In equation (1), $\alpha$ is the switching activity, $c$ the capacitance, $v$ the supply voltage and $f$ the frequency of operation. From the relation it is observed that the dynamic power is proportional to the frequency and the switching activity. Since devices need to operate at high speed, a possible way to reduce power is to reduce the switching activity rather than the frequency. Thus, for the ALU, the reduction of dynamic power needs to be addressed.

Power optimization can be obtained with different power-efficient techniques such as clock gating, power gating, frequency scaling, dynamic voltage scaling, adaptive dynamic and voltage scaling, or the use of a sleep transistor to turn elements of the design ON and OFF [4]-[6]. Power reduction can also be achieved by varying the threshold voltage or by applying multiple voltages to the design, both of which have shown good results in previous research [1]-[2].

At the architectural level, clock gating is a technique that can be applied between the various functional blocks. In this paper, the clock gating technique is applied to an ALU that processes 16-bit data. The power analysis is done at the hierarchical level, along with the fanout analysis and the average activity, which can be the switching activity or the signal rate of the input and output signals. The design is implemented using ISE14.2.
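
As a rough numerical illustration of equation (1), the sketch below evaluates the dynamic power at two switching activities. The capacitance, supply voltage, frequency and activity values are hypothetical placeholders chosen for the example; they are not measurements from this work.

```python
def dynamic_power(alpha, c, v, f):
    """Equation (1): P_dynamic = alpha * c * v^2 * f, returned in watts."""
    return alpha * c * v ** 2 * f


# Hypothetical operating point (assumed values, not taken from the paper).
C_EFF = 10e-12   # effective switched capacitance in farads
VDD = 1.2        # supply voltage in volts
FREQ = 100e6     # clock frequency in hertz

# Two assumed switching activities: before and after reducing activity.
p_high_activity = dynamic_power(0.20, C_EFF, VDD, FREQ)
p_low_activity = dynamic_power(0.14, C_EFF, VDD, FREQ)

# Relative saving obtained at the same frequency.
saving = 1.0 - p_low_activity / p_high_activity

print(f"alpha = 0.20: {p_high_activity * 1e3:.4f} mW")
print(f"alpha = 0.14: {p_low_activity * 1e3:.4f} mW")
print(f"saving: {saving * 100:.1f} %")
```

With these assumed values, lowering the activity from 0.20 to 0.14 at an unchanged frequency lowers the dynamic power by 30%, because the power in equation (1) scales linearly with $\alpha$.
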
The power analysis is done using the XPower Analyzer. The rest of the paper is organized as follows: Section II briefly describes the clock gating techniques, Section III discusses the ALU implemented, Section IV presents the results obtained, and Section V concludes the work carried out.

#### II. CLOCK GATING TECHNIQUES

The clock network is the most power-hungry unit of any complex design. The literature reports that the clock network consumes nearly 30%-40% of the total power, so it is important to reduce this power. The clock continuously consumes power as it toggles the registers and their associated logic. Clock gating is one of the techniques that caters to this reduction, as it switches off the clock to a functional block that is not in operation. The technique can be applied at the system level, the gate level or the RTL level. In clock gating, the clock is provided only to those registers or functional blocks that have a change in state; in other words, the clock is switched off for those registers that have no change in state. The clock gating techniques are:

- Latch Free CG
- Latch based CG

#### A. Latch Free CG

This technique uses a simple AND gate, which either allows the clock to be applied to the functional unit or blocks it. The representation is shown in figure 1.

Figure 1: Latch free CG

As seen from figure 1, the inputs are provided to the functional unit, but the clock is applied only when the enable input is logic high. The drawback of this technique is the output glitches shown in figure 2.

Figure 2: Problem of glitching

#### B. Latch Based CG

In this method the enable input is provided through a latch to overcome the glitch. This also ensures that the clock input is propagated correctly, which would not be the case with the AND gate alone, as En may have hazards that would be propagated. The technique is shown in figure 3.
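
The difference between the two techniques can be illustrated with a small behavioural model. The sketch below is written in Python rather than an HDL, samples the clock and the enable at discrete time steps, and assumes a latch that is transparent while the clock is low. The function names and the waveforms are hypothetical and are not taken from the design analysed in this paper.

```python
def latch_free_cg(clk, en):
    """Latch free CG: the gated clock is simply clk AND en."""
    return [c & e for c, e in zip(clk, en)]


def latch_based_cg(clk, en):
    """Latch based CG: en passes through a latch that is transparent only
    while clk is low (assumed polarity), then is ANDed with clk."""
    q = 0            # latch output, assumed to start at 0
    gated = []
    for c, e in zip(clk, en):
        if c == 0:   # latch transparent: follow the enable
            q = e
        gated.append(c & q)
    return gated


def pulse_widths(signal):
    """Lengths, in samples, of each run of consecutive 1s."""
    widths, run = [], 0
    for s in signal:
        if s:
            run += 1
        elif run:
            widths.append(run)
            run = 0
    if run:
        widths.append(run)
    return widths


# Hypothetical waveforms: the clock is high for 2 samples per period and
# the enable changes in the middle of a high phase (samples 3 and 7).
clk = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
en  = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]

free_widths = pulse_widths(latch_free_cg(clk, en))
latched_widths = pulse_widths(latch_based_cg(clk, en))
print("latch free pulse widths: ", free_widths)
print("latch based pulse widths:", latched_widths)
```

In this model the latch free version produces two pulses that are each only one sample wide, shorter than the two-sample high phase of the clock, which corresponds to the glitching problem described above. The latch based version produces a single full-width pulse, because the enable can only change the gate input while the clock is low.
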
Figure 3: Latch based CG

In this paper the latch free clock gating technique is applied to analyse the power as well as the average activity of the design.

#### III. IMPLEMENTATION OF ALU

The arithmetic and logic unit performs all the arithmetic and logical operations. Here a 16-bit ALU is implemented. The ALU is divided into two blocks: the arithmetic block performs the arithmetic operations, namely addition, subtraction, division and multiplication, while the logical block performs the logical operations, namely AND, OR, complement, XOR, shift, etc. The implementation is done using QuestaSim from Mentor Graphics and the synthesis is done using the Precision Synthesis tool. The RTL schematic is shown in figure 4.

Figure 4: RTL Schematic of ALU

This design provides a 16-bit output for both the arithmetic and the logical block. The status flags, namely zero, sign, carry and parity, are part of the output parameters. The design is implemented for the Spartan 6 with a speed grade of -3. The design is analysed for a frequency range from 1 MHz to 1000 MHz. The device utilization for the above implementation is tabulated in table 1.

TABLE 1: Device Utilization

| Resource        | Used | Avail | Utilization |
|-----------------|------|-------|-------------|
| IOs             | 77   | 232   | 33.19%      |
| Global Buffers  | 1    | 16    | 6.25%       |
| LUTs            | 86   | 9112  | 0.94%       |
| CLB Slices      | 22   | 2278  | 0.97%       |
| Dffs or Latches | 34   | 18224 | 0.19%       |

From table 1 it can be observed that the inputs/outputs are the ones with the maximum utilization, which depends on the number of inputs. The deciding components for the average fanout and the average activity, however, are the FFs, LUTs and the signals.

#### IV. RESULTS

The simulation of the 16-bit arithmetic and logic functional unit is performed using QuestaSim.
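
For reference, the behaviour being simulated can be approximated by a small software model. The sketch below is a hypothetical Python model of a 16-bit ALU with zero, sign, carry and parity flags. The operation names, the borrow convention for subtraction and the even-parity convention are assumptions, and multiplication, division and shift are left out for brevity.

```python
MASK = 0xFFFF  # 16-bit data path


def alu16(op, a, b=0):
    """Return (result, flags) for a hypothetical 16-bit ALU operation."""
    a &= MASK
    b &= MASK
    if op == "add":
        full = a + b
        carry = full >> 16          # carry out of bit 15
    elif op == "sub":
        full = a - b
        carry = int(a < b)          # assumed convention: carry flags a borrow
    elif op == "and":
        full, carry = a & b, 0
    elif op == "or":
        full, carry = a | b, 0
    elif op == "xor":
        full, carry = a ^ b, 0
    elif op == "not":
        full, carry = ~a, 0         # complement of operand a
    else:
        raise ValueError(f"unsupported operation: {op}")

    result = full & MASK
    flags = {
        "zero": int(result == 0),
        "sign": result >> 15,       # most significant bit of the result
        "carry": carry,
        # assumed convention: 1 when the result has an even number of 1 bits
        "parity": int(bin(result).count("1") % 2 == 0),
    }
    return result, flags


# Example: 0xFFFF + 1 wraps around to zero and sets the carry flag.
wrap_result, wrap_flags = alu16("add", 0xFFFF, 1)
print(hex(wrap_result), wrap_flags)
```

A model of this kind can serve as a reference against which simulated waveforms are compared, although the flag conventions would need to be matched to the actual RTL.
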
The simulated waveform is shown in figure 5.

Figure 5: Simulation of 16-bit ALU using clock gating

The simulation results using the latch free clock gating technique, shown in figure 5, exhibit glitches whenever there is a transition at the enable input.

#### A. Average Fanout

Fan-out is defined as the maximum number of inputs that a basic gate can drive. The fan-out limit of an electronic gate is determined by its current drive capability. At times, after logic minimization, the circuit may require a fan-out that is beyond the maximum fan-out of the available gates. One way out is to add extra buffer stages, but this increases the number of devices. Fan-out is also defined as the ratio of the external load capacitance to the input capacitance. The average fanout for the 16-bit ALU with clock gating (WCG) and without clock gating (W/O CG) is tabulated in table 2.

TABLE 2: Average Fan-out

| Module     | FF (W/O CG) | FF (WCG) | LUT (W/O CG) | LUT (WCG) | Signal (W/O CG) | Signal (WCG) | Carry (W/O CG) | Carry (WCG) |
|------------|-------------|----------|--------------|-----------|-----------------|--------------|----------------|-------------|
| Top Module | 2.11        | 1.6      | 1.15         | 2.02      | 3               | 2.45         | 5.68           |             |
| Arithmetic | 2.11        | 2.24     | 1            | 1.82      | 1.39            | 1.8          | 5.68           |             |
| Logical    |             | 1        | 1.45         | 2.49      | 1.35            | 3.7          |                |             |

From the table it is observed that the average fan-out of the FFs and signals of the arithmetic block is higher than that of the logical block with clock gating, as it has to drive a larger capacitance with the addition of clock gating. The fan-out reports for all the devices utilized, with clock gating and without, are shown in figures 6 and 7 respectively.

Figure 6: Fan-out report without Clock Gating

Figure 7: Fan-out report without Clock Gating

#### B. Average Activity

Average activity describes the activity of the various components and depends on the frequency of operation. Figures 8 and 9 depict the average activity of the arithmetic and logical blocks without and with clock gating.

Figure 8: Average activity of arithmetic block

From figure 8 it is observed that the average activity of the devices with clock gating is higher than without clock gating. This implies that the addition of clock gating increases the average activity of the arithmetic block.

Figure 9: Average Activity of Logical Module

From figure 9 it is observed that the average activity of the devices reduces with clock gating. The combined effect of the two blocks is reflected in the average activity of the ALU at the top level.

#### C. Power Consumption

The power consumed by the two modules is shown in figures 10 and 11.

Figure 10: Power consumption of arithmetic module

The power of the arithmetic block increases with frequency and also with clock gating.

Figure 11: Power consumption of logic block

The power of the logical block is lower with clock gating. The net effect at the top or hierarchical level is that the power consumed is lower with clock gating.

#### V. CONCLUSION

In this paper, an effort is made to analyze the performance parameters, along with the fanout limit and the average activity, of a 16-bit ALU implemented on a Spartan 6 device. The fan-out limit for the devices, namely FFs, LUTs and signals, is analyzed. From the results obtained, it is observed that the fanout of the FFs and carry for the top and arithmetic modules remains the same, but it varies for the LUTs and signals. The fan-out limit is high for the devices using clock gating as they have to drive a larger number of outputs. The second parameter analyzed is the average activity of the top, arithmetic and logical modules. The average activity of the LUTs reduces by nearly 10% and that of the signals by 30% with clock gating.
This is also reflected in the reduction in power of the logical block by 28% at 100 MHz and by 32.5% at 500 MHz. On the other hand, the average activity of the arithmetic block is higher, and hence the power it consumes is higher. However, when the top module is viewed as a whole, the overall power consumed is comparatively lower with clock gating, and so is the switching activity. As future scope, this implementation can be carried out on a higher-end FPGA and the performance parameters analysed.

#### REFERENCES

[1]. M. Anis, S. Areibi, M. Elmasry, "Design and optimization of multithreshold CMOS (MTCMOS) circuits", IEEE Trans. on CAD of ICs, 2003.

[2]. A. Abdollahi, F. Fallah, M. Pedram, "A Robust Power Gating Structure and Power Mode Transition Strategy for MTCMOS Design", IEEE Trans. on VLSI, 2007.

[3]. L. Benini, A. Bogliolo, and G. De Micheli, "A Survey of Design Techniques for System-Level Dynamic Power Management", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 8, No. 3, June 2000.

[4]. J. P. Oliver, J. Curto, D. Bouvier, M. Ramos, and E. Boemo, "Clock gating and clock enable for FPGA power reduction", 8th Southern Conference on Programmable Logic (SPL), pp. 1-5, 2012.

[5]. J. Kathuria, M. AyouKhan, Arti Noor, "A Review of Clock Gating Technique", MIT International Journal of Electronics and Communication Engineering, Vol. 1, No. 2, Aug 2011, Pg 106-

[6]. J. Shinde and S. S. Salankar, "Clock gating-A power optimizing technique for VLSI circuits", Annual IEEE India Conference (INDICON), pp. 1-4, 2011.

[7]. J. M. Rabaey and M. Pedram, Low Power Design Methodologies, Kluwer Academic Publishers, 1996, ISBN 0-7923-9630-8.

[8]. D. Chinnery, K. Keutzer, "Closing the Power Gap between ASIC and Custom: Tools and Techniques for Low Power Design", Chapter 2 (http://www.springer.com/978-0-387-25763-1).