## INTERNATIONAL JOURNAL FOR RESEARCH IN APPLIED SCIENCE & ENGINEERING TECHNOLOGY

Volume: 13 Issue: V Month of publication: May 2025

DOI: https://doi.org/10.22214/ijraset.2025.70153

www.ijraset.com E-mail ID: ijraset@gmail.com

ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538

# Design and Implementation of a Power-Efficient 32-bit MAC Unit Using Vedic Multiplier and 64-bit CLA with Clock Gating for FPGA-Based Applications

K. Manisha<sup>1</sup>, V. Sai Sagar<sup>2</sup>, K. Omkiran<sup>3</sup>, P. Naga Chirudeep<sup>4</sup>, G. Prem Sai<sup>5</sup>

<sup>1</sup>Assistant Professor, Dept. of ECE, Annamacharya Institute of Technology and Sciences, Tirupati, India

<sup>2, 3, 4, 5</sup>UG Student, Dept. of ECE, Annamacharya Institute of Technology and Sciences, Tirupati, India

Abstract: This study presents the design and implementation of a power-efficient 32-bit MAC unit written in Verilog HDL for FPGA applications. The architecture uses a Vedic multiplier to speed up multiplication, together with a 64-bit CLA and a 64-bit accumulator to enable fast accumulation with reduced propagation delay. To reduce power, RTL clock gating is used to eliminate unnecessary switching activity. The correctness and performance of the proposed design are checked by simulation and verification in Xilinx Vivado. Post-synthesis results for timing, resource usage, and dynamic power are presented and indicate significant improvements over traditional MAC architectures. Furthermore, the design is scalable, allowing higher bit-width computations with very few modifications.
The architecture offers better performance and energy efficiency, and a comparison with conventional MAC units indicates that the proposed MAC processing element is well suited to high-performance DSP and AI-driven FPGA applications.

Keywords: 32-bit MAC Unit, Vedic Multiplier, 64-bit CLA, Clock Gating, 64-bit Accumulator, FPGA, Xilinx Vivado, RTL design.

### I. INTRODUCTION

The demand for high-speed, energy-efficient computation in digital signal processing (DSP) applications continues to grow. Multiply-and-accumulate (MAC) units are fundamental to the computational performance of such systems and are heavily used in operations such as convolution, filtering, and machine learning. Nevertheless, classical MAC architectures suffer from high power and latency overhead, which stems from high switching activity and carry propagation in traditional adders.

This paper introduces a power-efficient MAC unit built around a high-speed multiplier based on the Urdhva Tiryagbhyam method, which computes the product in a short time. It also features a 64-bit Carry Look-Ahead Adder that speeds up addition, and clock gating that reduces power by turning off the clock to portions of the circuit that are not in use. The MAC architecture is synthesized in Xilinx Vivado for an FPGA target, making it viable for general DSP applications. Pipelining techniques are also used to achieve higher throughput and lower overall computational latency. The architecture is highly adaptable, with a straightforward transition to larger bit widths. Post-synthesis analysis shows improved power efficiency, resource utilization, and performance compared with a conventional MAC architecture.

### II. LITERATURE SURVEY

- A. Advancements in Multiply-Accumulate (MAC) Units
- 1) A study conducted in 2020 explored the use of Vedic multipliers in MAC units to enhance performance while reducing power consumption and area, making them particularly effective for DSP applications.
- 2) Clock gating has gained wide acceptance as an effective technique for dynamic power savings: it selectively turns off the clock to idle parts, decreasing overall energy consumption.
- B. Vedic Multiplier for Power Efficiency
- 1) A 2019 study investigated reversible logic gates for building a 32-bit Vedic multiplier, reporting substantial reductions in energy dissipation over conventional multipliers.
- 2) In 2022, research compared different Vedic algorithms, including the Urdhva Tiryagbhyam algorithm, to minimize area and power consumption in ASIC designs, validating Vedic multipliers as an energy-efficient solution for low-power applications, including IoT and embedded systems.
- C. 64-bit Carry Look-Ahead Adder
- 1) In 2021, a study optimized the 64-bit CLA by minimizing logic gate complexity, giving faster arithmetic with lower power consumption.
- 2) Studies in 2023 continued to improve the CLA by reducing its complexity, leading to better speed and power efficiency for high-speed MAC units.
- D. Power Optimization Using Clock Gating
- 1) Fine-grained clock gating was proposed in 2020 as a method to selectively turn off the clock signal to idle portions of the MAC unit to minimize dynamic power consumption.
- 2) A 2022 paper reported that clock gating gives power savings of as much as 35% in 32-bit MAC units without notable loss in performance.

### III. DESIGN METHODOLOGY

### A. A 32-bit Architecture

A 32-bit MAC unit has been developed using a Vedic multiplier and a Carry Look-Ahead Adder (CLA). The architecture follows a hierarchical, modular approach, decomposing the 32-bit MAC unit into smaller, more manageable components, including Vedic multipliers, N-bit adders, half adders, and full adders. The design is implemented using a top-down methodology and structural modeling, where the circuit is defined by specifying its components and their interconnections in Verilog Hardware Description Language (Verilog HDL). Figure 1 shows the primary building blocks of the 32-bit MAC unit:

- a) 32x32 Vedic Multiplier
- b) 64-bit CLA (Carry Look-Ahead Adder)
- c) 64-bit Accumulator
- d) Clock Gating Unit

### B. 32x32 Vedic Multiplier

The 32×32 Vedic multiplier uses the Urdhva Tiryagbhyam sutra of Vedic mathematics to multiply with higher speed and efficiency. The method generates and combines partial products in parallel, which results in quicker calculations than conventional multiplication methods. It lowers energy consumption and shortens propagation delay, making it efficient for DSP applications. The ability of Vedic multiplication to process in parallel makes it well suited to high-speed arithmetic. As a result, it finds extensive application in ASIC and FPGA design and is a dependable option for real-time systems.

Figure 1. 32-bit MAC unit

### C. 64-bit CLA (Carry Look-Ahead Adder)

Adders are fundamental building blocks of digital circuits, and their delay is usually dominated by carry propagation, as in the ripple-carry adder.
The CLA increases the speed of multi-bit addition by computing the carries with carry-lookahead logic instead of waiting for them to ripple through each stage. This logic is important for processors and DSP systems that must perform arithmetic at high speed. The CLA is more efficient for long-word computations than other adder architectures, which makes it especially suitable for real-time digital processing. While it involves greater design complexity, it provides superior delay reduction and power efficiency.

### D. 64-bit Accumulator

A 64-bit accumulator is a sequential digital circuit that retains and updates the results of arithmetic operations over several cycles. Serving as both a register and an adder, it holds values for iterative operations such as multiplication and filtering. This block is used in MAC processors, digital filters, and signal processing, where it allows rapid accumulation of sums with negligible delay. For high-speed computation, a 64-bit accumulator provides greater precision and less truncation error, increasing overall system performance.

### E. Clock Gating Unit

A clock gating unit implements a power-saving technique that selectively disables the clock signal for inactive circuit sections, reducing unnecessary switching activity. Widely used in low-power digital designs, this approach helps lower dynamic power consumption in ASICs and FPGAs. Clock gating improves the energy efficiency of the design while retaining its functional behavior, because the clock is delivered only to components that are active. This saves power in battery-powered devices and lowers power dissipation in high-performance computing systems. In MAC units, clock gating improves power efficiency without compromising computational speed.

### IV. FPGA IMPLEMENTATION OF A VEDIC MULTIPLIER

Deploying a Vedic multiplier on an FPGA takes advantage of the Urdhva Tiryagbhyam (Vertically and Crosswise) algorithm to enhance processing speed and efficiency through parallel partial-product processing. In contrast to conventional multiplication methods, it reduces propagation delay and resource usage, which makes it apt for FPGA-based designs. The architecture is described in a hardware description language such as VHDL or Verilog and implemented on FPGA platforms such as Xilinx or Intel through synthesis and optimization. By combining parallel computation with pipelining, the Vedic multiplier maintains low latency along with high throughput, so it is well suited to real-time applications in cryptography, digital signal processing, and image processing.

### A. Design and Verification of RTL (Register Transfer Level)

A high-efficiency 32-bit multiply-accumulate (MAC) engine is proposed that performs high-speed binary multiplication. The modular architecture scales to larger operand sizes and allows different multiplier and adder modules to be integrated with minimal modifications, enabling flexible configurations. To optimize multiplication performance, the design uses the Vedic multiplication approach, specifically the Urdhva Tiryagbhyam technique, which combines vertical and crosswise computations.

In a full adder circuit, the Carry Generate (G) and Carry Propagate (P) variables are essential for simplifying carry calculations. These variables are particularly important in the design of high-speed adders, such as the carry-lookahead adder.
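
As a rough illustration of how generate and propagate signals let every carry be computed directly from the inputs, the following Python sketch models a 4-bit carry-lookahead block with the standard definitions $G_i = A_i B_i$ and $P_i = A_i \oplus B_i$. The function names `cla4_carries`, `cla4_add`, and `cla64_add` are hypothetical, and building the 64-bit adder from sixteen 4-bit blocks with the carry rippling between blocks is an assumption made for brevity; the paper does not state how its 64-bit CLA is partitioned.

```python
def cla4_carries(a, b, cin=0):
    """Return [C1, C2, C3, C4] for 4-bit operands a and b.

    Each carry is written as a flat sum of products of G, P and Cin,
    so no carry depends on the previous carry having been computed.
    """
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate: A_i AND B_i
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # propagate: A_i XOR B_i
    c1 = g[0] | (p[0] & cin)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & cin)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & cin))
    return [c1, c2, c3, c4]


def cla4_add(a, b, cin=0):
    """Return (sum, carry_out) of two 4-bit operands using S_i = P_i XOR C_i."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]
    carries = [cin] + cla4_carries(a, b, cin)  # carries[i] is the carry into bit i
    total = 0
    for i in range(4):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[4]


def cla64_add(a, b, cin=0):
    """64-bit add built from sixteen 4-bit CLA blocks (assumed partitioning).

    The carry ripples between blocks here; a real design may add a second
    lookahead level across blocks instead.
    """
    total, carry = 0, cin
    for block in range(16):
        shift = 4 * block
        s, carry = cla4_add((a >> shift) & 0xF, (b >> shift) & 0xF, carry)
        total |= s << shift
    return total, carry
```

For example, `cla4_add(0b1011, 0b0110)` returns `(1, 1)`, that is, 11 + 6 = 17 = 16 + 1, with the carry-out holding the fifth bit.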

$$P_i = A_i \oplus B_i, \qquad G_i = A_i B_i$$

The resulting sum and carry can be expressed in terms of the carry generate $G_i$ and carry propagate $P_i$ as

$$S_i = P_i \oplus C_i, \qquad C_{i+1} = G_i + P_i C_i$$

The term $G_i$ represents carry generation, which occurs when both $A_i$ and $B_i$ are 1, independent of the input carry. On the other hand, $P_i$ signifies carry propagation, determining whether the carry is passed from $C_i$ to $C_{i+1}$.

The carry output at each stage of a 4-stage Carry Look-Ahead Adder can be written as

$$C_1 = G_0 + P_0 C_{in}$$

$$C_2 = G_1 + P_1 C_1 = G_1 + P_1 G_0 + P_1 P_0 C_{in}$$

$$C_3 = G_2 + P_2 C_2 = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{in}$$

$$C_4 = G_3 + P_3 C_3 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_{in}$$

Figure 2. Schematic depiction of the design in Xilinx

Figure 3. 32-bit Multiplication performed for Verification of MAC Unit

### B. Logic Synthesis

The stage that follows logic verification in the design flow is logic synthesis. During this phase, the high-level RTL description is converted into a gate-level representation using a standard cell library and the specified design constraints. Xilinx Vivado was used for this process. A comparative evaluation of different adder architectures gave the following worst-case delays:

- Ripple Carry Adder (RCA): 55 ns
- Carry Look-Ahead Adder (CLA): 15 ns
- Carry-Save Adder (CSA): 20 ns

The synthesis results for the Vedic 32x32 module, which serves as the top module in this design, gave the following resource utilization:

| Resource | Utilization | Available | Utilization % |
|----------|-------------|-----------|---------------|
| LUT | 1471 | 133800 | 1.10 |
| FF | 64 | 269200 | 0.02 |
| IO | 131 | 400 | 32.75 |
| BUFG | 2 | 32 | 6.25 |

Table 1. Resource Utilization Report

### V. RESULTS AND FUTURE WORK

### A. Result Analysis

The power consumption and resource utilization of the new design were assessed. The total power consumption was 152.061 W, of which 99% was dynamic power (150.387 W) and 1% static power (1.674 W). Of the dynamic power, 40% came from I/O (59.760 W), 33% from logic (49.993 W), and 27% from signals (40.635 W). The static power came entirely from the PL static component (100%, 1.674 W). Resource utilization was also low, with 1.63% LUT usage (2175 out of 133800), 0.02% FF usage (64 out of 269200), 32.75% I/O usage (131 out of 400), and 3.13% BUFG usage (1 out of 32). These findings indicate that the design is efficient in logic and area usage, with I/O being the largest contributor to power consumption. Further optimization of the I/O could improve overall energy efficiency while keeping hardware overhead minimal. Additionally, the low logic utilization implies that the design can be extended to handle more complex operations with little compromise in performance, making it well suited to high-efficiency computing environments.

### B. Future Scope of the Work

Future development of the 32-bit low-power MAC unit with Vedic multiplier, 64-bit CLA, and clock gating can take several directions. One is to reduce power further with techniques such as Dynamic Voltage and Frequency Scaling and Multi-Threshold CMOS (MTCMOS), which lower energy consumption without loss of performance. Furthermore, parallel processing techniques can increase throughput and overall efficiency.
Expanding the MAC unit to 64-bit or 128-bit operands would increase its capacity for complex mathematical computations, making it a better fit for high-level DSP and other resource-intensive tasks. Research into emerging technologies such as quantum-dot cellular automata or memristor-based circuits may further improve power efficiency and data processing speed. As MAC units are essential for AI and ML applications, it would be especially beneficial to improve the design of inference engines, particularly in power-limited settings such as mobile and edge computing. Moving to smaller process nodes, such as 7 nm or 5 nm, could improve the performance-to-power ratio, consistent with contemporary ASIC platforms. Lastly, improving design scalability would support adoption across different devices, ranging from smartphones to IoT systems, making the design more adaptable and impactful.

Figure 4. Power Analysis

### VI. CONCLUSION

This work developed an optimized, energy-efficient 32-bit MAC unit with a multiplier based on Vedic mathematics and a 64-bit Carry Look-Ahead Adder (CLA), using clock gating. The architecture combines design simplicity with higher computation speed, low propagation delay, and low power consumption for high-performance applications. Through clock gating, the design reduces dynamic power consumption by inhibiting clock signals in idle areas, improving system-wide energy efficiency. Written in Verilog HDL and synthesized in Xilinx Vivado, the design shows significant speed gains at reduced power consumption relative to standard MAC blocks. The architecture offers a scalable, high-performance solution for contemporary computing systems, with potential for additional optimization through pipelining and more sophisticated power management methods.
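
To make the overall data path concrete, here is a minimal behavioural sketch in Python of the blocks summarized above: a Vedic-style multiplier feeding an enable-gated 64-bit accumulator. It is a software model under stated assumptions, not the authors' Verilog. The names `vedic_mul2`, `vedic_mul`, and `Mac32` are hypothetical; the recursive 32 → 16 → 8 → 4 → 2 decomposition is a common way to build Urdhva Tiryagbhyam multipliers but is assumed here; clock gating is modelled simply as "no register update when `enable` is low"; and the accumulator is assumed to wrap modulo $2^{64}$.

```python
MASK64 = (1 << 64) - 1


def vedic_mul2(a, b):
    """2x2 Urdhva Tiryagbhyam multiplier on 2-bit operands (base block)."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    s0 = a0 & b0                 # vertical: low bits
    x, y = a1 & b0, a0 & b1      # crosswise products
    s1 = x ^ y                   # half adder sum
    c1 = x & y                   # half adder carry
    v = a1 & b1                  # vertical: high bits
    s2 = v ^ c1
    s3 = v & c1
    return s0 | (s1 << 1) | (s2 << 2) | (s3 << 3)


def vedic_mul(a, b, width=32):
    """Multiply two width-bit operands from four half-width Vedic blocks.

    width is assumed to be a power of two, at least 2.
    """
    if width == 2:
        return vedic_mul2(a, b)
    half = width // 2
    mask = (1 << half) - 1
    al, ah = a & mask, a >> half
    bl, bh = b & mask, b >> half
    ll = vedic_mul(al, bl, half)  # low x low
    lh = vedic_mul(al, bh, half)  # crosswise terms
    hl = vedic_mul(ah, bl, half)
    hh = vedic_mul(ah, bh, half)  # high x high
    return ll + ((lh + hl) << half) + (hh << width)


class Mac32:
    """Behavioural 32-bit MAC with a 64-bit accumulator and an enable gate."""

    def __init__(self):
        self.acc = 0

    def clock(self, a, b, enable=True, clear=False):
        """Model one clock edge and return the accumulator value.

        With enable low the (gated) clock never reaches the register,
        so the accumulator simply holds its value.
        """
        if not enable:
            return self.acc
        if clear:
            self.acc = 0
        else:
            self.acc = (self.acc + vedic_mul(a, b)) & MASK64
        return self.acc
```

A short usage example: after `mac = Mac32()`, the calls `mac.clock(3, 4)` and `mac.clock(5, 6)` return 12 and 42, and a following `mac.clock(7, 7, enable=False)` still returns 42 because the gated cycle leaves the accumulator untouched.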
### VII. ACKNOWLEDGMENT

This study was carried out at Annamacharya Institute of Technology & Sciences, Electronics Laboratory, Tirupati, India. We thank the management of the institution for providing the resources and facilities required for this study. We also appreciate K. Manisha, lab in-charge, for her guidance and support during the project. Her technical expertise and motivation played a crucial role in the successful execution of this work.