High-speed arithmetic operations are fundamental to modern digital systems, where multi-operand addition frequently determines the critical timing path. Conventional cascaded two-operand adders result in sequential carry propagation and increased latency. This paper presents an FPGA implementation of a three-operand binary adder that employs parallel prefix carry computation to improve timing performance. The proposed architecture decouples operand compression from carry resolution, thereby reducing critical path delay. Kogge–Stone, Han–Carlson, and Ladner–Fischer prefix structures are implemented in Verilog HDL and synthesized on an Artix-7 FPGA platform. Experimental results indicate that structured prefix networks improve computational efficiency while providing scalable trade-offs among delay, area, and power for high-throughput arithmetic systems.
Introduction
Arithmetic circuits, particularly adders, are fundamental to digital systems such as processors, ALUs, and signal-processing units. Binary addition is executed frequently and often lies on the critical path, making low-latency, high-throughput adder design a key focus in VLSI and FPGA-based systems. Traditional multi-operand addition is performed using cascaded two-operand adders, each of which must propagate its own carries before the next can finish, increasing critical path delay and limiting performance for wide-word operations.
To overcome this, parallel prefix adders compute carry signals hierarchically using associative operations, reducing carry propagation depth to logarithmic with respect to operand width. Well-known topologies—Kogge–Stone, Han–Carlson, and Ladner–Fischer—balance speed, wiring complexity, and fan-out in two-operand addition. However, structured three-operand addition with FPGA validation has been less explored.
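The reason carries can be computed hierarchically is that the generate/propagate combining operator is associative, so groups of bits may be merged in any order, including as a balanced tree. The following minimal Python sketch checks this exhaustively and confirms that folding the operator over all bit positions gives the same carry-out as a sequential ripple recurrence. The function names (prefix_op, ripple_carry_out, group_generate) are illustrative and are not taken from the paper's Verilog sources.

```python
from itertools import product


def prefix_op(hi, lo):
    """Merge a more significant (G, P) group `hi` with the adjacent
    less significant group `lo`: (G_hi + P_hi*G_lo, P_hi*P_lo)."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def ripple_carry_out(g, p):
    """Sequential carry recurrence c[i+1] = g[i] + p[i]*c[i], LSB first, c[0] = 0."""
    c = 0
    for gi, pi in zip(g, p):
        c = gi | (pi & c)
    return c


def group_generate(g, p):
    """Carry-out obtained by folding prefix_op over all positions, LSB first."""
    acc = (g[0], p[0])
    for hi in zip(g[1:], p[1:]):
        acc = prefix_op(hi, acc)
    return acc[0]


# Exhaustive associativity check over all single-bit (G, P) pairs.
pairs = list(product((0, 1), repeat=2))
is_associative = all(
    prefix_op(prefix_op(a, b), c) == prefix_op(a, prefix_op(b, c))
    for a, b, c in product(pairs, repeat=3)
)

# The folded group generate equals the ripple carry-out for every 4-bit (g, p) pattern.
matches_ripple = all(
    group_generate(g, p) == ripple_carry_out(g, p)
    for g in product((0, 1), repeat=4)
    for p in product((0, 1), repeat=4)
)
```

Because the grouping order does not change the result, a prefix network is free to arrange the merges in roughly log2(n) levels rather than n sequential steps.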
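The proposed methodology introduces a three-operand binary adder with three stages:
Operand Compression Stage: Compresses the three input bits at each position i into an intermediate sum (Si′ = Ai ⊕ Bi ⊕ Ci) and an intermediate carry (Ci′ = AiBi + BiCi + AiCi), enabling independent bit-level processing. The carry Ci′ has weight 2^(i+1), so it is added at position i + 1.
Parallel Prefix Carry Computation: Global carries are resolved using hierarchical prefix networks. Generate and propagate signals are formed from each intermediate sum and the intermediate carry arriving from the position below (Gi = Si′ · Ci−1′, Pi = Si′ ⊕ Ci−1′, with C−1′ = 0) and combined with the associative prefix operator (Gk, Pk) ∘ (Gj, Pj) = (Gk + PkGj, PkPj), achieving logarithmic carry resolution depth.
Final Sum Generation: Combines the propagate signals with the resolved carries (Si = Pi ⊕ Gi−1:0, where Gi−1:0 is the group generate over positions 0 to i − 1, i.e. the carry into position i) to produce the final result.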
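The three stages can be sketched as a bit-level behavioral model in Python. This is an illustration under stated assumptions, not the paper's Verilog implementation: it assumes the compressed carry is shifted up by one position before generate and propagate signals are formed, it uses a Kogge–Stone schedule for the prefix stage, and the function name three_operand_add and the width parameter are hypothetical.

```python
def three_operand_add(a, b, c, width=16):
    """Behavioral model: compress, resolve carries with a prefix network, form the sum."""
    mask = (1 << width) - 1
    a, b, c = a & mask, b & mask, c & mask

    # Stage 1: bitwise 3:2 compression (carry-save).
    s = a ^ b ^ c                       # intermediate sum bits
    cy = (a & b) | (b & c) | (a & c)    # intermediate carry bits (weight 2^(i+1))

    # The shifted carry needs one extra position, so the prefix stage spans width + 1 bits.
    n = width + 1
    x = [(s >> i) & 1 for i in range(n)]
    y = [((cy << 1) >> i) & 1 for i in range(n)]
    g = [x[i] & y[i] for i in range(n)]  # bit generate
    p = [x[i] ^ y[i] for i in range(n)]  # bit propagate

    # Stage 2: Kogge-Stone prefix network; after the loop G[i] is the carry out of bit i.
    G, P = g[:], p[:]
    d = 1
    while d < n:
        G_next, P_next = G[:], P[:]
        for i in range(d, n):
            G_next[i] = G[i] | (P[i] & G[i - d])
            P_next[i] = P[i] & P[i - d]
        G, P = G_next, P_next
        d *= 2

    # Stage 3: sum bit i is the bit propagate XOR the carry into position i.
    bits = [p[i] ^ (G[i - 1] if i > 0 else 0) for i in range(n)]
    result = sum(bit << i for i, bit in enumerate(bits))
    return result | (G[n - 1] << n)      # append the final carry-out
```

For width-bit operands the result occupies up to width + 2 bits, which is why the model carries one extra prefix position plus a final carry-out. Comparing the function against ordinary integer addition over all 4-bit inputs is a quick way to check the stage boundaries.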
The design integrates multiple prefix topologies for comparison:
Kogge–Stone: Minimal logic depth, high interconnect density.
Han–Carlson: Hybrid approach balancing depth and wiring.
Ladner–Fischer: Flexible construction with controlled fan-out, reduced routing congestion.
The designs are tested for 16-bit and 32-bit operands, with post-synthesis evaluation of LUT/FF usage, critical path delay, maximum frequency, and power consumption.
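The structural differences among the three topologies can be made concrete by generating each network as a list of levels and counting logic depth, prefix operators, and worst-case fan-out. The sketch below is a structural model only: it says nothing about LUT mapping, routing, or measured delay on the Artix-7. It assumes n is a power of two of at least 4, models Ladner–Fischer in its minimum-depth form (often called the Sklansky structure), and counts fan-out only as the number of operators in one level that read the same less significant input. All function names are illustrative.

```python
from collections import Counter


def kogge_stone(n):
    """Each level pairs position i with position i - d, for d = 1, 2, 4, ..."""
    levels, d = [], 1
    while d < n:
        levels.append([(i, i - d) for i in range(d, n)])
        d *= 2
    return levels


def ladner_fischer(n):
    """Minimum-depth form: at level k, the upper half of each 2^(k+1) block
    reads the top position of the lower half."""
    levels, k = [], 0
    while (1 << k) < n:
        levels.append([(i, ((i >> k) << k) - 1) for i in range(n) if (i >> k) & 1])
        k += 1
    return levels


def han_carlson(n):
    """Kogge-Stone on odd positions, then one extra level to fill even positions."""
    levels = [[(i, i - 1) for i in range(1, n, 2)]]
    d = 2
    while d < n:
        levels.append([(i, i - d) for i in range(d + 1, n, 2)])
        d *= 2
    levels.append([(i, i - 1) for i in range(2, n, 2)])
    return levels


def summarize(levels, n):
    """Check that every position ends up covering bits 0..i, then report metrics."""
    lo = list(range(n))  # position i initially covers only bit i
    max_fanout = 0
    for level in levels:
        new_lo = lo[:]
        for i, j in level:
            assert lo[i] == j + 1, "merged groups must be adjacent"
            new_lo[i] = lo[j]
        lo = new_lo
        uses = Counter(j for _, j in level)
        max_fanout = max(max_fanout, max(uses.values(), default=0))
    assert all(start == 0 for start in lo), "every prefix must reach bit 0"
    return {
        "depth": len(levels),
        "operators": sum(len(level) for level in levels),
        "max_fanout": max_fanout,
    }


report = {
    (name, n): summarize(build(n), n)
    for n in (16, 32)
    for name, build in (("kogge_stone", kogge_stone),
                        ("ladner_fischer", ladner_fischer),
                        ("han_carlson", han_carlson))
}
```

Under these assumptions, a 16-position network has 4 levels and 49 operators for Kogge–Stone, 4 levels and 32 operators (with fan-out up to 8) for Ladner–Fischer, and 5 levels and 32 operators for Han–Carlson. This mirrors the qualitative trade-offs listed above between depth, wiring, and fan-out.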
The integrated compression-and-prefix architecture replaces the sequential carry chain of cascaded adders with logarithmic-depth carry resolution, providing scalable, high-speed three-operand addition suitable for performance-critical digital arithmetic systems.
Conclusion
This work presented the design and FPGA implementation of a structured three-operand binary adder based on parallel prefix carry computation. The proposed architecture integrates bit-level operand compression with hierarchical carry resolution, thereby eliminating sequential carry dependency present in conventional cascaded designs. Three established prefix topologies—Kogge–Stone, Han–Carlson, and Ladner–Fischer—were implemented and evaluated on an Artix-7 FPGA platform for 16-bit and 32-bit configurations.
Experimental results demonstrate that the Kogge–Stone architecture achieves the lowest propagation delay, making it suitable for high-performance arithmetic applications. The Han–Carlson topology provides a balanced trade-off among area, delay, and power consumption, while the Ladner–Fischer structure offers moderate area efficiency with comparatively higher delay at larger operand widths. The observed timing trends confirm the logarithmic carry resolution property of prefix-based architectures, ensuring controlled delay growth with increasing word length.
Overall, the integration of three-operand compression with structured parallel prefix networks enables scalable and efficient multi-operand addition on FPGA platforms. The proposed framework offers a practical solution for high-throughput arithmetic systems requiring improved timing performance and architectural flexibility.