



# iJRASET

International Journal For Research in  
Applied Science and Engineering Technology



---


**Volume: 13      Issue: XII      Month of publication: December 2025**

**DOI:** <https://doi.org/10.22214/ijraset.2025.76268>


# An Early Clock Based Multi Tap Flexible H-Tree for High Performance CPU Design for Achieving Best Cross Corner Scaling

Mukul Anand<sup>1</sup>, Nikunj Tripathi<sup>2</sup>

MDG-GPM, STMicroelectronics, Greater Noida, India

**Abstract:** High-performance designs need a robust clock tree that achieves low skew, low insertion delay, and minimum power. Conventional CTS does not provide an optimal solution for these goals. The proposed methodology enables an early-clock-based flow for multi-tap flexible H-tree clock tree generation. The Early Clock Flow builds a preliminary clock tree during placement, which helps estimate clock-gate enable timing and leads to better placement. The multi-tap flexible H-tree gave the best cross-corner scaling as well as electrical and geometrical symmetry.

**Keywords:** Clock Jitter, Multi-Source CTS (Clock Tree Synthesis)

## I. INTRODUCTION

A customized clock tree synthesis (CTS) structure is typically designed for better cross-corner scaling. It combines large drivers with low-RC routing layers (the top metal layers) to reduce clock latency. Widely used structures of this kind are the H-tree and an H-tree driving a clock mesh. The *Flexible H-Tree* form of CTS provides the *electrically symmetric* buffering and balanced wire length benefits of an H-tree but relaxes the requirement to be *geometrically symmetric*, enabling automated synthesis even in floorplans with placement restrictions. *Multi-Tap Clock Tree Synthesis*, also known as Multi-Source Clock Tree Synthesis, is fully integrated with the Flexible H-tree feature and extends regular clock tree synthesis to provide local buffering and balancing between the structured top of the tree and the clock sinks.

In the place-and-route (PnR) implementation flow, clock tree cells are not inserted during the placement stage. The PnR tool therefore cannot account for the routing congestion caused by adding clock tree cells, and clock-gating timing paths are estimated inaccurately. Early Clock Flow (ECF) inserts a preliminary clock tree during the placement optimization stage, using fast clock tree clustering and annotating the resulting clock latencies for timing optimization. Having a clock tree at the placement stage normally results in better standard-cell placement and congestion estimation. ECF also includes clock-gating timing path estimation. This can improve timing because pre-CTS optimization has more timing impact than post-CTS optimization. ECF uses Concurrent Clock and Data Optimization (CCOpt) with useful-skew techniques, performing both late and early skewing.

This paper uses a CPU design with 5 million gates and a 1.2 GHz target frequency. The block size is 1000 um x 1000 um. The Cadence Innovus Flexible H-Tree with the Multi-Tap feature was employed. This feature has a shorter CTS run time while giving better electrically symmetric buffering and balanced wire length, which help achieve low skew and a more geometrically symmetric clock tree.

The results show that a multi-tap flexible H-tree used in the pre-CTS and CTS stages gives better results for high-frequency designs such as CPUs running at 1.2 GHz and beyond.

## II. CLOCK TREE SYNTHESIS STRATEGY

### A. Overview of a Flexible H Tree

An H-tree alone does not replace a conventional clock tree. Conventional CTS is still required to complete the buffering and balancing between the sinks of the H-tree and the clock sinks. Fig. 2 shows the logical structure of an H-tree. The root pin is on a cell instance or is an external input port; it is the starting point from which the signal is distributed across the H-tree. The output pins of the H-tree sinks (in red) serve as multiple sources for multi-tap CTS, which synthesizes the sub-trees underneath each tap.

A regularly spaced grid of sinks leads to a more efficient H-tree because of the need for electrical symmetry. The clock tree synthesis algorithm exploits the regularity of a sink grid when determining the H-tree topology.
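To make the role of grid regularity concrete, the sketch below computes the tap locations of an idealized, geometrically symmetric H-tree on a 1000 um x 1000 um block (the block size quoted in the Introduction). It is a minimal illustration only: the function names `htree_taps` and `root_to_tap_length`, the two-level depth, and the span-halving scheme are assumptions made for this example and do not represent the Innovus flexible H-tree algorithm, which relaxes geometric symmetry.

```python
def htree_taps(cx, cy, half_w, half_h, levels):
    """Return tap (leaf) coordinates of an ideal H-tree centred at (cx, cy).

    Each level draws one 'H': the centre fans out to four corners at
    (+/- half_w, +/- half_h), and the recursion halves both spans.
    """
    if levels == 0:
        return [(cx, cy)]
    taps = []
    for dx in (-half_w, half_w):
        for dy in (-half_h, half_h):
            taps += htree_taps(cx + dx, cy + dy,
                               half_w / 2, half_h / 2, levels - 1)
    return taps


def root_to_tap_length(half_w, half_h, levels):
    """Manhattan wire length from the root to any tap (equal for every tap)."""
    return sum((half_w + half_h) / 2 ** i for i in range(levels))


# Hypothetical two-level H-tree on a 1000 um x 1000 um block.
taps = htree_taps(500.0, 500.0, 250.0, 250.0, levels=2)
length = root_to_tap_length(250.0, 250.0, levels=2)
```

In this idealized case the taps fall on a regular 4 x 4 grid and every root-to-tap route has the same length, which is the property that a regular sink grid makes easy to achieve.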



**Cross-corner scaling:** The figure depicts a conventional clock tree that has been balanced primarily for a slow corner. The tree is split into upper and lower sections. Conventional CTS achieves a balanced tree through a mix of cell insertion, sizing, and wire length adjustment, which may take place in both the upper and lower sections of the tree. When the tree is timed in a fast corner, the delays of different cell sizes or cell types may scale differently from one another and differently from the RC delay of the connecting wires, which introduces skew. Depending on the locality of timing paths and the purpose of the delay corners, this can make setup/hold timing closure harder.

In the case of H-tree CTS, the top of the tree is electrically symmetric and therefore stays balanced, with ideally zero skew, over all delay corners. The skew at the sinks is correspondingly reduced in the fast corner when compared with conventional CTS.
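The cross-corner effect can be illustrated with a small numeric sketch. All numbers here are hypothetical: the component delays, the cell type names `buf_x` and `buf_y`, and the slow-to-fast scale factors in `FAST_SCALE` are assumptions chosen only to show the mechanism, not values from the design in this paper.

```python
# Hypothetical delay scale factors from the slow corner to the fast corner.
FAST_SCALE = {"buf_x": 0.55, "buf_y": 0.60, "wire": 0.85}


def branch_delay(branch, scale=None):
    """Sum the component delays (ps) of one branch, scaled per component type."""
    return sum(d * (scale[kind] if scale else 1.0) for kind, d in branch)


def skew(branches, scale=None):
    """Difference between the slowest and fastest branch."""
    delays = [branch_delay(b, scale) for b in branches]
    return max(delays) - min(delays)


# Two branches balanced in the slow corner with different cells and wires.
branch_a = [("buf_x", 200.0), ("wire", 40.0)]   # 240 ps in the slow corner
branch_b = [("buf_y", 160.0), ("wire", 80.0)]   # 240 ps in the slow corner

conventional = [branch_a, branch_b]   # balanced by mixing cells and wire
symmetric = [branch_a, branch_a]      # electrically identical branches

slow_skew_conv = skew(conventional)
fast_skew_conv = skew(conventional, FAST_SCALE)
fast_skew_sym = skew(symmetric, FAST_SCALE)
```

With these assumed numbers the conventional pair is perfectly balanced in the slow corner but shows about 20 ps of skew in the fast corner, while the electrically identical pair stays at zero skew in both corners.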



### B. Early Clock Flow Based Placement

The ECF enables the tool to measure and optimize for the impact of multi-tap CTS clock gate cloning earlier in the flow when more powerful optimization transforms such as pre-CTS skewing, critical path placement changes, and multi-bit flop optimization are available.



After the initial clock tree is built, several iterations of incremental placement, timing optimization, useful skewing, congestion repair, and power optimization follow. The optimization step itself contains several useful-skew passes.
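The idea behind a useful-skew pass can be sketched with the standard setup-slack relation. The example is a simplified assumption: the three-flop pipeline, the data delays, the 0.03 ns setup time, and the 0.10 ns latency adjustment are hypothetical, hold checks are ignored, and only the 1.2 GHz clock frequency comes from this paper.

```python
CLOCK_PERIOD_NS = 1.0 / 1.2  # 1.2 GHz target frequency, about 0.833 ns


def setup_slack(data_delay, launch_latency, capture_latency, setup_time=0.03):
    """Setup slack (ns) = period + (capture - launch latency) - data delay - setup."""
    return (CLOCK_PERIOD_NS + (capture_latency - launch_latency)
            - data_delay - setup_time)


def stage_slacks(latency_ff2):
    """Slacks of FF1->FF2 and FF2->FF3 when only FF2's clock latency changes."""
    s12 = setup_slack(0.90, launch_latency=0.0, capture_latency=latency_ff2)
    s23 = setup_slack(0.60, launch_latency=latency_ff2, capture_latency=0.0)
    return s12, s23


balanced = stage_slacks(0.0)   # zero-skew clocks: the first stage fails setup
skewed = stage_slacks(0.10)    # late-skew FF2 by 0.10 ns: both stages pass
```

Delaying the clock of the middle flop borrows time from the stage with positive slack and gives it to the failing stage. Applying this before CTS lets placement and optimization see the resulting latencies.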

A basic sample script for the ECF is shown below:

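```
# Use 8 local CPUs and load the design database
set_multi_cpu_usage -local_cpu 8
read_db cpu_design.enc

# Enable the Early Clock Flow
set_db design_early_clock_flow true

# Cell lists are stored in Tcl variables so that $name expands below
set clock_buffer_cells {HDBLV16_BUFSKR_16 HDBLV16_BUFSM_12 HDBLV16_BUFSKF_16 HDBLV16_BUFSINV_10 HDBLV16_INV_S_10 HDBLV16_INV_S_12 HDBLV16_INV_S_14}
set clock_inverter_cells {HDBLV16_INV_S_10 HDBLV16_INV_S_12 HDBLV16_INV_S_14}
set icgCells {HDBLV16_CKGTPLT_VTOP1_16 HDBLV16_CKGTPLT_V7_12 HDBLV16_CKGTPLT_V5_16}

# Generate and load the clock tree specification
create_ccopt_clock_tree_spec -file ccopt.spec
source ccopt.spec

# Restrict CTS to the chosen buffer, inverter, and clock-gating cells
set_db cts_buffer_cells $clock_buffer_cells
set_db cts_inverter_cells $clock_inverter_cells
set_db cts_clock_gating_cells $icgCells

# Build the flexible H-trees and set the multi-source balancing property
synthesize_ccopt_flexible_htrees
set_ccopt_property extract_balance_multi_source_clocks true

# Placement optimization (with ECF enabled), followed by CTS
place_opt_design
ccopt_design
```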

### C. Advantage of Multi-Tap H-Tree CTS over a Normal H-Tree

By default, flexible H-tree synthesis inserts symmetry buffers (which may be inverters) to ensure electrical symmetry. The example below shows a flexible H-tree that contains two symmetry buffers where three sinks are missing from a tree that would otherwise be expected to have eight [11]. The symmetry buffers ensure a balanced pin load and routing structure (that is, electrical symmetry).



Multi-tap CTS is, quite simply, regular CTS/CCOpt. The multi-tap functionality is enabled by the existence of one or more clock tree source groups.

CTS performs multi-tap assignment, clones cells if required, and places the clones as part of regular CTS placement and buffering. Balancing takes place according to the skew group definitions. Typically, a single skew group balances all sinks under the H-tree together.
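As a rough intuition for multi-tap assignment, the sketch below groups clock sinks under their nearest tap using Manhattan distance. This nearest-tap rule is an assumption made purely for illustration: the paper does not describe the tool's actual assignment algorithm, which also handles clock gate cloning and balancing. The function name `assign_sinks_to_taps` and all coordinates are hypothetical.

```python
def assign_sinks_to_taps(sinks, taps):
    """Map each tap index to the sinks whose nearest tap (Manhattan) it is."""
    groups = {i: [] for i in range(len(taps))}
    for sx, sy in sinks:
        nearest = min(
            range(len(taps)),
            key=lambda i: abs(sx - taps[i][0]) + abs(sy - taps[i][1]),
        )
        groups[nearest].append((sx, sy))
    return groups


# Hypothetical tap points and flop locations, in um.
taps = [(250, 250), (250, 750), (750, 250), (750, 750)]
sinks = [(100, 120), (300, 700), (800, 200), (760, 900), (240, 260)]

groups = assign_sinks_to_taps(sinks, taps)
```

Each group would then be buffered and balanced as its own sub-tree beneath the corresponding tap, with a shared skew group keeping all sinks aligned.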



### D. Image debug output details

To aid debugging, flexible H-tree synthesis can output PNG format image files that represent the synthesis algorithm view of the floorplan, sink grid, and H-tree.

| Color/Shape      | Purpose                                                                   |
|------------------|---------------------------------------------------------------------------|
| white grid       | The synthesis grid                                                        |
| red              | Grid points that are blocked for trunk cell placement                     |
| orange           | Grid points that are blocked for final cell placement                     |
| red-orange       | Grid points that are blocked for both trunk and final cell placement      |
| yellow circle    | Root pin                                                                  |
| yellow cross     | Target location for an H-tree sink to be inserted                         |
| yellow rectangle | Sink area, as per `-sink_grid sink_area` or user specified area with `-sinks` |
| brown rectangle  | Sink grid bounding box if `-sink_grid_box` is specified                   |
| green/blue       | Intended edges of the synthesized H-tree (`<htree_name>.tree.png` only)   |
| yellow dots      | H-tree drivers (`<htree_name>.tree.png` only)                             |

## III. EXPERIMENTAL SETUP AND RESULTS

H-tree synthesis operates on an internal synthesis grid and all features are aligned to that grid.

Image 1: Proposed H-tree layout for the current design and the locations of the drivers.

Image 2: Floorplan and the sink grid.

Image 3: Flexible H-tree with multi-tap for one clock.

Image 4: Design view with the proposed CTS.



Table 1: Timing numbers for the different placement experiments (WNS: worst negative slack; TNS: total negative slack; FEP: failing endpoints)

| Experiment                         | Placement (WNS/TNS/FEP) | CTS Opt (WNS/TNS/FEP) |
|------------------------------------|-------------------------|-----------------------|
| No ECF                             | -0.282/-1301.121/12849  | -0.291/-1758/21475    |
| ECF with Conventional CTS          | -0.170/-892.432/8817    | -0.220/-965.79/9287   |
| ECF with Multi Tap Flexible H tree | -0.149/-465.791/5347    | -0.182/-519.79/6321   |
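The relative gain of the multi-tap flexible H-tree over conventional CTS, both run with ECF, can be derived directly from the CTS Opt column of Table 1. The short sketch below does that arithmetic on the magnitudes of the reported values; the helper name `reduction_pct` and the dictionary layout are illustrative choices.

```python
# CTS Opt column of Table 1 as magnitudes: (WNS, TNS, FEP).
cts_opt = {
    "ecf_conventional": (0.220, 965.79, 9287),
    "ecf_multitap_htree": (0.182, 519.79, 6321),
}


def reduction_pct(before, after):
    """Percentage reduction from 'before' to 'after'."""
    return 100.0 * (before - after) / before


base = cts_opt["ecf_conventional"]
new = cts_opt["ecf_multitap_htree"]
improvement = {
    name: reduction_pct(b, n)
    for name, b, n in zip(("WNS", "TNS", "FEP"), base, new)
}
```

On these numbers the multi-tap flexible H-tree reduces WNS by roughly 17%, TNS by roughly 46%, and the number of failing endpoints by roughly 32% after CTS optimization.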

Table 2: Clock summary comparison

| Clock tree (NIC_AHB_CLK_clk)  | Clock Gates | Buffers | Inverters | Max length (um) | Standard cell area (um^2) |
|-------------------------------|-------------|---------|-----------|-----------------|---------------------------|
| Conventional CTS              | 3419        | 0       | 4150      | 245.38          | 4770.62                   |
| Flexible H-Tree Multi-Tap CTS | 4772        | 0       | 1593      | 123.33          | 5017.02                   |

  

|                  | Latency (ns) | Skew (ns) | CTS Routing<br>Layer | CTS Cells                                                                    | Max/Min Clock<br>Level |
|------------------|--------------|-----------|----------------------|------------------------------------------------------------------------------|------------------------|
| Conventional CTS | 1.26         | 0.042     | M3-M6                | HDBLV16_INV_S_12<br>HDBLV16_INV_S_10<br>HDBLV16_INV_S_9<br>HDBLV16_INV_S_8   | 32/43                  |
| MultiTap CTS     | 0.849        | 0.036     | M5-M7                | HDBLV16_INV_S_16<br>HDBLV16_INV_S_14<br>HDBLV16_INV_S_12<br>HDBLV16_INV_S_10 | 14/24                  |

Table 3: Power comparison

| Power Comparison (mW) | Internal | Leakage | Switching | Total | Voltage (V) |
|-----------------------|----------|---------|-----------|-------|-------------|
| Conventional CTS      | 813.6    | 189.2   | 555.7     | 1559  | 0.99        |
| MultiTap CTS          | 752.1    | 172     | 615       | 1539  | 0.99        |
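The clock and power tables can be summarized as signed percentage changes from conventional CTS to multi-tap CTS. The sketch below uses only the values reported in Tables 2 and 3; the helper name `change_pct` and the data layout are illustrative. It also checks that the reported totals match the sum of the three power components.

```python
# (conventional, multi-tap) pairs from the clock summary table.
clock = {
    "latency_ns": (1.26, 0.849),
    "skew_ns": (0.042, 0.036),
    "max_length_um": (245.38, 123.33),
}

# Power values in mW from the power comparison table.
power_mw = {
    "conventional": {"internal": 813.6, "leakage": 189.2,
                     "switching": 555.7, "total": 1559},
    "multitap": {"internal": 752.1, "leakage": 172.0,
                 "switching": 615.0, "total": 1539},
}


def change_pct(before, after):
    """Signed percentage change; a negative value means a reduction."""
    return 100.0 * (after - before) / before


clock_change = {k: change_pct(*v) for k, v in clock.items()}
power_change = {
    k: change_pct(power_mw["conventional"][k], power_mw["multitap"][k])
    for k in power_mw["conventional"]
}
component_sum = {
    name: row["internal"] + row["leakage"] + row["switching"]
    for name, row in power_mw.items()
}
```

On the reported numbers, latency drops by about 33%, skew by about 14%, and maximum clock wire length by about 50%. Internal and leakage power fall by about 8% and 9%, switching power rises by about 11%, and total power falls by about 1.3%.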

## IV. CONCLUSION

Balanced CTS optimization is a significant challenge. Based on insight into clock skew and latency, this methodology is proposed to optimize CTS; it achieves better clock latency and clock power than the conventional method of building the clock tree. The proposed CTS methodology can also improve the TNS of both setup and hold timing, because better cross-corner scaling reduces the impact of OCV. This approach is also useful for reducing leakage power. Further enhancements can be made by applying this strategy to designs with more congestion and multiple power domains.

## REFERENCES

- [1] Kwangsoo Han, Andrew B. Kahng, Jiajia Li, "Generalized H-Tree Topology and Buffering for High-Performance and Low-Power Clock Distribution", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018. doi:10.1109/TCAD.2018.2889756.
- [2] Wei-Khee Loo, Kok-Siang Tan and Ying-Khai The, "A study and design of CMOS H-Tree clock distribution network in system-on-chip", 2009 IEEE 8th International Conference on ASIC. doi:10.1109/asicon.2009.5351254.
- [3] Chuan Yean Tan, Rickard Ewetz, and Cheng-Kok Koh, "Clustering of flip-flops for useful-skew clock tree synthesis", 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC). doi:10.1109/aspdac.2018.8297374.
- [4] Meng Liu, Zhiwei Zhang, Wenqin Sun, Donglin Wang, "Obstacle-aware symmetrical clock tree construction", 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS). doi:10.1109/mwscas.2017.8052973.
- [5] Q. K. Zhu, C. E. Dike, "High-Speed Clock Network Design", Kluwer Academic Publishers, 2003.
- [6] T. Lu and A. Srivastava, "Gated low-power clock tree synthesis for 3D-ICs", in Proc. ISLPED, pp. 319-322, Aug. 2014.
- [7] Chao Deng, Yici Cai, Qiang Zhou, Zhiwei Chen, "An efficient buffer sizing algorithm for clock trees considering process variations", in 6th Asia Symposium on Quality Electronic Design, Aug. 2015.
- [8] Minghao Lin, Heming Sun and Shinji Kimura, "Power-Efficient and Slew-Aware Three-Dimensional Gated Clock Tree Synthesis", 2016 IFIP/IEEE International Conference on Very Large-Scale Integration (VLSI-SoC).
- [9] Pinaki Chakrabarti, "Clock Tree Skew Minimization with Structured Routing", 2012 25th International Conference on VLSI Design.
- [10] Anand Rajaram and David Z. Pan, "Robust Chip-Level Clock Tree Synthesis", IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 6, June 2011.
- [11] Cadence Innovus User Guide, <http://www.cadence.com>


