Repository logo

Performance driven global routing for large scale macro-cell based designs

Loading...
Thumbnail Image

Date

Journal Title

Journal ISSN

Volume Title

Abstract

Technology scaling to the sub-micron feature sizes for complementary metal oxide silicon (CMOS) devices has led to decreased device cost and increased performance, but it has singled out interconnects which do not scale well. Such rapid scaling has introduced new challenges for CMOS interconnects. The main challenge is to support interconnect optimization with minimum performance cost. Thus, new design methodologies, algorithms as well as the integration of new materials, are needed to achieve the design closure. The ability to concurrently synthesize interconnects and logic to achieve the best overall solution is the key for next generation of very large scale integration (VLSI) designs. The traditional sequential design flow implies that block design be completed prior to routing. Routing and timing failures often require changes to the block design in the early stages of the design flow. Consequently, any change to the block design will have an impact on the routing. This expensive design loop may converge slowly or may not converge to a desired solution at all. To allow a net-centric design methodology, design of interconnects needs to be an integral part of the overall design flow. Routing-driven/aware design methodologies also contribute positively to the quality of the final solution. As design rules scale further down to the deep sub-micron region, achieving the design closure will get harder due to a variety of emerging problems: · Congestion management is critical since a detour around congested areas as well as the possible coupling in congested areas has a negative impact on delay. · More accurate delay analysis is needed to adequately address the nanometer effects. · Layer assignment has to be done with tighter integration of performance-driven routing-tree construction and buffer insertion to obtain an optimal routing. · Faster and more efficient algorithms are needed to accomplish the required tasks in reasonable time. Current methods attack the above issues separately in a sub-optimal fashion. A greedy sequential router or routing net-by-net suffers from slow convergence and does not scale well with design complexity. There is a certain degree of uncertainty and unpredictability to the final routing result since its result is net order dependent. A multi-commodity flow router attempts to solve this net ordering problem by routing all the nets concurrently, but often fails to minimize congestion. More importantly, it suffers from long run-time due to the size and complexity of the optimization model, routing-tree construction algorithms also suffer from sub-optimality by considering timing-driven Steiner-tree construction, layer assignment and buffer insertion separately. Addressing these issues due to the complexity of the overall global routing problem requires careful planning of runtime-quality trade-offs. In this dissertation, a new global routing solution to address the forementioned issues is implemented and presented. The approach to the global routing problem was to construct a performance-driven routing-tree simultaneously considering layer assignment and buffer insertion, and then optimizing the congestion, within the bounding box of the net to meet the projected delay requirements, using a mixed-integer programming routing model and a novel network-flow model. The global routier utilizes a new performance-driven buffered routing-tree construction algorithm, a mixed-integer programming pre-routing stage and a new network-flow routing model to complete the task. In the first stage, we present a routing-tree construction algorithm that considers multi-objectives of performance, power and congestion concurrently. In contrast to the traditional algorithms which assume underlying routing-tree already exists, a concurrent Steiner-tree construction, layer assignment and buffer insertion algorithm is used to obtain a better overall result in terms of delay violations and routing resources. As routing-trees are being constructed, critical nets are assigned to the appropriate metal layers and/or buffers are inserted to achieve a positive slack at sinks. Congestion is measured with balanced usage of routing resources among layers. An accurate delay calculation engine using an asymptotic waveform evaluation model is used to obtain delay violations at sinks. Simultaneous buffer insertion and layer assignment tends to produce routing-trees with shorter overall length. After synthesizing the routing-trees, We use a three-stage global-routing algorithm based on mixed-integer programming and a novel network-flow model. While the performance tuned routing-trees provided by our Steiner-tree construction algorithm are routed by mixed-integer programming model considering the congestion on global level, the network-flow model based router performs more localized congestion optimization. Furthermore, we introduced various methods to improve the routing quality sin terms of cross-talk and via count. The second stage provides a rough two-bend routing and manages congestion at the global level. This is achieved by utilizing zero-slack values of mixed-integer programming to determine if a routing segment can be minimized further. Since routing and congestion information was not known to the first stage, some nets might fail to route. A layer assignment algorithm based on quadratic programming is used to further tune the layer assignment. In the third stage, we center on the congestion on optimization and via reduction by bisecting the layout hierarchically for each routing layer. A novel network-flow model, which can be solved hundreds of times faster than a mixed-integer programming model, is used to minimize the congestion through a cut line. Similar to the second stage, zero-slack flows are used to determine which routing segment can be further optimized. Via reduction is achieved by assigning cost values to the network arcs such that bends have higher cost. In the fourth stage, a series of heuristics such as edge flipping, detour and a maze router are used to clear the remaining overflows and via pad violations. The results from various test cases including a subset of the routes on a commercial 64-bit microprocessor core show that our method outperforms commercial CCT router. On average, we achieved 29% less delay violations, 40% less repeater usage on the resulting routing-trees, 20% less maximum delay violation and better congestion distribution.

Description

Rights Access

Subject

electrical engineering

Citation

Endorsement

Review

Supplemented By

Referenced By