

The effect of the optimiser on the buffer design is illustrated in Fig. 2 for various types of buffer designed to drive an end load of 100,000. The interstage ratio $\beta_s$ is the ratio of the size of a stage to the size of the preceding stage, as illustrated in the inset of Fig. 2. The final value on each curve is the ratio of the capacitance of the end load to the capacitance of the last stage, and is referred to as the 'end ratio' $\beta_e$. The fastest buffer is a fixed taper buffer that has been optimised for speed without any regard for area. Its interstage ratio is constant throughout the buffer, with a value of 2.61, which gives a normalised area of 62082 and a normalised delay of 31.32. The optimised taper buffer has been designed for minimum area under the constraint that the normalised delay is 41.42, compared with 31.32 for the fastest buffer. Its interstage ratios increase smoothly up to the value of the end ratio, and it uses fewer stages, so its normalised area is only 7021, roughly a factor of nine (62082/7021 ≈ 8.8) smaller than that of the fastest buffer. For comparison, a variable taper buffer was designed under the same delay constraint. It uses even fewer stages, yet its area (8936) is considerably greater than that of the optimised taper buffer (7021).
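The figures quoted for the fastest buffer can be reproduced with a very simple model, which we assume here because the delay and area expressions are not restated in this section: the first stage has unit size, the normalised delay is the sum of the interstage ratios, and the area is the sum of the stage sizes. The hypothetical helpers `fixed_taper` and `fastest_fixed_taper` below search over integer stage counts for the minimum-delay uniform taper.

```python
def fixed_taper(load: float, stages: int) -> tuple[float, float, float]:
    """Return (beta, delay, area) for a uniform-taper buffer.

    Assumed model (not taken from the Letter): the first stage has unit
    size, every stage is beta times larger than the previous one, each
    stage's delay is proportional to its fan-out ratio, and area is
    proportional to the sum of the stage sizes.
    """
    beta = load ** (1.0 / stages)                  # same ratio everywhere, beta_e = beta
    delay = stages * beta                          # sum of N equal interstage ratios
    area = sum(beta ** k for k in range(stages))   # sizes 1, beta, ..., beta**(N-1)
    return beta, delay, area


def fastest_fixed_taper(load: float, max_stages: int = 40):
    """Pick the integer stage count that minimises delay, ignoring area."""
    best = min(range(1, max_stages + 1), key=lambda n: fixed_taper(load, n)[1])
    return (best, *fixed_taper(load, best))


if __name__ == "__main__":
    n, beta, delay, area = fastest_fixed_taper(100_000)
    print(f"N = {n}, beta = {beta:.2f}, delay = {delay:.2f}, area = {area:.0f}")
```

Under this model the search selects 12 stages with β ≈ 2.61 and a delay of about 31.32, matching the values quoted above. The computed area (about 62,100) is within 0.1% of the quoted 62082; the small difference presumably reflects slightly different area bookkeeping in the Letter.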



Fig. 3 Area saving of optimised taper buffer compared with fixed taper buffer in which  $\beta_n = \beta_e = \text{constant}$

Traditional fixed taper buffers can also be designed to fit into a limited silicon area. One approach is to increase all the interstage ratios, including the end ratio, by the same amount, so that fewer stages are needed and the area is reduced. Fig. 3 shows the area saving that can be obtained with the optimised taper buffer compared with this type of area constrained fixed taper buffer. The saving is ~50% for most of the buffers considered, which is even larger than the savings shown in Fig. 1.
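The area constrained fixed taper buffer of Fig. 3 can be sketched under the same simple model as before (an assumption: delay equal to the sum of the interstage ratios, area equal to the sum of the stage sizes, unit first stage). With a uniform taper, using fewer stages raises the common ratio and lowers the area, so the smallest stage count that still meets the delay budget gives the smallest area. The function name `uniform_taper_for_delay` is hypothetical.

```python
def uniform_taper_for_delay(load: float, max_delay: float, max_stages: int = 40):
    """Smallest-area uniform-taper buffer whose delay stays within max_delay.

    Assumed model: beta = load**(1/N), delay = N * beta and
    area = 1 + beta + ... + beta**(N-1). Area falls as N falls, so the first
    (smallest) N that meets the delay budget is the area-optimal choice.
    """
    for n in range(1, max_stages + 1):
        beta = load ** (1.0 / n)      # interstage ratio and end ratio are equal
        delay = n * beta
        if delay <= max_delay:
            area = sum(beta ** k for k in range(n))
            return n, beta, delay, area
    raise ValueError("delay budget cannot be met with uniform tapering")
```

With the 100,000 load and the 41.42 budget used in Fig. 2, `uniform_taper_for_delay(100_000, 41.42)` returns six stages with a common ratio of about 6.81 under this model; five stages would need a ratio of 10 and a delay of 50.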



Fig. 4 Area saving of optimised taper buffer compared with fixed taper buffer in which  $\beta_n = \text{constant}$ , but  $\beta_e > \beta_n$

An alternative approach to the design of area constrained fixed taper buffers is to allow the end ratio to increase by a larger amount than the other interstage ratios, which allows a further reduction in buffer area. Fig. 4 compares this type of buffer with the optimised taper buffer. The area saving is ~8% for buffer delay constraints in the range 1.1–2 (normalised to the delay of the fastest buffer).
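The variant of Fig. 4, with a constant interstage ratio $\beta_n$ and a larger end ratio $\beta_e$, can be sketched as a small search under the same assumed model (delay = $(N-1)\beta_n + \beta_e$, area = sum of stage sizes, unit first stage, $\beta_n^{N-1}\beta_e$ equal to the load). For a fixed $N$, shrinking $\beta_n$ enlarges $\beta_e$ and reduces area but increases delay, so a bisection finds the smallest $\beta_n$ that meets the budget. The function `end_ratio_taper_for_delay` is hypothetical, and the numbers it produces need not match Fig. 4.

```python
def end_ratio_taper_for_delay(load, max_delay, max_stages=40, iters=100):
    """Fixed taper with an enlarged end ratio, sized for minimum area.

    Assumed model: N stages of sizes 1, b, ..., b**(N-1) share the ratio b,
    and the end ratio is e = load / b**(N-1) >= b. Delay is (N-1)*b + e and
    area is the sum of the stage sizes. Returns (N, b, e, delay, area) for
    the smallest-area design, or None if no design meets the budget.
    """
    best = None
    for n in range(2, max_stages + 1):
        b_uniform = load ** (1.0 / n)

        def delay(b):
            return (n - 1) * b + load / b ** (n - 1)

        if delay(b_uniform) > max_delay:
            continue  # even a uniform taper with n stages is too slow
        # delay(b) decreases on [1, b_uniform], so bisect for the smallest
        # b whose delay still meets the budget (largest end ratio).
        lo, hi = 1.0, b_uniform
        if delay(lo) <= max_delay:
            hi = lo
        else:
            for _ in range(iters):
                mid = 0.5 * (lo + hi)
                if delay(mid) <= max_delay:
                    hi = mid
                else:
                    lo = mid
        b = hi
        area = sum(b ** k for k in range(n))
        result = (n, b, load / b ** (n - 1), delay(b), area)
        if best is None or area < best[-1]:
            best = result
    return best
```

Because the uniform taper ($\beta_n = \beta_e$) is one point in this search for every $N \ge 2$, the returned area can never exceed that of the uniform-taper design meeting the same budget, which mirrors the qualitative trend described above.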

**Acknowledgments:** The authors would like to thank the Science and Engineering Research Council for the award of a research studentship.

© IEE 1993

15 October 1993

Electronics Letters Online No: 19931435

P. Routley, A. Brunschweiler and P. Ashburn (Department of Electronics and Computer Science, University of Southampton, Southampton SO9 5NH, United Kingdom)

ELECTRONICS LETTERS 9th December 1993 Vol. 29 No. 25


## New domino logic precharged by clock and data

J.-R. Yuan, C. Svensson, and P. Larsson

*Indexing terms: CMOS integrated circuits, Logic circuits*

A clock-and-data precharged dynamic (CDPD) circuit technique in CMOS is presented. It gives a fast one-clock-cycle decision for multilevel logic and has small clock loads, low peak current, small area and a low power-delay product. The technique is highly flexible in logic design. For the given example, a 32bit binary-lookahead-carry chain, the speed improvement can be as high as 40–50% compared with the static circuit and 30% compared with the normal domino circuit arrangement, while the area is reduced by 15–30%.

**Introduction:** In CMOS circuit techniques, high clock rates can be achieved by true single phase clocking (TSPC), device sizing and extreme pipelining, as described in [1]. In this Letter, we show that a fast one-clock-cycle decision for multilevel logic can be achieved by a new circuit topology, called the clock-and-data precharged dynamic CMOS circuit technique, or CDPD circuit technique.

The CDPD circuit technique evolved from two existing techniques: the domino circuit technique [2], which has a low maximum clock rate, and the TSPC circuit technique [1]. By removing redundant elements, such as domino inverters and pipeline latches, and introducing data precharged dynamic stages, the CDPD circuit technique lets multilevel logic reach a decision within one clock cycle at superior speed. This is necessary when the results have to be used in the next clock cycle, e.g. in the ALU of a traditional microprocessor. In addition, the supply current peaks and the power consumption are considerably reduced compared with the existing domino technique.



Fig. 1 Domino logic and its equivalent replacement by efficient CDPD logic

(a) Domino logic  
(b) Equivalent CDPD logic

**CDPD circuit technique:** A typical $n$-domino chain is shown in Fig. 1a. Two features are worth noting. First, inverters are used between two clock precharged stages; they prevent charge loss but add delay overhead. Secondly, all nodes in the logic network of a clock precharged stage (e.g. the NAND logic) have to be precharged to prevent charge sharing; for example, node $M$ has to be precharged by an extra $p$ transistor. It is also worth pointing out that the evaluation transistors marked by \* are redundant. In Fig. 1, as well as in Figs. 2 and 3, PH and PL denote precharged-to-high and precharged-to-low signals, respectively. The complete function of the circuit in the dashed-line box in Fig. 1a can be replaced by a data precharged H/L stage with only three transistors, as shown in Fig. 1b. An H/L (or L/H) stage is a stage whose inputs are precharged high (or low) and whose output is precharged low (or high) by the data rather than by the clock. It is known that an odd number of static stages can be placed between two dynamic (clock precharged) $n$-stages [3]. The replacement can therefore be derived in two steps: first, replace the box by a static NOR gate; secondly, modify the NOR gate into an H/L stage. As will be shown later, most (though not all) static gates can be modified into dynamic H/L or L/H stages.
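The two-step derivation relies on a property that can be checked at the logic level: a static inverting gate whose inputs are all precharged high already produces a low output without any clock, and because its inputs can only fall during evaluation, its output can only rise. The sketch below tests that property for an arbitrary gate function; `is_valid_hl_function`, `nor2`, `nand2` and `or2` are hypothetical names used for illustration. It ignores transistor-level issues such as charge sharing and device count, so it does not explain which gates fall outside the "most" of the text.

```python
from itertools import product


def is_valid_hl_function(f, n_inputs: int) -> bool:
    """Logic-level test of whether gate function f can act as an H/L stage.

    Hypothetical helper (not from the Letter). In an H/L stage the inputs are
    precharged high and can only fall during evaluation, while the output is
    precharged low by the data. So f must
      1. return 0 when every input is 1 (output precharged low by data), and
      2. never fall when any single input falls from 1 to 0 (monotone output).
    """
    if f(*([1] * n_inputs)) != 0:
        return False
    for x in product((0, 1), repeat=n_inputs):
        for i, bit in enumerate(x):
            if bit == 1:
                y = list(x)
                y[i] = 0                      # input i falls during evaluation
                if f(*y) < f(*x):             # output would fall: not monotone
                    return False
    return True


def nor2(a: int, b: int) -> int:
    return int(not (a or b))


def nand2(a: int, b: int) -> int:
    return int(not (a and b))


def or2(a: int, b: int) -> int:
    return int(a or b)
```

NOR and NAND pass this check, while a non-inverting OR fails because its output would be high, not low, when its inputs are precharged high.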



Fig. 2 Latched CDPD logic with cascaded H/L and L/H stages

The H/L and L/H stages can be cascaded in a domino chain between two clock precharged stages, as shown in Fig. 2. Furthermore, such a chain can be terminated by a TSPC latch [1] that holds the output data during the precharge phase. An odd number of stages is required between two clock precharged stages, whereas an even number of stages is required between a clock precharged stage and a TSPC latch. As long as these 'odd' and 'even' rules are satisfied, the number of data precharged or clock precharged stages is flexible. However, if a quick precharge is needed, there should be enough clock precharged stages, and they should include the evaluation transistors (those marked \*).
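The 'odd' rule can be read as a precharge-polarity matching condition: every H/L or L/H stage inverts the precharge level, so after a clock precharged $n$ stage (whose output feeds an H/L stage and is therefore precharged high) an odd number of data precharged stages is needed before the next $n$ stage. The sketch below encodes this with a hypothetical `STAGES` table and `polarity_ok` helper. The H/L and L/H entries follow the definitions above; the $n$ and $p$ entries are our inference from the odd rule and from the stage sequences listed with Fig. 4. The TSPC latch and its 'even' rule are not modelled.

```python
# Precharge level of each stage type as (expected input, produced output).
# H/L and L/H follow the Letter's definitions; the n and p entries are
# assumptions inferred from the odd-stage rule and the Fig. 4 sequences.
STAGES = {
    "n": ("L", "H"),
    "p": ("H", "L"),
    "H/L": ("H", "L"),
    "L/H": ("L", "H"),
}


def polarity_ok(chain: list[str]) -> bool:
    """True if every stage receives inputs precharged to the level it expects."""
    for prev, nxt in zip(chain, chain[1:]):
        if STAGES[prev][1] != STAGES[nxt][0]:
            return False
    return True
```

For example, the $n$ chain sequence of Fig. 4 followed by another $n$ stage passes, while inserting an even number of data precharged stages between two $n$ stages fails, as the odd rule requires.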



Fig. 3 pn CDPD logic chain

The chain shown in Fig. 2 is an $n$ chain, which needs a separate precharge phase. To use one clock cycle more efficiently, $pn$ logic, particularly TSPC $pn$ logic, is preferred. A cascaded CDPD $pn$ logic chain is shown in Fig. 3. It is similar to a TSPC chain [1] except for the H/L and L/H stages. Because of the delay introduced by these extra stages, the intermediate latch (e.g. the $p$ latch in the dashed-line box) is optional if the evaluation delay of the following clock precharged $n$ stage is less than the precharge delay of the $p$ chain. This gives logic flexibility, and removing the latch further increases the circuit speed.



Fig. 4 32 bit binary-lookahead-carry chain in CDPD logic

- [1] In an  $n$ -chain:  $n$  - H/L - L/H - H/L - L/H - H/L
- [2] In a  $pn$ -chain:  $p$  - L/H - H/L -  $n$  - H/L - L/H

**Application example:** A binary-lookahead-carry (BLC) evaluation chain [4] is a good example in which to apply CDPD logic, because a one-clock-cycle decision is often important. A 32bit BLC chain is shown in Fig. 4, in which the $g$ and $p$ generators are included (cell 1) and inverse logic is used. In CDPD logic, the chain can be implemented either as an $n$ chain or as a $pn$ chain. Because the clock precharged $n$ stage (indicated by $n$) and $p$ stage (indicated by $p$) are conventional, their circuits are not given. Fig. 5 gives two basic logic cells constructed as charge-sharing-free H/L and L/H stages. In both cases, each of the two basic cells contains eight transistors instead of the ten in a normal static stage. This shows that it is usually feasible to modify a static stage into either an H/L or an L/H stage. The heaviest load is placed on $p_2$ (or the inverted $p_2$) because the $p$ signal is not on the critical delay path. For comparison, the same BLC chain was simulated in static, domino, CDPD $n$ chain and CDPD $pn$ chain techniques, using the typical parameters of a $1.0\mu\text{m}$ CMOS process and minimum size transistors. The results are summarised in Table 1. Note that, in the simulations, the static chain is terminated by a classic master-slave latch whereas the others are terminated by TSPC latches, and each chain drives a unit inverter load.

Fig. 5 Two basic logic cells (see Fig. 4) in H/L and L/H stages
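The logic that a BLC chain evaluates can be illustrated at word level with Brent and Kung's $(g, p)$ combining operator from [4], $(g, p) \circ (g', p') = (g + p\,g',\; p\,p')$. The sketch below arranges it as a Kogge-Stone-style prefix network with $\log_2 32 = 5$ combining levels after the generator level; this arrangement is our assumption and may differ from the cell wiring of Fig. 4, although one generator level plus five combining levels does match the six entries in each stage sequence listed with Fig. 4. The functions `combine`, `prefix_carries` and `add` are hypothetical and model only the Boolean function, not the circuit.

```python
def combine(hi, lo):
    """Brent-Kung operator: (g, p) o (g', p') = (g | p & g', p & p').

    `hi` covers the more significant bit group, `lo` the less significant one.
    """
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def prefix_carries(a: int, b: int, width: int = 32) -> list[int]:
    """Carry out of every bit position (carry-in = 0) via a prefix tree.

    Level 0 forms bitwise generate g = a AND b and propagate p = a XOR b
    (the role of cell 1 in Fig. 4). Each later level combines groups that
    are `span` bits apart; for width = 32 there are five such levels.
    The Kogge-Stone-style wiring is an assumption, not Fig. 4's layout.
    """
    gp = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
          for i in range(width)]
    span = 1
    while span < width:
        # The comprehension reads the previous level's list before rebinding gp.
        gp = [combine(gp[i], gp[i - span]) if i >= span else gp[i]
              for i in range(width)]
        span *= 2
    return [g for g, _ in gp]


def add(a: int, b: int, width: int = 32) -> int:
    """Sum of a and b (including the final carry) built from the prefix carries."""
    carries = prefix_carries(a, b, width)
    total = 0
    for i in range(width):
        p = ((a >> i) ^ (b >> i)) & 1
        carry_in = carries[i - 1] if i > 0 else 0
        total |= (p ^ carry_in) << i
    return total | (carries[-1] << width)
```

Checking `add` against Python's own integer addition confirms that the five combining levels produce every carry of a 32bit addition.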

Table 1: Comparison between different techniques

| Technique | Transistors | Delay (ns) | Clk load | $I$ (mA) | $P/P_s$\*\* |
|-----------|-------------|------------|----------|----------|-------------|
| Static\*  | 582         | 3.33       | 8        | 32       | 1           |
| Domino    | 788         | 2.4        | 127      | 55       | 2           |
| CDPD $n$  | 505         | 1.7        | 65       | 29       | 1.07        |
| CDPD $pn$ | 512         | 1.86       | 73       | 33       | 0.96        |

\* Terminated by a classic master-slave latch

\*\* Power consumption relative to the static technique
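The headline figures can be checked against Table 1. Reading the abstract's speed improvements as fractional reductions in delay (our interpretation, not stated explicitly), the sketch below computes them and also forms a power-delay product as $P/P_s$ multiplied by delay, normalised to the static chain. `TABLE1`, `delay_reduction` and `relative_pdp` are hypothetical names; the numbers are copied from Table 1.

```python
# Table 1 values for the 32bit BLC chain: (transistors, delay in ns, P/Ps).
TABLE1 = {
    "Static": (582, 3.33, 1.0),
    "Domino": (788, 2.4, 2.0),
    "CDPD n": (505, 1.7, 1.07),
    "CDPD pn": (512, 1.86, 0.96),
}


def delay_reduction(tech: str, ref: str) -> float:
    """Fractional reduction in delay of `tech` relative to `ref`."""
    return 1.0 - TABLE1[tech][1] / TABLE1[ref][1]


def relative_pdp(tech: str) -> float:
    """Power-delay product normalised to the static chain."""
    _, delay, power = TABLE1[tech]
    _, delay_s, power_s = TABLE1["Static"]
    return (power * delay) / (power_s * delay_s)


if __name__ == "__main__":
    for tech in TABLE1:
        print(f"{tech:8s} vs static: {delay_reduction(tech, 'Static'):6.1%}  "
              f"vs domino: {delay_reduction(tech, 'Domino'):6.1%}  "
              f"PDP/PDP_s: {relative_pdp(tech):.2f}")
```

This gives delay reductions of about 49% (CDPD $n$) and 44% (CDPD $pn$) relative to the static chain, about 29% for CDPD $n$ relative to domino, and normalised power-delay products of roughly 0.55 and 0.54 for the two CDPD variants against 1.44 for domino.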

**Conclusions:** A fast one-clock-cycle decision can be achieved by the clock-and-data precharged dynamic (CDPD) circuit technique introduced in this Letter. Compared with the existing domino technique, the CDPD circuit technique has further advantages: smaller clock loads, lower peak current (low noise), less area (small transistor count) and lower power consumption. It is also very flexible in logic design, owing to the many available options. The application example, a 32bit BLC chain, demonstrates its superior speed, small area and low power-delay product compared with both static and domino techniques.

© IEE 1993

15 October 1993

Electronics Letters Online No: 19931437

J.-R. Yuan, C. Svensson, and P. Larsson (LSI Design Center, IFM, Linköping University, S-581 83 Linköping, Sweden)

## References

- 1 YUAN, J., and SVENSSON, C.: 'High speed CMOS circuit technique', *IEEE J. Solid-State Circuits*, 1989, SC-24, pp. 62–70
- 2 KRAMBECK, R.H., LEE, C.M., and LAW, H.S.: 'High-speed compact circuits with CMOS', *IEEE J. Solid-State Circuits*, 1982, SC-17, pp. 614–619
- 3 GONCALVES, N.F., and DE MAN, H.J.: 'NORA: a racefree dynamic CMOS technique for pipelined logic structures', *IEEE J. Solid-State Circuits*, 1983, SC-18, pp. 261–266
- 4 BRENT, R.P., and KUNG, H.T.: 'A regular layout for parallel adders', *IEEE Trans. Comput.*, 1982, C-31, pp. 260–264