Cluster of Excellence Cognitive Interaction Technology, Cognitronics and Sensor Systems, Prof. Dr.-Ing. U. Rückert

# Design of Low-power Digital Circuits in the Sub-threshold Domain

for the attainment of the academic degree of

## DOKTOR-INGENIEUR (Dr.-Ing.)

Dissertation approved by the Faculty of Technology of Bielefeld University

by

## M.Sc. Saikat Chatterjee

Reviewer: Prof. Dr.-Ing. Ulrich Rückert

### Acknowledgment

I am indebted to my advisor, Professor Dr.-Ing. Ulrich Rückert, who helped me with his intellectual and constructive suggestions throughout the research. I am grateful to him for his guidance, especially when it was most needed. He was a source of motivation throughout my work. Without his great cooperation it would not have been possible to accomplish my research goals. My heartfelt appreciation goes to Prof. Dr.-Ing. Mario Porrmann and Dr.-Ing. habil. Thorsten Jungeblut for their guidance and support during the early days of my research.

Next, I would like to thank the Cognitronics and Sensor Systems Group at the Center for Cognitive Interaction Technology (CITEC), Bielefeld. It is not only about the skills and training I have received here. My first special thanks go to Daniel Wolf, without whom it would have been impossible to continue my work seamlessly. My second special thanks are reserved for Cordula Heidbrede. It is almost impossible to describe her support in words.

I would like to convey my sincere thanks to the Deutscher Akademischer Austauschdienst (DAAD) for their doctoral grant. I would also like to thank Bielefeld University for their administrative support.

My parents and my sister have been a huge support all through my life. I owe them a lot. Their constant encouragement and support helped me to remain inspired about my research. I hope I can follow their ideology, inspiration and motivation throughout my entire life.

I am grateful to my wife and daughter, Sarbani and Shuraya, for being my moral and emotional support in every way possible. I owe a lot to Sarbani, not only for her immense contributions to our family during my research stay, but also for the sacrifices she made towards my success.

The contributions of my mentor KK (Krish Kumar) have had a huge impact on my life lately. He is such a wonderful friend, philosopher and guide for me. He is one of the happiest people I know as I write the last few lines of my doctoral thesis.

Living in a foreign country while working on doctoral research is indeed a tough job. However, friends like Eva, Julia, Alba, Alessio, Meysam, Mert and Sherzod made my life easier at CITEC. They have been more than colleagues to me. They hardly ever let me feel like an outsider here.

In Bielefeld I have found wonderful people like Werner. He was my neighbour, my friend and an opa (grandfather) to my daughter as well. Thanking him would be unnecessary, as he is now part of our family. It is not possible to forget the friendship of Mindas. I was lucky to have him around during our sports evenings.

Thanks to DAAD, I came to know Dr. Nazia Yasmin, who always helped me by sharing her valuable research experience. I owe her a lot for the immense support I have received. Doctoral research is not just about work and survival, but also about making it better. This I learnt from my childhood friend, Deep, who was also doing his PhD at Erasmus Medical School, Rotterdam during this time. Thanks to him, I have met nice friends like Reza. He is one of those rare gems without whose mention I cannot finish the acknowledgement.

The contribution of my friends Anupama and Mainack to my emotional health during the stay in Bielefeld is beyond description. Their confidence in me, along with their continuous advocacy, was unimaginable. They are proud of me! And I am proud of them as well.

### Abstract

The demand for low power consumption in microelectronic circuits has increased significantly as submicron technologies scale down. Battery-operated portable applications are in high demand across industries such as automotive, medical, MEMS and telecommunication. Subthreshold operation offers a potential solution to the energy consumption problem. However, it comes at the price of significant performance degradation, since the maximum operating clock frequency must be reduced in subthreshold operation. An effective solution proposed for this problem is to use different voltage islands: the non-critical, power-constrained blocks run in the subthreshold domain, while the critical ones operate in the above-threshold domain. Our work proposes a solution to address this problem.

Our work consists of three parts: 1) a standard cell library optimized for subthreshold operation, 2) a pair of level shifter circuits capable of subthreshold to above-threshold voltage conversion and 3) a subthreshold memory array. We have used the 28 nm FD-SOI technology from STMicroelectronics.

The standard cell library contains basic design cell units, whose dimensions are optimized for subthreshold operation at a low fixed frequency. The optimization process is based upon a multiobjective optimization methodology. Propagation delay, switching power, static power dissipation and noise margin are the parameters chosen for multiobjective optimization. The optimization algorithm generated a separate set of dimensions for each cell, which helped us to design the parametric cell (p-cell). We used the p-cell approach because it helps to standardize the cell design and reduces the development time. The library is ready for the synthesis of low power blocks. The library is developed with two different variants of transistors, namely RVT and LVT, which have different threshold voltages. The RVT library has 21 combinational logic cells, 4 sequential logic cells and 11 clock circuits. The LVT library has only combinational cells.

We designed two different level shifter circuits, which are performance-optimized together with the library cells in terms of voltage and frequency. Leakage current was kept in mind during the design. One of the circuits follows the current mirror topology. It can convert an input of 250 mV to a 1 V output, and its leakage power is 37 pW. The second level shifter circuit combines the current mirror and cross-coupled PMOS topologies. This circuit can operate even at a 150 mV supply, with a leakage power of 107 pW.

Memory circuits consume a lot of energy. However, it is difficult to reduce the operating voltage of memory cells without diminishing the performance. Here, we propose a 4x8 SRAM array, which can operate with a minimum supply of 250 mV and a maximum frequency of 3.3 MHz. During the read operation, the energy consumption of the memory cell is 0.107 fJ at 3.33 MHz.

## Contents

- 1 Introduction
  - 1.1 Overview
  - 1.2 Motivation
    - 1.2.1 Subthreshold Design
    - 1.2.2 Standard Cell Library
    - 1.2.3 Level Shifter
    - 1.2.4 Low Power Static Random Access Memory (SRAM)
    - 1.2.5 FDSOI Technology
  - 1.3 Outcomes and Contributions
  - 1.4 Thesis Outline
  - 1.5 Publications
- 2 Design Space Exploration
  - 2.1 Multiobjective Optimization
    - 2.1.1 GAIO
    - 2.1.2 NSGA-II
    - 2.1.3 SPEA2
  - 2.2
    - 2.2.1 Propagation Delay
    - 2.2.2 Power
    - 2.2.3 Energy
    - 2.2.4 Noise Margin
    - 2.2.5 Area
  - 2.3
- 3 Level Shifter
  - 3.1 Up level shifter
  - 3.2 Conventional level converter
  - 3.3 State of the Art
    - 3.3.1 Cross-coupled pMOS
    - 3.3.2 Current Mirror
  - 3.4 Scale Down Mechanism
    - 3.4.1 Preliminary Examinations
    - 3.4.2 Optimization Results
  - 3.5 Proposed Designs
    - 3.5.1 LVT cell based design in FDSOI technology
    - 3.5.2 RVT cell based design in FDSOI technology
    - 3.5.3 Hybrid topology based design in FDSOI technology
  - 3.6 Down Converter
- 4 Simulation and Comparison of the Proposed Level Shifter Circuits
  - 4.1 Simulation Results
    - 4.1.1 Current Mirror Circuits
    - 4.1.2 Hybrid Topology Circuit
  - 4.2 Performance comparison of the state of the art designs
    - 4.2.1 Supply Voltage and Conversion Range
    - 4.2.2 Operating Frequency and Static Power Dissipation
    - 4.2.3 Switching Energy and Static Power Dissipation
    - 4.2.4 Chip Area
- 5 Low Power SRAM
  - 5.1 SRAM market trend
  - 5.2 Power Reduction Techniques
    - 5.2.1 Manipulation of supply voltage
    - 5.2.2 Read/Write Assist Circuitry and Bitline and Wordline Signal Manipulation
    - 5.2.3 Bitline Leakage Reduction
    - 5.2.4 Transistor Level Techniques
    - 5.2.5 Subthreshold Bitcell Design
    - 5.2.6 Application Specific Techniques
  - 5.3 Operating Principle
  - 5.4 SRAM Array and associated circuits
    - 5.4.1 Address and Data Buffers
    - 5.4.2 Row Decoder Design
    - 5.4.3 Read/Write Column Decoder and Write Driver
    - 5.4.5 Control Circuits
  - 5.5 Proposed Design
  - 5.6 Simulation Results
- 6 Subthreshold Library
  - 6.1 Standard Cell Organization
    - 6.1.1 Physical Design
    - 6.1.2 Logical Design
    - 6.1.3 Power Options
    - 6.1.4 Dimensions
  - 6.2 Design Flow
  - 6.3 Standard-Cell-Based Development Process
  - 6.4 Standard cell library designs
  - 6.5 Low Power Libraries
  - 6.6 Library Components
    - 6.6.1 Combinational Logic
    - 6.6.2 Sequential Logic
    - 6.6.3 Clock Tree Elements
    - 6.6.4 Level Shifter
    - 6.6.5 Place and Route Cells
  - 6.7 Subthreshold Design Methodology
  - 6.8 Developed Standard Cell Libraries
    - 6.8.1 65 nm CMOS Technology
    - 6.8.2 28 nm FDSOI Technology
  - 6.9 Characterization
    - 6.9.1 Delay Modelling
      - 6.9.1.1 Non-linear Delay Model (NLDM)
      - 6.9.1.2 Composite Current Source (CCS)
      - 6.9.1.3 Effective Current Source Model (ECSM)
    - 6.9.2 Timing Model
    - 6.9.3 Power Modelling
  - 6.10 Liberate Tool Flow
  - 6.11 Comparison between the two libraries
- 7 Conclusion and Future Work
  - 7.1 Conclusion
  - 7.2 Future work
- List of Figures
- List of Tables
- Acronyms
- Bibliography
- A. Characterization Script
- B. Datasheet

## 1 Introduction

## 1.1 Overview

The integrated circuit is arguably one of the most remarkable human inventions. The first transistor was announced in 1948, and Fairchild Semiconductor Corporation shipped the first commercially available planar integrated circuit in March 1961. It consisted of one transistor, three resistors and a capacitor [1], and it had a worldwide impact. Over decades of development and refinement, integrated circuits have become steadily smaller, faster and cheaper. Commercial shipments of devices with gate lengths below 100 nm started in the year 2000, essentially marking the end of the microelectronics era and the beginning of the nanoelectronics generation [2]. As of 2019, device development has already started in technologies as small as 5 nm and 3 nm. Research has even begun on the next technology that can take the integrated circuit beyond the current point.

Certainly, this journey has been full of daunting challenges. The following issues have been identified among the long list of technical challenges:

- The rise in power density at each successive technology node;
- The necessity to find new architectures to get rid of bottlenecks at interconnects;
- Increase in cost due to difficulties in both lithography and fabrication;
- Demand for more complex structures such as SOI or dual gate transistors to circumvent short channel effects.

The 2003 International Technology Roadmap for Semiconductors (ITRS) clearly stated that a nexus between power consumption and architecture is highly likely to emerge. The links between power and device structure therefore need to be explored. This is the primary motivation of this thesis.

## 1.2 Motivation

### 1.2.1 Subthreshold Design

Subthreshold operation has become a well-established region of operation for digital circuits when ultra-low-power operation is in demand and speed is of secondary importance [3]. Subthreshold operation implies that the gate-source voltage, $V_{\rm gs}$, and the power supply voltage, $V_{\rm dd}$, are below the absolute value of the transistor's threshold voltage, $|V_{\rm T}|$ [4]. By reducing the power supply voltage down to the subthreshold region, a large decrease in static and dynamic power consumption can be achieved. However, a decrease in power supply voltage also contributes to circuit reliability issues. Process and mismatch variations gain an increased impact on circuit behaviour due to the exponential current dependencies in the subthreshold region. The subthreshold current $I_{\rm D_{sub}}$ is given by [4]

$$I_{D_{sub}} = I_0 e^{\frac{V_{gs} - V_T}{nV_{th}}} \left(1 - e^{\frac{-V_{ds}}{V_{th}}}\right) \tag{1.1}$$

where $I_0$ is the drain current when $V_{gs} = V_T$, $n$ is the subthreshold slope factor and $V_{th} = kT/q$ is the thermal voltage:

$$I_0 = \mu_0 C_{ox} \frac{W}{L} (n-1) V_{th}^2 \tag{1.2}$$

The exponential current dependencies may lead to large variations in propagation delay, which make it hard to predict whether the design specifications will be met [5]. The affected specifications range from $I_{ON}/I_{OFF}$ ratio variations to setup and hold time violations in sequential logic, and depend strongly on the choice of circuit topology. Even though subthreshold operation involves several design challenges, it remains an attractive approach for achieving ultra-low-power CMOS circuits, provided that proper design techniques are employed.
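To make the exponential sensitivity concrete, the following Python sketch evaluates Equation (1.1) for a nominal device and for one whose threshold voltage is shifted by 30 mV. It is an illustration only: the device values (`I_0`, `N`, `V_T`, `V_DD`), the 300 K temperature and the function names are assumptions chosen for the example and are not taken from the 28 nm FD-SOI technology used in this thesis.

```python
import math

BOLTZMANN = 1.380649e-23            # J/K
ELECTRON_CHARGE = 1.602176634e-19   # C


def thermal_voltage(temperature_k: float = 300.0) -> float:
    """Thermal voltage V_th = kT/q in volts."""
    return BOLTZMANN * temperature_k / ELECTRON_CHARGE


def subthreshold_current(v_gs, v_ds, v_t, i_0, n, temperature_k=300.0):
    """Drain current in the subthreshold region, following Equation (1.1)."""
    v_th = thermal_voltage(temperature_k)
    gate_term = math.exp((v_gs - v_t) / (n * v_th))
    drain_term = 1.0 - math.exp(-v_ds / v_th)
    return i_0 * gate_term * drain_term


# Hypothetical device values, chosen only for illustration.
I_0 = 1e-7    # drain current at V_gs = V_T, in amperes
N = 1.3       # subthreshold slope factor
V_T = 0.40    # threshold voltage, in volts
V_DD = 0.25   # subthreshold supply voltage, in volts

# Same bias, but the second device has a threshold voltage 30 mV higher.
nominal = subthreshold_current(V_DD, V_DD, V_T, I_0, N)
shifted = subthreshold_current(V_DD, V_DD, V_T + 0.03, I_0, N)

print(f"nominal device:      {nominal:.3e} A")
print(f"V_T shifted by 30mV: {shifted:.3e} A")
print(f"current ratio:       {nominal / shifted:.2f}")
```

Under these assumed values, a 30 mV threshold shift changes the drive current by more than a factor of two, which translates directly into a comparable spread in gate delay.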

### 1.2.2 Standard Cell Library

With the growing complexity of circuit design, it is becoming increasingly impractical to design logic circuits by hand. Therefore, the use of automatic synthesis tools has become mandatory. In general, synthesis tool-based designs are performed using the following steps:

- Description of the circuit behaviour in a high-level language such as VHDL or Verilog
- Compilation of the behavioural description into a logical netlist using logic synthesis tools
- Translation of the logical netlist into a geometric netlist, followed by placement and routing, with Placement-and-Routing (PNR) tools

The second step presumes that the design environment already contains descriptions of some structural logic primitives (e.g. primitives for NAND gates, latches, flip-flops, etc.), as those primitives will comprise the netlist produced by the synthesis tool. Similarly, the last step presumes that the translation of a netlist to geometric shapes is already defined for the design environment, i.e., the logic primitives referred to by the netlist are already present in some physical library. Hence, the design environment must already provide a library which contains both the logic primitives and the physical (i.e., layout) primitives that correspond to those structural primitives.

Therefore, with this design method, it is mandatory that a standard cell library be present. Further, the standard cell library should, at the minimum, consist of:

- layout
- other geometric descriptions as needed by the PNR tools, if the full layout is deemed too complicated for this purpose
- list of logic primitives which correspond to those cells, including pinout
- logic description libraries, both for synthesis and simulation purposes, which feature simplified timing and power dissipation modelling capabilities

The last point deserves some clarification. While more accurate information (timing and power dissipation) could be obtained through the use of a commercially available circuit simulation program such as SPICE, the runtime tends to be prohibitively lengthy for large circuits. Further, at this design stage, it is often unnecessary to obtain, for instance, a power dissipation estimate which is accurate to within 5%. Hence, the use of simplified models, with their reduced accuracy but improved simulation speed, is the norm.

Therefore, the need of the hour is to develop a standard cell library which is capable of operating in the subthreshold domain. For subthreshold operation, the library needs to be optimized for an operating point in terms of input supply voltage and operating frequency. The optimization parameters, such as delay or power, can be chosen depending on the requirements of the design.

### 1.2.3 Level Shifter

With extremely low voltage operation in subthreshold logic, energy consumption is improved significantly. However, this ultra-low voltage is quite low compared to the high I/O supply voltage. This problem cannot be solved by reducing the I/O supply: due to the large impedance load and the high noise immunity requirement of the I/O circuit, the I/O supply voltage is not scalable. Therefore, a level shifter circuit can be a realistic solution here.

Though subthreshold operation helps to reduce the power consumption, it comes at the cost of performance. This issue can be solved partially by performance optimization of the individual cells. Multi-supply voltage systems offer an additional advantage in terms of performance. The system can be divided into two parts, with different functional units being operated at different voltages. Thus, the modules running critical algorithms would operate at a higher voltage, maximizing the performance. Simultaneously, the power efficiency would be improved by operating all other non-critical modules at subthreshold voltage. Puri et al. reported in [6] that optimized designs with multiple input supply voltages and multiple threshold voltages provide a dramatic dynamic power reduction of 40-50% as compared to the original single-input design. It is important here to interface the critical cells at higher voltage effectively with the non-critical ones operating at subthreshold voltage. This is possible when the level shifter circuit can fully turn off the PFET of the driven gate and, if required, ensures that no gate oxide voltage exceeds the reliability limits set by the technology node.

### 1.2.4 Low Power Static Random Access Memory (SRAM)

Static random access memory (SRAM) plays a key role in many digital systems, supporting volatile storage in applications such as instruction memory, data memory, cache, FIFOs, register files and scratchpad memories. The ability to reduce the supply voltage of SRAM modules is interesting for several reasons:

- to reduce leakage during inactive standby modes while retaining the contents of memory,
- to reduce access energy when only low throughput is required, and
- to operate at the same supply voltage as other intra-die ultra low voltage circuits.

The minimum operating voltage of SRAM is often considered the limiting factor when scaling down the supply voltage of digital circuits [7]. With the power budget of electronic systems shrinking and memory circuits occupying a significant portion of such systems, SRAM circuits need to be developed which are capable of running in the subthreshold domain.

### 1.2.5 FDSOI Technology

Fully-Depleted SOI (FDSOI) technology is considered to be a promising candidate for low to ultra-low power systems as it provides high speed at low voltage. The main difference from bulk CMOS is the buried oxide (BOx), which insulates the well from the channel. The silicon layer/channel is fully depleted, as it does not contain any active charge carriers. Some of the potential benefits of this structure are:

- Improved junction capacitance: lower parasitic capacitance, i.e., lower source-drain capacitance, thanks to dielectric isolation.
- Better electrostatic control of the channel, which results in a near-ideal subthreshold slope of 60 mV/decade [8] and reduced Drain Induced Barrier Lowering (DIBL).
- Improved threshold variation, as the channel is not doped. One of the major causes of threshold variation is Random Dopant Fluctuation (RDF). Thus, the variability coefficient for transistors of the same size is 2-3 times lower for FDSOI [9].
- The transistor is controlled through two independent gates. The threshold voltage can be modulated by applying a back-bias to the back plane (BP).

This technology is fully compliant with designs already available for bulk technologies, which ensures a successful porting of IPs. Conventional design techniques used for dynamic power reduction, such as Adaptive Voltage Scaling (AVS) and Adaptive Body-Bias (ABB), are fully compatible and much more efficient with FDSOI technology [10]. The extra cost of the SOI substrate is compensated by the simpler process steps (no halo, no threshold implant in the channel), essentially making it a low-cost technology.

## **1.3 Outcomes and Contributions**

The outcomes of the work reported in this thesis are as follows:

- A standard cell library optimized for subthreshold operation at 300 mV and 200 kHz. Two separate libraries were developed using the LVT and RVT variants of the FDSOI technology.
- A Pareto-search-based design space exploration to address the limited scalability of designs in nanometer technology nodes, especially across different technologies.
- Robust level shifter circuits which are capable of operating from subthreshold level to above-threshold level.
- Low-power SRAM cells needed for the realization of a subthreshold system.

## 1.4 Thesis Outline

The dissertation work is subdivided into three major parts: the level shifter development, the low-power memory design and the design of the standard cell library for subthreshold operation. The thesis is organized as follows.

- Chapter 2 contains the design space exploration approach and the influence of different parameters on an inverter circuit.
- The state of the art of level shifter circuits is discussed in chapter 3, along with the design space exploration approach applied to the level shifter circuit. Two different designs are proposed in this chapter.
- The simulation results of the proposed level shifter circuits are elaborated in chapter 4. It includes a comparative analysis of the available level shifter circuits in terms of operating range, propagation delay, switching energy, static power dissipation and chip area.
- The low-power SRAM development is explained in chapter 5, along with the associated circuits required for a smooth operation of the low-power memory cell.
- Chapter 6 presents the standard cell libraries we developed.
- The work is concluded in chapter 7, along with our inferences and the scope for future work.

## 1.5 Publications

The following papers have been published from the work described in this dissertation.

- M. Vohrmann, S. Chatterjee, S. Lütkemeier, T. Jungeblut, M. Porrmann and U. Rückert, "A 65 nm standard cell library for ultra low-power applications," presented at the 22nd European Conference on Circuit Theory and Design (ECCTD), Trondheim, Norway, 2015.
- S. Chatterjee and U. Rückert, "Scaling down a level shifter circuit in 28 nm FDSOI technology," presented at the 4th Joint International EUROSOI Workshop and International Conference on Ultimate Integration on Silicon (EUROSOI-ULIS), Granada, Spain, 2018.
- S. Chatterjee and U. Rückert, "Resource Efficient Sub-VT Level Shifter Circuit Design Using a Hybrid Topology in 28 nm," presented at the International Conference on SMACD and 16th Conference on PRIME, Jena, Germany, 2021.

## **2 Design Space Exploration**

The design of transistors revolves around the choice of the transistor dimensions. Theoretically, it is sufficient to maintain a 2:1 ratio between the width of the p-channel Metal Oxide Semiconductor (pMOS) transistor and that of the n-channel Metal Oxide Semiconductor (nMOS) transistor, assuming pMOS transistors have twice the resistance of nMOS transistors [11]. However, because of the sensitive nature of the transistors during subthreshold operation, several other performance parameters cannot be ignored while determining the dimensions. As a result, design space exploration is an essential step in designing circuits optimized for operation in the subthreshold domain.

## 2.1 Multiobjective Optimization

A system can have multiple objectives [12]. It is also possible that almost all of those objectives carry equal weight. The multiobjective optimization approach helps to pay proper attention to all objectives of the system and to find the optimal solutions. The parameters which govern the system are varied to search for trade-offs among several resources. The set of these trade-off points is known as the Pareto set.

In order to define a Multiobjective Optimization Problem (MOP), let us explain some underlying concepts first. Let $S \subset \mathbb{R}^n$ be the multidimensional search space and $F: S \to \mathbb{R}^k$, $F(s) = (f_1(s), \ldots, f_k(s))$ with $k \ge 2$, the multidimensional evaluation function of the MOP. The individual components $f_j: S \to \mathbb{R}$ of the evaluation function are called target functions. Each element $s = (s_1, \ldots, s_n) \in S$ represents, in the context considered here, a possible alternative implementation of the integrated circuit, where $s_i$ can be, for example, an independently chosen transistor characteristic. The values $s_i$ are also referred to as independent design parameters. The quantified properties of a possible circuit implementation s are described by the point $F(s) = b = (b_1, \ldots, b_k)$. The entire collection of these points is defined as the image space F(S). In the design space search considered here, the evaluation of a function value b = F(s) is equivalent to the execution of an analog circuit simulation with the design parameters $s_i$ and the subsequent determination of the circuit properties $b_j$. Thus, the MOP of finding the optimized design space can be mathematically described as

$$\min_{s \in S} F(s) \tag{2.1}$$

F(s) is a vector-valued function. Therefore, a suitable concept of comparison must be specified to determine the minima. Given two points $a, b \in \mathbb{R}^k$, *a* dominates *b* if and only if $a_j \leq b_j \; \forall j = 1, \ldots, k$ and $a \neq b$. This is denoted by $a \prec b$. The solution of the aforementioned MOP is thus given by

$$P = \{\, s \in S \mid \nexists\, t \in S : F(t) \prec F(s) \,\} \tag{2.2}$$

The solution set $P \subset S$ is called the Pareto set, and its image under F is called the Pareto front F(P).

It should be noted that the Pareto set usually provides a variety of possible solutions to an optimization problem. If the interest is to find a single solution, a subsequent selection process is necessary to find the desired solution from the Pareto set.
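The dominance relation and the definition of the Pareto set in Equation 2.2 can be sketched for a finite list of candidates as follows. This is a brute-force illustration only; the function names and the toy objective function are assumptions and not part of the tool used in this work.

```python
def dominates(a, b):
    """Return True if objective vector a dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and tuple(a) != tuple(b)


def pareto_set(candidates, evaluate):
    """Return the candidates whose objective vectors are not dominated."""
    scored = [(s, tuple(evaluate(s))) for s in candidates]
    return [s for s, fs in scored
            if not any(dominates(ft, fs) for _, ft in scored)]


def toy_objectives(s):
    # Hypothetical two-objective problem on a one-dimensional search space.
    return (s ** 2, (s - 1.0) ** 2)


grid = [i / 10.0 for i in range(-20, 31)]   # candidate values s in [-2, 3]
front = pareto_set(grid, toy_objectives)
print(front)
```

For this toy problem the two objectives conflict only between their individual minima, so the filter keeps exactly the grid points between s = 0 and s = 1.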

In order to formulate the sizing of an integrated circuit as a multiobjective optimization problem, the available design parameters, the target functions and the boundary conditions must be defined first. In the conventional design of standard cells, the channel widths of the transistors are usually used as independent parameters, while the minimum permissible values of the respective technology are used for the channel lengths. This choice offers the smallest area requirement, the lowest (parasitic) gate capacitance, and the smallest delays when operating at the nominal voltage. In the subthreshold range, however, using minimum channel lengths is not optimal for two reasons. On the one hand, it leads to relatively large variations in transistor parameters, including the threshold voltage, and thus to strong scattering or even malfunctions of the circuit behaviour. On the other hand, in the subthreshold mode a non-minimal channel length can have an advantageous effect on the switching characteristics through the Reverse Short Channel Effect (RSCE). For the design of subthreshold gates, the widths and lengths of all transistors are therefore in principle considered as available design parameters.

To solve the above-mentioned MOP, appropriate algorithms are required to approximate the Pareto set. Three different algorithms are utilized for this task, namely Global Analysis of Invariant Objects (GAIO) [13], Strength Pareto Evolutionary Algorithm 2 (SPEA2) [14] and Non-dominated Sorting Genetic Algorithm II (NSGA-II) [15]. A Python-based tool developed in our lab, which connects the algorithms implemented in Python with Spectre from Cadence for the analog circuit simulations, is used for this purpose.

### 2.1.1 GAIO

The GAIO algorithm is a multilevel algorithm which approximates the Pareto set of an MOP by "boxes". In each step, it divides the search space *S* into a collection of boxes. Within each box, test points are randomly chosen and evaluated until the Pareto set for the entire search space is calculated. The boxes without any Pareto points are discarded for the subsequent steps, thus eliminating the unimportant areas from the design space. Note that a box containing parts of the Pareto set can be removed if its test points happen to be unfavourable, so the approximation of the Pareto set may remain incomplete. Therefore, the GAIO continuation algorithm is additionally used. It has the benefit of finding, under certain circumstances, additional Pareto points in the vicinity of the known Pareto points. The number of samples per box and the recursion depth of the algorithm are defined by the user.
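The subdivision, sampling and selection steps described above can be sketched in a strongly simplified form. The code below is not the GAIO implementation; it is a one-dimensional toy version written under the assumption that each box is bisected once per step, and all names (`subdivide_and_select`, `toy_objectives`) are hypothetical. The continuation step is not included.

```python
import random


def dominates(a, b):
    """Return True if objective vector a dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and tuple(a) != tuple(b)


def subdivide_and_select(evaluate, box, depth, samples_per_box, seed=0):
    """Approximate the Pareto set of a 1-D search interval by boxes."""
    rng = random.Random(seed)
    boxes = [box]
    for _ in range(depth):
        # Subdivision step: bisect every remaining box.
        halves = []
        for lo, hi in boxes:
            mid = 0.5 * (lo + hi)
            halves.extend([(lo, mid), (mid, hi)])
        # Sampling step: evaluate random test points in every box.
        scored = []
        for idx, (lo, hi) in enumerate(halves):
            for _ in range(samples_per_box):
                s = rng.uniform(lo, hi)
                scored.append((idx, tuple(evaluate(s))))
        # Selection step: keep boxes owning at least one non-dominated point.
        keep = {idx for idx, fs in scored
                if not any(dominates(ft, fs) for _, ft in scored)}
        boxes = [halves[i] for i in sorted(keep)]
    return boxes


def toy_objectives(s):
    # Hypothetical objectives; the exact Pareto set is the interval [0, 1].
    return (s ** 2, (s - 1.0) ** 2)


boxes = subdivide_and_select(toy_objectives, (-2.0, 3.0),
                             depth=5, samples_per_box=8)
print(boxes)
```

Because the test points are random, a box that overlaps the true Pareto set may still be dropped when its samples happen to be dominated, which is the incompleteness that the continuation step is meant to reduce.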

### 2.1.2 NSGA-II

NSGA [16] is one of the first genetic algorithms to find Pareto-optimal solutions for an MOP. However, NSGA has some disadvantages, such as its high computational complexity, the lack of elitism and the requirement of specifying a sharing parameter. The NSGA-II algorithm overcomes those drawbacks.

The primary selection criterion of NSGA-II is called fast non-dominated sort. First, the domination count $n_p$ of each element p is calculated, where $n_p$ is the number of elements dominating p. In addition, the set $S_p$ of elements dominated by p is determined. The elements are then sorted into non-domination fronts $F_i$. The elements with $n_p = 0$ are selected first and form the first front $F_0$, which reduces the complexity of the sorting method. The algorithm then iterates through the elements of the domination sets $S_p$ of each member of $F_0$ and reduces their domination count by one. When the domination count of an element reaches zero, it is stored in a separate list Q. These elements form the second non-dominated front. The same procedure is then performed with all the members of Q to detect the third front, and the process continues until all fronts are identified. To calculate the domination counts and domination sets for N elements with M objectives, $O(MN^2)$ comparisons are required. The domination count $n_p$ of each element p in the second or a higher level of non-domination can be at most N - 1. As a result, the complexity of visiting the domination lists is $O(N^2)$. Therefore, the overall complexity of the fast non-dominated sort is $O(MN^2)$ [15].
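The sorting procedure described in this paragraph can be sketched as follows. This is a minimal illustration based on the description in [15], not the code used in this work; the function names and the sample objective vectors are assumptions.

```python
def dominates(a, b):
    """Return True if objective vector a dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and tuple(a) != tuple(b)


def fast_non_dominated_sort(points):
    """Return a list of fronts; each front is a list of indices into points."""
    n = len(points)
    dominated_by = [[] for _ in range(n)]   # S_p: indices dominated by p
    count = [0] * n                         # n_p: number of elements dominating p
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(points[p], points[q]):
                dominated_by[p].append(q)
            elif dominates(points[q], points[p]):
                count[p] += 1
    fronts = [[p for p in range(n) if count[p] == 0]]
    while fronts[-1]:
        next_front = []                     # the list Q of the text
        for p in fronts[-1]:
            for q in dominated_by[p]:
                count[q] -= 1
                if count[q] == 0:
                    next_front.append(q)
        fronts.append(next_front)
    return fronts[:-1]                      # drop the trailing empty front


# Hypothetical objective vectors (two objectives, both minimized).
objectives = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5), (2, 6)]
fronts = fast_non_dominated_sort(objectives)
print(fronts)
```

The two nested loops over all pairs are the source of the $O(MN^2)$ term: each pair is compared once per direction, and each comparison touches all M objectives.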

To maintain the diversity of the solutions, a crowded-comparison approach is applied. The average distance between the two points on either side of an element along each of the objectives is calculated to estimate the density of the elements surrounding this particular element. This quantity is called the *crowding distance*. An infinite distance value is assigned to the boundary solutions of each objective. The sum of the individual distance values corresponding to each objective gives the overall crowding distance. Prior to calculating the crowding distance, each objective function is normalized. In the worst case, N elements are sorted M times to calculate the crowding distance. This results in a complexity of $O(MN \log N)$.
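A minimal sketch of the crowding-distance calculation for one front is given below, assuming normalization by the range of each objective within the front. The function name and the sample front are illustrative only.

```python
def crowding_distance(front):
    """Crowding distance for each objective vector of a single front."""
    size = len(front)
    distance = [0.0] * size
    if size == 0:
        return distance
    for m in range(len(front[0])):
        # Sort the element indices along objective m.
        order = sorted(range(size), key=lambda i: front[i][m])
        f_min, f_max = front[order[0]][m], front[order[-1]][m]
        # Boundary solutions always receive an infinite distance.
        distance[order[0]] = distance[order[-1]] = float("inf")
        if f_max == f_min:
            continue
        for k in range(1, size - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            distance[order[k]] += gap / (f_max - f_min)
    return distance


# Hypothetical front with two objectives.
distances = crowding_distance([(1.0, 5.0), (2.0, 3.0), (4.0, 1.0)])
print(distances)
```

One sort per objective dominates the cost of this routine, which matches the complexity stated above.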

### 2.1.3 SPEA2

An elitist evolutionary algorithm, known as the Strength Pareto Evolutionary Algorithm (SPEA), was proposed by Zitzler and Thiele [17]. Here, the Pareto set is approximated stepwise as well. The samples are grouped into populations, and after each run a subsequent population is generated from the existing one with the intention of finding a better approximation of the solution set. Thus, the evolutionary principle helps to inherit the good properties while eliminating the bad ones. A fitness evaluation process evaluates the characteristics of each individual according to Pareto dominance and assigns a fitness score to it. Subsequently, the individuals suitable for inheritance into the following population are selected. The inheritance process takes place through recombination (crossover) and mutation. As a result, two new children are born which replace their parents with a certain probability.

SPEA was improved by Zitzler et al. in 2001; the improved version is known as SPEA2 [14]. In SPEA2, both the number of individuals dominating an individual and the number of individuals dominated by it are taken into account in the rank of the individual, similar to NSGA-II. For each candidate p in the archive $\overline{P_t}$ and the population $P_t$, the number of candidates dominated by p is given by the strength S(p). The raw fitness R(p) of each candidate p is the sum of the strength values S(i) of all candidates i that dominate p. The goal is to find a low value of R. When R is 0, the candidate is not dominated by any other candidate, whereas a high value of R represents a candidate that is dominated by many other candidates, which in turn dominate many others. The complexity of this fitness assignment is $O(N^2)$, where N is the combined number of elements in the archive and the population. The raw fitness alone may fail in cases where most candidates do not dominate each other. Therefore, a secondary selection criterion based on the solution density is used, which is estimated using the k-th nearest neighbour method. The distances from each individual candidate *p* to all other candidates from $\overline{P_t}$ and $P_t$ are calculated in objective space and stored in a list. The density for each *p* is given by equation 2.3.

$$D(p) = \frac{1}{\sigma_p^k + 2} \tag{2.3}$$

where $\sigma_p^k$ is the distance from *p* to its *k*-th nearest neighbour.

It is apparent that 0 < D < 1. The density is added to the raw fitness value R(p) to generate the absolute fitness value F(p). As the distance to the *k*-th neighbour increases, the value of *D* decreases, thus reducing *F*. The complexity of the density estimation is $O(N^2 \log N)$ [14].
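The fitness assignment can be sketched as follows, following the definitions in [14], where the raw fitness R(p) sums the strengths of the candidates that dominate p. The sketch operates on the union of archive and population; the function name, the default choice of k (square root of the sample size) and the sample objective vectors are assumptions made for illustration.

```python
import math


def dominates(a, b):
    """Return True if objective vector a dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and tuple(a) != tuple(b)


def spea2_fitness(points, k=None):
    """Return F(p) = R(p) + D(p) for every objective vector in points."""
    n = len(points)
    if k is None:
        k = max(1, int(math.sqrt(n)))   # assumed default for k
    dom = [[dominates(points[i], points[j]) for j in range(n)]
           for i in range(n)]
    # Strength S(p): number of candidates dominated by p.
    strength = [sum(row) for row in dom]
    # Raw fitness R(p): sum of the strengths of all dominators of p.
    raw = [sum(strength[j] for j in range(n) if dom[j][i]) for i in range(n)]
    fitness = []
    for i in range(n):
        dists = sorted(math.dist(points[i], points[j])
                       for j in range(n) if j != i)
        sigma_k = dists[min(k, len(dists)) - 1] if dists else 0.0
        fitness.append(raw[i] + 1.0 / (sigma_k + 2.0))   # R(p) + D(p)
    return fitness


# Hypothetical objective vectors (two objectives, both minimized).
objectives = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5), (2, 6)]
fitness = spea2_fitness(objectives)
print([round(f, 3) for f in fitness])
```

Since D(p) is always smaller than one, non-dominated candidates end up with a fitness below one, and the density term only decides the ranking among candidates with equal raw fitness.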

## 2.2 Resources

For the design space exploration, a few significant resources are taken into account which allow a direct quantitative comparison of different implementation alternatives. These properties of a circuit correspond to the consumption of certain resources, which are classified according to time, energy, area and robustness. At first sight it may seem unusual to describe robustness as a resource. However, there is no difference in the way in which the four terms are treated as regards the balance and the competition between them. For the sake of a consistent description, robustness is therefore also interpreted as a resource here.

### 2.2.1 Propagation Delay



Figure 2.1: Propagation delay and rise/fall times

The time taken by a circuit output to change after a change in the input signal is considered the propagation delay of the circuit. For a better understanding of the circuit behaviour, the propagation delay is calculated at both the rising and the falling edge. It is measured from the time when the input signal crosses 50% of the supply voltage to the time when the output signal crosses 50% of its attainable value.

Rise time and fall time are also considered important parameters. The rise time is the time taken by the gate output to change from a low level to a high level. Analogously, the fall time is measured in the opposite direction. Usually, either the 10% to 90% or the 20% to 80% transition is considered for the calculation.

Let us denote the rise time as  $t_{LH}$ , fall time as  $t_{HL}$  and the propagation delays as  $t_{pLH}$  and  $t_{pHL}$ , respectively. The definitions of different timing behaviours are illustrated in Figure 2.1.

$$t_{LH} = t \mid_{V_{out}=0.1V_{DD}}^{V_{out}=0.9V_{DD}} \tag{2.4}$$

$$t_{HL} = t \mid_{V_{out}=0.9V_{DD}}^{V_{out}=0.1V_{DD}} \tag{2.5}$$

$$t_{pLH/pHL} = t \mid_{V_{in}=0.5V_{DD}}^{V_{out}=0.5V_{DD}} \tag{2.6}$$

For the propagation delay calculation, the average of $t_{\rm pLH}$ and $t_{\rm pHL}$ is taken as follows:

$$t_{pd} = (t_{pLH} + t_{pHL})/2 \tag{2.7}$$
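The measurement of Equations 2.6 and 2.7 on sampled waveforms can be sketched as follows. The waveforms below are synthetic piecewise-linear data for an inverting gate with an assumed supply of 0.4 V and time given in arbitrary units; they are not simulation results, and the function names are hypothetical.

```python
def crossing_time(times, values, level, rising):
    """First time at which the waveform crosses level in the given direction."""
    for i in range(1, len(times)):
        v0, v1 = values[i - 1], values[i]
        crossed = (v0 < level <= v1) if rising else (v0 > level >= v1)
        if crossed:
            # Linear interpolation between the two neighbouring samples.
            frac = (level - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(times, v_in, v_out, v_dd, in_rising, out_rising):
    """Delay between the 50 % crossings of input and output (Equation 2.6)."""
    t_in = crossing_time(times, v_in, 0.5 * v_dd, in_rising)
    t_out = crossing_time(times, v_out, 0.5 * v_dd, out_rising)
    return t_out - t_in


# Synthetic inverter waveforms (assumed values, V_DD = 0.4 V).
times = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
v_in = [0.4, 0.4, 0.0, 0.0, 0.0, 0.0, 0.4, 0.4, 0.4, 0.4, 0.4]
v_out = [0.0, 0.0, 0.0, 0.1, 0.3, 0.4, 0.4, 0.3, 0.1, 0.0, 0.0]

t_plh = propagation_delay(times, v_in, v_out, 0.4, in_rising=False, out_rising=True)
t_phl = propagation_delay(times, v_in, v_out, 0.4, in_rising=True, out_rising=False)
t_pd = 0.5 * (t_plh + t_phl)   # Equation 2.7
print(t_plh, t_phl, t_pd)
```

The edge directions are passed explicitly so that the same helper can be reused for non-inverting circuits, where input and output switch in the same direction.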

### 2.2.2 Power

Power dissipation in CMOS circuits can be subdivided into two components; dynamic and static dissipation.

Dynamic dissipation happens due to switching in gates and short-circuit current. The short-circuit current is generated when both pMOS and nMOS are partially ON.

There are multiple sources of static power dissipation: subthreshold leakage through OFF transistors, gate leakage through the gate dielectric, junction leakage through the source/drain diffusions and contention current in ratioed circuits. In our work, leakage contributes the most to the static power dissipation.

Combining these components, the total power of the circuit is given by

$$P_{dynamic} = P_{switching} + P_{short\,circuit} \tag{2.8}$$

$$P_{static} = (I_{sub} + I_{gate} + I_{junct} + I_{contention})V_{DD} \tag{2.9}$$

$$P_{total} = P_{dynamic} + P_{static} \tag{2.10}$$

The major contribution to the dynamic power dissipation in equation 2.8 comes from the switching power. If the average switching frequency of a gate is given by $f_{SW}$, the average switching power dissipation can be calculated as

$$P_{switching} = CV_{DD}^2 f_{SW} \tag{2.11}$$

where $V_{DD}$ is the supply voltage and *C* is the sum of the internal and load capacitances. Since the gate does not necessarily switch in every clock cycle, the switching frequency $f_{SW}$ can be expressed as an activity factor $\alpha$ times the clock frequency *f*. Therefore, equation 2.11 can be rewritten as

$$P_{switching} = \alpha C V_{DD}^2 f \tag{2.12}$$

The activity factor is the probability that the circuit node goes through a state transition. The circuit consumes power during this transition only.
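As a rough numerical illustration of Equation 2.12, consider the operating point of 300 mV and 200 kHz targeted by the standard cell library, together with a switched capacitance of $C = 1\,\mathrm{fF}$ and an activity factor of $\alpha = 0.1$. Both of the latter values are assumed here purely for illustration:

$$P_{switching} = \alpha C V_{DD}^2 f = 0.1 \cdot 1\,\mathrm{fF} \cdot (0.3\,\mathrm{V})^2 \cdot 200\,\mathrm{kHz} = 1.8\,\mathrm{pW}$$

Because of the quadratic dependence on $V_{DD}$, halving the supply voltage under otherwise identical assumptions would reduce this value by a factor of four.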

In our level shifter circuit, there is no such time duration when both of the pull up and pull down networks are partially ON while the input signal switches. Therefore, for further calculation we will not consider the short circuit current contribution to the dynamic power dissipation of our circuit.

When the circuit is not switching, static power dissipation becomes the major concern. Especially in nanometer processes with low threshold voltages and thin gate oxides, the contribution of the leakage current to the total power increases significantly [18].

In the past, the subthreshold leakage current had a negligible contribution to the total power consumption. In nanometer processes, however, it can no longer be ignored. An accurate model of the transistor drain current is essential so that the leakage current in the subthreshold region can be predicted. A weak inversion state exists between source and drain when the gate voltage is less than the threshold voltage. Any potential difference between the drain and the source appears as a voltage drop across the drain-to-substrate depletion region. Because the potential drop along the channel is almost zero, the carriers move by diffusion as opposed to drift. The subthreshold current can be expressed as

$$I_{sub} = \mu_0 C_{ox} \left(\frac{W}{L}\right) (m-1) (v_T)^2 e^{\frac{(V_{gs} - V_t)}{mv_T}} \left(1 - e^{-\frac{V_{ds}}{v_T}}\right) \tag{2.13}$$

where $\mu_0$ is the carrier mobility, $C_{ox}$ is the oxide capacitance, W and L are the width and length of the gate, $V_t$ is the threshold voltage, *m* is the body-effect coefficient and $v_T$ is the thermal voltage. Silicon-on-Insulator (SOI) circuits show a sharper reduction in subthreshold current as the gate voltage decreases, making them attractive for low-leakage designs.

For convenience of calculation, the leakage current $I_{leak}$ is calculated as the mean value over all possible current components at a particular level, i.e. '0' or '1'. The static power dissipation is expressed as

$$P_{static} = V_{DD} \cdot I_{leak} \tag{2.14}$$



Figure 2.2: Dynamic energy calculation [12]

### 2.2.3 Energy

The dynamic energy becomes the primary concern when a load is connected to the circuit. As the load capacitance $C_L$ is charged through the pMOS from 0 to $V_{DD}$, an amount of energy is drawn from the power supply. While part of this energy is dissipated in the pMOS, the rest is stored on the load capacitor. The capacitor discharges during the high-to-low transition and the stored energy is dissipated in the nMOS.

The total energy required can be calculated as

$$E_{\rm tot} = V_{\rm DD} \int_{T} I_{\rm tot}(t) \, dt \tag{2.15}$$

where $I_{tot}$ is the sum of the currents drawn from the sources and *T* is the time interval under consideration. Using equations 2.10 and 2.15, the dynamic energy can be calculated as

$$E_{\rm dyn} = V_{\rm DD} \int_{T} I_{\rm tot}(t) \, dt - P_{\rm static} \cdot T \tag{2.16}$$

Figure 2.2 illustrates the procedure used to determine $E_{dyn}$.
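A minimal sketch of this calculation on a sampled supply current is given below, using trapezoidal integration. The current samples, the supply voltage and the leakage value are assumed for illustration only and are not simulation results.

```python
def trapezoid(times, values):
    """Trapezoidal integral of the sampled values over times."""
    return sum(0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
               for i in range(1, len(times)))


def dynamic_energy(times, i_tot, v_dd, p_static):
    """Dynamic energy over the interval spanned by times (Equation 2.16)."""
    period = times[-1] - times[0]
    e_tot = v_dd * trapezoid(times, i_tot)   # Equation 2.15
    return e_tot - p_static * period


# Assumed data: 1 nA of constant leakage plus a triangular switching pulse.
times = [0.0, 1e-6, 2e-6, 3e-6, 4e-6]        # seconds
i_tot = [1e-9, 1e-9, 11e-9, 1e-9, 1e-9]      # amperes
v_dd = 0.3                                   # volts
p_static = v_dd * 1e-9                       # Equation 2.14 with I_leak = 1 nA

e_dyn = dynamic_energy(times, i_tot, v_dd, p_static)
print(f"E_dyn = {e_dyn:.3e} J")
```

Subtracting the static term removes the leakage pedestal, so that only the charge of the switching pulse contributes to the result.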

### 2.2.4 Noise Margin



Figure 2.3: Voltage transfer characteristics and noise margin [12]

Noise margin is defined as the allowable noise voltage on the input of a gate which will not change the output. Two parameters, the low noise margin $NM_L$ and the high noise margin $NM_H$, are commonly used to describe it. The voltage transfer characteristic of the gate (output versus input voltage) is shown in Figure 2.3. The stable regions can be specified using a set of critical points along the characteristic curve. The voltage parameters separating these regions are defined as follows:

$$V_{\rm IL} = \min\left\{ V_{\rm in} \,\middle|\, \frac{dV_{out}}{dV_{in}} = -1 \right\} \tag{2.17}$$

$$V_{\rm IH} = \max\left\{ V_{\rm in} \,\middle|\, \frac{dV_{out}}{dV_{in}} = -1 \right\} \tag{2.18}$$

$$V_{\rm OL} = \min\left\{ V_{\rm out} \,\middle|\, \frac{dV_{out}}{dV_{in}} = -1 \right\} \tag{2.19}$$

$$V_{\rm OH} = \max\left\{ V_{\rm out} \,\middle|\, \frac{dV_{out}}{dV_{in}} = -1 \right\} \tag{2.20}$$

In the case of inverting gates, $V_{OL} = V_{out}(V_{IH})$ and $V_{OH} = V_{out}(V_{IL})$. The noise margins are defined using these voltage levels as follows:

$$NM_{\rm H} = V_{\rm OH} - V_{\rm IH} \tag{2.21}$$

$$NM_{\rm L} = V_{\rm IL} - V_{\rm OL} \tag{2.22}$$

$$NM = min(NM_{\rm H}, NM_{\rm L}) \tag{2.23}$$
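Equations 2.17 to 2.23 can be evaluated numerically on a sampled voltage transfer characteristic, as sketched below for an inverting gate. The characteristic used here is an assumed tanh-shaped curve with hypothetical values for supply, switching point and gain; it only stands in for a simulated characteristic, and the function names are illustrative.

```python
import math


def interp(x, xs, ys):
    """Piecewise-linear interpolation of ys(xs) at x (xs ascending)."""
    for i in range(1, len(xs)):
        if xs[i - 1] <= x <= xs[i]:
            w = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + w * (ys[i] - ys[i - 1])
    raise ValueError("x outside the sampled range")


def noise_margins(v_in, v_out):
    """Return (NM_H, NM_L, NM) of an inverting gate from a sampled VTC."""
    # Slope of the characteristic at the midpoint of every sample interval.
    mids = [0.5 * (v_in[i] + v_in[i - 1]) for i in range(1, len(v_in))]
    gain = [(v_out[i] - v_out[i - 1]) / (v_in[i] - v_in[i - 1])
            for i in range(1, len(v_in))]
    # Unity-gain points: input voltages where (gain + 1) changes sign.
    unity = []
    for i in range(1, len(gain)):
        g0, g1 = gain[i - 1] + 1.0, gain[i] + 1.0
        if g0 * g1 < 0.0:
            unity.append(mids[i - 1] + g0 / (g0 - g1) * (mids[i] - mids[i - 1]))
    if len(unity) < 2:
        raise ValueError("fewer than two unity-gain points found")
    v_il, v_ih = min(unity), max(unity)
    v_oh, v_ol = interp(v_il, v_in, v_out), interp(v_ih, v_in, v_out)
    nm_h, nm_l = v_oh - v_ih, v_il - v_ol
    return nm_h, nm_l, min(nm_h, nm_l)


# Assumed toy characteristic: V_DD = 0.4 V, switching point 0.2 V, gain 20 1/V.
V_DD, V_M, GAIN = 0.4, 0.2, 20.0
v_in = [i * 0.001 for i in range(401)]
v_out = [0.5 * V_DD * (1.0 - math.tanh(GAIN * (v - V_M))) for v in v_in]
nm_h, nm_l, nm = noise_margins(v_in, v_out)
print(f"NM_H = {nm_h:.4f} V, NM_L = {nm_l:.4f} V, NM = {nm:.4f} V")
```

For this symmetric toy curve both margins coincide; for a real gate the switching point is generally off-centre, and the smaller of the two margins determines NM.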

NM indicates the worse of the two noise margins determined separately for the high and low levels. The noise margin helps to determine the uncertainty region of operation in which the output can be corrupted; voltage disturbances smaller than the noise margin cannot alter the logic operation of the gate. It is important to note that the larger the noise margin, the more robust the gate. Because the MOP in Equation 2.1 is formulated as a minimization, NM enters the problem inversely.

### 2.2.5 Area

The area requirement of a gate can be measured either by counting the number of transistors used or by calculating the accumulated area of the transistor channels, i.e. the sum of the products of channel width and channel length over all transistors. However, since the number of transistors used in a given gate is constant, it cannot serve as an optimization parameter. Another way is to determine the area of the layout of the standard cell. For an optimization process, however, it is highly impractical to implement layouts for many different sets of widths and lengths. Therefore, the layout area is not suitable as an objective function for the optimization, and only the accumulated channel area is considered as an objective criterion.

$$A_{\text{gate}} = \sum_{i} W_i L_i \tag{2.24}$$

This can be measured directly from the parameters without simulation. However, it should be noted while drawing the layouts that the actual area requirement of a gate also depends on the wiring complexity, the format of the standard cell frame, the routing grid, and other factors.

## 2.3 Inverter Study

Several resources have been discussed in Section 2.2. In this section, we explain how these resources influence the circuit performance. An inverter circuit is considered for this study. One of the transistor parameters is varied while keeping the others constant.

| Transistor parameter | Symbol used | Default value (nm) |
|----------------------|-------------|--------------------|
| pMOS width           | wp          | 250                |
| pMOS length          | lp          | 48                 |
| nMOS width           | wn          | 80                 |
| nMOS length          | ln          | 48                 |

Table 2.1: Details of the transistor dimensions

Since our intention is to explore the behaviour of the circuit in the subthreshold region, the circuit is simulated with a 400 mV input supply. The frequency of the input is maintained at 1 MHz. Table 2.1 shows the symbols and the default values used for the transistor parameters.

Figures 2.4-2.7 illustrate the main motivation for our design space exploration. With increasing pMOS width, the propagation delay increases linearly except at the beginning. The input capacitance grows with the pMOS width, which explains the linear effect. The initial decrease in the delay, however, results from reaching the optimum pMOS to nMOS ratio.

In Figure 2.4d, we can observe that the noise margin increases until approximately 300 nm and decreases steadily thereafter. As explained in Equation 2.23, the minimum of $NM_{\rm H}$ and $NM_{\rm L}$ is plotted here. The initial increase in the graph is the contribution of $NM_{\rm H}$, whereas the latter part represents $NM_{\rm L}$. As the pMOS width increases, the threshold point shifts to the left, thereby reducing $V_{\rm IL}$. Consequently, the low noise margin keeps decreasing.

The transition energy changes linearly with the transistor width. The gate capacitance and the drain capacitance increase with the pMOS width. As a result, the output capacitance increases, which raises the transition energy.

The subthreshold current varies linearly with the width of the transistor. Therefore, the static power dissipation rises with the pMOS width, as shown in Figure 2.4c. The pMOS length behaves differently: as can be seen in Figure 2.5c, the static power dissipation remains almost constant once the pMOS length exceeds the 50 nm mark. According to Equation 2.13, the leakage current is inversely proportional to the length of the transistor, which explains the hyperbolic shape of the curve.
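The width and length dependence can be checked numerically with a first-order model. The sketch below uses a common textbook form of the subthreshold current, $I = I_0 \frac{W}{L} e^{(V_{GS}-V_{th})/(n v_T)} (1 - e^{-V_{DS}/v_T})$, which is assumed here and may differ in detail from Equation 2.13. The values of `i0`, `n` and `v_th` are placeholders, not technology data.

```python
import math


def subthreshold_current(w_nm, l_nm, v_gs=0.0, v_ds=0.4,
                         v_th=0.45, n=1.5, i0=1e-7, temp_k=300.0):
    """First-order subthreshold drain current in amperes.

    i0, n and v_th are placeholder values chosen for illustration.
    """
    k_over_q = 8.617333e-5           # Boltzmann constant over charge, V/K
    v_t = k_over_q * temp_k          # thermal voltage
    return (i0 * (w_nm / l_nm)
            * math.exp((v_gs - v_th) / (n * v_t))
            * (1.0 - math.exp(-v_ds / v_t)))


base = subthreshold_current(250, 48)
print(subthreshold_current(500, 48) / base)   # width doubled
print(subthreshold_current(250, 96) / base)   # length doubled
```

In this model the leakage scales with W/L, which gives the linear trend over width and the hyperbolic trend over length described above.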

The noise margin graph shown in Figure 2.5d is again a combination of the high and low noise margins, each dominating one part of the curve.

The transition energy graph in Figure 2.5b shows that the energy increases initially, decreases over a short range of lengths, and then starts increasing again. However, the fluctuation of the transition energy is only around 10 aJ, which is negligible.

Figure 2.4: Variation of wp from 250 nm to 2500 nm

Figure 2.5: Variation of lp from 30 nm to 300 nm



Figure 2.6: Variation of wn from 80 nm to 800 nm

The pull-down segment is more important for subthreshold operation. The strength of this segment determines the drive capability of the circuit. Here, it can be observed that the propagation delay reduces initially over a short range when the nMOS width is increased and then increases steadily. The remaining parameters increase in proportion to the nMOS width. Figure 2.6d shows that $NM_L$ is dominant throughout the range.



Figure 2.7: Variation of ln from 30 nm to 300 nm

Figure 2.7 is the most informative of the four. As the nMOS length increases from 30 nm to 300 nm, the propagation delay of the circuit increases steadily due to the increase in input impedance. The transition energy, in contrast, reduces almost exponentially, as shown in Figure 2.7b, because the switching current decreases with increasing nMOS length. The variation is only around 20 aJ, however, and therefore negligible. As with Figure 2.5c, Figure 2.7c can be explained using Equation 2.13. The noise margin in Figure 2.7d is again dominated partly by $NM_H$ and partly by $NM_L$.

The results above show that every parameter has a strong effect on the transistor performance, which in turn accumulates into the circuit performance. This multiobjective optimization problem therefore forms the foundation of our work, as the following chapters will show.

## 3 Level Shifter

The demand for low power consumption in microelectronic circuits has increased significantly as submicron technologies scale down. Battery-operated portable applications are in high demand across industries such as automotive, medical, MEMS and telecommunication. Subthreshold operation provides a potential solution to the energy consumption problem. However, it comes at the price of a significant degradation in performance, since the maximum operating clock frequency must be reduced in subthreshold operation.



Figure 3.1: A system with several voltage domains

In recent years, the voltage island methodology has become widely adopted to combat the performance issue [19–22]. With voltage islands within a chip, functional units operate at different voltages: the core processor executes the critical algorithm at an above-threshold voltage, thus maximizing performance, while all other non-critical circuits operate at a subthreshold voltage to improve power efficiency. This results in multiple supply voltage (MSV) domains. As suggested by [23], designs optimized with multiple supply voltages and multiple threshold voltages can reduce dynamic power by 40-50% compared to the original single-supply-voltage design.

Even if the whole system is assumed to operate in the subthreshold domain, the subthreshold cores need to be connected to the I/O circuits, which operate at a much higher voltage. The operating voltage of the I/O can be reduced only to a certain extent because of the large impedance load and the high noise immunity requirement. It is also not practical to connect analog pad cells directly to subthreshold logic, since very large buffers would be needed to achieve acceptable transition times given the high parasitic capacitance of the bond pad, bond wire, device package, PCB tracks, and further off-chip load.

The supply voltage of the modules can be varied during operation, provided the logic gates are compatible with this. The clock frequency must then be reduced together with the supply voltage to account for the increased gate delays. This is called Dynamic Voltage and Frequency Scaling (DVFS). DVFS allows modules to operate with a high clock frequency and supply voltage in phases of high computational load in order to handle the processing of the data, while the clock frequency and the supply voltage are reduced in less compute-intensive phases so that less power or energy is required in subthreshold operation. This technique is therefore particularly suitable for systems with varying requirements for computing power.
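The benefit of DVFS can be estimated with the standard first-order expression for dynamic power, $P = \alpha C V_{DD}^2 f$. The sketch below is only an illustration under that assumption: the effective capacitance and the two operating points are hypothetical values, and leakage is ignored.

```python
def dynamic_power(c_eff, v_dd, f_clk, activity=1.0):
    """First-order dynamic power in watts: activity * C * V^2 * f."""
    return activity * c_eff * v_dd ** 2 * f_clk


def energy_per_cycle(c_eff, v_dd, activity=1.0):
    """Dynamic energy per clock cycle in joules: activity * C * V^2."""
    return activity * c_eff * v_dd ** 2


C_EFF = 1e-12  # hypothetical effective switched capacitance (1 pF)

# Hypothetical operating points: (supply voltage in V, clock in Hz).
high_load = (1.0, 100e6)
low_load = (0.4, 1e6)

p_high = dynamic_power(C_EFF, *high_load)
p_low = dynamic_power(C_EFF, *low_load)
print(p_high / p_low)  # power ratio between the two modes
print(energy_per_cycle(C_EFF, 1.0) / energy_per_cycle(C_EFF, 0.4))
```

The power ratio combines the quadratic voltage dependence with the reduced clock frequency, while the energy per cycle depends on the supply voltage alone.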

At the interface between subthreshold modules and conventional circuit parts, level shifters are required to convert between the different voltage levels of the signals. For DVFS systems, the level shifters must be capable of operating without errors over a wide voltage range for one or even both supply voltages. Because a level shifter does not contribute to the logic functionality of the circuit, it must be implemented in a resource-efficient manner in order to keep the increase in delay, power dissipation and area small. In this thesis, it is assumed that the higher supply voltage at the level converter always corresponds to the full or nominal supply voltage of the technology used.

## 3.1 Up level shifter

An up level shifter receives its input at a low voltage level and generates an output at a high voltage level. In essence, it acts as a converter between two modules with different voltages. It is also possible that both modules operate with the same supply voltage. For example, in DVFS systems the voltage of a module might be raised to the nominal value for accelerated data processing, so that two different modules operate on the same supply voltage.



Figure 3.2: Principal structure of the up level shifter

Figure 3.2 describes a basic up level shifting operation. An up level shifter circuit receives its input from a low-level module. An inverter consisting of a pMOS (MP1) and an nMOS (MN1) is required for the level shifting operation, as the system needs both the low-level input A and the inverted logic AN. The level shifter block has limited ability to drive connected loads. As a result, a buffer circuit is used at the output. MP2 and MP3 constitute the pull-up path of the buffer circuit, and MN2 and MN3 the pull-down path. In the following sections, different implementations of the level shifter circuit are explained.

## 3.2 Conventional level converter

Conventional level shifter circuits are based on two fundamental topologies. The Type I level shifter is based on half-latched pMOS devices, as shown in Figure 3.3. The standby power is negligible due to the complementary pull-up and pull-down networks.

Here a half-latch is formed by the pMOS transistors P2 and P3, with the low-voltage signals *A* and *AN* connected to the nMOS transistors N2 and N3, respectively. The nMOS transistors are widened sufficiently to overcome the drive strength of the corresponding pMOS transistors during each input transition.



Figure 3.3: Conventional level shifter circuit with cross-coupled devices [11]

The strong contention between the pull-up path and the pull-down path is the major drawback of this circuit. When *A* and *AN* reduce to subthreshold levels, the pull-down transistors become extremely weak. As a result, when $V_{DDH}$ is in the above-threshold region, they cannot overcome the strength of the pull-up transistors. The required nMOS-to-pMOS ratio therefore grows exponentially, as the pull-down transistors only provide a subthreshold on-current while the pull-up half-latch has above-threshold drive strength. This leads to a larger area and load capacitance, which in turn increases the energy consumption and delay.
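The exponential growth of the required sizing ratio can be made plausible with a rough estimate. The sketch below assumes that the pull-down current per unit width follows a simple exponential subthreshold law in $V_{DDL}$ and that the pull-up current per unit width is constant. The threshold voltage, slope factor and normalised pull-up current are hypothetical values, so the absolute numbers carry no meaning; only the trend does.

```python
import math


def required_width_ratio(v_ddl, v_th=0.45, n=1.5, v_t=0.0259,
                         i_p_per_i0=1.0):
    """Width ratio Wn/Wp at which pull-down and pull-up currents balance.

    The pull-down on-current per unit width is modelled as
    I0 * exp((V_DDL - V_th) / (n * v_T)); the pull-up current per unit
    width is given relative to I0. All parameters are placeholders.
    """
    i_n_per_i0 = math.exp((v_ddl - v_th) / (n * v_t))
    return i_p_per_i0 / i_n_per_i0


for v_ddl in (0.40, 0.30, 0.20):
    print(v_ddl, required_width_ratio(v_ddl))
```

Under these assumptions, every reduction of $V_{DDL}$ by 100 mV multiplies the required ratio by the same constant factor, which is the exponential behaviour referred to above.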

The Type II topology, shown in Figure 3.4, is based on a current mirror structure. Since there is almost no overlap between the pull-up and pull-down paths, the contention is low, which is an advantage of this type over the other. As a result, the pull-up path does not need to be weakened for fast wide-range conversion, and low delay and switching energy are easy to achieve. However, the static current flowing through P2 and N2 produces a large standby power for a high output. The standby power increases with $V_{DDL}$.

## 3.3 State of the Art

Several works have built on these two topologies. Their aim was to address the problems of both topologies while making the circuits robust and efficient. In the following sections, state-of-the-art designs related to these two topologies are described.



Figure 3.4: Conventional level shifter circuit with current mirror structure [24]

### 3.3.1 Cross-coupled pMOS



(a) Level Shifter

(b) Reduced Swing Converter

Figure 3.5: (a) shows the level shifter circuit along with the novel (b) RSI circuit proposed in [25]

A static and a dynamic converter design were implemented in IBM 130 nm technology by [25]. Here a novel circuit, the *Reduced Swing Inverter* (RSI), was proposed with the aim of reducing the $V_{gs}$ of the pMOS transistors, making the nMOS transistors relatively stronger than the pMOS ones. A voltage doubler circuit is added before the input of the level shifter, as shown in Figure 3.5a, so as to boost the low voltage.

The circuits consist of at least 18 transistors (excluding the voltage doubler), which makes them somewhat complicated and increases their susceptibility to variation. Moreover, the RSI circuit is driven by additional inverters, which limits the minimum acceptable input voltage. In addition, the extra inverters and the RSI (Figure 3.5b) consume a significant amount of energy, which is not acceptable for low-power applications. The circuit is not scalable, since the pull-up ability of P1 and P2 is limited by the RSI. The operating speed of most digital core circuits scales with the supply voltage, so the non-scalability of the level converter (LC) speed may pose a problem when the system is operated at a higher voltage. To avoid this problem, the level shifter should accept both subthreshold and strong-inversion logic inputs, and its performance should adapt to the input logic voltage. Also, because the cross-coupled pMOS devices are constantly weakened by the RSI, the circuit cannot track the delay of a DVS circuit.



Figure 3.6: Half-latch based level shifter with current limiters [26]

The major disadvantage of this type is the strong contention between the pull-up and pull-down path transistors. [26] added pMOS current limiters to the basic circuit, as shown in Figure 3.6, which reduce the drive strength of the pull-up network to a subthreshold current. This circuit is robust across process, voltage, and temperature (PVT) corners, and it supports a wide supply range on the input side. However, the reference path required for the limiter leads to a static on-current. As a result, the static power dissipation increases, especially when $V_{DDL}$ is raised above the threshold voltage of the transistor.

[27] presents a solution that replaces the pMOS half-latch with a current mirror load. When the input is high, a static on-current flows, which increases the static power consumption. The authors of [27] also present a dynamic threshold approach using body ties. Silicon-on-Insulator (SOI) technology is used to gain access to the bodies of each type of transistor, so this method requires a manufacturing technology that allows the body of each device to be set independently. The configuration shifts the threshold of each device, reducing it when the device is 'ON' and increasing it when the device is 'OFF'. This is not practical in other technologies, however. In bulk CMOS technologies, for example, isolated p-wells would be needed, which requires additional manufacturing steps and leads to a high area overhead. Furthermore, this technique does not support a wide-range DVS input, since the diodes connected to the base would become forward biased once the input exceeds $V_{T_{DIODE}}$.



Figure 3.7: Diode connected level shifter circuit [28]

[28] introduced two additional pMOS diodes to limit the strength of the pull-up path of the cross-coupled pMOS transistors, as shown in Figure 3.7. At steady state, the $|V_{GS}|$ of the pMOS diodes is very small and commensurate with the diode voltage drop $|V_{PD}|$. When the input is low, the voltage drop across M5 can leave M4 with a non-zero $|V_{GS}|$. This might bring M4 into weak inversion, allowing a static current through M8 and M4. The situation does not change when the input is high; M5 and M7 then play the corresponding roles. Variation in the diode-connected devices has a strong impact on the reliability of the solution.

[29] uses the RSI concept along with a feedback path for leakage reduction. This concept includes a stack of transistors along with transistors with different thickness so as to overcome the overdrive voltage difference. Still, the circuit remains susceptible to parametric variation.



Figure 3.8: Level shifter as proposed in [30]

Figure 3.8 shows a Modified Dual Cascode Voltage Switch (MDCVS) based circuit proposed by [30], which satisfies the needs of both high-speed and low-power operation. Here the pull-down transistors (MN1 and MN2) are low-threshold transistors, which increases the current flow at a given input voltage. This helps to balance the strength of the pull-down path against that of the pull-up path. To limit the pull-up strength of the two branches, diode-connected pMOS transistors (MP5 and MP6) are used [28]. The multi-threshold CMOS design technique is applied to trade off speed and power consumption. However, the presence of the always off-biased pMOS transistors makes scalability difficult in terms of both static power and dynamic energy consumption.



Figure 3.9: Circuit-level schematic of the proposed level shifter [31]

The topology proposed in [31] has two main stages. As shown in Figure 3.9, the first stage consists of a cross-coupled differential inverter stage with diode-connected nMOS transistors. The second stage is a normal cross-coupled differential inverter, which restores the final output to full swing from the 0 to $V_{DD} - V_T$ range at the output of stage one. For subthreshold circuits, the level conversion happens primarily in the first stage.

nMOS diode current limiters are added to the pull-up network by [32] so as to reduce the current contention drastically. As seen in Figure 3.10, the drain of the pull-down nMOS is used as the output node, unlike in Figure 3.7. As a result, additional pull-down devices are not required. A notable advantage of this design is that the nMOS-to-pMOS ratio comes down to 2 [32]. For design optimization, HVT transistors are used for the pull-up network, whereas the pull-down network is designed using RVT transistors. Furthermore, the inverse narrow width effect (INWE) is exploited to improve the drivability of the pull-down devices and reduce the delay. As the channel width decreases, the threshold voltage drops sharply, which results in a higher current density at the minimum width of the transistor. To benefit from this, multiple transistors are used in parallel: the pull-down path uses 5 nMOS transistors in parallel.

Figure 3.10: Level shifter schematic proposed by [32]

[33] proposed a level shifter design consisting of two stages. In the first stage, the pull-up path is weakened so as to enable fast and reliable level shifting for subthreshold input voltages. Full output voltage swing is achieved in the second stage. The design has symmetrical rise and fall delays: as seen in Figure 3.11, an inverter and the pull-down nMOS transistor MN6 combine to balance the delays. The INWE is utilized here to improve the switching speed and energy efficiency. The fingers that make up the pull-down transistors MN1-MN4 and MN6 are kept at the minimum possible width. This increases the current significantly, especially in the subthreshold region, as the current varies exponentially with the threshold voltage. The parasitic capacitance is also reduced because of the narrower pull-down transistors. As a result, the switching delay and the power consumption improve.

[34] proposed a design (Figure 3.12) that addresses power and area cost and extends the operating range into the sub-$V_T$ region. Here, two off-biased pMOS transistors are added between $V_{DDH}$ and the cross-coupled pMOS transistors. The leakage current provided by these off-transistors is sufficient for fault-free operation. To sink this current, it is enough to keep the nMOS transistors in the pull-down network at their minimum width. To combat the leakage current, channel stretching is applied to the nMOS transistors and two pMOS transistors are stacked in the pull-up network. Although the minimum operating $V_{DDL}$ achieved here is quite low, the propagation delay and power consumption suffer noticeably.

Figure 3.11: Cross-coupled pMOS based level shifter schematic proposed by [33]

[35] proposed a level shifter circuit using the commercial 90 nm CMOS STMicroelectronics process technology. The technology provides three different types of transistors: low-voltage threshold (LVT), standard-voltage threshold (SVT) and high-voltage threshold (HVT). The circuit, shown in Figure 3.13, consists of an input inverter, a voltage converter and an output inverter. LVT transistors are used in the input inverter so as to provide a fast differential input voltage. The pull-down network is also designed using LVT transistors to increase its strength. LVT pMOS devices (MP2 and MP3) are inserted to reduce the cross-bar current flow. The pull-up network is made up of HVT pMOS devices so as to make it weak. In addition, two diode-connected HVT pMOS devices (MP6 and MP7) are placed between the pull-up logic and the source. The output inverter, whose pull-down network consists of an SVT nMOS, ensures rail-to-rail voltage conversion. For the pull-up network of the inverter, stacked HVT transistors are used, which limits the leakage current through the pull-up network.



Figure 3.12: Cross-coupled pMOS based level shifter schematic proposed by [34]

To overcome the issues related to speed and voltage conversion range in [35], another multi-threshold CMOS (MTCMOS) design is proposed by [36], as shown in Figure 3.14. A diode-connected scheme containing pMOS devices is implemented here along with the multi-threshold technique. It helps to relax the contention between the pull-up network and the evaluation nMOS devices. Along with pronounced improvements in performance, energy consumption and static power, the circuit is capable of converting from 180 mV to 1 V.

[37] adopted a weak-keeper-based pulsed control strategy to avoid contention so that the delay and power consumption can be reduced. A comparatively large number of transistors is used here for weakening the pull-up path or avoiding contention. The switching energy, however, increases substantially because of this mechanism. With the pull-up network being constantly weakened, the delay scalability suffers, making the design unsuitable for DVS. It was followed by a simple structure proposed by the same group [38].

The output buffer structure is modified, as can be seen in Figure 3.15. This ensures that one of the transistors of the output buffer remains completely turned off, reducing the short-circuit current there. It helps to speed up the operation as well. However, the stacks of transistors used here account for a larger area.

Figure 3.13: MTCMOS based level shifter circuit proposed by [35]

A Dual Supply Voltage Level Shifter (DSLS) approach, implemented in 0.18 $\mu$m CMOS technology, is proposed by [39]. The design contains two current generators to limit current contention at the critical discharging internal nodes during output switching. As shown in Figure 3.16, when the voltage at node Q1 crosses the switching threshold voltage of the output inverter, the output voltage of the inverter (Out) starts to discharge. Accordingly, the output voltage drops to logic '0' before the voltage of node Q1 reaches $V_{DDH}$. This turns off the M7 transistor completely, thus turning M11 off. Consequently, M5 is turned off, so that node Q1 is not charged all the way to $V_{DDH}$. However, the resulting voltage at node Q1 is not sufficient to turn off the pMOS transistor of the inverter. Therefore, static power dissipation increases, especially at smaller technology nodes.

To reduce the short-circuit current in the previous designs [35], [36], [40] proposed another design with a self-adapting pull-up network and pMOS current limiters. The design was fabricated in the 180 nm UMC CMOS process technology. During each input transition, one branch of the pull-up network is strengthened while the other is weakened. This helps in fast switching and reduced energy consumption. As shown in Figure 3.17, the output inverter is replaced by an inverting output buffer to ensure adequate output driving strength. The output buffer is driven in a split way [38].

Figure 3.14: MTCMOS based level shifter circuit proposed by [36]

### 3.3.2 Current Mirror

Among current mirror based level shifter designs, [41] proposed a novel idea that has been followed by many researchers. The current mirror circuit was replaced by a Wilson Current Mirror (WCM), as shown in Figure 3.18, so as to eliminate the static current flowing through the pull-down path transistors when either of them is turned off. Additionally, it keeps the contention between the pull-up and pull-down paths well balanced. When the input is low, N3 conducts and pulls Z low. Since N2 is turned off, the node B is charged through P2 until both P2 and P4 are turned off. When the input is high, N2 conducts, leading to a current flow through N2, P3, and P2. Since P2 and P4 form a current mirror, this current is mirrored to P4 and charges the node Z. As Z rises, P3 is turned off. Therefore, no static current can flow through N2, P3, and P2. The proposed design is DVS compliant and shows low sensitivity to process and temperature variations. It also has very low complexity.

Figure 3.15: Level shifter circuit proposed by [38]

Multi-threshold CMOS (MTCMOS) devices are often adopted as a solution to reduce the power consumption. [42] presented a level shifter design (Figure 3.19) based on such devices, which consists of both a WCM based design and the conventional cross-coupled design. The cross-coupled part compensates for the voltage drop of the WCM section, while the input of the cross-coupled level shifter is raised to near or above threshold to reduce the power dissipation and delay. In order to achieve a higher speed, the W/L ratios of the output transistors MP2, MN2, MP5 and MN4 are increased. Low-$V_{\rm th}$ nMOS devices are used for NM5 and NM6 to improve the pull-down strength of the cross-coupled section, while the active current is limited by using high-$V_{\rm th}$ pMOS devices for MP7 and MP8.

Figure 3.16: Level shifter schematic proposed by [39]

The design proposed by [43] has two major parts: a current generation circuit and a level conversion circuit (Figure 3.20). The level conversion part is based on the conventional two-stage comparator circuit. The current reference of the conventional comparator is omitted here, since it dissipates static current and thereby increases power consumption. Instead, the current mirror is controlled by sensing the logic error between input and output, so that the current is enabled during a transition and disabled after the output flips. The generator contains two circuit blocks: a fall transition current generator and a rise transition current generator. The same concept was presented in [44] under a different name, Logic Error Correction Circuit (LECC). The LECC monitors the input and output signals to create an implicit pulse. This design is further elaborated in [45], where it is realized using a 0.35 $\mu$m process. Although the power consumption is quite low here, it comes at the cost of a very high propagation delay. Also, the comparator consumes substantial switching energy.

Figure 3.17: Level shifter circuit proposed by [40]

Multi-threshold CMOS (MTCMOS) devices are utilized by [46] along with a novel concept, the input-controlled diode chain, to address the voltage drop issue. The stacked transistors M8-M11 seen in Figure 3.21 constitute the diode chain. The design is implemented as a double-row standard cell in a 0.18 $\mu$m CMOS process technology, with the threshold voltage lying around 0.46 V.

[47] redesigned the input-controlled diode chain by inserting one more transistor, as can be seen in Figure 3.22. The inverted output (node D) is exploited to design the feedback control, which takes care of the charge sharing and slow rise transition issues of the WCM based design. Another advantage of using node D instead of node A in the feedback is a delay in cutting off the source current, which maintains sufficient charging strength for most of the rise transition. The INWE is utilized when choosing the device sizes so as to further reduce the delay and power consumption. For the input transistors, 3.3 V native devices and 1.8 V devices with connected gates are used, whereas the high-voltage section is built with 3.3 V normal devices. The combination of mixed-$V_{\rm th}$ devices ensures I/O voltage tolerance while maintaining a relatively small delay. The design exhibits the lowest energy consumed per transition, at the expense of a minimum up-convertible $V_{\rm DDL}$ of only 0.21 V.

Figure 3.18: WCM based level shifter schematic [41]

Figure 3.19: Level shifter schematic with MTCMOS implementation [42]

Figure 3.20: Circuit-level schematic of the proposed level shifter [43]

Figure 3.21: Level shifter schematic proposed by [46]

Figure 3.22: Proposed circuit by [47] for fast and energy-efficient wide-range voltage conversion from near/sub-threshold up to I/O voltage

In [33], the proposed WCM based level shifter circuit has a feedback-controlled pMOS transistor MP3 inserted on the source side of the diode-connected pMOS transistor MP1 instead of the drain side (Figure 3.23). As a result, when the input rises and the output charges to $V_{DD}$, the voltage of node A drops to ground. This forces MP2 to turn on completely, so that the output is charged to the full $V_{DD}$, which eliminates the standby leakage in the subsequent buffers. During the high-to-low transition, the voltage at node A is raised by MP3, weakening the pull-up path and thereby reducing the fall delay.

Figure 3.23: WCM based level shifter schematic proposed by [33]

[48] added two transistors, MP4 and MN3, to the standard WCM structure, as shown in Figure 3.24. During the high-to-low transition of the input, MN1 and MN2 are switched off and on, respectively. As the node OUT is gradually pulled down, MP3 turns on and charges the node *B*. This turns on MN3, which discharges the output node with a shorter transition time. When the input goes low-to-high, only MN1 remains turned on, with MN2 and MN3 being turned off. While OUT does not correspond to the logic level of IN, MN1, MP4, MP3 and MP1 are all on, allowing a transition current to flow through this branch. This current is mirrored to MP2 and charges the node OUT. As soon as OUT attains $V_{DDH}$, MP3 is turned off, ensuring that no static current flows through MP1, MP3, MP4 and MN1. The voltage drop across MP4 causes a smaller swing at node *B*. Therefore, this node is pulled down with a shorter transition time, so that MN3 is turned off quickly. This reduces the contention at node OUT between the pull-up and pull-down transistors, thereby reducing the power consumption.

Figure 3.24: Level shifter schematic proposed by [48]

The same research group proposed a design based on an auxiliary circuit in [49]. It is simulated in 0.18 $\mu$m CMOS technology as well as in standard TSMC 90 nm CMOS technology. The circuit consists of 16 transistors and has a large area overhead.

Figure 3.25 shows the schematic proposed by [50]. The design consists of a pre-amplifier stage with high- and low-LECC, an output latch stage, and an output inverter. The components of the pre-amplifiers are the HLECC (MN1, MN2), the LLECC (MN4, MN5), current mirrors (MP1-MP2, MP3-MP4), control transistors (MN3, MN6), and an input inverter. The design is simulated in a 0.18 $\mu$m standard CMOS technology. The MTCMOS concept is applied here using both 1.8 V and 3.3 V tolerant transistors.

Another MTCMOS design, proposed in [51], is capable of self-controlling the current limiter by detecting the output voltage error. The design consists of three parts, as shown in Figure 3.26. The first part is the input inverter circuit, where low-threshold devices are used. The second part is the level shifter itself, the principal portion of the design. A couple of stacked nMOS transistors are connected to the current limiter, while the gates of these devices are coupled to the low-voltage inverted input signal and the high-voltage output signal. These stacked transistors act as a current reference and provide the essential current for the current mirror intermittently, according to the values of the input and output. The third part is the output driver circuit, consisting of two pMOS transistors and an nMOS transistor. The output of the second part is connected to the input of this part, and its output is fed back to the second part. As a result, a cross-coupled feedback loop is formed between the two sub-circuits. Because of this feedback loop, the design can self-control the current limiter by detecting the voltage values of the input and output.

Figure 3.25: Level shifter schematic proposed by [50]

Figure 3.26: Level shifter schematic proposed by [51]



Figure 3.27: Level shifter schematic proposed by [52]

The design presented by [52] contains a modified WCM intended to further reduce the delay and power consumption. As shown in Figure 3.27, the design has three blocks. Block 1 contains the modified WCM, which uses two current mirrors instead of one. Block 2 contains the delay path, and block 3 contains an OR-gate. Blocks 2 and 3 are meant to reduce the propagation delay when the $V_{DDL}$ value is near threshold. However, the pMOS devices in the NOR gate cannot be switched off completely. Therefore, a large static current is produced, which increases the standby power in the NOR gate. To reduce the static current, the size of the pMOS and nMOS devices in the NOR gate can be reduced, but this increases the delay. Furthermore, the stacked pMOS devices in the NOR gate cause a slow high-to-low transition at the output, increasing the average delay and switching energy, especially for wide-range voltage conversion.

Two current mirrors are used in the design proposed by [53]. However, the leakage current increases here compared to the single mirror design. An extra pMOS is inserted in the pull-up path for the construction of the second mirror.

[54] proposed a design containing a level-shifting capacitor together with a current mirror. When the logic levels of the input and output signals do not correspond to the high-to-low transition of the input signal, the capacitor is charged. The design contains two cascaded inverters, one supplied with $V_{DD_L}$ and the other with $V_{DD_H}$. If the first inverter drives the second with $V_{DD_L}$, the pMOS transistors in the second inverter cannot be turned off by such a low value of $V_{DD_L}$. The capacitor helps to mitigate this problem, since it is charged by the voltage difference between $V_{DD_H}$ and $V_{DD_L}$. The current source in the circuit is turned on only when the input signal goes from low to high, thereby reducing the power consumption significantly.

# 3.4 Scale Down Mechanism

In this section, we describe the scale-down process we followed. Our intention was to scale down the transistor dimensions of the circuit shown in Figure 3.18, which were used in 65 nm bulk CMOS technology, and to observe whether a linear approach can lead us to suitable dimensions for 28 nm FDSOI technology. The preliminary examinations and the subsequent Pareto front search are described below.

#### 3.4.1 Preliminary Examinations

As our intent was to optimize the design, certain basic conditions that determine the operating condition of the circuit need to be defined. This was also useful for constraining the available search area.

The circuit behaviour under worst-case operating conditions in the subthreshold domain is not optimistic. As a result, nominal process parameters are maintained during the optimization process, with an operating temperature of 27 °C. In 28 nm FDSOI technology, the threshold voltage of the transistors is 450 mV, which is higher than that in 90 nm bulk CMOS technology.



Figure 3.28: Design space search in terms of propagation delay, total energy consumption per transition and static power dissipation

To understand the influence of the parameters, the width and length of all the transistors are varied freely and the schematic is simulated. The parametric analysis gives a clear picture of the influence of the parameters and shows that not all of them contribute to the performance of the circuit. As described in Section 2.2, propagation delay, transition energy, static power dissipation and noise margin are considered as the performance metrics. Figure 3.28 shows the parameters that influence the circuit performance.

The 28 nm FDSOI technology from ST Microelectronics allows the width and the length of a transistor to be reduced to a minimum of 80 nm and 30 nm, respectively. Therefore, the minimum value of the transistor width is chosen as 80 nm. However, the minimum length of the transistor is chosen as 48 nm, because the design rules let us reduce the length down to 48 nm conveniently. The upper limit is chosen as the dimensions mentioned in [41]. It is worth mentioning that the circuit generates the desired output only when the lengths of P3 and N3 are maintained at 48 nm. Table 3.1 contains the final range within which the circuit performs as desired.

| Transistor | $W_{\min}$ [µm] | $W_{\max}$ [µm] | $L_{\min}$ [µm] | $L_{\max}$ [µm] |
|------------|-----------------|-----------------|-----------------|-----------------|
| P2         | 0.200           | 0.800           | 0.400           | 0.800           |
| P3         | 0.080           | 0.360           | 0.048           | 0.048           |
| P4         | 0.200           | 0.320           | 0.050           | 0.065           |
| N2         | 0.160           | 0.800           | 0.048           | 0.060           |
| N3         | 0.160           | 0.480           | 0.048           | 0.048           |

Table 3.1: Working Range of the Transistor Dimensions
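The interplay between a linear scale-down and the working range of Table 3.1 can be illustrated with a small sketch. It assumes that "linear" means multiplying each 65 nm dimension by the node ratio 28/65, which the text does not state explicitly. The function name `scale_down` and the 65 nm input dimensions in the usage lines are hypothetical placeholders, not values taken from [41]. The scaled value is clamped to the range listed in Table 3.1.

```python
# Working ranges from Table 3.1 in nm: (W_min, W_max, L_min, L_max)
WORKING_RANGE_NM = {
    "P2": (200, 800, 400, 800),
    "P3": (80, 360, 48, 48),
    "P4": (200, 320, 50, 65),
    "N2": (160, 800, 48, 60),
    "N3": (160, 480, 48, 48),
}

# Assumed linear scaling factor between the two technology nodes.
SCALE = 28 / 65


def clamp(value, lower, upper):
    """Restrict value to the closed interval [lower, upper]."""
    return max(lower, min(upper, value))


def scale_down(name, w_65nm, l_65nm):
    """Scale a 65 nm width/length pair linearly, then clamp it to Table 3.1."""
    w_min, w_max, l_min, l_max = WORKING_RANGE_NM[name]
    width = clamp(w_65nm * SCALE, w_min, w_max)
    length = clamp(l_65nm * SCALE, l_min, l_max)
    return round(width), round(length)


# Hypothetical 65 nm dimensions, for illustration only.
print(scale_down("P3", 400, 65))     # length is pinned to 48 nm by the range
print(scale_down("P2", 1000, 2000))  # length is limited by L_max of P2
```

A starting point obtained in this way only seeds the search; the final dimensions in the text come from the Pareto search, not from the linear factor alone.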

To reduce the complexity of the search, robustness is used as the constraint for the Pareto selection. Therefore, the noise margin is constrained to be at least 90 mV for this level shifter circuit. It must be noted that an optimization process run with three metrics generates a huge set of Pareto points. As mentioned earlier, this is an advantage, because the liberty of choosing a particular dimension remains in our hands. This is essential because, depending on the cell library constraints, a suitable set of values can be chosen.

Once the boundary of the design search space is defined, it is important to understand the impact of the transistor dimensions on the performance of the level shifter circuit, because this shows how the choice of a dimension value influences the performance. Here we take the width of transistor P2 from Figure 3.31 as an example.

As listed in Table 3.1, the width of transistor P2 was varied between 200 nm and 800 nm. The Pareto search first generates a Pareto front for each individual performance metric. It then generates a Pareto front from the Pareto points obtained for each metric. The final set is generated from this last set of Pareto fronts.



(a) Propagation delay w.r.t. wp1

(b) Energy per transition w.r.t. wp1

(c) Static power dissipation w.r.t. wp1

(d) Propagation delay w.r.t. transition energy

Figure 3.29: Pareto search with wp1 varying from 200 nm to 800 nm

Figure 3.29 shows the results of the search run while varying the width of P2, which is labelled wp1 in the figure. The results show the set of data generated from each run. It is clearly visible that there are plenty of points between 200 nm and 800 nm where the circuit can behave optimally.
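The selection of Pareto points under a noise margin constraint can be sketched as follows. This is a simplified illustration, not the tool flow used in this work: it applies a single non-dominated filter over the three metrics at once instead of the staged per-metric fronts described above, and it assumes the constraint means a noise margin of at least 90 mV. The names `Candidate`, `dominates` and `pareto_front`, as well as all numerical values, are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    t_pd_ns: float          # propagation delay
    e_tr_fj: float          # energy per transition
    p_stat_pw: float        # static power dissipation
    noise_margin_mv: float  # robustness constraint, not an objective


def metrics(c):
    """Objectives to be minimised."""
    return (c.t_pd_ns, c.e_tr_fj, c.p_stat_pw)


def dominates(a, b):
    """True if a is no worse than b in every metric and better in at least one."""
    pairs = list(zip(metrics(a), metrics(b)))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def pareto_front(candidates, min_noise_margin_mv=90.0):
    """Drop infeasible candidates, then keep only the non-dominated ones."""
    feasible = [c for c in candidates if c.noise_margin_mv >= min_noise_margin_mv]
    return [c for c in feasible if not any(dominates(o, c) for o in feasible)]


# Hypothetical simulation results for four values of wp1 plus one infeasible point.
candidates = [
    Candidate("wp1=200n", 7.0, 14.0, 30.0, 95.0),
    Candidate("wp1=400n", 8.0, 12.0, 35.0, 100.0),
    Candidate("wp1=600n", 9.0, 10.0, 40.0, 110.0),
    Candidate("wp1=800n", 9.5, 10.5, 45.0, 120.0),  # dominated by wp1=600n
    Candidate("wp1=300n", 6.0, 9.0, 25.0, 60.0),    # violates the noise margin
]
front_names = [c.name for c in pareto_front(candidates)]
print(front_names)
```

Because several candidates usually survive the filter, the final choice among them remains a manual decision, which matches the liberty of selection discussed in the text.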

#### 3.4.2 Optimization Results

A careful selection from the search space of the performance metrics, i.e. among $t_{pd}$, $E_{tr}$ and $P_{stat}$, lets us choose any desired value manually. It should be noted that the parameter values thus obtained can be theoretically optimal but may not be helpful for the whole circuit. The ratio between the pull-up and pull-down path transistors should be kept within a practical range so that robustness is not sacrificed. Therefore, manual intervention is required for the final selection.

There always remains a possibility that certain parameters are counterproductive. As a result, a thorough image search is conducted. Figure 3.30 depicts the final Pareto image, which guides us towards the optimal parameter values.

It is observed that as the width of P2 increases from 200 nm to 800 nm, the propagation delay and static power dissipation increase, but the energy consumption per transition reduces drastically. Therefore, a higher value of the P2 width is beneficial. However, as the length of P2 increases from 400 nm to 800 nm, the energy consumption per transition increases steadily, while both the propagation delay and the static power dissipation decrease steadily. So, a value situated in the middle of the range is quite suitable. Increasing the width of P3 and P4 results in a decrease in static power dissipation but an increase in total energy per transition. However, in the case of P3, the propagation delay decreases as the width increases from 80 nm to 170 nm and then starts increasing steadily. With increasing P4 width, the propagation delay reduces steadily. Therefore, a P3 width lower than 170 nm and a mid-level value of the P4 width are highly recommended. Width variation in the nMOS transistors N2 and N3 does not contribute much to the propagation delay of the circuit. However, the energy consumption per transition increases steadily with the width of N2, and the static power dissipation increases with the width of N3. Figure 3.28 shows the performance variation when an individual parameter is varied. The lengths of P3, P4, N2 and N3 are kept constant, as varying those parameters does not contribute significantly to producing the desired result.

| Transistor | LVT Width [µm] | LVT Length [µm] | RVT Width [µm] | RVT Length [µm] |
|------------|----------------|-----------------|----------------|-----------------|
| P1         | 0.800          | 0.048           | 1.000          | 0.048           |
| N1         | 0.080          | 0.048           | 0.110          | 0.048           |
| P2         | 0.600          | 0.800           | 0.793          | 1.600           |
| P3         | 0.080          | 0.048           | 0.100          | 0.048           |
| P4         | 0.200          | 0.080           | 0.240          | 0.096           |
| N2         | 0.160          | 0.048           | 0.100          | 0.048           |
| N3         | 0.310          | 0.048           | 0.218          | 0.048           |

Table 3.2: Transistor size

(a) Propagation delay w.r.t. transition energy

(b) Energy per transition w.r.t. static power dissipation

(c) Propagation delay w.r.t. static power dissipation

(d) Transition energy w.r.t. propagation delay

Figure 3.30: Pareto front obtained after MOP

The parametric analysis of the schematic shown in Figure 3.18 led us to Table 3.2, which lists the transistor dimensions capable of producing the desired performance at room temperature.

# 3.5 Proposed Designs

### 3.5.1 LVT cell based design in FDSOI technology

LVT cells play an important role in reducing active power as well as in enhancing speed. Furthermore, the bulk of the transistors in FDSOI technology can be exploited to reduce power consumption. While implementing the design in Figure 3.18 using LVT cells, the bulk of the pull-up transistors was initially connected to the supply voltage $V_{DDH}$. However, that is possible only in theory. In the LVT cells, the transistors lie on flip-wells, i.e. on a p-well. Therefore, the entire pull-up network lies on the same p-well. The design consists of a low-voltage ($V_{\rm DDL}$) and a high-voltage ($V_{\rm DDH}$) side. As a result, the bulks of the corresponding transistors are connected to their respective sources, which are at different voltage levels. Well taps are used to connect the bulk to a certain voltage supply. Since a single p-well is used for the whole design, it allows only one voltage supply for the bulk, even if the transistors belong to different voltage sides. To solve this issue and get the maximum benefit out of it, the bulk is finally connected to ground. This keeps the bulk of all the transistors at the same voltage level without adding another supply rail in the layout. Figure 3.31 shows the final schematic.

Figure 3.31: WCM based design exploiting the FDSOI technology

### 3.5.2 RVT cell based design in FDSOI technology

In the RVT based design, the nMOS transistors are strengthened sufficiently to match the contention of the pull-up path. However, two more pMOS transistors are added to reduce the slope of the low-to-high transition at the output node, as shown in Figure 3.32. The bulks of the pMOS transistors are connected to their respective sources, whereas those of the nMOS transistors are connected to ground.



Figure 3.32: Level shifter schematic designed with RVT transistors



Figure 3.33: A hybrid topology based level shifter circuit capable of converting 150 mV to 1.2 V

### 3.5.3 Hybrid topology based design in FDSOI technology

The above-mentioned designs perform best with an input voltage of at least 300 mV. To support input voltages lower than 300 mV, we had to consider different topologies [55].

The benefit of using the Wilson current mirror is that it reduces the leakage current. Therefore, it is used as the first stage of our design, shown in Figure 3.33. This stage raises the low input voltage to a voltage close to the threshold voltage of the nMOS transistors. The cross-coupled structure in the second stage of the design helps to maintain the full swing of the output. The operating speed of the circuit can be varied by modifying the W/L ratios of the output transistors. When input *A* is high, N2 is turned on. Therefore, the current will flow through P2, P4 and N2 and it will be mirrored in P4. As N3 is off, the node *X* will be charged until P3 is turned off.

When A is low and AN is high, N2 is off and N3 is turned on. No current flows through P2, P3 and N2, forcing X to discharge. This helps to charge the node *Y*. As a result, there are differential inputs on nodes *X* and *Y*. Due to the off-biased transistors, the amplitude of these signals is near or above the threshold voltage of the transistors. The cross-coupled stage then raises the output from near or above threshold to VDDH. The drive strengths of P5 and P6 are chosen such that the nodes X and Y can overcome them easily. Table 3.3 contains the dimensions of the transistors.

| Transistor | Width [µm] | Length [µm] |
|------------|------------|-------------|
| P1         | 1.080      | 0.048       |
| N1         | 0.110      | 0.048       |
| P2         | 0.080      | 0.048       |
| P3         | 0.080      | 0.048       |
| P4         | 0.080      | 0.048       |
| N2         | 0.150      | 0.048       |
| N3         | 0.150      | 0.048       |
| P5         | 0.080      | 0.048       |
| P6         | 0.080      | 0.048       |
| N4         | 0.500      | 0.048       |
| N5         | 0.080      | 0.048       |

Table 3.3: Transistor size

# 3.6 Down Level Converter

Figure 3.34 shows the circuit diagram of the down level shifter, which consists of two successive inverter stages. The channel length is 48 nm in the first stage, which is operated at the higher supply voltage, and 96 nm in the second stage, which is operated at the low supply voltage. As with all other cells in the library, the doubled channel length in the sensitive part of the level shifter achieves higher robustness.

Figure 3.34: Circuit diagram of down level shifter

The additional transistor MN2 in the output-side inverter has been inserted in the interest of symmetrical switching behaviour, as explained below. If the pMOS transistor MP2 is switched on, namely when a high level (logic 1) is applied to A, its gate-to-source voltage amounts to $V_{DDL}$. MP2 can therefore supply only a subthreshold-level current. If, on the other hand, MN3 is switched on, namely when a low level (logic 0) is applied to A, its gate-to-source voltage is $V_{DDH}$. Thus, a current far above the subthreshold level can flow through MN3. This highly asymmetrical current drive can lead to non-optimal, asymmetrical signal-to-noise ratios and thus to poor robustness. In addition, the fast fall time of the level shifter leads to steep signal edges, as a result of which other subthreshold signals go through amplified conversions. This can ultimately lead to a malfunction of the circuit. Therefore, the additional transistor MN2 is inserted, which also limits the current flow in the pull-down path to the subthreshold level.
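The asymmetry described above follows from the exponential current law in weak inversion. As a hedged reminder, a commonly used first-order model is given below. The symbols $I_0$ (a technology-dependent current prefactor), $n$ (subthreshold slope factor) and $V_T = kT/q$ (thermal voltage, about 26 mV at 27 °C) are textbook notation assumed here and are not taken from this chapter.

$$
I_{D} \approx I_0 \exp\left(\frac{V_{GS}-V_{th}}{n V_T}\right)\left(1-\exp\left(-\frac{V_{DS}}{V_T}\right)\right), \qquad V_T = \frac{kT}{q}
$$

Under this model, every reduction of $V_{GS}$ by $n V_T \ln 10$ lowers the current by a factor of ten. A transistor driven with $|V_{GS}| = V_{DDL}$ below $V_{th}$ therefore delivers far less current than one driven with $V_{GS} = V_{DDH}$ above $V_{th}$, where the exponential law no longer applies and the current is much larger.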

| Transistor | Width [µm]          | Length [µm]                        |
|------------|---------------------|------------------------------------|
| P1         | 0.120               | 0.048                              |
| P2         | 0.750               | 0.600                              |
| N1         | 0.081               | 0.048                              |
| N2         | 0.117               | 0.048                              |
| N3         | 0.080               | 0.048                              |

Table 3.4: Transistor size of down level converters

As with the up level converter, the scale-down procedure is also used for the down level converter. A Pareto search is conducted to find the optimized solutions. Table 3.4 contains the final dimensions. A Monte Carlo simulation with 5000 samples was performed to ensure the robustness of the circuit.

# 4 Simulation and Comparison of the Proposed Level Shifter Circuits

In this chapter, we explain the simulation results. To give a better understanding of our work, a comparative study of level shifter designs is presented in terms of the performance metrics.

# 4.1 Simulation Results





Figure 4.1: Transient behaviour of the proposed level shifter using LVT cells

The basic operation of the level shifter is shown in Figure 4.1. The plot shows the behaviour at the input node A, the output node Z and the buffered output node. A buffer is connected to the output node of the level shifter circuit, with a load of 4 fF at the output of the buffer. The input rise and fall times of the source connected to node A are chosen as 10 ns at an input signal frequency of 1 MHz. The load capacitance has little effect on the performance of the buffered level shifter. However, when the load is connected to node Z, the propagation delay increases significantly due to the low drive strength of the main shifter stage. The rise and fall times are varied proportionally with the signal frequency, which is varied from 1 MHz to 500 kHz.



Figure 4.2: Propagation delay simulated in LVT and RVT based circuits across different frequencies

The performance of both circuits is shown in Figure 4.2. The input voltage is varied from 200 mV to 1 V. The propagation delay increases as the input supply voltage decreases, as shown in Figure 4.2a. At 300 mV, the delay of the level shifter is 8.87 ns, and it reduces as the input voltage increases. However, in the near-threshold region and above, the delay does not vary much across different frequencies.


Figure 4.3: Energy per transition simulated in LVT and RVT based circuits across different frequencies



Figure 4.4: Static power dissipation simulated in LVT and RVT based circuits across different frequencies

Figure 4.3a shows that the total energy dissipated per transition increases with the supply voltage. For the LVT variants, the switching energy increases suddenly beyond the threshold point, whereas it is almost linear for the RVT variants. At a supply voltage of 300 mV and a signal frequency of 1 MHz, an energy consumption of 10.48 fJ per transition was simulated.

In subthreshold operation, leakage current is expected to be the significant contributor to static power dissipation, making it an important factor for circuit optimization. However, our proposed level shifter circuit, in both the LVT and RVT variants, shows that the dissipation increases with the input voltage (Figure 4.4a), almost linearly for the RVT variant and exponentially for the LVT variant. The static power dissipated by the level shifter at different input voltages does not change across different input frequencies. The static power dissipation of the proposed design ranges between 34 pW and 158 pW.



Figure 4.5: Distribution of the propagation delay

A 10000-point Monte Carlo simulation considering global and local variations was carried out for a supply voltage of 300 mV to examine the sensitivity of the level shifter to process fluctuations. To avoid uncertainty about the circuit operation, 10000 samples were chosen. The yield was 100%. Figure 4.5 shows the results as histograms of the propagation delay. As stated in [4], the delay of CMOS gates in the subthreshold mode follows a log-normal distribution. As can be seen, the normalized standard deviation ($\sigma/\mu$) is calculated as 0.10. This shows that the propagation delay of our proposed level shifter has a low sensitivity towards process variation compared to the design described in [41]. If required, the robustness can be increased by increasing the transistor size.
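How the ratio $\sigma/\mu$ relates to a skewed delay distribution can be illustrated with synthetic data. The sketch below does not reproduce the circuit simulation: it assumes a log-normal delay model, uses the reported 8.87 ns only as an assumed median, and picks a log-domain spread of 0.1 so that the resulting ratio lands near the reported 0.10. All names and parameters are hypothetical.

```python
import math
import random
import statistics

N_SAMPLES = 10_000
MEDIAN_NS = 8.87   # assumed median of the synthetic delays
SIGMA_LOG = 0.1    # assumed standard deviation of ln(delay)

rng = random.Random(0)  # fixed seed for repeatability
delays_ns = [rng.lognormvariate(math.log(MEDIAN_NS), SIGMA_LOG)
             for _ in range(N_SAMPLES)]

# Linear-domain statistics, as read from a delay histogram.
mean_ns = statistics.fmean(delays_ns)
std_ns = statistics.pstdev(delays_ns)
ratio = std_ns / mean_ns

# Log-domain statistics: for log-normal data, ln(delay) is normally distributed.
log_delays = [math.log(d) for d in delays_ns]
sigma_log_est = statistics.pstdev(log_delays)

# For a log-normal variable, sigma/mu = sqrt(exp(sigma_log^2) - 1).
ratio_predicted = math.sqrt(math.exp(SIGMA_LOG ** 2) - 1.0)

print(f"sigma/mu from samples: {ratio:.3f}")
print(f"sigma/mu predicted:    {ratio_predicted:.3f}")
print(f"estimated sigma of ln(delay): {sigma_log_est:.3f}")
```

For small spreads, $\sigma/\mu$ and the log-domain standard deviation are nearly equal, which is why a single ratio is a convenient summary of the variation.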



(a) Level shifter Layout of LVT based design





Figure 4.6: Level shifter Layout

Figure 4.6 shows the layout of both the LVT and RVT cell based designs. The cell area is manually optimized while placing the gates.



Figure 4.7: Transient behaviour of the proposed level shifter using LVT cells

### 4.1.2 Hybrid Topology Circuit

The basic operation of the level shifter is shown in Figure 4.7. The plot describes the behaviour at input node A, output node Z and the intermediate nodes X and Y. A load of 4 fF is applied at the output node. The input rise and fall times of the source connected to node A are chosen as 10 ns at 200 kHz input signal frequency. The simulation time for transient response is chosen as 15  $\mu$ s.

The temperature of the simulation environment is maintained at 27 °C. Figure 4.8a shows the propagation delay as a function of VDDL. It is clearly visible that the propagation delay drops exponentially between 150 mV and 300 mV. This is primarily because of the two stages used in the circuit. It also confirms that the circuit can work at higher operating frequencies as VDDL increases. P1 and N1 of the inverter circuit also contribute significantly to the propagation delay. The energy per transition increases with VDDL, as shown in Figure 4.8b. This happens primarily because of the increase in the static current in the pull-down transistors operating in the above-threshold domain, i.e. N2, N3, N4 and N5. The same reason also accounts for the increase in static power dissipation visible in Figure 4.8c. At 150 mV with an input frequency of 250 kHz, the propagation delay, energy per transition and static power dissipation are simulated as 200 ns, 29 aJ and 107 pW, respectively.



(a) Propagation delay variation



(b) Variation of total energy per transition



(c) Static power dissipation variation

Figure 4.8: Comparison of the performance in terms of propagation delay, energy per transition and static power dissipation across different operating frequencies



Figure 4.9: Monte Carlo simulation representation of the proposed level shifter

A 10000-point Monte Carlo simulation with local variations was performed for a supply voltage of 150 mV to understand the effects of process variations on the level shifter characteristics. The yield was 100%. Figure 4.9a shows the distribution of the propagation delay. As can be seen, the mean delay ($\mu$) obtained is 96.37 ns and the standard deviation ($\sigma$) is 41.22 ns.

Figure 4.9b illustrates the Monte Carlo simulation of the energy consumption per transition. The results show that the mean value ($\mu$) of the distribution is 6.28 aJ, with a standard deviation ($\sigma$) of 0.12 aJ. Therefore, the coefficient of variation ($\sigma/\mu$) is very low, and it can be concluded that the sensitivity of our design towards process variation is very low in terms of energy consumption.
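The relative spread of the two Monte Carlo distributions can be compared directly from the reported means and standard deviations. The short calculation below uses only the four numbers quoted above; the helper name `coefficient_of_variation` is introduced here for illustration.

```python
def coefficient_of_variation(mean, std):
    """Return the ratio sigma/mu of a distribution."""
    return std / mean


# Values reported for the hybrid level shifter at 150 mV.
delay_cv = coefficient_of_variation(mean=96.37, std=41.22)  # ns
energy_cv = coefficient_of_variation(mean=6.28, std=0.12)   # aJ

print(f"delay:  sigma/mu = {delay_cv:.3f}")   # prints 0.428
print(f"energy: sigma/mu = {energy_cv:.3f}")  # prints 0.019
```

The ratio for the energy is more than twenty times smaller than that for the delay, which supports the statement that the energy consumption is the less variation-sensitive of the two quantities.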

# 4.2 Performance comparison of the state-of-the-art designs

Over the last decade, many different level shifter designs have been proposed. Technologies ranging from 350 nm down to 65 nm have been explored with different topologies. Table 4.1 presents the performance metrics of the state-of-the-art designs. We have also included the performance data of our proposed designs to give a better understanding of their quality with respect to the state of the art.
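When reading Table 4.1, the level-shift column appears to be the difference between $V_{DDH}$ and the minimum $V_{DDL}$. This relation is inferred from the table rows and is not stated in the text, so the sketch below should be read as an assumption. It recomputes the column for four entries copied from the table; the function name `level_shift_mv` is hypothetical.

```python
def level_shift_mv(min_vddl_mv, vddh_v):
    """Assumed definition: level shift = V_DDH - minimum V_DDL, in mV."""
    return round(vddh_v * 1000 - min_vddl_mv)


# (minimum V_DDL in mV, V_DDH in V, tabulated level shift in mV)
rows = {
    "[26]": (100, 1.2, 1100),
    "[28]": (127, 1.8, 1673),
    "[41]": (100, 1.0, 900),
    "[43]": (400, 3.0, 2600),
}

recomputed = {ref: level_shift_mv(vddl, vddh) for ref, (vddl, vddh, _) in rows.items()}
for ref, (_, _, tabulated) in rows.items():
    print(ref, recomputed[ref], tabulated)
```

The same function can be applied to any further row to check that its three voltage entries are mutually consistent.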

| Citation     | Minimum<br>VDDL<br>(mV) | V <sub>DDH</sub><br>(V) | Level Shift<br>∆V<br>(mV) | Maximum<br>Frequency<br>(MHz) | Delay<br>(ns)                                                                                     | DI <sub>f</sub><br>(%) | Energy/op<br>(pJ)                                            | Static<br>Power<br>(nW)     | Area<br>(μm <sup>2</sup> ) | Number of<br>Transistors | Topology<br>(CC/CM/HB/<br>MT/OT) | Technology<br>(nm) | Year         |
|--------------|-------------------------|-------------------------|---------------------------|-------------------------------|---------------------------------------------------------------------------------------------------|------------------------|--------------------------------------------------------------|-----------------------------|----------------------------|--------------------------|----------------------------------|--------------------|--------------|
| [26]         | 100                     | 1.2                     | 1100                      | 1                             | 50 @0.2 V                                                                                         | 10                     | 25 @(0.2 V,<br>500 kHz)                                      | 8 @200 mV                   | 3.56 <sup>1</sup>          | 10                       | CC                               | 130                | 2006         |
| [28]         | 127                     | 1.8                     | 1673                      | N.A. <sup>2</sup>             | 10e3 @ 127 mV <sup>3</sup>                                                                        | N.C. <sup>4</sup>      | 300 @ (0.2 V,<br>100 kHz) <sup>5</sup>                       | 20E+03 <sup>3</sup>         | N.A. <sup>2</sup>          | 10                       | CC                               | 180                | 2007         |
| DSLS1b [27]  | 350                     | 1.2                     | 850                       | 0.2                           | 252                                                                                               | 10.08                  | 17                                                           | 5610 <sup>6</sup>           | 44.60 <sup>6</sup>         | 8                        | CC                               | 250                | 2008         |
| DSLS2 [27]   | 350                     | 1                       | 650                       | 0.2                           | 125                                                                                               | 5                      | 0.4                                                          | 4500                        | 44.10                      | 8                        | CM                               | 250                | 2008         |
| DSLS2b [27]  | 350                     | 1                       | 650                       | 0.2                           | 110                                                                                               | 4.4                    | 0.8                                                          | 4500 <sup>6</sup>           | 44.10 <sup>6</sup>         | 8                        | CM                               | 250                | 2008         |
| SSLSb [27]   | 350                     | 1                       | 650                       | 0.2                           | 161                                                                                               | 6.44                   | 3.5                                                          | N.A. <sup>2</sup>           | 44.10                      | 6                        | OT                               | 130                | 2008         |
| [29]         | 300                     | 2.5                     | 2200                      | N.A. <sup>2</sup>             | 3.97 FO4                                                                                          | N.C. <sup>4</sup>      | 0.102                                                        | 0.121                       | 11.11                      | 23                       | OT                               | 130                | 2008         |
| MDCVSHS [56] | 180                     | 1.0                     | 820                       | 1                             | 32                                                                                                | 6.4                    | 0.017                                                        | 2.5                         | N.A. <sup>2</sup>          | 17                       | MT,CC                            | 90                 | 2009         |
| MDCVSLP [56] | 180                     | 1.0                     | 820                       | 1                             | 120                                                                                               | 24                     | 0.021                                                        | 1                           | N.A. <sup>2</sup>          | 17                       | MT,CC                            | 90                 | 2009         |
| MDCVS65 [56] | 350                     | 1.2                     | 850                       | 1                             | 64                                                                                                | 12.8                   | 0.023                                                        | 0.084                       | N.A. <sup>2</sup>          | 17                       | MT,CC                            | 65                 | 2009         |
| [31]         | 180                     | 1.2                     | 1620                      | N.A. <sup>2</sup>             | 57.9                                                                                              | N.C. <sup>4</sup>      | N.A. <sup>2</sup>                                            | 717                         | 96                         | 11                       | CC                               | 130                | 2010         |
| [41]         | 100                     | 1                       | 900                       | 1                             | 18.4                                                                                              | 3.68                   | 0.093                                                        | 6.6                         | 1.38                       | 11                       | CM                               | 90                 | 2010         |
| [43]         | 400                     | 3                       | 2600                      | 0.001                         | 80000                                                                                             | 16                     | 5.80 @ (0.4 V,<br>10 kHz)                                    | 0.230                       | 103.50                     | 16                       | OT                               | 350                | 2011         |
| [44]         | 400                     | 3                       | 2600                      | 0.01                          | 10000                                                                                             | 20                     | 5.80 @ (0.4 V,<br>10 kHz)                                    | 0.230                       | 1880                       | 16                       | OT                               | 350                | 2011         |
| [37]         | 230                     | 2.5                     | 2270                      | 1                             | 41.5                                                                                              | 8.3                    | 0.229                                                        | 0.475                       | 102.3                      | 35                       | CC                               | 130                | 2011         |
| [57]         | 300                     | 2.5                     | 2200                      | 8                             | 125 @ 0.3 V                                                                                       | 25                     | 1.7 @(0.3 V,<br>8 MHz)                                       | 13.6E+03                    | 111800                     | N.A. <sup>2</sup>        | OT                               | 130                | 2011         |
| [45]         | 230                     | 3                       | 2770                      | 0.01                          | 10000                                                                                             | 20                     | 5.80E+00                                                     | 0.230                       | 1880.00                    | 16                       | OT                               | 350                | 2012         |
| [38]         | 300                     | 2.5                     | 2200                      | 1                             | 58.8                                                                                              | 11.76                  | 1.91e-01                                                     | 0.724                       | 71.94                      | 12                       | CC                               | 130                | 2012         |
| [35]         | 180                     | 1                       | 820                       | 1                             | 21.8 @ 0.2 V                                                                                      | 4.80                   | 7.40E-02 @(0.2 V,<br>1 MHz)                                  | 6.4 @ 0.2 V                 | 36.50                      | 13                       | MT,CC                            | 90                 | 2012         |
| [58]         | 150                     | 1                       | 850                       | N.A.                          | 52 @ 0.15 V                                                                                       | N.A. <sup>2</sup>      | N.A.                                                         | 21                          | N.A.                       | 12                       | MT,CC                            | 65                 | 2012         |
| [46]         | 210                     | 3.3                     | 3090                      | 0.1                           | 166.9 @ 0.3 V                                                                                     | 3.34                   | 3.90E-02 @(0.3 V, 100 kHz)                                   | 0.160                       | 153.00                     | 17                       | CM                               | 180                | 2013         |
| [39]         | 400                     | 1.8                     | 1400                      | 1                             | 30                                                                                                | 6                      | 3.27E-02                                                     | 1.3E-02                     | 120.90                     | 16                       | CM,CC                            | 180                | 2014         |
| [52]         | 165                     | 1.2                     | 1035                      | 0.07                          | <162 @ 0.3 V <sup>7</sup>                                                                         | 4.20                   | 1.36E+05                                                     | 0.866 <sup>8</sup>          | 16.80                      | 16                       | CM                               | 65                 | 2014         |
| [34]         | 120                     | 1.2                     | 1080                      | 72                            | 66@0.2 V                                                                                          | 13.20                  | 2.80E-02 @(0.3 V,<br>72 MHz)                                 | 0.640                       | 7.80                       | 12                       | CC                               | 65                 | 2014         |
| [48]         | 320                     | 1.8                     | 1480                      | 1                             | 31@0.4 V                                                                                          | 6.20                   | 6.80E-01@(0.4 V,<br>1 MHz)                                   | 1.160@0.4 V                 | 120.90                     | 9                        | CM                               | 180                | 2015         |
| [54]         | 50                      | 1.8                     | 1750                      | 100                           | 10.43@(0.4 V,<br>10 kHz)                                                                          | 0.02                   | 1.42E+05@(0.45 V,<br>1 MHz)                                  | 9.890@(0.4 V,10 kHz)        | 229.50                     | 14                       | OT                               | 180                | 2015         |
| [50]         | 190                     | 3.3                     | 3110                      | 0.1                           | 21.4                                                                                              | 0.43                   | 2.40E-01@(0.4 V,<br>100 kHz)                                 | 0.150                       | 95.60                      | 20                       | HB                               | 180                | 2015         |
| [47]         | 210                     | 3.3                     | 3090                      | 0.1                           | $167^{9}$<br>(0.3 V $\rightarrow$ 1.8 V)                                                          | 3.34                   | 9.54E-01<br>(0.3 V $\rightarrow$ 3.3 V)                      | 0.970                       | 153.01                     | 18                       | CM                               | 180                | 2015         |
| [59]         | 145                     | 1.2                     | 1055                      | 0.008                         | 200                                                                                               | 0.32                   | 1.20E+00                                                     | N.A.                        | 466.00                     | 10                       | OT                               | 130                | 2015         |
| [36]         | 100                     | 1                       | 900                       | 1                             | 16.6@0.2 V                                                                                        | 3.32                   | 7.70E-02 @(0.2 V,<br>1 MHz)                                  | 8.7@0.2 V                   | 37.30                      | 15                       | MT,CC                            | 90                 | 2015         |
| [53]         | 200                     | 1                       | 800                       | 1                             | 20.17                                                                                             | 4.03                   | 1.13E-01                                                     | 11.070                      | N.A.                       | 12                       | CM                               | 90                 | 2015         |
| [33] Type II | 350                     | 1                       | 650                       | 0.2                           | 3.8                                                                                               | 0.15                   | 4.00E-02                                                     | 1.000                       | 96.00                      | 8                        | CM                               | 65                 | 2015         |
| [32]         | 140                     | 1.2                     | 1060                      | 1                             | 25 @0.3 V                                                                                         | 5.00                   | 3.07E-02 @(0.3 V,<br>1 MHz)                                  | 2.500                       | 17.60                      | 27                       | CC                               | 65                 | 2015         |
| [33] Type I  | 350                     | 1                       | 650                       | 10                            | 2.8                                                                                               | 5.60                   | 4.00E-02                                                     | 0.700                       | 96.00                      | 15                       |                                  | 65                 | 2015         |
| [60]         | 400                     | 1.2                     | 800                       | 500                           | 0.3 @0.4 V                                                                                        | 30                     | 2.15E+00 @0.4 V                                              | N.A.                        | 243.60                     | 8                        | 01                               | 65                 | 2015         |
| [49]         | 360                     | 1.8                     | 1440                      | 0.1                           | 30 @0.4 V<br>26.5                                                                                 | 6<br>0.53              | 1.57E-01<br>1.40E-01 @(0.4 V,                                | 0.30 @0.4 V<br>0.100 @0.4 V | 103.00<br>128.30           | 16                       | CM                               | 180                | 2016         |
| [42]         | 85                      | 1.2                     | 1115                      | 100                           | $(0.4 \text{ V} \rightarrow 1.8 \text{ V})$<br>21.65@ $(0.2 \text{ V} \rightarrow 1.2 \text{ V})$ | 4.33                   | 1.94E-02                                                     | 1.690                       | N.A.                       | 18                       | MT,HB                            | 65                 | 2016         |
| [61]         | 200                     | 1                       | 800                       | 5                             | 12.0                                                                                              | 12.00                  | 2 04E 01                                                     | © (0.2 V → 1.2 V)           | 1921.00                    | 16                       | OT                               | 45                 | 2016         |
| [51]         | 100                     | 1.2                     | 1100                      | 0.254                         | 137 @ 0.2 V                                                                                       | 0.70                   | 9.09E+04                                                     | 1.240                       | 31.30                      | 16                       | CM                               | 65                 | 2016         |
| [69]         | 100                     | 1.0                     | 1700                      | 0.1                           | 317                                                                                               | 0.62                   | 1.73E.01                                                     | 0.05560.437                 | 102.00                     | 14                       |                                  | 180                | 2017         |
| [62]         | 100                     | 1.0                     | 1700                      | 0.1                           | $(0.4 \text{ V} \rightarrow 1.8 \text{ V})$                                                       | 1.72                   | $(0.4 \text{ V} \rightarrow 1.8 \text{ V} @100 \text{ kHz})$ | 0.03560.4 V                 | 220.50                     | 19                       | a                                | 180                | 2017         |
| lcol         | 500                     | 1.0                     | 1500                      | 0.5                           | $(0.4 \text{ V} \rightarrow 1.8 \text{ V})$                                                       | 1.75                   | $(0.4 \text{ V} \rightarrow 1.8 \text{ V} @500 \text{ kHz})$ | 0.27060.4 V                 | 229.30                     | 12                       | CM                               | 100                | 2017         |
| [64]         | 200                     | 1.1                     | 900                       | 1                             | 66.48 @0.3 V                                                                                      | 13.30                  | 7.23E-02@0.3 V                                               | 0.088@0.3 V                 | 14.48                      | 13                       | CM                               | 40                 | 2017         |
| LVTLS        | 270                     | 1.2                     | 930                       | 1                             | 24.6                                                                                              | 4.92                   | 1.09E-04                                                     | 0.091                       | N.A.                       | 11                       | CM                               | 28                 | 2018         |
| RVTLS        | 250                     | 1                       | 950                       | 1                             | 8.87                                                                                              | 1.77                   | 1.00E-02                                                     | 0.037                       | N.A.                       | 11                       | CM                               | 28                 | 2018         |
| HBLS         | 150                     | 1                       | 850                       | 0.25                          | 200                                                                                               | 10.00                  | 2.90E-05                                                     | 0.107                       | N.A.                       | 11                       | HB                               | 28                 | 2018         |

Table 4.1: Measurement comparison of LS designs

<sup>1</sup> As mentioned in [41]

<sup>2</sup> Data not available

<sup>3</sup> As mentioned in [58]

<sup>4</sup> Not enough data to calculate

<sup>5</sup> As mentioned in [34]

<sup>6</sup> Replicated in [36]

<sup>7</sup> As reported in [47]

<sup>8</sup> As reported in [47]

<sup>9</sup> As reported in [62]

<sup>10</sup> As reported in [62]

One of the oldest of these designs dates back to 2006. From then onwards, we have studied the circuits developed during the following years. While building the table, the major focus has been on the performance parameters mentioned in Section 2.2. In addition, the minimum operating voltage on the lower-voltage side, the maximum operating voltage on the higher-voltage side, and the highest operating frequency of the level shifter circuit are considered. Here, we made two important observations.

First, the minimum and the maximum operating voltages vary across these level shifter circuits. As a result, it is difficult to draw a comparative picture based on these two factors alone. However, the range over which an individual design converts successfully can serve as one of the performance parameters of the circuit. This working range is shown in the table as  $\Delta V$ .
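As a small illustration, the sketch below computes the conversion range as the maximum operating voltage minus the minimum operating voltage. This definition is an assumption that is consistent with the rows of Table 4.1; the function name and the unit handling are chosen only for this example, and the three data points are taken from the table.

```python
def conversion_range_mv(v_min_mv: float, v_max_v: float) -> float:
    """Conversion range in mV: maximum supply (given in V) minus minimum supply (given in mV)."""
    return round(v_max_v * 1000.0 - v_min_mv, 3)


# (minimum operating voltage in mV, maximum operating voltage in V) from Table 4.1
rows = {"[41]": (100, 1.0), "[37]": (230, 2.5), "[39]": (400, 1.8)}

for ref, (v_min_mv, v_max_v) in rows.items():
    print(ref, conversion_range_mv(v_min_mv, v_max_v), "mV")
```

The computed values can be compared directly with the conversion-range column of the table for the same references.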

Our second observation relates to the operating frequency of the level shifter circuits, which indicates their working speed. However, the circuits listed in the table operate in the subthreshold domain, which makes them very sensitive. As a result, robustness is prioritized over speed, and the impact of the propagation delay therefore needs to be measured. We define a term  $DI_f$  to quantify the impact of delay on the operating frequency of the circuit.

The time period of the circuit is given by

$$T = \frac{1}{f} \tag{4.1}$$

where *T* denotes the time period in seconds and *f* the operating frequency. The impact of delay is measured as the ratio of the propagation delay  $t_{pd}$  to the half cycle of a clock pulse, expressed as a percentage. If this value is significantly high, the fanout of the circuit will be low. The value therefore helps to determine whether a circuit output can really be used within a system without any loss of signal power. Mathematically, we can express  $DI_f$  as

$$DI_f = \frac{t_{pd}}{T/2} \times 100\% \tag{4.2}$$

It is evident from Equation 4.2 that  $DI_f$  is unitless. Substituting *T* from Equation 4.1 into Equation 4.2 gives

$$DI_f = 2ft_{pd} \times 100\% \tag{4.3}$$
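The two forms of the delay impact can be checked numerically. The following is a minimal sketch, with function names chosen only for this illustration; the input values (1 MHz with 18.4 ns and 64 ns propagation delay) are rows of Table 4.1.

```python
def delay_impact_from_period(t_pd_s: float, period_s: float) -> float:
    """DI_f as the ratio of propagation delay to half the clock period, in percent (Eq. 4.2)."""
    return t_pd_s / (period_s / 2.0) * 100.0


def delay_impact_percent(f_hz: float, t_pd_s: float) -> float:
    """DI_f expressed through the operating frequency, in percent (Eq. 4.3)."""
    return 2.0 * f_hz * t_pd_s * 100.0


f_hz = 1e6  # 1 MHz operating frequency
for t_pd_s in (18.4e-9, 64e-9):
    via_period = delay_impact_from_period(t_pd_s, 1.0 / f_hz)
    via_frequency = delay_impact_percent(f_hz, t_pd_s)
    print(f"t_pd = {t_pd_s * 1e9:.1f} ns: {via_period:.2f} % / {via_frequency:.2f} %")
```

Both functions return the same value for a given design, which mirrors the substitution of Equation 4.1 into Equation 4.2.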

In Section 2.2, the importance of switching energy and static power dissipation as major contributors to the performance metrics of the circuit was explained in detail. If the operating frequency increases, the switching energy becomes trivial with the increase of the load. In the ideal case, the circuit should dissipate zero power. However, when the input voltage is close to or below the threshold voltage of the transistor, the leakage current increases, resulting in static power dissipation.

Chip area is a concern given the growing cost of fabrication, although this cost varies across technologies. Silicon realization is more expensive in deep nanometer technologies. Even if the level shifter is regarded as just a standalone circuit and a compromise on this point seems acceptable, its contribution to the full chip area cannot be underestimated.

We included the number of transistors along with the area, because the transistor count adds not only to the chip area but also to the complexity of the layout design. The CAD tools available today can save a significant amount of time in layout design. However, for complex analog circuits, manual effort is still preferable. We must not rule out the fact that a simple design is always easy to reproduce, which makes it convenient for commercial purposes.

To understand the designs in depth, it is important to understand the design topology. We observed that most of the existing level shifter circuits are based on either the Cross-Coupled (CC) topology or the Current Mirror (CM) topology. A few research groups also explored a Hybrid (HB) topology, which combines the CC and CM topologies in order to bring together the best of both worlds. Apart from these three, there have been a few works in which CMOS transistors with different threshold voltages are used together within a single circuit. We classified them as the Multithreshold (MT) topology. The circuits that cannot be classified into any of the above categories are labelled as Others (OT).

The technology used for the design and the year in which it was proposed complete the table. Our intention was to observe the effect of technology on the circuit performance. At the same time, it was necessary to see whether the circuits improved over time.

A careful look at Table 4.1 shows that data are missing in certain fields for some works. Without all the information, a complete comparative study of the designs is impossible. To address this problem, the study is subdivided into the following parts. Three parameters, namely the technology, the design topology and the year of the proposed design, are retained in all of the following comparisons.

## 4.2.1 Supply Voltage and Conversion Range

A subset of Table 4.1 is selected here to focus on the minimum operating voltage of the circuits and their operating range. These values are plotted along with the technology used for the circuit implementation. Figure 4.10 shows that most of the designs are implemented in technologies between 130 nm and 65 nm. Over the last few years, 65 nm technology has been chosen more often for the implementations.

Only a few designs ([29], [43], [44], [37], [57], [45], [38], [46], [50], and [47]) are capable of converting the minimum input voltage to an output voltage of 2.5 V or higher. Otherwise, the focus has always been on an operating range between 1 V and 1.5 V. Another observation is that, over time, the current mirror topology has been preferred over the others.

## 4.2.2 Operating Frequency and Propagation Delay

Our next interest is the operating frequency of the circuit along with the propagation delay. As discussed in Section 4.2, the impact of propagation delay,  $DI_f$ , plays an important role in this comparative study.

The operating frequency is an important aspect of level shifter operation. In our study, we have discussed level shifter circuits capable of converting a subthreshold voltage level to an above-threshold voltage. In subthreshold operation, however, the operating frequency is treated as a trade-off. Figure 4.11 shows the trend of the operating frequency of the level shifter circuits over the last decades. We have chosen the impact of propagation delay as another reference metric in the study.

Figure 4.11 makes clear that most of the designs focus on 1 MHz operation. In recent years, a few designs ([42], [60], [54], and [34]) have reached an operating frequency at or near 100 MHz. However, the impact of delay has remained approximately the same over the years, barring a few exceptions.

Designs such as [61], [42], [51], [60], [33], [53], [36], [35], [41], [48], [50], and [54] achieve a significantly low delay. In most cases, however, they dissipate a significant amount of power. Only the design in [50] is an exception in terms of static power dissipation.

## 4.2.3 Switching Energy and Static Power Dissipation

With low power consumption being the main motivation of our research, we come to one of the most important phases of the study. Figure 4.12 shows a graphical comparison of the level shifter circuits with respect to switching energy and static power dissipation along with the chip area.

Because some data are not available, it is difficult to draw a comparative picture with all three parameters. Consequently, switching energy and static power dissipation are depicted separately in Figures 4.13 and 4.14. Both figures show that the switching energy and the static power dissipation of the proposed designs have decreased over the last few years. However, the designs in deep nanometer processes did not show any improvement.

## 4.2.4 Chip Area

The cost of silicon increases in direct proportion to the chip area. Therefore, a robust design with a large number of transistors may add to the cost problem. Figure 4.15 shows that the proposed designs did not pay much attention to the chip area. The designs proposed in [23] and [25] have a significantly low chip area.

The operating frequency of the design is another key parameter. As seen in Table 4.1, a few works in 65 nm technology ([42], [51]) reach frequencies higher than 50 MHz. The manufacturing cost increases with the area of the design. While reducing the delay and power consumption of the circuit, the complexity of the design is often ignored. As a result, more transistors are added to the circuit, which increases the area of the cell.



Figure 4.10: A graphical representation of the operating range of the level shifter circuits across different technologies in the last couple of decades















# 5 Low Power SRAM

Volatile storage is an important component in many digital systems, where it is needed for instruction memory, data memory, caches, FIFOs, register files and scratchpad memories. SRAM is well suited to serve as this volatile storage.



# 5.1 SRAM market trend

Figure 5.1: A graphical representation of the values listed in Table 5.1

As the memory of a system occupies almost 60-80% of the chip area, there has always been an effort to reduce the SRAM area. The Stanford Nanoelectronics Lab studied the area scaling trend [65], which is shown in tabular form in Table 5.1. Here it can be clearly observed how the SRAM area was reduced from 1994 to 2000. The same trend is followed over the next decade. The data is presented in graphical form in Figure 5.1. The Contacted Gate Pitch (CGP) and the Metal Layer 1 (M1) pitch also follow a linear trend, as portrayed in Figure 5.1.

| Year | SRAM area (µm<sup>2</sup>) | CGP (µm) | M1 pitch (µm)   |
|------|----------------------------|----------|-----------------|
| 1994 | 20.5                       | 0.92     | 0.88            |
| 1996 | 10.26                      | 0.64     | 0.64            |
| 1998 | 5.59                       | 0.48     | 0.5             |
| 2000 | 2.09                       | 0.336    | 0.35            |
| 2002 | 1                          | 0.26     | 0.22            |
| 2004 | 0.57                       | 0.22     | 0.21            |
| 2007 | 0.346                      | 0.16     | 0.16            |
| 2009 | 0.148                      | 0.1125   | 0.1125          |
| 2012 | 0.092                      | 0.09     | 0.08            |
| 2014 | 0.0588                     | 0.07     | 0.052           |
| 2016 | 0.04                       | 0.064    | 0.048 (Samsung) |

Table 5.1: Year-wise data of SRAM area, CGP and M1 pitch
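To put a number on the area scaling, the sketch below estimates the average time in which the SRAM area halves, using only the first and the last entry of Table 5.1. The function name and the end-point method are choices made for this illustration; a fit over all points would give a somewhat different figure.

```python
import math

# SRAM area in square micrometres per year, copied from Table 5.1
sram_area_um2 = {
    1994: 20.5, 1996: 10.26, 1998: 5.59, 2000: 2.09, 2002: 1.0, 2004: 0.57,
    2007: 0.346, 2009: 0.148, 2012: 0.092, 2014: 0.0588, 2016: 0.04,
}


def halving_time_years(area_by_year: dict) -> float:
    """Average number of years per area halving between the first and the last entry."""
    years = sorted(area_by_year)
    first, last = years[0], years[-1]
    halvings = math.log2(area_by_year[first] / area_by_year[last])
    return (last - first) / halvings


print(f"Area halves roughly every {halving_time_years(sram_area_um2):.1f} years")
```

With the tabulated values, the area shrinks by a factor of about 500 between 1994 and 2016, which corresponds to roughly nine halvings in 22 years.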

# 5.2 Power Reduction Techniques

SRAM circuits contribute substantially to the total power consumption of a system, and the standby mode is no exception. For example, portable devices typically run on a single lithium-ion battery of about 3000 mWh (1000 mAh). Given the limited battery resources, and to manage the effect of temperature variation, the peak active power has to be held under 1 W. The standby power consumption of a smartphone, to which the RF amplifier, the LCD display and the baseband system contribute, should not exceed 0.5 to 1.0 mW [66]. The contents of the memory, however, must not be harmed in the process.
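The standby budget can be translated into an idealized standby time. The sketch below is a back-of-the-envelope calculation that assumes the full 3000 mWh is available for standby and ignores self-discharge, conversion losses and any active use; the function name is chosen only for this example.

```python
def standby_hours(battery_mwh: float, standby_mw: float) -> float:
    """Idealized standby time in hours for a given battery energy and standby power."""
    return battery_mwh / standby_mw


battery_mwh = 3000.0  # battery energy quoted in the text
for standby_mw in (0.5, 1.0):  # standby power bounds quoted in the text
    hours = standby_hours(battery_mwh, standby_mw)
    print(f"{standby_mw} mW standby: {hours:.0f} h (about {hours / 24:.0f} days)")
```

Under these assumptions, the quoted budget corresponds to a standby time of several months, which shows why every additional milliwatt of memory leakage matters.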

Energy efficiency is another concern in view of the Internet of Things, which consists of billions of nodes. When the throughput requirement is low, it is essential to reduce the access energy. Last but not least, from the point of view of our research, the SRAM should operate at the same supply voltage as the rest of the system. That means we need an SRAM circuit that operates at subthreshold voltages. When scaling down the supply voltage of digital circuits, however, the minimum operating voltage V<sub>min</sub> of the SRAM is often considered the limiting factor [7].

The reduction of power consumption in SRAM circuits comes with a reduced static noise margin, a poor write margin, a reduced I<sub>on</sub>/I<sub>off</sub> ratio (which limits the number of cells per bitline), and a reduced bitline sensing margin [67]. The following sections review a few previously proposed circuit design techniques to reduce power consumption in SRAM.

## 5.2.1 Manipulation of Supply Voltage

Several researchers have focused on reducing the supply voltage [68–70]. As the power consumption varies quadratically with the supply voltage ( $V^2$ ), lowering the supply voltage brings the power consumption down. A microarchitectural technique is proposed in [68], where about 80% of the data cache lines can be maintained in a sleep state with a negligible loss in performance. The Data Retention Voltage (DRV) concept is proposed in [69]. DRV is defined as a function of process variations, chip temperature, and transistor sizing. Their model, implemented in 130 nm CMOS technology, shows a 90% leakage-power reduction at sub-300 mV without any data loss in the SRAM. In [70], the leakage power is reduced by reducing the Drain Induced Barrier Lowering (DIBL) effect. The authors in [71] proposed a transient negative bitline voltage to improve the write margin of the bitcell.
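The quadratic dependence can be illustrated with a short calculation. The sketch below assumes that the switched capacitance and the operating frequency stay fixed, so that only the $V^2$ term changes; it does not model leakage power, which follows a different law. The function name and the example voltages are chosen for illustration.

```python
def relative_dynamic_power(v_scaled: float, v_nominal: float) -> float:
    """Dynamic power at v_scaled relative to v_nominal, assuming power is proportional to V^2."""
    return (v_scaled / v_nominal) ** 2


v_nominal = 1.2  # hypothetical nominal supply in volts
for v_scaled in (0.9, 0.6, 0.3):
    ratio = relative_dynamic_power(v_scaled, v_nominal)
    print(f"{v_scaled} V: {ratio * 100:.2f} % of the power at {v_nominal} V")
```

Under this assumption, halving the supply voltage cuts the dynamic power to a quarter, and a reduction to one quarter of the nominal voltage cuts it to one sixteenth.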

The work in [72] uses two supply voltages. The higher supply voltage is chosen during the read operation, so that a positive differential voltage between the cell and the word line increases the read stability. During the write operation, the lower supply voltage produces a negative differential voltage between the cell and the word line, which improves the write margin and makes the cell data easier to flip.

As an alternative to power supply scaling, the ground-level supply can be raised. The work in [73] introduced a charge-recycle offset-source driving scheme. According to the simulation results, its power consumption is reduced to one-fourteenth of that of the circuit proposed in [74]. During the read and write operations, the source line of the SRAM bitcells in [74] is set to a negative voltage and to a high-impedance state, respectively, which improves the access time.

## 5.2.2 Read/Write Assist Circuitry and Bitline and Wordline Signal Manipulation

The SRAM array in [75] utilizes a rectangular diffusion cell and a delta-boosted array voltage (DBA) scheme, as shown in Figure 5.2. The impact of process variations is a concern in low-voltage operation. The rectangular diffusion cell reduces this impact by decreasing the pattern fluctuation, but at the cost of a reduced static noise margin (SNM). The DBA scheme compensates for the SNM loss but reduces the write margin of the bitcell. To compensate for the write margin, the pull-up transistors of the SRAM cell are built with a higher threshold voltage.

Figure 5.2: Schematic of the DBA scheme proposed in [75]

Figure 5.3: Read assist technique proposed by Pilo et al. [76]

Figure 5.4: The replica structure proposed in [78]

To write back the original data, a read assist circuit (Figure 5.3) is used in [76], which provides full bitline amplification on half-selected columns. This scheme requires a sense amplifier for each column. Additionally, the write margin is increased during a write operation by providing a lower supply voltage to the write-only columns.

The work in [77] uses a hierarchical bitline and a local sense amplifier. This scheme reduces the capacitance and the write swing voltage of the bitlines, which lowers the write power consumption without degrading the noise margin. The SRAM test chip consumes 26 mW while reading and 28 mW while writing when operated from a 2.5 V supply at 200 MHz.

A similar method is used in [78]. As can be seen in Figure 5.4, a reference voltage is produced to track the delay of the bitlines, which reduces the impact of process variation. The wordline (WL) pulse width is reduced to the minimum required value, which helps to reduce the bitline (BL) swing and thereby the power consumption.

The authors in [79] presented a power-line-floating technique combined with a process-variation-adaptive write replica circuit. As shown in Figure 5.5b, the columns are selected to apply either of the techniques. This combined technique lowers the required supply voltage.

The SRAM cell stability in single-source microprocessors is improved by a pulsed-BL and a pulsed-WL technique [80]. The BLs are discharged to 100-300 mV below the nominal supply voltage. The pulsed-WL scheme is shown in Figure 5.6a. These measures decrease the cell current, but at the cost of SNM. A read-modify-write scheme, shown in Figure 5.6b, compensates for the reduction of the write margin (WM). The techniques are made programmable so that they can adapt to process and temperature variations. The pulsed-WL technique improves the cell failure rate 15-fold. According to the simulation results, a 26-fold improvement in read stability is achieved with an overhead of 4-8% when both schemes are applied.

#### 5.2.3 Bitline Leakage Reduction

In SRAM memories, bitline leakage is the source of several problems. It contributes to the leakage power and temperature in standby mode. The leakage current is at its maximum when the non-accessed cells store the complement of the data held in the accessed cell. During the read operation, the bitline leakage introduces an extra delay because it counteracts the read current I<sub>cell</sub>. It might even cause a false read.

The subthreshold leakage of the non-accessed cells is reduced in [70] by lowering the voltage of the non-accessed WLs to a negative value, which creates a negative  $V_{GS}$  on their access transistors. Additional circuitry is required to produce the negative voltage.

The use of high-threshold-voltage devices for the access transistors is proposed in [81] to eliminate the impact of BL leakage on performance and noise margin. The leakage currents of the bitlines are reduced by applying a negative WL voltage to the non-accessed transistors and by lowering the supply voltage of the BLs and bitcells below the nominal supply. The BL delay is improved by 23% compared to conventional designs. However, the use of multiple supply voltages raises a reliability issue. The approach proposed in [82] measures the actual leakage current and compensates accordingly, which adds an extra delay to the process.

Two extra transistors are added to the 6T cell [Figure 5.7] to compensate for the BL leakage current [83]. This strategy enforces the worst-case leakage not on one BL only but on both, which guarantees the same leakage on both bitlines. Although the proposed bitcell is 40% larger, the SRAM memory area is 6% smaller because it integrates 256 rows per column.

Figure 5.5b: Write and read replica circuit

Figure 5.6: (a) Pulsed WL scheme proposed by [80]; (b) PWL with Read-Modify-Write (PWL-RMW)

## 5.2.4 Transistor Level Techniques

In some CMOS technologies, such as the 90 nm node [84], increasing the channel length improves the performance in the subthreshold region. This technique is advantageous in low-voltage applications.

A logic gate structure that reduces the input gate signal swing, shown in Figure 5.8, is proposed in [85]. The power consumption decreases because the logic gate reduces the signal swing on the highly capacitive lines in the SRAM. A memory using the proposed logic gate is fabricated in 250 nm CMOS technology and dissipates 0.9 mW at 1 V and 100 MHz. The disadvantages of this technique are a low noise margin and the need for a level shifter.



Figure 5.7: The memory cell proposed by [83]



Figure 5.8: Half-swing pulse-mode AND gate proposed by [85]



Figure 5.9: SRAM cell proposed in [89]

#### 5.2.5 Subthreshold Bitcell Design

In the conventional 6T SRAM, the read and write operations share common access transistors. Therefore, when the supply voltage is reduced below 0.7 V [86], SRAM parameters such as the noise margin are severely degraded. Additional transistors are introduced to the conventional 6T SRAM bitcell so that the read and write operations use different access transistors.

The stability of the bitcell during the read and write operations is predicted by Monte Carlo simulations, which consume significant time and resources. To address this concern, a fast analytical method is proposed in [87], which estimates the failure probability of the SRAM cell due to parametric variations.

In 2005, an SRAM circuit capable of low-voltage operation was introduced [88]. A Fast Fourier Transform (FFT) processor is designed with the SRAM subsystem, which is capable of operating at 180 mV and 164 Hz while consuming 90 nW of power. The work shows how susceptible the bitcells are to process variation, which makes the read and write operations of the 6T SRAM bitcell difficult below 500 mV. A multiplexer-tree-based decoder, which decreases the number of cells connected to the bitlines, is used to alleviate the problem of process variations. Significant area overhead remains a drawback of this design. The performance was also not acceptable for commercial applications [89][90].

A 6T SRAM [Figure 5.9] with a gated-feedback write-assist technique [89] is fabricated in 130 nm CMOS technology and shows robust operation below 200 mV. It shows a 36% improvement in energy consumption over the design presented in [88] while occupying half of the area. Random Dopant Fluctuation (RDF) is the sole contributor to the process variation in the subthreshold domain. A single-ended cell with a gated-feedback write assist along with transistor upsizing is used to mitigate the effect of RDF. As proposed there, the noise margin variation can be reduced significantly if the transistor sizes are increased 6.5 times at 0.3 V.

Figure 5.10: Read SNM free 7T cell proposed by Takeda et al. [91]

To overcome the speed limits of conventional SRAMs, a 7T read-SNM-free SRAM [Figure 5.10] is proposed in [91]. High-speed and low-voltage operation is achieved by reducing the threshold voltage of the nMOS transistors to the threshold voltage of logic gates. The read SNM is significantly improved by the addition of the seventh transistor, which also eliminates the half-select issue during the write operation. The cell stability is improved by reducing the voltage level of the WL. However, the performance of this circuit is limited below 0.5 V. The bitcell area is 11% larger than that of the conventional 6T SRAM. Because of the reduced performance, only 8 bitcells can be connected to the BLs.

The 7T SRAM proposed in [81], has an nMOS transistor at the VSS node of the 6T bitcell, which reduces the BL swing to  $V_{\rm DD}/6$  leading to 90% write power reduction.

A 10T bitcell design [Figure 5.11] using a full-swing single-ended read process is proposed by the authors in [90][92]. During the read process, the stored data is buffered so that the read SNM improves significantly. As a result, the worst-case read SNM is equal to the 6T hold SNM. The cell occupies 66% more area than the conventional 6T cell. At 0.6 V, the leakage power is 2.25 times lower than that of conventional 6T cells. The WL is boosted by 100 mV above the nominal supply voltage to mitigate the impact of process variation. Floating the supply voltage helps to achieve the write operation in the subthreshold domain. Below 400 mV supply voltage, both the read and write operations consume  $3.28 \,\mu\text{W}$  at 475 kHz.
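As a rough cross-check, a power figure quoted at a given clock frequency can be converted into an energy per cycle. The conversion below assumes that one access is performed per clock cycle and that the quoted power is the average over that cycle; neither assumption is stated in [90][92], so the result is only an order-of-magnitude estimate.

$$E_{cycle} \approx \frac{P}{f} = \frac{3.28\,\mu\text{W}}{475\,\text{kHz}} \approx 6.9\,\text{pJ}$$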

The 10T SRAM bitcell proposed in [93] has an improved bitcell stability. The effect of process variation is mitigated by a Schmitt-trigger technique [Figure 5.12], which provides a built-in feedback mechanism. This bitcell has a 1.56 times higher SNM than the conventional 6T bitcell. Simulations show that a feedback mechanism is more useful than transistor upsizing in a conventional 6T bitcell. The test chip fabricated in 130 nm CMOS technology shows robust functionality at a supply of 160 mV.

Figure 5.11: A 10T SRAM cell schematic proposed by [90]

Figure 5.12: Schmitt-trigger-based 10T SRAM cell in [93]

Figure 5.13: SRAM proposed by [67]

Kim et al. [67] propose a design that combines several techniques to overcome the challenges of the conventional 6T bitcell operating at low voltage, as shown in Figure 5.13. Four extra transistors are added to the 6T bitcell to decouple the read path. The WM is improved by exploiting the reverse short channel effect. A virtual ground replica method is proposed to improve the BL sensing margin. Here, the BL leakage is independent of the data stored in the bitcell. As a result, a high number of bitcells can be accommodated in each column. The measurement results show that a BL with 1024 cells is functional with a supply of 0.20 V at 120 kHz (27°C).

The authors in [94] proposed a subthreshold multi-threshold 9T bitcell. During the read operation, this design allows the retention nodes to remain disconnected from the BL. The length of the back-to-back transistors is increased to enhance the stability and reduce the power consumption. The limited number of bitcells per column guarantees that the samples do not fail due to BL leakage. Because pMOS transistors are less susceptible to process variations [94], they are used as access transistors. The minimum energy per operation, between 529 fJ and 620 fJ, occurs in the supply range from 0.30 V to 0.35 V for an array of 64 x 32 blocks.

Another 10T bitcell design is proposed in [95]. The design allows bit-interleaving with column-wise write access control. The read and write processes are separated, as the design provides a differential read path. During the hold mode, the GND of the bitcell is virtually forced to  $V_{DD}$, whereas the virtual GND is forced back to 0 during the read operations. This is done to reduce the leakage current. This design can operate successfully below 300 mV. As suggested by the authors, the supply voltage can be scaled down to 160 mV with aggressive wordline boosting. This bitcell design is further exploited in [96], [97], with the leakage measured as 1.83 pW/bit at 250 mV supply and 25°C.

Figure 5.14: The SRAM circuit proposed by [98]

To minimize the area and supply voltage, a read-BL swing expansion technique is proposed in an L-shaped 7T (L7T) SRAM [98]. The decoupled 1T read port of this circuit, shown in Figure 5.14, improves the WM significantly. A boosted BL secures the sensing margins. The circuit is fabricated in 65 nm technology as part of a 256 row 32 kb L7T SRAM array, which operates successfully at a 260 mV supply.

The 12T subthreshold SRAM in [99] uses a data-aware power cut-off write assist, which eliminates the read-disturb and half-select problems. The bitcell, depicted in Figure 5.15, can successfully execute a read operation at 350 mV in a 4 kb memory array. It can perform the write operation at 300 mV. The design is implemented in 40 nm general-purpose CMOS technology. The memory can operate at a maximum frequency of 11.5 MHz while consuming  $22 \mu W$  of total power with a 350 mV supply. At a 450 mV supply, it achieves the minimum energy consumption of 1.6 pJ.

Figure 5.15: 12T SRAM proposed by [99]

The symmetrical and differential 8T bitcell [Figure 5.16] proposed in [100] uses a zigzag-shaped layout to achieve a compact area and a fully symmetric device placement for a litho-friendly layout. Compared with the conventional 8T SRAM [101], the advantage of this design is a higher access speed because of the differential sensing. Moreover, the cell area is reduced by 15%. Measurements show that the minimum supply required is 430 mV for the 256 row 32 kb memory array and 250 mV for the 32 row 4 kb memory array. Both are fabricated in 54 nm CMOS technology. A 256 row 64 kb memory using the same design is also fabricated in 90 nm CMOS technology and can operate with a minimum supply of 230 mV.

A data-randomizing system-level approach is proposed in [102] to reduce the SRAM supply voltage for image and video applications. The stored data in the columns is randomized so that the distribution of 0s and 1s is close to 50%, which avoids the worst-case scenario. This approach allowed the 8T bitcell in [101] to operate at 200 mV.

The 9T SRAM bitcell [Figure 5.17] in [103] uses BL leakage equalization and a Content Addressable Memory (CAM) assisted performance boosting technique. The CAM-assisted boosting technique improves the write performance. The small inserted CAM conceals the slow data development after data flipping, which in turn improves the overall operating frequency of the circuit. A 16 kb SRAM with this design is fabricated in 65 nm CMOS technology and consumes a minimum energy of 0.33 pJ at a 400 mV supply.

Figure 5.16: SRAM proposed in [100]

Figure 5.17: SRAM design in [103]

Figure 5.18: SRAM design in [105]

The authors in [104] presented a two-port disturb-free 9T subthreshold SRAM with separate single-ended read and write BLs. The variation-tolerant line-up write assist technique improves the writeability of the proposed circuit. The proposed 72 kb SRAM array is fabricated in 40 nm CMOS technology and can operate at 260 MHz with a 1.1 V supply and at 450 kHz with a 320 mV supply.

## 5.2.6 Application Specific Techniques

The last, but no less important, opportunity for reducing the energy consumption is to exploit the characteristics of the target application space, such as image processing. In addition to the savings already achieved through supply voltage scaling, exploring the application space while designing SRAMs can yield further energy savings. These savings can be reaped at the algorithm and architecture levels.

Wang et al. [105] proposed an embedded subthreshold SRAM [Figure 5.18] for a quality-scalable, high-profile video decoder. Power-gating techniques and multi-output dynamic circuits are combined with the conventional 7T bitcell to achieve low energy together with a small area overhead and a high operating frequency. The power-gating scheme and the 7T bitcell topology reduce the minimum supply voltage V<sub>DD</sub> at the cost of a small area overhead. The address decoder is built with multi-output dynamic circuits to improve the frequency of operation. The chip is realized on silicon in 90 nm CMOS technology. For energy-efficient scalable video decoding, the memory consumes 42.8 pJ/cycle for QCIF, 78 pJ/cycle, and 235 pJ/cycle for HD720 while operating at 300, 400 and 700 mV supply, respectively.

Figure 5.19: SRAM proposed by [106]

Another topology, which targets applications that handle highly correlated data such as video and images, was presented in [106] [Figure 5.19]. The BL switching activity is reduced by bit-wise prediction. There are no half-selected cells because each row represents one word. The column multiplexing ratio is likewise kept at one, with each column assigned to its own sense amplifier. If a correct prediction is made during a read operation, no voltage difference is introduced across the read buffer connected to the BL. Consequently, with a correct prediction none of the BLs are discharged, which prevents switching activity on the BLs. A statistically gated sense amplifier approach is developed to improve this further by taking advantage of the biased transition probabilities on the bitlines. These techniques reduce the energy consumption per access by up to 1.9 times compared with the traditional 8T SRAM.

# 5.3 Operating Principle



Figure 5.20: Standard 6T SRAM schematic

The conventional 6T SRAM is shown in Figure 5.20. It consists of two back-to-back inverters storing 1 bit on the complementary retention nodes (Q and  $\overline{Q}$ ). To perform the write operation, the WL is raised while complementary values are forced on the bitlines, BL and  $\overline{BL}$ . For a successful write operation, the access transistors M5 and M6 must in principle be stronger than M3 and M4.

During the read operation, both BL and  $\overline{BL}$  are raised to the supply voltage, and then the WL is raised. The current through M5 and M6 is integrated on the bitline capacitance. As a result, a small differential voltage is produced, which is then amplified and latched by the sense amplifier. As the bitline capacitance is large, the access transistors M5 and M6 remain able for some time to supply the drain current of M1 and M2. Therefore, the retention transistors M1 and M2 must be made stronger than M5 and M6 so that no cell is overwritten during the read process. In effect, weak M5 and M6 secure read stability, while strong M5 and M6 ensure write stability. It becomes increasingly difficult to retain reliable access to the cell when the cell area or the supply voltage is reduced.

The static noise margin of an SRAM cell is calculated as shown in Figure 5.21. The DC transfer functions of the two back-to-back inverters are plotted against each other. The side of the largest square that fits between the two curves, on either side of the tripping point, gives the SNM value. If such a square cannot be formed on either side, the cell is considered unstable and will fail to retain the data.
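The largest-square construction can be sketched numerically. The following Python sketch is illustrative only: the tanh-shaped inverter characteristic `inverter_vtc` and its `gain` parameter are assumptions standing in for simulated transfer curves, and the function names are hypothetical. For each lobe it walks along 45° lines from one curve to the other and returns the side length of the largest square (its diagonal is larger by a factor of √2).

```python
import math


def inverter_vtc(v_in, vdd, gain):
    """Idealized tanh-shaped inverter transfer curve (an assumption, not a device model).

    `gain` is the magnitude of the slope at the switching point VDD/2.
    """
    return 0.5 * vdd * (1.0 - math.tanh(2.0 * gain * (v_in - 0.5 * vdd) / vdd))


def lobe_square_side(f_a, f_b, vdd, steps=400):
    """Side of the largest square in the upper-left lobe of the butterfly plot.

    Curve A is y = f_a(x); curve B is x = f_b(y). Both must be decreasing.
    The square's lower-left corner sits on curve B, its upper-right on curve A.
    """
    best = 0.0
    for i in range(1, steps):
        y0 = vdd * i / steps
        x0 = f_b(y0)                      # point (x0, y0) on curve B
        if y0 <= x0 or f_a(x0) <= y0:     # outside the lobe: no square here
            continue
        lo, hi = 0.0, vdd                 # bisection on the square side s
        for _ in range(50):
            mid = 0.5 * (lo + hi)
            if f_a(x0 + mid) - (y0 + mid) > 0.0:
                lo = mid                  # corner still below curve A: grow
            else:
                hi = mid                  # corner crossed curve A: shrink
        best = max(best, lo)
    return best


def static_noise_margin(f_1, f_2, vdd):
    """SNM as the smaller of the two lobe squares (0.0 if a lobe is missing)."""
    return min(lobe_square_side(f_1, f_2, vdd),
               lobe_square_side(f_2, f_1, vdd))


if __name__ == "__main__":
    vdd = 0.25  # example supply in volts
    inv = lambda v: inverter_vtc(v, vdd, gain=10.0)
    print("SNM [V]:", static_noise_margin(inv, inv, vdd))
```

Taking the minimum over both lobes reflects that an asymmetric cell is limited by its weaker side; with an inverter gain below one the lobes vanish and the sketch returns zero.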

Seevinck et al. [107] proposed an efficient method to simulate the SNM, which can be applied to calculate the SNM for both the read and write operations.  $SNM_{read}$  is obtained when both the wordlines and the BLs are forced to the supply voltage, whereas  $SNM_{write}$  is obtained when the WLs are at the supply voltage and the BLs are forced to complementary logic values, causing the cell to be overwritten. In the case of  $SNM_{write}$ , it is not possible to fit a square between the two DC curves, so it is often represented as a negative value.

Figure 5.21: The butterfly graph representing the SNM of an SRAM circuit

# 5.4 SRAM Array and associated circuits

### 5.4.1 Address and Data Buffers

The address and input data must remain stable during the read and write operations, which is essential for error-free memory operation. To achieve this, the address and data signals are latched, and the latches are isolated from any outside changes. A D-latch for each signal serves this purpose, as shown in Figure 5.22. Any change on the input propagates to the output when the control signal (CTL) is high. When CTL is deactivated, the pass-gate (PG1) disconnects the input from the rest of the circuit. The data therefore remains stored in the loop formed by INV1, INV2, and PG2. The output data buffer is connected to a tristate buffer, which prevents two outputs from being connected to the bus at the same time.

Figure 5.22: D-latch design

Figure 5.23: Tristate designs used

The tristate buffer and tristate inverter implementations are shown in Figure 5.23. As observed in Figure 5.23a, the output may enter the high-impedance mode depending on the state of  $\overline{OE}$ . When  $\overline{OE}$  is low, the output is in the high-impedance mode. When  $\overline{OE}$  rises to V<sub>DD</sub>, the DATA signal propagates to the output. The tristate inverter is shown in Figure 5.23b. There are only two states. The output takes the inverted input when CTL goes high. Otherwise, the output remains in the high-impedance mode.
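The behaviour of the latch and the tristate stages described above can be summarized in a small behavioural model. This is a minimal sketch, not the transistor-level circuits of Figures 5.22 and 5.23: the class and function names are hypothetical, `None` is used to represent the high-impedance state, and the enable polarity follows the text (the output is driven while the enable signal is high).

```python
class DLatch:
    """Level-sensitive latch: transparent while CTL is high, holds otherwise."""

    def __init__(self, initial=0):
        self.q = initial

    def update(self, d, ctl):
        if ctl:            # PG1 conducts: the input propagates to the output
            self.q = d
        return self.q      # CTL low: the feedback loop keeps the stored value


def tristate_buffer(data, enable):
    """Drive `data` when enabled, otherwise high impedance (None)."""
    return data if enable else None


def tristate_inverter(data, ctl):
    """Drive the inverted input when CTL is high, otherwise high impedance."""
    return (1 - data) if ctl else None


def resolve_bus(drivers):
    """Value on a shared bus; more than one active driver is a contention error."""
    active = [value for value in drivers if value is not None]
    if len(active) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    return active[0] if active else None


if __name__ == "__main__":
    latch = DLatch()
    latch.update(1, ctl=1)           # capture a 1 while transparent
    held = latch.update(0, ctl=0)    # input changes are ignored while CTL is low
    print(resolve_bus([tristate_buffer(held, True), tristate_buffer(0, False)]))
```

The `resolve_bus` helper makes explicit why the output data buffers are tristated: only one driver may be enabled on the shared bus at a time.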

#### 5.4.2 Row Decoder Design

The function of the row decoder is to activate one out of N rows in the memory array. Its design consists of two stages: a pre-decoder and a post-decoder. The outputs of the pre-decoder are combined to create the outputs of the post-decoder. Six principal parameters characterize the longest path, speed, and power consumption [108] in the decoder design.

Choice of logic gates: The logic gates used for the implementation range from dynamic logic to static logic to pulsed and self-resetting logic. Clocked decoding is also considered as an alternative to CMOS gates. The most common CMOS-based implementations use a NAND gate followed by an inverter.

Logic depth: The logic depth depends on the number of WLs which need to be decoded. In addition, the average fan-in of the logic gates (NAND, INV) along the decode path contributes significantly to the logic depth.

Fan-in: The decoder delay is minimized with a fan-in of two [109]. The fan-out of the internal nodes increases with the fan-in of each NAND gate. The gates which drive higher fan-outs must be sized up proportionally, which produces an area overhead. Additionally, the gate delay increases when the fan-in increases.

Fan-out and wire length: The fan-out of each decoder stage and the maximum wire-lengths driven by each stage are dependent on the architecture of the decoder.

Device sizes within pull-up and pull-down networks: The total delay along the decode path must be optimized using different sizing techniques, such as logical effort [110]. Optimal device widths depend on the logic, fan-in, and fan-out of the gate used and the parasitic wiring being driven by each gate.
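The sizing step can be illustrated with the method of logical effort. The sketch below is an illustration rather than the sizing flow used in this work: it uses the textbook super-threshold values for the logical effort and parasitic delay of an inverter (g = 1, p = 1) and a 2-input NAND (g = 4/3, p = 2), which are assumptions that would need to be re-characterized for subthreshold operation, and the function name `path_delay` is hypothetical. Delays are expressed in units of the technology time constant.

```python
def path_delay(logical_efforts, parasitics, electrical_effort, branching=1.0):
    """Minimum path delay according to the method of logical effort.

    logical_efforts   -- logical effort g of each stage along the path
    parasitics        -- parasitic delay p of each stage
    electrical_effort -- H = C_load / C_in of the whole path
    branching         -- product of the branching efforts along the path
    Returns (optimal effort per stage, total normalized delay).
    """
    n_stages = len(logical_efforts)
    path_logical_effort = 1.0
    for g in logical_efforts:
        path_logical_effort *= g
    path_effort = path_logical_effort * branching * electrical_effort
    stage_effort = path_effort ** (1.0 / n_stages)   # equal effort per stage
    delay = n_stages * stage_effort + sum(parasitics)
    return stage_effort, delay


if __name__ == "__main__":
    # NAND2 -> INV -> NAND2 -> INV decode path driving a wordline load that is
    # 128 times the input capacitance (placeholder value, not from this work).
    g = [4.0 / 3.0, 1.0, 4.0 / 3.0, 1.0]
    p = [2.0, 1.0, 2.0, 1.0]
    effort, delay = path_delay(g, p, electrical_effort=128.0)
    print(f"stage effort = {effort:.2f}, path delay = {delay:.2f} tau")
```

Equalizing the effort of all stages is what fixes the relative device widths: each gate's input capacitance follows from its load and the common stage effort.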

A 7-to-128 row decoder can be seen in Figure 5.24. Before the control signal (CLK-EN) is set, all the outputs of the decoder have to be deactivated. The enable signals (En1 and En2) are activated by the CLK-EN signal, allowing one of the outputs associated with the input address of the decoder to be activated. The control circuitry sets the timing of the CLK-EN signal.
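The two-stage decoding with an enable gate can be sketched behaviourally as follows. The split of the 7-bit address into a 3-bit and a 4-bit pre-decoder is an assumption made for illustration (the partitioning in Figure 5.24 may differ), the two enable signals are merged into one `enable` argument, and the function names are hypothetical.

```python
def predecode(value, bits):
    """One-hot pre-decoder: exactly one of 2**bits lines is high."""
    return [1 if i == value else 0 for i in range(1 << bits)]


def row_decode(address, enable, low_bits=3, high_bits=4):
    """Behavioural 7-to-128 row decoder with a gated output stage.

    Returns the list of wordline values; all are 0 while `enable` is low.
    """
    n_rows = 1 << (low_bits + high_bits)
    if not 0 <= address < n_rows:
        raise ValueError("address out of range")
    low = predecode(address & ((1 << low_bits) - 1), low_bits)
    high = predecode(address >> low_bits, high_bits)
    gate = 1 if enable else 0
    # Post-decoder: each wordline is the AND of one line from each
    # pre-decoder, gated by the enable signal.
    return [h & l & gate for h in high for l in low]


if __name__ == "__main__":
    wordlines = row_decode(address=77, enable=True)
    print(wordlines.index(1), sum(wordlines))
```

Gating the last stage keeps every wordline low until the address has settled, which corresponds to the requirement that all outputs are deactivated before the enable signal is set.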

### 5.4.3 Read/Write Column Decoder and Write Driver

A  $2^{K}$ -input multiplexer is used in a read column decoder. Its inputs are the BLs, and its output is fed to the sense amplifier. The read column decoder connects several columns to a single sense amplifier, which relaxes the area constraints of the sense amplifier circuit. Figure 5.25 shows a read column decoder in which each sense amplifier is assigned to two columns. The R0 and R1 signals choose between the two columns, and the sense amplifier receives its inputs from the corresponding BLs. The SAE0 and SAE1 signals select the sense amplifier to activate, whose output is fed to the output bus. As a result, one out of four columns is read during a read operation.



Figure 5.24: 7-to-128 row decoder



Figure 5.25: Read and write decoder with write driver

The BLs are precharged to  $V_{DD}$  during the precharge phase. Then the read operation starts. With the activation of the WL, a differential voltage begins to develop on the BLs. With the activation of either R0 or R1, the differential voltage is transferred to the corresponding sense amplifier inputs. When SAE0 or SAE1 is triggered, one sense amplifier is switched on, which marks the end of the read operation.

During the write operation, the input data and its complement are connected to the BLs of one of the four columns by the W0, W1, WriteEnable0, and WriteEnable1 signals. When the WL is switched on, the data on the BLs overwrites the data in the bitcells, which ends the write operation. The write driver contains two NAND gates. These NAND gates are dimensioned so that they are strong enough to discharge the BL capacitance to 0.

#### 5.4.4 Sense Amplifier

The sense amplifier circuit in an SRAM amplifies a small analog differential voltage to a full-swing digital output signal. This avoids a full-swing discharge of the highly capacitive BLs and thereby saves a considerable amount of power.

The area of the sense amplifier in SRAM circuits is of high concern. Architectures without column multiplexing require the sense amplifier to fit within a column pitch. This constraint is relaxed when column multiplexing is used, because each sense amplifier is then assigned to multiple columns. Due to the high sensitivity to process variations in the subthreshold domain, common-mode noise may appear on both sense amplifier inputs. If the sense amplifier circuit is designed to operate in the subthreshold domain, differential sensing reduces the impact of the common-mode noise which may exist on both BLs. The schematic of a common sense amplifier is shown in Figure 5.26. The sensing operation starts when the operating point of the sense amplifier is set by precharging and equalizing both of its inputs to the same precharge voltage level (V<sub>DD</sub>). This is followed by triggering the decoded WL of a read-accessed cell, which starts to build up the differential voltage on BL and  $\overline{BL}$ . After a sufficient differential voltage has developed on the inputs, the sense amplifier enable (SAE) signal is issued. Subsequently, the small signal is amplified into a full-swing output, and the output data becomes available on the data bus.

Figure 5.26: Schematic showing the sense amplifier circuit

### 5.4.5 Control Circuits



Figure 5.27: Control circuitry

The timing control circuitry generates the timing of the precharge, row-decoder enable, SAE, and write-enable signals and thereby ensures correct read and write operation. The control circuitry is implemented mainly with delay-line timing control [111] and asynchronous replica timing techniques [109]. Figure 5.27a shows the schematic of the delay-line timing loop. The FSM is set by a control signal, which is usually the main clock. The delay elements  $(T_{delay1} - T_{delayN})$  in the FSM reset path define the total timing. The delay elements are mostly constructed from a chain of logic circuits (INV, NAND, NOR). The delay time can be extended by using non-minimum-length devices in the delay chain. The control signals for the read/write control are generated from the timing intervals constructed by the delay elements. However, the delay of the delay loop may not track the delay variations of the SRAM bitlines, which are induced by the process variations in modern nanometer technologies.

The asynchronous replica timing circuit tracks the bitline discharge delay more tightly. It also relieves the effect of process variations. The schematic of this timing method is shown in Figure 5.27. A dummy column, which contains the same number of SRAM cells as each regular column, serves as the reference delay element. The replica signal path copies the capacitive loads on the BLs and the associated delays of the real signal path. As a result, it can provide more precise timing signals. As in the delay-line method, the FSM is set by the control signal (Ctl-in). Its output activates the wordlines both in the row decoder and in the dummy row. The dummy column resets the FSM after its BL is discharged. The SRAM begins the precharge phase once the FSM is reset. Subsequently, the sense amplifier completes its operation by driving the data onto the data bus.

Figure 5.28 shows a typical organization of an SRAM module. The SRAM unit cells form a two-dimensional array with rows and columns. These unit cells have a capacity of 1 bit and are called bitcells. A chip select (CS), a write enable (WE), a clock (CLK), and an address (ADDR) signal are typically used for control. During a read or write operation, the address is divided into a row address and a word address. The row address is decoded by the row decoder, and a signal known as the wordline (WL) is enabled along the appropriate row. For a read access, each cell of the active row delivers its data on the bitline (BL) signals, which run along the columns. The sense amplifier normally amplifies the value from each cell. The word address is used by a multiplexer, before or after the sense amplifiers, to select a subset of the columns that forms the output data word. During the write operation, the bitlines are driven actively so as to overpower the cell and write a new value, logic '1' or '0'. The number of rows and columns in the memory array has a direct effect on the access energy of an SRAM module. For a read access in which the entire row is accessed, the switching energy per access in a cycle is given by:

$$E_{rd} \approx E_{ctl\&dec,r} + N_C C_{WL_{bit}} V_{DD}^2 + N_C [N_R C_{BL_{bit}} V_{DD} \Delta V_{BL} + E_{sense\&output}]$$
(5.1)

Figure 5.28: Diagram of an SRAM architecture

Figure 5.29: 9T SRAM circuit proposed in [94]

 $N_R$  and  $N_C$  denote the numbers of rows and columns, respectively.  $E_{ctl\&dec,r}$  is the switching energy of the control circuitry and the decoder,  $C_{WL_{bit}}$  is the wordline capacitance per bit,  $C_{BL_{bit}}$  is the bitline capacitance per bitcell,  $\Delta V_{BL}$  is the bitline swing, and  $E_{sense\&output}$  is the energy of the sense amplifier and the other output stages. Equation 5.1 shows that the bitline capacitance of the bitcell becomes the dominant factor when both  $N_R$  and  $N_C$  are large. A thin cell layout is commonly used to reduce the impact of the bitcell BL capacitance. To reduce the bitline swing, it is essential to use amplifiers capable of amplifying a smaller differential voltage.
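Equation 5.1 can be evaluated directly to see how the terms scale with the array dimensions. The following Python sketch is a direct transcription of the equation; the function name `read_energy` and the numerical values in the usage example are placeholders chosen for illustration and are not extracted from the designs in this work.

```python
def read_energy(n_rows, n_cols, c_wl_bit, c_bl_bit, vdd, delta_v_bl,
                e_ctl_dec=0.0, e_sense_output=0.0):
    """Switching energy per read access according to Equation 5.1.

    Capacitances in farads, voltages in volts, energies in joules.
    """
    e_wordline = n_cols * c_wl_bit * vdd ** 2                    # full-swing WL
    e_bitlines = n_cols * n_rows * c_bl_bit * vdd * delta_v_bl   # partial BL swing
    e_sense = n_cols * e_sense_output                            # one per column
    return e_ctl_dec + e_wordline + e_bitlines + e_sense


if __name__ == "__main__":
    # Placeholder values: 0.2 fF per bit on WL and BL, 250 mV supply,
    # 50 mV bitline swing, 1 fJ per sense amplifier, 5 fJ control/decoder.
    for rows in (8, 64, 256):
        energy = read_energy(n_rows=rows, n_cols=32,
                             c_wl_bit=0.2e-15, c_bl_bit=0.2e-15,
                             vdd=0.25, delta_v_bl=0.05,
                             e_ctl_dec=5e-15, e_sense_output=1e-15)
        print(rows, "rows:", energy, "J")
```

Only the bitline term grows with the product of rows and columns, which is why it dominates in large arrays.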

# 5.5 Proposed Design

We propose a 9T SRAM cell whose topology is inspired by [112]. The topology was implemented in 90 nm CMOS technology by Luetkemeier et al. [94] in 2012. As explained in the previous chapters, linear scaling is not possible with this circuit. Therefore, the cell geometries need to be optimized for the implementation in 28 nm FDSOI technology. The optimization method explained in section is used here to find the optimized dimensions, which ensure a balanced and efficient trade-off among speed, leakage and reliability while operating at a supply of 250 mV.

The behaviour of the transistors M1-M6 is the same as in a conventional 6T SRAM cell. The differential read signal is produced by the transistors M7-M9. We use transistors with standard threshold voltage to combat the leakage. The transistor dimensions are shown in Table 5.2. The simulation results of the SRAM cell and the memory array are explained in section 5.6.

| Transistor Type  | Transistor Name | Width(nm) | Length(nm) |
|------------------|-----------------|-----------|------------|
| Pull-up pMOS     | M3 and M4       | 200       | 48         |
| Pull-down nMOS   | M1 and M2       | 80        | 48         |
| Access nMOS      | M5 and M6       | 200       | 48         |
| Read-assist pMOS | M7 and M8       | 400       | 48         |
| Read-assist nMOS | M9              | 100       | 48         |

Table 5.2: The SRAM cell transistor dimensions

The Static Noise Margin (SNM) is the maximum noise voltage that can be introduced at the internal nodes of the SRAM inverters such that the cell still retains its data. It is an important measure of the stability of the cell. Here, both BL and  $\overline{BL}$  are connected to  $V_{DD}$  and both access transistors are kept active. As described in [107], the Voltage Transfer Characteristic (VTC) and its inverse are plotted. The VTC is plotted by sweeping  $V_Q$  and plotting  $V_Q$  vs  $V_{QB}$ . The inverse of the VTC is plotted as  $V_{QB}$  vs  $V_Q$  while sweeping  $V_{QB}$ . Figure 5.30 shows the final plot. The SNM is the side length of the largest square that can be embedded inside the lobes of the butterfly curve.

The sense amplifier circuit is shown in Figure 5.31. It is capable of operating between 200 mV and 1.2 V for temperatures ranging from -20°C to 85°C.



Figure 5.30: Butterfly plot of the proposed memory cell



Figure 5.31: The schematic of the sense amplifier used in our work



# 5.6 Simulation Results

Figure 5.32: The 8x4 memory array using the 9T SRAM design

The SRAM bitcell is simulated with supply voltages ranging from 250 mV to 600 mV to observe its stability. The circuit operates at a maximum frequency of 3.33 MHz. Figure 5.33 shows the simulation results for a supply of 300 mV and an operating frequency of 1 MHz.

Figure 5.32 shows the complete schematic of the memory array that we developed. An 8x4 memory array was built for test purposes, although the memory is highly scalable. The memory circuit is simulated at frequencies from 1 MHz to 3.33 MHz. Figure 5.34 shows the transient response at room temperature.

Figure 5.33: Transient simulation result of the 9T bitcell

Figure 5.34: Transient simulation result of the memory array

During the read operation, the energy consumed at 3.33 MHz is 0.107 fJ, while at 1 MHz it is 0.168 fJ. The energy consumption during the write operation varies between 0.125 fJ and 0.129 fJ when the operating frequency is varied from 1 MHz to 3.33 MHz. The circuit behaviour worsens when the supply is increased. As our primary concern is low-voltage operation, this can be ignored. However, at higher supply voltages the circuit is capable of running at higher frequencies. Table 5.3 compares different low-power SRAM implementations. Our main focus was to design a low-power SRAM cell exploiting the FDSOI technology. The  $E_{min}$  value shows the significant reduction of energy consumption in our design. At 325 mV and 133 kHz, the energy consumption per cycle is at its lowest value, which is 0.94 pJ.
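To relate the per-operation read energies quoted above to an average power, a small calculation can help. The sketch below assumes that exactly one read takes place in every clock cycle; that assumption, and the helper name `average_power_w`, are made for illustration and are not stated in the text.

```python
FEMTO = 1e-15


def average_power_w(energy_per_op_j, frequency_hz):
    """Average power if one operation of the given energy occurs every cycle."""
    return energy_per_op_j * frequency_hz


# Read energies quoted in the text, keyed by operating frequency in Hz.
read_energy_j = {1.0e6: 0.168 * FEMTO, 3.33e6: 0.107 * FEMTO}

if __name__ == "__main__":
    for f_hz, e_j in sorted(read_energy_j.items()):
        p_nw = average_power_w(e_j, f_hz) * 1e9
        print(f"{f_hz / 1e6:.2f} MHz: {e_j / FEMTO:.3f} fJ per read, {p_nw:.3f} nW average")
```

Under this assumption the energy per read falls with frequency while the average power rises, because the cycle count per second grows faster than the energy per cycle shrinks.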

| Design    | Technology<br>(nm) | Transistor<br>Count | Size   | E <sub>min</sub><br>(pJ) | Frequency           | V <sub>min</sub><br>(mV) | Bitcell<br>per BL |
|-----------|--------------------|---------------------|--------|--------------------------|---------------------|--------------------------|-------------------|
| [89]      | 130                | 6                   | 2 kb   | 0.78                     | 21.5 kHz            | 210                      | 16                |
| [102]     | 65                 | 8                   | 32 kb  | 1                        | 400 kHz             | 200                      | 256               |
| [101]     | 65                 | 8                   | 256 kb | 136                      | 25 kHz              | 350                      | 256               |
| [94]      | 65                 | 9                   | 2 kb   | 0.57                     | 220 kHz<sup>1</sup> | 220                      | 64                |
| [106]     | 65                 | 8                   | 128 kb | 17.6                     | N.A. <sup>2</sup>   | 370                      | 256               |
| [98]      | 65                 | 7                   | 32 kb  | 5.6                      | 1.8 MHz             | 260                      | 256               |
| [91]      | 90                 | 7                   | 64 kb  | N.A. <sup>2</sup>        | 50 MHz              | 440                      | 8                 |
| [99]      | 40                 | 12                  | 4 kb   | 1.91                     | 11.5 MHz<sup>3</sup> | 350                      | 16                |
| [100]     | 65                 | 9                   | 4 kb   | N.A. <sup>2</sup>        | 2 MHz               | 250                      | 256               |
| [103]     | 65                 | 9                   | 16 kb  | 2.07                     | 1.17 MHz            | 260                      | 256               |
| [104]     | 40                 | 9                   | 72 kb  | 0.267                    | 600 kHz             | 325                      | 32                |
| [88]      | 180                | 10                  | 16 kb  | N.A. <sup>2</sup>        | 164 Hz              | 180                      |                   |
| [90]      | 65                 | 10                  | 256 kb | 1.75                     | 400 kHz             | 380                      | 256               |
| [67]      | 130                | 10                  | 480 kb | N.A. <sup>2</sup>        | 120 kHz             | 200                      | 1024              |
| [93]      | 130                | 10                  | 480 kb | 0.235                    | 600 kHz<sup>4</sup> | 160                      | 256               |
| This work | 28                 | 9                   | 32 b   | 1.07e-4                  | 3.3 MHz             | 250                      | 4                 |

Table 5.3: Comparison of SRAM designs

<sup>1</sup> @200 mV input

<sup>2</sup> Data not available

<sup>3</sup> Write frequency = 3 MHz

<sup>4</sup> @400 mV input

# 6 Subthreshold Library

The standard cell is one of the most primitive representatives of an intellectual property (IP) block. A standard cell library is a collection of cells that are designed to work together in a standard cell layout [113]. The principal components of a standard cell library are the basic logic gates. However, other functional blocks are sometimes included in the cell libraries. Even if the logic function of a standard cell is simple, its layout is carefully optimized during design. The motivation behind the layout optimization can be, for example, a reduction of area or power consumption.

# 6.1 Standard Cell Organization

It is important to standardize the standard cell layout, which removes some degrees of freedom. A row in a standard cell layout contains several cells, which share connections such as  $V_{DD}$  and  $V_{SS}$ . In addition, the cells must be designed to be electrically compatible. The compatibility of the cells in a standard cell library is maintained at several levels of abstraction, such as cell area, pin placement, delay for a specific load, circuit topology and power consumption. Standard cell layout systems read the basic library parameters from a database, which allows different libraries to be designed to different specifications, such as cell heights. The cells within each library, however, must be compatible with each other.

### 6.1.1 Physical Design

The physical design of a standard cell is governed by the placement and routing algorithm. Cells with the same height are abutted horizontally, connecting the  $V_{DD}$  and  $V_{SS}$  wires of one cell with those of the adjacent one. The input and output signals remain on the top and bottom of the cell, respectively. These input and output pins must be placed on one of the layers that the place-and-route system uses for the cell connections. The width of the cell may vary. However, it must be chosen such that the pins lie on the grid. In the case of over-the-cell routing, a certain part of the cell is kept free of wires on the layer used for the over-the-cell wires.
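The grid constraint on cell width and pin positions can be sketched as below. The routing pitch of 200 nm, the helper names and the convention that pin offsets are measured from the first routing track are all hypothetical choices for this example; real values come from the technology's place-and-route rules.

```python
def snap_cell_width(min_width_nm, pitch_nm):
    """Smallest multiple of the routing pitch that is >= min_width_nm."""
    if pitch_nm <= 0:
        raise ValueError("pitch must be positive")
    tracks = -(-min_width_nm // pitch_nm)  # ceiling division on integers
    return tracks * pitch_nm


def pins_on_grid(pin_offsets_nm, pitch_nm):
    """True if every pin offset is an integer multiple of the pitch."""
    return all(offset % pitch_nm == 0 for offset in pin_offsets_nm)


if __name__ == "__main__":
    pitch = 200  # hypothetical routing pitch in nm
    print(snap_cell_width(950, pitch))         # cell widened to the next track
    print(pins_on_grid([200, 600], pitch))     # both pins sit on tracks
    print(pins_on_grid([250], pitch))          # off-grid pin
```

Working in integer nanometres avoids floating-point rounding when checking grid alignment.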

### 6.1.2 Logical Design

The logic functions of the cell library are given particular attention during design. They are chosen such that the library covers a sufficient range of functionality. Usually, enough gates are included in a library that functions can be implemented in more than one way.

### 6.1.3 Power Options

A logic function can be implemented in different ways depending on the priorities for power or delay. These different designs should be available in a standard library. By changing the transistor sizes of a gate, it is possible to provide low-power and high-speed cells. It is also possible to provide more sophisticated cell variants, such as cells with sleep transistors. In such a scenario, the compatibility of the gate circuits along a path is ensured by the tool that generates the logic given to the standard cell.

### 6.1.4 Dimensions

Standard cells must be designed with a common cell height and a common power bus width. Some libraries work with single- and double-height cells. For standard cell libraries, a fixed cell height is usually preferred. Cells that do not readily fit that height are optimized to fit.

# 6.2 Design Flow

Figure 6.1 shows the flow diagram of standard-cell-library-based circuit design, which begins with a formal description and finishes with the completion of the physical layout (prior to fabrication). A few observations should be made here. Firstly, the library must contain descriptions of the cells that are required for synthesis, in particular for translation into a netlist of logic primitives.

Next, since the synthesized circuit will be simulated repeatedly, the models of the logic primitives must be simple enough to keep the simulation time low. Essentially, this information comprises the timing and power dissipation parameters of the cells. In our setup, as we are using Cadence tools, the library file is required in Library Exchange Format (LEF). This file is compiled into both a synthesis library and a simulation library, for synthesis and for simulation of the synthesized circuit, respectively.

The LEF file also contains design rules pertinent to the place-and-route process (such as metal and via spacing). Additional routing rules can be included in the LEF file if required. This also speeds up the Place and Route (PNR) process.



Figure 6.1: Standard Cell Based Design Flow

# 6.3 Standard-Cell-Based Development Process



Figure 6.2: Standard Cell Library Development Process Flow

A standard cell library development process involves many steps, from layout to porting to simulation, synthesis, and PNR libraries. The steps are presented as a flowchart in Figure 6.2.

The whole process, as depicted in Figure 6.2, can be subdivided into three subprocesses: schematic design, layout design and porting to different libraries such as synthesis, simulation or PNR. As shown in Figure 6.1, the synthesis library is used during the logic synthesis phase. The simulation library is required in both the design and the synthesis phases, and the PNR library is used during the PNR phase.

# 6.4 Standard Cell Library Designs

Digital designs are synthesized using the standard cell library. Therefore, the quality of the designs depends strongly on the standard cell library being used [114]. The first and foremost criterion for designing a good library is the selection of basic functions. The basic functions can extend from primitive gates to small IP modules frequently used in complex designs.

Next comes the geometry of the cells, which includes both the dimensions and the topology. The relative sizes of the individual transistors of a gate are determined first, as they are needed to find a suitable topology for the cells.

To determine the dimensions of the gates, the primary requirement is to find the optimal pMOS to nMOS width ratio (P/N). Simple heuristics [115], [116] have been used by library designers in the past to find the P/N ratio for each logic gate by simulating a chain of identical gates. The ratio that yields the best average delay is selected as the final dimension. A theoretical framework is provided in [117], which selects the optimal pMOS to nMOS width ratio for minimum delay in a general logic network. There, a methodology for size selection with a fixed topology is developed. The authors of [118] focus primarily on area optimization. A model that computes the timing-optimal P/N ratio is proposed in [119]. This model explicitly uses gate delay models representing the dependence of delay on the pMOS to nMOS ratio and the load.
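The chain-of-identical-gates heuristic can be illustrated with a toy sweep. The delay model below is an assumption made for this sketch and is not taken from [115]-[119]: the load of each gate is taken to be proportional to the total gate width of the next identical gate, the fall time is inversely proportional to the nMOS width, and the rise time is inversely proportional to the pMOS width scaled by a hypothetical `drive_ratio` (nMOS drive per unit width divided by pMOS drive per unit width). In practice the delays would come from circuit simulation.

```python
def average_delay(beta, drive_ratio):
    """Relative average delay for beta = W_p / W_n under the toy model."""
    load = 1.0 + beta                    # gate capacitance of the next stage
    t_fall = load                        # nMOS pulls down, W_n normalised to 1
    t_rise = load * drive_ratio / beta   # pMOS pulls up with weaker drive
    return 0.5 * (t_fall + t_rise)


def best_pn_ratio(drive_ratio, beta_min=0.5, beta_max=20.0, steps=1951):
    """Sweep beta on a uniform grid and return the value with minimum delay."""
    step = (beta_max - beta_min) / (steps - 1)
    betas = [beta_min + i * step for i in range(steps)]
    return min(betas, key=lambda b: average_delay(b, drive_ratio))


if __name__ == "__main__":
    for r in (4.0, 9.0):
        print(f"drive ratio {r:.0f}: best P/N ratio is about {best_pn_ratio(r):.2f}")
```

For this particular model the minimum lies at the square root of the drive ratio, which gives a quick sanity check on the sweep.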

# 6.5 Low Power Libraries

Designing custom libraries is commonplace in microelectronic design. Libraries dedicated to low-power operation, however, have been investigated for the last couple of decades. Some basic requirements, such as low activity of the internal nodes, reduced parasitic capacitances and the capability to operate at very low  $V_{DD}$ , must be satisfied when designing a low-power standard cell library [120]. Piguet et al. proposed another library with memory cells in 2001 [121].

An automated methodology was proposed by Abouzeid et al. [122], enabling the design of ultra-low-voltage digital circuits exclusively using standard EDA tools. The library developed there was optimized in terms of energy and delay at 350 mV. A BCH decoder circuit was designed and synthesized with this library; it operates at a 300 mV supply and 600 kHz, with a dynamic energy consumption 14 times lower than at 1.1 V. A Schmitt-trigger-based standard cell library was proposed by Lotze et al. [123]. They claimed that the effective on-to-off ratio can thereby be considerably improved. An  $8 \times 8$  bit multiplier chip was implemented using this library, which can function at a minimum supply voltage of 62 mV with a power consumption of 17.9 nW at an operating frequency of 5.2 kHz.

The authors of [124] combine the efficiency of standard cell design with the ultra-low power, high-speed performance and variation resilience of full-custom work in a generic design flow suitable for commercially available tools. Differential transmission gates are used in an extended standard cell flow, taking into account variability, speed, energy, and scalability. An ARM Cortex M0 core is realized in a 40 nm CMOS process with the proposed library and is capable of operating in the 330-500 mV, 10-48 MHz range. The energy consumption of the core is below 20 pJ/cycle. There are 241 cells in this custom library. However, the minimum-energy operation does not occur at the lowest supply voltage. At low voltages, the device sizes were increased to match the noise margin.

A digital cell library is presented in [125] to obtain both high energy efficiency and optimized performance. The proposed library has 59 cells and operates in the near-threshold voltage region. The performance is increased by applying an asymmetric gate length scheme to multi-fan-in logic gates. A 1-poly, 7-metal 65 nm technology is used for the development of this library. The test chip developed using this library can operate at 500 mV and 20 MHz.

A few groups have started to explore the design of standard cell libraries dedicated to subthreshold operation. Pons et al. [126] proposed a library for ultra-low-power operation, which was evaluated in a 180 nm planar bulk CMOS technology. A 32-bit processor was developed using the proposed library, which can operate at a supply of 400 mV as well as at 1.0 V.

# 6.6 Library Components

### 6.6.1 Combinational Logic

The entire library is designed with a restriction to gates with a maximum degree of two. A multi-objective optimization has been performed for each cell at the minimum drive strength. To aid the optimization performed by the synthesis tools, each logic function is implemented with different drive strengths. Gates with larger drive strengths are implemented by connecting the corresponding gates with the smallest drive strength in parallel and feeding them with the same inputs. Standard CMOS design rules are used to realize the cells for AOI22, INV, NAND2, NOR2 and OAI22. The buffer (BUF) uses a two-stage implementation, with each stage consisting of parallel-connected inverters. To ease the layout design of the minority-3 gate (MIN3) and the (inverted) 2-to-1 multiplexer (MUXI2), schematic variants consisting of symmetric n-channel Metal Oxide Semiconductor (nMOS) and pMOS parts are chosen. In the case of the MIN3 gate, this variant is also known as the mirrored gate implementation.



Figure 6.3: The schematic and the layout of the NAND gate



Figure 6.4: The schematic and the layout of the NAND gate with 2x strength

As explained earlier, gates of different strengths are implemented as shown in Figures 6.3 and 6.4.

### 6.6.2 Sequential Logic



Figure 6.5: Schematic and layout of the C2MOS flip-flop

Figure 6.6: Schematic and layout of the clock-gating cell

Sequential circuit elements, i.e., flip-flops or memory cells, can be subdivided into static and dynamic elements. Dynamic flip-flops have lower power loss than their static counterparts due to the lack of feedback elements, and smaller delay times (clock-to-output delay). However, in the subthreshold region the stored dynamic charge is very small because of the low supply voltage, so that it can be corrupted or lost due to noise, coupling with other systems, leakage currents, or radiation. In the proposed subthreshold library, C2MOS flip-flops are used because they offer a very good combination of delay times, energy consumption, lower operating voltage limit and critical load. Again, due to its benefits for subthreshold operation, static CMOS technology is used for the flip-flop realization (see Figure 6.5). Each of the two-part C2MOS gates can be regarded as an inverted 2-to-1 multiplexer. By means of two inverters, inverted and non-inverted clock signals (CLKI, CLKN) are generated locally from the input clock signal (CLK) to minimize the load on the clock tree. Additionally, the clock delay inside the flip-flop can easily be optimized with this approach. The output (Q) is buffered by a separate inverter. Therefore, the set-up and hold times of the flip-flop are independent of the output load, and higher drive strengths can be realized by enlarging the inverters only. Overall, the flip-flop requires 26 transistors in the proposed C2MOS implementation.
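The view of the flip-flop as two multiplexer-based stages can be sketched at the behavioural level. The model below is a simplification under stated assumptions: the signal inversions of the C2MOS gates and the output inverter are omitted, the flip-flop is assumed to be positive-edge triggered, and the names `mux2` and `MasterSlaveFlipFlop` are hypothetical. It only illustrates how a master stage that is transparent while the clock is low and a slave stage that is transparent while the clock is high produce edge-triggered behaviour.

```python
def mux2(sel, a, b):
    """2-to-1 multiplexer: returns a when sel is 0 and b when sel is 1."""
    return b if sel else a


class MasterSlaveFlipFlop:
    """Behavioural master-slave flip-flop built from two multiplexers."""

    def __init__(self):
        self.master = 0
        self.slave = 0

    def step(self, clk, d):
        # Master stage: follows D while CLK is low, holds while CLK is high.
        self.master = mux2(clk, d, self.master)
        # Slave stage: holds while CLK is low, follows the master while CLK is high.
        self.slave = mux2(clk, self.slave, self.master)
        return self.slave


if __name__ == "__main__":
    ff = MasterSlaveFlipFlop()
    stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]  # (CLK, D) pairs
    print([ff.step(clk, d) for clk, d in stimulus])
```

In the stimulus, D changes while the clock is high in the third step; the output does not follow, because the master stage is holding at that time.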

### 6.6.3 Clock Tree Elements

Distribution of global clock signals to the sequential elements is necessary in synchronous digital circuits to ensure uniform time intervals. Multi-stage clock tree drivers are necessary because a single clock input can drive only a limited number of the flip-flops present in the circuit. The duty cycle of the clock is maintained at 50%. The clock buffer of this subthreshold library is implemented with a two-stage structure. The output stage of a clock buffer is composed of parallel-connected inverters to implement different drive strengths. Likewise, the input stage is composed of parallel-connected inverters. The number of inverters is chosen such that the ratio of the total transistor width of the output stage to that of the input stage lies between 2 and 3. The width of the pMOS transistors is optimized to match the delay times for rising and falling edges at a supply voltage of 300 mV. To aid synthesis tools during logic and clock tree optimization, 12 clock driver cells with different drive strengths are implemented. In addition, a special clock-gating cell is implemented, as depicted in Figure 6.6. To suppress glitches at the output of the clock gate, a latch-based clock gate circuit is used. This ensures that the output signal can only be activated during the low phase of the clock, so that no faulty clock edges can occur. The dimensioning of the latch is identical to that of the latch cells of the logic library. However, the pMOS transistor widths of the output-side NAND gates and inverters are optimized so that symmetrical rising and falling behaviour can be achieved with the clock drivers.
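The stage-ratio rule for the clock buffers can be illustrated with a short enumeration. The sketch assumes that both stages are built from identical unit inverters, so that the width ratio equals the ratio of inverter counts; the actual cells may use differently sized inverters per stage, and the function name is hypothetical.

```python
def valid_input_stage_counts(n_out, ratio_min=2.0, ratio_max=3.0):
    """Input-stage inverter counts n_in with ratio_min <= n_out / n_in <= ratio_max.

    Assumes identical unit inverters in both stages.
    """
    return [
        n_in
        for n_in in range(1, n_out + 1)
        if ratio_min * n_in <= n_out <= ratio_max * n_in
    ]


if __name__ == "__main__":
    for n_out in (1, 2, 6, 12):
        print(n_out, valid_input_stage_counts(n_out))
```

Under this assumption a single-inverter output stage has no valid input stage, which is one reason the smallest buffers need individually sized transistors rather than pure replication.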

### 6.6.4 Level Shifter

Level shifter cells are needed for the logic level translation of signals between subthreshold and above-threshold domains. For the 65 nm subthreshold library, supply voltages as low as 250 mV and as high as 1.2 V are used, with the temperature maintained at 25 °C. For this purpose, up- and down-level shifters are needed for up- and down-scaling of the logic level. The up-level shifter circuit and its function have already been discussed in [**bib:Lutkemeier2010**], where static current flow is restricted by using a Wilson current mirror. That circuit shows good scaling behaviour of the switching time with increasing supply voltage. The circuit diagram of the down-level shifter, which is also optimized for 65 nm bulk CMOS technology, is shown in Figure 6.7. The additional transistor M6 is added to avoid a direct connection between the supply voltage and the gates connected to M3.



Figure 6.7: Schematic and layout of the down level shifter

### 6.6.5 Place and Route Cells

In addition to functional elements, the subthreshold library contains cells that are needed by the place-and-route tools. These are the so-called decap cells, filler cells, and tie-hi and tie-lo cells.

Decap cells provide large decoupling capacitances, which are used to stabilize the supply voltage. In practice, the gate capacitances of large transistors serve as the capacitance. In contrast, filler cells do not contain any transistors. These cells are used to provide a continuous structure of the n-well, doping regions, substrate and well contacts, and the supply lines (power rails).

Tie-hi and tie-lo cells are used to apply constant logic levels to signal nets without a direct connection to the power and ground nets. Consequently, the use of tie cells increases the robustness of a digital circuit against voltage spikes (e.g. caused by electrostatic discharge (ESD)) on the power and ground nets. The requirements for dimensioning tie cells are as follows. First, the output signal should quickly reach its stable final value after the startup of the circuit. Secondly, the output impedance should be low even after the final voltage level has been reached. Also, the voltage level should be robust against external interference, e.g., due to capacitive coupling from neighbouring nets. To meet these requirements, maximum transistor widths are chosen for the pMOS and nMOS of the tie cells ( $W_{p,max} = 2.3 \,\mu m$ ,  $W_{n,max} = 0.65 \,\mu m$ ).

# 6.7 Subthreshold Design Methodology

Designing standard cells for subthreshold operation involves several challenges, because the circuits become more sensitive to process variations, temperature, and changes in supply voltage. To find the optimal cell dimensions for subthreshold operation, one of the most common approaches has been to focus on the DC transfer characteristics, since they determine the noise margin of the cell. However, when robustness alone is used to optimize the cell dimensions, the results are unsatisfactory, and timing and power consumption have hardly any influence on the optimization. This was taken into account here: along with the noise margin (NM), the energy consumption ( $E_{gate}$ ) and the propagation delay ( $t_{pd}$ ) were considered during optimization. The multi-objective approach has been explained in detail in Section 2.1.

The design parameters of the pMOS and nMOS transistors are varied during the optimization procedure so that an optimal trade-off point among these objectives can be found. As suggested by [127], a fixed gate length of 90 nm for both transistors is beneficial for robust cell behaviour. Therefore, the widths of the nMOS ( $W_n$ ) and pMOS ( $W_p$ ) transistors remain available for the optimization. To account for logic level degradation, the transistor stack is limited to a maximum of two, which results in the lowest possible supply voltage limit [128].
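A trade-off among noise margin, energy and delay over candidate  $(W_n, W_p)$  pairs can be illustrated with a generic Pareto filter. This is a sketch of the general idea only and is not necessarily the procedure of Section 2.1; the candidate values in the usage example are invented placeholders, not simulation results.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    w_n_um: float
    w_p_um: float
    noise_margin_mv: float  # larger is better
    energy_fj: float        # smaller is better
    delay_ns: float         # smaller is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b in every objective and better in one."""
    no_worse = (a.noise_margin_mv >= b.noise_margin_mv
                and a.energy_fj <= b.energy_fj
                and a.delay_ns <= b.delay_ns)
    better = (a.noise_margin_mv > b.noise_margin_mv
              or a.energy_fj < b.energy_fj
              or a.delay_ns < b.delay_ns)
    return no_worse and better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    sweep = [
        Candidate(0.25, 1.0, 80.0, 1.0, 50.0),
        Candidate(0.25, 2.0, 95.0, 1.4, 40.0),
        Candidate(0.25, 1.5, 78.0, 1.2, 55.0),
        Candidate(0.30, 2.0, 90.0, 1.5, 42.0),
    ]
    for c in pareto_front(sweep):
        print(c)
```

The final dimensions would then be picked from the non-dominated set according to the weighting of the objectives.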

Since standard cell design for subthreshold operation typically results in transistors with a much larger gate length and width, an area-optimized standard cell frame is used and implemented as a parametric cell (pcell). This frame allows for larger transistor designs ( $W_{p,max} = 2.3 \,\mu$ m,  $W_{n,max} = 0.65 \,\mu$ m) without the need for transistor splitting. Using this standard cell frame, substrate- and well-tap cells can be connected separately ( $V_{dd,s}, V_{ss,s}$ ) to exploit the benefits of back-gate biasing techniques.

# 6.8 Developed Standard Cell Libraries

### 6.8.1 65 nm CMOS Technology

This work has been published in [129]. The proposed library has been designed using a commercial 65 nm low-power technology, which offers six different transistor types for pMOS and nMOS devices. The gate-oxide thickness can be chosen for either low-power (LP) or general-purpose (GP) applications. Besides the two gate-oxide options, the technology offers three transistor types with distinct threshold voltages (Low V<sub>t</sub> (LVT), Standard V<sub>t</sub> (SVT), High V<sub>t</sub> (HVT)). To find a trade-off between low-power properties and achievable clock frequency, LP transistors with standard threshold voltage (SVT, V<sub>t</sub> = 450 mV) are chosen for the standard cell designs.

The dimensions of each cell along with their Boolean functions are presented in Table 6.1.

| Gate  | Boolean Function                                              | $W_n$ [µm] | $W_p$ [µm] |
|-------|---------------------------------------------------------------|------------|------------|
| AOI22 | $\overline{(A \wedge B) \vee (C \wedge D)}$                   | 0.250      | 2.050      |
| BUF   | A                                                             | 0.265      | 2.050      |
| INV   | $\overline{\mathbf{A}}$                                       | 0.265                     | 2.050                     |
| MIN3  | $\overline{(A \wedge B) \vee (B \wedge C) \vee (C \wedge A)}$ | 0.250                     | 2.050                     |
| MUXI2 | $\overline{(A \wedge \overline{S}) \vee (B \wedge S)}$        | 0.250                     | 2.150                     |
| NAND2 | $\overline{\mathrm{A}\wedge\mathrm{B}}$                       | 0.230                     | 1.710                     |
| NOR2  | $\overline{\mathbf{A} \vee \mathbf{B}}$                       | 0.240                     | 2.150                     |
| OAI22 | $\overline{(A \vee B) \wedge (C \vee D)}$                     | 0.245                     | 2.200                     |

Table 6.1: Dimensions of combinatorial gates in 65 nm CMOS technology [129]
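The width ratios implied by Table 6.1 can be computed directly. The short script below only re-uses the tabulated  $W_n$  and  $W_p$  values; the dictionary and function names are chosen for this example.

```python
# (W_n, W_p) in micrometres, copied from Table 6.1.
WIDTHS_65NM_UM = {
    "AOI22": (0.250, 2.050),
    "BUF": (0.265, 2.050),
    "INV": (0.265, 2.050),
    "MIN3": (0.250, 2.050),
    "MUXI2": (0.250, 2.150),
    "NAND2": (0.230, 1.710),
    "NOR2": (0.240, 2.150),
    "OAI22": (0.245, 2.200),
}


def pn_ratios(widths_um):
    """Return W_p / W_n for every gate in the table."""
    return {gate: w_p / w_n for gate, (w_n, w_p) in widths_um.items()}


if __name__ == "__main__":
    for gate, ratio in sorted(pn_ratios(WIDTHS_65NM_UM).items()):
        print(f"{gate:6s} W_p/W_n = {ratio:.2f}")
```

All ratios lie between roughly 7.4 and 9.0, with NAND2 at the lower end and OAI22 at the upper end.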

### 6.8.2 28 nm FDSOI Technology

The standard cell library development work was further extended to the 28 nm FDSOI technology from STMicroelectronics. This technology offers two transistor variants: Low  $V_t$  (LVT) and Regular  $V_t$  (RVT). RVT transistors have a threshold voltage of around 480 mV. Their advantage is a very low leakage current, although their switching time is slow. The LVT transistors, on the other hand, have higher leakage and a faster switching time. The threshold voltage of the LVT transistors is around 400 mV.

For the optimization, a supply voltage of 300 mV at 25 °C has been chosen [130]. Here as well, the pcell framework was followed, which allows transistor widths up to  $W_{p,max} = 1.88 \,\mu\text{m}$  and  $W_{n,max} = 0.136 \,\mu\text{m}$ .

| Gate  | Boolean<br>Function                                           | W<sub>n</sub><br>[µm] | W<sub>p</sub><br>[µm] |
|-------|---------------------------------------------------------------|-----------------------|-----------------------|
| AOI22 | $\overline{(A \wedge B) \vee (C \wedge D)}$                   | 0.080                 | 0.620                 |
| BUF   | A                                                             | 0.080                 | 0.741                 |
| INV   | $\overline{\mathrm{A}}$                                       | 0.080                         | 0.741                  |
| MIN3  | $\overline{(A \wedge B) \vee (B \wedge C) \vee (C \wedge A)}$ | 0.080                         | 0.717                  |
| MUXI2 | $\overline{(A \wedge \overline{S}) \vee (B \wedge S)}$        | 0.080                         | 0.630                  |
| NAND2 | $\overline{\mathrm{A}\wedge\mathrm{B}}$                       | 0.081                         | 0.878                  |
| NOR2  | $\overline{\mathbf{A} \vee \mathbf{B}}$                       | 0.080                         | 0.860                  |
| OAI22 | $\overline{(A \vee B) \land (C \vee D)}$                      | 0.080                         | 1.041                  |

Table 6.2: Dimensions of combinatorial gates in 28 nm FDSOI technology

Table 6.2 shows the dimensions used for the combinatorial logic cells. Except for AOI22, MIN3, MUXI2 and OAI22, all cells exist in different drive strengths. There are 8 variants of inverters, 10 variants of buffers and 2 variants each of NAND and NOR gates. The combinatorial cells implemented in the RVT and LVT variants have the same dimensions.

For the sequential logic elements, an approach similar to that described in Subsection 6.6.2 is followed. There are four sequential cells in the library: D-latch, D-flipflop, D-flipflop with set and D-flipflop with reset. A pair of inverters is used to generate the inverted clock signals for the flop operations. To meet the area constraint, the length of the transistors is always kept at 48 nm.

Figure 6.8 shows the schematic of the D-latch. The dimensions of the transistors are shown in Table 6.3.



Figure 6.8: D latch

| Transistor | PMOS<br>Width<br>[µm] | NMOS<br>Width<br>[µm] |
|------------|-----------------------|-----------------------|
| CLK1       | 0.902                 | 0.080                 |
| CLK2       | 1.008                 | 0.080                 |
| A          | 0.630                 | 0.080                 |
| B          | 0.750                 | 0.080                 |
| C          | 0.741                 | 0.080                 |
| D          | 0.630                 | 0.080                 |

Table 6.3: Dimensions of D-latch

Table 6.4 contains the transistor dimensions of the D-flipflop circuit shown in Figure 6.9.



Figure 6.9: D Flipflop

| Transistor | PMOS<br>Width<br>[µm] | NMOS<br>Width<br>[µm] |
|------------|-----------------------|-----------------------|
| CLK1       | 0.875                 | 0.080                 |
| CLK2       | 1.045                 | 0.080                 |
| A          | 0.630                 | 0.080                 |
| B          | 0.750                 | 0.080                 |
| C          | 0.630                 | 0.080                 |
| D          | 0.750                 | 0.080                 |
| E          | 0.750                 | 0.080                 |
| F          | 0.630                 | 0.080                 |
| G          | 0.630                 | 0.080                 |

Table 6.4: Dimensions of D-Flipflop



Figure 6.10: D Flipflop with Reset

The same schematic (Figure 6.10) is used for both variants of the D-flipflop, with set and with reset signals. The dimensions of the transistors are shown in Table 6.5.

| Transistor | PMOS<br>Width<br>[µm] | NMOS<br>Width<br>[µm] |
|------------|-----------------------|-----------------------|
| CLK1       | 0.875                 | 0.080                 |
| CLK2       | 0.983                 | 0.080                 |
| A          | 0.630                 | 0.080                 |
| B          | 0.860                 | 0.080                 |
| C          | 0.630                 | 0.080                 |
| D          | 0.860                 | 0.080                 |
| E          | 0.630                 | 0.080                 |
| F          | 0.750                 | 0.080                 |
| G          | 0.630                 | 0.080                 |

Table 6.5: Dimensions of D-Flipflop with Reset

There are 10 clock driver cells of different drive strengths. As mentioned earlier, these cells have two stages, in which one inverter stage drives another. The width of the nMOS is kept at 80 nm. The dimensions of each cell are shown in Table 6.6.

| Cell    | Stage 1 PMOS<br>Width<br>[µm] | Stage 2 PMOS<br>Width<br>[µm] |
|---------|-------------------------------|-------------------------------|
| BUF_X1  | 0.470                         | 0.463                         |
| BUF_X2  | 0.589                         | 0.424                         |
| BUF_X3  | 0.680                         | 0.425                         |
| BUF_X4  | 0.605                         | 0.425                         |
| BUF_X5  | 0.614                         | 0.400                         |
| BUF_X6  | 0.653                         | 0.400                         |
| BUF_X7  | 0.598                         | 0.400                         |
| BUF_X8  | 0.628                         | 0.400                         |
| BUF_X9  | 0.653                         | 0.400                         |
| BUF_X10 | 0.614                         | 0.400                         |

Table 6.6: Dimensions of clock buffers of different strength

To reduce the chip area, the cell layouts were additionally compacted by hand. Figures 6.11 and 6.12 show a few cell layouts from the RVT and LVT libraries, respectively.

#### 6.9 Characterization

As mentioned before, extracting the full functionality of the individual cells is complicated. In addition, functional or delay simulation of the extracted cells takes too long, and the same is true for power extraction. It is also difficult to detect the timing constraints automatically. Characterization addresses these problems by generating a simplified model that contains timing, power and signal integrity information, using only the foundry device models and the extracted netlists. Because characterization yields high-quality models of the standard cell library, the circuit behaviour can be emulated accurately.

To characterize a library, extracted netlists are first generated for the cells that the library contains. This is followed by the specification of important parameters such as the maximum transition time, the PVT corners and so on. The last but not least important task is to select the foundry models. The cells are then simulated with a SPICE-like tool to obtain the required data. The obtained data are fed into the models, thus completing the characterization. In the scope of our work, we have considered three models for library characterization: the Non-linear Delay Model (NLDM), the Composite Current Source (CCS) model and the Effective Current Source Model (ECSM).

(a) Buffer with a strength of 10, (c) D flipflop, (d) MIN3 circuit

Figure 6.11: Few RVT cells from the standard cell library

Figure 6.12: Few LVT cells from the standard cell library

#### 6.9.1 Delay Modelling

Timing attributes of a circuit are essential for characterization. They include both delays and constraints, which are treated in the same fashion for the purpose of characterization. In addition, capacitance influences the delay of the circuit.

#### 6.9.1.1 Non-linear Delay Model (NLDM)

This model characterizes the input-to-output delay and the output transition time with sensitivity to input transition time, output load and side-input states. To obtain these characteristics, a circuit simulator is used with an appropriate stimulus that causes an output transition. The model is based on a constant voltage source. Neither IR drop nor inductive loss is reflected in the delay modelling. NLDM produces reasonably accurate results for technologies at or above 90 nm.
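The dependence of the delay on input transition time and output load can be illustrated with a two-dimensional table lookup. The following is a minimal sketch, assuming that the characterized delays are stored in a table indexed by input slew and output load and that intermediate points are obtained by bilinear interpolation. The function names and all table values are hypothetical and are not taken from the characterized libraries.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Return the indices (i, i + 1) of the axis segment used for x.

    Values outside the axis reuse the first or last segment, which
    turns the interpolation below into a linear extrapolation.
    """
    i = bisect_right(axis, x) - 1
    i = max(0, min(i, len(axis) - 2))
    return i, i + 1


def nldm_lookup(slews, loads, table, slew, load):
    """Bilinear interpolation in a delay table indexed as table[slew][load]."""
    i0, i1 = _bracket(slews, slew)
    j0, j1 = _bracket(loads, load)
    ts = (slew - slews[i0]) / (slews[i1] - slews[i0])
    tl = (load - loads[j0]) / (loads[j1] - loads[j0])
    # Interpolate along the load axis first, then along the slew axis.
    low = table[i0][j0] + tl * (table[i0][j1] - table[i0][j0])
    high = table[i1][j0] + tl * (table[i1][j1] - table[i1][j0])
    return low + ts * (high - low)


# Hypothetical 2x2 table: input slew in ns, output load in fF, delay in ns.
SLEWS = [1.0, 3.0]
LOADS = [1.0, 5.0]
DELAYS = [[10.0, 18.0],
          [14.0, 26.0]]

delay = nldm_lookup(SLEWS, LOADS, DELAYS, slew=2.0, load=3.0)
print(delay)  # 17.0
```

A real library table has more index points per axis; the lookup logic stays the same.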

#### 6.9.1.2 Composite Current Source (CCS)

The CCS technology includes a current-based driver model and a receiver model to provide accurate delay calculation and signal integrity analysis. In Liberate, the ccs argument instructs the tool to characterize composite current source (CCS) delay data, the ccsn argument instructs it to characterize composite current source noise (CCSN) data, and the ccsp argument instructs it to characterize composite current source power (CCSP) data. The ccsp option is required when advanced power constructs are needed. The delay calculator of the static timing analysis engine looks up, interpolates or extrapolates the driver and receiver models in the Liberty file. The CCS driver model captures the output current flowing through the load capacitor. The CCS model therefore forces the characterization engine to have a non-zero capacitance connected to the cell's output.

#### 6.9.1.3 Effective Current Source Model (ECSM)

This model uses characterized measurements of current and voltage (I/V curves) over multiple time intervals, with different combinations of input slew and output load capacitance. The I/V curves are used to create more accurate output driver models, in which the drivers are represented as voltage-controlled current sources. This model is more convenient for characterization. ECSM libraries are readily available from library vendors like Artisan, TSMC, Virage Logic and Virtual Silicon.

#### 6.9.2 Timing Model

The basic linear delay model can be represented as follows:

$$\text{CellDelay} = \text{IntrinsicDelay} + \text{TransitionDelay} + \text{SlopeDelay} \tag{6.1}$$

The intrinsic delay of a cell is defined as the propagation delay of the cell without load, when it is driven by another identical loadless cell. Neither the driving cell nor the driven cell may have any load. The intrinsic delay is measured on the driven cell.

Transition delay is defined as the additional delay of a cell that drives a capacitive load while being driven by another identical loadless cell. Since the cell has to drive a capacitive load, the rise and fall times of its output increase in comparison to the loadless scenario, thus producing the transition delay.

Slope delay is the delay of a loadless cell which is driven by an identical cell with transition delay. The driving cell drives a capacitive load exhibiting the transition delay, and hence the output slope of the driving cell is less steep than the one without a load. The less steep slope causes additional delay for the driven cell being evaluated.

Equation 6.1 is known as the Linear Delay Model because each delay is modelled as linearly proportional to the element it depends on, for example:

$$\text{TransitionDelay} = \text{OutputResistance} \times \text{LoadCapacitance} \tag{6.2}$$

$$\text{SlopeDelay} = \text{SlopeSensitivity} \times \text{TransitionDelayOfInput} \tag{6.3}$$

The output resistance here does not refer to a physical resistance; rather, it refers to a designated linearity factor. is defined as the derivative change in current with respect to the change in voltage applied to a node.
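Equations 6.1 to 6.3 can be combined into a single calculation. The sketch below is only an illustration of the linear delay model; the function name, the parameter names, the units and the numerical values are assumptions chosen for the example and do not come from the characterized cells.

```python
def cell_delay(intrinsic_delay, output_resistance, load_capacitance,
               slope_sensitivity, input_transition_delay):
    """Linear delay model: sum of intrinsic, transition and slope delay."""
    transition_delay = output_resistance * load_capacitance   # Eq. 6.2
    slope_delay = slope_sensitivity * input_transition_delay  # Eq. 6.3
    return intrinsic_delay + transition_delay + slope_delay   # Eq. 6.1


# Hypothetical values: delays in ns, linearity factor in ns/fF, load in fF.
total = cell_delay(
    intrinsic_delay=2.0,
    output_resistance=0.5,
    load_capacitance=4.0,
    slope_sensitivity=0.25,
    input_transition_delay=4.0,
)
print(total)  # 5.0 = 2.0 (intrinsic) + 2.0 (transition) + 1.0 (slope)
```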

Along with the delays, the timing constraints of the cells are required to model sequential cells. These include setup and hold times, recovery and/or removal times, and the permissible range of clock pulse widths.

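Setup time is defined as the minimum amount of time before the clock edge during which an input signal must remain stable so that it can be sampled reliably. Correspondingly, hold time is the minimum amount of time after the clock edge during which the input signal must remain stable.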

Recovery time is defined as the minimum amount of time that must pass between the deactivation of an asynchronous control signal and the next active clock edge. Removal time is defined as the minimum allowable time between the active clock edge, occurring while the asynchronous pin is active, and the deactivating edge of the same asynchronous control pin.

Although these constraints are equally important for modelling, the hold time is given special attention when it is too long.
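The setup and hold constraints together define a window around the active clock edge in which the data input must not change. The following sketch checks a list of data transition times against this window. It is an illustrative helper only; the function name, the time unit and the example numbers are assumptions and do not reflect the constraints characterized for the library cells.

```python
def check_setup_hold(data_change_times, clock_edge, setup_time, hold_time):
    """Return True if no data transition violates the setup/hold window.

    The data input must be stable in the open interval
    (clock_edge - setup_time, clock_edge + hold_time).
    """
    window_start = clock_edge - setup_time
    window_end = clock_edge + hold_time
    return all(not (window_start < t < window_end) for t in data_change_times)


# Hypothetical numbers in ns: clock edge at 100, setup 5, hold 2.
ok = check_setup_hold([90.0, 110.0], clock_edge=100.0,
                      setup_time=5.0, hold_time=2.0)
setup_violation = check_setup_hold([97.0], clock_edge=100.0,
                                   setup_time=5.0, hold_time=2.0)
hold_violation = check_setup_hold([101.0], clock_edge=100.0,
                                  setup_time=5.0, hold_time=2.0)
print(ok, setup_violation, hold_violation)  # True False False
```

A longer hold time widens the window after the clock edge, which is why an excessive hold time restricts how early the next data value may arrive.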

#### 6.9.3 Power Modelling

There are two types of power dissipation in a cell, namely static and dynamic. Static power is the power dissipated when the cell is not switching; it is also known as leakage power. Dynamic power is dissipated whenever there is activity in the cell. Both have been explained in detail in Section 2.2.2.



Figure 6.13: Liberate Tool Flow

A cell is characterized for its static power dissipation in each of its states. The state-dependent values are weighted by the time spent in each state, and the sum is divided by the total simulation time to obtain the average static power. For example, if the output of an inverter is high for 30 ns and low for the remaining 40 ns of a 70 ns simulation, the average static power is calculated as

$$\text{AverageStaticPower} = \frac{30 \times P_{static,\,output=High} + 40 \times P_{static,\,output=Low}}{70} \tag{6.4}$$
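The time-weighted average of Equation 6.4 generalizes to any number of states. The sketch below reproduces the inverter example with the 30 ns and 40 ns durations from the text. The two static power values are hypothetical placeholders, since the example does not state them, and the function name is likewise an assumption.

```python
def average_static_power(state_durations):
    """Time-weighted average over (duration, static_power) pairs."""
    total_time = sum(duration for duration, _ in state_durations)
    weighted_sum = sum(duration * power for duration, power in state_durations)
    return weighted_sum / total_time


# Durations in ns follow the inverter example; the powers (in nW) are
# hypothetical placeholders, not characterized values.
P_STATIC_HIGH = 2.0
P_STATIC_LOW = 1.0
avg = average_static_power([(30.0, P_STATIC_HIGH), (40.0, P_STATIC_LOW)])
print(round(avg, 4))  # 1.4286
```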

Dynamic power is subdivided into two categories: switching power and internal power. Switching power is the power dissipated because of the charging and discharging of the external load. Internal power is the additional power dissipated due to the charging and discharging of the internal capacitances of the cell itself.

The dynamic power profile is generated by varying either the input transition time or the output load while the other is held constant. The cell is therefore characterized once for every combination of input transition time and output load.
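The sweep over input transition times and output loads can be pictured as filling a two-dimensional table, one entry per combination. The sketch below shows only this bookkeeping. The `simulate` callback stands in for the circuit simulation, and `placeholder_energy` is a made-up linear function used purely to make the example runnable; it is not a physical model and does not represent results of this work.

```python
def build_power_table(input_transitions, output_loads, simulate):
    """Characterize one value per (input transition, output load) pair."""
    return [[simulate(t, c) for c in output_loads] for t in input_transitions]


def placeholder_energy(transition_ns, load_ff):
    # Placeholder only: NOT a physical model and not data from this work.
    return 0.1 * transition_ns + 0.5 * load_ff


# Hypothetical sweep points.
TRANSITIONS_NS = [1.0, 2.0, 4.0]
LOADS_FF = [1.0, 2.0]

table = build_power_table(TRANSITIONS_NS, LOADS_FF, placeholder_energy)
for transition, row in zip(TRANSITIONS_NS, table):
    print(transition, row)
```

Each row of the resulting table holds the values for one input transition time while the output load varies, and each column holds one output load while the input transition time varies.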

#### 6.10 Liberate Tool Flow

As mentioned earlier, we have used Liberate from Cadence for the characterization of our standard cell libraries. For the characterization, the tool requires three important inputs: the SPICE netlists, the SPICE models and the constraints essential for the characterization of the cells. Once Liberate completes the process, it produces a single new database (.ldb) file, a Liberty-format .lib file that describes the electrical and timing behaviour of the cells, and a datasheet. The datasheet summarizes the characterization of each cell with respect to the different timing and power constraints. The .ldb file is reused for timing analysis throughout the development phases of chip design.

This whole flow is automated using a Tool Command Language (Tcl) script. This allows cells to be added to the library without significant characterization effort, and the script can be adapted conveniently for other libraries. The script is shared in Appendix 7.2 for reference, together with the characterization result for the NOR\_X1 cell.

#### 6.11 Comparison between the two libraries

The 65 nm library was developed in a bulk CMOS process, whereas the 28 nm library was built on FDSOI technology. These two processes are technologically different. In FDSOI technology, the thin buried oxide layer offers an advantage in terms of leakage. This was indeed one of the main reasons to explore and develop a standard cell library in this process.

Comparing the cells in terms of area, Tables 6.1 and 6.2 clearly show that the area is reduced by 90%. This in turn reduces the full chip area.

The 65 nm library is capable of operating at 250 mV and 250 kHz. For the 28 nm library, the optimized performance of the cells is obtained with a 300 mV supply voltage at 250 kHz. The supply voltage can be reduced further, but only at the cost of larger cell area and more leakage. The leakage of the FDSOI cells is 1000 times lower than that of the CMOS cells. As our intention was to reduce the power consumption as much as possible, we preferred the solution with lower leakage.

## 7 Conclusion and Future Work

For a growing number of digital integrated circuit applications, energy or power consumption is the most critical resource, while performance requirements are moderate. Examples of such applications are RFID transponders, wireless sensor networks or biomedical implants. Sub-threshold operation is capable of meeting these requirements very well. However, the challenge is that such circuits have an increased sensitivity to different disturbing influences of both the manufacturing and the operating parameters. Therefore, special design methods are necessary to achieve a resource-efficient and robust implementation.

This doctoral research work was aimed at achieving two major objectives. The first objective was to design a standard cell library that is optimized for subthreshold operation and is also scalable. It also involved exploring the downscaling of cells from 65 nm bulk CMOS technology. To optimize the circuit performance, several parameters were selected and a multiobjective optimization process was applied.

The other objective was to exploit the FDSOI technology so that the benefit of low leakage can be utilized without forfeiting performance.

### 7.1 Conclusion

The scaling process from 65 nm bulk CMOS to 28 nm FDSOI technology is not linear, so the design space exploration had to start from scratch. The multiobjective optimization was particularly helpful in this regard. To obtain an optimal solution, propagation delay, transition energy, static power dissipation and noise margin were selected as the deciding performance metrics.

As technology nodes shrink, transistor leakage increases. As a result, leakage becomes the primary concern when designing the library cells. This was also the reason why the operating voltage could not be reduced beyond a certain point without losing performance. Special attention was therefore given to combating the leakage power.

However, there are also advantages to working at smaller process nodes. Due to the reduced transistor dimensions, the cell area decreases significantly. Considering the relation between the cost of silicon and the chip area, this is certainly beneficial.

The lower threshold variant of the FDSOI technology helps to operate at a higher circuit frequency, although this comes at the cost of leakage power. With the multiobjective optimization, it was possible to control the leakage in this scenario.

A further benefit of the multiobjective optimization is that it returns a Pareto front, so several sets of values are obtained. As a result, one set can be chosen depending on the requirement, such as high speed or low power.
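The selection from a Pareto front can be illustrated with a small example. The sketch below filters a list of candidate sizings down to the non-dominated ones and then picks one point per requirement. For brevity it uses only two objectives to be minimized (delay and leakage) instead of the four metrics used in this work. All candidate values are hypothetical, and the brute-force filter is a generic illustration rather than the optimization algorithm applied in this thesis.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one.

    All objectives are assumed to be minimized.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates as (delay, leakage) pairs in arbitrary units.
candidates = [(10, 5), (12, 3), (11, 6), (15, 2), (16, 4)]
front = pareto_front(candidates)

fastest = min(front, key=lambda p: p[0])         # high-speed requirement
lowest_leakage = min(front, key=lambda p: p[1])  # low-power requirement
print(front)           # [(10, 5), (12, 3), (15, 2)]
print(fastest)         # (10, 5)
print(lowest_leakage)  # (15, 2)
```

Every point on the front is a valid trade-off; the final choice depends only on which objective the target application weights most.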

During this work, a total of 41 gates have been designed and fully characterized for voltages between 300 mV and 1 V. These comprise 23 combinational and 4 sequential logic elements, as well as 11 clock tree cells and 3 level shifter circuits. Additional manual effort was needed to optimize the cell layouts in terms of area and design rule checks.

Finally, it can be concluded that this work has produced a standard cell library that can be used to synthesize digital circuits. An existing solution in a larger technology node cannot simply be scaled down to 28 nm FDSOI technology. Nevertheless, it was possible to reduce the cell area and leakage compared to the 65 nm library cells. On the other hand, the cells in the 65 nm library could be optimized at 200 mV, whereas 300 mV was required for the 28 nm FDSOI technology.

### 7.2 Future work

Suggestions for future research stemming from this work are outlined next.

1. The backgate biasing needs to be exploited further. FDSOI technology offers both forward and reverse biasing, which can be explored to enhance the cell performance.
2. Additional cells can be included in the library. As subthreshold design targets particular power-efficient applications, more logic cells or flipflops can be included depending on the netlists of such applications.

# **List of Figures**

| Figure | Caption |
|--------|---------|
| 2.1  | Propagation delay and rise/fall times |
| 2.2  | Dynamic energy calculation [12] |
| 2.3  | Voltage transfer characteristics and noise margin [12] |
| 2.4  | Variation of wp from 250 nm to 2500 nm |
| 2.5  | Variation of lp from 30 nm to 300 nm |
| 2.6  | Variation of wn from 80 nm to 800 nm |
| 2.7  | Variation of ln from 30 nm to 300 nm |
| 3.1  | A system with several voltage domains |
| 3.2  | Principal structure of the up level shifter |
| 3.3  | Conventional level shifter circuit with cross-coupled devices [11] |
| 3.4  | Conventional level shifter circuit with current mirror structure [24] |
| 3.5  | (a) shows the level shifter circuit along with the novel (b) RSI circuit proposed in [25] |
| 3.6  | Half-latch based level shifter with current limiters [26] |
| 3.7  | Diode connected level shifter circuit [28] |
| 3.8  | Level shifter as proposed in [30] |
| 3.9  | Circuit-level schematic of the proposed level shifter [31] |
| 3.10 | Level shifter schematic proposed by [32] |
| 3.11 | cross-coupled pMOS based level shifter schematic proposed by [33] |
| 3.12 | cross-coupled pMOS based level shifter schematic proposed by [34] |
| 3.13 | MTCMOS based level shifter circuit proposed by [35] |
| 3.14 | MTCMOS based level shifter circuit proposed by [36] |
| 3.15 | level shifter circuit proposed by [38] |
| 3.16 | Level shifter schematic proposed by [39] |
| 3.17 | level shifter circuit proposed by [40] |
| 3.18 | WCM based level shifter schematic [41] |
| 3.19 | Level shifter schematic with MTCMOS implementation [42] |
| 3.20 | Circuit-level schematic of the proposed level shifter [43] |
| 3.21 | Level shifter schematic proposed by [46] |
| 3.22 | Proposed circuit by [47] for fast and energy-efficient wide-range voltage conversion from near/sub-threshold up to I/O voltage |
| 3.23 | WCM based level shifter schematic proposed by [33] |
| 3.24 | Level shifter schematic proposed by [48] |
| 3.25 | Level shifter schematic proposed by [50] |

| Figure | Caption |
|--------|---------|
| 3.26 | Level shifter schematic proposed by [51] |
| 3.27 | … |
| 3.28 | … consumption per transition and static power dissipation |
| 3.29 | Pareto search with wp1 varying from 200 nm to 800 nm |
| 3.30 | Pareto front obtained after MOP |
| 3.31 | WCM based design exploiting the FDSOI technology |
| 3.32 | Level shifter schematic designed with RVT transistors |
| 3.33 | A hybrid topology based level shifter circuit capable of converting 150 mV to 1.2 V |
| 3.34 | Circuit diagram of down level shifter |
| 4.1  | Transient behaviour of the proposed level shifter using LVT cells |
| 4.2  | Propagation delay simulated in LVT and RVT based circuits across different frequencies |
| 4.3  | Energy per transition simulated in LVT and RVT based circuits across different frequencies |
| 4.4  | Static power dissipation simulated in LVT and RVT based circuits across different frequencies |
| 4.5  | Distribution of the propagation delay |
| 4.6  | Level shifter Layout |
| 4.7  | Transient behaviour of the proposed level shifter using LVT cells |
| 4.8  | Comparison of the performance in terms of propagation delay, energy per transition and static power dissipation across different frequency of operations |
| 4.9  | Monte Carlo simulation representation of the proposed level shifter |
| 4.10 | A graphical representation of the operating range of the level shifter circuits across different technologies in the last couple of decades |
| 4.11 | A graphical representation of the operating frequency of the level shifter circuits along with the propagation delay across different technologies in the last couple of decades |
| 4.12 | Table containing the operating frequency and the propagation delay of the level shifter circuits |
| 4.13 | Table containing the operating frequency and the propagation delay of the level shifter circuits |
| 4.14 | Table containing the operating frequency and the propagation delay of the level shifter circuits |
| 4.15 | Table containing the operating frequency and the propagation delay of the level shifter circuits |
| 5.1  | A graphical representation of the values mentioned in the table 5.1 |
| 5.2  | Schematic of the DBA scheme proposed by [75] |

| Figure | Caption |
|--------|---------|
| 5.3  | Read assist technique proposed by Pilo et al. [76] |
| 5.4  | The replica structure proposed by [78] |
| 5.5  | (a) Power line floating scheme (b) write and read replica circuit proposed in [79] |
| 5.6  | (a) Pulsed WL circuit (b) read modify write scheme proposed in [80] |
| 5.7  | The memory cell proposed by [83] |
| 5.8  | Half-swing pulse-mode AND gate proposed by [85] |
| 5.9  | SRAM cell proposed in [89] |
| 5.10 | Read SNM free 7T cell proposed by Takeda et al. [91] |
| 5.11 | A 10T SRAM cell schematic proposed by [90] |
| 5.12 | Schmitt-triggered based 10T SRAM cell in [93] |
| 5.13 | SRAM proposed by [67] |
| 5.14 | The SRAM circuit proposed by [98] |
| 5.15 | 12T SRAM proposed by [99] |
| 5.16 | SRAM proposed in [100] |
| 5.17 | SRAM design in [103] |
| 5.18 | SRAM design in [105] |
| 5.19 | SRAM proposed by [106] |
| 5.20 | Figure showing a standard 6T SRAM schematic |
| 5.21 | The butterfly graph representing the SNM of a SRAM circuit |
| 5.22 | D-latch design |
| 5.23 | Tristate designs used |
| 5.24 | 7-128 row decoder |
| 5.25 | Read and write decoder with write driver |
| 5.26 | Schematic showing the sense amplifier circuit |
| 5.27 | Control circuitry |
| 5.28 | Diagram of a SRAM architecture |
| 5.29 | Figure showing 9T SRAM circuit proposed in [94] |
| 5.30 | Butterfly plot of the proposed memory cell |
| 5.31 | The schematic of the sense amplifier used in our work |
| 5.32 | The 8x4 memory array using the 9T SRAM design |
| 5.33 | Transient simulation result of the 9T bitcell |
| 5.34 | Transient simulation result of the memory array |
| 6.1  | Standard Cell Based Design Flow |
| 6.2  | Standard Cell Library Development Process Flow |
| 6.3  | The schematic and the layout of the NAND gate |
| 6.4  | The schematic and the layout of the NAND gate with 2x strength |
| 6.5  | Schematic and layout of the C2MOS flip-flop |
| 6.6  | Schematic and layout of the clock-gating cell |
| 6.7  | Schematic and layout of the down level shifter |
| 6.8  | D latch |

| Figure | Caption |
|--------|---------|
| 6.9  | D Flipflop |
| 6.10 | D Flipflop with Reset |
| 6.11 | Few RVT cells from the standard cell library |
| 6.12 | Few LVT cells from the standard cell library |
| 6.13 | Liberate Tool Flow |

# **List of Tables**

| Table | Caption |
|-------|---------|
| 2.1 | Details of the transistor dimensions |
| 3.1 | Working Range of the Transistor Dimensions |
| 3.2 | Transistor size |
| 3.3 | Transistor size |
| 3.4 | Transistor size of down level converters |
| 4.1 | Measurement comparison of LS designs |
| 5.1 | Table containing year wise data of SRAM area, CGP and M1 Pitch width |
| 5.2 | The SRAM cell transistor dimensions |
| 5.3 | Comparison of SRAM designs |
| 6.1 | Dimensions of combinatorial gates in 65 nm CMOS technology [129] |
| 6.2 | Dimensions of combinatorial gates in 28 nm FDSOI technology |
| 6.3 | D-Latch transistor dimensions |
| 6.4 | D-Flipflop transistor dimensions |
| 6.5 | Dimensions of D-Flipflop with Reset |
| 6.6 | Dimensions of clock buffers of different strength |

# Acronyms

| Acronym                        | Meaning                                     |
|--------------------------------|---------------------------------------------|
| $V_{\rm DDH}$                  | high supply voltage.                        |
| $V_{\rm DDL}$                  | low supply voltage.                         |
| ABB                            | Adaptive Body-Bias.                         |
| AVS                            | Adaptive Voltage Scaling.                   |
| BL                             | bitline.                                    |
| CAM                            | Content Addressable Memory.                 |
| CC                             | Cross-Coupled.                              |
| CGP                            | Contacted Gate Pitch.                       |
| CM                             | Current Mirror.                             |
| CMOS                           | Complementary Metal Oxide Semiconductor.    |
| DIBL                           | Drain Induced Barrier Lowering.             |
| DRV                            | Data Retention Voltage.                     |
| DSLS                           | Dual Supply Voltage Level Shifter.          |
| DVFS                           | Dynamic Voltage and Frequency Scaling.      |
| DVS                            | Dynamic Voltage Scaling.                    |
| FDSOI                          | Fully Depleted Silicon-on-Insulator.        |
| GAIO                           | Global Analysis of Invariant Objects.       |
| HB                             | Hybrid.                                     |
| HVT                            | high threshold.                             |
| INWE                           | Inverse Narrow Width Effect.                |

| Acronym | Meaning                                     |
|---------|---------------------------------------------|
| IP      | intellectual property.                      |
| LECC    | Logic Error Correction Circuit.             |
| LEF     | Library Exchange Format.                    |
| LVT     | low threshold.                              |
| M1      | Metal Layer 1.                              |
| MDCVS   | Modified Dual Cascode Voltage Switch.       |
| MOP     | Multiobjective Optimization Problem.        |
| MT      | Multi-threshold.                            |
| MTCMOS  | Multi-threshold CMOS.                       |
| nMOS    | n-channel Metal Oxide Semiconductor.        |
| NSGA    | Non-dominated Sorting Genetic Algorithm.    |
| NSGA-II | Non-dominated Sorting Genetic Algorithm II. |
| OT      | Others.                                     |
| pMOS    | p-channel Metal Oxide Semiconductor.        |
| PNR     | Place and Route.                            |
| RDF     | Random Dopant Fluctuation.                  |
| RSCE    | Reverse Short Channel Effect.               |
| RSI     | Reduced Swing Converter.                    |
| RVT     | regular threshold.                          |
| SNM     | Static Noise Margin.                        |
| SOI     | Silicon-on-Insulator.                       |
| SPEA    | Strength Pareto Evolutionary Algorithm.     |
| SPEA2   | Strength Pareto Evolutionary Algorithm 2.   |
| SRAM    | Static Random Access Memory.                |
| SVT     | standard threshold.                         |
| VTC     | Voltage Transfer Characteristics.           |
| WCM     | Wilson Current Mirror.                      |
| WL      | wordline.                                   |

## Bibliography

- [1] M. Wright. Milestones That Mattered: The planar IC-revolution underestimated. URL: https://www.edn.com/electronics-news/4319121/Milestones-That-Mattered-The-planar-IC-revolution-underestimated (visited on 04/27/2006).
- [2] P. A. Gargini. "Silicon nanoelectronics and beyond". In: *Journal of Nanoparticle Research* 6.1 (2004), pp. 11–26.
- [3] D. Liu and C. Svensson. "Trading speed for low power by choice of supply and threshold voltages". In: *IEEE Journal of Solid-State Circuits* 28.1 (1993), pp. 10–17.
- [4] A. Wang, B. H. Calhoun, and A. P. Chandrakasan. Sub-threshold design for ultra low-power systems. Vol. 95. Springer, 2006.
- [5] S. Aunet and H. K. O. Berge. "Statistical simulations for exploring defect tolerance and power consumption for 4 subthreshold 1-bit addition circuits". In: *International Work-Conference on Artificial Neural Networks*. Springer. 2007, pp. 455–462.
- [6] R. Puri, L. Stok, J. Cohn, D. Kung, D. Pan, D. Sylvester, A. Srivastava, and S. Kulkarni. "Pushing ASIC performance in a power envelope". In: *Proceedings of the 40th annual Design Automation Conference*. ACM. 2003, pp. 788–793.
- [7] M. H. Abu-Rahma and M. Anis. Nanometer Variation-Tolerant SRAM: Circuits and Statistical Design for Yield. Springer, 2013.
- [8] S. A. Vitale, P. W. Wyatt, N. Checka, J. Kedzierski, and C. L. Keast. "FDSOI process technology for subthreshold-operation ultralow-power electronics". In: *Proceedings of the IEEE* 98.2 (2010), pp. 333–342.
- [9] K. Cheng, A. Khakifirooz, P. Kulkarni, S. Ponoth, J. Kuss, D. Shahrjerdi, L. Edge, A. Kimball, S. Kanakasabapathy, K. Xiu, et al. "Extremely thin SOI (ETSOI) CMOS with record low variability for low power system-on-chip applications". In: 2009 IEEE International Electron Devices Meeting (IEDM). IEEE. 2009, pp. 1–4.
- [10] O. Weber, F. Andrieu, J. Mazurier, M. Casse, X. Garros, C. Leroux, F. Martin, P. Perreau, C. Fenouillet-Beranger, S. Barnola, et al. "Work-function engineering in gate first technology for multi-V T dual-gate FDSOI CMOS on UTBOX". In: 2010 International Electron Devices Meeting. IEEE. 2010, pp. 3–4.

- [11] N. H. Weste and D. Harris. CMOS VLSI design: a circuits and systems perspective. Pearson Education India, 2015.
- [12] S. Lütkemeier. "Ressourceneffiziente Digitalschaltungen für den Subschwellbetrieb". PhD thesis. Bielefeld University, 2013.
- [13] M. Dellnitz, O. Schütze, and T. Hestermeyer. "Covering Pareto Sets by Multilevel Subdivision Techniques". In: *Journal of Optimization Theory and Applications* 124.1 (Jan. 2005), pp. 113–136. DOI: 10.1007/s10957-004-6468-7.
- [14] E. Zitzler, M. Laumanns, and L. Thiele. SPEA2: Improving the strength Pareto evolutionary algorithm. Tech. rep. 103. Computer Engineering and Networks Laboratory (TIK), Swiss Federal Institute of Technology (ETH) Zurich, Gloriastrasse 35, CH-8092 Zurich, Switzerland, May 2001.
- [15] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. "A fast and elitist multiobjective genetic algorithm: NSGA-II". In: *IEEE Transactions on Evolutionary Computation* 6.2 (2002), pp. 182–197. DOI: 10.1109/4235.996017.
- [16] N. Srinivas and K. Deb. "Multiobjective optimization using nondominated sorting in genetic algorithms". In: *Evolutionary computation* 2.3 (1994), pp. 221–248.
- [17] E. Zitzler and L. Thiele. "Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach". In: *IEEE Transactions on Evolutionary Computation* 3.4 (1999), pp. 257–271. DOI: 10.1109/4235.797969.
- [18] M. Anis and M. Aburahma. "Leakage current variability in nanometer technologies". In: *Fifth International Workshop on System-on-Chip for Real-Time Applications (IWSOC'05)*. 2005, pp. 60–63. DOI: 10.1109/IWSOC.2005.78.
- [19] D. E. Lackey, P. S. Zuchowski, T. R. Bednar, D. W. Stout, S. W. Gould, and J. M. Cohn. "Managing power and performance for system-on-chip designs using voltage islands". In: *Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design*. 2002, pp. 195–202.
- [20] J. Hu, Y. Shin, N. Dhanwada, and R. Marculescu. "Architecting voltage islands in core-based system-on-a-chip designs". In: *Proceedings of the 2004 international symposium on Low power electronics and design*. 2004, pp. 180–185.
- [21] R. Puri, D. Kung, and L. Stok. "Minimizing power with flexible voltage islands". In: 2005 IEEE International Symposium on Circuits and Systems. IEEE. 2005, pp. 21–24.

- [22] L.-F. Leung and C.-Y. Tsui. "Energy-aware synthesis of networks-on-chip implemented with voltage islands". In: *Proceedings of the 44th annual Design Automation Conference*. 2007, pp. 128–131.
- [23] L. Stok, R. Puri, S. Bhattacharya, J. Cohn, D. Sylvester, A. Srivastava, and S. Kulkarni. "Pushing ASIC performance in a power envelope". In: *Closing the Power Gap Between ASIC and Custom: Tools and Techniques for Low Power Design* (2007), pp. 323–356. DOI: 10.1007/978-0-387-68953-1\_13.
- [24] J. Doutreloigne, H. De Smet, J. Van den Steen, and G. Van Doorselaer. "Low-power high-voltage CMOS level-shifters for liquid crystal display drivers". In: *ICM'99. Proceedings. Eleventh International Conference on Microelectronics (IEEE Cat. No.99EX388)*. 1999, pp. 213–216. DOI: 10.1109/ICM.2000.884843.
- [25] I. J. Chang, J.-J. Kim, and K. Roy. "Robust level converter design for subthreshold logic". In: *Proceedings of the 2006 international symposium on Low power electronics and design - ISLPED '06* (2006), p. 14. DOI: 10.1145/1165573.1165579.
- [26] T.-H. Chen, J. Chen, and L. T. Clark. "Subthreshold to Above Threshold Level Shifter Design". In: *Journal of Low Power Electronics* 2.2 (Aug. 2006), pp. 251–258. DOI: 10.1166/jolpe.2006.071.
- [27] A. Chavan and E. MacDonald. "Ultra Low Voltage Level Shifters to Interface Sub and Super Threshold Reconfigurable Logic Cells". In: *IEEE Aerospace Conference Proceedings* 79968 (2008), pp. 1–6. DOI: 10.1109/AERO.2008.4526473.
- [28] Hui Shao and Chi-Ying Tsui. "A robust, input voltage adaptive and low energy consumption level converter for sub-threshold logic". In: ESSCIRC 2007 - 33rd European Solid-State Circuits Conference. IEEE, 2007, pp. 312–315. DOI: 10.1109/ESSCIRC.2007.4430306.
- [29] Y.-S. Lin and D. M. Sylvester. "Single stage static level shifter design for subthreshold to I/O voltage conversion". In: *Proceeding of the thirteenth international symposium on Low power electronics and design - ISLPED '08.* ACM Press, 2008, p. 197. DOI: 10.1145/1393921.1393973.
- [30] A. Hasanbegovic and S. Aunet. "Low-power subthreshold to above threshold level shifter in 90 nm process". In: 2009 NORCHIP. IEEE, 2009, pp. 1–4.
- [31] S. N. Wooters, B. H. Calhoun, and T. N. Blalock. "An Energy-Efficient Subthreshold Level Converter in 130-nm CMOS". In: *IEEE Transactions on Circuits and Systems II: Express Briefs* 57.4 (Apr. 2010), pp. 290–294. DOI: 10.1109/TCSII.2010.2043471.

- [32] W. Zhao, A. B. Alvarez, and Y. Ha. "A 65-nm 25.1-ns 30.7-fJ Robust Subthreshold Level Shifter With Wide Conversion Range". In: *IEEE Transactions on Circuits and Systems II: Express Briefs* 62.7 (July 2015), pp. 671–675. DOI: 10.1109/TCSII.2015.2406354.
- [33] J. Zhou, C. Wang, X. Liu, and M. Je. "Fast and energy-efficient low-voltage level shifters". In: *Microelectronics Journal* 46.1 (Jan. 2015), pp. 75–80. DOI: 10.1016/j.mejo.2014.10.009.
- [34] B. Mohammadi and J. N. Rodrigues. "A 65 nm single stage 28 fJ/cycle 0.12 to 1.2V level-shifter". In: 2014 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2014, pp. 990–993. DOI: 10.1109/ISCAS.2014.6865304.
- [35] M. Lanuzza, P. Corsonello, and S. Perri. "Low-Power Level Shifter for Multi-Supply Voltage Designs". In: *IEEE Transactions on Circuits and Systems II: Express Briefs* 59.12 (Dec. 2012), pp. 922–926. DOI: 10.1109/TCSII.2012.2231037.
- [36] M. Lanuzza, P. Corsonello, and S. Perri. "Fast and Wide Range Voltage Conversion in Multisupply Voltage Designs". In: *IEEE Transactions on Very Large Scale Integration (VLSI) Systems* 23.2 (Feb. 2015), pp. 388–391. DOI: 10.1109/TVLSI.2014.2308400.
- [37] Y. Kim, D. Sylvester, and D. Blaauw. "LC<sup>2</sup>: Limited Contention Level Converter for Robust Wide-Range Voltage Conversion". In: 2011 Symposium on VLSI Circuits - Digest of Technical Papers (2011), pp. 2010–2011. ISSN: 2158-5601.
- [38] Y. Kim, Y. Lee, D. Sylvester, and D. Blaauw. "SLC: Split-control Level Converter for dense and stable wide-range voltage conversion". In: *European Solid-State Circuits Conference* (Sept. 2012), pp. 478–481. DOI: 10.1109/ESSCIRC.2012.6341359.
- [39] S. R. Hosseini, M. Saberi, and R. Lotfi. "A Low-Power Subthreshold to Above-Threshold Voltage Level Shifter". In: *IEEE Transactions on Circuits and Systems II: Express Briefs* 61.10 (Oct. 2014), pp. 753–757. DOI: 10.1109/TCSII.2014.2345295.
- [40] M. Lanuzza, F. Crupi, S. Rao, R. De Rose, and G. Iannaccone. "Low energy/delay overhead level shifter for wide-range voltage conversion". In: *International Journal of Circuit Theory and Applications* 38.7 (2016), pp. 689–708. URL: http://doi.wiley.com/10.1002/cta.2294.
- [41] S. Lütkemeier and U. Rückert. "A subthreshold to above-threshold level shifter comprising a Wilson current mirror". In: *IEEE Transactions on Circuits and Systems II: Express Briefs* 57.9 (2010), pp. 721–724. DOI: 10.1109/TCSII.2010.2056110.

- [42] Y. Cao, W. Ye, X. Zhao, and P. Deng. "An energy-efficient subthreshold level shifter with a wide input voltage range". In: 2016 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2016, pp. 726–729. DOI: 10.1109/ISCAS.2016.7527343.
- [43] Y. Osaki, T. Hirose, N. Kuroki, and M. Numa. "A level shifter circuit design by using input/output voltage monitoring technique for ultra-low voltage digital CMOS LSIs". In: 2011 IEEE 9th International New Circuits and Systems Conference, NEWCAS 2011 (2011), pp. 201–204. DOI: 10.1109/NEWCAS.2011.5981290.
- [44] Y. Osaki, T. Hirose, N. Kuroki, and M. Numa. "A level shifter with logic error correction circuit for extremely low-voltage digital CMOS LSIs". In: 2011 Proceedings of the ESSCIRC (ESSCIRC). IEEE, 2011, pp. 199–202. DOI: 10.1109/ESSCIRC.2011.6044899.
- [45] Y. Osaki, T. Hirose, N. Kuroki, and M. Numa. "A low-power level shifter with logic error correction for extremely low-voltage digital CMOS LSIs". In: *IEEE Journal of Solid-State Circuits* 47.7 (2012), pp. 1776–1783. DOI: 10.1109/JSSC.2012.2191320.
- [46] J. Zhou, C. Wang, X. Liu, X. Zhang, and M. Je. "A fast and energy-efficient level shifter with wide shifting range from sub-threshold up to I/O voltage". In: *Proceedings of the 2013 IEEE Asian Solid-State Circuits Conference, A-SSCC 2013* (2013), pp. 137–140. DOI: 10.1109/ASSCC.2013.6691001.
- [47] J. Zhou, C. Wang, X. Liu, X. Zhang, and M. Je. "An Ultra-Low Voltage Level Shifter Using Revised Wilson Current Mirror for Fast and Energy-Efficient Wide-Range Voltage Conversion from Sub-Threshold to I/O Voltage". In: *IEEE Transactions on Circuits and Systems I: Regular Papers* 62.3 (Mar. 2015), pp. 697–706. DOI: 10.1109/TCSI.2014.2380691.
- [48] S. R. Hosseini, M. Saberi, and R. Lotfi. "An energy-efficient level shifter for low-power applications". In: 2015 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 2015, pp. 2241–2244. DOI: 10.1109/ISCAS.2015.7169128.
- [49] S. R. Hosseini, M. Saberi, and R. Lotfi. "A High-Speed and Power-Efficient Voltage Level Shifter for Dual-Supply Applications". In: *IEEE Transactions on Very Large Scale Integration (VLSI) Systems* 2 (2016), pp. 1–5. DOI: 10.1109/TVLSI.2016.2604377.
- [50] R. Matsuzuka, T. Hirose, Y. Shizuku, N. Kuroki, and M. Numa. "A 0.19-V minimum input low energy level shifter for extremely low-voltage VLSIs". In: 2015 IEEE International Symposium on Circuits and Systems (ISCAS). Vol. 2015-July. IEEE, 2015, pp. 2948–2951. DOI: 10.1109/ISCAS.2015.7169305.

- [51] L. Wen, X. Cheng, S. Tian, H. Wen, and X. Zeng. "Subthreshold Level Shifter With Self-Controlled Current Limiter by Detecting Output Error". In: *IEEE Transactions on Circuits and Systems II: Express Briefs* 63.4 (Apr. 2016), pp. 346–350. DOI: 10.1109/TCSII.2015.2504025.
- [52] S.-c. Luo, C.-j. Huang, and Y.-h. Chu. "A Wide-Range Level Shifter Using a Modified Wilson Current Mirror Hybrid Buffer". In: *IEEE Transactions on Circuits and Systems I: Regular Papers* 61.6 (June 2014), pp. 1656–1665. DOI: 10.1109/TCSI.2013.2295015.
- [53] N. Maroof, M. Sohail, and H. Shin. "An energy efficient sub-threshold to above-threshold level shifter using a modified Wilson current mirror". In: *International Journal of Electronics* 7217.October (2015), pp. 1–12. DOI: 10.1080/00207217.2015.1092596.
- [54] E. Maghsoudloo, M. Rezaei, M. Sawan, and B. Gosselin. "A power-efficient wide-range signal level-shifter". In: 2015 IEEE 13th International New Circuits and Systems Conference (NEWCAS). IEEE, 2015, pp. 1–4. DOI: 10.1109/NEWCAS.2015.7182025.
- [55] S. Chatterjee and U. Rueckert. "Resource Efficient Sub-VT Level Shifter Circuit Design Using a Hybrid Topology in 28 nm". In: SMACD / PRIME 2021; International Conference on SMACD and 16th Conference on PRIME. 2021, pp. 1–4.
- [56] A. Hasanbegovic and S. Aunet. "Low-power subthreshold to above threshold level shifters in 90 nm and 65 nm process". In: *Microprocessors and Microsystems* 35.1 (2011), pp. 1–9. DOI: 10.1016/j.micpro.2010.11.003.
- [57] I. J. Chang, J. J. Kim, K. Kim, and K. Roy. "Robust level converter for subthreshold/super-threshold operation:100 mV to 2.5 V". In: *IEEE Transactions on Very Large Scale Integration (VLSI) Systems* 19.8 (2011), pp. 1429–1437. DOI: 10.1109/TVLSI.2010.2051240.
- [58] M. W. Chen, M. H. Chang, Y. H. Chu, and W. Hwang. "An energy-efficient level converter with high thermal variation immunity for sub-threshold to super-threshold operation". In: *International System on Chip Conference* 1 (2012), pp. 5–10. DOI: 10.1109/SOCC.2012.6398368.
- [59] Y. Huang, A. Shrivastava, and B. H. Calhoun. "A 145mV to 1.2V single ended level converter circuit for ultra-low power low voltage ICs". In: 2015 IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S). Vol. 6. 3. IEEE, 2015, pp. 1–3. DOI: 10.1109/S3S.2015.7333489.

- [60] J. C. García, J. A. Montiel–Nelson, J. Sosa, and S. Nooshabadi. "High Performance Single Supply CMOS Inverter Level up Shifter for Multi–Supply Voltages Domains". In: *Design, Automation & Test in Europe Conference & Exhibition (DATE), 2015.* IEEE Conference Publications, 2015, pp. 1273–1276. DOI: 10.7873/DATE.2015.0033.
- [61] Y. Ho, S.-y. Hsu, and C.-y. Lee. "A Variation-Tolerant Subthreshold to Superthreshold Level Shifter for Heterogeneous Interfaces". In: *IEEE Transactions on Circuits and Systems II: Express Briefs* 63.2 (Feb. 2016), pp. 161–165. DOI: 10.1109/TCSII.2015.2415714.
- [62] M. Lanuzza, F. Crupi, S. Rao, R. De Rose, S. Strangio, and G. Iannaccone. "An Ultralow-Voltage Energy-Efficient Level Shifter". In: *IEEE Transactions on Circuits and Systems II: Express Briefs* 64.1 (Jan. 2017), pp. 61–65. URL: http://ieeexplore.ieee.org/document/7426416/.
- [63] E. Maghsoudloo, M. Rezaei, M. Sawan, and B. Gosselin. "A High-Speed and Ultra Low-Power Subthreshold Signal Level Shifter". In: *IEEE Transactions on Circuits and Systems I: Regular Papers* 64.5 (May 2017), pp. 1164–1172. URL: http://ieeexplore.ieee.org/document/7792122/.
- [64] Z. Yong, X. Xiang, C. Chen, and J. Meng. "An Energy-Efficient and Wide-Range Voltage Level Shifter With Dual Current Mirror". In: *IEEE Transactions on Very Large Scale Integration (VLSI) Systems* 25.12 (Dec. 2017), pp. 3534–3538. URL: http://ieeexplore.ieee.org/document/8031975/.
- [65] H.-S. P. Wong, C.-S. Lee, J. Luo, and C.-H. Wang. CMOS Technology Scaling Trend. URL: https://nano.stanford.edu/cmos-technology-scaling-trend (visited on 07/15/2019).
- [66] R. Islam, A. Brand, and D. Lippincott. "Low power SRAM techniques for handheld products". In: ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005. 2005, pp. 198–202.
- [67] T.-H. Kim, J. Liu, J. Keane, and C. H. Kim. "A high-density subthreshold SRAM with data-independent bitline leakage and virtual ground replica scheme". In: 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. IEEE. 2007, pp. 330–606.
- [68] N. S. Kim, K. Flautner, D. Blaauw, and T. Mudge. "Circuit and microarchitectural techniques for reducing cache leakage power". In: *IEEE Transactions on Very Large Scale Integration (VLSI) Systems* 12.2 (2004), pp. 167–184.

- [69] H. Qin, Y. Cao, D. Markovic, A. Vladimirescu, and J. Rabaey. "SRAM leakage suppression by minimizing standby supply voltage". In: *International Symposium on Signals, Circuits and Systems. Proceedings, SCS 2003. (Cat. No. 03EX720)*. IEEE. 2004, pp. 55–60.
- [70] K. Kanda, T. Miyazaki, M. K. Sik, H. Kawaguchi, and T. Sakurai. "Two orders of magnitude leakage power reduction of low voltage SRAMs by row-by-row dynamic V<sub>dd</sub> control (RRDV) scheme". In: 15th annual IEEE international ASIC/SOC conference. IEEE. 2002, pp. 381–385.
- [71] S. Mukhopadhyay, R. M. Rao, J.-J. Kim, and C.-T. Chuang. "SRAM writeability improvement with transient negative bit-line voltage". In: *IEEE Transactions on Very Large Scale Integration (VLSI) Systems* 19.1 (2009), pp. 24–32.
- [72] K. Zhang, U. Bhattacharya, Z. Chen, F. Hamzaoglu, D. Murray, N. Vallepalli, Y. Wang, B. Zheng, and M. Bohr. "A 3-GHz 70MB SRAM in 65nm CMOS technology with integrated column-based dynamic power supply". In: *ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference*, 2005. IEEE. 2005, pp. 474–611.
- [73] H. Yamauchi, T. Iwata, H. Akamatsu, and A. Matsuzawa. "A 0.8 V/100 MHz/sub-5 mW-operated mega-bit SRAM cell architecture with charge-recycle offset-source driving (OSD) scheme". In: 1996 Symposium on VLSI Circuits. Digest of Technical Papers. IEEE. 1996, pp. 126–127.
- [74] H. Mizuno and T. Nagano. "Driving source-line cell architecture for sub-1-V high-speed low-power applications". In: *IEICE transactions on electronics* 79.7 (1996), pp. 963–968.
- [75] M. Yamaoka, K. Osada, and K. Ishibashi. "0.4-V logic-library-friendly SRAM array using rectangular-diffusion cell and delta-boosted-array voltage scheme". In: *IEEE Journal of Solid-State Circuits* 39.6 (2004), pp. 934–940.
- [76] H. Pilo, J. Barwin, G. Braceras, C. Browning, S. Burns, J. Gabric, S. Lamphier, M. Miller, A. Roberts, and F. Towler. "An SRAM design in 65nm and 45nm technology nodes featuring read and write-assist circuits to expand operating voltage". In: 2006 Symposium on VLSI Circuits, 2006. Digest of Technical Papers. IEEE. 2006, pp. 15–16.
- [77] B.-D. Yang and L.-S. Kim. "A low-power SRAM using hierarchical bit line and local sense amplifiers". In: *IEEE journal of solid-state circuits* 40.6 (2005), pp. 1366–1376.
- [78] B. S. Amrutur and M. A. Horowitz. "A replica technique for wordline and sense control in low-power SRAM's". In: *IEEE Journal of solid-state circuits* 33.8 (1998), pp. 1208–1219.

- [79] M. Yamaoka, N. Maeda, Y. Shinozaki, Y. Shimazaki, K. Nii, S. Shimada, K. Yanagisawa, and T. Kawahara. "90-nm process-variation adaptive embedded SRAM modules with power-line-floating write technique". In: *IEEE Journal of Solid-State Circuits* 41.3 (2006), pp. 705–711.
- [80] M. Khellah, Y. Ye, N. Kim, D. Somasekhar, G. Pandya, A. Farhang, K. Zhang, C. Webb, and V. De. "Wordline & bitline pulsing schemes for improving SRAM cell stability in low-Vcc 65nm CMOS designs". In: 2006 Symposium on VLSI Circuits, 2006. Digest of Technical Papers. IEEE. 2006, pp. 9–10.
- [81] K. Kanda, H. Sadaaki, and T. Sakurai. "90% write power-saving SRAM using sense-amplifying memory cell". In: *IEEE journal of solid-state circuits* 39.6 (2004), pp. 927–933.
- [82] K. Agawa, H. Hara, T. Takayanagi, and T. Kuroda. "A bitline leakage compensation scheme for low-voltage SRAMs". In: *IEEE Journal of Solid-State Circuits* 36.5 (2001), pp. 726–734.
- [83] A. Alvandpour, D. Somasekhar, R. Krishnamurthy, V. De, S. Borkar, and C. Svensson. "Bitline leakage equalization for sub-100nm caches". In: ESSCIRC 2004-29th European Solid-State Circuits Conference (IEEE Cat. No. 03EX705). IEEE. 2003, pp. 401–404.
- [84] F. Ramezankhani. "Designing faster CMOS sub-threshold circuits utilizing channel length manipulation". PhD thesis. Carleton University, 2012.
- [85] K. W. Mai, T. Mori, B. S. Amrutur, R. Ho, B. Wilburn, M. A. Horowitz, I. Fukushi, T. Izawa, and S. Mitarai. "Low-power SRAM design using half-swing pulse-mode techniques". In: *IEEE Journal of Solid-State Circuits* 33.11 (1998), pp. 1659–1671.
- [86] M. Yamaoka, N. Maeda, Y. Shinozaki, Y. Shimazaki, K. Nii, S. Shimada, K. Yanagisawa, and T. Kawahara. "Low-power embedded SRAM modules with expanded margins for writing". In: *ISSCC. 2005 IEEE International Digest of Technical Papers. Solid-State Circuits Conference*, 2005. IEEE. 2005, pp. 480–611.
- [87] R. Saeidi, M. Sharifkhani, and K. Hajsadeghi. "Statistical analysis of read static noise margin for near/sub-threshold SRAM cell". In: *IEEE Transactions on Circuits and Systems I: Regular Papers* 61.12 (2014), pp. 3386–3393.
- [88] A. Wang and A. Chandrakasan. "A 180-mV subthreshold FFT processor using a minimum energy design methodology". In: *IEEE Journal of solid-state circuits* 40.1 (2005), pp. 310–319.
- [89] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester. "A variation-tolerant sub-200 mV 6-T subthreshold SRAM". In: *IEEE Journal of Solid-State Circuits* 43.10 (2008), pp. 2338–2348.

- [90] B. H. Calhoun and A. P. Chandrakasan. "A 256-kb 65-nm sub-threshold SRAM design for ultra-low-voltage operation". In: *IEEE journal of solid-state circuits* 42.3 (2007), pp. 680–688.
- [91] K. Takeda, Y. Hagihara, Y. Aimoto, M. Nomura, Y. Nakazawa, T. Ishii, and H. Kobatake. "A read-static-noise-margin-free SRAM cell for low-VDD and high-speed applications". In: *IEEE journal of solid-state circuits* 41.1 (2005), pp. 113–121.
- [92] B. H. Calhoun and A. Chandrakasan. "A 256kb sub-threshold SRAM in 65nm CMOS". In: 2006 IEEE International Solid State Circuits Conference-Digest of Technical Papers. IEEE. 2006, pp. 2592–2601.
- [93] J. P. Kulkarni, K. Kim, and K. Roy. "A 160 mV robust Schmitt trigger based subthreshold SRAM". In: *IEEE Journal of Solid-State Circuits* 42.10 (2007), pp. 2303–2313.
- [94] S. Lütkemeier, T. Jungeblut, H. K. O. Berge, S. Aunet, M. Porrmann, and U. Rückert. "A 65 nm 32 b subthreshold processor with 9T multi-Vt SRAM and adaptive supply voltage control". In: *IEEE Journal of Solid-State Circuits* 48.1 (2012), pp. 8–19.
- [95] I. J. Chang, J.-J. Kim, S. P. Park, and K. Roy. "A 32 kb 10T sub-threshold SRAM array with bit-interleaving and differential read scheme in 90 nm CMOS". In: *IEEE Journal of Solid-State Circuits* 44.2 (2009), pp. 650–658.
- [96] J. Myers, A. Savanth, R. Gaddh, D. Howard, P. Prabhat, and D. Flynn. "A subthreshold ARM cortex-M0+ subsystem in 65 nm CMOS for WSN applications with 14 power domains, 10T SRAM, and integrated voltage regulator". In: *IEEE Journal of Solid-State Circuits* 51.1 (2015), pp. 31–44.
- [97] J. Myers, A. Savanth, D. Howard, R. Gaddh, P. Prabhat, and D. Flynn. "8.1 An 80nW retention 11.7 pJ/cycle active subthreshold ARM Cortex-M0+ subsystem in 65nm CMOS for WSN applications". In: 2015 IEEE International Solid-State Circuits Conference-(ISSCC) Digest of Technical Papers. IEEE. 2015, pp. 1–3.
- [98] M.-F. Chang, M.-P. Chen, L.-F. Chen, S.-M. Yang, Y.-J. Kuo, J.-J. Wu, H.-Y. Su, Y.-H. Chu, W.-C. Wu, T.-Y. Yang, et al. "A Sub-0.3 V Area-Efficient L-Shaped 7T SRAM With Read Bitline Swing Expansion Schemes Based on Boosted Read-Bitline, Asymmetric-V<sub>TH</sub> Read-Port, and Offset Cell VDD Biasing Techniques". In: *IEEE Journal of Solid-State Circuits* 48.10 (2013), pp. 2558–2569.
- [99] Y.-W. Chiu, Y.-H. Hu, M.-H. Tu, J.-K. Zhao, Y.-H. Chu, S.-J. Jou, and C.-T. Chuang. "40 nm bit-interleaving 12T subthreshold SRAM with data-aware write-assist". In: *IEEE Transactions on Circuits and Systems I: Regular Papers* 61.9 (2014), pp. 2578–2585.

- [100] J.-J. Wu, Y.-H. Chen, M.-F. Chang, P.-W. Chou, C.-Y. Chen, H.-J. Liao, M.-B. Chen, Y.-H. Chu, W.-C. Wu, and H. Yamauchi. "A Large $\sigma V_{TH}/V_{DD}$ Tolerant Zigzag 8T SRAM With Area-Efficient Decoupled Differential Sensing and Fast Write-Back Scheme". In: *IEEE Journal of Solid-State Circuits* 46.4 (2011), pp. 815–827.
- [101] N. Verma and A. P. Chandrakasan. "A 256 kb 65 nm 8T subthreshold SRAM employing sense-amplifier redundancy". In: *IEEE Journal of Solid-State Circuits* 43.1 (2008), pp. 141–149.
- [102] A. T. Do, Z. C. Lee, B. Wang, I.-J. Chang, X. Liu, and T. T.-H. Kim. "0.2 V 8T SRAM with PVT-aware bitline sensing and column-based data randomization". In: *IEEE Journal of Solid-State Circuits* 51.6 (2016), pp. 1487–1498.
- [103] B. Wang, T. Q. Nguyen, A. T. Do, J. Zhou, M. Je, and T. T. Kim. "A 0.2 V 16Kb 9T SRAM with bitline leakage equalization and CAM-assisted write performance boosting for improving energy efficiency". In: 2012 IEEE Asian Solid State Circuits Conference (A-SSCC). IEEE. 2012, pp. 73–76.
- [104] C.-Y. Lu, C.-T. Chuang, S.-J. Jou, M.-H. Tu, Y.-P. Wu, C.-P. Huang, P.-S. Kan, H.-S. Huang, K.-D. Lee, and Y.-S. Kao. "A 0.325 V, 600-kHz, 40-nm 72-kb 9T subthreshold SRAM with aligned boosted write wordline and negative write bitline write-assist". In: *IEEE Transactions on Very Large Scale Integration (VLSI) Systems* 23.5 (2014), pp. 958–962.
- [105] J.-S. Wang, P.-Y. Chang, T.-S. Tang, J.-W. Chen, and J.-I. Guo. "Design of subthreshold SRAMs for energy-efficient quality-scalable video applications". In: *IEEE Journal on Emerging and Selected Topics in Circuits and Systems* 1.2 (2011), pp. 183–192.
- [106] M. E. Sinangil and A. P. Chandrakasan. "Application-Specific SRAM Design Using Output Prediction to Reduce Bit-Line Switching Activity and Statistically Gated Sense Amplifiers for Up to 1.9 x Lower Energy/Access". In: *IEEE Journal of Solid-State Circuits* 49.1 (2013), pp. 107–117.
- [107] E. Seevinck, F. J. List, and J. Lohstroh. "Static-noise margin analysis of MOS SRAM cells". In: *IEEE Journal of solid-state circuits* 22.5 (1987), pp. 748–754.
- [108] A. J. Bhavnagarwala, S. Kosonocky, and J. D. Meindl. "Interconnect-centric array architectures for minimum SRAM access time". In: *Proceedings 2001 IEEE International Conference on Computer Design: VLSI in Computers and Processors. ICCD 2001*. IEEE. 2001, pp. 400–405.
- [109] B. S. Amrutur and M. A. Horowitz. "Speed and Power Scaling of SRAM's". In: *IEEE journal of solid-state circuits* 35.2 (2000), pp. 175–185.
- [110] J. M. Rabaey, A. Chandrakasan, and B. Nikolić. *Digital integrated circuits: a design perspective*. 2003.

- [111] S. Schuster, B. Chappell, R. Franch, P. Greier, S. Klepner, F. Lai, P. Cook, R. Lipa, R. Perry, W. Pokorny, et al. "A 15-ns CMOS 64K RAM". In: *IEEE journal of solid-state circuits* 21.5 (1986), pp. 704–712.
- [112] H. K. O. Berge, M. Blesken, S. Aunet, and D. U. Rückert. "Design of 9T SRAM for dynamic voltage supplies by a multiobjective optimization approach". In: 2010 17th IEEE International Conference on Electronics, Circuits and Systems. IEEE. 2010, pp. 319–322.
- [113] W. Wolf. *Modern VLSI design: IP-based design*. Pearson Education, 2008.
- [114] K. Scott and K. Keutzer. "Improving cell libraries for synthesis". In: Proceedings of IEEE Custom Integrated Circuits Conference-CICC'94. IEEE. 1994, pp. 128–131.
- [115] C. Fisher, R. Blankenship, J. Jensen, T. Rossman, and K. Svilich. "Optimization of standard cell libraries for low power, high speed, or minimal area designs". In: *Proceedings of Custom Integrated Circuits Conference*. IEEE. 1996, pp. 493–496.
- [116] T. Mozdzen. "Design methodology for a 1.0 mu cell-based library efficiently optimized for speed and area". In: *Third Annual IEEE Proceedings on ASIC Seminar and Exhibit*. IEEE. 1990, P12–3.
- [117] D. S. Kung and R. Puri. "Optimal P/N width ratio selection for standard cell libraries". In: Proceedings of the 1999 IEEE/ACM international conference on Computer-aided design. IEEE Press. 1999, pp. 178–184.
- [118] K. Keutzer. "Impact of library size on the quality of automated synthesis". In: *Proc. ACM/IEEE Int. Conf. on Computer-Aided Design, Nov. 1987*. 1987.
- [119] F. Beeftink, P. Kudva, D. S. Kung, R. Puri, and L. Stok. "Combinatorial cell design for CMOS libraries". In: *Integration, the VLSI Journal* 29.1 (2000), pp. 67–93.
- [120] C. Piguet. "Design of low-power libraries". In: 1998 IEEE International Conference on Electronics, Circuits and Systems. Surfing the Waves of Science and Technology (Cat. No. 98EX196). Vol. 2. IEEE. 1998, pp. 175–180.
- [121] C. Piguet, J.-M. Masgonty, S. Cserveny, C. Arm, and P.-D. Pfister. "Low-power low-voltage library cells and memories". In: *ICECS 2001. 8th IEEE International Conference on Electronics, Circuits and Systems (Cat. No. 01EX483)*. Vol. 3. IEEE. 2001, pp. 1521–1524.
- [122] F. Abouzeid, S. Clerc, F. Firmin, M. Renaudin, and G. Sicard. "40nm CMOS 0.35 V-optimized standard cell libraries for ultra-low power applications". In: *ACM Transactions on Design Automation of Electronic Systems* 16.3 (2011), p. 35.

- [123] N. Lotze and Y. Manoli. "A 62 mV 0.13 μm CMOS Standard-Cell-Based Design Technique Using Schmitt-Trigger Logic". In: *IEEE journal of solid-state circuits* 47.1 (2011), pp. 47–60.
- [124] H. Reyserhove and W. Dehaene. "A 16.07 pJ/cycle 31MHz fully differential transmission gate logic ARM Cortex M0 core in 40nm CMOS". In: ESSCIRC Conference 2016: 42nd European Solid-State Circuits Conference. IEEE. 2016, pp. 257–260.
- [125] J. Jun, J. Song, and C. Kim. "A Near-Threshold Voltage Oriented Digital Cell Library for High-Energy Efficiency and Optimized Performance in 65nm CMOS Process". In: *IEEE Transactions on Circuits and Systems I: Regular Papers* 65.5 (2017), pp. 1567–1580.
- [126] M. Pons, J.-L. Nagel, D. Séverac, M. Morgan, D. Sigg, P.-F. Rüedi, and C. Piguet. "Ultra low-power standard cell design using planar bulk CMOS in subthreshold operation". In: 2013 23rd International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS). IEEE. 2013, pp. 9–15.
- [127] M. Blesken, S. Lütkemeier, and U. Rückert. "Multiobjective optimization for transistor sizing sub-threshold CMOS logic standard cells". In: *Proceedings of 2010 IEEE International Symposium on Circuits and Systems*. IEEE. 2010, pp. 1480–1483.
- [128] B. H. Calhoun, A. Wang, and A. Chandrakasan. "Modeling and sizing for minimum energy operation in subthreshold circuits". In: *IEEE Journal of Solid-State Circuits* 40.9 (2005), pp. 1778–1786.
- [129] M. Vohrmann, S. Chatterjee, S. Lütkemeier, T. Jungeblut, M. Porrmann, and U. Rückert. "A 65 nm standard cell library for ultra low-power applications". In: 2015 European Conference on Circuit Theory and Design (ECCTD). IEEE. 2015, pp. 1–4.
- [130] R. Taco, I. Levi, A. Fish, and M. Lanuzza. "Exploring back biasing opportunities in 28nm UTBB FD-SOI technology for subthreshold digital design". In: 2014 IEEE 28th Convention of Electrical & Electronics Engineers in Israel (IEEEI). IEEE. 2014, pp. 1–4.

## A. Characterization Script

```
# Supply and ground voltages (V)
set vddVoltage 0.3
set vddsVoltage 0.3
set gndVoltage 0
set gndsVoltage 0

## Define default voltage and temperature ###
set_operating_condition -voltage $vddVoltage -temp 27

# Set vdd and gnd net names
set_vdd vdd $vddVoltage
set_vdd vdds $vddsVoltage
set_gnd gnd $gndVoltage
set_gnd gnds $gndsVoltage

# Slew thresholds as fractions of the supply voltage
set_var slew_lower_rise 0.1
set_var slew_upper_rise 0.9
set_var slew_lower_fall 0.1
set_var slew_upper_fall 0.9
set_var measure_slew_lower_rise 0.1
set_var measure_slew_upper_rise 0.9
set_var measure_slew_lower_fall 0.1
set_var measure_slew_upper_fall 0.9

# Set the maximum output transition time allowed
set_var max_transition 1.5e-09

# Cells to characterize
set cells { INV_X1 \
            NAND2_X1 \
            NOR2_X1 \
            MUXI2_X1 \
            MIN3_X1 \
            AOI22_X1 \
            OAI22_X1}

set combined {input}

# 3x3 lookup-table templates for delay, power, and constraint tables
define_template -type delay \
    -index_1 {0.02 0.1 0.5} \
    -index_2 {0.0150 0.0500 0.1500} \
    delay_template_3x3

define_template -type power \
    -index_1 {0.02 0.1 0.5} \
    -index_2 {0.0150 0.0500 0.1500} \
    power_template_3x3

define_template -type constraint \
    -index_1 {0.02 0.1 0.5} \
    -index_2 {0.02 0.1 0.5} \
    constraint_template_3x3

set inputs {A B C D S}
set outputs {Z}
set clocks {CK}
set async {S}

# Bind the pins and templates to every cell in the list
define_cell \
    -input $inputs -output $outputs \
    -constraint constraint_template_3x3 \
    -delay delay_template_3x3 \
    -power power_template_3x3 \
    $cells
```
## B. Datasheet

Cell Group NOR2_X1, Process corner, Temp 27.00, Voltage 0.5

Function

| Pin Name | Function      |
|----------|---------------|
| Z        | (!(A) & !(B)) |

Footprint

| Cell    | Area   |
|---------|--------|
| NOR2_X1 | 0.0000 |

Leakage

| Cell    | Min (nW) | Avg (nW) | Max (nW) |
|---------|----------|----------|----------|
| NOR2_X1 | 0.0004   | 0.0009   | 0.0012   |

Pin Capacitance

| Cell    | Pin Cap A (pF) | Pin Cap B (pF) | Max Cap Z (pF) |
|---------|----------------|----------------|----------------|
| NOR2_X1 | 0.0007         | 0.0007         | 0.1500         |

Delay

Delays(ns) to Z rising:

| Cell    | Timing Arc(Dir) | Delay(ns) min | Delay(ns) mid | Delay(ns) max |
|---------|-----------------|---------------|---------------|---------------|
| NOR2_X1 | A->Z (FR)       | 4.2245        | 13.3442       | 39.4895       |
| NOR2_X1 | B->Z (FR)       | 3.9119        | 12.3462       | 36.5839       |

Delays(ns) to Z falling:

| Cell    | Timing Arc(Dir) | Delay(ns) min | Delay(ns) mid | Delay(ns) max |
|---------|-----------------|---------------|---------------|---------------|
| NOR2_X1 | A->Z (RF)       | 8.6296        | 26.9722       | 79.4379       |
| NOR2_X1 | B->Z (RF)       | 8.3737        | 26.5312       | 78.4787       |

Power

Internal switching power(pJ) to Z rising:

| Cell    | Input | Power(pJ) min | Power(pJ) mid | Power(pJ) max |
|---------|-------|---------------|---------------|---------------|
| NOR2_X1 | A     | 0.0008        | 0.0008        | 0.0008        |
| NOR2_X1 | B     | 0.0005        | 0.0005        | 0.0005        |

Internal switching power(pJ) to Z falling:

| Cell    | Input | Power(pJ) min | Power(pJ) mid | Power(pJ) max |
|---------|-------|---------------|---------------|---------------|
| NOR2_X1 | A     | 0.0001        | 0.0001        | 0.0001        |
| NOR2_X1 | B     | 0.0001        | 0.0001        | 0.0001        |

Passive Power

Hidden power(pJ) for A rising: Conditional

| Cell    | Power(pJ) min | Power(pJ) mid | Power(pJ) max |
|---------|---------------|---------------|---------------|
| NOR2_X1 | -0.0004       | -0.0004       | -0.0004       |

Hidden power(pJ) for A falling: Conditional

| Cell    | Power(pJ) min | Power(pJ) mid | Power(pJ) max |
|---------|---------------|---------------|---------------|
| NOR2_X1 | 0.0004        | 0.0004        | 0.0004        |

Hidden power(pJ) for B rising: Conditional

| Cell    | Power(pJ) min | Power(pJ) mid | Power(pJ) max |
|---------|---------------|---------------|---------------|
| NOR2_X1 | -0.0001       | -0.0001       | -0.0001       |

Hidden power(pJ) for B falling: Conditional

| Cell    | Power(pJ) min | Power(pJ) mid | Power(pJ) max |
|---------|---------------|---------------|---------------|
| NOR2_X1 | 0.0003        | 0.0003        | 0.0003        |

END Cell Group NOR2_X1
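
As a reading aid for the two delay tables above, the following Python sketch computes the ratio of the output-falling delay to the output-rising delay for each input pin. The numbers are copied from the tables; the dictionary layout and the function name `fall_to_rise_ratios` are illustrative assumptions and not part of the characterization flow.

```python
# (min, mid, max) delays in ns, copied from the delay tables of NOR2_X1.
RISE_DELAY_NS = {
    "A": (4.2245, 13.3442, 39.4895),
    "B": (3.9119, 12.3462, 36.5839),
}
FALL_DELAY_NS = {
    "A": (8.6296, 26.9722, 79.4379),
    "B": (8.3737, 26.5312, 78.4787),
}


def fall_to_rise_ratios(rise, fall):
    """Return fall/rise delay ratios per pin for the min, mid and max points."""
    return {
        pin: tuple(f / r for f, r in zip(fall[pin], rise[pin]))
        for pin in rise
    }


ratios = fall_to_rise_ratios(RISE_DELAY_NS, FALL_DELAY_NS)
for pin, values in ratios.items():
    print(pin, [round(v, 2) for v in values])
```

For both inputs the ratio stays between 2.0 and 2.2 across the three reported points, so in this datasheet the falling output transition of NOR2_X1 takes roughly twice as long as the rising one.
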