Category: Fab

GlobalFoundries Readies the 20nm Process Node


First and foremost, GlobalFoundries’ expectation that the foundry business will outgrow the overall semiconductor industry is coming true: while the semiconductor industry is growing at a 5% annual rate, the foundry business is growing by 10%, i.e. twice as fast. The semiconductor industry is expected to grow from $304 billion in 2011 to $384 billion in 2016. At the same time, leading-edge foundry revenue (65nm and below) is expected to more than double, from $15 billion to $34 billion. In order to capture as large a piece of the silicon pie as possible, GlobalFoundries is pushing forward with its 20nm process, scheduled to arrive next year.

How did GlobalFoundries make 20nm a cost-effective technology platform?
The major change for the 20nm process is the adoption of a Gate-Last approach, which was previously the key differentiator between the Common Platform alliance on one side and Intel, TSMC and other players on the other. Now you can be a TSMC customer and switch to IBM/GlobalFoundries/Samsung without too much engineering drama.

The manufacturing advantage keeps going back and forth between GlobalFoundries, TSMC and Intel. Intel led the market with 45nm, then TSMC launched 40nm. Intel came back with 32nm, and then TSMC and GlobalFoundries both started shipping 28nm semiconductors. In April, Intel launched its 22nm process (after several months of delays, atypical for the 800-pound manufacturing gorilla), and that lead should hold into 2013, when the 20nm process from GlobalFoundries is expected to take over, before all three foundry groups meet at 14nm.

With the 20nm node, GlobalFoundries worked hard with its partners inside the Common Platform alliance (IBM and Samsung) to make the node as SoC-friendly as possible, while keeping the performance characteristics that enabled very high clock speeds in 32nm and 28nm parts.

First tests on an ARM Cortex-A9 test core at 20nm show an interesting comparison between GlobalFoundries and its direct competitors

The 20nm Low-Power-Mobile node directly competes for mobile SoC business from companies such as Qualcomm, Texas Instruments and even NVIDIA. The interesting player in this game will be Qualcomm, which is scheduled to ship its 28nm chips from GlobalFoundries during the third quarter. Even though neither GlobalFoundries nor Qualcomm wants (or is allowed) to talk to us about their cooperation, our well-connected sources tell us that one of Qualcomm’s key customers, Taiwan’s HTC, was both embarrassed and angry about having to ship its flagship phone, the One X, with an old 40nm Snapdragon S3 processor.

Such problems should be a thing of the past at 20nm. GlobalFoundries claims a significant lead in the key performance-power-cost metric, as well as marginally longer battery life based on transistor design alone.

Shrinking the die from 90nm to 20nm opens up new possibilities in terms of transistor density

20nm also represents an interesting milestone for the silicon industry. Just like Intel’s 22nm, the 20nm process is being positioned as the process that will revolutionize the mobile industry. While we won’t see 20nm designs until very late in 2013 (unless somebody pulls a surprise), leading mobile processor vendors will then be able to fit a billion-transistor design into a smartphone- and tablet-friendly package.

Crossing the Lab to Fab Chasm: What comes after the 20 and 14nm nodes?

After 20nm reaches its peak as a Gate-Last HKMG node and the first FinFETs (i.e. 3D transistors) roll out the door, we can expect 14nm FinFET processes from all the major industry players. This will also be the first time we see a collision of worlds: Common Platform, TSMC, SMIC and Intel, all at 14nm, all with FinFET transistors. Intel will most probably stay on bulk silicon, while the Common Platform partners (IBM, GlobalFoundries and Samsung) utilize fully depleted Silicon-on-Insulator.

SOITEC, the leading provider of SOI wafers, has made significant progress: what effectively looked like a dead technology for the 28nm and 20nm nodes has been pulled forward all the way to 28nm. While GlobalFoundries says it will not offer conventional SOI at 28nm and 20nm, fully depleted SOI is making an entrance in a pretty big way, roughly two years ahead of schedule.

All in all, the 20nm roadmap shown to us by GlobalFoundries holds up for now. Time will tell whether the market dives in for all the products that the 20nm process will make possible.

Analog Circuits Benefit from Scaling Trends

The same semiconductor technology roadmap driven by digital scaling requirements can be profitably applied to analog circuits.

By Carlos Azeredo-Leme and Navraj Nandra, Synopsys

As CMOS technologies scale to smaller nodes, they bring both benefits and challenges. There are advantages in speed and power due to lower capacitive loading and the lower supply voltage. Conversely, the reduction in intrinsic device gain and in available signal swing negatively impacts performance.

Unlike transistors used for “digital” functions—meaning two-state operation—analog blocks don’t scale as readily or cleanly. The motivation to scale is driven by the fact that transistor density grows exponentially with each process generation: transistors get smaller, and more of them can be put on a die of the same size. Figure 1, from Moore’s original paper, is an incredibly bold prediction made over 40 years ago, and it was based on very little data. In practice, we’ve seen logic-gate density double every 18 to 24 months. Analog circuits like analog-to-digital converters (ADCs), on the other hand, only double their speed-resolution product every five years or more.

Figure 1: This representation of Moore’s Law predicting the exponential increase in complexity of integrated circuits was originally printed in “Electronics,” Volume 38, Number 8, April 19, 1965.

There are many reasons why analog doesn’t scale as readily. This can be seen graphically in Figure 2, which shows device characteristics typical of 130- and 40-nm CMOS technologies. In the saturation region, the drain current varies much more with drain voltage in the 40-nm process. In other words, the transistor has much less control over its current: the output conductance is higher and the achievable intrinsic gain is lower. The design of amplifiers is therefore much more difficult, as the realizable gain per stage is significantly reduced. To compensate for that reduction, more sophisticated circuits are required, leading to larger area and higher consumption.

Figure 2: The device characteristics degrade significantly from the 130 nm process (LEFT) to the 40 nm process (RIGHT).

Another factor is the reduced supply voltage and the resulting loss of dynamic range. As the supply voltage is reduced with the migration to lower process nodes, the available signal range shrinks. At today’s 1-V supply level, the signal range may be 0.7 V or less. This lower signal range requires proportionately lower noise levels to maintain the same dynamic range. In mixed-signal circuits, such as ADCs using switched capacitors, the reduction in the noise level can be achieved with larger capacitors. To compensate for a 2X reduction in signal range, the capacitors must be increased 4X (see the brief derivation below), making it quite difficult to scale down the area. In addition, the larger loading capacitances require larger currents for charging and discharging, invariably leading to higher consumption. Is there any hope of benefiting from the process-node scaling that we see in digital blocks?
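The 4X figure follows from the sampled kT/C noise of a switched-capacitor stage. The sketch below restates that textbook argument with generic symbols; it is an illustration, not data from the article.

```latex
% Sampled thermal noise of a switched-capacitor stage and the resulting SNR
\[
  \overline{v_n^2} = \frac{kT}{C}, \qquad
  \mathrm{SNR} \;\propto\; \frac{V_{\mathrm{sig}}^{2}}{kT/C}
\]
% Halving the signal range while keeping the same SNR forces a 4X capacitor:
\[
  \frac{(V_{\mathrm{sig}}/2)^{2}}{kT/C'} \;=\; \frac{V_{\mathrm{sig}}^{2}}{kT/C}
  \quad\Longrightarrow\quad C' = 4C
\]
```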

USB CASE STUDY

Figure 3: This graphic depicts generations of Synopsys’ USB 2.0 PHY showing area scaling from 180 nm down to 28 nm.

Figure 3 shows three generations of Synopsys’ USB 2.0 PHY. Clearly, we’ve managed to scale the design from the original 180 nm to today’s 28-nm version. Getting there wasn’t as simple as re-targeting standard cell libraries and then running automatic place and route. Scaling was achieved for this analog/mixed-signal IP using a number of different design techniques.

First of all, the parameterized transistor cells were optimized for each technology node. In addition, much of the high-speed analog circuitry was pushed to the low-voltage core domain. The smaller technology nodes do have a higher poly sheet resistance, which helps in making the resistors smaller as well. The I/O voltage also scales from 3.3 to 1.8 V in 28 nm. This voltage scaling provides benefits in more efficient capacitor designs—making the low-pass filter in the phase-locked loops (PLLs), for example, much smaller. Of course, due consideration must be given to leakage, linearity, and breakdown voltages.

The main benefit of smaller technology nodes is to target higher speeds—for example, USB 3.0 operating at 5 Gbits/s or SATA at 6 Gbits/s—rather than improving the power consumption of existing designs. Like digital gates, analog IP benefits from technology scaling, but with a very different methodology—not using design tools, but rather different analog architectures.

DATA CONVERTERS

Here’s an example for ADCs. The run-of-the-mill, dual 10-bit, 80-MHz ADC in 0.18-µm technology is 5X smaller in 65 nm. This is impressive. Like the previous USB PHY example, the size reduction has been achieved by architectural changes that were made possible by designing using core (1.2-V) devices. Originally, the ADCs were designed using I/O (3.3-V) devices, due to the higher voltage headroom allowed by these devices.

Presently, all state-of-the-art ADCs are designed using core (1.2-V or lower) devices. Although designing high-performance converters at the core voltage is challenging, it yields substantial gains in terms of maximum sampling rate, power dissipation, and—obviously—area. Architectures have evolved significantly. Many design tricks are employed to reduce area. For example, by employing digital calibration schemes, it’s possible to relax the performance of the individual analog blocks in the ADC. This makes those analog blocks (operational amplifiers, comparators, etc.) simpler, smaller, and lower in power consumption.

In the case of dual-matched converters, it’s possible to be very area-effective by reusing a very high-sampling-rate, single-channel ADC to convert two channels at half speed. This is achieved by adding a front-end stage that samples and holds the two channels at the same instant. Area savings of almost 50% can be achieved this way.
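As a rough illustration of that arrangement (a behavioral sketch with generic names, not Synopsys’ implementation), the Python model below holds both channels at the same instant and then converts them back-to-back with one double-rate ADC core:

```python
import numpy as np

# Behavioral sketch: two channels are captured at the same instant by a
# simultaneous sample-and-hold front end, then converted sequentially by a
# single ADC core running at twice the per-channel rate.

def quantize(x, n_bits=10, full_scale=1.0):
    """Ideal n-bit quantizer standing in for the shared ADC core."""
    lsb = 2 * full_scale / (2 ** n_bits)
    return np.clip(np.round(x / lsb), -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1)

def dual_channel_convert(ch_a, ch_b, n_bits=10):
    """Convert two half-rate channels with one double-rate ADC core."""
    out_a, out_b = [], []
    for a, b in zip(ch_a, ch_b):           # both values held at the same instant
        out_a.append(quantize(a, n_bits))  # first ADC slot of the period
        out_b.append(quantize(b, n_bits))  # second ADC slot of the period
    return np.array(out_a), np.array(out_b)

t = np.arange(64) / 40e6                   # 40 MHz channel rate (ADC runs at 80 MHz)
a = 0.5 * np.sin(2 * np.pi * 2e6 * t)
b = 0.5 * np.cos(2 * np.pi * 3e6 * t)
codes_a, codes_b = dual_channel_convert(a, b)
```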

As is true for digital designs, a “virtuous cycle” is created by having a smaller design. If the converter is smaller, the parasitic capacitances that it must drive also will be smaller. As a result, the op amp that drives them doesn’t need as much output drive. In addition, the corresponding biasing circuits are simpler, allowing even more area (and power) to be saved.

How far can we go with this scaling? Where’s the limit? If we go below 32/28 nm, will we continue to see this size reduction in analog IP? Our conjecture is that area improvements will happen, but not at the dramatic levels seen in the 180-to-65-nm example cited previously. Here are a couple of reasons for this:

  • The advantages of moving from I/O to core devices have already been achieved with 65-nm technologies. Moving forward, it will become harder and harder to design using sub-1-V supplies. In addition, the designs will become more complicated in order to yield good performance at those low voltages. Most likely, there will be only two transistors stacked with many more placed laterally.
  • The converters are now a very small fraction of the complete system-on-a-chip (SoC) area—even if, in some cases, multiple instantiations of the converter are used (for example, in multiple-input multiple-output [MIMO] transceivers). Therefore, there may be no market driver/need.

“DIGITALLY ASSISTED” ANALOG CIRCUITS

Despite all of the challenges for analog design, a modern SoC offers a great advantage to the analog blocks: the availability of almost limitless computing power. In fact, today’s digital circuits achieve huge densities and extremely low energy per logic operation. For example, 45 nm offers above a million gates/mm², with two-input NANDs consuming only about 1 nW/MHz. This is at least an order-of-magnitude improvement over 0.18 µm.
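To put that figure in energy terms (simple unit arithmetic, not additional data from the article):

```latex
% 1 nW/MHz expressed as energy per clock cycle
\[
  1\,\mathrm{nW/MHz} \;=\; \frac{10^{-9}\,\mathrm{W}}{10^{6}\,\mathrm{Hz}}
  \;=\; 10^{-15}\,\mathrm{J} \;=\; 1\,\mathrm{fJ/cycle}
\]
```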

Frequently, analog circuits are making use of “digital assistance,” which allows simplification of the critical analog circuits that don’t scale easily. Examples include high-resolution ADCs and high-performance analog front ends. These digitally assisted analog design techniques are enabling the analog circuits to analyze themselves and auto-correct their deviations.

A classic technique is calibration or digital correction. Such techniques allow high levels of accuracy to be achieved with smaller component areas. They are directly applicable to circuits that rely on matching, such as successive-approximation and pipeline ADCs. The challenge in these techniques is to identify algorithms that allow the calibration or correction to be autonomous, rather than performed at the fabrication stage. The circuit must be able to estimate its own errors and then apply an appropriate compensation without interrupting normal operation.

Typically, the self-calibrating routine is run at power-up. In some cases, however, the drifts with time and temperature cannot be accepted. The calibration must then continue running in the background or be repeated periodically. To avoid interrupting operation, a redundant stage can be added, allowing the components to be put offline in a rotating fashion for calibration one at a time. Alternatively, a replica stage can be calibrated. The result will then be mirrored to the operating stages.

Calibration can be used in a large variety of situations:

  • Tuning of analog filters: These circuits depend on time constants, which are determined by resistors and capacitors that show large process and temperature deviations. Calibration can be done against a precise clock reference that’s compared with the R-C time constant.
  • Centering of the voltage-controlled-oscillator (VCO) frequency range: The oscillator’s running frequency can vary widely with process and temperature. Calibration can be done by forcing the control voltage to mid-range and adding loading capacitors to the VCO stages to adjust the running frequency.
  • Offset compensation: Offset is unavoidable in analog circuits due to mismatches. Calibration is made possible by comparing the output voltage to 0. The circuit can then be balanced by adding an adjustment voltage (e.g., through a small digital-to-analog converter [DAC]); a sketch of such a calibration loop follows this list.
  • Calibration by correlation with other calibrated parameters: In radio-frequency (RF) amplifiers, for example, gain is determined by the transconductance (Gm) of the active device and by the inductor (L) and capacitor (C) values of the tuned load. Normally, similar L and C are already calibrated in the VCO and their calibrating word is available. In addition, Gm can be made to track poly resistance in the biasing generator. It’s therefore quite possible that there’s a strong correlation between the amplifier gain and VCO calibrating word. This can be obtained by simulation and placed in a table lookup that’s used to adjust the RF amplifier during normal operation. No calibration of the RF amplifier takes place. The designer simply relies on correlation with other, already calibrated components.
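Below is a minimal Python sketch of the offset-compensation item above, assuming a generic comparator with an unknown input-referred offset and a hypothetical N-bit trim DAC; the names and values are illustrative, not from any specific design:

```python
import numpy as np

# Hypothetical model: a comparator with an unknown input-referred offset and
# a small N-bit trim DAC that injects a correction voltage at its input.

rng = np.random.default_rng(0)
TRUE_OFFSET = rng.normal(0.0, 5e-3)            # unknown offset, ~5 mV sigma

def comparator(v_plus, v_minus, trim_v):
    """Return 1 if (v_plus - v_minus + offset - trim) > 0, else 0."""
    return int(v_plus - v_minus + TRUE_OFFSET - trim_v > 0)

def calibrate_offset(n_bits=8, full_scale=20e-3):
    """Binary-search the trim-DAC code with both inputs shorted, so the
    trim voltage converges onto the comparator's own offset."""
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)
        trim_v = (trial / 2 ** n_bits - 0.5) * 2 * full_scale
        if comparator(0.0, 0.0, trim_v):       # offset still above trim: keep bit
            code = trial
    return code

code = calibrate_offset()
trim_v = (code / 256 - 0.5) * 40e-3
print(f"trim code = {code}, residual offset ≈ {(TRUE_OFFSET - trim_v) * 1e3:.2f} mV")
```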

Calibration techniques are confined to the block level. Other digitally assisted techniques take advantage of the fact that the analog block is embedded in a complete system—including all of the digital-demodulation and data-extraction stages. Today, this complete system is often entirely implemented in one chip. It’s then possible to estimate the quality of the signal being received and obtain information to feed back to the analog circuits to adjust their parameters (see Figure 4). These techniques are very efficient for the compensation of mismatches, phase deviations, and distortions.

In wireless systems, the transmitted signal often includes pilot tones and special coding, which allow the receiver to estimate deviations on the air interface. These same features also enable corrections for deviations in the analog front end using digital signal processing of the received data.

Figure 4: A radio receiver uses system level estimations of the signal quality to provide feedback to the analog circuits and compensate for mismatches, phase deviations and distortions.

Overall, these techniques allow for considerable relaxation of the analog front-end performance, which can be used for minimizing both area and power consumption.

Another trend is the transition of traditional analog functions to the digital domain. One example is filters. Digital filters have many advantages compared with analog ones. They don’t suffer from mismatches and they implement mathematically exact transfer functions. In addition, their area and power consumption scale only logarithmically with dynamic range (whereas for an analog filter, they scale quadratically). Since 90 nm, the power consumption of digital filters has also generally been lower than that of the analog equivalent (see Figure 5). A similar observation applies to area.
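A back-of-the-envelope way to see where those scaling laws come from (a sketch of the usual argument, consistent with the kT/C reasoning earlier, not a derivation from the measured data in Figure 5):

```latex
% Digital: the required word length grows with the log of the dynamic range,
% and area/power track the word length.
\[
  n_{\mathrm{bits}} \approx \log_{2}(\mathrm{DR})
  \quad\Longrightarrow\quad
  A_{\mathrm{dig}},\,P_{\mathrm{dig}} \sim \mathcal{O}\!\left(\log \mathrm{DR}\right)
\]
% Analog: for a fixed signal swing, the in-band noise power must fall as
% 1/DR^2, so the kT/C capacitance and the bias power rise as DR^2.
\[
  \overline{v_n^2} \propto \frac{1}{\mathrm{DR}^{2}}
  \quad\Longrightarrow\quad
  C \propto \mathrm{DR}^{2}
  \quad\Longrightarrow\quad
  A_{\mathrm{an}},\,P_{\mathrm{an}} \sim \mathcal{O}\!\left(\mathrm{DR}^{2}\right)
\]
```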

To move the filters from the analog to the digital domain requires moving the ADCs toward the input (see Figure 6). That means digitizing a wider bandwidth, possibly including some out-of-band interferers that were supposed to have been removed by the filters. However, designing faster ADCs isn’t a big challenge in smaller process nodes. Speed comes together with the scaling of the technology, because the devices are faster and the parasitic capacitances are lower. The wider bandwidth and dynamic range aren’t really a showstopper, as the wanted signal bandwidth is still the same. The out-of-band spectrum that needs to be digitized can be treated as noise.

This is the realm of sigma-delta ADCs. They operate at highly oversampled frequencies and digitize the input signal as a high-speed bitstream. In the process, they generate high-energy, high-frequency noise. The output word is obtained after digital decimation in a filter, which reduces the sample rate to the nominal rate and removes the high-frequency noise. This decimating filter can be merged with the digital filter that was moved from the analog domain, leading to a very efficient overall implementation. Moving filters from the analog to the digital domain has many advantages and no significant disadvantages.
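For readers who want to see the chain end to end, here is a small behavioral Python sketch of that idea. It assumes a first-order, 1-bit modulator and a plain moving-average decimator; production designs use higher-order modulators and multi-stage decimation filters.

```python
import numpy as np

# Behavioral sketch of a sigma-delta chain: a 1-bit, first-order modulator
# produces a high-speed bitstream, and a boxcar decimation filter recovers
# the low-rate, higher-resolution output.

def sigma_delta_1st_order(x):
    """First-order sigma-delta modulation of an input in [-1, 1] to a ±1 bitstream."""
    integrator = 0.0
    bits = np.empty_like(x)
    for i, sample in enumerate(x):
        feedback = 1.0 if integrator > 0 else -1.0   # previous output bit
        integrator += sample - feedback
        bits[i] = 1.0 if integrator > 0 else -1.0
    return bits

def decimate(bitstream, osr):
    """Moving-average (boxcar) decimation filter with downsampling by the OSR."""
    n = (len(bitstream) // osr) * osr
    return bitstream[:n].reshape(-1, osr).mean(axis=1)

osr = 64                                  # oversampling ratio
fs = 1.0e6                                # modulator sampling rate (arbitrary)
t = np.arange(osr * 256) / fs
x = 0.5 * np.sin(2 * np.pi * 1.0e3 * t)   # in-band test tone
bits = sigma_delta_1st_order(x)           # high-speed 1-bit stream
y = decimate(bits, osr)                   # low-rate, higher-resolution output
```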

Figure 5: Experimental data demonstrates how digital filters’ consumption increases only logarithmically with dynamic range, while analog implementations tend to follow a quadratic rule.

Figure 6: A sigma-delta ADC operates at a high oversampling rate, allowing the analog filters to be moved to the digital domain.

STOCHASTIC CONVERTERS

A very interesting and ambitious approach is the use of stochastic circuits. Instead of trying to use high-accuracy components (either by making them large or via calibration), stochastic circuits rely on the statistics of large numbers. In fact, the components are purposely made to be inaccurate. This technique is best illustrated with the example of a Flash ADC.

A classical Flash ADC includes one comparator for each transition between output codes. For a 3-bit ADC, there are thus seven comparators (see Figure 7a). Each comparator has a trigger point that corresponds to a respective code transition. For example, the first comparator has a trigger point at 1/2 LSB, the second at 3/2 LSB, and so on. These trigger points are often defined by a resistor tree (also illustrated in Figure 7a). The outputs of the comparators produce a thermometer code, which corresponds to the input signal level.
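As a behavioral illustration of that structure (generic, ideal comparators; not tied to any particular product), the short Python model below builds the seven trigger points and converts the thermometer code into a binary count:

```python
import numpy as np

# Illustrative model of the 3-bit Flash ADC described above: a resistor tree
# sets trigger points at 1/2, 3/2, ..., 13/2 LSB; the comparator outputs form
# a thermometer code whose ones-count is the output code.

N_BITS = 3
FULL_SCALE = 1.0
LSB = FULL_SCALE / 2 ** N_BITS
TRIP_POINTS = (np.arange(2 ** N_BITS - 1) + 0.5) * LSB   # 2^N - 1 trigger points

def flash_adc(vin):
    thermometer = (vin > TRIP_POINTS).astype(int)  # ideal comparators
    return int(thermometer.sum())                  # thermometer -> binary code

for v in (0.05, 0.30, 0.62, 0.95):
    print(f"vin = {v:.2f} V -> code {flash_adc(v)}")
```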

Figure 7: a) A Flash ADC uses high-precision comparators. b) A stochastic ADC has a similar structure but requires only small, low-accuracy comparators, at the expense of needing far more of them.

As the resolution increases, the trigger points grow closer together. In addition, the comparators must be designed for very low offset. This requires a large area or some form of offset calibration for each comparator.

In a stochastic ADC, the comparator trigger points aren’t set by design (see Figure 7b). Rather, they’re allowed to be both random and large. Therefore, the comparator outputs won’t follow a thermometer code with an increasing signal level. Instead, they’ll turn on without any order, as the input signal goes above each individual random offset. Still, the sum of the comparator outputs follows a monotonic characteristic with the input signal level.

The comparators used can be very small because they’re allowed to be quite inaccurate and to exhibit large offsets. Those offsets are determined by many factors, such as random variations of the devices’ VT parameter as well as of their length and width (L and W). Because of the large number of comparators—each with independent deviations—the central limit theorem leads to a probability density function (PDF) of the offsets that closely approximates a Gaussian curve. With an input ramp applied to the comparators, the sum of their outputs will follow the cumulative distribution function (CDF) seen in Figure 8. This function can then be linearized and used as the converter output. The number of comparators required for achieving N-bit resolution is on the order of 2 x 4^N, which is much larger than the 2^N - 1 of a Flash ADC. Because the comparators can use the process’ minimum device size, however, this technique becomes quite interesting in advanced process nodes.
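The following Python sketch models that behavior under the stated assumptions (Gaussian comparator offsets, ideal comparison, inverse-CDF linearization done in software; a real converter would typically use a lookup table):

```python
import numpy as np
from scipy.stats import norm   # inverse Gaussian CDF, used here for linearization

# Sketch of a stochastic ADC: many minimum-size comparators with random
# Gaussian offsets all compare the same input; the number that trip follows
# the offsets' CDF, and that count is linearized to estimate the input.

rng = np.random.default_rng(1)

N_TARGET_BITS = 6
N_COMPARATORS = 2 * 4 ** N_TARGET_BITS             # ~2 x 4^N comparators, as cited
SIGMA = 0.25                                        # offset spread (arbitrary units)
offsets = rng.normal(0.0, SIGMA, N_COMPARATORS)     # one random trip point each

def stochastic_adc_raw(vin):
    """Count of comparators whose random trip point lies below the input."""
    return int(np.count_nonzero(vin > offsets))

def linearize(count):
    """Map the raw count back to an input estimate via the inverse Gaussian CDF."""
    p = (count + 0.5) / (N_COMPARATORS + 1)         # keep p strictly inside (0, 1)
    return norm.ppf(p) * SIGMA

vin = 0.1
count = stochastic_adc_raw(vin)
print(f"raw count = {count}, linearized estimate ≈ {linearize(count):.3f}")
```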

Figure 8: Cumulative Gaussian function that is representative of the characteristic of Stochastic converters.

The semiconductor technology roadmap driven by digital scaling requirements can be applied to analog circuits, such as Universal Serial Bus (USB) physical interfaces and data converters. It therefore provides advantages to the SoC integrator, such as smaller area. Note that the techniques used to scale these analog circuits are different from those used for digital, as they rely on design techniques rather than electronic-design-automation (EDA) tools. In addition, the smaller technologies allow innovation in analog design, such as calibration and digital filters.

GlobalFoundries Production-ready 28nm Fab

Low Power High-K/Metal Gate 28nm CMOS Solutions for High Performance Applications

 

 GlobalFoundries Inc. is the world’s third largest independent semiconductor foundry, with its headquarters located in Milpitas, California. GlobalFoundries manufactures integrated circuits in high volume mostly for semiconductor companies such as AMD, Broadcom, Qualcomm, and STMicroelectronics.

 High-K/Metal Gate (HKMG) is one of the most significant innovations in CMOS fabrication since the development of silicon VLSI. The 28nm technology is designed for the next generation of mobile smart devices demanding faster GHz processing speeds, lower standby power and longer battery life. To meet these demands, the 32/28nm HKMG solution is a “Gate-First” approach that shares the process flow, design flexibility, design elements and benefits of all previous nodes based upon poly SiON gates. This solution is far superior to present alternatives in scalability (performance, power, die size, design compatibility), cost (a typical foundry customer will save tens of millions of dollars over the course of a 28nm vs. 40nm product portfolio lifecycle) and manufacturability.


Figure 1. Dense routing is enabled by 28nm Gate-First HKMG, resulting in a die size that is roughly 10-20% smaller, depending on the user’s standard-cell library.

GLOBALFOUNDRIES’ 28nm-SLP technology is the low-power CMOS offering delivered on a bulk-silicon substrate for mobile applications. Relative to other 28nm technologies, it achieves a lower-cost platform by substantially reducing process complexity and mask count. It offers design flexibility with multi-channel-length capability and the ultimate in small die size. Available options include multiple SRAM bit cells for high density and high performance.

Because this process shrinks both footprint and power consumption, it optimizes energy efficiency, which translates into significantly longer battery run times and fewer recharge cycles, the benchmark of wireless devices moving forward. The Gate-First HKMG process operates at a functional voltage below 0.8V, scaling 28nm performance and power proportionately against 40nm-LP poly SiON. Overall gains include 49% higher frequency capability, a 44% reduction in energy per switch and a >25% reduction in leakage power per circuit. The 28nm-SLP Gate-First process also supports standard overdrive practices, providing additional performance and flexibility gains for a broader application base (wireless and wired).


Figure 2. Performance and power scaling advantages of GLOBALFOUNDRIES 28 nm-SLP @ 1.1V vs. 40 nm-LP @ 1.2V.

A significant benefit of 28nm-SLP technology is that it provides hefty analog “headroom” (Vcc-Vt) and low noise performance relative to the offerings of other foundries. Gate-First enables a reduction in design complexity by preserving design architecture and layout style, thereby leveraging design investments with IP reuse. This design compatibility helps reduce the overall risks of adopting 28nm.

The Super-Low Vt option provides a performance boost over traditional Vts at a given process node, opening the door for greater than 2GHz performance. The resulting performance boost with a minimal increase in power makes this option attractive to applications with specific thermal requirements that still require the largest performance envelope.

 

 General Description

 

GLOBALFOUNDRIES’ industry-leading 28nm system-on-chip (SoC) design platform is based on high-k metal-gate (HKMG) technology. GLOBALFOUNDRIES is driving the global standard for new technologies such as HKMG with several co-development partners, including IBM, Renesas, STMicroelectronics, Samsung, and Toshiba. This 28nm HKMG solution is far superior to that currently pursued by the other leading pure-play foundries, in both scalability (die size, design compatibility, performance) and manufacturability. The 28nm solution is a “Gate-First” approach that shares the process flow, design flexibility, design elements and benefits of all previous nodes based upon poly SiON gate stacks.

 

Features

The technology is available in super low-power (SLP), high performance-plus (HPP) and low-power, high-performance (LPH) offerings, to cater to the complex requirements of next-generation SoCs.

The 28nm technologies are based on bulk silicon substrates and are designed for a wide variety of applications, from high-performance uses such as graphics and wired networking, through mobile computing and digital consumer products, to low-power wireless mobile applications that require long battery life.

All three offerings (SLP, HPP and LPH) utilize high-k metal gate (HKMG) technology for superior control of the channel, with high on-currents and low leakage current. Scheduled for risk production after the ramp of 32nm, 28nm is the second technology in high-volume production at GLOBALFOUNDRIES that utilizes HKMG.

GLOBALFOUNDRIES’ HKMG enables full scaling from 40nm in area and performance; i.e., 28nm delivers twice the gate density of industry-standard 40nm processes and an SRAM cell-size shrink of more than 50 percent (a cell size of 0.120 square micrometers for the dense single-port cell). 28nm transistors offer up to 60% higher performance than 40nm at comparable leakage, with up to 50% lower energy per switch and 50% lower static power. As a leading manufacturer of x86 CPUs, GLOBALFOUNDRIES well understands the constraints and trade-offs of performance, power and area.
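The 2X density figure is consistent with a full linear shrink; the arithmetic below is a back-of-the-envelope check, not a claim taken from GLOBALFOUNDRIES’ material:

```latex
% A full 40nm -> 28nm shrink scales each linear dimension by 28/40 = 0.7,
% so the area of a given layout scales by the square of that factor.
\[
  \left(\frac{28}{40}\right)^{2} = 0.49
  \quad\Longrightarrow\quad
  \text{gate density} \approx \frac{1}{0.49} \approx 2\times
\]
```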

28nm Features

 

28nm Super Low Power

28nm-SLP targets low-power applications including cellular baseband, application processors, portable consumer and wireless connectivity devices. 28nm-SLP utilizes HKMG and delivers a 2x gate-density benefit, but it is a lower-cost technology in terms of the performance elements used to boost carrier mobility.

28nm-SLP supports four Vt options (super-low, low, standard, and high Vt) with a nominal Vdd of 1.0V. The I/O devices support 1.8V (with an underdrive option to 1.5V) and 2.5V (with a 1.8V underdrive option and a 3.3V overdrive option) to meet different product specifications. 28nm-SLP also features a wide choice of metal stack options, optimized for density and power. Furthermore, a rich RF CMOS offering will be available in 2012, making it an ideal platform for the next generation of system-on-chip (SoC) wireless connectivity designs supporting multiple communication protocols.

 

28nm High Performance Plus (HPP)

28nm-HPP targets high-performance networking and wired communication applications. The technology supports low, standard, and high Vt options with an operating voltage of 0.85V. The lower operating voltage is selected for the lowest possible active power, critical for networking and server products where carbon footprint is an increasingly important consideration. The I/O choices include 1.8V (with an underdrive option to 1.5V) to meet different product specifications. 28nm-HPP features a wide choice of metal options. 28nm-HPP technology provides a performance boost of as much as 10% over competitive offerings.

 

28nm Low Power, High Performance (LPH)

28nm-LPH technology complements the SLP technology, extending the frequency of operation for high-performance smartphones, high-end tablets, and notebook computers. The technology supports four Vt options (low, standard, high and ultra-high) with an operating voltage of 0.90V. Recognizing the extended battery life needed in mobile environments, the technology comes with ultra-low-leakage transistors as well as low-leakage memories. When compared to 40nm, the technology can provide a 50 percent active-power reduction or a 60 percent performance boost. The I/O devices support 1.8V (with an underdrive option to 1.5V) and 2.5V (with a 1.8V underdrive option and a 3.3V overdrive option) to meet different product specifications. 28nm-LPH also features a wide choice of metal stack options, optimized for density and power.
