Aging Effects in FPGAs: an Experimental Analysis

Abdulazim Amouri*, Florent Bruguière†, Saman Kiaemeh*, Pascal Benoît†, Lionel Torres† and Mehdi Tahoori*

*Institute of Computer Engineering, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
†LIRMM, CNRS - University of Montpellier 2, 161 rue Ada, 34095 Montpellier Cedex 5, France
Email: {FirstName.LastName}@kit.edu, Kiaemeh@kit.edu
Email: {FirstName.LastName}@lirmm.fr

Abstract—Modern Field Programmable Gate Arrays (FPGAs) are built using the most advanced technology nodes to meet performance and power demands. This makes them susceptible to various reliability challenges at nano-scale, and in particular to transistor aging. In this paper, an experimental analysis is carried out to identify the main parameters and phenomena influencing the performance degradation of FPGAs. For that purpose, a set of controlled ring-oscillator-based sensors with different frequencies and tunable activity control is implemented on a Spartan-6 FPGA, so that the internal switching activities (SAs) and signal probabilities (SPs) of the sensors can be varied. We apply accelerated-lifetime conditions, using elevated temperatures and voltages in a controlled setting, to stress the FPGA. A novel monitoring method based on measuring the electromagnetic emissions of the FPGA is used to accurately monitor the performance of the sensors before and after the stress. The experiments reveal the extent of performance degradation, the impact of SPs and SAs, and the relative impacts of the BTI and HCI aging factors.

I. INTRODUCTION

In order to meet the high performance and low power demands of modern reconfigurable systems, Field Programmable Gate Arrays (FPGAs) are fabricated using the most advanced CMOS technologies and with the highest level of integration. Nowadays, state-of-the-art FPGAs are built using the advanced 20 nm technology [1]. Furthermore, FPGAs based on 16 nm and 14 nm technologies are also in the planning phase [2]. This excessive downscaling makes the FPGAs susceptible to several manufacturing and reliability challenges such as process variation, soft errors, transistor aging and thermal issues [3].

Transistor aging, in particular, is one of the most important reliability challenges at nano-scale [3]. It develops over a relatively long time period, during which the circuit delay degrades (increases) continuously over the operational lifetime, eventually leading to timing failures. There are several transistor degradation mechanisms; two of the most important ones are Bias Temperature Instability (BTI) [4] and Hot Carrier Injection (HCI) [5]. The BTI mechanism consists of two separate phenomena: Negative BTI (NBTI) affecting PMOS transistors and Positive BTI (PBTI) affecting NMOS transistors. The degradation caused by these mechanisms depends on several parameters such as temperature, supply voltage and usage. The main difference between the effects of BTI and HCI on transistor delay is that BTI goes through two phases: a stress phase, when the transistor is biased ON and its delay increases, and a recovery phase, when the transistor is OFF and the delay recovers toward its initial value. For HCI there is no recovery phase and the effect is permanent. Additionally, BTI can be distinguished by its sensitivity to the signal probabilities (i.e., the ratio of the stress time, when the transistor is ON, to the total time), while HCI is distinguished by its sensitivity to the amount of switching activity (i.e., as a function of the operational clock frequency).

The purpose of this work is to perform an experimental analysis of the impact of various parameters, such as temperature and voltage as well as the usage (signal probabilities and switching activities) on performance degradation of FPGAs impacted by various aging mechanisms. We would like to observe: i) the extent of performance degradation and ii) the relative criticality of different aging mechanisms (BTI vs HCI).

The experiment is done using a set of controlled ring-oscillator-based sensors with different lengths and tunable activity control, which are implemented on pre-specified locations in a Spartan-6 FPGA. These sensors are designed in such a way that the internal switching activities (SAs) and signal probabilities (SPs) of them can be varied and hence the effect of these parameters can be analyzed. Afterwards, accelerated-lifetime conditions using elevated temperatures and voltages are applied to stress the FPGA and emulate the aging process.

The performance of the sensors, and hence their delay degradation, is monitored throughout the experiment period (one week of stress at 80°C and $V_{dd} = 150\%$ of the nominal voltage, followed by one week of recovery). The influence of several factors, such as signal probabilities and activities, on the extent of the performance degradation of the sensors is analyzed. The results provide some key insights into the relation between these factors and the resulting degradation.

Some prior work, such as [6], [7] and [8], uses accelerated-lifetime conditions to stress FPGA chips and emulate aging effects. Different stress conditions are investigated in [6], [7], ranging from normal operating conditions (1.2V and 310 K) to high ones (2.2V and 420 K). Test circuits that utilize both the LUTs and the routing resources are used for measuring the performance degradation. Similar conditions are applied in [8] to analyze the effect of aging on Physical Unclonable Functions (PUFs). The PUFs were basically a set of ring oscillators mapped to FPGA LUTs.

The main differences between our work and the aforementioned experiments are: i) Investigating the effect of different parameters that have direct relation with aging such as SP and SA and their combinations to identify the contribution of different degradation mechanisms from a system-level perspective, ii) The uniform placement of the different sensors (i.e. the test circuits) across the FPGA chip, which assures a homogeneous thermal profile across the chip, and iii) The use of a novel non-intrusive performance monitoring method based on measuring the electromagnetic emissions of the FPGA [9], which is more accurate than the methods that have on-chip communication modules. This is because those methods may introduce biases in the measurements, which come as a result of intra-die variations, heat generation and voltage droops.

II. SENSORS DESIGN AND IMPLEMENTATION

The main parameters that affect the aging process in FPGAs must be controlled in such a way that their values can be varied, in order to analyze their contribution to the total performance degradation. This section describes the sensors used in the aging experiment to vary these parameters.

A. Sensors design

As mentioned before, the BTI mechanism is distinguished by its sensitivity to the SP, while HCI is distinguished by its sensitivity to the amount of SA. Taking that into account, a set of four controllable Ring-Oscillator (RO)-based sensors is used to vary the values of SPs and SAs (see Figure 1). The number of stages in the RO determines the generated frequency, and hence the amount of SA. Two of the sensors are utilized for this purpose: the first one (S1) with three inverter stages (reaching a frequency of ≈ 350 MHz on a Spartan-6 FPGA) and the second one (S2) with a single inverter stage (for a maximum possible frequency of ≈ 900 MHz on the same FPGA).
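
The link between stage count and oscillation frequency can be illustrated with the textbook first-order ring-oscillator model, f = 1 / (2 · N · t_d), where N is the number of stages and t_d the average delay per stage (logic plus routing). The sketch below inverts that model for the two approximate frequencies quoted above. The model and the function name `stage_delay_ns` are assumptions made for illustration; they are not part of the measurement flow described in this paper.

```python
def stage_delay_ns(freq_mhz: float, n_stages: int) -> float:
    """Average per-stage delay (ns) implied by f = 1 / (2 * N * t_d)."""
    if freq_mhz <= 0 or n_stages <= 0:
        raise ValueError("frequency and stage count must be positive")
    period_ns = 1e3 / freq_mhz          # one oscillation period in ns
    return period_ns / (2 * n_stages)   # a period is two traversals of the loop


# Approximate frequencies quoted in the text for the Spartan-6 sensors.
s1_delay = stage_delay_ns(350.0, 3)   # three-stage sensor S1
s2_delay = stage_delay_ns(900.0, 1)   # single-stage sensor S2
print(f"S1: {s1_delay:.3f} ns per stage, S2: {s2_delay:.3f} ns per stage")
print(f"S2/S1 frequency ratio: {900.0 / 350.0:.2f}")
```

Under this simplified model, both sensors imply a per-stage delay of roughly half a nanosecond, so the frequency gap between S1 and S2 comes mainly from the number of stages in the loop.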

To vary the SP of the internal sensor stages, the switching of the RO itself can be controlled. Usually, all the internal stages of the RO have a fixed input SP of 50% because the inverters toggle continuously between logic-0 and logic-1. Adding an external enable signal (En) to enable/disable the switching of the RO changes the input SPs of all the inverters accordingly. Based on that, an enable signal is added to both S1 and S2 to specify their internal SPs. A clock signal with a duty cycle of 10% is fed to this enable signal, which sets the internal SPs of S1 to 5%, 95% and 5% respectively, as shown in Figure 2, and the internal SP of S2 to 5%. It should be noted that this enable signal has a very low frequency (i.e., 10 kHz) compared to the frequency of the RO, such that it does not interfere with the functionality of the RO. In order to get the effect of the reverse combinations of input SPs, two other sensors are used: S3 as a counterpart to S1 with internal SPs of 95%, 5% and 95%, and S4 as a counterpart to S2 with an input SP of 95%, as shown in Figure 1.
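
The SP values quoted above follow from a simple time-weighted average: while the RO is enabled, each node spends about 50% of the time at logic-1, and while it is disabled, each node holds a fixed level. The following minimal sketch reproduces the 5%/95% figures under that assumption. The function name and the idle levels passed to it are illustrative assumptions inferred from the quoted SPs, not details taken from the sensor netlists.

```python
def node_signal_probabilities(enable_duty: float, idle_levels: list[int]) -> list[float]:
    """SP of each RO node: 50% while oscillating, the idle level while held."""
    if not 0.0 <= enable_duty <= 1.0:
        raise ValueError("enable_duty must be within [0, 1]")
    return [
        round(enable_duty * 0.5 + (1.0 - enable_duty) * level, 4)
        for level in idle_levels
    ]


# 10% enable duty cycle; idle levels are assumed from the SPs quoted in the text.
print(node_signal_probabilities(0.10, [0, 1, 0]))  # S1-like three-stage sensor
print(node_signal_probabilities(0.10, [1, 0, 1]))  # S3-like counterpart
print(node_signal_probabilities(0.10, [0]))        # S2-like single-stage sensor
```

With a 10% duty cycle, the oscillating phase contributes 0.10 × 0.5 = 5 percentage points, and the held level contributes either 0 or 90 points, which gives the 5% and 95% values.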

B. Implementation

To carry out the aging measurement, a Nexys 3 board carrying a Xilinx Spartan-6 FPGA (XC6SLX16-2CSG324) is used [10]. This FPGA is manufactured with a 45-nm process technology. The slices inside the Configurable Logic Blocks (CLBs) of this device can be divided into 3 different types: SliceX, SliceL, and SliceM. SliceXs are the basic slices and are composed of Look Up Tables (LUTs) and Flip-flops (FFs). SliceLs additionally include an arithmetic carry structure and wide multiplexers. SliceMs, which are the most complex ones, allow using the LUTs as distributed RAM and shift registers. Since these different types do not have the same resources, one can assume that they do not present exactly the same timing performance. In order to compare the measured frequencies among different locations, the sensors have to be implemented on the same type of slices. SliceXs were chosen since they represent one half of the available FPGA slices.

Similarly, the exact same configuration of LUT inputs is chosen, which means that the routing nets structure is the same for all the sensors. This is also to be sure that the same internal paths inside each LUT are used so a comparison between them is then possible. The sensors are implemented using Xilinx Design Language (XDL) description, then converted into Hard Macro (HM) to be sure that the resources used in the final design are those defined in the specifications.

In fact, ROs are PVT sensors (i.e., they are sensitive to Process, Voltage and Temperature variations). For this reason, 40 sensors of each type were implemented. First, the different types of sensors are placed homogeneously to guarantee a homogeneous thermal profile across the chip. Secondly, depending on their location on the floorplan, random and systematic process variations may affect the actual frequencies of the ROs. The different types of sensors are hence interleaved to average out process variations.

III. EXPERIMENTAL SETUP AND SCHEDULE

In this section, we describe our experimental setup, depicted in Figure 3, for both applying the accelerated lifetime conditions and monitoring the performance of the sensors before and after the stress. This is followed by an illustration for the schedule in which the experiment is applied.

A. Setup

In our setup, to eliminate the on-board sources of variation and to apply the accelerated lifetime conditions, the Nexys board was modified to provide direct access to the FPGA core voltage. The FPGA board is then supplied using an external high-precision dynamic voltage controller, and the temperature is regulated using a dynamically controllable thermal chamber (see Figure 3). For monitoring the performance of the sensors before and after the stress, an Electro-Magnetic (EM) method [9] is used to guarantee that only variations due to aging are captured. This is unlike related approaches that have on-chip communication modules, which are susceptible to intra-die variations that can influence the measurements. The monitoring is done by configuring the FPGA each time with a single sensor at a certain location and then capturing its frequency using EM analysis. This process is repeated for each possible location of each sensor type on the FPGA. In this way, a frequency cartography of the FPGA is built.
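
As a rough illustration of how an oscillation frequency can be read out of a sampled emission trace, the sketch below picks the strongest non-DC bin of a naive discrete Fourier transform. It is a generic spectral-peak estimate under stated assumptions, not the method of [9]: the function name, the sample rate and the synthetic trace are all hypothetical stand-ins for real probe data.

```python
import cmath
import math


def dominant_frequency_hz(samples: list[float], sample_rate_hz: float) -> float:
    """Return the frequency of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    if n < 4:
        raise ValueError("need at least four samples")
    mean = sum(samples) / n
    centered = [s - mean for s in samples]      # drop the DC component
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):                  # positive frequencies only
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(centered))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate_hz / n


# Synthetic stand-in for a captured trace: a 350 MHz tone sampled at 2.8 GS/s.
rate = 2.8e9
trace = [math.sin(2 * math.pi * 350e6 * i / rate) for i in range(256)]
print(dominant_frequency_hz(trace, rate) / 1e6, "MHz")
```

Repeating such an estimate for every sensor location would yield one frequency per position, which is the kind of data a frequency cartography is built from. The frequency resolution of this estimate is the sample rate divided by the number of samples, so a real measurement would need a much longer trace than this toy example.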

B. Schedule

The experimental schedule is as follows:

1) At the beginning (at Day 0), before stressing the FPGA circuit, a fresh characterization for the whole FPGA is performed under nominal operational conditions (i.e., with 1.2V and 25°C). This is done by placing S1, S2, S3 and S4 successively (only one sensor from each type at a time) at their pre-defined locations (40 positions for each sensor type) for capturing their fresh oscillating frequency.

2) Afterwards, the accelerated lifetime conditions are applied by exposing the FPGA to an elevated temperature and core voltage. The core was supplied with 1.8V (50% above its nominal value of 1.2V) using the aforementioned external power supply, and the FPGA was heated to 80°C using the thermal chamber, while the stress configuration was in operation. These conditions are applied for 7 days continuously without any interruption, to avoid any intermediate recovery of the sensors.

3) At Day 7, directly after the stress, the circuit is set back under the nominal conditions (1.2V and 25°C), and a full characterization is performed, exactly as in step 1, for all the sensors (S1 - S4). The FPGA is then completely powered-off for the rest of the day.

4) Step 3 is then repeated on a daily basis until Day 14.

IV. EXPERIMENTAL RESULTS AND ANALYSIS

As discussed in Section III, the FPGA is characterized before the stress (at Day 0) and after the stress (at Day 7), then on a daily basis during the recovery phase (at Day 8, Day 9, Day 12, Day 13 and Day 14). The main results and observations are discussed in the following section followed by a thorough analysis in Section IV-B.

A. Results

The frequency changes of each type of sensor (40 sensors of each type) during the experiment period are depicted in Figure 4. Unfortunately, for the sensors of type S4, the measurement data was corrupted after Day 8. This is because, at such high frequencies, post-processing with our equipment is more complex and requires considerable time and visual analysis. However, on a subset of points, the trend remained the same after 2 and 3 days. Therefore, for both S2 and S4 sensors the results are shown only until Day 8 to allow a comparison between these two types.

As can be observed in Figure 4, two distinct groups of frequencies appear for each sensor type. These result from mapping the sensors to different CLBs. As discussed in Section II-B, each CLB in the Spartan-6 FPGA contains two types of slices: either SliceX with SliceL or SliceX with SliceM. Although only SliceXs are chosen to map the sensors, the results show that the sensors mapped to the CLBs that contain SliceM beside SliceX are slower than those mapped to the CLBs that contain SliceL beside SliceX by about 7 to 9%. This is in line with the previous results of [9]. The other observation is that there is a clear performance variation between the sensors of each type.

Although the previous observations are interesting, the main observation in Figure 4 is the relatively large performance degradation after the stress for all the sensors. This degradation reaches 5.17% for some sensors of type S2 (see Figure 4).
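
For reference, a degradation figure of this kind can be expressed as the relative frequency loss between the fresh and the aged characterization of the same sensor. The sketch below assumes that definition; the function name and the two frequency readings are hypothetical placeholders chosen for illustration, not measured data.

```python
def degradation_percent(fresh_mhz: float, aged_mhz: float) -> float:
    """Relative frequency loss in percent; positive values mean the sensor slowed down."""
    if fresh_mhz <= 0:
        raise ValueError("fresh frequency must be positive")
    return (fresh_mhz - aged_mhz) / fresh_mhz * 100.0


# Hypothetical Day 0 / Day 7 readings for one sensor (not measured data).
fresh, aged = 900.0, 853.47
print(f"degradation: {degradation_percent(fresh, aged):.2f} %")
```

Because each sensor is compared only with its own fresh frequency at the same location, static process variation between locations cancels out of this ratio.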

B. Analysis

The results of the aging experiment, given in the previous section, show an aging extent of up to 5.17% after just one week of continuous stress. The effects of different parameters on this extent are analyzed in the following:

1) Effect of Input Signal Probability (SP): As mentioned in Section II-A, in terms of SPs, S1 is the counterpart of S3 and S2 is the counterpart of S4. Taking the measurements of Day 7 (directly after the stress), we find that S1 has higher aging than S3 by about 18% on average, and S2 has higher aging than S4 by about 10% on average. The measurements of the next day (Day 8) show that some recovery happens. This recovery was higher in S1 and S2 than in S3 and S4. As a result of the recovery, the aging of S1 and S3 becomes comparable. The same can be observed for S2 and S4. This trend continues until the end of the experiment at Day 14.

Based on the fact that the BTI mechanism is sensitive to SP changes and it is the only mechanism that has a recovery effect, these results show that the input SPs play a role in aging. However, more experiments are needed to verify this and to determine what is better in terms of aging, low input SPs or higher ones.

2) Effect of Switching Activity (SA): In terms of SA, S1 is the counterpart of S2 and S3 is the counterpart of S4 (see Section II-A). Both S2 and S4 are about 250% faster (i.e., have higher SAs) than their counterparts. However, the measurements at Day 7 (directly after the stress) show that S2 has only about 12% more aging on average than S1, and S4 has about 21% more aging on average than S3. After the recovery at Day 8, these differences become about 33% for S2 relative to S1 and about 24% for S4 relative to S3. This trend continues until the end of the experiment at Day 14.
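
Statements of the form "X has about N% more aging than Y on average" can be read as the relative difference between the mean degradations of two sensor groups. The sketch below assumes that reading; the function name and the per-sensor values are hypothetical placeholders, not data from this experiment.

```python
from statistics import mean


def excess_aging_percent(aging_a: list[float], aging_b: list[float]) -> float:
    """How much larger the mean degradation of group A is than that of group B, in percent."""
    mean_a, mean_b = mean(aging_a), mean(aging_b)
    if mean_b <= 0:
        raise ValueError("reference group must show positive mean degradation")
    return (mean_a - mean_b) / mean_b * 100.0


# Hypothetical per-sensor degradations in percent (placeholders, not measured data).
aging = {
    "S1": [4.0, 4.2, 3.8],
    "S2": [4.5, 4.4, 4.6],
}
print(f"S2 vs S1: {excess_aging_percent(aging['S2'], aging['S1']):.1f} % more aging")
```

The same comparison applies to the SP counterparts (S1 against S3, S2 against S4), which allows the SP and SA effects to be put on a common relative scale.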

These results show that the frequency change has a limited effect on aging for this FPGA technology. Another support for this conclusion is the results of aging for both CLBs that contain

[Figure 4: Frequency changes of the sensors of each type (S1 to S4) over the experiment period, with data series labeled by measurement day (Day 7, Day 8, Day 9, Day 12, Day 14).]

this technology node is less than the BTI effect.

3) Recovery Results: Among all sensors, only the S1 and S2
sensors show clear recovery. A possible reason could be that
there was not enough recovery time during the stress (the first 7
days) for these two types of sensors, while the other sensors
had such time to recover. In fact, since all sensors are ROs,
and the transistors under stress were switching, the stress was
of an AC type, not a DC type. This means that some BTI
relaxation was happening during the 7 days of stress for some of
the sensors. This may explain why the aging of S1 and S3 (S2
and S4) becomes comparable after Day 8. However, more
experiments are needed to verify this.

V. CONCLUSIONS

In this paper, we have presented an analysis of some of
the main parameters influencing the performance degradation
resulting from transistor aging in FPGAs. The analysis is based
on the results of stressing a Spartan-6 FPGA on which a set of
controlled ring-oscillator-based sensors with different lengths
and tunable activity control is implemented. Furthermore, a
novel monitoring method based on measuring the electromagnetic
emissions of the FPGA is used to accurately monitor the
performance of the sensors before and after the stress. The
results show a degradation of up to 5.17% in the performance of
the sensors after one week of stress. The following conclusions
are also drawn:

• Input SPs play a role in degradation.
• The input frequency (SA) also plays a role, but the impact
of operational frequency on aging was smaller than that of
SP. This suggests that BTI aging is the dominant factor
in this technology node compared to HCI.

More experiments are planned in the future for further analysis
of possible aging mitigation strategies based on the observations
of this paper.

REFERENCES

[1] N. Mehta, “Xilinx UltraScale Architecture for High-Performance,
Smarter Systems,” Xilinx White Paper WP434,
December 2013.

Generation Architecture for Your Next-Generation Architec-

nology Scaling and Reliability Challenges,” Microelectronics
to Nanoelectronics: Materials, Devices & Manufacturability,

[4] W. Wang, S. Yang, S. Bhardwaj, S. Vrudhula, F. Liu, and
Y. Cao, “The Impact of NBTI Effect on Combinational Circuit:
Modeling, Simulation, and Analysis,” Very Large Scale
Integration (VLSI) Systems, IEEE Transactions on, vol. 18,

[5] A. Bravaix, C. Guerin, V. Huard, D. Roy, J. Roux, and
E. Vincent, “Hot-carrier acceleration factors for low power
management in DC-AC stressed 40nm nMOS node at high
temperature,” in Reliability Physics Symposium, 2009 IEEE

“Degradation in FPGAs: measurement and modelling,” in
FPGA ’10: Proceedings of the 18th annual ACM/SIGDA
international symposium on Field programmable gate arrays.

Mitigation in FPGAs,” in Field Programmable Logic and
Applications (FPL), 2010 International Conference on, 31

[8] A. Maiti, L. McDougall, and P. Schaumont, “The Impact of
Aging on an FPGA-Based Physical Unclonable Function,” in
International Conference on Field Programmable Logic and
Applications (FPL), 2011, pp. 151 – 156.

Process Characterization Method for FPGAs Based on Electromagnetic
Analysis,” in FPL 21st International Conference
on Field Programmable Logic and Applications. IEEE,
Sep. 2011, pp. 20 – 23.