#### Highly Reliable System-on-Chip using Dynamical Reconfigurable FPGAs

#### Boyang Du, David Merodio Codinachs, Luca Sterpone







Politecnico di Torino, CAD Group, Dipartimento di Automatica e Informatica, Torino, Italy

#### Goal

- Analysis of Single Event Upset (SEU) sensitivity of SRAM-based FPGAs
  - Identification of Single Points of Failure (SPFs)
  - Error rate estimation
- Mitigation

#### Outline

- Introduction: SEU scenario
- Verification and Error Rate Integrated tool (VERI-Place)
  - Configuration memory Database
  - Execution flow
  - Results and Classification
- Experimental results
  - Radiation test and Fault Injection
- Conclusions and future works


#### The bitstream

#### The original netlist






#### The bitstream



#### The corrupted netlist






### SEU scenario: accumulation of 2 bit flips

#### The bitstream

Two configuration bits flip: $0 \rightarrow 1$ and $0 \rightarrow 1$.

#### The corrupted netlist

A short circuit is created and other effects are inserted.

- SEU within the configuration memory
  - FPGA resource not affected : NO ERROR
  - FPGA resource affected : **ERROR**
- SEU induced architectural modification
  - Logic Element: LUT, MUX, FF Config
  - Interconnections: Switchbox
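
The classification above can be sketched in a few lines. This is only a toy model under stated assumptions: the configuration memory is a small `bytearray`, and `used_bits` is a hypothetical set of bit indices that configure resources used by the design. It follows the slide's rule (affected resource means error) and is not the authors' tool.

```python
def inject_seu(bitstream: bytearray, bit_index: int) -> None:
    """Flip one configuration bit in place (models a single SEU)."""
    byte_index, bit_offset = divmod(bit_index, 8)
    bitstream[byte_index] ^= 1 << bit_offset


def classify_seu(bit_index: int, used_bits: set) -> str:
    """Apply the slide's rule: an SEU matters only if it hits a used resource."""
    return "ERROR" if bit_index in used_bits else "NO ERROR"


bitstream = bytearray(4)   # toy 32-bit configuration memory, all zeros
used_bits = {3, 17}        # hypothetical bits that configure used resources

inject_seu(bitstream, 17)
print(bitstream.hex(), classify_seu(17, used_bits))  # 00000200 ERROR
print(classify_seu(5, used_bits))                    # NO ERROR
```
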





[Figure: FPGA array and FPGA configuration memory, shown along the execution time axis]





[Figure: FPGA array and FPGA configuration memory along the execution time axis, with an SEU effect marked]





[Figure: FPGA configuration memory and FPGA array along the execution time axis. SEU effect: No error]





[Figure: FPGA configuration memory and FPGA array along the execution time axis. SEU effect: No error; SEU effect: Error]



[Figure: FPGA array and FPGA configuration memory along the execution time axis, with SEU effects and scrub cycles marked]





[Figure: FPGA configuration memory along the execution time axis, with SEU effects and scrub cycles marked. Labels: **SEU effect: Error**; TMR (M0, M1, M2): masked error]

The application of Netlist-based TMR and scrubbing is an effective solution

Drawbacks: power consumption and functional availability
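
As a rough illustration of why TMR masks a single faulty module, the sketch below implements a bitwise 2-out-of-3 majority voter. The names `m0`, `m1`, `m2` mirror the module labels M0, M1, M2 on the slide; the output values are made up for the example.

```python
def majority_vote(m0: int, m1: int, m2: int) -> int:
    """Bitwise 2-out-of-3 majority of the three module outputs."""
    return (m0 & m1) | (m0 & m2) | (m1 & m2)


golden = 0b1011
faulty = golden ^ 0b0100  # an SEU corrupts one output bit of a module

# One corrupted module: the voter masks the error.
single_fault = majority_vote(golden, faulty, golden)

# The same bit corrupted in two modules: the voter output is wrong.
double_fault = majority_vote(golden, faulty, faulty)

print(single_fault == golden, double_fault == golden)  # True False
```

The second case suggests why scrubbing is paired with TMR: upsets that accumulate in more than one module between scrub cycles are no longer masked.
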






Errors affecting the circuit outputs happen at the same time

Probability of SEU location
 Avoiding SPFs: TMR is a MUST

### The proposal: VERI-Place tool

- Measurement of the Application Error Probability (AEP)
  - Number of SEUs in the FPGA configuration memory until an output error is observed
- Analysis of different design techniques
  - Fault tolerance (DWC, TMR, XTMR,...)
  - Static
  - Dynamic
  - Partial and dynamic
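
To make the AEP definition concrete, here is a minimal Monte Carlo sketch. It assumes a deliberately simplified model that is not taken from the slides: SEUs hit uniformly random bits, and an output error appears as soon as one SEU lands on a bit from a hypothetical set `critical_bits`. VERI-Place itself works on the real configuration memory and netlist; this only illustrates the "count SEUs until the first error" idea.

```python
import random


def seus_until_error(total_bits, critical_bits, rng):
    """Inject random SEUs and count them until one hits a critical bit."""
    count = 0
    while True:
        count += 1
        if rng.randrange(total_bits) in critical_bits:
            return count


def estimate_mean_seus(total_bits, critical_bits, runs=2000, seed=0):
    """Average number of SEUs to the first output error over many runs."""
    rng = random.Random(seed)
    total = sum(seus_until_error(total_bits, critical_bits, rng)
                for _ in range(runs))
    return total / runs


total_bits = 10_000
critical_bits = set(range(100))  # hypothetical: 1% of the bits are critical
mean_seus = estimate_mean_seus(total_bits, critical_bits)
print(f"mean SEUs until first error: {mean_seus:.1f}")
```

With 1% critical bits the estimate should land near 100 SEUs, the mean of a geometric distribution with p = 0.01. A hardened design shrinks the critical set and so raises this number.
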




### Configuration memory DB

| Device Name      | Total PIP [#] | Effective PIP [#] |
|------------------|---------------|-------------------|
| XC5VLX50T-FF1136 | 18,975,457    | 15,695            |
| XC7K70E-2FBG676  | 29,466,958    | 21,081            |
| XC7K325T-2FBG900 | 123,919,224   | 21,081            |

- PIPs of the whole FPGA architecture
- PIP replicas are located at different CLB positions
- A PIP requires about 30-35 seconds to be decoded
  - Decoding every PIP of the XC7K70E would require about 32 years!
- The FPGA array is regular, apart from *specific* architectural PIPs
  - The unique PIPs of a given FPGA device are distinguishable
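
The "about 32 years" figure can be checked with simple arithmetic. The sketch below uses the total PIP count from the table and the 30-35 seconds per PIP quoted above; the function name is only illustrative.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def decode_years(pip_count, seconds_per_pip):
    """Time to decode every PIP one by one, expressed in years."""
    return pip_count * seconds_per_pip / SECONDS_PER_YEAR


total_pips = 29_466_958  # XC7K70E-2FBG676, total PIPs from the table
low = decode_years(total_pips, 30)
high = decode_years(total_pips, 35)
print(f"{low:.1f} to {high:.1f} years")  # 28.0 to 32.7 years
```

The upper end of the range matches the slide's estimate, which is what motivates decoding only the effective PIPs.
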

## Configuration memory DB

| Device Name      | Effective PIPs [#] | Decoding [days] |
|------------------|--------------------|-----------------|
| XC5VLX50T-FF1136 | 15,695             | ≈6.5            |
| XC7K70E-2FBG676  | 21,081             | ≈8.6            |
| XC7K325T-2FBG900 | 21,081             | ≈12.3           |

- The decoding is performed on the effective PIPs only
- The coding of all remaining PIPs is generated by calculating the configuration memory offset between different CLBs
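
The offset idea can be sketched as follows. The reference bits are those of the `SE2END2 -> FAN7` PIP in tile `INT_X0Y119` given in this deck; the per-tile offsets are hypothetical placeholders, and a constant offset per tile is an assumption of this sketch. Real devices address configuration memory by frames, so the actual offset calculation is more involved.

```python
def replicate_pip_bits(reference_bits, tile_offsets):
    """Shift the decoded bits of one reference PIP to each replica tile."""
    return {tile: [bit + offset for bit in reference_bits]
            for tile, offset in tile_offsets.items()}


# Decoded once, for the effective PIP in the reference tile.
reference_bits = [3_693_213, 3_693_214, 3_693_831, 3_693_835]

# Hypothetical offsets (in bits) from the reference tile to its replicas.
tile_offsets = {"INT_X0Y119": 0, "INT_X0Y118": 1_312}

replicas = replicate_pip_bits(reference_bits, tile_offsets)
print(replicas["INT_X0Y118"])
```
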



[Screenshot: Xilinx FPGA Editor, test.ncd, Array1 view]



#### PIP INT\_X0Y119 SE2END2 -> FAN7

- Configuration memory bits: 3,693,213; 3,693,214; 3,693,831; 3,693,835
- 4 configuration memory bits. For the PIP to exist, these bits must be fixed at logic value '1'

#### PIP INT\_X0Y119 EL2BEG0 -> BYP4

- Configuration memory bits: 3,693,214; 3,693,813; 3,693,216; 3,693,810
- 4 configuration memory bits. For the PIP to exist, these bits must be fixed at logic value '1'






### 1-bit controlling multiple PIPs
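
Using the two PIPs of tile `INT_X0Y119` decoded in this deck, a short sketch can find which configuration bits are shared by more than one PIP. The dictionary layout is an assumption made for illustration, not the format of the VERI-Place database.

```python
from collections import defaultdict

# Bit addresses as listed for the two example PIPs.
pip_bits = {
    "INT_X0Y119 SE2END2 -> FAN7": {3693213, 3693214, 3693831, 3693835},
    "INT_X0Y119 EL2BEG0 -> BYP4": {3693214, 3693813, 3693216, 3693810},
}


def bits_shared_by_pips(pip_bits):
    """Invert the PIP -> bits map and keep bits used by several PIPs."""
    bit_to_pips = defaultdict(list)
    for pip, bits in pip_bits.items():
        for bit in bits:
            bit_to_pips[bit].append(pip)
    return {bit: sorted(pips)
            for bit, pips in bit_to_pips.items() if len(pips) > 1}


shared = bits_shared_by_pips(pip_bits)
print(shared)
```

Here bit 3693214 appears in both PIPs, so a single upset on it can affect two routing resources at once.
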







- Identification of all the architecturally relevant sensitive bits
  - If affected, these configuration memory bits may change the physical structure of the circuit


Identification of the configuration memory bits that, if affected, generate a Single Point of Failure (SPF)

-- Bit Reference 13661788 Location X 60 Y 186
-- TMR Sensitive ID 1 X 60 Y 186
-- Domain 0 - Net 1: net "uAHBUART/count\_reg\_TR0<20>"
-- PIP 1: pip INT\_X60Y186 BYP\_BOUNCE5 -> IMUX\_B29
-- Domain 1 - Net 2: net "uAHBUART/count\_reg\_TR1<20>"
-- PIP 2: pip INT\_X60Y186 WL2BEG1 -> IMUX\_B26
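
The report above flags one bit that relates PIPs on nets of two different TMR domains. A minimal sketch of such a check is given below. The data structures and the second bit entry are hypothetical, and the rule used (a bit is TMR sensitive when its PIPs belong to at least two domains) is an assumption about the analysis, not a description of VERI-Place internals.

```python
def tmr_sensitive_bits(bit_to_pips, pip_domain):
    """Return the bits whose PIPs touch nets of at least two TMR domains."""
    sensitive = {}
    for bit, pips in bit_to_pips.items():
        domains = {pip_domain[pip] for pip in pips}
        if len(domains) >= 2:
            sensitive[bit] = sorted(domains)
    return sensitive


PIP_1 = "INT_X60Y186 BYP_BOUNCE5 -> IMUX_B29"  # on a Domain 0 net
PIP_2 = "INT_X60Y186 WL2BEG1 -> IMUX_B26"      # on a Domain 1 net

bit_to_pips = {
    13661788: [PIP_1, PIP_2],  # bit from the report above
    13661790: [PIP_1],         # hypothetical bit touching one domain only
}
pip_domain = {PIP_1: 0, PIP_2: 1}

spf_bits = tmr_sensitive_bits(bit_to_pips, pip_domain)
print(spf_bits)  # {13661788: [0, 1]}
```
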

Calculation of the Application Error Probability




#### B13 from ITC'99: an interface to a meteo sensor

#### B13 circuit characteristics

|      | VHDL    |             | Gate level |       |    |    |    | Fault list |           |
|------|---------|-------------|------------|-------|----|----|----|------------|-----------|
| Name | # Lines | # Processes | Type       | Gates | PI | PO | FF | Complete   | Collapsed |
|      | 200     | -           | std        | 362   | 10 | 10 | 53 | 1,906      | 830       |
| B13  | 296     | 5           | opt        | 317   | 10 | 10 | 53 | 1,694      | 77        |

#### **B13 Test Patterns details**

| Circuit | # Sequences | # Vectors | Fault Coverage [%] | Faults Detected | Total Faults |
|---------|-------------|-----------|--------------------|-----------------|--------------|
| B13     | 5           | 7639      | 81.27              | 1341            | 1650         |
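
As a quick consistency check, the fault coverage in the table follows from the detected and total fault counts, assuming the usual definition of coverage as detected faults over total faults:

$$
\text{Fault coverage} = \frac{\text{faults detected}}{\text{total faults}} = \frac{1341}{1650} \approx 0.8127 = 81.27\%
$$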


Area occupation on Xilinx Virtex-5 LX50T FPGA

| Circuit  | Resource / Design topology | PLAIN                  | XTMR                      | VP-XTMR                   |
|----------|----------------------------|------------------------|---------------------------|---------------------------|
| B13      | Slice FF                   | 62/28800 - 1%          | 147/28800 - 1%            | 147/28800 - 1%            |
|          | Slice LUTs                 | 84/28800 - 1%          | 369/28800 - 1%            | 369/28800 - 1%            |
|          | Slice distribution         | 42/7200 - 1%           | 177/7200 - 2%             | 177/7200 - 2%             |
| B13 x 30 | Slice FF                   | 1590/28800 - **5%**    | 4770/28800 - **16%**      | 4770/28800 - **16%**      |
|          | Slice LUTs                 | 1830/28800 - **6%**    | 10,841/28800 - **37%**    | 10,841/28800 - **37%**    |
|          | Slice distribution         | 827/7200 - **11%**     | 4791/7200 - **66%**       | 4791/7200 - **66%**       |

- Three design topologies were tested at Los Alamos
  - **PLAIN**: b13x30 without any type of hardening
  - **XTMR**: Triple Modular Redundancy version with the converge option for output pins, generated with Xilinx TMRTool 2.1.76
  - **VP-XTMR**: hardened version of XTMR with replacement constraints generated by the VERI-Place tool

#### B13x30 – PLAIN



#### B13x30 – VP-XTMR







with Isolation Design Flow




### Test Methodology

[Flowchart: Start Impact & **XMD** process; Flash Zynq; Start monitors]



**RT: PLAIN - XTMR - VP-XTMR**

[Plot: axis label "Upset"; annotation "breakeven point"]

#### Experimental results – Plain Prediction

b13\_x30 plain



#### Experimental results – XTMR Prediction

B13\_x30\_xtmr



[Plot axis label: number of upsets]

#### Experimental results – ARM-M0

- The ARM-M0 processor was tested at PSI
  - Available proton flux: $7.22 \times 10^{6}\ \mathrm{p/(cm^2\,s)}$
  - Working frequency: 50 MHz
  - Software: bubble sort

| Design Version | LUTs [#]     | FFs [#]    | BRAM [#] |
|----------------|--------------|------------|----------|
| Plain          | 3563 (12%)   | 961 (3%)   | 4 (6%)   |
| XTMR           | 13,229 (45%) | 2887 (10%) | 12 (20%) |
| VP-XTMR        | 13,229 (45%) | 2887 (10%) | 12 (20%) |

#### Experimental results – ARM Plain prediction




#### Experimental results – ARM XTMR Prediction




#### Experimental results – ARM overall results



#### Conclusions and future works

- VERI-Place for Virtex-5 LX50T is available online
  - Fault injection tests executed
  - Radiation tests validate VERI-Place
  - Specific versions released to some users
- VERI-Place is available for the Zynq family
- VERI-Place for Kintex-7 XC7K325T is available upon request
  - Fault injection is ongoing
  - Radiation test is planned

#### Thank you!

luca.sterpone@polito.it