Prof. Dr.-Ing. Holger Blume, 06.06.2017 Seite 1
New Perspectives for Hearing Aid Hardware Design
Prof. Dr.-Ing. Holger Blume, Jun.-Prof. Dr.-Ing. Guillermo Payá Vayá, Dipl.-Ing. Lukas Gerlach, M.Sc. Christopher Seifert
Institut für Mikroelektronische Systeme, Leibniz Universität Hannover
Contents
• Motivation
• Basics of power consumption
• Concept of an application-tailored processor architecture
• Exemplary results for architecture optimization
• Remaining challenges
• New processor design project Smart HeaP
• Summary
Motivation: Digital Hearing Aid Systems
Hearing aid technology requirements
• Low processing delay: <10 ms
• Programmability / flexibility
• Small form factor (higher user acceptance)
[Figure: hearing aid system with Microphone, Digital Signal Processor, Battery, and Speaker]
[Figure: architecture classes μP, DSP, ASIP, Dedic. HW Arch. Semi-Custom, and Dedic. HW Arch. Custom, arranged by Programmability, Performance, Power Consumption, and Hardware cost; label: efficient]
Design of ASIPs for Digital Hearing Aid Systems (1 mW, 1 mm²)
Different Influencing Factors on Power Consumption
• Architecture optimization
• Custom instructions
• Co-processor
Total Power Consumption
$P = \sum_i \sigma_i f_i \left( C_i U_i^2 + W_{SC,i} \right) + P_{DC}$
Influencing levels: Architecture, Design, Circuit technology; Technology, Circuit technology, Supply voltages
[Figure: CMOS inverter with supply voltage $U_{DD}$, input $U_{in}$, output $U_{out}$, and load capacitance $C$; annotations on the switching activity for "0"→"1" and "1"→"0" transitions and on glitches]
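Low-Power Hearing Aid ASIPs
Real-Time Processing Constraints ($t_c$)
RISC Processor:
• Area: $A_a$
• Clock frequency: $f_a = N / t_c$
• Dynamic power: $P_{dyn,a} = f_a C_a U^2$
RISC Processor + Custom FUs:
• Area: $A_b = 2 A_a$
• Clock frequency: $f_b = \frac{1}{4} f_a$
• Dynamic power: $P_{dyn,b} = \frac{1}{2} f_a C_a U^2$
[Figure: processing time of both processors on a time axis against the real-time constraint $t_c$]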
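The factor 1/2 in the dynamic power of the RISC processor with custom FUs can be retraced in one line. The sketch below assumes that the switched capacitance scales with the area, so that $C_b = 2 C_a$ (this link between area and capacitance is an assumption made here; the slide states only the results), and that the supply voltage $U$ stays unchanged.

```latex
\begin{align*}
P_{dyn,b} &= f_b \, C_b \, U^2 \\
          &= \tfrac{1}{4} f_a \cdot 2 C_a \cdot U^2 \\
          &= \tfrac{1}{2} f_a C_a U^2 = \tfrac{1}{2} P_{dyn,a}
\end{align*}
```

Because the dynamic power depends quadratically on $U$, any supply voltage reduction made possible by the lower clock frequency would lower $P_{dyn,b}$ further.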
Baseline Architecture
ASIP: Application-Specific Instruction-Set Processor
Basic Architecture Parameters
• Register File Configuration
• Memory System
• Instruction-Set Architecture
Parallelization Techniques
• Number of Parallel Instructions
• SIMD / Subword Parallelism
Specialization Techniques
• Custom Instructions
• Co-processor Architectures
Compiler / Software Support
[Figure: pipeline block diagram, built up step by step: Instruction Memory / Cache with PC; Instruction Decoders for Issue 0, Issue 1, and Issue 2; Register File (number of ports and registers); pipeline registers IF/DE, DE/RA, RA/EX1, EX1/WB; functional units ALU, SIMD, MAC, and SFUs; Data Memory / Cache; Co-Processor]
Xtensa Customizable Processor / Cadence
• Baseline Architecture
    • Reduced 32-bit ISA, 5 pipeline stages
    • 16 KB Instruction Cache and 16 KB Memory Cache
• Configurable
    • Caches, bus width, GP register file, MUL, MAC, INT, number of load/store units
• Expandable
    • New instructions, registers, ports
    • Using TIE language (similar to Verilog)
• Area and energy optimization are possible
[www.cadence.com]
Extension of the Xtensa Processor with hardware units
• Using TIE language (Tensilica Instruction Extension)
• Custom instructions, registers and interfaces
• Can be used in the C program code
• SIMD example: 4x 16-bit additions

Definition of a custom instruction (vec4_add16.tie):

    regfile simd64 64 16 v   // 16 x 64-bit registers
    operation vec4_add16 {out simd64 sum, in simd64 A, in simd64 B} {} {
        wire [15:0] result0 = (A[15: 0] + B[15: 0]);
        wire [15:0] result1 = (A[31:16] + B[31:16]);
        wire [15:0] result2 = (A[47:32] + B[47:32]);
        wire [15:0] result3 = (A[63:48] + B[63:48]);
        assign sum = {result3, result2, result1, result0};
    }

Using the custom instruction in C (use_vec4_add16.c):

    #include <xtensa/tie/vec4_add16.h>
    simd64 A[VECLEN];
    simd64 B[VECLEN];
    simd64 sum[VECLEN];
    for (i=0; i<VECLEN; i++)
        sum[i] = vec4_add16(A[i],B[i]);
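For readers without the Xtensa toolchain, the behaviour of the vec4_add16 operation can be retraced in portable C. The following is a minimal sketch, not Cadence code: the function name `vec4_add16_emul` and the use of a plain `uint64_t` in place of the `simd64` register type are assumptions made for illustration.

```c
#include <stdint.h>

/* Portable model of the vec4_add16 custom instruction (hypothetical helper,
 * not part of any Xtensa header): four independent 16-bit additions packed
 * into one 64-bit word. Each lane wraps modulo 2^16, as the 16-bit wires
 * in the TIE description do. */
uint64_t vec4_add16_emul(uint64_t a, uint64_t b)
{
    uint64_t sum = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint16_t x = (uint16_t)(a >> (16 * lane));
        uint16_t y = (uint16_t)(b >> (16 * lane));
        uint16_t r = (uint16_t)(x + y);      /* carry out of the lane is dropped */
        sum |= (uint64_t)r << (16 * lane);   /* place the result in its lane */
    }
    return sum;
}
```

On a processor without the extension this loop costs several instructions per lane, whereas the custom instruction performs all four additions in a single operation, which is the point of the SIMD example.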
Xtensa Customizable Processor / Cadence
Configurations Implemented for the HA System
• Baseline (1-Issue-Slot)
• Baseline (2-Issue-Slots)
• Baseline (3-Issue-Slots)
• Customized (1-Issue-Slot)
• Customized (2-Issue-Slots)
The configurations explore parallelism, specialization, and the combination of parallelism and specialization.
[www.tensilica.com]
Customized Configuration: Complex Instruction Extensions
[Figure: total number of cycles per audio buffer (axis 0 to 20000) for the stages Analysis Filterbank, Noise Reduction, Amplification, and Synthesis Filterbank; annotations: 50% FFT, 50% IFFT, 65% SQRT, Overlap+Add]
• New Register File for SIMD-Op. (each register 64 bits)
• COMPLEX ARITHMETIC Op.
    R0 = COMPLEX_ADD(R1,R2)
    R0 = COMPLEX_MUL(R1,R2)
    R0 = COMPLEX_CONJ(R1)
    AR0 = BIT_REVERSE(AR1)
COMPLEX_ADD (two ADD units): $C.real = A.real + B.real$, $C.img = A.img + B.img$
COMPLEX_MUL (four MUL units, one SUB, one ADD): $C.real = A.real \cdot B.real - A.img \cdot B.img$, $C.img = A.real \cdot B.img + A.img \cdot B.real$
[Figure: total number of cycles per audio buffer (axis 0 to 20000) for the stages Analysis Filterbank, Noise Reduction, Amplification, and Synthesis Filterbank; annotations: x2.2 reduction (twice), 65% SQRT]
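The data flow of the complex arithmetic operations can be written down as a small C reference model. This is only a sketch under assumptions: the slides state that each register holds 64 bits, but the split into two 32-bit halves, the absence of scaling or saturation, and all C names (`cplx32`, `cplx_make`, `complex_add`, `complex_mul`, `complex_conj`, `bit_reverse` with an explicit bit count) are hypothetical choices made here.

```c
#include <stdint.h>

/* Assumed register layout: one complex value per 64-bit register,
 * stored as two 32-bit integers. Inputs must be small enough that
 * sums and products fit into 32 bits; no saturation is modelled. */
typedef struct {
    int32_t real;
    int32_t img;
} cplx32;

cplx32 cplx_make(int32_t real, int32_t img)
{
    cplx32 c = { real, img };
    return c;
}

/* COMPLEX_ADD: two additions in parallel. */
cplx32 complex_add(cplx32 a, cplx32 b)
{
    return cplx_make(a.real + b.real, a.img + b.img);
}

/* COMPLEX_MUL: four multiplications, then one SUB and one ADD. */
cplx32 complex_mul(cplx32 a, cplx32 b)
{
    int32_t rr = a.real * b.real;
    int32_t ri = a.real * b.img;
    int32_t ir = a.img * b.real;
    int32_t ii = a.img * b.img;
    return cplx_make(rr - ii, ri + ir);
}

/* COMPLEX_CONJ: negate the imaginary part. */
cplx32 complex_conj(cplx32 a)
{
    return cplx_make(a.real, -a.img);
}

/* BIT_REVERSE: reverse the lowest nbits bits of an index, as used for
 * FFT input or output reordering. */
uint32_t bit_reverse(uint32_t x, unsigned nbits)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < nbits; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}
```

In software, one complex multiplication costs four multiplications and two additions plus the operand unpacking; folding this into a single operation is a plausible source of the cycle reduction shown for the FFT-based stages.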
• SQRT Operations
    LEADING_ONES(R0)
    R0 = SQUARE_ROOT(R1)
    R0 = THRESHOLD(R0,R1,R2)
• Newton-Raphson method for square root computation
[Figure: total number of cycles per audio buffer (axis 0 to 20000) for the stages Analysis Filterbank, Noise Reduction, Amplification, and Synthesis Filterbank; annotations: x16 reduction, 65% SQRT]
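The Newton-Raphson iteration named above can be illustrated with an integer square root in C. This is a minimal sketch and not the SQUARE_ROOT hardware operation itself: the function name `isqrt_newton`, the 32-bit integer format, and the choice of the initial guess from the position of the most significant bit are assumptions made for illustration.

```c
#include <stdint.h>

/* Integer square root, floor(sqrt(n)), by Newton-Raphson iteration
 * x_{k+1} = (x_k + n / x_k) / 2 (hypothetical reference model). */
uint32_t isqrt_newton(uint32_t n)
{
    if (n < 2)
        return n;

    /* Count the significant bits of n to pick an initial guess that is
     * a power of two and not smaller than sqrt(n). */
    unsigned bits = 0;
    for (uint32_t t = n; t != 0; t >>= 1)
        bits++;
    uint32_t x = (uint32_t)1 << ((bits + 1) / 2);

    /* Starting above the root, the sequence decreases monotonically;
     * the first non-decreasing step marks floor(sqrt(n)). */
    for (;;) {
        uint32_t y = (x + n / x) / 2;
        if (y >= x)
            return x;
        x = y;
    }
}
```

A good initial guess keeps the number of iterations small, which is why a bit-counting step before the iteration is useful; whether the LEADING_ONES operation serves this purpose in the actual design is not stated on the slide.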
Exemplary Hearing Aid Processing - ASIP Design Space Exploration
[Figure: estimated total power consumption (mW, axis 0 to 7) over clock frequency (MHz, axis 0 to 70) for Baseline (1-Issue-Slot), Baseline (2-Issue-Slots), Baseline (3-Issue-Slots), Customized (1-Issue-Slot), and Customized (2-Issue-Slots)]
[Figure: estimated silicon area (mm², axis 0 to 0.7) for the same five configurations, broken down into Memory, Core, Customized RF, and Customized Operations; annotated factors: x1.1, x1.2]
These estimations are done for a 40 nm low power technology process
[Werner; Payá Vayá, Blume, “Case Study: Using the Xtensa LX4 Configurable Processor for Hearing Aid Applications”, ICT.OPEN 2013]
Exemplary Hearing Aid Processing - ASIP Design Space Exploration
[Figure: estimated silicon area (mm², axis 0 to 0.7) for Baseline (1-Issue-Slot), Baseline (2-Issue-Slots), Baseline (3-Issue-Slots), Customized (1-Issue-Slot), and Customized (2-Issue-Slots), broken down into Memory, Core, Customized RF, and Customized Operations]
[Figure: estimated total power consumption (mW, axis 0 to 7) over clock frequency (MHz, axis 0 to 70) for the five configurations; annotated factors: x3.2, x1.1, x1.4, x1.2, x1.5]
Digital Signal Processing Algorithms and Number Systems
Signal Processing Algorithms
• Time domain (real numbers): filter, convolution, compression, …
• Transformations: FFT, …
• Frequency domain (complex numbers): filter, correlation, mixer, …
Hardware architectures for signal processing should be optimized for both number systems for performance and efficiency reasons.
Real- and Complex-Valued Multiply-Accumulate (MAC) Functional Unit Implementation
• A combination of real- and complex-valued arithmetic in one SIMD multiply-accumulate (MAC) functional unit
• Flexible switching between operations
• Reuse of hardware multipliers for different operations
[Figure: pipeline block diagram with Instruction Memory / Cache, PC, Instruction Decoders for Issue 0 to Issue 2, Register File, pipeline registers IF/DE, DE/RA, RA/EX1, EX1/WB, functional units ALU, SIMD, CMAC, and SFUs, and Data Memory / Cache]
[Gerlach, L.; Payá Vayá, G.; Blume, H.: An Area Efficient Real- and Complex-Valued Multiply-Accumulate SIMD Unit for Digital Signal Processors, SiPS 2015]
Real- and Complex-Valued Multiply-Accumulate (MAC) Functional Unit Implementation
Processor: Kavuaka, 32-point FFT

Configuration                                                 Cycles                  Core Area
Real-valued SIMD MAC                                          570                     0.237 mm²
Real- and Complex-valued SIMD MAC and Butterfly Operations    135 (Speedup: 4.22x)    0.255 mm² (Overhead: 7%)

[Figure: Butterfly 8 bit mode]
[Figure: FFT Performance of current DSP Architectures]
Custom Coprocessors for Hearing Aids
Example of a Co-Processor: CORDIC (COordinate Rotation DIgital Computer) coprocessor
Advantages:
• High flexibility
• High accuracy
• Fast computation compared to other approximation algorithms
• Reduced memory requirement compared to look-up-table interpolation
[Figure: Co-Processor Architecture: ASIP with IM, DM, DMA, Audio Interface, Serial Interface, and an array of processing elements (PE)]
Custom Coprocessors for Hearing Aids
CORDIC (COordinate Rotation DIgital Computer) coprocessor
• Fast computation of non-linear functions
• Hyperbolic and trigonometric operations: sine, cosine, exponential, natural logarithm, square root
• Overall speedup of hardware CORDIC compared to software CORDIC:
    • Binaural feedback suppression: 9.62x
    • Localization: 3x
[Figure: cycles per operation for KAVUAKA+CORDIC (HW), KAVUAKA (SW), and TI TMS320C6478; bar labels: 71621, 1259, 71621, 1523, 76668, 1529, 56664, 1134, 59649, 341; axis marks: 10%, 100%, 140%]
[Gerlach, L.; Nolting, S.; Blume, H.; Payá Vayá, G.; Stolberg, H.; Reuter, C.: A Highly Optimized Arithmetic Software Library and Hardware Co-processor IP for Fixed-Point VLIW-SIMD Processor Architectures, Technology Transfer in Computing Systems (TETRACOM Technology Transfer Project (TTP), 2016)]
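To make the CORDIC principle concrete, the rotation mode for sine and cosine can be sketched in C. This is an illustration only, not the KAVUAKA co-processor or its software library: the function names, the 16 iterations, and the use of `double` arithmetic are assumptions. A hardware implementation would typically use fixed-point values, where the multiplications by 2^-i become shifts.

```c
/* Rotation-mode CORDIC for sine and cosine (hypothetical sketch).
 * Valid for |angle| up to about 1.74 rad, the sum of the table angles. */
#define CORDIC_ITERS 16

/* atan(2^-i) in radians for i = 0 .. 15 */
static const double cordic_atan[CORDIC_ITERS] = {
    0.7853981634, 0.4636476090, 0.2449786631, 0.1243549945,
    0.0624188100, 0.0312398334, 0.0156237286, 0.0078123411,
    0.0039062301, 0.0019531225, 0.0009765622, 0.0004882812,
    0.0002441406, 0.0001220703, 0.0000610352, 0.0000305176
};

/* Reciprocal of the accumulated CORDIC gain, so the result has unit length. */
static const double cordic_gain_inv = 0.6072529350;

void cordic_sincos(double angle, double *sin_out, double *cos_out)
{
    double x = cordic_gain_inv;  /* start on the x axis, pre-scaled */
    double y = 0.0;
    double z = angle;            /* remaining angle to rotate by */
    double pow2 = 1.0;           /* 2^-i, a right shift in fixed-point hardware */

    for (int i = 0; i < CORDIC_ITERS; i++) {
        double x_new, y_new;
        if (z >= 0.0) {          /* rotate counter-clockwise by atan(2^-i) */
            x_new = x - y * pow2;
            y_new = y + x * pow2;
            z -= cordic_atan[i];
        } else {                 /* rotate clockwise by atan(2^-i) */
            x_new = x + y * pow2;
            y_new = y - x * pow2;
            z += cordic_atan[i];
        }
        x = x_new;
        y = y_new;
        pow2 *= 0.5;
    }
    *cos_out = x;
    *sin_out = y;
}

double cordic_sin(double angle)
{
    double s, c;
    cordic_sincos(angle, &s, &c);
    return s;
}

double cordic_cos(double angle)
{
    double s, c;
    cordic_sincos(angle, &s, &c);
    return c;
}
```

Only a small table of angles is stored, which reflects the reduced memory requirement compared to look-up-table interpolation listed among the advantages. The hyperbolic mode used for exponential, logarithm, and square root follows the same shift-and-add pattern but is not shown here.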
Exemplary Hearing Aid Processing - ASIP Design Space Exploration
[Figure: estimated silicon area (mm², axis 0 to 0.7) for Baseline (1-Issue-Slot), Baseline (2-Issue-Slots), Baseline (3-Issue-Slots), Customized (1-Issue-Slot), Customized (2-Issue-Slots), and KAVUAKA, broken down into Memory, Core, Customized RF, and Customized Operations; annotated factor: x0.5]
[Figure: estimated total power consumption (mW, axis 0 to 7) over clock frequency (MHz, axis 0 to 70) for the same configurations and KAVUAKA; annotated factor: x0.26]
These estimations are done for a 40 nm low power technology process
State-of-the-art Hearing Aid Systems

ON Semiconductor® Ezairo® 7100 (http://www.onsemi.com)
• 65 nm TSMC mixed signal process
• Peak power consumption: 1-2 mW
• Average power consumption: less than 1 mW

C. Chen et al., "A 1V, 1.1mW mixed-signal hearing aid SoC in 0.13μm CMOS process," 2016 IEEE International Symposium on Circuits and Systems
• 0.13 μm SMIC 1P8M CMOS technology
• Average power consumption: 1.1 mW
• 9.3 mm² (3.1 mm x 3 mm)
• Dynamic range compression (WDRC), noise reduction (NR), feedback cancellation (FDC)

K. C. Chang, Y. W. Chen, Y. T. Kuo and C. W. Liu, "A low power hearing aid computing platform using lightweight processing elements," 2012 IEEE International Symposium on Circuits and Systems
• TSMC 65 nm GP technology
• Average power consumption: 1.3 mW
• 3.61 mm² (1.9 mm x 1.9 mm)
• Auditory compensation, noise reduction, feedback cancellation
Remaining Challenges and Countermeasures
• Leakage currents
• Flexibility and Programmability
• Verification and Testability
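Total Power Consumption
$P = \sum_i \sigma_i f_i \left( C_i U_i^2 + W_{SC,i} \right) + P_{DC}$
Influencing levels: Architecture, Design, Circuit technology; Technology, Circuit technology, Supply voltages
[Figure: CMOS inverter with supply voltage $U_{DD}$, input $U_{in}$, output $U_{out}$, and load capacitance $C$; annotations on the switching activity for "0"→"1" and "1"→"0" transitions and on glitches]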
Leakage Currents
• As technology feature sizes decrease, leakage currents increase exponentially.
• There is a strong dependency on the process / technology.
[Figure: off-current I_off [A/μm] (logarithmic axis, 1.0E-14 to 1.0E-04) over gate length [nm] (logarithmic axis, 10 to 1000) for transistors in production and in research, including the Intel 15 nm, 20 nm, and 30 nm transistors] [Veendrick 2007]
CMOS Roadmap
Premium Tier
• Markets: Servers, high performance computing and graphics, high-end smartphone, core networking
• Features: High-performance, balanced-cost
Volume Tier
• Markets: Low & mid-end smartphones, wireless, IoT, autonomous vehicles, mobile camera
• Features: Low-power, cost-effective performance, RF, embedded memory
[Figure: roadmap with the tracks High-performance Computing and Wireless, Battery-powered Computing; process nodes: 40/55nm, 28nm, 22FDX®, 12FDX™, 14nm FinFET, 7nm FinFET; status legend: available now, in development]
© 2017 GLOBALFOUNDRIES
FinFET & FD-SOI – Same Idea, but Different Implementation
• Bulk CMOS: lowest cost
• FinFET: fully depleted vertical fin; high performance, high density
• FD-SOI: fully depleted horizontal channel on an ultra-thin buried oxide insulator; cost effective, embedded
© 2017 GLOBALFOUNDRIES
22FDX® – Ultra-low Voltage
• Demonstrated 0.4 Volt Vmin capability for ultra-low power designs
• Minimum Switching Energy operating point at around 0.4 Volt 1)
• As Vdd decreases, dynamic power and frequency also go down
• Leakage power also reduces as Vdd drops
• PVT variations are more significant at low Vdd and can be compensated for by back-gate bias.
[Figure: normalized energy (axis 0.0 to 1.4) over Vdd (0.3 V to 1.0 V) with curves for switching energy, leakage energy, and total energy; SLVT, FBB=0.8V]
[Figure: logic Vmin (axis 0.3 V to 0.8 V) for 28nm Poly/SiON (median: 0.64V) and 22FDX (median: 0.40V); difference >200mV]
1) Jani Mäkipää, Olivier Billoint: "FDSOI versus BULK CMOS at 28nm node. Which Technology for Ultra-Low Power Design?", 978-1-4673-5762-3/13/$31.00 ©2013 IEEE
© 2017 GLOBALFOUNDRIES
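The benefit of a low supply voltage for the switching energy can be estimated with a short calculation. The sketch below assumes that the switching energy per operation is proportional to $C \cdot V_{dd}^2$ with an unchanged capacitance $C$, and it uses 0.8 V as an illustrative reference voltage (the reference value is an assumption made here, not a figure from the slide).

```latex
\begin{align*}
\frac{E_{sw}(0.4\,\mathrm{V})}{E_{sw}(0.8\,\mathrm{V})}
  &= \frac{C \cdot (0.4\,\mathrm{V})^2}{C \cdot (0.8\,\mathrm{V})^2}
   = \left(\frac{0.4}{0.8}\right)^2
   = 0.25
\end{align*}
```

The total energy per operation does not shrink by the same factor: circuits become slower at low $V_{dd}$, so leakage is integrated over a longer time per operation, which is why the total energy curve has a minimum instead of falling continuously.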
FDSOI – Development and Production Facility FAB1 in Dresden
• Production facility FAB1 in Dresden for 28nm, 22nm and 12nm (planned)
• 3500 employees, Europe's largest modern semiconductor company
• Development facility for 22nm and 12nm
• 12 billion $ cumulative investments since 1996 (first AMD, later GLOBALFOUNDRIES)
• 1.5 billion € planned investment for capacity extension to 1 million wafers/year
22FDX™ now in production; 12FDX™ in development at FAB1, with production beginning in 2019
Verification and Testability of New Architectures
Verification and Testability - In-Circuit Emulation
Verification in an early design phase
Tape-out summer 2017 (TSMC 40nm LP Tech.)
Binaural Localization
[Seifert, C.; Thiemann, J.; Gerlach, L.; Volkmar, T.; Payá-Vayá, G.; Blume, H.; van de Par, S.: Real-Time Implementation of a GMM-Based Binaural Localization Algorithm on a VLIW-SIMD Processor (accepted), ICME 2017]
In-circuit Emulation of Kavuaka
• Replacement of future ASIC by in-circuit emulation of Kavuaka
• Allows testing and verification with future periphery
[Figure: emulation board with FPGA (Kavuaka ASIP), ASIC test socket and memory, two stereo codecs (4 input and 4 output channels), Bluetooth, USB, battery, and power management; interfaces: I2C, I2S, UART; data paths: audio data, configuration/instruction data, debugging / parameter data; debug & measurement]
New Research Project Addressing the Aforementioned Challenges
Smart HeaP
System-on-Chip as Platform for further Innovation
[Figure: SmartHeaP ASIC with ASIP, Tensilica HiFi, CPU, two A/D blocks, and Bluetooth, connected to microphones and loudspeakers; voltage labels: 0.4V, 2.5V]
New Project: Smart HeaP
• Audiology
• Algorithms
• Architectures
• Hearing Aids
• Technology
• ASIP-Framework
• Project management / SoC Design
Summary
Different influencing factors on power consumption
• Architecture optimization
• Custom instructions
• Co-processor
Remaining challenges and countermeasures
• Leakage currents
• Flexibility and Programmability
• Verification and Testability
All of these issues will be addressed in Smart HeaP