OMP2HMPP
A. Saà-Garriga et al., CAIAC (UAB) HIP3ES
1/22
A. Saà-Garriga, D. Castells-Rufas and J. [email protected]
Centre d’Intel·ligència Ambiental I Accessibilitat de Catalunya (CAIAC)
Universitat Autònoma de Barcelona. UAB
21/01/2014
OMP2HMPP: HMPP Source Code Generation from Programs with Pragma Extensions
1 Introduction
2 OMP2HMPP Compiler
3 Results
4 Conclusions
Intro Compiler Results Conclusions
1 Introduction
GPGPUs and Embedded Systems
GPGPUs are one of the main integrated blocks of heterogeneous platforms
Mali GPUs (embedded systems)
NVIDIA GPUs are in the top 10 machines of the Green500 list (November 2013)
GPGPUs are potentially useful for speeding up applications, in both classical HPC and EHPC
Programming them is complex and error-prone because of the programming complexity and the language paradigms involved
Current Programming Workflow
New Proposals
• New language
• Language extensions
Learning
• Language syntax
• Programming paradigms
Source Code Adaptation
Version Evaluation
GPGPU programming can become a hurdle that limits adoption, since the programmer has to learn both the hardware capabilities and the language used to exploit them.
Programming Alternatives
Directive-Based Languages
• New languages: OpenACC[2], HMPP[3]
  Hide GPU complexity; new language
• Language extensions: OpenMPC[4], hiCUDA[5]
  Hide GPU complexity; no automatic transfer optimization; new list of directives
Direct Transformations
• Par4All[6]
  Hides GPU complexity; no intermediate language; no data transfer optimization; just C source code transformation
Proposed Programming Workflow
OMP2HMPP
• Hides GPU complexity
• Just one new directive
• Uses an HPC standard as input
• C/C++
• Mercurium infrastructure [J. Balart et al., EWOMP 2004]
OpenMP → OMP2HMPP → HMPP
2 OMP2HMPP Compiler
Generate HMPP Directives
Callsite
Codelet
Group
Advanced Load
Delegated Store
Synchronize
…
Generate HMPP Directives
OpenMP block Outlining
#pragma hmpp outlined_block codelet
void outlined_block(int i, int A[10], int C[10])
{
  for(i=...)
  {
    ...
    C[i]=A[i]*k;
    ...
  }
}

int main()
{
  ...
  A[x]=v;
#pragma hmpp outlined_block callsite
  outlined_block(i,A,C);
  ...
  A[j]=C[j];
}
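The slide shows only the outlined result. As a hedged illustration, the OpenMP input that outlining starts from could look like the sketch below: the annotated loop is the part that would be moved into outlined_block, with the callsite taking its place. The function name scale_on_host, the constant N and the parameter k are assumptions made for this example, not names taken from OMP2HMPP.

```c
#include <assert.h>
#include <stddef.h>

#define N 10 /* matches the int A[10] / int C[10] signature on the slide */

/* Hypothetical input function; the loop body mirrors C[i] = A[i] * k. */
void scale_on_host(const int A[N], int C[N], int k)
{
    int i;

    assert(A != NULL && C != NULL);

    /* This annotated loop is the block that would be outlined into a
       codelet function and replaced here by a callsite. */
#pragma omp parallel for
    for (i = 0; i < N; i++) {
        C[i] = A[i] * k;
    }
}
```

Compiled without OpenMP support, the pragma is ignored and the loop runs sequentially with the same result.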
Contextual Information
For each variable used inside an OpenMP block to be transformed, OMP2HMPP analyzes the Abstract Syntax Tree to identify:
The next and last access (read or write)
Where this access is computed (CPU or GPU)
Whether the access occurs inside a loop and, if so, which loop
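One way to picture the information listed above is as a small per-variable record. The sketch below is an illustration only: every type and function name is hypothetical, and the two decision rules are deliberately simplified assumptions rather than OMP2HMPP's actual analysis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types: none of these names come from OMP2HMPP. */
typedef enum { ACCESS_NONE, ACCESS_READ, ACCESS_WRITE } access_kind;
typedef enum { DEVICE_CPU, DEVICE_GPU } device_kind;

typedef struct {
    access_kind kind;  /* read or write */
    device_kind where; /* where the access is computed */
    int loop_id;       /* enclosing loop, or -1 when not inside a loop */
} access_info;

typedef struct {
    const char *name;  /* variable used inside the OpenMP block */
    access_info last;  /* last access before the block */
    access_info next;  /* next access after the block */
} var_context;

/* Simplified assumption: upload when the last access before the block
   was a write performed on the CPU. */
bool needs_upload(const var_context *v)
{
    assert(v != NULL);
    return v->last.kind == ACCESS_WRITE && v->last.where == DEVICE_CPU;
}

/* Simplified assumption: download when the next access after the block
   is a read performed on the CPU. */
bool needs_download(const var_context *v)
{
    assert(v != NULL);
    return v->next.kind == ACCESS_READ && v->next.where == DEVICE_CPU;
}
```

Applied to the outlining example, A (written on the CPU before the block) would need an upload, and C (read on the CPU after the block) would need a download.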
Contextual information
Data Transfer Optimization
Advanced Load
Delegated Store
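As a rough sketch of how these two directives are typically placed, the example below uploads an input as early as possible and downloads the output as late as possible. The directive spellings follow publicly available OpenHMPP examples as recalled here and should be checked against the HMPP manual. The label scale, the function offload_scale and the constant N are assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define N 10

/* Codelet: compiled for the accelerator by an HMPP compiler. A plain C
   compiler ignores the unknown pragma and runs the function on the CPU. */
#pragma hmpp scale codelet, args[C].io=out, target=CUDA
void scale(int n, int A[N], int C[N], int k)
{
    for (int i = 0; i < n; i++) {
        C[i] = A[i] * k;
    }
}

/* Hypothetical driver: A is assumed to hold its final values on entry. */
void offload_scale(int A[N], int C[N], int k)
{
    assert(A != NULL && C != NULL);

#pragma hmpp scale allocate, args[A;C].size={N}
    /* Advanced load: upload A right after its last CPU write ... */
#pragma hmpp scale advancedload, args[A]

    /* ... so unrelated CPU work placed here could overlap the transfer. */

#pragma hmpp scale callsite, asynchronous, args[A].advancedload=true
    scale(N, A, C, k);
#pragma hmpp scale synchronize

    /* Delegated store: download C only just before the CPU reads it. */
#pragma hmpp scale delegatedstore, args[C]
#pragma hmpp scale release
}
```

With an ordinary C compiler the pragmas are ignored, so the sketch can be checked on the CPU before being handed to an HMPP toolchain.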
Use of Contextual Information
Data Transfer Optimization (Loops)
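The figures for this slide did not survive extraction. As a hedged sketch of the general idea, when a codelet is called inside a loop and its data is not touched by the CPU between iterations, the transfers can be hoisted out: one upload before the loop and one download after it. The label accum, the function run_steps and the constants N and STEPS are assumptions made for this illustration, and the directive spellings should be checked against the HMPP manual.

```c
#include <assert.h>
#include <stddef.h>

#define N 10
#define STEPS 4

/* Codelet: adds A into C element by element. A plain C compiler ignores
   the unknown pragma and runs the function on the CPU. */
#pragma hmpp accum codelet, args[C].io=inout, target=CUDA
void accum(int n, int A[N], int C[N])
{
    for (int i = 0; i < n; i++) {
        C[i] += A[i];
    }
}

/* Hypothetical driver: A and C are only used by the CPU outside the loop. */
void run_steps(int A[N], int C[N])
{
    assert(A != NULL && C != NULL);

#pragma hmpp accum allocate, args[A;C].size={N}
    /* Upload once, before the loop, instead of once per iteration. */
#pragma hmpp accum advancedload, args[A;C]

    for (int t = 0; t < STEPS; t++) {
#pragma hmpp accum callsite, asynchronous, args[A;C].advancedload=true
        accum(N, A, C);
#pragma hmpp accum synchronize
    }

    /* Download once, after the loop, just before the CPU needs C. */
#pragma hmpp accum delegatedstore, args[C]
#pragma hmpp accum release
}
```

Without hoisting, each of the STEPS iterations would pay for an upload and a download of the same arrays.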
3 Results
Source Code Example
Experimental Results
Tested Architectures
                 B505(1)              B505(2)              B515
Num. Processors  2                    2                    2
Processor        E5640                E5640                E5-2400
Memory           24 GB                24 GB                192 GB
GPU              NVIDIA Tesla M2050   NVIDIA Tesla C2075   NVIDIA Tesla K20
Results figures (one per platform): B505(1), B505(2), B515
4 Conclusions
The programmer avoids spending time learning a new language.
On the tested set of problems from Polybench[8], the generated code obtains an average speedup of 113x over the sequential version.
The average speedup over the OpenMP version is more than 31x.
OMP2HMPP generates a solution that rarely differs from the best hand-coded HMPP version.
OMP2HMPP establishes a GPU parallel-code reference point for expert developers who want to refine the parallelization.
…thanks for your attention!