-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathbenchmarks.html
327 lines (317 loc) · 18.4 KB
/
benchmarks.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Benchmarks — Par4All 1.4.6 documentation</title>
<link rel="stylesheet" href="_static/default.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: './',
VERSION: '1.4.6',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<link rel="shortcut icon" href="_static/logo-64x64FW.png"/>
<link rel="top" title="Par4All 1.4.6 documentation" href="index.html" />
<link rel="next" title="Documentation" href="documentation.html" />
<link rel="prev" title="Features" href="features.html" />
</head>
<body>
<div class="related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="documentation.html" title="Documentation"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="features.html" title="Features"
accesskey="P">previous</a> |</li>
<li><a href="index.html">Par4All 1.4.6 documentation</a> »</li>
</ul>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body">
<div class="section" id="benchmarks">
<h1>Benchmarks<a class="headerlink" href="#benchmarks" title="Permalink to this headline">¶</a></h1>
<div class="sidebar">
<p class="first sidebar-title">A video demonstration of Par4All</p>
<div class="last"><iframe width="425" height="344"
src="http://www.youtube.com/embed/EP8mq0zh4gA?wmode=transparent"
frameborder="0"
allowfullscreen>
</iframe></div></div>
<p>We use Par4All to parallelize programs for customers and collaborative
research projects but we have also used Par4All on some benchmarks. Here
are a few results that we are allowed to communicate.</p>
<div class="section" id="stars-pm">
<h2>Stars-PM<a class="headerlink" href="#stars-pm" title="Permalink to this headline">¶</a></h2>
<p>Stars-PM is a Particle-Mesh N-body cosmological simulation in C code from
the <a class="reference external" href="http://astro.u-strasbg.fr">Observatoire Astronomique de Strasbourg</a>
and uses 3D FFT among other things.</p>
<p>The goal is to model gravitational interactions between particles in
space. The 3D resolution of these interactions is obtained by
interpolation on space discretization.</p>
<a class="reference internal image-reference" href="_images/Stars-PM-WWW-algo.png"><img alt="_images/Stars-PM-WWW-algo.png" src="_images/Stars-PM-WWW-algo.png" style="width: 50%;" /></a>
<p>We use Par4All automatic transformations process to on Stars-PM. The
sequential version was written in C and has been the object of a study to
manually optimize and adapt on GPU.</p>
<a class="reference internal image-reference" href="_images/Stars-PM-WWW-perf.png"><img alt="_images/Stars-PM-WWW-perf.png" src="_images/Stars-PM-WWW-perf.png" style="width: 50%;" /></a>
</div>
<div class="section" id="static-compilation-analysis-for-host-accelerator-communication-optimization">
<h2>Static Compilation Analysis for Host-Accelerator Communication Optimization<a class="headerlink" href="#static-compilation-analysis-for-host-accelerator-communication-optimization" title="Permalink to this headline">¶</a></h2>
<p>This graph shows our results on 20 benchmarks from Polybench suite, 3 from
Rodinia, and the application Stars-PM. Measurements were performed on a
machine with 2 Xeon Westmere X5670 (12 cores at 2.93 GHz) and a nvidia
GPU Tesla C2050.</p>
<a class="reference internal image-reference" href="_images/speedup-part1.jpg"><img alt="_images/speedup-part1.jpg" src="_images/speedup-part1.jpg" style="width: 100%;" /></a>
<p>The OpenMP versions used for the experiments are generated automatically
by the parallelizer and are not manually optimized.</p>
<a class="reference internal image-reference" href="_images/speedup-part2.jpg"><img alt="_images/speedup-part2.jpg" src="_images/speedup-part2.jpg" style="width: 100%;" /></a>
<p>For more details:</p>
<ul class="simple">
<li>refer to the <cite>publication « Static Compilation Analysis for
Host-Accelerator Communication Optimization. »</cite></li>
<li>Polybench : <a class="reference external" href="http://sourceforge.net/projects/polybench">http://sourceforge.net/projects/polybench</a></li>
<li>Rodinia : <a class="reference external" href="http://lava.cs.virginia.edu/wiki/rodinia">http://lava.cs.virginia.edu/wiki/rodinia</a></li>
</ul>
</div>
<div class="section" id="hyantes">
<h2>Hyantes<a class="headerlink" href="#hyantes" title="Permalink to this headline">¶</a></h2>
<p>Hyantes is a library to compute neighbourhood population potential with
scale control. It is developed by the MESCAL team from the Laboratoire
Informatique de Grenoble (France), as a part of HyperCarte project. The
HyperCarte project aims to develop new methods for the cartographic
representation of human distributions (population density, population
increase, etc.) with various smoothing functions and opportunities for
time-scale animations of maps. Hyantes provides one of the smoothing
methods related to multiscalar neighbourhood density estimation. It is a C
library that takes sets of geographic data as inputs and computes a
smoothed representation of this data taking account of neighbourhood’s
influence.</p>
<p>For more information: <a class="reference external" href="http://hyantes.gforge.inria.fr">http://hyantes.gforge.inria.fr</a></p>
<div class="section" id="results">
<h3>Results<a class="headerlink" href="#results" title="Permalink to this headline">¶</a></h3>
<p>We measure the wall-clock time that includes startup time, data load time
and output write time, that is the real time understood by users. By
measuring kernel time only, speed-up would be better but less
representative of the real application (Amdahl…).</p>
<p>On a Wild Node with 2 Intel Xeon X5670 @ 2.93GHz (12 cores) and a Tesla
C2050 (Fermi), Linux/Ubuntu 10.04, gcc 4.4.3, CUDA 3.1, we measure in
double precision:</p>
<ul class="simple">
<li>Sequential execution time on CPU: 30.355s</li>
<li>OpenMP parallel execution time on CPUs: 3.859s, <strong>speed-up: 7.87</strong></li>
<li>CUDA parallel execution time on GPU: 0.441s, <strong>speed-up: 68.8</strong></li>
</ul>
<p>To test it by yourself on the main computational part, go to the
examples/P4A/Hyantes directory of Par4All or look in the <tt class="docutils literal"><span class="pre">git</span></tt>
repository
<a class="reference external" href="https://github.com/Par4All/par4all/tree/p4a/examples/P4A/Hyantes">https://github.com/Par4All/par4all/tree/p4a/examples/P4A/Hyantes</a></p>
<p>For mobile users, it is interesting to show result figures on a laptop:
for the single precision (compiled with make USE_FLOAT=1) on a HP
EliteBook 8730w laptop (with an Intel Core2 Extreme Q9300 @ 2.53GHz (4
cores) and a nVidia GPU Quadro FX 3700M, 16 multiprocessors, 128 cores,
architecture 1.1) with Linux/Debian/sid, gcc 4.4.3, CUDA 3.1:</p>
<ul class="simple">
<li>Sequential execution time on CPU: 38s</li>
<li>OpenMP parallel execution time on CPUs: 18.9s, <strong>speed-up: 2.01</strong></li>
<li>CUDA parallel execution time on GPU: 1.57s, <strong>speed-up: 24.2</strong></li>
</ul>
</div>
</div>
<div class="section" id="hologram-simulation">
<h2>Hologram simulation<a class="headerlink" href="#hologram-simulation" title="Permalink to this headline">¶</a></h2>
<p>Fresnel is a C program from the <a class="reference external" href="http://www.holotetrix.com">Holotetrix</a>
company that simulates Fresnel diffraction in holograms before
manufacturing.</p>
<p>The application has been tested on various architectures:</p>
<table border="1" class="docutils">
<colgroup>
<col width="86%" />
<col width="14%" />
</colgroup>
<thead valign="bottom">
<tr class="row-odd"><th class="head">Architecture</th>
<th class="head">Cores</th>
</tr>
</thead>
<tbody valign="top">
<tr class="row-even"><td>Intel Core 2 duo E6600 2.4 GHz</td>
<td>2</td>
</tr>
<tr class="row-odd"><td>Intel Core 2 duo T9400 2.5 GHz</td>
<td>2</td>
</tr>
<tr class="row-even"><td>2 Intel Core 2 X5472 3 GHz</td>
<td>8</td>
</tr>
<tr class="row-odd"><td>2 Intel Nehalem X5570 2.9 GHz</td>
<td>8</td>
</tr>
<tr class="row-even"><td>nVidia GTX 200</td>
<td>192</td>
</tr>
<tr class="row-odd"><td>nVidia Tesla C1060</td>
<td>240</td>
</tr>
<tr class="row-even"><td>nVidia Quadro FX 3700M (G92GL)</td>
<td>128</td>
</tr>
</tbody>
</table>
<p>The speed-up are normalized with 1 E6600 core as reference and the
computations are done for different hologram sizes for the following
results:</p>
<div class="figure" style="width: 100%">
<img alt="_images/performances_float-fresnel-200dpi.png" src="_images/performances_float-fresnel-200dpi.png" />
<p class="caption">Performance of Holotetrix Fresnel program on <em>x</em>86 and CUDA GPU</p>
</div>
</div>
<div class="section" id="spec-cpu2006-410-bwaves">
<h2>SPEC CPU2006 410.bwaves<a class="headerlink" href="#spec-cpu2006-410-bwaves" title="Permalink to this headline">¶</a></h2>
<p>bwaves is a computational fluid dynamics Fortran program that simulates
blast waves in three dimensional transonic transient laminar viscous flow.</p>
<p>More information on the program itself on
<a class="reference external" href="http://www.spec.org/cpu2006/Docs/410.bwaves.html">http://www.spec.org/cpu2006/Docs/410.bwaves.html</a></p>
<p>On a Wild Node, we get a <strong>speed-up of 4.5</strong> with a 2 Intel Xeon X5670 @
2.93GHz (12 cores).</p>
</div>
<div class="section" id="matrix-multiplication">
<h2>Matrix multiplication<a class="headerlink" href="#matrix-multiplication" title="Permalink to this headline">¶</a></h2>
<p>The classical Hello World in Fortran can be found in
<tt class="docutils literal"><span class="pre">examples/F77_matmul_OpenMP</span></tt> directory of Par4All or look in the <tt class="docutils literal"><span class="pre">git</span></tt>
repository
<a class="reference external" href="https://github.com/Par4All/par4all/tree/p4a/examples/F77_matmul_OpenMP">https://github.com/Par4All/par4all/tree/p4a/examples/F77_matmul_OpenMP</a> so
you can try by yourself.</p>
<p>On a Wild Node, we get a <strong>speed-up of 12.1</strong> (thanks to cache effects)
with a 2 Intel Xeon X5670 @ 2.93GHz (12 cores).</p>
</div>
</div>
</div>
</div>
</div>
<div class="sphinxsidebar">
<div class="sphinxsidebarwrapper">
<p class="logo"><a href="index.html">
<img class="logo" src="_static/banniere+tagline.png" alt="Logo"/>
</a></p>
<h3><a href="index.html">Table Of Contents</a></h3>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="download.html">Download - Par4All</a><ul>
<li class="toctree-l2"><a class="reference internal" href="download.html#binary-distribution">Binary distribution</a></li>
<li class="toctree-l2"><a class="reference internal" href="download.html#previous-releases">Previous releases</a></li>
<li class="toctree-l2"><a class="reference internal" href="download.html#installing-from-the-sources">Installing from the sources</a></li>
<li class="toctree-l2"><a class="reference internal" href="download.html#distributed-version-control-system">Distributed version control system</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="features.html">Features</a><ul>
<li class="toctree-l2"><a class="reference internal" href="features.html#par4all-current-version">Par4All current version</a></li>
<li class="toctree-l2"><a class="reference internal" href="features.html#what-is-going-on">What is going on?</a></li>
<li class="toctree-l2"><a class="reference internal" href="features.html#roadmap">Roadmap</a></li>
<li class="toctree-l2"><a class="reference internal" href="features.html#internals">Internals</a></li>
</ul>
</li>
<li class="toctree-l1 current"><a class="current reference internal" href="">Benchmarks</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#stars-pm">Stars-PM</a></li>
<li class="toctree-l2"><a class="reference internal" href="#static-compilation-analysis-for-host-accelerator-communication-optimization">Static Compilation Analysis for Host-Accelerator Communication Optimization</a></li>
<li class="toctree-l2"><a class="reference internal" href="#hyantes">Hyantes</a></li>
<li class="toctree-l2"><a class="reference internal" href="#hologram-simulation">Hologram simulation</a></li>
<li class="toctree-l2"><a class="reference internal" href="#spec-cpu2006-410-bwaves">SPEC CPU2006 410.bwaves</a></li>
<li class="toctree-l2"><a class="reference internal" href="#matrix-multiplication">Matrix multiplication</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="documentation.html">Documentation</a><ul>
<li class="toctree-l2"><a class="reference internal" href="documentation.html#users-guide">Users guide</a></li>
<li class="toctree-l2"><a class="reference internal" href="documentation.html#developers-guide">Developers Guide</a></li>
<li class="toctree-l2"><a class="reference internal" href="documentation.html#publications">Publications</a></li>
<li class="toctree-l2"><a class="reference internal" href="documentation.html#presentations">Presentations</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="community.html">Community</a><ul>
<li class="toctree-l2"><a class="reference internal" href="community.html#members">Members</a></li>
<li class="toctree-l2"><a class="reference internal" href="community.html#support">Support</a></li>
<li class="toctree-l2"><a class="reference internal" href="community.html#research-collaborative-projects">Research Collaborative Projects</a></li>
<li class="toctree-l2"><a class="reference internal" href="community.html#background">Background</a></li>
<li class="toctree-l2"><a class="reference internal" href="community.html#jobs">Jobs</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="news_and_events.html">News and events</a><ul>
<li class="toctree-l2"><a class="reference internal" href="news_and_events.html#hpc-project-launches-the-1-4-version-of-par4all">2012/05/17: HPC Project launches the 1.4 version of Par4All</a></li>
<li class="toctree-l2"><a class="reference internal" href="news_and_events.html#gtc-2012">2012/05/14: GTC 2012</a></li>
<li class="toctree-l2"><a class="reference internal" href="news_and_events.html#meetup-hpc-gpu-in-paris">2012/01/25: Meetup HPC GPU in Paris</a></li>
<li class="toctree-l2"><a class="reference internal" href="news_and_events.html#wild-cruncher-breakfast-february-14th">2012/01/13: Wild Cruncher Breakfast February 14th</a></li>
<li class="toctree-l2"><a class="reference internal" href="news_and_events.html#par4all-1-2-is-out">2011/08/07: Par4All 1.2 is out</a></li>
<li class="toctree-l2"><a class="reference internal" href="news_and_events.html#hpc-project-announces-wild-cruncher">2011/06/14: HPC Project announces Wild Cruncher</a></li>
<li class="toctree-l2"><a class="reference internal" href="news_and_events.html#teratec-2011">2011/05/20: Teratec 2011</a></li>
<li class="toctree-l2"><a class="reference internal" href="news_and_events.html#open-gpu-8th-of-june-2011">2011/05/19: Open GPU 8th of June 2011</a></li>
<li class="toctree-l2"><a class="reference internal" href="news_and_events.html#hpc-project-launches-v1-1-par4all">2011/03/18: HPC Project launches v1.1 Par4All</a></li>
<li class="toctree-l2"><a class="reference internal" href="news_and_events.html#video-showing-par4all-features">2010/12/14: Video showing Par4All features</a></li>
<li class="toctree-l2"><a class="reference internal" href="news_and_events.html#gtc-2010">2010/12/06: GTC 2010</a></li>
<li class="toctree-l2"><a class="reference internal" href="news_and_events.html#pips-tutorial-at-bangalore-dec">2010/12/03: PIPS Tutorial at Bangalore déc</a></li>
<li class="toctree-l2"><a class="reference internal" href="news_and_events.html#teratec-2010">2010/12/03: Teratec 2010</a></li>
<li class="toctree-l2"><a class="reference internal" href="news_and_events.html#open-gpu-project">2010/12/03: Open GPU Project</a></li>
<li class="toctree-l2"><a class="reference internal" href="news_and_events.html#nvidia-gtc-2009">2010/12/03: Nvidia GTC 2009</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="features.html"
title="previous chapter">Features</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="documentation.html"
title="next chapter">Documentation</a></p>
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="_sources/benchmarks.txt"
rel="nofollow">Show Source</a></li>
</ul>
<div id="searchbox" style="display: none">
<h3>Quick search</h3>
<form class="search" action="search.html" method="get">
<input type="text" name="q" />
<input type="submit" value="Go" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
<p class="searchtip" style="font-size: 90%">
Enter search terms or a module, class or function name.
</p>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="documentation.html" title="Documentation"
>next</a> |</li>
<li class="right" >
<a href="features.html" title="Features"
>previous</a> |</li>
<li><a href="index.html">Par4All 1.4.6 documentation</a> »</li>
</ul>
</div>
<div class="footer">
© Copyright 2014, The Par4All team.
Created using <a href="http://sphinx-doc.org/">Sphinx</a> 1.2.3.
</div>
</body>
</html>