Fail to reproduce timings #39

Open
serge-sans-paille opened this issue Feb 23, 2015 · 6 comments

@serge-sans-paille

I installed HOPE from the git repository and ran the following:

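# pdf.py
import numpy as np
import hope

@hope.jit
def pdf(density, dims, center, w2D, r50, b, a):
    # Evaluate the profile at every pixel (x, y) of a dims[0] x dims[1] grid.
    for x in range(dims[0]):
        for y in range(dims[1]):
            # Distance from the pixel to the (sub-pixel) center.
            dr = np.sqrt((x - center[0]) ** 2 + (y - center[1]) ** 2)
            # Weighted sum of the profile over the 7x7 weight array w2D.
            density[x, y] = np.sum(w2D * 2 * (b - 1) / (2 * np.pi * (r50 * a)**2) * (1 + (dr / (r50 * a))**2)**(-b))
    return density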
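For reference, the value pdf writes into each pixel can be written out as follows, with c = center, d the pixel-to-center distance and w_{ij} the entries of w2D. It is a normalized Moffat profile with width r50·a and exponent b (the definition of a in the setup makes r50 the half-light radius). Since d does not depend on i, j, the weights only enter through their sum:

$$
\rho(x, y) = \sum_{i,j} w_{ij}\,\frac{2(b-1)}{2\pi (r_{50} a)^2}\left[1 + \left(\frac{d}{r_{50} a}\right)^2\right]^{-b}
= \frac{(b-1)\sum_{i,j} w_{ij}}{\pi (r_{50} a)^2}\left[1 + \left(\frac{d}{r_{50} a}\right)^2\right]^{-b},
$$

$$
d = \sqrt{(x - c_0)^2 + (y - c_1)^2}, \qquad a = \frac{1}{\sqrt{2^{1/(b-1)} - 1}}.
$$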

and timed it with:

python -m timeit \
  -s 'import numpy as np' \
  -s 'b = 3.5; a = 1. / np.sqrt(2. ** (1. / (b - 1.)) - 1.); r50 = 20' \
  -s 'center = np.array([10.141, 10.414]); dims = np.array([20, 20])' \
  -s 'x1D = np.array([0.5 - 0.9491079123427585245262 / 2, 0.5 - 0.7415311855993944398639 / 2, 0.5 - 0.4058451513773971669066 / 2, 0.5, 0.5 + 0.4058451513773971669066 / 2, 0.5 + 0.7415311855993944398639 / 2, 0.5 + 0.9491079123427585245262 / 2], dtype=np.float32)' \
  -s 'w1D = np.array([0.1294849661688696932706 / 2, 0.2797053914892766679015 / 2, 0.38183005050511894495 / 2, 0.4179591836734693877551 / 2, 0.38183005050511894495 / 2, 0.2797053914892766679015 / 2, 0.1294849661688696932706 / 2], dtype=np.float32)' \
  -s 'w2D = np.outer(w1D, w1D)' \
  -s 'from pdf import pdf; density = np.zeros(dims, dtype=np.float32)' \
  'pdf(density, dims, center, w2D, r50, b, a)'

The measured time is rather slow compared to the expected result, while the C++ module runs at the expected speed. What did I do wrong?

@cosmo-ethz
Collaborator

@serge-sans-paille I copy-pasted your code and got:
10000 loops, best of 3: 103 usec per loop

When I compile the C++ code that we provide in the benchmarks and measure its timing (using pdf = __import__("pdf", globals(), locals(), [], -1).run), I get:
10000 loops, best of 3: 55.1 usec per loop

This factor of 2 is expected.
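For reference, the ratio between the two timings above:

$$
\frac{103\ \mu\mathrm{s}}{55.1\ \mu\mathrm{s}} \approx 1.87
$$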

What is your OS and compiler?

@serge-sans-paille
Author

OS: Linux (Debian testing)
Compiler (output of c++ --version): g++-4.9.real (Debian 4.9.1-19) 4.9.1

@cosmo-ethz
Collaborator

Admittedly, I have little experience with this combination (HOPE on Debian with g++ 4.9).

What are the timings you get for the C++ and the jitted PDF code?

Which compile flags did you use for the C++ code, and which flags is HOPE using? (Add import hope; hope.config.verbose = True to the timeit setup to see them.)
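A sketch of the same measurement driven from a script with the standard timeit module, which makes it easy to set hope.config.verbose = True before the first call. The helper best_per_call is hypothetical: it reports the best of `repeat` batches divided by `number`, which is what the "best of 3: ... per loop" line of python -m timeit shows, except that `number` is fixed here rather than chosen automatically. It also makes one untimed warm-up call so that any one-off work on the first call (such as JIT compilation) stays out of the measurement.

```python
import timeit


def best_per_call(func, args, number=100, repeat=3):
    """Return the best-of-`repeat` time per call, in seconds.

    Mirrors the "best of 3: ... per loop" summary of `python -m timeit`,
    except that `number` is fixed instead of being chosen automatically.
    """
    func(*args)  # untimed warm-up call: keeps one-off first-call work out of the timing
    timer = timeit.Timer(lambda: func(*args))
    totals = timer.repeat(repeat=repeat, number=number)  # seconds per batch of `number` calls
    return min(totals) / number


if __name__ == "__main__":
    # With the function from this issue, usage would look like this
    # (requires HOPE, pdf.py and the arrays from the timeit setup):
    #
    #   import hope
    #   hope.config.verbose = True  # print the generated code and compiler flags
    #   from pdf import pdf
    #   t = best_per_call(pdf, (density, dims, center, w2D, r50, b, a))
    #
    # Self-contained stand-in so the helper runs as-is:
    t = best_per_call(sum, (range(1000),))
    print("100 loops, best of 3: %.1f usec per loop" % (t * 1e6))
```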

@serge-sans-paille
Author

pdf(float32^2 density, int64^1 dims, float64^1 center, float32^2 w2D, int64 r50, float64 b, float64 a)
    for x.l in (0.J:dims.l[0.J]) {
        for y.l in (0.J:dims.l[1.J]) {
            new dr.d
            dr.d = numpy.sqrt((((x.l - center.d[0.J]) ** 2.J) + ((y.l - center.d[1.J]) ** 2.J)))
            new __sum0.d
            __sum0.d = numpy.sum(((((w2D.f[:w2D@0,:w2D@1] * 2.J) * (b.D - 1.J)) / ((2.J * 3.141592653589793.D) * ((r50.J * a.d) ** 2.J))) * ((1.J + ((dr.d / (r50.J * a.d)) ** 2.J)) ** -b.D)))
            density.f[x.l, y.l] = __sum0.d
        }
    }
    return density.f[:density@0,:density@1]

Compiling following functions:
pdf(float32^2 density, int64^1 dims, float64^1 center, float32^2 w2D, int64 r50, float64 b, float64 a)
running build_ext
building 'pdf_e5a322486ef195fd39b6f44cfb7c3a45dd00790dd0bc04544bbea1ae_0' extension
C compiler: x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -fno-strict-aliasing -g -O2 -fPIC

compile options: '-I/usr/lib/python2.7/dist-packages/numpy/core/include -I/usr/include/python2.7 -c'
extra options: '-Wall -Wno-unused-variable -std=c++11'
x86_64-linux-gnu-gcc: /tmp/hope2H9vzI/pdf_e5a322486ef195fd39b6f44cfb7c3a45dd00790dd0bc04544bbea1ae_0.cpp
c++ -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -Wl,-z,relro -g -O2 /tmp/hope2H9vzI/pdf_e5a322486ef195fd39b6f44cfb7c3a45dd00790dd0bc04544bbea1ae_0.o -o /tmp/hope2H9vzI/pdf_e5a322486ef195fd39b6f44cfb7c3a45dd00790dd0bc04544bbea1ae_0.so

10 loops, best of 3: 1.41 msec per loop

@serge-sans-paille
Author

And 1.32 ms when compiling with clang.
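Putting these numbers next to the 103 usec reported for the jitted code earlier in the thread (measured on a different machine, so only the order of magnitude is meaningful):

$$
\frac{1.41\ \mathrm{ms}}{103\ \mu\mathrm{s}} = \frac{1410}{103} \approx 13.7, \qquad
\frac{1.32\ \mathrm{ms}}{103\ \mu\mathrm{s}} = \frac{1320}{103} \approx 12.8
$$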

@cosmo-ethz
Collaborator

@serge-sans-paille I was able to reproduce the behavior you see on an Ubuntu box. The other benchmarks seem to perform as expected; only the star-psf benchmark is affected.

As expected, the code that HOPE generates is identical on OS X and Ubuntu. This makes me suspect that the compilers on Linux cannot optimize this code as well as clang does on OS X.
This isn't very satisfying, but I don't have a better explanation at the moment.
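This does not explain the compiler difference, but here is a sketch of how the per-pixel work could be reduced without relying on the optimizer. In pdf, the power term does not depend on the element of w2D, so it can be computed once per pixel and multiplied by the sum of the weights instead of being evaluated for all 49 weights. The plain-Python check below (no NumPy or HOPE; the names pdf_loop and pdf_hoisted are hypothetical) compares the two forms on the inputs from the timeit setup. It uses float64 throughout, whereas the original arrays are float32, and it says nothing about whether HOPE accepts the hoisted form.

```python
import math

# 7-point weights from the timeit setup (already divided by 2, as there).
W1D = [0.1294849661688696932706 / 2, 0.2797053914892766679015 / 2,
       0.38183005050511894495 / 2, 0.4179591836734693877551 / 2,
       0.38183005050511894495 / 2, 0.2797053914892766679015 / 2,
       0.1294849661688696932706 / 2]
W2D = [[wi * wj for wj in W1D] for wi in W1D]  # same values as np.outer(w1D, w1D)


def pdf_loop(dims, center, w2D, r50, b, a):
    """Mirror of the jitted pdf(): evaluates the profile once per weight, per pixel."""
    rs = r50 * a
    density = [[0.0] * dims[1] for _ in range(dims[0])]
    for x in range(dims[0]):
        for y in range(dims[1]):
            dr = math.sqrt((x - center[0]) ** 2 + (y - center[1]) ** 2)
            density[x][y] = sum(
                w * 2 * (b - 1) / (2 * math.pi * rs ** 2) * (1 + (dr / rs) ** 2) ** (-b)
                for row in w2D for w in row)
    return density


def pdf_hoisted(dims, center, w2D, r50, b, a):
    """Same values, with the weight sum and normalization computed once."""
    rs = r50 * a
    wsum = sum(w for row in w2D for w in row)
    norm = wsum * 2 * (b - 1) / (2 * math.pi * rs ** 2)
    density = [[0.0] * dims[1] for _ in range(dims[0])]
    for x in range(dims[0]):
        for y in range(dims[1]):
            dr = math.sqrt((x - center[0]) ** 2 + (y - center[1]) ** 2)
            density[x][y] = norm * (1 + (dr / rs) ** 2) ** (-b)  # one pow per pixel
    return density


# Inputs from the timeit setup.
b = 3.5
a = 1. / math.sqrt(2. ** (1. / (b - 1.)) - 1.)
r50 = 20
center = [10.141, 10.414]
dims = [20, 20]

ref = pdf_loop(dims, center, W2D, r50, b, a)
fast = pdf_hoisted(dims, center, W2D, r50, b, a)
max_rel = max(abs(p - q) / abs(p)
              for rp, rq in zip(ref, fast) for p, q in zip(rp, rq))
print("max relative difference: %.3g" % max_rel)
```

With these weights, the entries of w2D sum to 1 up to rounding, so the hoisted form amounts to evaluating the profile once per pixel.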
