CPU platform #15

Open
zonca opened this issue Nov 22, 2017 · 14 comments

Comments

@zonca
Contributor

zonca commented Nov 22, 2017

Do you have any plans to add the CPU platform to this example plugin?

@peastman
Member

CpuPlatform extends ReferencePlatform, so the same kernel works on both Reference and CPU. See http://docs.openmm.org/latest/developerguide/developer.html#the-cpu-plaform. If you're writing a kernel that you expect to be performance critical, you can include separate implementations for the two platforms. Otherwise, there's no need.
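To make the subclass point concrete, here is a small self-contained sketch. It uses hypothetical mock classes (MockPlatform, MockReferencePlatform, MockCpuPlatform, MockOpenCLPlatform), not OpenMM's real types, and only models the inheritance. A `dynamic_cast` to the reference type succeeds for the reference platform and for its CPU subclass, so a factory registered through the `dynamic_cast<ReferencePlatform*>` test from the developer guide ends up on both.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical mock classes: they model only the inheritance relationship,
// not any real OpenMM functionality.
class MockPlatform {
public:
    virtual ~MockPlatform() = default;
    virtual std::string getName() const = 0;
};

class MockReferencePlatform : public MockPlatform {
public:
    std::string getName() const override { return "Reference"; }
};

// The CPU platform derives from the reference platform, so any check for
// "is this a reference platform?" also accepts it.
class MockCpuPlatform : public MockReferencePlatform {
public:
    std::string getName() const override { return "CPU"; }
};

// An unrelated platform that does not derive from the reference one.
class MockOpenCLPlatform : public MockPlatform {
public:
    std::string getName() const override { return "OpenCL"; }
};

// Returns the names of the platforms a reference-style kernel factory would be
// registered with, using the same dynamic_cast test as the registration loop.
std::vector<std::string> platformsAcceptingReferenceKernels(
        const std::vector<std::unique_ptr<MockPlatform>>& platforms) {
    std::vector<std::string> accepted;
    for (const auto& platform : platforms) {
        if (dynamic_cast<MockReferencePlatform*>(platform.get()) != nullptr)
            accepted.push_back(platform->getName());
    }
    return accepted;
}
```

With Reference, CPU and OpenCL mocks in the list, the function returns only "Reference" and "CPU".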

@zonca
Contributor Author

zonca commented Nov 22, 2017

Thanks @peastman, it would be nice to have an example of how to set up a ThreadPool just for computing all the pairs of interactions in parallel. I'm trying to learn from CpuNonbondedForce, and it is tough.

@peastman
Member

You might take a look at CpuCustomNonbondedForce, since it's a bit simpler. CpuNonbondedForce uses vector instructions to compute 4 or 8 interactions at once, which makes the code complicated. CpuCustomNonbondedForce computes interactions individually, so the code is simpler (but also slower).
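As a rough illustration of computing interactions one at a time in parallel, here is a sketch in plain C++ that uses std::thread instead of OpenMM's ThreadPool. The function name computePairForces and the toy inverse-square repulsion are assumptions for the example, not anything taken from CpuCustomNonbondedForce. Each thread takes a strided set of rows i, loops over partners j > i, and writes into its own force buffer; the buffers are summed after the threads join, so no locks are needed.

```cpp
#include <cmath>
#include <thread>
#include <vector>

// Minimal 3-vector so the sketch does not depend on OpenMM's Vec3.
struct Vec3 {
    double x = 0.0, y = 0.0, z = 0.0;
};

// Hypothetical example: computes a toy inverse-square repulsion between every
// pair of particles, one interaction at a time, using numThreads threads.
// Thread t handles rows i = t, t + numThreads, ... and, for each i, all j > i.
std::vector<Vec3> computePairForces(const std::vector<Vec3>& pos, int numThreads) {
    const int n = static_cast<int>(pos.size());
    // One private force buffer per thread avoids races on shared particles.
    std::vector<std::vector<Vec3>> threadForces(numThreads, std::vector<Vec3>(n));
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; t++) {
        workers.emplace_back([&, t] {
            std::vector<Vec3>& f = threadForces[t];
            for (int i = t; i < n; i += numThreads) {
                for (int j = i + 1; j < n; j++) {
                    double dx = pos[i].x - pos[j].x;
                    double dy = pos[i].y - pos[j].y;
                    double dz = pos[i].z - pos[j].z;
                    double r2 = dx * dx + dy * dy + dz * dz;
                    // Force on i is d / r^3 (magnitude 1/r^2), pointing away from j.
                    double scale = 1.0 / (r2 * std::sqrt(r2));
                    f[i].x += scale * dx; f[i].y += scale * dy; f[i].z += scale * dz;
                    // Newton's third law: j gets the opposite force.
                    f[j].x -= scale * dx; f[j].y -= scale * dy; f[j].z -= scale * dz;
                }
            }
        });
    }
    for (std::thread& w : workers)
        w.join();
    // Reduce the per-thread buffers into the final result.
    std::vector<Vec3> forces(n);
    for (const std::vector<Vec3>& f : threadForces) {
        for (int i = 0; i < n; i++) {
            forces[i].x += f[i].x; forces[i].y += f[i].y; forces[i].z += f[i].z;
        }
    }
    return forces;
}
```

Striding the rows gives each thread a mix of long and short rows of the triangular pair loop. A production kernel would typically also use a cutoff or neighbor list rather than visiting all N(N-1)/2 pairs.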

@peastman
Member

By the way, if you just want an example of using ThreadPool to parallelize a straightforward calculation, there are lots of those. For example, see https://github.com/pandegroup/openmm/blob/master/platforms/cpu/src/CpuBondForce.cpp#L172-L176.

@zonca
Contributor Author

zonca commented Nov 22, 2017

That is exactly what I want to do, but I don't understand where to create and, if needed, initialize the threads variable.

@peastman
Member

You can just create a thread pool as

ThreadPool threads;

To avoid the overhead of constantly creating and destroying threads, though, the CPU platform has a shared thread pool for use by kernels. You can access it as data.threads, where data is the CpuPlatform::PlatformData. You can get it from the ContextImpl by calling CpuPlatform::getPlatformData(context), though in practice the kernel factory just passes it to the constructors of all the kernels (see https://github.com/pandegroup/openmm/blob/master/platforms/cpu/src/CpuKernelFactory.cpp).
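The idea of one pool that is created once and shared by kernels can be sketched with the standard library alone. PersistentPool, SharedPlatformData and ParallelSumKernel are hypothetical names, and the pool's interface is an assumption, not OpenMM's ThreadPool API. Threads are started once; each run() call hands every worker its thread index and blocks until all have finished; and kernels receive a reference to the shared data in their constructors, the way the kernel factory is described as doing.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical persistent pool: threads are created once and reused for every run().
class PersistentPool {
public:
    explicit PersistentPool(int numThreads) {
        for (int i = 0; i < numThreads; i++)
            workers.emplace_back([this, i] { workerLoop(i); });
    }
    ~PersistentPool() {
        {
            std::lock_guard<std::mutex> lock(mutex);
            stopping = true;
        }
        startCond.notify_all();
        for (std::thread& w : workers)
            w.join();
    }
    int getNumThreads() const { return static_cast<int>(workers.size()); }

    // Runs task(threadIndex) once on every worker and blocks until all are done.
    // Must not be called concurrently from several threads.
    void run(const std::function<void(int)>& task) {
        std::unique_lock<std::mutex> lock(mutex);
        currentTask = &task;
        remaining = getNumThreads();
        generation++;  // signals a new batch of work
        startCond.notify_all();
        doneCond.wait(lock, [this] { return remaining == 0; });
        currentTask = nullptr;
    }

private:
    void workerLoop(int index) {
        long seen = 0;  // last batch this worker has executed
        while (true) {
            const std::function<void(int)>* task;
            {
                std::unique_lock<std::mutex> lock(mutex);
                startCond.wait(lock, [&] { return stopping || generation != seen; });
                if (stopping)
                    return;
                seen = generation;
                task = currentTask;
            }
            (*task)(index);  // do the work without holding the lock
            std::lock_guard<std::mutex> lock(mutex);
            if (--remaining == 0)
                doneCond.notify_one();
        }
    }

    std::vector<std::thread> workers;
    std::mutex mutex;
    std::condition_variable startCond, doneCond;
    const std::function<void(int)>* currentTask = nullptr;
    long generation = 0;
    int remaining = 0;
    bool stopping = false;
};

// Hypothetical shared data that owns the single pool.
struct SharedPlatformData {
    explicit SharedPlatformData(int numThreads) : threads(numThreads) {}
    PersistentPool threads;
};

// Hypothetical kernel: it stores a reference to the shared data passed to its
// constructor instead of creating its own threads.
class ParallelSumKernel {
public:
    explicit ParallelSumKernel(SharedPlatformData& data) : data(data) {}
    double execute(const std::vector<double>& values) {
        const int numThreads = data.threads.getNumThreads();
        std::vector<double> partial(numThreads, 0.0);  // one slot per thread
        data.threads.run([&](int t) {
            for (size_t i = t; i < values.size(); i += numThreads)
                partial[t] += values[i];
        });
        double total = 0.0;
        for (double p : partial)
            total += p;
        return total;
    }
private:
    SharedPlatformData& data;
};
```

Two kernels built from the same SharedPlatformData reuse the same worker threads on every execute() call, so no threads are created or destroyed per call.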

@zonca
Contributor Author

zonca commented Nov 22, 2017

Thanks, I think I've got it; I'll try to implement it. Still, it would be nice to have an example of this in the example plugin.

@zonca
Contributor Author

zonca commented Dec 13, 2017

Thanks. In my plugin I have 5 kernels, but I'd like to parallelize only 2 of them.

@peastman
Member

Just to be clear: there's nothing stopping you from using multiple threads in your reference implementation. We generally don't, because we want to keep the reference implementation as simple as possible, but you can do whatever you want.

If you want to have a different implementation for the CPU platform, just register your kernels with that platform.

Do you have an example of a CpuKernelFactory that only implements a subset of kernels?

https://github.com/pandegroup/openmm/blob/master/platforms/cpu/src/CpuKernelFactory.cpp

@zonca
Contributor Author

zonca commented Dec 14, 2017

I'd like to follow OpenMM's conventions and have a separate CPU platform, if I can understand how to implement it.

From the guide, I don't understand whether the registerKernelFactories() function below is needed:

  • only if I just want to use the Reference code for both Reference and CPU platform
  • or also when I want to reimplement some kernels
extern "C" void registerKernelFactories() {
    for (int i = 0; i < Platform::getNumPlatforms(); i++) {
        Platform& platform = Platform::getPlatform(i);
        if (dynamic_cast<ReferencePlatform*>(&platform) != NULL) {
            // Create and register your KernelFactory.
        }
    }
}

Moreover, does this go in the Reference kernel factory or the CPU kernel factory?

Thanks!

@peastman
Member

That code works because CpuPlatform is a subclass of ReferencePlatform, so the cast succeeds for either one. If you want to use different implementations for the two platforms, you'll register different kernel factories for them. Here's an example from CudaRpmdKernelFactory.cpp:

extern "C" OPENMM_EXPORT void registerKernelFactories() {
    try {
        Platform& platform = Platform::getPlatformByName("CUDA");
        CudaRpmdKernelFactory* factory = new CudaRpmdKernelFactory();
        platform.registerKernelFactory(IntegrateRPMDStepKernel::Name(), factory);
    }
    catch (const std::exception&) {
        // Ignore: no CUDA platform is available.
    }
}

It looks up the specific platform it wants by name, then registers a new kernel factory with it. You'll want to do the same thing.

This also assumes the reference and CPU platforms are contained in separate libraries. (That's what the example plugin does: each platform is a separate library.) Each library can have only a single registerKernelFactories() function. It doesn't matter which file it's defined in, but if you have two of them, the build will fail with a multiple-definition error for that symbol.
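Here is a mock of that registration pattern, assuming hypothetical MockRegistry, MockPlatform and MockKernelFactory types and made-up kernel names (none of these are OpenMM classes). It looks a platform up by name, registers one factory for only the kernel names passed in, and swallows the exception when the platform does not exist, mirroring the try/catch in the CUDA example.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-ins; they mimic the shape of the calls above
// (look a platform up by name, register a factory per kernel name).
struct MockKernelFactory {
    std::string label;
};

struct MockPlatform {
    std::string name;
    std::map<std::string, const MockKernelFactory*> kernelFactories;
    void registerKernelFactory(const std::string& kernelName, const MockKernelFactory* factory) {
        kernelFactories[kernelName] = factory;
    }
};

class MockRegistry {
public:
    void addPlatform(const std::string& name) { platforms[name].name = name; }
    // Throws if no platform with that name is available.
    MockPlatform& getPlatformByName(const std::string& name) {
        auto it = platforms.find(name);
        if (it == platforms.end())
            throw std::runtime_error("no platform named " + name);
        return it->second;
    }
private:
    std::map<std::string, MockPlatform> platforms;
};

// Registers one factory for a chosen subset of kernel names on a single named
// platform. Returns false (and registers nothing) if that platform is absent.
bool registerOnPlatform(MockRegistry& registry, const std::string& platformName,
                        const std::vector<std::string>& kernelNames,
                        const MockKernelFactory* factory) {
    try {
        MockPlatform& platform = registry.getPlatformByName(platformName);
        for (const std::string& kernelName : kernelNames)
            platform.registerKernelFactory(kernelName, factory);
        return true;
    }
    catch (const std::exception&) {
        return false;  // platform not available: ignore, as in the CUDA example
    }
}
```

Registering two kernel names with "CPU" leaves the "Reference" mock untouched, and asking for a missing "CUDA" platform simply returns false. Catching `const std::exception&` avoids copying the exception object.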

@zonca
Contributor Author

zonca commented Feb 1, 2018

Thanks, I got the kernel registration working. Now I have a problem with the ThreadPool not being properly initialized.

In the kernel factory, I get the platform data with:

KernelImpl* MBPolCpuKernelFactory::createKernelImpl(std::string name, const Platform& platform, ContextImpl& context) const {
    CpuPlatform::PlatformData& data = *static_cast<CpuPlatform::PlatformData*>(context.getPlatformData());
    // ...
}

However, if I print data.threads.getNumThreads(), the value is different every time, generally around 20000. Is there any initialization I need to perform?

@peastman
Member

peastman commented Feb 1, 2018

You need to instead call:

CpuPlatform::PlatformData& data = CpuPlatform::getPlatformData(context);

The CPU platform is a bit different from others because it's a subclass of ReferencePlatform. That means the "platform data" returned by context.getPlatformData() is actually a ReferencePlatform::PlatformData, not a CpuPlatform::PlatformData. The CPU platform then creates a second "platform data" object for the extra data it needs to store (beyond what it inherits from ReferencePlatform), and it needs to create a separate interface for accessing that.
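Here is a sketch of the two-object arrangement described above, using hypothetical mock types. How the CPU platform actually stores its second object isn't stated here; this sketch assumes a map keyed by context, purely for illustration. The context's single platform-data slot holds the base data, and the derived data is reached only through a dedicated accessor, so nothing ever casts the base pointer to the derived type (the mistake in the earlier snippet).

```cpp
#include <map>
#include <memory>

// Hypothetical stand-in for a context: it has one opaque platform-data slot,
// which belongs to the base (reference-like) platform.
struct MockContext {
    void* platformData = nullptr;
};

struct BasePlatformData {
    int stepCount = 0;
};

// Extra data that only the derived (CPU-like) platform needs.
struct DerivedPlatformData {
    int numThreads;
};

class MockDerivedPlatform {
public:
    // Base data goes in the context's slot; derived data goes in a separate
    // table keyed by context (an assumption of this sketch).
    void contextCreated(MockContext& context, int numThreads) {
        context.platformData = new BasePlatformData();
        extraData[&context].reset(new DerivedPlatformData{numThreads});
    }
    void contextDestroyed(MockContext& context) {
        delete static_cast<BasePlatformData*>(context.platformData);
        context.platformData = nullptr;
        extraData.erase(&context);
    }
    // Dedicated accessor: look the derived data up by context instead of
    // casting context.platformData, which points at a BasePlatformData.
    DerivedPlatformData& getPlatformData(MockContext& context) {
        return *extraData.at(&context);
    }
private:
    std::map<const MockContext*, std::unique_ptr<DerivedPlatformData>> extraData;
};
```

Calling getPlatformData(context) after contextCreated(context, 8) returns an object whose numThreads is 8, while context.platformData still points at the base data.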

@zonca
Contributor Author

zonca commented Feb 1, 2018 via email


2 participants