Support for mmap across machines? #16

Open
complyue opened this issue Aug 12, 2020 · 5 comments

Comments

@complyue

I have a motivating use case that favors direct mmap over serialization/deserialization: a Compact Region would be reused by many machines, by multiple parallel processes, and even across successive process lifetimes.

The data is a single heap of cyclic data structures (it can be made immutable at a small ergonomic cost), which I propose to store as a Compact Region and which is actively scanned with heavy parallelism. Each consuming machine maintains a fixed number of concurrent processes performing arbitrary jobs; a process exits after finishing some jobs, and another process is then created to carry on with more. Some jobs may share the same heap of data, so it is highly desirable for such a heap to be cached automatically in the OS kernel's page cache.

This is fairly straightforward with a virtual file (e.g. served by a FUSE filesystem) that is mmap'ed and has its content fetched on demand. A physical file on shared storage (e.g. mounted via NFS) can be mmap'ed in the same way, which is easier to implement but less flexible.

I suppose a Compact Region can be read right away if I manage to mmap it at the same address on another machine, but that is far too restrictive. Are pointers within a Compact Region already relocation-aware, so that this would just work, or how much work would be needed to achieve that?

And if code changes are needed, can they be made in a library separate from stock GHC?

@complyue
Author

On second thought, I realize there would also need to be a Compact Region building API that takes a designated mmap'ed region as its target storage, instead of allocating on demand with malloc or something similar. Is this feasible as well?

@ezyang
Owner

ezyang commented Aug 12, 2020

The pointers will never relocate automatically: that would require GHC to generate different code for handling compact region pointers, and the whole point is that you don't have to recompile anything. You'll have to map the memory region at exactly the same address everywhere.
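
To make the constraint concrete, here is a minimal sketch, assuming the `GHC.Compact` and `GHC.Compact.Serialized` modules from the `ghc-compact` package bundled with GHC 8.2 and later. It lists the absolute address and size of each block backing a region; these are the addresses the stored pointers refer to, so a raw mmap elsewhere would have to reproduce them. The helper name `regionBlocks` is made up for illustration.

```haskell
module Main (main) where

import Foreign.Ptr (Ptr)
import GHC.Compact (Compact, compact, compactSize)
import GHC.Compact.Serialized
  ( SerializedCompact (..)
  , withSerializedCompact
  )

-- | Absolute address and size in bytes of every block backing the region.
-- (Hypothetical helper; only the imported functions come from ghc-compact.)
regionBlocks :: Compact a -> IO [(Ptr a, Word)]
regionBlocks c = withSerializedCompact c (pure . serializedCompactBlockList)

main :: IO ()
main = do
  -- Copy a fully evaluated list into a fresh compact region.
  region <- compact [1 .. 100000 :: Int]
  total <- compactSize region
  blocks <- regionBlocks region
  putStrLn ("region size in bytes: " ++ show total)
  -- Each line is one block: where it lives in this process, and how big it is.
  mapM_
    (\(addr, len) -> putStrLn (show addr ++ "  " ++ show len ++ " bytes"))
    blocks
```

The addresses will generally differ from run to run, which is part of why reusing the raw bytes in another process is awkward.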

@complyue
Author

I get it, thanks. But I'm not aware of an API I can use to build a Compact Region at a specified address (within an mmap'ed region). Does such an API already exist?

@ezyang
Owner

ezyang commented Aug 14, 2020

Oh, I forgot about some internal details of our implementation. Since compact regions have to live in honest-to-goodness GHC blocks in the memory manager, what you want may be somewhat difficult to actually do; at least, it's not supported out of the box here.
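
For comparison, a rough sketch of the import path that does exist, assuming `importCompact` and `withSerializedCompact` from `GHC.Compact.Serialized` plus the `bytestring` package. The helper names `snapshot` and `restore` are hypothetical. The point to notice is that the runtime allocates each destination block itself and passes its address to the callback, so there is no parameter through which a caller could supply a pre-mapped region.

```haskell
module Main (main) where

import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import Data.IORef (newIORef, readIORef, writeIORef)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Ptr (castPtr)
import GHC.Compact (Compact, compact, getCompact)
import GHC.Compact.Serialized
  ( SerializedCompact (..)
  , importCompact
  , withSerializedCompact
  )

-- | Copy the region's blocks out as plain bytes, together with the metadata
-- (block addresses, sizes, root pointer) needed to import them again.
snapshot :: Compact a -> IO (SerializedCompact a, [B.ByteString])
snapshot c = withSerializedCompact c $ \sc -> do
  chunks <- mapM copyOut (serializedCompactBlockList sc)
  pure (sc, chunks)
  where
    copyOut (addr, len) = B.packCStringLen (castPtr addr, fromIntegral len)

-- | Import a snapshot. The RTS picks every destination block and only asks
-- the callback to fill it, one block at a time, in block-list order.
restore :: SerializedCompact a -> [B.ByteString] -> IO (Maybe (Compact a))
restore sc chunks = do
  pending <- newIORef chunks
  importCompact sc $ \dest len -> do
    queue <- readIORef pending
    case queue of
      [] -> ioError (userError "ran out of serialized blocks")
      (chunk : rest) -> do
        writeIORef pending rest
        putStrLn ("RTS-chosen block: " ++ show dest ++ ", " ++ show len ++ " bytes")
        BU.unsafeUseAsCStringLen chunk $ \(src, n) ->
          copyBytes (castPtr dest) src n

main :: IO ()
main = do
  original <- compact (zip [1 :: Int ..] "compact regions")
  (sc, chunks) <- snapshot original
  restored <- restore sc chunks
  case restored of
    Nothing -> putStrLn "import failed"
    Just copy -> print (getCompact copy == getCompact original)
```

As far as I understand the RTS, when the new blocks cannot be placed at their original addresses (as here, where the original region is still alive), the import adjusts the pointers in its private copy once. That is a copy-and-fix-up step, not something a shared read-only mapping could rely on, and it is only expected to work between processes running the same binary.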

@complyue
Author

I'm not very familiar with GHC internals. Does the memory manager have some sort of extension mechanism that would let me mmap a region and persuade the allocator to use it?

I'd think that with heavier use of this, a parallel Haskell implementation could perform much better on workloads that share large immutable datasets as input.
