
In-Place Timsort? #25

Open
haneefmubarak opened this issue Aug 17, 2014 · 2 comments

@haneefmubarak
Contributor

I was wondering if it would be possible to do an in-place Timsort. I was thinking of something along the lines of doing the run-finding stage at the beginning, followed by an in-place mergesort.

I've yet to write it up (I intend to), but I did some extensive tests using the sorts here against 1+ GB of data (the same data for all tests, fetched once from /dev/urandom). What I found, unsurprisingly, was that you can use multithreading to speed up the process (I suppose for large volumes you could even go Hadoop-style Big Data). When multithreading, nested mergesorts were faster than any of the other sorts.

What did surprise me, though, was that an in-place mergesort was the fastest; Timsort came second and normal mergesort third. It occurred to me that the likely reason is cache locality.

I was wondering, though: if we did the run finding from Timsort and then an in-place mergesort, perhaps that would be the fastest? It could take advantage of cache locality AND pre-existing runs.
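To make that concrete, here is a rough sketch of the kind of thing I have in mind. It's only an illustration, not a patch against sort.h: the element type is hard-coded to int, the names (find_run, merge_inplace, timsort_inplace) are made up, and the in-place merge is a naive insertion-style one that a real implementation would swap for a rotation/block merge.

```c
#include <stddef.h>

/* Swap-based reversal of a[lo..hi) -- used to turn descending runs ascending. */
static void reverse_range(int *a, size_t lo, size_t hi)
{
    while (lo + 1 < hi) {
        int tmp = a[lo];
        a[lo++] = a[--hi];
        a[hi] = tmp;
    }
}

/* Timsort-style run detection: return the length of the natural run starting
 * at `start`, reversing strictly descending runs so every run ends up ascending. */
static size_t find_run(int *a, size_t start, size_t n)
{
    size_t i = start + 1;
    if (i >= n)
        return 1;
    if (a[i] < a[start]) {
        while (i + 1 < n && a[i + 1] < a[i])
            i++;
        reverse_range(a, start, i + 1);
    } else {
        while (i + 1 < n && a[i + 1] >= a[i])
            i++;
    }
    return i - start + 1;
}

/* Naive in-place merge of the sorted ranges a[lo..mid) and a[mid..hi):
 * each element of the right run is shifted back into position.  This is
 * O(n^2) moves in the worst case; a rotation- or block-based merge would
 * be needed for this to be competitive on large inputs. */
static void merge_inplace(int *a, size_t lo, size_t mid, size_t hi)
{
    for (size_t i = mid; i < hi; i++) {
        size_t j = i;
        while (j > lo && a[j] < a[j - 1]) {
            int tmp = a[j];
            a[j] = a[j - 1];
            a[j - 1] = tmp;
            j--;
        }
    }
}

/* Repeatedly sweep the array, merging adjacent pairs of natural runs in
 * place, until a sweep finds only a single run.  Uses O(1) extra space. */
void timsort_inplace(int *a, size_t n)
{
    int merged;
    if (n < 2)
        return;
    do {
        size_t pos = 0;
        merged = 0;
        while (pos < n) {
            size_t len_a = find_run(a, pos, n);
            size_t mid = pos + len_a;
            if (mid >= n)
                break;                  /* last run this sweep; nothing to merge */
            size_t len_b = find_run(a, mid, n);
            merge_inplace(a, pos, mid, mid + len_b);
            merged = 1;
            pos = mid + len_b;
        }
    } while (merged);
}
```

Each sweep merges adjacent natural runs in place, so the number of sweeps is logarithmic in the number of runs and no auxiliary buffer or run stack is needed; the quadratic merge step is the part that would need a smarter algorithm (block rotations or similar) to actually be fast.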

@haneefmubarak
Contributor Author

Thanks either way.

@swenson
Owner

swenson commented Nov 20, 2014

I think this is a great idea. If you would like to submit code to try for this, I would be happy to review it. Otherwise, I'll think about it and work on it sometime.
