Simple parallelization framework #4501

HechtiDerLachs · 2025-01-25T10:56:55Z

At the moment this is WIP and still failing.

thofma · 2025-01-25T12:04:41Z

Out of curiosity, what is the overhead of spawning a process and moving the data around? I know it depends on the application, but having the timings for a simple example, as in the tests, would be interesting.

HechtiDerLachs · 2025-01-25T13:57:18Z

Unfortunately, it seems difficult to reproduce a working version of this right now. We had it working at some point yesterday, but I can't figure out what's going wrong now. Once we have it running again, I'll put some communication timings here. These will also be interesting when it comes to comparing with the native Singular serialization.

HechtiDerLachs · 2025-01-29T09:41:13Z

The tests added here are running again, thanks to yesterday's work by @antonydellavecchia!

So here are some timings. As far as I understand, without starting a new process, parallel tasks are automatically executed on the parent process. I get these timings:

julia> @time success, res1 = Oscar.parallel_all(a)
  0.000118 seconds (220 allocations: 11.141 KiB)
(true, QQMPolyRingElem[y, 1, x])

julia> @time success, res1 = Oscar.parallel_all(a)
  0.000114 seconds (220 allocations: 11.141 KiB)
(true, QQMPolyRingElem[y, 1, x])

julia> @time success, res1 = Oscar.parallel_all(a)
  0.000117 seconds (223 allocations: 12.016 KiB)
(true, QQMPolyRingElem[y, 1, x])

julia> @time success, res1 = Oscar.parallel_all(a)
  0.000114 seconds (220 allocations: 11.141 KiB)
(true, QQMPolyRingElem[y, 1, x])

When I start one other process, as indicated in the tests, I get the following:

julia> @time success, res1 = Oscar.parallel_all(a)
  0.002743 seconds (1.69 k allocations: 87.689 KiB, 2 lock conflicts)
(true, QQMPolyRingElem[y, 1, x])

julia> @time success, res1 = Oscar.parallel_all(a)
  0.001832 seconds (1.69 k allocations: 87.689 KiB, 2 lock conflicts)
(true, QQMPolyRingElem[y, 1, x])

julia> @time success, res1 = Oscar.parallel_all(a)
  0.002679 seconds (1.69 k allocations: 87.689 KiB, 2 lock conflicts)
(true, QQMPolyRingElem[y, 1, x])

julia> @time success, res1 = Oscar.parallel_all(a)
  0.002704 seconds (1.74 k allocations: 90.705 KiB, 2 lock conflicts)
(true, QQMPolyRingElem[y, 1, x])

julia> @time success, res1 = Oscar.parallel_all(a)
  0.001806 seconds (1.69 k allocations: 87.689 KiB, 2 lock conflicts)
(true, QQMPolyRingElem[y, 1, x])

julia> @time success, res1 = Oscar.parallel_all(a)
  0.001831 seconds (1.69 k allocations: 87.689 KiB, 2 lock conflicts)
(true, QQMPolyRingElem[y, 1, x])

julia> @time success, res1 = Oscar.parallel_all(a)
  0.001891 seconds (1.69 k allocations: 93.752 KiB, 2 lock conflicts)
(true, QQMPolyRingElem[y, 1, x])

julia> @time success, res1 = Oscar.parallel_all(a)
  0.003475 seconds (1.69 k allocations: 88.564 KiB, 2 lock conflicts)
(true, QQMPolyRingElem[y, 1, x])

@antonydellavecchia: Do you happen to know whether serialization and deserialization also take place in the first case? Or do we benefit from some caching there? That would be interesting to know in order to estimate how much time goes into (de)serialization and how much is actually spent sending things around.

Edit: Interestingly, timings seem to go up when using more workers. With four processes spawned I get:

julia> @time Oscar.parallel_all(a)
  0.007401 seconds (1.89 k allocations: 98.064 KiB)
(true, QQMPolyRingElem[y, 1, x])

julia> @time Oscar.parallel_all(a)
  0.007863 seconds (1.89 k allocations: 98.064 KiB)
(true, QQMPolyRingElem[y, 1, x])

julia> @time Oscar.parallel_all(a)
  0.007877 seconds (1.89 k allocations: 98.064 KiB)
(true, QQMPolyRingElem[y, 1, x])

julia> @time Oscar.parallel_all(a)
  0.007536 seconds (1.89 k allocations: 98.064 KiB)
(true, QQMPolyRingElem[y, 1, x])

julia> @time Oscar.parallel_all(a)
  0.008121 seconds (1.90 k allocations: 104.127 KiB)
(true, QQMPolyRingElem[y, 1, x])

julia> @time Oscar.parallel_all(a)
  0.007908 seconds (1.89 k allocations: 98.064 KiB)
(true, QQMPolyRingElem[y, 1, x])
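
As a rough summary of the numbers above, the three sets of timings can be averaged. The snippet below is only a sketch: it uses nothing beyond the wall-clock times quoted in this comment, and the variable names are made up for illustration. These are single `@time` calls on a tiny example, so the result should be read as an order of magnitude rather than a benchmark.

```julia
using Statistics

# Wall-clock times in seconds, copied from the @time output above.
in_process   = [0.000118, 0.000114, 0.000117, 0.000114]
one_worker   = [0.002743, 0.001832, 0.002679, 0.002704,
                0.001806, 0.001831, 0.001891, 0.003475]
four_workers = [0.007401, 0.007863, 0.007877, 0.007536, 0.008121, 0.007908]

baseline = mean(in_process)

# Print the mean per configuration in milliseconds and relative to the baseline.
for (label, t) in [("in-process", in_process),
                   ("one worker", one_worker),
                   ("four workers", four_workers)]
    println(rpad(label, 14), round(mean(t) * 1e3; digits = 3), " ms  (",
            round(mean(t) / baseline; digits = 1), "x)")
end
```

This gives means of roughly 0.12 ms in-process, 2.4 ms with one worker, and 7.8 ms with four workers, i.e. about 20 and 67 times the in-process time for this small example.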

antonydellavecchia · 2025-01-29T10:33:48Z

If you repeated the experiment with the same ring, then yes, it will be cached on all processes.
This means the messages are still sent, but they aren't unpacked on either side.
We don't yet have a way to send messages only when we know they are necessary.
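
To illustrate the behaviour described above, here is a minimal sketch of a receiver that always gets the message but only deserializes the payload the first time it sees a given identifier. This is not the Oscar implementation: `Message`, `pack`, and `receive!` are hypothetical names, the payload is a stand-in vector rather than a ring, and Julia's built-in `Serialization` stdlib is used in place of the IPC serialization from this PR.

```julia
using Serialization, UUIDs

# Hypothetical message: an identifier plus the serialized payload bytes.
struct Message
    id::UUID
    payload::Vector{UInt8}
end

# Sender side: serialize the object and attach its id.
function pack(id::UUID, obj)
    io = IOBuffer()
    serialize(io, obj)
    return Message(id, take!(io))
end

# Receiver side: the message always arrives, but the payload is only
# deserialized if the id is not in the cache yet.
function receive!(cache::Dict{UUID,Any}, msg::Message)
    return get!(cache, msg.id) do
        deserialize(IOBuffer(msg.payload))
    end
end

cache = Dict{UUID,Any}()
ring_id = uuid4()
msg = pack(ring_id, ["x", "y"])   # stand-in for a ring

obj1 = receive!(cache, msg)   # first arrival: payload is deserialized
obj2 = receive!(cache, msg)   # repeat: cache hit, payload is not unpacked
println(obj1 === obj2)        # both calls return the same cached object
```

In this sketch the cost of the second call is the transfer plus a dictionary lookup, which is the part that a "send only when necessary" mechanism would still need to remove.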

antonydellavecchia and others added 30 commits September 16, 2024 09:11

some progress on simplifying ref handling in IPC Serialization

45d3516

some progress

3e62a1b

moving things around

8d76aab

type params function

94587e3

some progress on refactoring

e4c5bae

some progress

1dfd391

more progress

c1cbd72

almost completed uni variate polynomials

12eb6ef

uni vairate polynomials serialization passing

84c46e6

MPolyRing tests passing

e35b665

moving things around

4537cf0

grading needs containers

eb3c734

not sure where i left this

a94c566

new load_type_params

72a96cf

bug with julia 1.11.1

5e06a82

vectors + polynomials working

4cbaee4

some progress on tuples

7255064

reworking save_type_params

b41dee5

tidying

d758fad

dicts working

f61ecd1

containers completed, need some upgrades though

66a5f2e

tidying

32bb5ee

broke everything

47e3f3f

fixed containers and polynomials

d971bb5

some issues with field embeddings

beea514

fields tests working now

b6f35d3

algebras working

4b37896

started on groups

0b0d1de

Merge branch 'master' into adv/ipc-serialize

62b1bbc

fix, inconsistencies from merge

0324e87

HechtiDerLachs and others added 11 commits January 25, 2025 09:58

Actually add tests.

3fc7f24

comment out save_as_ref

1b2d2ef

tests working

784f01b

implements generic type_params for parallel task

380c9ea

test clean up

cf313ce

going through

b0f001a

Fix tests.

94f51f9

container fixing

6610015

Merge branch 'adv/ipc-serialize' into simple_parallelization_framework

e70a1e4

Clean up parallel.jl and add documentation.

8e5a753

Adjust tests.

5849fdb

HechtiDerLachs marked this pull request as draft January 25, 2025 10:57

antonydellavecchia mentioned this pull request Jan 25, 2025

Parallel Framework #4502

Open

4 tasks

antonydellavecchia and others added 9 commits January 27, 2025 17:20

simplify type_params

4ded0c1

fields working

0efc726

back to polyhedral

a69c0d4

graded ring working

faa656b

sets not passing yet

81cb2e8

started to change docs a bit

4ee31af

fix some set tests

0fd56a2

Merge branch 'adv/ipc-serialize' into simple_parallelization_framework

5011a7c

fixes regression

8ec5fc5

HechtiDerLachs added 4 commits January 29, 2025 11:20

Some cleanup.

dbab05d

Expose a bug with sending dictionaries.

6bb9803

Fix exposure.

11909b8

Fix tests.

7934683
