During the first pass, everything seems to go fine. No swap storage gets used:
Network Port: 8444 [chia]
Final Directory: /chia37/
Stage Directory: /nvme/
Number of Plots: infinite
Crafting plot 1 out of -1 (2022/02/25 19:40:16)
Process ID: 30160
Number of Threads: 22
Number of Buckets P1: 2^8 (256)
Number of Buckets P3+P4: 2^8 (256)
Pool Public Key: 87360049e3f15e7b3b8d9f7605e5e6abfa44c254e183ffbf7f63a1c3a8a7d9265fedb3e80ad3f102402818a24a18e6f8
Farmer Public Key: 8934b2b0af6fb032944907fb0f38ee706f2a557ce152fb9f7e07a7f7978c6e90cb6681cd90b4390c664513a0dde89ab6
Working Directory: /tmpfs0/
Working Directory 2: /tmpfs0/
Plot Name: plot-k32-2022-02-25-19-40-194f5119638f2c64f8808aabc4272e27b923a7bca63ecb8703fd24a96330c28d
[P1] Table 1 took 16.6896 sec
[P1] Table 2 took 135.187 sec, found 4294906657 matches
[P1] Table 3 took 147.339 sec, found 4294884563 matches
[P1] Table 4 took 166.244 sec, found 4294689520 matches
[P1] Table 5 took 164.188 sec, found 4294408554 matches
[P1] Table 6 took 159.424 sec, found 4293875369 matches
[P1] Table 7 took 127.028 sec, found 4292745068 matches
Phase 1 took 916.121 sec
[P2] max_table_size = 4294967296
[P2] Table 7 scan took 9.10817 sec
[P2] Table 7 rewrite took 36.413 sec, dropped 0 entries (0 %)
[P2] Table 6 scan took 34.2086 sec
[P2] Table 6 rewrite took 48.4351 sec, dropped 581421025 entries (13.5407 %)
[P2] Table 5 scan took 32.2651 sec
[P2] Table 5 rewrite took 46.5942 sec, dropped 762109002 entries (17.7465 %)
[P2] Table 4 scan took 31.7279 sec
[P2] Table 4 rewrite took 45.8766 sec, dropped 828989486 entries (19.3027 %)
[P2] Table 3 scan took 31.5959 sec
[P2] Table 3 rewrite took 45.481 sec, dropped 855164322 entries (19.9112 %)
[P2] Table 2 scan took 31.5307 sec
[P2] Table 2 rewrite took 45.4374 sec, dropped 865601145 entries (20.1541 %)
Phase 2 took 455.778 sec
Wrote plot header with 268 bytes
[P3-1] Table 2 took 41.9283 sec, wrote 3429305512 right entries
[P3-2] Table 2 took 32.6918 sec, wrote 3429305512 left entries, 3429305512 final
[P3-1] Table 3 took 53.1187 sec, wrote 3439720241 right entries
[P3-2] Table 3 took 32.6833 sec, wrote 3439720241 left entries, 3439720241 final
[P3-1] Table 4 took 53.6664 sec, wrote 3465700034 right entries
[P3-2] Table 4 took 32.5607 sec, wrote 3465700034 left entries, 3465700034 final
[P3-1] Table 5 took 54.4015 sec, wrote 3532299552 right entries
[P3-2] Table 5 took 33.341 sec, wrote 3532299552 left entries, 3532299552 final
[P3-1] Table 6 took 56.2956 sec, wrote 3712454344 right entries
[P3-2] Table 6 took 35.0807 sec, wrote 3712454344 left entries, 3712454344 final
[P3-1] Table 7 took 43.3253 sec, wrote 4292745068 right entries
[P3-2] Table 7 took 40.3063 sec, wrote 4292745068 left entries, 4292745068 final
Phase 3 took 513.911 sec, wrote 21872224751 entries to final plot
[P4] Starting to write C1 and C3 tables
[P4] Finished writing C1 and C3 tables
[P4] Writing C2 table
[P4] Finished writing C2 table
Phase 4 took 57.7683 sec, final plot size is 108805631481 bytes
Total plot creation time was 1943.65 sec (32.3941 min)
Started copy to /chia37/plot-k32-2022-02-25-19-40-194f5119638f2c64f8808aabc4272e27b923a7bca63ecb8703fd24a96330c28d.plot
The parallel job is similarly unremarkable. However, I start to notice memory in use that is unaccounted for: after the first pass, not all of the RAM appears to be freed. Some is buffered and eventually released as the second pass starts to use more RAM, but very quickly the system starts swapping.
Crafting plot 2 out of -1 (2022/02/25 20:12:39)
Process ID: 30160
Number of Threads: 22
Number of Buckets P1: 2^8 (256)
Number of Buckets P3+P4: 2^8 (256)
Pool Public Key: 87360049e3f15e7b3b8d9f7605e5e6abfa44c254e183ffbf7f63a1c3a8a7d9265fedb3e80ad3f102402818a24a18e6f8
Farmer Public Key: 8934b2b0af6fb032944907fb0f38ee706f2a557ce152fb9f7e07a7f7978c6e90cb6681cd90b4390c664513a0dde89ab6
Working Directory: /tmpfs0/
Working Directory 2: /tmpfs0/
Plot Name: plot-k32-2022-02-25-20-12-2fca26a0acbcef6ed2cb766af119b1607e39dd87e4e6bd9e13f2d60d09b9eef4
[P1] Table 1 took 17.2749 sec
[P1] Table 2 took 141.758 sec, found 4294798949 matches
[P1] Table 3 took 154.108 sec, found 4294613298 matches
[P1] Table 4 took 177.264 sec, found 4294121045 matches
[P1] Table 5 took 176.233 sec, found 4293306541 matches
Copy to /chia37/plot-k32-2022-02-25-19-40-194f5119638f2c64f8808aabc4272e27b923a7bca63ecb8703fd24a96330c28d.plot finished, took 674.963 sec, 153.735 MB/s avg.
[P1] Table 6 took 164.334 sec, found 4291669308 matches
^C
****************************************************************************************
** The crafting of plots will stop after the creation and copy of the current plot. **
** !! If you want to force quit now, press Ctrl-C twice in series !! **
****************************************************************************************
[P1] Table 7 took 129.035 sec, found 4288198668 matches
Phase 1 took 960.026 sec
[P2] max_table_size = 4294967296
[P2] Table 7 scan took 9.36747 sec
[P2] Table 7 rewrite took 80.4473 sec, dropped 0 entries (0 %)
[P2] Table 6 scan took 34.3109 sec
[P2] Table 6 rewrite took 49.7584 sec, dropped 581724842 entries (13.5547 %)
[P2] Table 5 scan took 49.8793 sec
[P2] Table 5 rewrite took 48.0592 sec, dropped 762463788 entries (17.7594 %)
[P2] Table 4 scan took 35.6396 sec
[P2] Table 4 rewrite took 134.517 sec, dropped 829251400 entries (19.3113 %)
[P2] Table 3 scan took 122.388 sec
[P2] Table 3 rewrite took 164.452 sec, dropped 855382896 entries (19.9176 %)
[P2] Table 2 scan took 47.5694 sec
[P2] Table 2 rewrite took 164.525 sec, dropped 865731775 entries (20.1577 %)
Phase 2 took 959.806 sec
Wrote plot header with 268 bytes
[P3-1] Table 2 took 52.9561 sec, wrote 3429067174 right entries
[P3-2] Table 2 took 32.7323 sec, wrote 3429067174 left entries, 3429067174 final
[P3-1] Table 3 took 53.3638 sec, wrote 3439230402 right entries
[P3-2] Table 3 took 32.6586 sec, wrote 3439230402 left entries, 3439230402 final
[P3-1] Table 4 took 53.8672 sec, wrote 3464869645 right entries
[P3-2] Table 4 took 33.1071 sec, wrote 3464869645 left entries, 3464869645 final
[P3-1] Table 5 took 54.5899 sec, wrote 3530842753 right entries
[P3-2] Table 5 took 33.5783 sec, wrote 3530842753 left entries, 3530842753 final
[P3-1] Table 6 took 56.5318 sec, wrote 3709944466 right entries
[P3-2] Table 6 took 35.2192 sec, wrote 3709944466 left entries, 3709944466 final
[P3-1] Table 7 took 44.1606 sec, wrote 4288198668 right entries
[P3-2] Table 7 took 40.3744 sec, wrote 4288198668 left entries, 4288198668 final
Phase 3 took 528.386 sec, wrote 21862153108 entries to final plot
[P4] Starting to write C1 and C3 tables
[P4] Finished writing C1 and C3 tables
[P4] Writing C2 table
[P4] Finished writing C2 table
Phase 4 took 57.5257 sec, final plot size is 108744530557 bytes
Total plot creation time was 2505.79 sec (41.7632 min)
Started copy to /chia37/plot-k32-2022-02-25-20-12-2fca26a0acbcef6ed2cb766af119b1607e39dd87e4e6bd9e13f2d60d09b9eef4.plot
As the completed plots are copied to disk and the tmpfs volumes are emptied, RAM usage does not decrease. Instead, after one of the copy jobs finishes, RAM usage hits a low of about 100 GB and then actually increases, until about half my RAM is in use when the second transfer completes:
total used free shared buff/cache available
Mem: 515955 238948 276687 17 319 274698
Eventually, if I allow it to keep going, one of the jobs is killed for lack of memory once the swap becomes exhausted. Whether the process is killed or terminates normally, a large part of my RAM is no longer available, as shown above.
The only way I have found to free it is to reboot.
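One way to check whether the unavailable memory is simply pinned by files still resident in tmpfs (tmpfs pages hold RAM until the files are deleted or the mount is removed):

```shell
# tmpfs-backed pages are counted as Shmem and cannot be reclaimed by the
# kernel under memory pressure; df shows how much RAM each mount is pinning.
df -h -t tmpfs
```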
I use ZFS on the destination disk, but I limit the ARC cache to 4 GB. I also have 24 GB of swap. Again, none is used during the first pass, but it is increasingly consumed until one of the processes is killed; the other continues normally.
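For reference, a 4 GB ARC limit is typically applied through the `zfs_arc_max` module parameter (value in bytes); a sketch, assuming the OpenZFS module parameter interface:

```shell
# Cap the ZFS ARC at 4 GiB = 4 * 1024^3 bytes (takes effect immediately).
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max

# Make it persistent across reboots (modprobe config path may vary by distro).
echo "options zfs zfs_arc_max=4294967296" >> /etc/modprobe.d/zfs.conf
```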
In addition, the overhead incurred by swapping actually makes parallel jobs slower.
UPDATE:
I tried running with no swap, and it nearly worked.
I could probably have freed up another 3 GB by using a lighter distro, but I estimate there was still ~7.6 GB of tmpfs left to be written. I based that on the 8.7 GB shown free on tmpfs1 and the past observation that it bottoms out at ~1.1 GB; therefore 8.7 - 1.1 = 7.6:
Shortly thereafter, the second job crashed at the same point. Both file transfers had completed, so something else is causing the additional memory usage.
Obviously, "MemAvailable", which I got from /proc/meminfo, was a bit optimistic. That pane and the one with /tmpfs/ space to the left are updated every second; only a few seconds elapsed before I was able to get the screengrab.
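For context: `MemAvailable` is the kernel's estimate of memory obtainable without swapping, and it assumes most of the page cache is reclaimable. tmpfs pages, however, are counted under `Cached` (and `Shmem`) yet cannot be reclaimed without deleting the files, which would explain the optimism. The relevant fields can be compared directly:

```shell
# Cached includes tmpfs/shmem pages; Cached minus Shmem is closer to the
# amount of page cache the kernel can actually drop under pressure.
grep -E '^(MemAvailable|Cached|Shmem):' /proc/meminfo
```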
I don't have a problem with some swapping around P2/Table 2; that also happens when I run one job with 256 GB of RAM, and it only adds a minute or so to plot times. Unfortunately, if I let it run, it will use all 24 GB of my swap, and I'm sure it would grow much larger.
tl;dr
When plotting parallel jobs in tmpfs with 512 GB of RAM, each pass needs more RAM than the last, until one job crashes.
I'm using madmax 1.1.8-ecec17d.
I have a Supermicro X9DRI motherboard with 512 GB of DDR3.
Prior to starting plotting, less than 1% of the system's memory is in use, even before I stop superfluous services:
I am attempting to run madmax in parallel, one job per NUMA node.
I create tmpfs partitions:
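The exact mount commands aren't captured here, so this is only a sketch of what a per-NUMA-node tmpfs setup might look like; the sizes and the `mpol=bind` NUMA-binding mount option are assumptions:

```shell
# One tmpfs per NUMA node; size=236G is a guess that leaves headroom in
# 512 GiB of RAM, and mpol=bind pins each mount's pages to one node.
mkdir -p /tmpfs0 /tmpfs1
mount -t tmpfs -o size=236G,mpol=bind:0 tmpfs /tmpfs0
mount -t tmpfs -o size=236G,mpol=bind:1 tmpfs /tmpfs1
```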
and I begin to plot:
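The invocation isn't shown either; reconstructing from the log header above (plot count -1, 22 threads, 256 buckets, both working directories on /tmpfs0), it would be something like the following, with all flag names being assumptions to verify against `chia_plot --help`:

```shell
# Job for NUMA node 0; numactl binds both CPU and memory allocation to that
# node. <pool_key>/<farmer_key> are placeholders for the keys in the log.
numactl --cpunodebind=0 --membind=0 -- \
  chia_plot -n -1 -r 22 -u 256 -v 256 \
    -t /tmpfs0/ -2 /tmpfs0/ -d /chia37/ \
    -p <pool_key> -f <farmer_key> &
```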
I also set swappiness to zero now:
sysctl vm.swappiness=0