Commit 56146cb
Repartition by event name before writing to the lake
Previously, our Iceberg writer was using the [hash write distribution mode][1]
because that is the default for Iceberg. In this mode, Spark
repartitions the dataframe by the table's partition key immediately
before writing to the lake.
After this commit, we explicitly repartition the dataframe as part of
the existing spark task for preparing the final dataframe. This means we
can change the Iceberg write distribution mode to `none`.
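
The change described above can be sketched roughly as follows. This is a minimal illustration, not the loader's actual code: the object and helper names are hypothetical, the partition column is assumed to be `event_name`, and setting the mode through the `write.distribution-mode` table property is only one possible way to do it.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical sketch; names are illustrative, not taken from the loader.
object RepartitionSketch {

  // Shuffle the rows so that every row with the same event_name ends up
  // in the same Spark partition (and therefore in the same write task).
  def repartitionByEventName(df: DataFrame, numPartitions: Int): DataFrame =
    df.repartition(numPartitions, col("event_name"))

  // Assumption: the distribution mode is set as an Iceberg table property.
  // With 'none', Iceberg does not request its own shuffle at write time.
  def disableIcebergShuffle(spark: SparkSession, table: String): Unit =
    spark.sql(
      s"ALTER TABLE $table SET TBLPROPERTIES ('write.distribution-mode' = 'none')"
    )

  // Append the already-repartitioned dataframe to the table.
  def write(df: DataFrame, table: String): Unit =
    df.writeTo(table).append()
}
```

Because the shuffle now happens while the final dataframe is being prepared, the write itself no longer needs a second shuffle.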
Overall this seems to improve the time taken to write a window of events
to Iceberg. This fixes a problem we found, in which the write phase
could get too slow when under high load (Iceberg only): specifically, a
write was taking longer than the loader's "window" and this caused
periods of low CPU usage, where the loader's processing phase was
waiting for the write phase to catch up.
This commit also removes the config option `writerParallelismFraction`.
Before this commit, there were disadvantages to making the writer
parallelism too high, because it would lead to smaller file sizes. But
after this commit, now that we partition by event_name, we might as well
make the writer parallelism as high as reasonably possible, which also
speeds up the write phase of the loader.
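
A toy model in plain Scala (no Spark) may help show why parallelism stops affecting file sizes once rows are grouped by event name. It assumes each write task produces one file per distinct event name it holds, and it uses `hashCode` as a stand-in for Spark's real hash partitioner. The round-robin baseline is only a generic contrast, not a description of what the loader did before this commit.

```scala
// Hypothetical model; not part of the loader.
object FileCountModel {

  // Assumption: each task writes one file per distinct event name it holds.
  def filesWritten(tasks: Seq[Seq[String]]): Int =
    tasks.map(_.distinct.size).sum

  // Baseline: rows are dealt out to tasks in turn, ignoring the key.
  def roundRobin(rows: Seq[String], parallelism: Int): Seq[Seq[String]] =
    rows.zipWithIndex
      .groupBy { case (_, i) => i % parallelism }
      .values
      .map(_.map(_._1))
      .toSeq

  // Key-based: rows with the same event name always land in the same task.
  def byEventName(rows: Seq[String], parallelism: Int): Seq[Seq[String]] =
    rows.groupBy(name => math.floorMod(name.hashCode, parallelism)).values.toSeq

  def main(args: Array[String]): Unit = {
    val rows = Seq.fill(100)(Seq("page_view", "page_ping", "link_click")).flatten
    for (p <- Seq(4, 16, 64)) {
      val rr = filesWritten(roundRobin(rows, p))
      val byKey = filesWritten(byEventName(rows, p))
      println(s"parallelism=$p roundRobin=$rr byEventName=$byKey")
    }
  }
}
```

In this model the key-based file count equals the number of distinct event names, whatever the parallelism, while the round-robin count grows with the number of tasks.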
Note: this improvement will not help Snowplow users who have changed the
partition key to something other than our default. We might want to
make a follow-up change in which the loader auto-discovers the lake's
partition key. For example, some users might want to partition by
`app_id` instead of `event_name`.
[1]: https://iceberg.apache.org/docs/1.7.1/spark-writes/#writing-distribution-modes
1 parent 5ca2239 commit 56146cb
File tree
8 files changed (+38, -31 lines changed)
- config
- modules/core/src/main
- resources
- scala/com.snowplowanalytics.snowplow.lakes
- processing
- tables