From 74febeff72cf03200b69f7d48e4c3e79a2d37d23 Mon Sep 17 00:00:00 2001
From: Jihoon Son
Date: Tue, 26 Nov 2024 13:38:11 -0800
Subject: [PATCH 1/2] Add doc for memory management

Signed-off-by: Jihoon Son
---
 docs/img/memory_state_machine.png            | Bin 0 -> 57761 bytes
 docs/memory_management.md                    |  40 +++++++++++++++++++
 src/main/cpp/src/SparkResourceAdaptorJni.cpp |   2 +
 3 files changed, 42 insertions(+)
 create mode 100644 docs/img/memory_state_machine.png
 create mode 100644 docs/memory_management.md

diff --git a/docs/img/memory_state_machine.png b/docs/img/memory_state_machine.png
new file mode 100644
index 0000000000000000000000000000000000000000..310737eeaac7404ad153902703b2c62d575677e9
Binary files /dev/null and b/docs/img/memory_state_machine.png differ
diff --git a/docs/memory_management.md b/docs/memory_management.md
new file mode 100644
index 000000000..ed1cf41b6
--- /dev/null
+++ b/docs/memory_management.md
@@ -0,0 +1,40 @@
+## Memory management
+
+The Spark-RAPIDS plugin manages device memory to effectively allocate the limited device memory resource among concurrent tasks.
+For memory management, the plugin tracks every device memory allocation and de-allocation request during processing.
+While there is enough memory available, the allocation request succeeds and the task continues processing.
+However, when the allocation request cannot succeed due to lack of memory, the plugin pauses that thread. When all of the active tasks have at least one thread paused, the plugin starts to roll back some of those paused threads to points where all of their input data is spillable, and let the other threads try to complete. If every thread except one has been rolled back and the one remaining thread cannot still make progress, then pluging picks up one thread to split its input and try again.
+
+### State machine for OOM handler
+
+The Spark-RAPIDS plugin keeps track of the state of the individual threads. Note that one Spark task can use multiple threads during execution.
+
+The thread can have one of these states at a time:
+
+- `UNKNOWN`: the thread has not been registered with the tracking system.
+- `THREAD_RUNNING`: the thread is running normally.
+- `THREAD_ALLOC`: the thread has initiated a memory allocation.
+- `THREAD_ALLOC_FREE`: the thread has requested a memory free before the allocation completes.
+- `THREAD_BLOCKED`: the allocation is blocked due to lack of memory. The thread is waiting for enough memory to be available.
+- `THREAD_BUFN_THROW`: a deadlock has been detected as all threads are blocked, and this thread has been selected to roll back to the point where all its data is spillable.
+- `THREAD_BUFN_WAIT`: the thread has initiated the rollback.
+- `THREAD_BUFN`: the thread has rolled back and is now blocked until further notice (BUFN). The task will be unblocked once high priority tasks release enough memory.
+- `THREAD_SPLIT_THROW`: a deadlock has been detected as all threads are BUFN, and this thread has been selected to roll back, split its input, and retry.
+- `THREAD_REMOVE_THROW`: the task has been unregistered while blocked.
+
+The thread state can change based on the diagram below. Note that the thread state can transition from any state to `UNKNOWN`, but it is omitted in the diagram for brevity.
+
+![alt text](img/memory_state_machine.png "Thread state machine")
+
+### Thread priority
+
+The Spark-RAPIDS plugin uses the thread priority when it needs to break ties between threads. See the [Deadlock busting](#deadlock-busting) section below for an example use case. The thread priority is currently decoupled with the query priority. That is, the threads processing a high priority query do not necessarily have the same high priority. Instead, each task thread is assigned a priority based on their `task_id` and `thread_id`. Shuffle threads have the highest priority, and thus are always prioritized over task threads. This is because other task threads may depend on shuffle indirectly, and this lets us avoid situations of priority inversion. In the future, we may consider taking the query priority into the thread priority.
+
+### Deadlock busting
+
+The deadlock can occur when every active task has at least one thread that is either directly blocked on a memory allocation or indirectly blocked by shuffle being blocked on a memory allocation. When this happens, the lowest priority thread (see the above [Thread priority](thread-priority) section for the thread priority) is selected to break the deadlock. There are two cases of the deadlock.
+
+1) All threads are blocked and there is at least one thread in the `THREAD_BLOCKED` state. In this case, the lowest priority thread is selected among `THREAD_BLOCKED` threads to break the deadlock. The thread selected transitions its state to `THREAD_BUFN_THROW` and initiates the rollback-and-retry process. After the rollback, all input data of the thread will be spillable and the thread will block before allocating more GPU memory until enough memory is freed up for other threads.
+2) If all threads are blocked and are in the `THREAD_BUFN` state, the lowest priority thread is selected to split its input first and then retry with a smaller input. The thread selected transitions its state to `THREAD_SPLIT_THROW` and initiates the rollback-split-and-retry process.
+
+If the thread selected is a task thread and its priority is not the highest priority, the thread will transition its state into the `THREAD_BUFN_THROW` state. Any threads that was just marked as `THREAD_BUFN_THROW` will be awaken to start the rollback process and initiate the retry. After the rollback, all input data of the thread will be spillable and the thread will block before allocating more GPU memory until enough memory is freed up for other threads.
diff --git a/src/main/cpp/src/SparkResourceAdaptorJni.cpp b/src/main/cpp/src/SparkResourceAdaptorJni.cpp
index 31a603411..e7aae0e8d 100644
--- a/src/main/cpp/src/SparkResourceAdaptorJni.cpp
+++ b/src/main/cpp/src/SparkResourceAdaptorJni.cpp
@@ -80,6 +80,8 @@ void cache_thread_reg_jni(JNIEnv* env)
 // again until we know that progress has been made. We might add an API
 // in the future to know when a retry section has passed, which would
 // probably be a preferable time to restart all BUFN threads.
+//
+// See `docs/memory_management.md` for the design of the state machine.
 enum class thread_state {
   UNKNOWN = -1,  // unknown state, this is really here for logging and anything transitioning to
                  // this state should actually be accomplished by deleting the thread from the state

From d61032c57569b9d48e645dd8da658c0558cbf6c7 Mon Sep 17 00:00:00 2001
From: Jihoon Son
Date: Tue, 3 Dec 2024 13:55:31 -0800
Subject: [PATCH 2/2] address comments and revise

---
 docs/memory_management.md | 68 +++++++++++++++++++++++----------------
 1 file changed, 41 insertions(+), 27 deletions(-)

diff --git a/docs/memory_management.md b/docs/memory_management.md
index ed1cf41b6..4f67fae88 100644
--- a/docs/memory_management.md
+++ b/docs/memory_management.md
@@ -1,40 +1,54 @@
-## Memory management
+## Memory Management Overview
 
-The Spark-RAPIDS plugin manages device memory to effectively allocate the limited device memory resource among concurrent tasks.
-For memory management, the plugin tracks every device memory allocation and de-allocation request during processing.
-While there is enough memory available, the allocation request succeeds and the task continues processing.
-However, when the allocation request cannot succeed due to lack of memory, the plugin pauses that thread. When all of the active tasks have at least one thread paused, the plugin starts to roll back some of those paused threads to points where all of their input data is spillable, and let the other threads try to complete. If every thread except one has been rolled back and the one remaining thread cannot still make progress, then pluging picks up one thread to split its input and try again.
+Effective memory management is crucial for processing queries successfully with limited memory resources.
+The Spark-RAPIDS plugin leverages the [RAPIDS Memory Manager (RMM)](https://github.com/rapidsai/rmm) to handle and recover from out-of-memory (OOM) errors during query processing. This document describes the mechanisms and the state management implemented in [SparkResourceAdaptorJni.cpp](../src/main/cpp/src/SparkResourceAdaptorJni.cpp). The plugin tracks every memory allocation and deallocation request to handle various OOM situations. It chooses the appropriate recovery mechanism, such as spilling, rollback-and-retry, or split-and-retry, based on the situation. If recovery is not possible, the plugin fails gracefully.
 
-### State machine for OOM handler
 
-The Spark-RAPIDS plugin keeps track of the state of the individual threads. Note that one Spark task can use multiple threads during execution.
+### Handling Out-of-Memory Errors
 
-The thread can have one of these states at a time:
+The Spark-RAPIDS plugin manages both device memory and host memory (optional). It tracks all memory allocations to detect OOM errors. While an allocation request succeeds, the plugin does not interfere with the running threads. However, when the allocation request fails due to insufficient memory, the plugin pauses the requesting thread and allows it to retry later when more memory becomes available. The plugin employs several strategies to free up memory:
 
-- `UNKNOWN`: the thread has not been registered with the tracking system.
-- `THREAD_RUNNING`: the thread is running normally.
-- `THREAD_ALLOC`: the thread has initiated a memory allocation. -- `THREAD_ALLOC_FREE`: the thread has requested a memory free before the allocation completes. -- `THREAD_BLOCKED`: the allocation is blocked due to lack of memory. The thread is waiting for enough memory to be available. -- `THREAD_BUFN_THROW`: a deadlock has been detected as all threads are blocked, and this thread has been selected to roll back to the point where all its data is spillable. -- `THREAD_BUFN_WAIT`: the thread has initiated the rollback. -- `THREAD_BUFN`: the thread has rolled back and is now blocked until further notice (BUFN). The task will be unblocked once high priority tasks release enough memory. -- `THREAD_SPLIT_THROW`: a deadlock has been detected as all threads are BUFN, and this thread has been selected to roll back, split its input, and retry. -- `THREAD_REMOVE_THROW`: the task has been unregistered while blocked. +- Spilling: Data marked as spillable is moved out of memory. +- Rollback: If no thread can make progress even after spilling, the plugin starts rolling back threads to the point where their inputs are spillable, allowing other threads to proceed. +- Split and retry: If still no thread can make progress even after all threads have been rolled back, the inputs of some threads are split, and those threads retry with smaller data. -The thread state can change based on the diagram below. Note that the thread state can transition from any state to `UNKNOWN`, but it is omitted in the diagram for brevity. +If no further splitting is possible, the plugin gracefully cancels the query and reports the OOM error. -![alt text](img/memory_state_machine.png "Thread state machine") +### State Machine for OOM Handler + +To handle various OOM situations, the Spark-RAPIDS plugin keeps track of the state of individual threads. Note that one Spark task can use multiple threads during execution.
+ +A thread can have one of these states at a time: -### Thread priority +- `UNKNOWN`: The thread has not been registered with the tracking system. +- `THREAD_RUNNING`: The thread is running normally and has no memory allocations pending. +- `THREAD_ALLOC`: The thread has initiated a memory allocation (either CPU or GPU). +- `THREAD_ALLOC_FREE`: A separate thread has freed memory before the allocation of this thread completes. The allocation will be retried in case enough memory is available after the free. +- `THREAD_BLOCKED`: The allocation request failed and the thread has been paused, waiting for more memory to become available. Note that the thread pause is usually expected to last for a short period of time as the GPU typically has a lot of memory churn. +- `THREAD_BUFN_THROW`: A deadlock has been detected as all threads are blocked, and this thread has been selected to roll back to the point where all its data is spillable. An exception will be thrown to trigger this rollback when the thread awakes. +- `THREAD_BUFN_WAIT`: An exception has been thrown to initiate the rollback. The thread might be doing some preparation for the retry. +- `THREAD_BUFN`: The thread has rolled back and is now blocked until further notice (BUFN). The task will be unblocked once another task completes. +- `THREAD_SPLIT_THROW`: A deadlock has been detected as all threads are BUFN, and this thread has been selected to split its input and retry. Note that the processing will fail without retrying if the input cannot be further split. +- `THREAD_REMOVE_THROW`: The task has been unregistered from another thread while it is blocked. The blocked thread will be awakened to throw an exception. This should not happen under normal operation unless the Spark process is being shut down before all task threads have exited. + +The thread state can change based on the diagram below.
Note that the thread state can transition from any state to `UNKNOWN` unless the thread is blocked (in either `THREAD_BLOCKED` or `THREAD_BUFN`). This is omitted in the diagram for brevity. + +![alt text](img/memory_state_machine.png "Thread state machine") -The Spark-RAPIDS plugin uses the thread priority when it needs to break ties between threads. See the [Deadlock busting](#deadlock-busting) section below for an example use case. The thread priority is currently decoupled with the query priority. That is, the threads processing a high priority query do not necessarily have the same high priority. Instead, each task thread is assigned a priority based on their `task_id` and `thread_id`. Shuffle threads have the highest priority, and thus are always prioritized over task threads. This is because other task threads may depend on shuffle indirectly, and this lets us avoid situations of priority inversion. In the future, we may consider taking the query priority into the thread priority. +### Thread Priority -### Deadlock busting +The Spark-RAPIDS plugin uses thread priority to break ties between threads. +Note that the thread priority is currently decoupled from query priority. Each task thread is assigned a priority based on its `task_id` and `thread_id`. +Shuffle threads have the highest priority to avoid priority inversion, as the task threads may depend on the shuffle indirectly. -The deadlock can occur when every active task has at least one thread that is either directly blocked on a memory allocation or indirectly blocked by shuffle being blocked on a memory allocation. When this happens, the lowest priority thread (see the above [Thread priority](thread-priority) section for the thread priority) is selected to break the deadlock. There are two cases of the deadlock. +### Deadlock Resolution -1) All threads are blocked and there is at least one thread in the `THREAD_BLOCKED` state.
In this case, the lowest priority thread is selected among `THREAD_BLOCKED` threads to break the deadlock. The thread selected transitions its state to `THREAD_BUFN_THROW` and initiates the rollback-and-retry process. After the rollback, all input data of the thread will be spillable and the thread will block before allocating more GPU memory until enough memory is freed up for other threads. -2) If all threads are blocked and are in the `THREAD_BUFN` state, the lowest priority thread is selected to split its input first and then retry with a smaller input. The thread selected transitions its state to `THREAD_SPLIT_THROW` and initiates the rollback-split-and-retry process. +Deadlocks occur when every active task has at least one thread blocked on memory allocation, either directly during its own execution or indirectly through a dependency on a shuffle thread that is itself blocked. +The lowest priority thread (see the [Thread Priority](#thread-priority) section above) is selected to break the deadlock. There are two kinds of deadlocks. -If the thread selected is a task thread and its priority is not the highest priority, the thread will transition its state into the `THREAD_BUFN_THROW` state. Any threads that was just marked as `THREAD_BUFN_THROW` will be awaken to start the rollback process and initiate the retry. After the rollback, all input data of the thread will be spillable and the thread will block before allocating more GPU memory until enough memory is freed up for other threads. +1) All threads are blocked, either `THREAD_BLOCKED` or `THREAD_BUFN`, and there is at least one thread in the `THREAD_BLOCKED` state. +In this case, the lowest priority thread is selected among `THREAD_BLOCKED` threads to break the deadlock. +The selected thread transitions its state to `THREAD_BUFN_THROW`. Any thread that was just marked as `THREAD_BUFN_THROW` will be awakened to start the rollback process and initiate the retry.
+After the rollback, all data of the thread will be spillable and the thread will be blocked before allocating more GPU memory until enough memory is freed up for other threads. +2) If all threads are in the `THREAD_BUFN` state, the lowest priority thread is selected to split its data first and then retry. +The selected thread transitions its state to `THREAD_SPLIT_THROW` and throws an exception to initiate the split-and-retry process.