Commit

Built site for gh-pages
Quarto GHA Workflow Runner committed Aug 12, 2024
1 parent 500b5b0 commit 67a50d4
Showing 11 changed files with 193 additions and 111 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
8802de40
c86e4085
2 changes: 1 addition & 1 deletion materials/0_housekeeping.html
@@ -413,7 +413,7 @@ <h2>Meet Each Other <svg aria-hidden="true" role="img" viewbox="0 0 640 512" sty
<section id="getting-help-today" class="slide level2">
<h2>Getting Help Today <svg aria-hidden="true" role="img" viewbox="0 0 448 512" style="height:1em;width:0.88em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M48 24C48 10.7 37.3 0 24 0S0 10.7 0 24V64 350.5 400v88c0 13.3 10.7 24 24 24s24-10.7 24-24V388l80.3-20.1c41.1-10.3 84.6-5.5 122.5 13.4c44.2 22.1 95.5 24.8 141.7 7.4l34.7-13c12.5-4.7 20.8-16.6 20.8-30V66.1c0-23-24.2-38-44.8-27.7l-9.6 4.8c-46.3 23.2-100.8 23.2-147.1 0c-35.1-17.6-75.4-22-113.5-12.5L48 52V24zm0 77.5l96.6-24.2c27-6.7 55.5-3.6 80.4 8.8c54.9 27.4 118.7 29.7 175 6.8V334.7l-24.4 9.1c-33.7 12.6-71.2 10.7-103.4-5.4c-48.2-24.1-103.3-30.1-155.6-17.1L48 338.5v-237z"></path></svg></h2>
<p><br></p>
<p><span style="color:teal;">TEAL</span> sticky note: I am OK / I am done</p>
<p><span style="color:green;">GREEN</span> sticky note: I am OK / I am done</p>
<p><span style="color:pink;">PINK</span> sticky note: I need support / I am working</p>
<p><br></p>
<p><svg aria-hidden="true" role="img" viewbox="0 0 512 512" style="height:1em;width:1em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M256 0c-25.3 0-47.2 14.7-57.6 36c-7-2.6-14.5-4-22.4-4c-35.3 0-64 28.7-64 64V261.5l-2.7-2.7c-25-25-65.5-25-90.5 0s-25 65.5 0 90.5L106.5 437c48 48 113.1 75 181 75H296h8c1.5 0 3-.1 4.5-.4c91.7-6.2 165-79.4 171.1-171.1c.3-1.5 .4-3 .4-4.5V160c0-35.3-28.7-64-64-64c-5.5 0-10.9 .7-16 2V96c0-35.3-28.7-64-64-64c-7.9 0-15.4 1.4-22.4 4C303.2 14.7 281.3 0 256 0zM240 96.1c0 0 0-.1 0-.1V64c0-8.8 7.2-16 16-16s16 7.2 16 16V95.9c0 0 0 .1 0 .1V232c0 13.3 10.7 24 24 24s24-10.7 24-24V96c0 0 0 0 0-.1c0-8.8 7.2-16 16-16s16 7.2 16 16v55.9c0 0 0 .1 0 .1v80c0 13.3 10.7 24 24 24s24-10.7 24-24V160.1c0 0 0-.1 0-.1c0-8.8 7.2-16 16-16s16 7.2 16 16V332.9c-.1 .6-.1 1.3-.2 1.9c-3.4 69.7-59.3 125.6-129 129c-.6 0-1.3 .1-1.9 .2H296h-8.5c-55.2 0-108.1-21.9-147.1-60.9L52.7 315.3c-6.2-6.2-6.2-16.4 0-22.6s16.4-6.2 22.6 0L119 336.4c6.9 6.9 17.2 8.9 26.2 5.2s14.8-12.5 14.8-22.2V96c0-8.8 7.2-16 16-16c8.8 0 16 7.1 16 15.9V232c0 13.3 10.7 24 24 24s24-10.7 24-24V96.1z"></path></svg> You can ask questions at any time during the workshop</p>
24 changes: 7 additions & 17 deletions materials/1_hello_arrow.html
@@ -394,16 +394,6 @@
<section id="hello-arrow" class="title-slide slide level1 center">
<h1>Hello Arrow</h1>

</section>
<section id="kick-off-qa" class="slide level2">
<h2>Kick-off Q&amp;A</h2>
<p><br></p>
<ul>
<li>What brings you to this workshop?</li>
<li>What challenges have you faced related to larger-than-memory data in R?</li>
<li>What is one thing you want to learn or achieve from today’s workshop?</li>
<li>…?</li>
</ul>
</section>
<section id="poll-arrow" class="slide level2">
<h2>Poll: Arrow</h2>
@@ -515,12 +505,12 @@ <h2>NYC Taxi Dataset: A dplyr pipeline</h2>
2 2013 173179759 51215013 29.6
3 2014 165114361 48816505 29.6
4 2015 146112989 43081091 29.5
5 2017 113495512 32296166 28.5
6 2018 102797401 28796633 28.0
7 2019 84393604 23515989 27.9
8 2020 24647055 5837960 23.7
9 2021 30902618 7221844 23.4
10 2016 131165043 38163870 29.1</code></pre>
5 2016 131165043 38163870 29.1
6 2017 113495512 32296166 28.5
7 2018 102797401 28796633 28.0
8 2019 84393604 23515989 27.9
9 2020 24647055 5837960 23.7
10 2021 30902618 7221844 23.4</code></pre>
</div>
</div>
</section>
@@ -603,7 +593,7 @@ <h2>Accelerated In-Memory Processing</h2>
<section id="arrow" class="slide level2">
<h2>arrow 📦</h2>
<p><br></p>
<p><img data-src="images/arrow-r-pkg.png" class="absolute" style="top: 0px; left: 300px; width: 700px; height: 900px; "></p>
<p><img data-src="images/arrow-r-pkg-highlights.png" class="absolute" style="top: 0px; left: 300px; width: 700px; height: 900px; "></p>
</section>
<section id="arrow-1" class="slide level2">
<h2>arrow 📦</h2>
73 changes: 39 additions & 34 deletions materials/2_data_manipulation_1.html
@@ -543,12 +543,12 @@ <h2>Use <code>head()</code> and <code>collect()</code> to preview results</h2>
<pre><code># A tibble: 6 × 2
fare_amount fare_pounds
&lt;dbl&gt; &lt;dbl&gt;
1 8 6.32
2 17 13.4
3 6.5 5.14
4 7 5.53
5 6.5 5.14
6 42 33.2 </code></pre>
1 16.5 13.0
2 21.5 17.0
3 5 3.95
4 10.5 8.30
5 11 8.69
6 5.5 4.35</code></pre>
</div>
</div>
</section>
@@ -607,7 +607,7 @@ <h2>Example - <code>slice()</code></h2>
<section id="head-to-the-docs" class="slide level2">
<h2>Head to the docs!</h2>
<div class="cell">
<div class="sourceCode cell-code" id="cb18"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb18-1"><a></a>?<span class="st">`</span><span class="at">arrow-dplyr</span><span class="st">`</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="sourceCode cell-code" id="cb18"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb18-1"><a></a>?acero</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
<p>or view them at <a href="https://arrow.apache.org/docs/r/reference/acero.html" class="uri">https://arrow.apache.org/docs/r/reference/acero.html</a></p>
</section>
@@ -621,9 +621,9 @@ <h2>A different function</h2>
<pre><code># A tibble: 3 × 3
pickup_datetime year trip_distance
&lt;dttm&gt; &lt;int&gt; &lt;dbl&gt;
1 2021-11-16 12:55:00 2021 351613.
2 2021-10-27 17:46:00 2021 345124.
3 2021-12-11 10:48:00 2021 335094.</code></pre>
1 2021-11-16 04:55:00 2021 351613.
2 2021-10-27 09:46:00 2021 345124.
3 2021-12-11 02:48:00 2021 335094.</code></pre>
</div>
</div>
</section>
@@ -637,9 +637,9 @@ <h2>Or call <code>collect()</code> first</h2>
<pre><code># A tibble: 3 × 3
pickup_datetime year trip_distance
&lt;dttm&gt; &lt;int&gt; &lt;dbl&gt;
1 2021-10-02 15:04:53 2021 188.
2 2021-10-03 16:45:02 2021 134
3 2021-10-03 17:29:35 2021 218.</code></pre>
1 2021-01-03 01:01:26 2021 216.
2 2021-01-03 03:36:52 2021 268.
3 2021-10-02 07:04:53 2021 188.</code></pre>
</div>
</div>
</section>
@@ -746,10 +746,14 @@ <h2>Morning vs afternoon - without namespacing</h2>
</section>
<section id="how-does-this-work" class="slide level2">
<h2>How does this work?</h2>

<img data-src="images/dplyr-backend.png" class="r-stretch"></section>
<section id="arrow-c" class="slide level2">
<h2>arrow C++</h2>
<p><img data-src="images/dplyr-backend.png"> ## Acero</p>
<ul>
<li>arrow’s query execution engine</li>
<li>use Arrow functions on Arrow Datasets</li>
</ul>
</section>
<section id="acero" class="slide level2">
<h2>Acero</h2>

<img data-src="images/arrow_cpp_functions.png" class="r-stretch"></section>
<section id="arrow-dplyr-queries-1" class="slide level2">
@@ -764,15 +768,16 @@ <h2>What if a function isn’t implemented?</h2>
<span id="cb31-3"><a></a> <span class="fu">head</span>() <span class="sc">|&gt;</span></span>
<span id="cb31-4"><a></a> <span class="fu">collect</span>()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-error">
<pre><code>Error: Expression na_if(vendor_name, "CMT") not supported in Arrow
Call collect() first to pull data into R.</code></pre>
<pre><code>Error in `na_if()`:
! Expression not supported in Arrow
→ Call collect() first to pull data into R.</code></pre>
</div>
</div>
</section>
<section id="head-to-the-docs-again-to-see-whats-implemented" class="slide level2">
<h2>Head to the docs again to see what’s implemented!</h2>
<div class="cell">
<div class="sourceCode cell-code" id="cb33"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb33-1"><a></a>?<span class="st">`</span><span class="at">arrow-dplyr</span><span class="st">`</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="sourceCode cell-code" id="cb33"><pre class="sourceCode numberSource r number-lines code-with-copy"><code class="sourceCode r"><span id="cb33-1"><a></a>?acero</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
<p>or view them at <a href="https://arrow.apache.org/docs/r/reference/acero.html" class="uri">https://arrow.apache.org/docs/r/reference/acero.html</a></p>
</section>
@@ -787,12 +792,12 @@ <h2>Option 1 - find a workaround!</h2>
<pre><code># A tibble: 6 × 24
vendor_name pickup_datetime dropoff_datetime passenger_count
&lt;chr&gt; &lt;dttm&gt; &lt;dttm&gt; &lt;int&gt;
1 &lt;NA&gt; 2012-01-20 14:09:36 2012-01-20 14:42:25 1
2 &lt;NA&gt; 2012-01-20 14:54:10 2012-01-20 15:06:55 1
3 &lt;NA&gt; 2012-01-20 08:08:01 2012-01-20 08:11:02 1
4 &lt;NA&gt; 2012-01-20 08:36:22 2012-01-20 08:39:44 1
5 &lt;NA&gt; 2012-01-20 20:58:32 2012-01-20 21:03:04 1
6 &lt;NA&gt; 2012-01-20 19:40:20 2012-01-20 19:43:43 2
1 &lt;NA&gt; 2012-01-20 06:09:36 2012-01-20 06:42:25 1
2 &lt;NA&gt; 2012-01-20 06:54:10 2012-01-20 07:06:55 1
3 &lt;NA&gt; 2012-01-20 00:08:01 2012-01-20 00:11:02 1
4 &lt;NA&gt; 2012-01-20 00:36:22 2012-01-20 00:39:44 1
5 &lt;NA&gt; 2012-01-20 12:58:32 2012-01-20 13:03:04 1
6 &lt;NA&gt; 2012-01-20 11:40:20 2012-01-20 11:43:43 2
# ℹ 20 more variables: trip_distance &lt;dbl&gt;, pickup_longitude &lt;dbl&gt;,
# pickup_latitude &lt;dbl&gt;, rate_code &lt;chr&gt;, store_and_fwd &lt;chr&gt;,
# dropoff_longitude &lt;dbl&gt;, dropoff_latitude &lt;dbl&gt;, payment_type &lt;chr&gt;,
@@ -839,12 +844,12 @@ <h2>Working with custom functions</h2>
<pre><code># A tibble: 6 × 2
pickup_datetime pickup_text
&lt;dttm&gt; &lt;chr&gt;
1 2012-01-08 20:50:38 Sunday PM
2 2012-01-08 20:52:01 Sunday PM
3 2012-01-08 02:39:26 Sunday AM
4 2012-01-08 02:40:49 Sunday AM
5 2012-01-09 03:42:37 Monday AM
6 2012-01-08 20:51:47 Sunday PM </code></pre>
1 2012-01-08 12:50:38 Sunday PM
2 2012-01-08 12:52:01 Sunday PM
3 2012-01-07 18:39:26 Sunday AM
4 2012-01-07 18:40:49 Sunday AM
5 2012-01-08 19:42:37 Monday AM
6 2012-01-08 12:51:47 Sunday PM </code></pre>
</div>
</div>
</section>
@@ -871,14 +876,14 @@ <h2>Anything else to be aware of?</h2>
<ul>
<li>arrow 17.0.0 or later</li>
<li>this will only work for functions which have Arrow bindings</li>
<li>use <code>?`arrow-dplyr`</code> to see which ones do</li>
<li>use <code>?acero</code> to see which ones do</li>
</ul>
</section>
<section id="summary-1" class="slide level2">
<h2>Summary</h2>
<ul>
<li>Working with Arrow Datasets allow you to manipulate data which is larger-than-memory</li>
<li>You can use many dplyr functions with arrow - run <code>?`arrow-dplyr`</code> to view the docs</li>
<li>You can use many dplyr functions with arrow - run <code>?acero</code> to view the docs</li>
<li>You can pass data to duckdb to use functions implemented in duckdb but not arrow</li>
</ul>

