
Commit 9e95429

Merge pull request #4290 from szarnyasg/nits-20241207b
Nits 20241207b
2 parents 5cb477d + 76ff600 commit 9e95429

9 files changed (+22 -17 lines changed)

_posts/2023-04-14-h2oai.md

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ The queries have not changed since the benchmark went dormant. The data is gener
 | advanced groupby #2 | `SELECT id3, max(v1)-min(v2) AS range_v1_v2 FROM tbl GROUP BY id3` | Range selection over small cardinality groups, grouped by integer |
 | advanced groupby #3 | `SELECT id6, v3 AS largest2_v3 FROM (SELECT id6, v3, row_number() OVER (PARTITION BY id6 ORDER BY v3 DESC) AS order_v3 FROM x WHERE v3 IS NOT NULL) sub_query WHERE order_v3 <= 2` |Advanced group by query |
 | advanced groupby #4 | `SELECT id2, id4, pow(corr(v1, v2), 2) AS r2 FROM tbl GROUP BY id2, id4` | Arithmetic over medium sized groups, grouped by varchar, integer. |
-| advanced groupby #5 | `SELECT id1, id2, id3, id4, id5, id6, sum(v3) AS v3, count(*) AS count FROM tbl GROUP BY id1, id2, id3, id4, id5, id6` | Many many small groups, the number of groups is the cardinality of the dataset |
+| advanced groupby #5 | `SELECT id1, id2, id3, id4, id5, id6, sum(v3) AS v3, count(*) AS count FROM tbl GROUP BY id1, id2, id3, id4, id5, id6` | Many small groups, the number of groups is the cardinality of the dataset |
 | join #1 |`SELECT x.*, small.id4 AS small_id4, v2 FROM x JOIN small USING (id1)` | Joining a large table (x) with a small-sized table on integer type |
 | join #2 |`SELECT x.*, medium.id1 AS medium_id1, medium.id4 AS medium_id4, medium.id5 AS medium_id5, v2 FROM x JOIN medium USING (id2)` | Joining a large table (x) with a medium-sized table on integer type |
 | join #3 |`SELECT x.*, medium.id1 AS medium_id1, medium.id4 AS medium_id4, medium.id5 AS medium_id5, v2 FROM x LEFT JOIN medium USING (id2)` | Left join a large table (x) with a medium-sized table on integer type|

_posts/2024-09-27-sql-only-extensions.md

Lines changed: 1 addition & 1 deletion
@@ -131,7 +131,7 @@ git push
 
 #### Write Your SQL Macros
 
-It it likely a bit faster to iterate if you test your macros directly in DuckDB.
+It is likely a bit faster to iterate if you test your macros directly in DuckDB.
 After you have written your SQL, we will move it into the extension.
 The example we will use demonstrates how to pull a dynamic set of columns from a dynamic table name (or a view name!).
 
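As a side note on the passage above, a minimal sketch of such a macro, tried directly in the DuckDB CLI before moving it into the extension, could look as follows. The macro, table, and column names here are illustrative (not necessarily the post's exact example), and it assumes a DuckDB version that provides `query_table()` and the lambda form of the `COLUMNS()` expression.

```sql
-- Illustrative table macro: pull a dynamic set of columns from a dynamic table name.
CREATE OR REPLACE MACRO select_distinct_columns_from_table(table_name, col_names) AS TABLE
    SELECT DISTINCT COLUMNS(c -> list_contains(col_names, c))
    FROM query_table(table_name)
    ORDER BY ALL;

-- Quick test in the CLI before moving the SQL into the extension:
CREATE TABLE trips (city VARCHAR, line VARCHAR, delay_min INTEGER);
INSERT INTO trips VALUES ('Amsterdam', 'IC 123', 5), ('Amsterdam', 'IC 123', 7);
SELECT * FROM select_distinct_columns_from_table('trips', ['city', 'line']);
```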

_posts/2024-11-29-duckdb-tricks-part-3.md

Lines changed: 1 addition & 1 deletion
@@ -179,7 +179,7 @@ We have now a table with all the data from January to October, amounting to almo
 ## Reordering Parquet Files
 
 Suppose we want to analyze the average delay of the [Intercity Direct trains](https://en.wikipedia.org/wiki/Intercity_Direct) operated by the [Nederlandse Spoorwegen (NS)](https://en.wikipedia.org/wiki/Nederlandse_Spoorwegen), measured at the final destination of the train service.
-While we can run this analysis directly on the the `.csv` files, the lack of metadata (such as schema and min-max indexes) will limit the performance.
+While we can run this analysis directly on the `.csv` files, the lack of metadata (such as schema and min-max indexes) will limit the performance.
 Let's measure this in the CLI client by turning on the [timer]({% link docs/api/cli/dot_commands.md %}):
 
 ```plsql

_posts/2024-12-05-csv-files-dethroning-parquet-or-not.md

Lines changed: 1 addition & 1 deletion
@@ -64,7 +64,7 @@ Furthermore, the reader became one of the fastest CSV readers in analytical syst
 
 ## Comparing CSV and Parquet
 
-With the large boost boost in usability and performance for the CSV reader, one might ask: what is the actual difference in performance when loading a CSV file compared to a Parquet file into a table? Additionally, how do these formats differ when running queries directly on them?
+With the large boost in usability and performance for the CSV reader, one might ask: what is the actual difference in performance when loading a CSV file compared to a Parquet file into a table? Additionally, how do these formats differ when running queries directly on them?
 
 To find out, we will run a few examples using both CSV and Parquet files containing TPC-H data to shed light on their differences. All scripts used to generate the benchmarks of this blogpost can be found in a [repository](https://github.com/pdet/csv_vs_parquet).
 
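The kind of comparison described above boils down to statements along the following lines; this is only a sketch with hypothetical file names, not the benchmark scripts from the linked repository.

```sql
-- Load the same TPC-H lineitem data into a table from each format ...
CREATE TABLE lineitem_from_csv AS FROM 'lineitem.csv';
CREATE TABLE lineitem_from_parquet AS FROM 'lineitem.parquet';

-- ... or skip the load entirely and query the files directly.
SELECT l_returnflag, count(*) AS num_items
FROM 'lineitem.parquet'
GROUP BY l_returnflag;
```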

_posts/2024-12-06-duckdb-tpch-sf100-on-mobile.md

Lines changed: 1 addition & 1 deletion
@@ -78,7 +78,7 @@ The table contains a summary of the DuckDB benchmark results.
 
 ## Historical Context
 
-So why did we set out to run these these experiments in the first place?
+So why did we set out to run these experiments in the first place?
 
 Just a few weeks ago, [CWI](https://cwi.nl/), the birthplace of DuckDB, held a ceremony for the [Dijkstra Fellowship](https://www.cwi.nl/en/events/dijkstra-awards/cwi-lectures-dijkstra-fellowship/).
 The fellowship was awarded to Marcin Żukowski for his pioneering role in the development of database management systems and his successful entrepreneurial career that resulted in systems such as [VectorWise](https://en.wikipedia.org/wiki/Actian_Vector) and [Snowflake](https://en.wikipedia.org/wiki/Snowflake_Inc.).

docs/extensions/spatial/functions.md

Lines changed: 1 addition & 1 deletion
@@ -1702,7 +1702,7 @@ VARCHAR ST_QuadKey (col0 GEOMETRY, col1 INTEGER)
 #### Description
 
 Compute the [quadkey](https://learn.microsoft.com/en-us/bingmaps/articles/bing-maps-tile-system) for a given lon/lat point at a given level.
-Note that the the parameter order is **longitude**, **latitude**.
+Note that the parameter order is **longitude**, **latitude**.
 
 `level` has to be between 1 and 23, inclusive.
 
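For illustration, a call with this argument order could look like the following; the coordinates are an arbitrary point in Amsterdam, and the example assumes the spatial extension is installed and loaded.

```sql
LOAD spatial;
-- x = longitude, y = latitude
SELECT ST_QuadKey(ST_Point(4.8994, 52.3785), 10) AS quadkey;
```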

docs/extensions/spatial/r-tree_indexes.md

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ EXPLAIN SELECT count(*) FROM t1 WHERE ST_Within(geom, ST_MakeEnvelope(45, 45, 65
 
 Creating R-trees on top of an already populated table is much faster than first creating the index and then inserting the data. This is because the R-tree will have to periodically rebalance itself and perform a somewhat costly splitting operation when a node reaches max capacity after an insert, potentially causing additional splits to cascade up the tree. However, when the R-tree index is created on an already populated table, a special bottom up "bulk loading algorithm" (Sort-Tile-Recursive) is used, which divides all entries into an already balanced tree as the total number of required nodes can be computed from the beginning.
 
-Additionally, using the bulk loading algorithm tends to create a R-tree with a better structure (less overlap between bounding boxes), which usually leads to better query performance. If you find that the performance of querying the R-tree starts to deteriorate after a large number of of updates or deletions, dropping and re-creating the index might produce a higher quality R-tree.
+Additionally, using the bulk loading algorithm tends to create a R-tree with a better structure (less overlap between bounding boxes), which usually leads to better query performance. If you find that the performance of querying the R-tree starts to deteriorate after a large number of updates or deletions, dropping and re-creating the index might produce a higher quality R-tree.
 
 ### Memory Usage
 
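A rough sketch of the pattern described above (table, column, and index names are made up): populate the table first so that index creation takes the bulk-loading path, and rebuild the index if heavy churn degrades it.

```sql
LOAD spatial;

-- Populate first ...
CREATE TABLE t1 AS
    SELECT ST_Point(i % 100, i // 100) AS geom
    FROM range(10000) r(i);

-- ... then create the index, which uses the Sort-Tile-Recursive bulk load.
CREATE INDEX t1_geom_idx ON t1 USING RTREE (geom);

-- After many updates or deletes, dropping and re-creating restores a well-structured tree.
DROP INDEX t1_geom_idx;
CREATE INDEX t1_geom_idx ON t1 USING RTREE (geom);
```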

single-file-document/concatenate_to_single_file.py

Lines changed: 5 additions & 2 deletions
@@ -105,7 +105,7 @@ def adjust_links_in_doc_body(doc_body):
 "]({% link docs/python/overview.md %})"
 )
 
-# replace "`, `" (with its typical surroundings) with "`,` " to allow line breaking
+# replace "`, `" (with the surrounding characters used for emphasis) with "`,` " to allow line breaking
 # see https://stackoverflow.com/questions/76951040/pandoc-preserve-whitespace-in-inline-code
 doc_body = doc_body.replace("`*`, `*`", "`*`,` *`")
 
@@ -115,8 +115,11 @@ def adjust_links_in_doc_body(doc_body):
 # replace links to data sets to point to the website
 doc_body = doc_body.replace("](/data/", "](https://duckdb.org/data/")
 
+# remove '<div>' HTML tags
+doc_body = re.sub(r'<div[^>]*?>[\n ]*([^§]*?)[\n ]*</div>', r'\1', doc_body, flags=re.MULTILINE)
+
 # replace '<img>' HTML tags with Markdown's '![]()' construct
-doc_body = re.sub(r'<img src="([^"]*)"[^§]*?/>', r'![](\1)', doc_body, flags=re.MULTILINE)
+doc_body = re.sub(r'<img src="([^"]*)"[^§]*?/>', r'![](\1)\n', doc_body, flags=re.MULTILINE)
 
 # use relative path for images in Markdown
 doc_body = doc_body.replace("](/images", "](../images")

single-file-document/templates/eisvogel2.tex

Lines changed: 10 additions & 8 deletions
@@ -389,15 +389,17 @@
 $if(graphics)$
 \usepackage{graphicx}
 \makeatletter
-\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
-\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
-\makeatother
-% Scale images if necessary, so that they will not overflow the page
-% margins by default, and it is still possible to overwrite the defaults
-% using explicit options in \includegraphics[width, height, ...]{}
-\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
+\newsavebox\pandoc@box
+\newcommand*\pandocbounded[1]{% scales image to fit in text height/width
+\sbox\pandoc@box{#1}%
+\Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}%
+\Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}%
+\ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both
+\ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}%
+\else\usebox{\pandoc@box}%
+\fi%
+}
 % Set default figure placement to htbp
-\makeatletter
 % Make use of float-package and set default placement for figures to H.
 % The option H means 'PUT IT HERE' (as opposed to the standard h option which means 'You may put it here if you like').
 \usepackage{float}
