Commit a142b5c

Merge pull request #2926 from ClickHouse/importing-geojson-with-deeply-nested-object-array
KB article that shows how to import GeoJSON with a deeply nested object array.
2 parents 97a1958 + b80a47d commit a142b5c

1 file changed

+244
-0
lines changed

@@ -0,0 +1,244 @@
1+
---
2+
title: Importing GeoJSON with a deeply nested object array
3+
description: "Importing GeoJSON with a deeply nested object array"
4+
date: 2024-12-18
5+
---
6+
7+
# Importing GeoJSON with a deeply nested object array
8+
9+
### Question
10+
11+
How do I import GeoJSON with a nested object array?
12+
13+
### Answer
14+
15+
For this tutorial, we will use open data publicly available [here](https://opendata.esri.es/datasets/ComunidadSIG::municipios-ign/explore?location=39.536006%2C-0.303882%2C6.57). A copy can be found [here](https://datasets-documentation.s3.eu-west-3.amazonaws.com/geoJSON/Municipios.geojson).
16+
17+
1. Download the data in GeoJSON format and save the file as `municipios_ign.geojson`, the name used in the queries in this article.
18+
19+
2. Understand the structure.
20+
21+
```sql
22+
DESCRIBE TABLE file('municipios_ign.geojson', 'JSON')
23+
┌─name─────┬─type─────────────────────────────────────────────────────────────────────────────────────────┐
24+
│ type │ Nullable(String) │
25+
│ name │ Nullable(String) │
26+
│ crs │ Tuple( properties Tuple(name Nullable(String)),type Nullable(String)) │
27+
│ features │ Array(Tuple( │
28+
│ │ geometry Tuple(coordinates Array(Array(Array(Array(Nullable(Float64))))), │
29+
│ │ type Nullable(String)), │
30+
│ │ properties Tuple( CODIGOINE Nullable(String), │
31+
│ │ CODNUT1 Nullable(String), │
32+
│ │ CODNUT2 Nullable(String), │
33+
│ │ CODNUT3 Nullable(String), │
34+
│ │ FID Nullable(Int64), │
35+
│ │ INSPIREID Nullable(String), │
36+
│ │ NAMEUNIT Nullable(String), │
37+
│ │ NATCODE Nullable(String), │
38+
│ │ SHAPE_Area Nullable(Float64), │
39+
│ │ SHAPE_Length Nullable(Float64) │
40+
│ │ ), │
41+
│ │ type Nullable(String) │
42+
│ │ ) │
43+
│ │ ) │
44+
└──────────┴──────────────────────────────────────────────────────────────────────────────────────────────┘
45+
```
46+
47+
3. Create a table to store the GeoJSON rows.
48+
49+
<br/>
50+
51+
The requirement here is to generate one row for each object in the `features` array.
52+
The type inferred for `features.geometry.coordinates`, `Array(Array(Array(Array(Nullable(Float64)))))`, suggests that it maps to ClickHouse's **MultiPolygon** [data type](https://clickhouse.com/docs/en/sql-reference/data-types/geo#multipolygon).
53+
54+
```sql
55+
create table geojson
56+
(
57+
type String,
58+
name String,
59+
crsType String,
60+
crsName String,
61+
featureType String,
62+
id Int64,
63+
inspiredId String,
64+
natCode String,
65+
nameUnit String,
66+
codNut1 String,
67+
codNut2 String,
68+
codNut3 String,
69+
codigoIne String,
70+
shapeLength Float64,
71+
shapeArea Float64,
72+
geometryType String,
73+
geometry MultiPolygon
74+
)
75+
engine = MergeTree
76+
order by id;
77+
```
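
As a point of reference for the `geometry` column, the sketch below stores a literal value in a `MultiPolygon` column: an array of polygons, each an array of rings, each an array of `(x, y)` tuples. The table name `multipolygon_demo` and the coordinates are illustrative assumptions and are not part of the dataset.

```sql
-- Hypothetical scratch table, used only to show the shape of a MultiPolygon value.
CREATE TABLE multipolygon_demo (mpg MultiPolygon) ENGINE = Memory;

-- Two polygons: the first has only an outer ring,
-- the second has an outer ring and one hole.
INSERT INTO multipolygon_demo VALUES
([[[(0, 0), (10, 0), (10, 10), (0, 10)]],
  [[(20, 20), (50, 20), (50, 50), (20, 50)], [(30, 30), (50, 50), (50, 30)]]]);

SELECT mpg, toTypeName(mpg) FROM multipolygon_demo;
```

GeoJSON, by contrast, stores each point as a two-element array, which is why the inferred type ends in `Array(Nullable(Float64))` rather than a tuple.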
78+
79+
4. Prepare the data.
80+
81+
<br/>
82+
83+
The main purpose of the query is to verify that we obtain one row for each **object** in the **features array**.
84+
85+
86+
:::note
87+
The field `features.geometry.coordinates` is commented out to make the result set more readable.
88+
:::
89+
90+
```sql
91+
SELECT
92+
type AS type,
93+
name AS name,
94+
crs.type AS crsType,
95+
crs.properties.name AS crsName,
96+
features.type AS featureType,
97+
features.properties.FID AS id,
98+
features.properties.INSPIREID AS inspiredId,
99+
features.properties.NATCODE AS natCode,
100+
features.properties.NAMEUNIT AS nameUnit,
101+
features.properties.CODNUT1 AS codNut1,
102+
features.properties.CODNUT2 AS codNut2,
103+
features.properties.CODNUT3 AS codNut3,
104+
features.properties.CODIGOINE AS codigoIne,
105+
features.properties.SHAPE_Length AS shapeLength,
106+
features.properties.SHAPE_Area AS shapeArea,
107+
features.geometry.type AS geometryType
108+
--,features.geometry.coordinates
109+
FROM file('municipios_ign.geojson', 'JSON')
110+
ARRAY JOIN features
111+
LIMIT 5
112+
113+
┌─type──────────────┬─name───────────┬─crsType─┬─crsName───────────────────────┬─featureType─┬─id─┬─inspiredId───────────────┬─natCode─────┬─nameUnit──────────────┬─codNut1─┬─codNut2─┬─codNut3─┬─codigoIne─┬────────shapeLength─┬─────────────shapeArea─┬─geometryType─┐
114+
│ FeatureCollection │ Municipios_IGN │ name    │ urn:ogc:def:crs:OGC:1.3:CRS84 │ Feature     │  1 │ ES.IGN.SIGLIM34081616266 │ 34081616266 │ Villarejo-Periesteban │ ES4     │ ES42    │ ES423   │ 16266     │ 0.2697476997304121 │ 0.0035198414406406673 │ MultiPolygon │
115+
│ FeatureCollection │ Municipios_IGN │ name    │ urn:ogc:def:crs:OGC:1.3:CRS84 │ Feature     │  2 │ ES.IGN.SIGLIM34081616269 │ 34081616269 │ Villares del Saz      │ ES4     │ ES42    │ ES423   │ 16269     │ 0.4476083901269905 │   0.00738179315030249 │ MultiPolygon │
116+
│ FeatureCollection │ Municipios_IGN │ name    │ urn:ogc:def:crs:OGC:1.3:CRS84 │ Feature     │  3 │ ES.IGN.SIGLIM34081616270 │ 34081616270 │ Villarrubio           │ ES4     │ ES42    │ ES423   │ 16270     │ 0.3053942273994179 │ 0.0029777582813496337 │ MultiPolygon │
117+
│ FeatureCollection │ Municipios_IGN │ name    │ urn:ogc:def:crs:OGC:1.3:CRS84 │ Feature     │  4 │ ES.IGN.SIGLIM34081616271 │ 34081616271 │ Villarta              │ ES4     │ ES42    │ ES423   │ 16271     │ 0.2831226979821184 │  0.002680273189024594 │ MultiPolygon │
118+
│ FeatureCollection │ Municipios_IGN │ name    │ urn:ogc:def:crs:OGC:1.3:CRS84 │ Feature     │  5 │ ES.IGN.SIGLIM34081616272 │ 34081616272 │ Villas de la Ventosa  │ ES4     │ ES42    │ ES423   │ 16272     │ 0.5958276749246777 │  0.015354885085133583 │ MultiPolygon │
119+
└───────────────────┴────────────────┴─────────┴───────────────────────────────┴─────────────┴────┴──────────────────────────┴─────────────┴───────────────────────┴─────────┴─────────┴─────────┴───────────┴────────────────────┴───────────────────────┴──────────────┘
120+
```
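
To see the `ARRAY JOIN` behaviour in isolation, here is a minimal sketch on literal data. The column names `collection`, `features`, and the alias `feature` are illustrative assumptions. The query should return one row per array element, with the outer `collection` value repeated on each row.

```sql
-- Illustrative data: one "document" row holding an array of two (id, name) tuples.
SELECT
    collection,
    feature.1 AS id,    -- first tuple element
    feature.2 AS name   -- second tuple element
FROM
(
    SELECT
        'FeatureCollection' AS collection,
        [(1, 'first'), (2, 'second')] AS features
)
ARRAY JOIN features AS feature;
```

The GeoJSON query in this step follows the same pattern: top-level fields such as `type` and `name` are repeated, while each element of `features` becomes its own row.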
121+
122+
5. Insert the data.
123+
124+
<br/>
125+
126+
```sql
127+
INSERT INTO geojson
128+
SELECT
129+
type AS type,
130+
name AS name,
131+
crs.type AS crsType,
132+
crs.properties.name AS crsName,
133+
features.type AS featureType,
134+
features.properties.FID AS id,
135+
features.properties.INSPIREID AS inspiredId,
136+
features.properties.NATCODE AS natCode,
137+
features.properties.NAMEUNIT AS nameUnit,
138+
features.properties.CODNUT1 AS codNut1,
139+
features.properties.CODNUT2 AS codNut2,
140+
features.properties.CODNUT3 AS codNut3,
141+
features.properties.CODIGOINE AS codigoIne,
142+
features.properties.SHAPE_Length AS shapeLength,
143+
features.properties.SHAPE_Area AS shapeArea,
144+
features.geometry.type AS geometryType,
145+
features.geometry.coordinates as geometry
146+
FROM file('municipios_ign.geojson', 'JSON')
147+
ARRAY JOIN features
148+
```
149+
150+
Here, we get the following error:
151+
152+
```
153+
Code: 53. DB::Exception: Received from localhost:9000. DB::Exception: ARRAY JOIN requires array or map argument. (TYPE_MISMATCH)
154+
Received exception from server (version 24.1.2):
155+
```
156+
157+
The error stems from `features.geometry.coordinates`, whose inferred type does not match the `MultiPolygon` column.
158+
159+
6. Let's check its data type.
160+
161+
<br/>
162+
163+
```sql
164+
SELECT DISTINCT toTypeName(features.geometry.coordinates) AS geometry
165+
FROM file('municipios_ign.geojson', 'JSON')
166+
ARRAY JOIN features
167+
168+
┌─geometry──────────────────────────────────────┐
169+
│ Array(Array(Array(Array(Nullable(Float64))))) │
170+
└───────────────────────────────────────────────┘
171+
```
172+
173+
It can be fixed by converting `features.geometry.coordinates` to `Array(Array(Array(Tuple(Float64,Float64))))`, the structure that `MultiPolygon` expects.
174+
To do so, we can use the function [arrayMap(func,arr1,...)](https://clickhouse.com/docs/en/sql-reference/functions/array-functions#arraymapfunc-arr1-).
175+
176+
```sql
177+
SELECT distinct
178+
toTypeName(
179+
arrayMap(features.geometry.coordinates->
180+
arrayMap(features.geometry.coordinates->
181+
arrayMap(features.geometry.coordinates-> (features.geometry.coordinates[1],features.geometry.coordinates[2])
182+
,features.geometry.coordinates),
183+
features.geometry.coordinates),
184+
features.geometry.coordinates)
185+
) as toTypeName
186+
FROM file('municipios_ign.geojson', 'JSON')
187+
ARRAY JOIN features;
188+
189+
┌─toTypeName───────────────────────────────────────────────────────┐
190+
│ Array(Array(Array(Tuple(Nullable(Float64), Nullable(Float64))))) │
191+
└──────────────────────────────────────────────────────────────────┘
192+
```
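
The same nested `arrayMap` pattern can be tried on a small literal before running it against the file. This is a sketch under assumptions: the coordinates are made up, and the lambda parameters are named `polygon`, `ring`, and `point` purely for readability, whereas the query above reuses the column name at every level.

```sql
-- Illustrative literal: one polygon with a single ring of four [x, y] points.
WITH [[[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 0.0]]]] AS coordinates
SELECT
    arrayMap(polygon ->
        arrayMap(ring ->
            -- Turn each two-element array into an (x, y) tuple.
            arrayMap(point -> (point[1], point[2]), ring),
            polygon),
        coordinates) AS converted,
    toTypeName(converted) AS convertedType;
```

Only the innermost level changes: each `[x, y]` array becomes an `(x, y)` tuple, while the three outer array levels are preserved.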
193+
194+
7. Insert the data, converting the coordinates with `arrayMap`.
195+
196+
<br/>
197+
198+
```sql
199+
INSERT INTO geojson
200+
SELECT
201+
type as type,
202+
name as name,
203+
crs.type as crsType,
204+
crs.properties.name as crsName,
205+
features.type as featureType,
206+
features.properties.FID id,
207+
features.properties.INSPIREID inspiredId,
208+
features.properties.NATCODE natCode,
209+
features.properties.NAMEUNIT nameUnit,
210+
features.properties.CODNUT1 codNut1,
211+
features.properties.CODNUT2 codNut2,
212+
features.properties.CODNUT3 codNut3,
213+
features.properties.CODIGOINE codigoIne,
214+
features.properties.SHAPE_Length shapeLength,
215+
features.properties.SHAPE_Area shapeArea,
216+
features.geometry.type geometryType,
217+
arrayMap(features.geometry.coordinates->
218+
arrayMap(features.geometry.coordinates->
219+
arrayMap(features.geometry.coordinates-> (features.geometry.coordinates[1],features.geometry.coordinates[2]),features.geometry.coordinates)
220+
,features.geometry.coordinates)
221+
,features.geometry.coordinates) geometry
222+
FROM file('municipios_ign.geojson', 'JSON')
223+
ARRAY JOIN features;
224+
```
225+
226+
```sql
227+
SELECT count()
228+
FROM geojson
229+
230+
┌─count()─┐
231+
│    8205 │
232+
└─────────┘
233+
234+
SELECT DISTINCT toTypeName(geometry)
235+
FROM geojson
236+
237+
┌─toTypeName(geometry)─┐
238+
│ MultiPolygon │
239+
└──────────────────────┘
240+
```
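
As an optional follow-up check, the stored geometries can be inspected with ordinary array functions, since a `MultiPolygon` is a nested array. The aliases `polygons` and `outerRingPoints` are illustrative names, not part of the original article.

```sql
SELECT
    nameUnit,
    length(geometry) AS polygons,                -- number of polygons in the municipality
    length(geometry[1][1]) AS outerRingPoints    -- points in the first polygon's outer ring
FROM geojson
ORDER BY id
LIMIT 5;
```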
241+
242+
### Conclusion
243+
Handling JSON can be a complex task. This tutorial addressed a scenario where a deeply nested object array makes it even more difficult.
244+
For any other JSON-related requirements, please refer to our [documentation](https://clickhouse.com/docs/en/integrations/data-formats/json).
