Commit 37f9e1a

feat Nextcloudclient (#29)
* added deploy script with uploading to given rclone remote
* added webdav-url argument
* added deploying to the databus without upload to nextcloud
* updated pyproject.toml and content-hash
* updated README.md
* added checksum validation
* updated upload_to_nextcloud function to accept list of source_paths
* only add result if upload successful
* use os.path.basename instead of .split("/")[-1]
* added __init__.py and updated README.md
* changed append to extend (no nested list)
* fixed windows separators and added rclone error message
* moved deploy.py to cli upload_and_deploy
* changed metadata to dict list
* removed python-dotenv
* small updates
* refactored upload_and_deploy function
* updated README.md
* updated metadata_string for new metadata format
* updated README.md
* updated README.md
* Changed context url back
* added check for known compressions
* updated checksum to sha256
* updated README.md
* size check
* updated checksum validation
* added doc
* refactored deploy, upload_and_deploy and deploy_with_metadata into one single deploy command; added simple validate_distributions function
* updated README.md
* fixed docstring
* removed metadata.json
* moved COMPRESSION_EXTS out of loop
* removed unnecessary f-strings
* set file_format and compression to None
* get file_format and compression from metadata file
* updated README.md
* chores
* updated metadata format (removed filename - used url instead)
1 parent cfdca3b commit 37f9e1a

File tree

6 files changed: +329 / -21 lines changed

README.md

Lines changed: 82 additions & 8 deletions
````diff
@@ -163,13 +163,25 @@ databusclient download 'PREFIX dcat: <http://www.w3.org/ns/dcat#> SELECT ?x WHER
 databusclient deploy --help
 ```
 ```
-Usage: databusclient deploy [OPTIONS] DISTRIBUTIONS...
+Usage: databusclient deploy [OPTIONS] [DISTRIBUTIONS]...
 
-Arguments:
-  DISTRIBUTIONS...  distributions in the form of List[URL|CV|fileext|compression|sha256sum:contentlength] where URL is the
-                    download URL and CV the key=value pairs (_ separted)
-                    content variants of a distribution, fileExt and Compression can be set, if not they are inferred from the path  [required]
+  Flexible deploy to databus command:
+
+  - Classic dataset deployment
+
+  - Metadata-based deployment
+
+  - Upload & deploy via Nextcloud
 
+Arguments:
+  DISTRIBUTIONS...  Depending on mode:
+                    - Classic mode: List of distributions in the form
+                      URL|CV|fileext|compression|sha256sum:contentlength
+                      (where URL is the download URL and CV the key=value pairs,
+                      separated by underscores)
+                    - Upload mode: List of local file or folder paths (must exist)
+                    - Metadata mode: None
+
 Options:
   --version-id TEXT  Target databus version/dataset identifier of the form <h
                      ttps://databus.dbpedia.org/$ACCOUNT/$GROUP/$ARTIFACT/$VE
````
````diff
@@ -179,24 +191,86 @@ Options:
   --description TEXT  Dataset description  [required]
   --license TEXT      License (see dalicc.net)  [required]
   --apikey TEXT       API key  [required]
+  --metadata PATH     Path to metadata JSON file (for metadata mode)
+  --webdav-url TEXT   WebDAV URL (e.g.,
+                      https://cloud.example.com/remote.php/webdav)
+  --remote TEXT       rclone remote name (e.g., 'nextcloud')
+  --path TEXT         Remote path on Nextcloud (e.g., 'datasets/mydataset')
   --help              Show this message and exit.
+
 ```
-Examples of using deploy command
+#### Examples of using deploy command
+##### Mode 1: Classic Deploy (Distributions)
 ```
 databusclient deploy --version-id https://databus.dbpedia.org/user1/group1/artifact1/2022-05-18 --title title1 --abstract abstract1 --description description1 --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
 ```
 
 ```
 databusclient deploy --version-id https://dev.databus.dbpedia.org/denis/group1/artifact1/2022-05-18 --title "Client Testing" --abstract "Testing the client...." --description "Testing the client...." --license http://dalicc.net/licenselibrary/AdaptivePublicLicense10 --apikey MYSTERIOUS 'https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml|type=swagger'
 ```
-
 A few more notes for CLI usage:
 
 * The content variants can be left out ONLY IF there is just one distribution
 * For complete inferred: Just use the URL with `https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml`
 * If other parameters are used, you need to leave them empty like `https://raw.githubusercontent.com/dbpedia/databus/master/server/app/api/swagger.yml||yml|7a751b6dd5eb8d73d97793c3c564c71ab7b565fa4ba619e4a8fd05a6f80ff653:367116`
 
 
+##### Mode 2: Deploy with Metadata File
+
+Use a JSON metadata file to define all distributions.
+The metadata.json should list all distributions and their metadata.
+All files referenced there will be registered on the Databus.
+```bash
+databusclient deploy \
+  --metadata /home/metadata.json \
+  --version-id https://databus.org/user/dataset/version/1.0 \
+  --title "Metadata Deploy Example" \
+  --abstract "This is a short abstract of the dataset." \
+  --description "This dataset was uploaded using metadata.json." \
+  --license https://dalicc.net/licenselibrary/Apache-2.0 \
+  --apikey "API-KEY"
+```
+Metadata file structure (file_format and compression are optional):
+```json
+[
+  {
+    "checksum": "0929436d44bba110fc7578c138ed770ae9f548e195d19c2f00d813cca24b9f39",
+    "size": 12345,
+    "url": "https://cloud.example.com/remote.php/webdav/datasets/mydataset/example.ttl",
+    "file_format": "ttl"
+  },
+  {
+    "checksum": "2238acdd7cf6bc8d9c9963a9f6014051c754bf8a04aacc5cb10448e2da72c537",
+    "size": 54321,
+    "url": "https://cloud.example.com/remote.php/webdav/datasets/mydataset/example.csv.gz",
+    "file_format": "csv",
+    "compression": "gz"
+  }
+]
+```
+
+
+##### Mode 3: Upload & Deploy via Nextcloud
+
+Upload local files or folders to a WebDAV/Nextcloud instance and automatically deploy to DBpedia Databus.
+Rclone is required.
+
+```bash
+databusclient deploy \
+  --webdav-url https://cloud.example.com/remote.php/webdav \
+  --remote nextcloud \
+  --path datasets/mydataset \
+  --version-id https://databus.org/user/dataset/version/1.0 \
+  --title "Test Dataset" \
+  --abstract "Short abstract of dataset" \
+  --description "This dataset was uploaded for testing the Nextcloud → Databus pipeline." \
+  --license https://dalicc.net/licenselibrary/Apache-2.0 \
+  --apikey "API-KEY" \
+  ./localfile1.ttl \
+  ./data_folder
+```
+
 
 #### Authentication with vault
 
````
````diff
@@ -221,8 +295,8 @@ If using vault authentication, make sure the token file is available in the cont
 docker run --rm -v $(pwd):/data dbpedia/databus-python-client download https://databus.dbpedia.org/dbpedia-enterprise/live-fusion-snapshots/fusion/2025-08-23/fusion_props=all_subjectns=commons-wikimedia-org_vocab=all.ttl.gz --token vault-token.dat
 ```
 
-## Module Usage
 
+## Module Usage
 ### Step 1: Create lists of distributions for the dataset
 
 ```python
````
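The metadata mode added in this commit expects each entry to carry a SHA-256 checksum, a byte size, and the download URL. A minimal sketch of producing such an entry for a local file before upload (the helper name `make_metadata_entry` and the sample file are illustrative, not part of databusclient):

```python
import hashlib
import json
import os

def make_metadata_entry(local_path: str, url: str) -> dict:
    """Build one metadata.json entry (checksum, size, url) for a local file."""
    sha = hashlib.sha256()
    with open(local_path, "rb") as f:
        # Hash in chunks so large dataset files need not fit in memory
        for chunk in iter(lambda: f.read(1 << 16), b""):
            sha.update(chunk)
    return {
        "checksum": sha.hexdigest(),          # 64-char SHA-256 hex digest
        "size": os.path.getsize(local_path),  # content length in bytes
        "url": url,
    }

# Create a tiny sample file so the sketch runs end to end
with open("example.ttl", "w") as f:
    f.write("<urn:s> <urn:p> <urn:o> .\n")

entry = make_metadata_entry(
    "example.ttl",
    "https://cloud.example.com/remote.php/webdav/datasets/mydataset/example.ttl",
)
print(json.dumps([entry], indent=2))
```

Collecting such entries into a JSON array yields a file that can be passed to `databusclient deploy --metadata`.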

databusclient/cli.py

Lines changed: 64 additions & 10 deletions
````diff
@@ -1,8 +1,12 @@
 #!/usr/bin/env python3
+import json
+import os
+
 import click
 from typing import List
 from databusclient import client
 
+from nextcloudclient import upload
 
 @click.group()
 def app():
````
````diff
@@ -22,18 +26,68 @@ def app():
 @click.option("--description", required=True, help="Dataset description")
 @click.option("--license", "license_url", required=True, help="License (see dalicc.net)")
 @click.option("--apikey", required=True, help="API key")
-@click.argument(
-    "distributions",
-    nargs=-1,
-    required=True,
-)
-def deploy(version_id, title, abstract, description, license_url, apikey, distributions: List[str]):
+
+@click.option("--metadata", "metadata_file", type=click.Path(exists=True),
+              help="Path to metadata JSON file (for metadata mode)")
+@click.option("--webdav-url", "webdav_url", help="WebDAV URL (e.g., https://cloud.example.com/remote.php/webdav)")
+@click.option("--remote", help="rclone remote name (e.g., 'nextcloud')")
+@click.option("--path", help="Remote path on Nextcloud (e.g., 'datasets/mydataset')")
+
+@click.argument("distributions", nargs=-1)
+def deploy(version_id, title, abstract, description, license_url, apikey,
+           metadata_file, webdav_url, remote, path, distributions: List[str]):
     """
-    Deploy a dataset version with the provided metadata and distributions.
+    Flexible deploy to Databus command supporting three modes:\n
+    - Classic deploy (distributions as arguments)\n
+    - Metadata-based deploy (--metadata <file>)\n
+    - Upload & deploy via Nextcloud (--webdav-url, --remote, --path)
     """
-    click.echo(f"Deploying dataset version: {version_id}")
-    dataid = client.create_dataset(version_id, title, abstract, description, license_url, distributions)
-    client.deploy(dataid=dataid, api_key=apikey)
+
+    # Sanity checks for conflicting options
+    if metadata_file and any([distributions, webdav_url, remote, path]):
+        raise click.UsageError("Invalid combination: when using --metadata, do not provide --webdav-url, --remote, --path, or distributions.")
+    if any([webdav_url, remote, path]) and not all([webdav_url, remote, path]):
+        raise click.UsageError("Invalid combination: when using WebDAV/Nextcloud mode, please provide --webdav-url, --remote, and --path together.")
+
+    # === Mode 1: Classic Deploy ===
+    if distributions and not (metadata_file or webdav_url or remote or path):
+        click.echo("[MODE] Classic deploy with distributions")
+        click.echo(f"Deploying dataset version: {version_id}")
+
+        dataid = client.create_dataset(version_id, title, abstract, description, license_url, distributions)
+        client.deploy(dataid=dataid, api_key=apikey)
+        return
+
+    # === Mode 2: Metadata File ===
+    if metadata_file:
+        click.echo(f"[MODE] Deploy from metadata file: {metadata_file}")
+        with open(metadata_file, 'r') as f:
+            metadata = json.load(f)
+        client.deploy_from_metadata(metadata, version_id, title, abstract, description, license_url, apikey)
+        return
+
+    # === Mode 3: Upload & Deploy (Nextcloud) ===
+    if webdav_url and remote and path:
+        if not distributions:
+            raise click.UsageError("Please provide files to upload when using WebDAV/Nextcloud mode.")
+
+        # Check that all given paths exist and are files or directories
+        invalid = [f for f in distributions if not os.path.exists(f)]
+        if invalid:
+            raise click.UsageError(f"The following input files or folders do not exist: {', '.join(invalid)}")
+
+        click.echo("[MODE] Upload & Deploy to DBpedia Databus via Nextcloud")
+        click.echo(f"→ Uploading to: {remote}:{path}")
+        metadata = upload.upload_to_nextcloud(distributions, remote, path, webdav_url)
+        client.deploy_from_metadata(metadata, version_id, title, abstract, description, license_url, apikey)
+        return
+
+    raise click.UsageError(
+        "No valid input provided. Please use one of the following modes:\n"
+        "  - Classic deploy: pass distributions as arguments\n"
+        "  - Metadata deploy: use --metadata <file>\n"
+        "  - Upload & deploy: use --webdav-url, --remote, --path, and file arguments"
+    )
 
 
 @app.command()
````
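The `deploy` command's option checks reduce to an `any`/`all` rule over the three Nextcloud options plus mutual exclusion with `--metadata`. That dispatch logic can be isolated as a standalone sketch (the function name `select_mode` is illustrative; the real checks live inline in `cli.py` and raise `click.UsageError`):

```python
def select_mode(metadata_file, webdav_url, remote, path, distributions):
    """Mirror the CLI's sanity checks and return the chosen deploy mode."""
    webdav_opts = [webdav_url, remote, path]
    # --metadata excludes every other input
    if metadata_file and (distributions or any(webdav_opts)):
        raise ValueError("--metadata cannot be combined with other inputs")
    # The WebDAV/Nextcloud options are all-or-nothing
    if any(webdav_opts) and not all(webdav_opts):
        raise ValueError("provide --webdav-url, --remote, and --path together")
    if metadata_file:
        return "metadata"
    if all(webdav_opts):
        if not distributions:
            raise ValueError("upload mode needs local files to upload")
        return "upload"
    if distributions:
        return "classic"
    raise ValueError("no valid input provided")

print(select_mode(None, None, None, None, ["https://example.org/f.ttl|type=demo"]))  # → classic
```

Checking `any(...) and not all(...)` is what rejects a partially configured WebDAV mode, e.g. `--remote` without `--webdav-url`.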

databusclient/client.py

Lines changed: 99 additions & 1 deletion
````diff
@@ -7,7 +7,6 @@
 from SPARQLWrapper import SPARQLWrapper, JSON
 from hashlib import sha256
 import os
-import re
 
 __debug = False
 
````
````diff
@@ -205,6 +204,56 @@ def create_distribution(
 
     return f"{url}|{meta_string}"
 
+def create_distributions_from_metadata(metadata: List[Dict[str, Union[str, int]]]) -> List[str]:
+    """
+    Create distributions from metadata entries.
+
+    Parameters
+    ----------
+    metadata : List[Dict[str, Union[str, int]]]
+        List of metadata entries, each containing:
+        - checksum: str - SHA-256 hex digest (64 characters)
+        - size: int - File size in bytes (positive integer)
+        - url: str - Download URL for the file
+        - file_format: str - File format of the file [optional]
+        - compression: str - Compression format of the file [optional]
+
+    Returns
+    -------
+    List[str]
+        List of distribution identifier strings for use with create_dataset
+    """
+    distributions = []
+    counter = 0
+
+    for entry in metadata:
+        # Validate required keys
+        required_keys = ["checksum", "size", "url"]
+        missing_keys = [key for key in required_keys if key not in entry]
+        if missing_keys:
+            raise ValueError(f"Metadata entry missing required keys: {missing_keys}")
+
+        checksum = entry["checksum"]
+        size = entry["size"]
+        url = entry["url"]
+        if not isinstance(size, int) or size <= 0:
+            raise ValueError(f"Invalid size for {url}: expected positive integer, got {size}")
+        # Validate SHA-256 hex digest (64 hex chars)
+        if not isinstance(checksum, str) or len(checksum) != 64 or not all(
+                c in '0123456789abcdefABCDEF' for c in checksum):
+            raise ValueError(f"Invalid checksum for {url}")
+
+        distributions.append(
+            create_distribution(
+                url=url,
+                cvs={"count": f"{counter}"},
+                file_format=entry.get("file_format"),
+                compression=entry.get("compression"),
+                sha256_length_tuple=(checksum, size)
+            )
+        )
+        counter += 1
+    return distributions
 
 def create_dataset(
     version_id: str,
````
````diff
@@ -393,6 +442,55 @@ def deploy(
     print(resp.text)
 
 
+def deploy_from_metadata(
+    metadata: List[Dict[str, Union[str, int]]],
+    version_id: str,
+    title: str,
+    abstract: str,
+    description: str,
+    license_url: str,
+    apikey: str
+) -> None:
+    """
+    Deploy a dataset from metadata entries.
+
+    Parameters
+    ----------
+    metadata : List[Dict[str, Union[str, int]]]
+        List of file metadata entries (see create_distributions_from_metadata)
+    version_id : str
+        Dataset version ID in the form $DATABUS_BASE/$ACCOUNT/$GROUP/$ARTIFACT/$VERSION
+    title : str
+        Dataset title
+    abstract : str
+        Short description of the dataset
+    description : str
+        Long description (Markdown supported)
+    license_url : str
+        License URI
+    apikey : str
+        API key for authentication
+    """
+    distributions = create_distributions_from_metadata(metadata)
+
+    dataset = create_dataset(
+        version_id=version_id,
+        title=title,
+        abstract=abstract,
+        description=description,
+        license_url=license_url,
+        distributions=distributions
+    )
+
+    print(f"Deploying dataset version: {version_id}")
+    deploy(dataset, apikey)
+
+    print(f"Successfully deployed to {version_id}")
+    print(f"Deployed {len(metadata)} file(s):")
+    for entry in metadata:
+        print(f"  - {entry['url']}")
+
+
 def __download_file__(url, filename, vault_token_file=None, auth_url=None, client_id=None) -> None:
     """
     Download a file from the internet with a progress bar using tqdm.
````
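`create_distributions_from_metadata` validates each entry and delegates to `create_distribution`, which serializes it into the `URL|CV|fileext|compression|sha256sum:contentlength` form consumed by `create_dataset`. A simplified, self-contained sketch of that validation plus serialization (the real `create_distribution` in `databusclient.client` handles omitted fields differently; this approximation is for illustration only):

```python
import string

def distribution_string(entry: dict, counter: int) -> str:
    """Validate one metadata entry and assemble a distribution identifier string."""
    for key in ("checksum", "size", "url"):
        if key not in entry:
            raise ValueError(f"metadata entry missing required key: {key}")
    checksum, size, url = entry["checksum"], entry["size"], entry["url"]
    if not isinstance(size, int) or size <= 0:
        raise ValueError(f"invalid size for {url}: {size!r}")
    if not (isinstance(checksum, str) and len(checksum) == 64
            and all(c in string.hexdigits for c in checksum)):
        raise ValueError(f"invalid checksum for {url}")
    file_format = entry.get("file_format") or ""
    compression = entry.get("compression") or ""
    # URL | content variant | file extension | compression | sha256:length
    return f"{url}|count={counter}|{file_format}|{compression}|{checksum}:{size}"

entry = {
    "checksum": "0929436d44bba110fc7578c138ed770ae9f548e195d19c2f00d813cca24b9f39",
    "size": 12345,
    "url": "https://cloud.example.com/remote.php/webdav/datasets/mydataset/example.ttl",
    "file_format": "ttl",
}
print(distribution_string(entry, 0))
```

The `count=<n>` content variant mirrors how the client distinguishes multiple files that would otherwise share identical variant keys.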

nextcloudclient/__init__.py

Whitespace-only changes.

0 commit comments