From 057ced880ab95c38cebe78280bc610abe3f3f68a Mon Sep 17 00:00:00 2001
From: Till Prochaska
Date: Wed, 10 Aug 2022 10:21:00 +0200
Subject: [PATCH] Document crawldir --nojunk flag

---
 developers/alephclient.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/developers/alephclient.md b/developers/alephclient.md
index 46036a1..0a2b48b 100644
--- a/developers/alephclient.md
+++ b/developers/alephclient.md
@@ -38,6 +38,12 @@ The `crawldir` command crawls through a given directory recursively and uploads
 alephclient crawldir --foreign-id wikileaks-cable /Users/sunu/data/cable
 ```
 
+Optionally pass the `--nojunk` flag to exclude files and directories commonly created by operating systems, such as `thumbs.db` or `desktop.ini`, that you might not want to upload to Aleph:
+
+```bash
+alephclient crawldir --nojunk --foreign-id wikileaks-cable /Users/sunu/data/cable
+```
+
 When Aleph imports data, it performs optical character recognition \(OCR\) on images contained in the material. This works better when Aleph already has an idea of the language the documents might use. This can be specified with the `--language` option, which expects a 3-letter ISO 639 language code. It can be specified multiple times, for when the directory contains files in more than one language.
 
 ```bash