Update help text for icat.lucene 4.0.0 #1811

@patrick-austin

@louise-davies I've summarised proposed changes below, let me know if anything's unclear or there are any obvious typos etc.

Examples

Replace 3rd and 4th examples:

To search for files using a visit and file type	visitId:"<your visit>" AND (location:<file type> location:*.<file type>)
e.g.	visitId:"nt20-8" AND (location:txt location:*.txt)
To search for files using a visit, partial location and file type	visitId:"<your visit>" AND location:"<partial location>" AND (location:<file type> location:*.<file type>)
e.g.	visitId:"nt20-8" AND location:"trypsin_10_dnafiles" AND (location:txt location:*.txt)

With:

To search for files using a visit and file type	visitId:"<your visit>" AND location.fileName:<file type>
e.g.	visitId:"nt20-8" AND location.fileName:txt
To search for files using a visit, partial location and file type	visitId:"<your visit>" AND location:"<partial location>" AND location.fileName:<file type>
e.g.	visitId:"nt20-8" AND location:"trypsin_10_dnafiles" AND location.fileName:txt
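
For anyone generating these queries programmatically, here is a rough sketch of how the new examples could be assembled. `build_file_query` is a hypothetical helper and not part of icat.lucene or any existing client; it only concatenates the clauses shown above and does not escape quotes or other special characters in its inputs.

```python
from typing import Optional


def build_file_query(
    visit_id: str,
    file_type: str,
    partial_location: Optional[str] = None,
) -> str:
    """Assemble a search string in the form of the proposed examples.

    Hypothetical helper for illustration only: inputs are used as-is,
    with no escaping of quotes or query syntax characters.
    """
    clauses = [f'visitId:"{visit_id}"']
    if partial_location is not None:
        # Quoted so the partial location is treated as a phrase.
        clauses.append(f'location:"{partial_location}"')
    # The file type is matched against the dot-separated file name terms.
    clauses.append(f"location.fileName:{file_type}")
    return " AND ".join(clauses)


print(build_file_query("nt20-8", "txt"))
print(build_file_query("nt20-8", "txt", "trypsin_10_dnafiles"))
```

The two calls reproduce the two "e.g." rows of the replacement examples.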

Special characters

Replace:

In addition to whitespace, there are other characters used to split terms based on context.

With:

In addition to whitespace, there are other characters used to split terms based on context. The following applies to all text fields except the location field(s) for a Datafile, which have special handling (see below).

File paths

Replace:

The fact that file paths often contain slashes separating directories, dashes within directory names, and dots before extensions can make searching challenging, especially in combination with wildcards. As paths and the intended use case differ, a one size fits all approach is not possible, but there are some techniques that can be used.

When searching for an exact match for full or partial path without wildcards, field targeting (see below) and quoting will give the most efficient query. This will escape all slashes and other separators, but also ensure that you only get results containing all terms, i.e. every directory specified in order. For example, location:"path/to/directory".

To use wildcards in combination with other separators, manually replace the latter with whitespace and consider if AND/OR logic should be used, so instead of a??-def, +a?? +def or a?? def would be needed for AND/OR logic respectively.

Finally, when matching file extensions, the approach will differ depending on whether the extension is preceded by a number or a letter. For numbers, to match a name with any extension (or vice versa) the extension/name can be omitted. 1234.dat is stored as two terms, 1234 and dat, so one can be matched independently of the other. For letters, wildcards must be used. To match a file named abcd.dat either abcd.* or *.dat can be used, however please note that the latter trailing wildcard can take a long time to evaluate.

With:

To allow file paths to be searched, special syntax is applied to three different fields: location, location.fileName and location.exact.

**location**
For this field, the only separator character is **/**. This means each subdirectory, and the complete file name, can be matched independently. [location:"path/to/directory"](?searchText=location:"path/to/directory") and [location:directory](?searchText=location:directory) are both valid ways of searching for a single directory or a sequence of directories in the file path. Wildcards can be used within a single subdirectory, but will not cross into the next child directory. For example, for a file under path/to/directory, **path*directory** will not match, but **dir*** will. This field is one of those searched by default if no field is specified.

**location.fileName**
This field uses just the file name (whatever follows the final **/**) and splits it by **.** to make it easier to search for files with the same root but different extensions ([location.fileName:run_1234](?searchText=location.fileName:run_1234) would match both "run_1234.txt" and "run_1234.nxs"), or the same extension but different roots ([location.fileName:txt](?searchText=location.fileName:txt) would match both "run_1234.txt" and "run_5678.txt"). This field is one of those searched by default if no field is specified.

**location.exact**
Finally, this field allows exact and hierarchical matches on the absolute file path. All files that start with the search term will be returned, and wildcards can be used to match multiple subdirectories if needed. Unlike the location field, this means an incomplete or relative path cannot be provided. [location.exact:/dls/i00/data/202?](?searchText=location.exact:/dls/i00/data/202?) would match everything at instrument i00 in any folder for the 2020s. Note that to be effective this requires knowledge of the file hierarchy and may lead to poor performance if a lot of results match, so consider combining this with other terms to make a more specific query. This field is NOT searched by default if no field is specified. 
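
As a reviewer aid (not part of the proposed help text), the behaviour described for the three fields can be modelled roughly in plain Python. This is a sketch under assumptions: the function names are hypothetical, the sample path is made up, matching is case-sensitive here, and `fnmatch` stands in for the search wildcards, so it does not reproduce the actual icat.lucene analyzers.

```python
from fnmatch import fnmatchcase
from typing import List


def location_terms(path: str) -> List[str]:
    """location: split only on '/', so each directory and the whole file name is one term."""
    return [part for part in path.split("/") if part]


def file_name_terms(path: str) -> List[str]:
    """location.fileName: keep what follows the final '/', then split on '.'."""
    file_name = path.rsplit("/", 1)[-1]
    return [part for part in file_name.split(".") if part]


def matches_location(path: str, pattern: str) -> bool:
    """A wildcard pattern has to match one whole term, so it cannot span a '/'."""
    return any(fnmatchcase(term, pattern) for term in location_terms(path))


def matches_location_phrase(path: str, phrase: str) -> bool:
    """A quoted phrase matches when its terms appear consecutively and in order."""
    terms, wanted = location_terms(path), location_terms(phrase)
    if not wanted:
        return False
    return any(
        terms[i:i + len(wanted)] == wanted
        for i in range(len(terms) - len(wanted) + 1)
    )


def matches_exact(path: str, pattern: str) -> bool:
    """location.exact: the pattern is anchored at the start of the absolute path."""
    return fnmatchcase(path, pattern + "*")


# Made-up example path, for illustration only.
path = "/dls/i00/data/2023/nt20-8/path/to/directory/run_1234.txt"

print(location_terms(path))
print(file_name_terms(path))
print(matches_location(path, "dir*"))            # wildcard inside one directory
print(matches_location(path, "path*directory"))  # wildcard would have to cross '/'
print(matches_location_phrase(path, "path/to/directory"))
print(matches_exact(path, "/dls/i00/data/202?"))
print(matches_exact(path, "i00/data"))           # relative path, not anchored at '/'
```

In this model `dir*` matches because it fits the single term `directory`, while `path*directory` fails because no single term both starts with `path` and ends with `directory`. The relative path fails against `location.exact` because the match is anchored at the leading `/`.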

Fields

Replace:

**Datafile**

- name
- description
- location
- visitId

With:

**Datafile**

- name
- description
- visitId
- location
- location.fileName
- location.exact
