309 changes: 309 additions & 0 deletions docs/path-search.md
@@ -0,0 +1,309 @@
# Path Search in QLever

The Path Search feature in this SPARQL engine allows users to perform advanced queries
to find paths between sources and targets in a graph. It supports a variety of configurations,
including single or multiple source and target nodes, optional edge properties, and
custom algorithms for path discovery. This feature is accessed using the `SERVICE` keyword
and the service IRI `<https://qlever.cs.uni-freiburg.de/pathSearch/>`.

## Basic Syntax

The general structure of a Path Search query is as follows:

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
SERVICE pathSearch: {
_:path pathSearch:algorithm pathSearch:allPaths ; # Specify the algorithm
pathSearch:source <sourceNode> ; # Specify the source node(s)
pathSearch:target <targetNode> ; # Specify the target node(s)
pathSearch:pathColumn ?path ; # Bind the path variable
pathSearch:edgeColumn ?edge ; # Bind the edge variable
pathSearch:start ?start ; # Bind the edge start variable
pathSearch:end ?end ; # Bind the edge end variable
{SELECT * WHERE {
?start <predicate> ?end. # Define the edge pattern
}}
}
}
```
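When many such queries are generated by a script, the template above can be assembled programmatically. The following is a minimal sketch in Python; the function `build_path_search_query` and its arguments are hypothetical helpers for illustration and are not part of QLever. It only builds the query string and does not send it anywhere.

```python
PREFIX = "PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>"


def build_path_search_query(sources, targets, edge_pattern,
                            cartesian=None, num_paths_per_target=None):
    """Assemble a Path Search query string (hypothetical helper, not a QLever API).

    sources/targets: lists of IRIs or variables as they appear in SPARQL,
    e.g. ["<source1>", "<source2>"]. edge_pattern: the body of the inner
    SELECT that binds ?start and ?end.
    """
    lines = ["_:path pathSearch:algorithm pathSearch:allPaths ;"]
    lines += [f"pathSearch:source {s} ;" for s in sources]
    lines += [f"pathSearch:target {t} ;" for t in targets]
    lines += [
        "pathSearch:pathColumn ?path ;",
        "pathSearch:edgeColumn ?edge ;",
        "pathSearch:start ?start ;",
        "pathSearch:end ?end ;",
    ]
    # Optional parameters are only emitted when explicitly requested.
    if cartesian is not None:
        lines.append(f"pathSearch:cartesian {'true' if cartesian else 'false'} ;")
    if num_paths_per_target is not None:
        lines.append(f"pathSearch:numPathsPerTarget {int(num_paths_per_target)} ;")
    lines.append("{ SELECT * WHERE { " + edge_pattern + " } }")
    body = "\n    ".join(lines)
    return (
        f"{PREFIX}\n\n"
        "SELECT ?start ?end ?path ?edge WHERE {\n"
        "  SERVICE pathSearch: {\n"
        f"    {body}\n"
        "  }\n"
        "}"
    )


query = build_path_search_query(
    ["<sourceNode>"], ["<targetNode>"], "?start <predicate> ?end ."
)
```
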

### Parameters

`pathSearch:algorithm`: Defines the algorithm used to search paths. Currently, only `pathSearch:allPaths` is supported.

`pathSearch:source`: Defines the source node(s) of the search.

`pathSearch:target` (optional): Defines the target node(s) of the search.

`pathSearch:pathColumn`: Defines the variable for the path.

`pathSearch:edgeColumn`: Defines the variable for the edge.

`pathSearch:start`: Defines the variable for the start of the edges.

`pathSearch:end`: Defines the variable for the end of the edges.

`pathSearch:edgeProperty` (optional): Specifies properties for the edges in the path.

`pathSearch:cartesian` (optional): Controls the behaviour of path searches between
source and target nodes. Expects a boolean. The default is `true`. If set to `true`, the search will compute the paths from each source to **all targets**. If set to `false`, the search will compute the paths from each source to exactly
**one target**. Sources and targets are paired based on their index (i.e. the paths
from the first source to the first target are searched, then the second source and
target, and so on).

`pathSearch:numPathsPerTarget` (optional): Limits the number of paths that are searched and stored
per target. Expects an integer. Example: if the value is 5, the search enumerates paths until
5 paths have been found. Any further paths are ignored.

??? note "Examples"

**Single Source and Target**

The simplest case is searching for paths between a single source and a single target:

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
SERVICE pathSearch: {
_:path pathSearch:algorithm pathSearch:allPaths ;
pathSearch:source <source> ;
pathSearch:target <target> ;
pathSearch:pathColumn ?path ;
pathSearch:edgeColumn ?edge ;
pathSearch:start ?start ;
pathSearch:end ?end ;
{
SELECT * WHERE {
?start <predicate> ?end.
}
}
}
}
```

**Multiple Sources or Targets**

It is possible to specify a set of sources or targets for the path search.

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
SERVICE pathSearch: {
_:path pathSearch:algorithm pathSearch:allPaths ;
pathSearch:source <source1> ;
pathSearch:source <source2> ;
pathSearch:target <target1> ;
pathSearch:target <target2> ;
pathSearch:pathColumn ?path ;
pathSearch:edgeColumn ?edge ;
pathSearch:start ?start ;
pathSearch:end ?end ;
{
SELECT * WHERE {
?start <predicate> ?end.
}
}
}
}
```

This query will search for all paths between all sources and all targets, i.e.
- (`<source1>`, `<target1>`)
- (`<source1>`, `<target2>`)
- (`<source2>`, `<target1>`)
- (`<source2>`, `<target2>`)

It is possible to specify whether the sources and targets should be combined according
to the cartesian product (as seen above) or if they should be matched up pairwise, i.e.
- (`<source1>`, `<target1>`)
- (`<source2>`, `<target2>`)

This can be done with the parameter `pathSearch:cartesian`. This parameter expects a
boolean. If set to `true`, then the cartesian product is used to match the sources with
the targets.
If set to `false`, then the sources and targets are matched pairwise. If left
unspecified, then the default `true` is used.
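The difference between the two modes can be sketched in a few lines of Python. This is only an illustration of the pairing semantics described above, not QLever's implementation; the function name is hypothetical, and raising an error for lists of unequal length in pairwise mode is an assumption of this sketch.

```python
from itertools import product


def pair_sources_and_targets(sources, targets, cartesian=True):
    """Return the (source, target) pairs for which paths are searched."""
    if cartesian:
        # Every source is combined with every target.
        return list(product(sources, targets))
    # Pairwise mode: the i-th source is matched with the i-th target.
    if len(sources) != len(targets):
        # Assumption of this sketch: unequal lengths are rejected.
        raise ValueError("pairwise matching needs equally many sources and targets")
    return list(zip(sources, targets))


sources = ["<source1>", "<source2>"]
targets = ["<target1>", "<target2>"]
print(pair_sources_and_targets(sources, targets, cartesian=True))
print(pair_sources_and_targets(sources, targets, cartesian=False))
```
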

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
SERVICE pathSearch: {
_:path pathSearch:algorithm pathSearch:allPaths ;
pathSearch:source <source1> ;
pathSearch:source <source2> ;
pathSearch:target <target1> ;
pathSearch:target <target2> ;
pathSearch:pathColumn ?path ;
pathSearch:edgeColumn ?edge ;
pathSearch:start ?start ;
pathSearch:end ?end ;
pathSearch:cartesian false;
{
SELECT * WHERE {
?start <predicate> ?end.
}
}
}
}
```

**Edge Properties**

You can also include edge properties in the path search to further refine the results:

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
SERVICE pathSearch: {
_:path pathSearch:algorithm pathSearch:allPaths ;
pathSearch:source <source> ;
pathSearch:target <target> ;
pathSearch:pathColumn ?path ;
pathSearch:edgeColumn ?edge ;
pathSearch:edgeProperty ?middle ;
pathSearch:start ?start ;
pathSearch:end ?end ;
{
SELECT * WHERE {
?start <predicate1> ?middle.
?middle <predicate2> ?end.
}
}
}
}
```

This is especially useful for [N-ary relations](https://www.w3.org/TR/swbp-n-aryRelations/).
Considering the example above, it is possible to query additional relations of `?middle`:

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
SERVICE pathSearch: {
_:path pathSearch:algorithm pathSearch:allPaths ;
pathSearch:source <source> ;
pathSearch:target <target> ;
pathSearch:pathColumn ?path ;
pathSearch:edgeColumn ?edge ;
pathSearch:edgeProperty ?middle ;
pathSearch:edgeProperty ?edgeInfo ;
pathSearch:start ?start ;
pathSearch:end ?end ;
{
SELECT * WHERE {
?start <predicate1> ?middle.
?middle <predicate2> ?end.
?middle <predicate3> ?edgeInfo.
}
}
}
}
```

This makes it possible to query additional properties of the edge between `?start` and `?end` (such as `?edgeInfo` in the example above).
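To make the role of `?middle` and `?edgeInfo` concrete, the following Python sketch evaluates the inner edge pattern by hand on a small in-memory triple list. It is an illustration under stated assumptions: the function name, the plain-string triples, and the sample data are all hypothetical, and the join below is a naive one rather than what QLever does internally.

```python
from collections import defaultdict


def edges_with_properties(triples, p1, p2, p3):
    """Join ?start p1 ?middle . ?middle p2 ?end . ?middle p3 ?edgeInfo ."""
    ends = defaultdict(list)   # middle node -> end nodes
    infos = defaultdict(list)  # middle node -> edge info values
    for s, p, o in triples:
        if p == p2:
            ends[s].append(o)
        elif p == p3:
            infos[s].append(o)
    rows = []
    for s, p, o in triples:
        if p != p1:
            continue
        # o is the intermediate node of the n-ary relation.
        for end in ends.get(o, []):
            for info in infos.get(o, []):
                rows.append({"start": s, "end": end, "middle": o, "edgeInfo": info})
    return rows


triples = [
    ("alice", "predicate1", "m1"),
    ("m1", "predicate2", "bob"),
    ("m1", "predicate3", "2020"),
]
print(edges_with_properties(triples, "predicate1", "predicate2", "predicate3"))
```

Each resulting row is one edge from `start` to `end`, and the remaining fields are the values that would be carried along as edge properties.
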


**Source or Target as Variables**

You can also bind the source and/or target dynamically using variables. The examples
below use `VALUES` clauses, which can be convenient to specify sources and targets.
However, the source/target variables can also be bound using any regular SPARQL construct.

**Source Variable**

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
VALUES ?source {<source>}
SERVICE pathSearch: {
_:path pathSearch:algorithm pathSearch:allPaths ;
pathSearch:source ?source ;
pathSearch:target <target> ;
pathSearch:pathColumn ?path ;
pathSearch:edgeColumn ?edge ;
pathSearch:start ?start ;
pathSearch:end ?end ;
{
SELECT * WHERE {
?start <p> ?end.
}
}
}
}
```

**Target Variable**

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
VALUES ?target {<target>}
SERVICE pathSearch: {
_:path pathSearch:algorithm pathSearch:allPaths ;
pathSearch:source <source> ;
pathSearch:target ?target ;
pathSearch:pathColumn ?path ;
pathSearch:edgeColumn ?edge ;
pathSearch:start ?start ;
pathSearch:end ?end ;
{
SELECT * WHERE {
?start <p> ?end.
}
}
}
}
```

**Limit Number of Paths per Target**

It is possible to limit how many paths per target are returned. This is especially useful if
the query uses a lot of memory. In that case, it is possible to query a limited number of
paths to debug where the problem is.

The following query for example will only return one path per source and target pair.
I.e. one path for `(<source1>, <target1>)`, one path for `(<source1>, <target2>)` and so on.

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
SERVICE pathSearch: {
_:path pathSearch:algorithm pathSearch:allPaths ;
pathSearch:source <source1> ;
pathSearch:source <source2> ;
pathSearch:target <target1> ;
pathSearch:target <target2> ;
pathSearch:pathColumn ?path ;
pathSearch:edgeColumn ?edge ;
pathSearch:start ?start ;
pathSearch:end ?end ;
pathSearch:numPathsPerTarget 1;
{
SELECT * WHERE {
?start <predicate> ?end.
}
}
}
}
```
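The effect of such a limit can be illustrated with a small depth-first enumeration in Python. This is a sketch under stated assumptions and not QLever's algorithm: it enumerates simple paths only (no node is visited twice), stops a path as soon as it reaches the target, and applies the limit to a single source and target pair. The function name is hypothetical.

```python
from collections import defaultdict


def enumerate_paths(edges, source, target, max_paths=None):
    """Enumerate simple paths from source to target as lists of (start, end) edges."""
    adjacency = defaultdict(list)
    for start, end in edges:
        adjacency[start].append(end)

    paths = []

    def dfs(node, path, visited):
        # Stop exploring once the requested number of paths has been found.
        if max_paths is not None and len(paths) >= max_paths:
            return
        if node == target and path:
            paths.append(list(path))
            return
        for nxt in adjacency.get(node, []):
            if nxt in visited:
                continue  # keep paths simple (assumption of this sketch)
            visited.add(nxt)
            path.append((node, nxt))
            dfs(nxt, path, visited)
            path.pop()
            visited.remove(nxt)

    dfs(source, [], {source})
    return paths


edges = [("a", "b"), ("b", "d"), ("a", "c"), ("c", "d")]
print(enumerate_paths(edges, "a", "d"))               # both paths
print(enumerate_paths(edges, "a", "d", max_paths=1))  # only the first path found
```
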

## Error Handling

The Path Search feature will throw errors in the following scenarios:

- **Missing Start Parameter**: If the `start` parameter is not specified, an error will be raised.
- **Multiple Start or End Variables**: If multiple `start` or `end` variables are defined, an error is raised.
- **Invalid Non-Variable Start/End**: If the `start` or `end` parameter is not bound to a variable, the query will fail.
- **Unsupported Argument**: Arguments other than those listed (like custom user arguments) will cause an error.
- **Non-IRI Predicate**: Predicates must be IRIs. If not, an error will occur.
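Client code that generates Path Search queries may want to catch some of these problems before sending a query. The following Python sketch mirrors a few of the checks listed above. It is hypothetical: the function, the error messages, and the representation of parameters as `(name, value)` pairs are assumptions for illustration, and requiring `end` in the same way as `start` is also an assumption of this sketch.

```python
SUPPORTED_PARAMETERS = {
    "algorithm", "source", "target", "pathColumn", "edgeColumn",
    "start", "end", "edgeProperty", "cartesian", "numPathsPerTarget",
}


def validate_parameters(parameters):
    """Check a list of (name, value) pairs; raise ValueError on the first problem."""
    seen = {}
    for name, value in parameters:
        if name not in SUPPORTED_PARAMETERS:
            raise ValueError(f"unsupported argument: {name}")
        seen.setdefault(name, []).append(value)
    for name in ("start", "end"):
        values = seen.get(name, [])
        if not values:
            raise ValueError(f"missing parameter: {name}")
        if len(values) > 1:
            raise ValueError(f"parameter {name} given more than once")
        if not values[0].startswith("?"):
            raise ValueError(f"parameter {name} must be a variable")


validate_parameters([
    ("algorithm", "pathSearch:allPaths"),
    ("source", "<source>"),
    ("start", "?start"),
    ("end", "?end"),
])
```
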
84 changes: 84 additions & 0 deletions docs/special-features.md
@@ -0,0 +1,84 @@
# Miscellaneous special features

## Internal Triples for SPARQL+Text and SPARQL Autocompletion

On top of the vanilla SPARQL functionality, QLever allows so-called SPARQL+Text
queries on a text corpus linked to a knowledge base via entity recognition. For
example, the following query finds all mentions of astronauts next to the words
"moon" and "walk*" in the text corpus:

```sparql
SELECT ?a TEXT(?t) SCORE(?t) WHERE {
?a <is-a> <Astronaut> .
?t ql:contains-entity ?a .
?t ql:contains-word "walk* moon"
} ORDER BY DESC(SCORE(?t))
```

Such queries can be simulated in standard SPARQL, but only with poor
performance; see the CIKM'17 paper on QLever. Details about the required input data
and the SPARQL+text query syntax and semantics can be found
[here](text-search.md).

QLever also supports efficient SPARQL autocompletion. For example, the
following query yields a list of all predicates associated with people in the
knowledge base, ordered by the number of people who have that predicate.

```sparql
SELECT ?predicate (COUNT(?predicate) as ?count) WHERE {
?x <is-a> <Person> .
?x ql:has-predicate ?predicate
}
GROUP BY ?predicate
ORDER BY DESC(?count)
```

Note that this query could also be processed by a standard SPARQL engine simply
by replacing the second triple with `?x ?predicate ?object` and counting
`DISTINCT ?x` instead of `?predicate`.

However, that query will produce a very large intermediate result (all triples
of all people) with a correspondingly long query time. In contrast, the query
above takes only about 100 ms on a standard Linux machine (with 16 GB memory)
and a dataset with 360 million triples and 530 million text records.
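What the autocompletion query computes can be spelled out in a few lines of Python: for each predicate, the number of distinct people that have it. The sketch below works on a hypothetical in-memory triple list and scans every triple of every person, which is exactly the large intermediate result that the standard SPARQL formulation has to produce. The function name and sample data are illustrative assumptions.

```python
from collections import defaultdict


def predicate_counts(triples, people):
    """Count, per predicate, how many distinct people have that predicate."""
    subjects_per_predicate = defaultdict(set)
    for subject, predicate, _obj in triples:
        if subject in people:
            # A set counts each person only once per predicate,
            # even if the person has several objects for it.
            subjects_per_predicate[predicate].add(subject)
    counts = {p: len(s) for p, s in subjects_per_predicate.items()}
    # Most frequent predicates first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


triples = [
    ("alice", "born-in", "Paris"),
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "born-in", "Rome"),
    ("Paris", "located-in", "France"),
]
print(predicate_counts(triples, people={"alice", "bob"}))
```
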

## Statistics

You can get statistics for the currently active index in the following way:

```
<server>:<port>/?cmd=stats
```

This query will yield a JSON response that features:

* The name of the KB index
* The number of triples in the KB index
* The number of index permutations built (usually 2 or 6)
* The numbers of distinct subjects, predicates and objects (only available if 6 permutations are built)
* The name of the text index (if one is present)
* The number of text records in the text index (if a text index is present)
* The number of word occurrences/postings in the text index (if a text index is present)
* The number of entity occurrences/postings in the text index (if a text index is present)

The name of an index is the name of the input file (and wordsfile for the
text index), but can also be specified manually while building an index.
For this purpose, `IndexBuilderMain` takes two optional arguments: `--text-index-name` (`-T`)
and `--kb-index-name` (`-K`).
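A script can fetch these statistics with a plain HTTP request. The following Python sketch uses only the standard library. The `http://` scheme, the function names, and the host and port in the usage line are assumptions, and the sketch deliberately does not assume any particular key names in the returned JSON.

```python
import json
from urllib.request import urlopen


def stats_url(server, port):
    """Build the statistics URL for a running server (http scheme is assumed)."""
    return f"http://{server}:{port}/?cmd=stats"


def fetch_stats(server, port, timeout=10):
    """Request the statistics and return the decoded JSON response."""
    with urlopen(stats_url(server, port), timeout=timeout) as response:
        return json.load(response)


# Example with a hypothetical local server:
# stats = fetch_stats("localhost", 7001)
print(stats_url("localhost", 7001))
```
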

## Send vs Compute

Currently, QLever does not compute partial results if there is a `LIMIT` modifier.

However, strings (for entities and text excerpts) are only resolved for those
items that will be transmitted. Furthermore, a UI usually only requires
a limited number of rows at a time.

While specifying a `LIMIT` is recommended, some experiments may want
to measure the time to produce the full result.
Therefore, an additional HTTP parameter `&send=<x>` can be used to send only
x result rows while still computing the readable result for up to `LIMIT` rows.
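As an illustration, a request URL with the `send` parameter could be built as follows in Python. Treat this as a sketch: the `query` parameter name, the `http://` scheme, the function name, and the host and port are assumptions. Only the `send` parameter itself is taken from the description above.

```python
from urllib.parse import urlencode


def query_url(server, port, query, send=None):
    """Build a query URL, optionally restricting the number of rows sent."""
    params = {"query": query}  # parameter name "query" is an assumption
    if send is not None:
        params["send"] = int(send)
    return f"http://{server}:{port}/?{urlencode(params)}"


# Compute the result for up to 1000 rows, but transmit only 10 of them.
url = query_url(
    "localhost", 7001,
    "SELECT * WHERE { ?s ?p ?o } LIMIT 1000",
    send=10,
)
print(url)
```
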

**IMPORTANT: Unless you want to measure QLever's performance, using `LIMIT` (+
`OFFSET` for sequential loading) is preferred in all applications. `LIMIT` is
faster and produces the same output as the `send` parameter.**