311 changes: 311 additions & 0 deletions docs/path-search.md
@@ -0,0 +1,311 @@
# Path Search in QLever

The Path Search feature in this SPARQL engine allows users to perform advanced queries
to find paths between sources and targets in a graph. It supports a variety of configurations,
including single or multiple source and target nodes, optional edge properties, and
custom algorithms for path discovery. This feature is accessed using the `SERVICE` keyword
and the service IRI `<https://qlever.cs.uni-freiburg.de/pathSearch/>`.

## Basic Syntax

The general structure of a Path Search query is as follows:

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;  # Specify the algorithm
           pathSearch:source <sourceNode> ;  # Specify the source node(s)
           pathSearch:target <targetNode> ;  # Specify the target node(s)
           pathSearch:pathColumn ?path ;  # Bind the path variable
           pathSearch:edgeColumn ?edge ;  # Bind the edge variable
           pathSearch:start ?start ;  # Bind the edge start variable
           pathSearch:end ?end ;  # Bind the edge end variable
    {SELECT * WHERE {
      ?start <predicate> ?end.  # Define the edge pattern
    }}
  }
}
```
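When queries like the one above are generated by a script, the fixed parts of the structure can be assembled programmatically. The following Python sketch shows one way to do this. The helper `build_path_search_query` and its parameters are hypothetical and not part of QLever; it only concatenates the parameters documented on this page into a query string.

```python
PATH_SEARCH_IRI = "https://qlever.cs.uni-freiburg.de/pathSearch/"


def build_path_search_query(sources, targets, edge_pattern, cartesian=None):
    """Assemble a Path Search query string (hypothetical helper, not part of QLever)."""
    lines = ["_:path pathSearch:algorithm pathSearch:allPaths ;"]
    # One pathSearch:source / pathSearch:target line per node.
    lines += [f"pathSearch:source {s} ;" for s in sources]
    lines += [f"pathSearch:target {t} ;" for t in targets]
    lines += [
        "pathSearch:pathColumn ?path ;",
        "pathSearch:edgeColumn ?edge ;",
        "pathSearch:start ?start ;",
        "pathSearch:end ?end ;",
    ]
    # Only emit the optional parameter when the caller sets it explicitly.
    if cartesian is not None:
        lines.append(f"pathSearch:cartesian {'true' if cartesian else 'false'} ;")
    body = "\n    ".join(lines)
    return (
        f"PREFIX pathSearch: <{PATH_SEARCH_IRI}>\n\n"
        "SELECT ?start ?end ?path ?edge WHERE {\n"
        "  SERVICE pathSearch: {\n"
        f"    {body}\n"
        f"    {{ SELECT * WHERE {{ {edge_pattern} }} }}\n"
        "  }\n"
        "}"
    )
```

For example, `build_path_search_query(["<source>"], ["<target>"], "?start <predicate> ?end.")` yields a query with the same parameters as the template above. The sketch does no escaping or validation, so it should only be fed trusted input.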

### Parameters

`pathSearch:algorithm`: Defines the algorithm used to search paths. Currently, only `pathSearch:allPaths` is supported.

`pathSearch:source`: Defines the source node(s) of the search.

`pathSearch:target` (optional): Defines the target node(s) of the search.

`pathSearch:pathColumn`: Defines the variable for the path.

`pathSearch:edgeColumn`: Defines the variable for the edge.

`pathSearch:start`: Defines the variable for the start of the edges.

`pathSearch:end`: Defines the variable for the end of the edges.

`pathSearch:edgeProperty` (optional): Specifies properties for the edges in the path.

`pathSearch:cartesian` (optional): Controls the behaviour of path searches between
source and target nodes. Expects a boolean. The default is `true`. If set to `true`, the search will compute the paths from each source to **all targets**. If set to `false`, the search will compute the paths from each source to exactly
**one target**. Sources and targets are paired based on their index (i.e. the paths
from the first source to the first target are searched, then the second source and
target, and so on).

`pathSearch:numPathsPerTarget` (optional): Limits the number of paths that are searched and stored
per target. Expects an integer.
Example: if the value is 5, the search enumerates paths until 5 paths have been found;
all further paths are ignored.

??? note "Examples"

**Single Source and Target**

The simplest case is searching for paths between a single source and a single target:

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <source> ;
           pathSearch:target <target> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
    {
      SELECT * WHERE {
        ?start <predicate> ?end.
      }
    }
  }
}
```

**Multiple Sources or Targets**

It is possible to specify a set of sources or targets for the path search.

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <source1> ;
           pathSearch:source <source2> ;
           pathSearch:target <target1> ;
           pathSearch:target <target2> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
    {
      SELECT * WHERE {
        ?start <predicate> ?end.
      }
    }
  }
}
```

This query will search for paths between all sources and all targets, i.e.

- (`<source1>`, `<target1>`)
- (`<source1>`, `<target2>`)
- (`<source2>`, `<target1>`)
- (`<source2>`, `<target2>`)

It is possible to specify whether the sources and targets should be combined according
to the cartesian product (as seen above) or matched up pairwise, i.e.

- (`<source1>`, `<target1>`)
- (`<source2>`, `<target2>`)

This can be done with the parameter `pathSearch:cartesian`. This parameter expects a
boolean. If set to `true`, then the cartesian product is used to match the sources with
the targets.
If set to `false`, then the sources and targets are matched pairwise. If left
unspecified, then the default `true` is used.

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <source1> ;
           pathSearch:source <source2> ;
           pathSearch:target <target1> ;
           pathSearch:target <target2> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
           pathSearch:cartesian false ;
    {
      SELECT * WHERE {
        ?start <predicate> ?end.
      }
    }
  }
}
```
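The difference between the two pairing modes can be illustrated with a small Python sketch. The function `pair_sources_and_targets` is hypothetical and only mirrors the behaviour described above; in particular, raising an error for lists of unequal length in pairwise mode is an assumption of this sketch, not documented QLever behaviour.

```python
from itertools import product


def pair_sources_and_targets(sources, targets, cartesian=True):
    """Return the (source, target) pairs to search, depending on `cartesian`."""
    if cartesian:
        # Every source is combined with every target.
        return list(product(sources, targets))
    # Pairwise: match by index. Assumption of this sketch: equal lengths.
    if len(sources) != len(targets):
        raise ValueError("pairwise matching needs equally many sources and targets")
    return list(zip(sources, targets))
```

With two sources and two targets, the cartesian mode yields four pairs, whereas the pairwise mode yields only the two pairs that share an index.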

**Edge Properties**

You can also include edge properties in the path search to further refine the results:

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <source> ;
           pathSearch:target <target> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:edgeProperty ?middle ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
    {
      SELECT * WHERE {
        ?start <predicate1> ?middle.
        ?middle <predicate2> ?end.
      }
    }
  }
}
```

This is especially useful for [N-ary relations](https://www.w3.org/TR/swbp-n-aryRelations/).
Considering the example above, it is possible to query additional relations of `?middle`:

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <source> ;
           pathSearch:target <target> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:edgeProperty ?middle ;
           pathSearch:edgeProperty ?edgeInfo ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
    {
      SELECT * WHERE {
        ?start <predicate1> ?middle.
        ?middle <predicate2> ?end.
        ?middle <predicate3> ?edgeInfo.
      }
    }
  }
}
```

This makes it possible to query additional properties of the edge between `?start` and `?end` (such as `?edgeInfo` in the example above).


**Source or Target as Variables**

You can also bind the source and/or target dynamically using variables. The examples
below use `VALUES` clauses, which can be convenient to specify sources and targets.
However, the source/target variables can also be bound using any regular SPARQL construct.

**Source Variable**

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
  VALUES ?source {<source>}
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source ?source ;
           pathSearch:target <target> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
    {
      SELECT * WHERE {
        ?start <p> ?end.
      }
    }
  }
}
```

**Target Variable**

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
  VALUES ?target {<target>}
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <source> ;
           pathSearch:target ?target ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
    {
      SELECT * WHERE {
        ?start <p> ?end.
      }
    }
  }
}
```

**Limit Number of Paths per Target**

It is possible to limit how many paths per target are returned. This is especially useful if
the query uses a lot of memory: querying a limited number of paths can help to debug where
the problem is.

The following query, for example, returns only one path per source and target pair,
i.e. one path for `(<source1>, <target1>)`, one path for `(<source1>, <target2>)`, and so on.

```sparql
PREFIX pathSearch: <https://qlever.cs.uni-freiburg.de/pathSearch/>

SELECT ?start ?end ?path ?edge WHERE {
  SERVICE pathSearch: {
    _:path pathSearch:algorithm pathSearch:allPaths ;
           pathSearch:source <source1> ;
           pathSearch:source <source2> ;
           pathSearch:target <target1> ;
           pathSearch:target <target2> ;
           pathSearch:pathColumn ?path ;
           pathSearch:edgeColumn ?edge ;
           pathSearch:start ?start ;
           pathSearch:end ?end ;
           pathSearch:numPathsPerTarget 1 ;
    {
      SELECT * WHERE {
        ?start <predicate> ?end.
      }
    }
  }
}
```
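To make the effect of the limit concrete, the following Python sketch enumerates paths for a single source and target pair and stops once a cap is reached. It is an illustration only and not QLever's implementation: the depth-first traversal, the restriction to paths that visit no node twice, and the names `all_paths` and `max_paths` are assumptions of this sketch.

```python
def all_paths(edges, source, target, max_paths=None):
    """Enumerate paths from `source` to `target`, optionally capped at `max_paths`."""
    adjacency = {}
    for start, end in edges:
        adjacency.setdefault(start, []).append(end)

    found = []

    def dfs(node, path):
        # Stop exploring as soon as the cap is reached.
        if max_paths is not None and len(found) >= max_paths:
            return
        if node == target:
            found.append(list(path))
            return
        for nxt in adjacency.get(node, []):
            if nxt not in path:  # assumption: no node is visited twice
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(source, [source])
    return found
```

Because the search returns early once the cap is reached, a small limit also bounds the work done per pair, which is what makes it useful for debugging memory-heavy queries.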

## Error Handling

The Path Search feature will throw errors in the following scenarios:

- **Missing Start Parameter**: If the `start` parameter is not specified, an error will be raised.
- **Multiple Start or End Variables**: If multiple `start` or `end` variables are defined, an error is raised.
- **Invalid Non-Variable Start/End**: If the `start` or `end` parameter is not bound to a variable, the query will fail.
- **Unsupported Argument**: Arguments other than those listed (like custom user arguments) will cause an error.
- **Non-IRI Predicate**: Predicates must be IRIs. If not, an error will occur.
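Some of these errors can be caught before a query is sent. The following Python sketch is a hypothetical client-side check that mirrors the first four rules above; it is not QLever's own validation, and the function name, the `(parameter, value)` input format, and the error messages are made up for illustration. The predicate rule is not covered.

```python
ALLOWED_PARAMETERS = {
    "algorithm", "source", "target", "pathColumn", "edgeColumn",
    "start", "end", "edgeProperty", "cartesian", "numPathsPerTarget",
}


def validate_path_search_args(args):
    """Raise ValueError if `args`, a list of (parameter, value) pairs, breaks a rule."""
    for name, _ in args:
        if name not in ALLOWED_PARAMETERS:
            raise ValueError(f"unsupported argument: {name}")
    for name in ("start", "end"):
        values = [value for key, value in args if key == name]
        if len(values) > 1:
            raise ValueError(f"multiple `{name}` variables")
        # Assumption: variables are written with a leading `?`.
        if values and not values[0].startswith("?"):
            raise ValueError(f"`{name}` must be a variable")
    if not any(key == "start" for key, _ in args):
        raise ValueError("missing `start` parameter")
```

A call such as `validate_path_search_args([("source", "<source>"), ("start", "?start"), ("end", "?end")])` passes silently, while a duplicated `start` entry or an unknown parameter name raises a `ValueError`.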
2 changes: 2 additions & 0 deletions docs/quickstart.md
@@ -149,6 +149,8 @@ For any of the platforms not listed above you can install the `qlever` CLI tool

This will create a SPARQL endpoint for the 120 Years of Olympics dataset. It is a great dataset for getting started because it is small, but not trivial (around 2 million triples), and the downloading and indexing should only take a few seconds.

You can fetch any of a number of example `Qleverfile`s via `qlever setup-config <config-name>`. In particular, a `Qleverfile` is available for each of the demos at <https://qlever.dev>: [list of all example `Qleverfile`s](https://github.com/qlever-dev/qlever-control/tree/main/src/qlever/Qleverfiles). To write a `Qleverfile` for your own data, pick one of these configurations as a starting point and edit the `Qleverfile` as you see fit. A detailed explanation of all `Qleverfile` options may also be found at [Qleverfile settings](qleverfile.md).

Each command will also show you the command line it uses. That way you can learn, on the side, how QLever works internally. If you just want to know the command line for a particular command, without executing it, you can append `--show` like this:

```bash