Rules Document content #637

afs · 2025-11-04T17:15:39Z

afs
Nov 4, 2025
Collaborator

1: Introduction

Terminology
Terminology from RDF Concepts, SPARQL Query
Document Conventions
Namespaces - including shnex, shrl:, sparql:, xsd:, rdf:
Conformance
Test suite

2: Shape Rules

Informative section (that is, non-normative) explaining the main features

Basic outline, key features, not all features.
Audience: rules engineer

What are "rules"? A set of conditions on the shape of the data that infer new information from existing data.

Apply rules: rules+data -> new information

A rule set is a collection of rules. Unit of execution.

An execution is rule set + data (base data)

2.1 Structure of a Rule

Head-Body: match body (the "if",) and generate new information ("triples")
Or as H is true when B

Body is a pattern, with value restriction.

Example of a rule that does not depend on other rules.
Example of a rule with a filter.

2.2 Evalaution

"match body, get variables, use head as a template.

Makes new information available to other rules.

Execute until no change.

Operations - infer(), query()

A rules system can provide one or both of these operations.

"infer" - creates all new information
"query" - given a simple pattern, can it be inferred by the rules+data.

2.3 Rule Dependencies

Rules can depend on rules.

Including recursion.

Example: A rule that depends on another rule. :ancestor
Example: :ancestor of :ancestor

2.4 Rule Set Stratification

"We say that rule1 depends on rule2 if ..."

2.5 Negation-as-failure

Shape Rules also supports

{ :s :p ?z . NOT EXISTS { ?z :q ?v } ... }

2.6 Assignment

and why it is beyond datalog

"safe assignment" - previous stratification - is it enough?

3. Shape Rules Abstract Syntax

Normative.

This section is the formal definition of rules and rule set execution.

3.1 Well-formedness Condition for a rule body

FILTERs, Assignments conditions so variables are defined before use.

3.2 Dependency Relationship

Definition rule A deopends on rule B if ...

3.3 Stratification

Definition

... and well-formedness conditions (NAF and recursion)

NB EXISTS - with body pattern (a limited graph pattern)

4. Concrete Syntax forms for Shapes Rules

4.1 RDF Rules Syntax

4.2 Compact Rules Syntax

4.2.1 Compact Syntax Abbreviations

4.3 SPARQL function restrictions

No BOUND, FILTER (NOT) EXISTS (available as a pattern)

Explain pattern of EXISTS/NOT EXISTS is a body pattern

5. Shape Rules Evaluation

Define evaluation

Evaluation of an expression
Evaluation of a rule
Evaluation of a Rule Set

Stratification.

Necessary if NAF.

6. Workspace named tuples

Possible addition. Tuples as space during execution. Avoids repeating pattern fragments.

Syntax:

TUPLE(name/string, varTerm1, varTerm2, ...) -- include this and maybe a short form.
?? &name(varTerm1, varTerm2, ...)
other character: % (caveat %xx), ^, *, \name, |name|(varTerm1, varTerm2, ...)

7. Attaching Rules to Shapes

Does (some) targets as patterns work for a definition?

Appendix A: Shape Rules Grammar

Appendix B: Relationship to SHACL-AF

SHACL-AF Triple rules
SHACL-AF SPARQL rules

Appendix C: Relationship to node expressions

liviorobaldo · 2025-11-06T16:48:16Z

liviorobaldo
Nov 6, 2025
Collaborator

Hello!

I don’t remember if we are expected to comment on the provisional table of contents above. If not, I’ll just delete this comment 🙂

Anyway, I have three comments, which I also discussed at the latest meeting.

============================

Attaching Rules to Shapes
Does (some) targets as patterns work for a definition?

As I mentioned at the meeting, I think this should be specified in 2. Packaging SHACL. In my view, as I proposed in the other discussion group, there should be a "cluster" (or "bundle," or any other suitable name) that groups together some data, a set of shapes, and a set of rules. We should also decide whether shapes are applied before the rules or vice versa, with respect to the two operations infer() and query().

I understood that you and the others were considering this a viable idea, but it is definitely something to discuss with the SHACL Profiling task force. How should we proceed? I can raise the point at the next WG meeting on Monday, or I could open an issue directly in the Packaging SHACL section (once I figure out how 😅), but I’m not sure if that’s the proper procedure. By the way, the SHACL 1.2 Editor Draft now displays entirely black in my browser, and the githack URLs give a 404, so I’m not sure what’s going on.

============================
We know that:

Apply rules: rules+data -> new information

could also mean creating new blank nodes or literals, which could lead to infinite loops. We discussed at the meeting that we should explain at some point how to prevent these infinite loops (but not in Section 2!).

It's not fully clear to me how to formally prevent them in the grammar, because infinite loops are triggered by rules whose antecedents are always satisfied (or satisfied, then not, then satisfied again, and so on, infinitely). Is there a way to constrain the grammar to avoid this? I can't think of one, but I'm happy to learn 😊 Alternatively, we could simply add a disclaimer noting the risk of infinite loops and that it's the user's responsibility to ensure their rule set does not generate them.

============================
I notice that the table of contents above no longer mentions aggregate functions. I think these should be enabled by the grammar, as they are needed in many use cases.

For example, with a PhD student of mine, I am developing a system to check compliance with Ghana Petroleum Commission regulations (see this paper). One regulation requires companies operating in Ghana to employ at least 80% Ghanaians among the technical staff after 5 years. To infer whether a company complies, we must: (1) count the total employees; (2) count the Ghanaian employees; (3) check that (2)>0.8*(1)

This obviously requires aggregate functions (COUNT).

I've read more carefully how aggregates are used in SHACL 1.2 Node Expressions, but I don’t think they can be directly used here. This isn't a validation problem: the data are valid, we must infer whether the data comply with the regulations or not. Also, these regulations can include exceptions, e.g., companies may be exempt under certain conditions, but evaluating these conditions may require several additional operations, potentially including more aggregates. Therefore, I don't think this is a validation issue, it looks like an inference one.

The problem is indeed more general: some inferences are required only when certain quotas or thresholds are met, or when the sum of some values exceeds a given limit, or in similar situations. Think, for example, of applications in finance.

Nevertheless, I’m not an expert in stratification enough to know if the basic stratification method, used for negation-as-failure, can be easily extended to aggregates as well, e.g., evaluating aggregate rules only after all non-aggregate rules have been evaluated.

Cheers,
Livio

1 reply

afs Nov 6, 2025
Collaborator Author

============================

Attaching Rules to Shapes
Does (some) targets as patterns work for a definition?

As I mentioned at the meeting, I think this should be specified in 2. Packaging SHACL.

Maybe raise a github issue and label it appropriately?

We should also decide whether shapes are applied before the rules or vice versa, with respect to the two operations infer() and query().

If shapes are triggered by targets, including in the pattern (targets are ways to find nodes in the data graph) may be the way to treat them as rules like any other.

[ ] sh:targetClass :SomeClass .

is

. . . WHERE { ?x rdf:type :SomeClass .  ... }

Some thing to explore.

Does packaging change that?

============================
We know that:

Apply rules: rules+data -> new information

could also mean creating new blank nodes or literals, which could lead to infinite loops. We discussed at the meeting that we should explain at some point how to prevent these infinite loops (but not in Section 2!).

It's not fully clear to me how to formally prevent them in the grammar,

Not by syntax. The unbounded outputs come from dependency loops.

What could be done is ensuring that there aren't dependency loops (same situation as NAF).

============================
I notice that the table of contents above no longer mentions aggregate functions. I think these should be enabled by the grammar, as they are needed in many use cases.

"many" need qualifying.

This obviously requires aggregate functions (COUNT).

I've read more carefully how aggregates are used in SHACL 1.2 Node Expressions, but I don’t think they can be directly used here.

What's the problem with node expression aggregates?

liviorobaldo · 2025-11-06T21:16:37Z

liviorobaldo
Nov 6, 2025
Collaborator

Maybe raise a github issue and label it appropriately?

Ok, let me try... tomorrow :-)

If shapes are triggered by targets, including in the pattern (targets are ways to find nodes in the data graph) may be the way to treat them as rules like any other.

Indeed. I was already told that shapes are like rules, with the main difference being that they produce an error message rather than new triples. So, in principle, everything could be modelled as rules. As you say, it’s something worth exploring. However, even if it works technically, I’m not sure it’s a good idea conceptually. Perhaps we should instead maintain a clear conceptual distinction between validation and inference, using two separate constructs to better emphasize this difference.

Okay about infinite loops. I also thought about the parallel with NAF, but the difference is that an infinite loop can be triggered even by a single rule. However, I can now see that the rule would depend on itself, so what you propose would still work.

"many" need qualifying.
What's the problem with node expression aggregates?

As I mentioned in my previous reply, we might need rules to infer new values when certain quotas or thresholds are met, or when the sum of some values exceeds a given limit. I haven’t conducted empirical analyses to determine how often this need arises, but intuitively, it seems like it could be fairly common. I actually worked on a small use case with my PhD student and already ran into this need. Maybe I was "unlucky" and found a rare case, but even if it is seldom needed, I don’t see why we should prevent aggregates in the bodies. Are they really that much harder to handle than negation-as-failure?

Perhaps the problem (my problem) is that I haven’t fully understood node expressions. Can they also be used for rules? From the current draft, I understood that they can only be used for shapes, i.e., for validating the data.

How would that work, for example, in the use case I described earlier? Suppose we have two classes: Company and Employee. Individuals of type Company are linked to individuals of type Employee via the property employs, while another property, has-nationality, links employees to their nationality, e.g., UK or Ghana.

Can we use node expressions to state that if at least 80% of a company’s employees are from Ghana, then the company belongs to the class Compliant? In my experience, this shouldn't be modeled as a validation problem. Compliance with regulations is an inference problem. In fact, determining whether a norm applies can require complex reasoning, because exceptions may exist. Similarly, if we infer that a company is not compliant, we may need to trigger additional rules to determine the sanctions or compensations the company must provide for non-compliance.

With SHACL-SPARQL rules, we still had to specify sh:targetClass. Node expression aggregates could then be used in this sh:targetClass. But with the new SHACL 1.2 format, we no longer have sh:targetClass. Can we use node expression aggregates in the bodies of rules?

As I understand it, with the current grammar we cannot. That’s why I was proposing allowing aggregates in the bodies and then extending stratification to them (evaluating aggregate rules only after all non-aggregate rules have been evaluated). However, there may be some reason (which I don’t know) why stratification works for negation-as-failure but not for aggregates.

1 reply

osavkovic Nov 10, 2025
Collaborator

Hi all,

Just a quick answer to Livio’s comment on aggregates. We also did some work on introducing aggregates in SHACL, in collaboration with Bosch AI in Germany, on designing a constraint language for ML pipelines.

https://dl.acm.org/doi/10.1007/978-3-032-02215-8_13

In our use case, we have:

first, a small inference language (like a simple rule language), and
once the rules finish inferencing, we perform validation and apply, among other things, aggregate constraints that, for instance, check whether there is appropriate data to perform ML tasks.

Long story short, we use aggregates only in the check phase (2) (we don’t want to store them).
So in our case, aggregates were not part of the rules, but of the (node) shapes, if you will.

Now, there might be cases where you want to produce that new aggregate value, perhaps store it, etc., but that doesn’t seem to be the case for you either (but maybe I should read the paper first :), to get all the details)

liviorobaldo · 2025-11-11T14:53:16Z

liviorobaldo
Nov 11, 2025
Collaborator

Hi Oggy,

that doesn’t seem to be the case for you either (but maybe I should read the paper first :), to get all the details)

No, we actually need to store the aggregates in order to reason about them. However, once companies are inferred to be compliant or non-compliant (which may trigger further inferences, such as sanctions or compensations), those aggregates could be discarded. And of course, if the number of a company’s employees changes, they must be recomputed.

As I mentioned, I can also easily imagine use cases in finance that require aggregates. For example, inferring the sum of total profits made by a company and checking whether they exceed a certain threshold (e.g., for tax purposes).

It’s not clear to me whether aggregates introduce complications in the inference system that are more difficult to handle than those caused by stratified negation-as-failure. If not, then I think we should allow both.

Cheers
Livio

0 replies

afs · 2025-11-11T15:23:22Z

afs
Nov 11, 2025
Collaborator Author

However, once companies are inferred to be compliant or non-compliant (which may trigger further inferences, such as sanctions or compensations), those aggregates could be discarded.

When would sanctions be removed? Would it not be a separate graph and having a rule set execution without merging the inference graph outcome into the base graph

Alternatively, deletion can be handled by SPARQL Update.

I think we should allow both.

~~two~~ three things need to be clarified:

Which aggregates? There are some core ones like count, but also various statistical ones.
What about grouping (GROUP BY)?
What is being aggregated? Is it a body pattern expression (with no inner aggregates)? Or something less expressive?

0 replies

liviorobaldo · 2025-11-13T13:20:01Z

liviorobaldo
Nov 13, 2025
Collaborator

Hi Andy,

When would sanctions be removed?

Theoretically, never. Sanctions can be paid, but we can still keep them in the KB for record-keeping purposes. For example, a company may have received a sanction two years ago and subsequently paid it, but we would still want to retain a record of the event. We might decide to, for instance, delete all data older than five years, though this would depend on the specific application.

Would it not be a separate graph and having a rule set execution without merging the inference graph outcome into the base graph
Alternatively, deletion can be handled by SPARQL Update.

This is another option. However, I don’t quite see how this relates to the decision about whether or not to allow aggregates.

two three things need to be clarified:
Which aggregates? There are some core ones like count, but also various statistical ones.
What about grouping (GROUP BY)?
What is being aggregated? Is it a body pattern expression (with no inner aggregates)? Or something less expressive?

Here is my proposal. Aggregates are allowed only within the function BIND.

We might introduce a two-parameter function SELECT as an argument of BIND. The first parameter of SELECT is a single aggregate function, while the second parameter is a knowledge graph with variables. For instance:

RULE{?x :has-total-profits ?sum.}
WHERE{?x a :Company. BIND(SELECT(SUM(?amount),{?x :has-profit ?amount}) AS ?sum)}

This rule calculates the total profits for each company. For example, if we have two companies, Livio and Andy, and the knowledge graph stores the following data:

:Livio a :Company.
:Andy a :Company.
:Livio :has-profit 100.
:Livio :has-profit 50.
:Andy :has-profit 200.
:Andy :has-profit 150.

The rule above generates the triples:

:Livio :has-total-profits 150.
:Andy :has-total-profits 350.

This rule is equivalent to the following SPARQL query:

SELECT ?company (SUM(?profit) AS ?totalProfit)
WHERE 
{
	?company a :Company ;
	:has-profit ?profit .
}
GROUP BY ?company

Note the use of GROUP BY, since the aggregation is performed per company. To compute the sum of all profits across all companies, we could leave the variable unbound, for example:

RULE{?x :has-total-profits ?sum.}
WHERE{?x a :Company. BIND(SELECT(SUM(?amount),{?any :has-profit ?amount}) AS ?sum)}

The variable ?any is now unbound, so the sum is calculated over all possible values of ?any. This rule then produces the following triples:

:Livio :has-total-profits 500.
:Andy :has-total-profits 500.

We can also implement HAVING. The following query:

SELECT ?company (SUM(?profit) AS ?totalProfit)
WHERE 
{
	?company a :Company ;
	:has-profit ?profit .
}
GROUP BY ?company
HAVING SUM(Amount)>200;

is equivalent to:

RULE{?x :has-total-profits ?sum.}
WHERE{?x a :Company. BIND(SELECT(SUM(?amount),{?x :has-profit ?amount}) AS ?sum) FILTER(?sum>200)}

Implementing aggregates in this way prevents nested queries (i.e., only one level of nesting is allowed, since we cannot nest a BIND inside a SELECT). Therefore, in response to your question 3, in my approach there are no inner aggregates — these would have to be implemented through two separate rules.

To implement aggregates in the way I propose, the grammar should be modified as follows:

[24]  	Bind	  ::=  	'BIND' '(' Expression 'AS' Var ')'

becomes:

[24]  	Bind	  ::=  	'BIND' '(' (Expression | Aggregate) 'AS' Var ')'

where:

[*]  	Aggregate	  ::=  	'SELECT' '(' COUNT|SUM|... '(' Var '),' TriplesTemplateBlock ')'

In response to your question 1, I think we could retain all aggregate functions, though I do not know each of them in detail. If any particular function introduces additional complications, we can exclude it.

Of course, we must apply stratification to aggregates as well. In cases where there are inference rules that generate additional profits, the rule above must be executed after those. However, there will be a dependency from the first rule to the second, and we must ensure there are no cycles in these dependencies — exactly as with negation-as-failure. From this perspective, I do not see any conceptual difference between negation-as-failure and aggregates.

The above is just an initial idea. If we agree it is promising, we can investigate it further. As I mentioned, I believe rules should be allowed to infer aggregate values.

Cheers
Livio

0 replies

osavkovic · 2025-11-17T12:39:56Z

osavkovic
Nov 17, 2025
Collaborator

Hi All, @livio, I understand your case now. You want to use those aggregates later in the graph for validation. Ideally, I still think that SHACL itself should have node expressions with aggregates to address your issue, since as you said, the data may be updated continuously and one would need to re-calculate the aggregates. A small comment on the stratification for aggregates: here stratification is not an ideal word, since we may need a total order on the predicates. That is, predicates defined by means of aggregates cannot mutually depend on one another (regardless of whether the dependencies are positive or not). My preferred approach would be to start from an abstract syntax for the rules and then introduce a concrete RDF-like syntax. @andy, do you think I can work on some parts of the document: #637? I would really prefer latex here, but I guess thats not possible :) Best Ognjen

…

On 13 Nov 2025, at 14:20, liviorobaldo ***@***.***> wrote: Hi Andy, When would sanctions be removed? Theoretically, never. Sanctions can be paid, but we can still keep them in the KB for record-keeping purposes. For example, a company may have received a sanction two years ago and subsequently paid it, but we would still want to retain a record of the event. We might decide to delete all data older than five years, though this would depend on the specific application. Would it not be a separate graph and having a rule set execution without merging the inference graph outcome into the base graph Alternatively, deletion can be handled by SPARQL Update. This is another option. However, I don’t quite see how this relates to the decision about whether or not to allow aggregates. two three things need to be clarified: Which aggregates? There are some core ones like count, but also various statistical ones. What about grouping (GROUP BY)? What is being aggregated? Is it a body pattern expression (with no inner aggregates)? Or something less expressive? Here is my proposal. Aggregates are allowed only within the function BIND. We might introduce a two-parameter function SELECT as an argument of BIND. The first parameter of SELECT is a single aggregate function, while the second parameter is a knowledge graph with variables. For instance: RULE{?x :has-total-profits ?sum.} WHERE{?x a :Company. BIND(SELECT(SUM(?amount),{?x :has-profit ?amount}) AS ?sum)} This rule calculates the total profits for each company. For example, if we have two companies, Livio and Andy, and the knowledge graph stores the following data: :Livio a :Company. :Andy a :Company. :Livio :has-profit 100. :Livio :has-profit 50. :Andy :has-profit 200. :Andy :has-profit 150. The rule above generates the triples: :Livio :has-total-profits 150. :Andy :has-total-profits 350. This rule is equivalent to the following SPARQL query: SELECT ?company (SUM(?profit) AS ?totalProfit) WHERE { ?company a :Company ; :has-profit ?profit . } GROUP BY ?company Note the use of GROUP BY, since the aggregation is performed per company. To compute the sum of all profits across all companies, we could leave the variable unbound, for example: RULE{?x :has-total-profits ?sum.} WHERE{?x a :Company. BIND(SELECT(SUM(?amount),{?any :has-profit ?amount}) AS ?sum)} The variable ?any is now unbound, so the sum is calculated over all possible values of ?any. This rule then produces the following triples: :Livio :has-total-profits 500. :Andy :has-total-profits 500. We can also implement HAVING. The following query: SELECT ?company (SUM(?profit) AS ?totalProfit) WHERE { ?company a :Company ; :has-profit ?profit . } GROUP BY ?company HAVING SUM(Amount)>200; is equivalent to: RULE{?x :has-total-profits ?sum.} WHERE{?x a :Company. BIND(SELECT(SUM(?amount),{?x :has-profit ?amount}) AS ?sum) FILTER(?sum>200)} Implementing aggregates in this way prevents nested queries (i.e., only one level of nesting is allowed, since we cannot nest a BIND inside a SELECT). Therefore, in response to your question 3, in my approach there are no inner aggregates — these would have to be implemented through two separate rules. To implement aggregates in this way, the grammar should be modified as follows: [24] Bind ::= 'BIND' '(' Expression 'AS' Var ')' becomes: [24] Bind ::= 'BIND' '(' (Expression | Aggregate) 'AS' Var ')' where: [*] Aggregate ::= 'SELECT' '(' COUNT|SUM|... '(' Var '),' TriplesTemplateBlock ')' In response to your question 1, I think we could retain all aggregate functions, though I do not know each of them in detail. If any particular function introduces additional complications, we can exclude it. Of course, we must apply stratification to aggregates as well. In cases where there are inference rules that generate additional profits, the rule above must be executed after those. However, there will be a dependency from the first rule to the second, and we must ensure there are no cycles in these dependencies — exactly as with negation-as-failure. From this perspective, I do not see any conceptual difference between negation-as-failure and aggregates. The above is just an initial idea. If we agree it is promising, we can investigate it further. As I mentioned, I believe rules should be allowed to infer aggregate values. Cheers Livio — Reply to this email directly, view it on GitHub <#637 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABZFSORPMP46XTQP72A3LSL34SARPAVCNFSM6AAAAACLD5OAJ6VHI2DSMVQWIX3LMV43URDJONRXK43TNFXW4Q3PNVWWK3TUHMYTIOJVHA4TMNQ>. You are receiving this because you commented.

0 replies

afs · 2025-11-17T16:02:49Z

afs
Nov 17, 2025
Collaborator Author

A small comment on the stratification for aggregates: here stratification is not an ideal word, since we may need a total order on the predicates. That is, predicates defined by means of aggregates cannot mutually depend on one another (regardless of whether the dependencies are positive or not).

Right - the rule containing an aggregate must not depend on a rule in the same stratum layer, only ones nears the base data.
The stratification criterion is extended to include the pattern of the aggregation. (We could only aggregate very simple patterns - it pushes the complex expression into another rule so no lost of expressivity.)

(I tend to thing of stratification as numbered layers with the base data being stratum 0.)

A layer becomes "closed" after evaluation in that the output up to that point won't change.

This is (I think) the same requirement as NAF; NAF is like aggregation and filter of COUNT = 0

Is there a proper name for this? A "strict stratification".

The evaluation algorithm will can't be blind fixed point but each strata evaluation is monotonic can be evaluated as pure positive datalog over the graph output of the strata below it.

An alternative to consider is the simpler (to define) "aggregate only over the base graph": just run the aggregates first.
But if we have NAF, not limited to semi-positive datatlog, then its no necessary to only have the simpler version.

Currently, I think some form of "GROUP BY" is possible, although not necessary because the grouping can be via another rule.

2 replies

ajnelson-nist Nov 20, 2025
Collaborator

On "strict stratification" other possible names - epochs? generations?
FWIW, "Stratum" does sound right to me, having seen "stratigraphy" in research articles (between digital forensics and not-digital archaeology).

afs Nov 20, 2025
Collaborator Author

On "strict stratification" other possible names - epochs? generations? FWIW, "Stratum" does sound right to me, having seen "stratigraphy" in research articles (between digital forensics and not-digital archaeology).

Tricky. "stratification" is the word used in datalog for the process.

osavkovic · 2025-11-17T16:41:23Z

osavkovic
Nov 17, 2025
Collaborator

I did some searching for terminology, mostly looking at aggregates in ASP (specifically in the DLV system), and there they used the term "aggregate-stratified.” People people familiar with ASP should be able to recognize it immediately. (https://www.ijcai.org/Proceedings/03/Papers/122.pdf) In a short ISWC paper few years back, we used "strict stratification” (though I’m not aware of anyone using it afterwards) for a stricter version of stratification where a connection between strata can be only positive or only negative. This made validation in recursive SHACL tractable. (https://ceur-ws.org/Vol-2180/paper-11.pdf) Best Ognjen

…

On 17 Nov 2025, at 17:03, Andy Seaborne ***@***.***> wrote: A small comment on the stratification for aggregates: here stratification is not an ideal word, since we may need a total order on the predicates. That is, predicates defined by means of aggregates cannot mutually depend on one another (regardless of whether the dependencies are positive or not). Right - the rule containing an aggregate must not depend on a rule in the same stratum layer, only ones nears the base data. The stratification criterion is extended to include the pattern of the aggregation. (We could only aggregate very simple patterns - it pushes the complex expression into another rule so no lost of expressivity.) (I tend to thing of stratification as numbered layers with the base data being stratum 0.) A layer becomes "closed" after evaluation in that the output up to that point won't change. This is (I think) the same requirement as NAF; NAF is like aggregation and filter of COUNT = 0 Is there a proper name for this? A "strict stratification". The evaluation algorithm will can't be blind fixed point but each strata evaluation is monotonic can be evaluated as pure positive datalog over the graph output of the strata below it. An alternative to consider is the simpler (to define) "aggregate only over the base graph": just run the aggregates first. But if we have NAF, not limited to semi-positive datatlog, then its no necessary to only have the simpler version. Currently, I think some form of "GROUP BY" is possible, although not necessary because the grouping can be via another rule. — Reply to this email directly, view it on GitHub <#637 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABZFSOQRXILJGHHG43MPMCT35HWT7AVCNFSM6AAAAACLD5OAJ6VHI2DSMVQWIX3LMV43URDJONRXK43TNFXW4Q3PNVWWK3TUHMYTIOJZGE2TANI>. You are receiving this because you commented.

0 replies

liviorobaldo · 2025-11-17T17:37:26Z

liviorobaldo
Nov 17, 2025
Collaborator

Dear both,

I'm not enough expert about stratification, so I won't comment about that.

I totally agree about:

We could only aggregate very simple patterns - it pushes the complex expression into another rule so no lost of expressivity.

That is essentially what I was proposing in my previous post: to allow a single aggregate per rule, and aggregates are permitted only through simple one-level if-then patterns. This avoids embedded rules and instead distributes complex computations across multiple rules when needed.

I wonder instead about the syntax of the if-then rules inferring aggregates. Is the syntax I sketched in my previous post in the right direction?

Anyway, we can talk about this during our bi-weekly meeting tomorrow.

Have a nice evening
Livio

0 replies

robert-david · 2025-11-20T14:38:26Z

robert-david
Nov 20, 2025
Collaborator

Regarding the infinite loops because of generated blank nodes (rule head with blank node), i was thinking if we can solve this by restricting it via stratification. If a triple with a blank node occurs in a head, it may only appear in rule bodies with a lower stratum.
Not sure if this was discussed already.

Best,
Robert

1 reply

afs Nov 20, 2025
Collaborator Author

For some cases yes.

It has to be stronger - a rule engine may execute the rule twice in the naive algorithm, once to generate the head triples and once to see there is no further change (e.g. recursion in the same level of stratification). There maybe other reason why we can't assume a single execution of a rule - that's implementation.

What might work for rule head with blank nodes (c.f. CONSTRUCT) is looking whether the solution sequence from the body has changed - i.e. pre-head template instantiation.

It would be shame to loose the possibility because of writing RDF like:

RULE { :s :p [ :value "42" : units :Kelvin ] } WHERE ...

That can be broken up but that is moving the burden onto the user.

osavkovic · 2025-11-20T16:01:46Z

osavkovic
Nov 20, 2025
Collaborator

I would agree with Andy here. Stratification is a well established word in logic programming and in (computational) logic in general. In my view, it’s just a matter of finding the right adjective to put in front of the word. Best Ognjen

…

On 20 Nov 2025, at 15:42, Andy Seaborne ***@***.***> wrote: On "strict stratification" other possible names - epochs? generations? FWIW, "Stratum" does sound right to me, having seen "stratigraphy" in research articles (between digital forensics and not-digital archaeology). Tricky. "stratification" is the word used in datalog for the process. — Reply to this email directly, view it on GitHub <#637 (reply in thread)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABZFSORJJKKPJIVUNGOKAJD35XHNTAVCNFSM6AAAAACLD5OAJ6VHI2DSMVQWIX3LMV43URDJONRXK43TNFXW4Q3PNVWWK3TUHMYTKMBSG42DMMQ>. You are receiving this because you commented.

0 replies

osavkovic · 2025-11-20T17:55:19Z

osavkovic
Nov 20, 2025
Collaborator

We did, in a way. Please note (as I mentioned in one of my previous emails) that "classical" stratification is not enough. We wanted to apply the same restrictions on the aggregates. So I guess it would be good to have a name for that. As you suggested, a simple way to solve the problem is, as you said, to forbid cyclic dependencies on predicates that occur in the head of some rules with generated blank nodes (*). Note that in the literature there exist more sophisticated conditions to ensure the so-called finiteness of the grounding. However, this is a rather complex problem (undecidable), and I believe condition (*) would be sufficient in practice. Note that this would not cover cases like the following, where still there is a final grounding: (from https://drops.dagstuhl.de/storage/00lipics/lipics-vol017-iclp2012/LIPIcs.ICLP.2012.323/LIPIcs.ICLP.2012.323.pdf) :a :p :a. RULE { _fX :p _gX } WHERE {?X :p ?X}. I used letters f and g here to make it easier to think of them as values produced for every matching X (I hope I got the semantics right). In that work, as a minimal restriction for finiteness, they consider “finite-domain programs,” which look at the predicate + variable position to define the dependency graph. I believe the same condition is implemented in DLV system (same authors) On the other hand, if I understood correctly, blank nodes construct behave at least as "badly" as functional symbols that contain all variables from the body and perhaps one extra to ensure freshness. For example: RULE { _blank :p a } WHERE {?x :p ?y. ?y :r ?z. }. is roughly equivalent to datalog: p(f(x,y,z,clocktime()), a) :- p(x,y), r(y,z) so “finite-domain programs” or “finite-domain rules” would fit our intentions. Alternative names could be: domain-safe rules, safe-domain rules, or simply safe rules so we put aggregates there as well. What do you think? Best Ognjen

…

On 20 Nov 2025, at 15:38, Robert David ***@***.***> wrote: Regarding the infinite loops because of generated blank nodes (rule head with blank node), i was thinking if we can solve this by restricting it via stratification. If a triple with a blank node occurs in a head, it may only appear in rule bodies with a lower stratum. Not sure if this was discussed already. Best, Robert — Reply to this email directly, view it on GitHub <#637 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABZFSOQAKHCSUT2VOYKBZV335XG7RAVCNFSM6AAAAACLD5OAJ6VHI2DSMVQWIX3LMV43URDJONRXK43TNFXW4Q3PNVWWK3TUHMYTKMBSG42DANA>. You are receiving this because you commented.

1 reply

afs Nov 24, 2025
Collaborator Author

a simple way to solve the problem is, as you said, to forbid cyclic dependencies on predicates that occur in the head of some rules with generated blank nodes (*). Note that in the literature there exist more sophisticated conditions to ensure the so-called finiteness of the grounding. However, this is a rather complex problem (undecidable), and I believe condition (*) would be sufficient in practice.

Agreed.

Future versions of SHACL rule may allow for more sophisticated evaluation. But as a first version, we need a solid base. As long as new features add, but do not remove, capability that can easily be added. Removing/withdrawing features mean that newer version are not compatible with older versions.

afs · 2025-11-24T14:13:25Z

afs
Nov 24, 2025
Collaborator Author

Blank nodes in the head may be distracting from the necessary essence, especially as overall we have set semantics of RDF graphs.

Evaluation (datalog-style, forwards; backwards may be less problematic) needs to know if a rule body outcome changes the solutions (by adding solutions - it can not retract solutions).

Normally, this is observed by whether the head produces new triples to add to the inference graph.

But it does not have to be that way - instead, keep the body outcomes and only produce the head triples, with fresh blank nodes, based on the solution sequence at the end of the layer (trying to use the terminology in evaluation draft - it's quite concrete focusing on the "how", not the design goals) does it once.

(there are other ways to achieve the separation of evaluation from producing the blank nodes written in the head)

2 replies

liviorobaldo Dec 7, 2025
Collaborator

Hi all,

I have been reviewing our earlier notes and discussions about rule evaluation, as well as the latest version of the document. Andy previously proposed the two operations shown above (I made some searches and I now agree that we should support both):

2.2 Evalaution
"match body, get variables, use head as a template.

Makes new information available to other rules.

Execute until no change.

Operations - infer(), query()

A rules system can provide one or both of these operations.

"infer" - creates all new information
"query" - given a simple pattern, can it be inferred by the rules+data.

In the current draft, the evaluation graph is defined as the graph obtained by adding all inferred triples to the base (initial) graph. The algorithm computes the evaluation graph from the base graph.

However, the two operations above - infer and query - are not explicitly defined/distinguished in the document.

I think we should extend Section 5. Rule Set Evaluation to define these two operations clearly. infer corresponds to computing the evaluation graph, so this is already covered. For query, I see two possible interpretations:

(1) it is a yes/no operation that returns "yes" if rule application produces an evaluation graph different from the base graph; or
(2) it returns the first evaluation graph produced by the rules. If we choose this second interpretation, we would need to define a new algorithm, specifically without merge, since we are looking only for one compatible solution.

More generally, I think these two operations correspond to two different inference paradigms:

infer reflects bottom-up, forward-chaining inference (i.e., the algorithm that is already currently defined in the document), while
query requires top-down, backward-chaining inference, as in Prolog, where the first solution (if any) found along a path in the search tree from the root to the leaves is added to the base graph.

What do you think?

Cheers,
Livio

afs Dec 8, 2025
Collaborator Author

are not explicitly defined/distinguished in the document.

I think they are worth describing in section 2 (but not defining).

infer and query are related -- infer() is the same as query(?, ?, ?) and query(s, p, o) is infer().matchTriples(s, p, o).

To define, "SHACL Rules", I think we should give the clearist exposition that defines the correct outcomes. Making the whole inference graph is simpler to define ; partial evaluation gets complicated with recursion.

It isn't a simple as forward/backward chaining. Those are about the implementation.

e.g: RDFS: one technique is to materialize ("forward chaining") the transitive sub-class and a sub-property hierarchies. At runtime for query it is then fixed inference steps for domain/range as well as rdf:type. Completely expanding a graph once with a deep schema, can greatly increase the size of the data the number of triples -- it is a time/space tradeoff.

For implementations, caching can also play a part. e.g. materialize a query pattern, possibly more general than the query request on the expectation that there will be cache hits in subsequent query operations.

Rules Document content #637

Uh oh!

Uh oh!

afs Nov 4, 2025 Collaborator

1: Introduction

2: Shape Rules

2.1 Structure of a Rule

2.2 Evalaution

2.3 Rule Dependencies

2.4 Rule Set Stratification

2.5 Negation-as-failure

2.6 Assignment

3. Shape Rules Abstract Syntax

3.1 Well-formedness Condition for a rule body

3.2 Dependency Relationship

3.3 Stratification

4. Concrete Syntax forms for Shapes Rules

4.1 RDF Rules Syntax

4.2 Compact Rules Syntax

4.2.1 Compact Syntax Abbreviations

4.3 SPARQL function restrictions

5. Shape Rules Evaluation

6. Workspace named tuples

7. Attaching Rules to Shapes

Appendix A: Shape Rules Grammar

Appendix B: Relationship to SHACL-AF

Appendix C: Relationship to node expressions

Replies: 13 comments · 8 replies

Uh oh!

Uh oh!

liviorobaldo Nov 6, 2025 Collaborator

Uh oh!

afs Nov 6, 2025 Collaborator Author

Uh oh!

Uh oh!

liviorobaldo Nov 6, 2025 Collaborator

Uh oh!

Uh oh!

osavkovic Nov 10, 2025 Collaborator

Uh oh!

liviorobaldo Nov 11, 2025 Collaborator

Uh oh!

afs Nov 11, 2025 Collaborator Author

Uh oh!

Uh oh!

liviorobaldo Nov 13, 2025 Collaborator

Uh oh!

osavkovic Nov 17, 2025 Collaborator

Uh oh!

afs Nov 17, 2025 Collaborator Author

Uh oh!

ajnelson-nist Nov 20, 2025 Collaborator

Uh oh!

afs Nov 20, 2025 Collaborator Author

Uh oh!

osavkovic Nov 17, 2025 Collaborator

Uh oh!

liviorobaldo Nov 17, 2025 Collaborator

Uh oh!

robert-david Nov 20, 2025 Collaborator

Uh oh!

afs Nov 20, 2025 Collaborator Author

Uh oh!

osavkovic Nov 20, 2025 Collaborator

Uh oh!

Uh oh!

osavkovic Nov 20, 2025 Collaborator

Uh oh!

Uh oh!

afs Nov 24, 2025 Collaborator Author

Uh oh!

Uh oh!

afs Nov 24, 2025 Collaborator Author

Uh oh!

liviorobaldo Dec 7, 2025 Collaborator

Uh oh!

afs
Nov 4, 2025
Collaborator

Replies: 13 comments 8 replies

liviorobaldo
Nov 6, 2025
Collaborator

afs Nov 6, 2025
Collaborator Author

liviorobaldo
Nov 6, 2025
Collaborator

osavkovic Nov 10, 2025
Collaborator

liviorobaldo
Nov 11, 2025
Collaborator

afs
Nov 11, 2025
Collaborator Author

liviorobaldo
Nov 13, 2025
Collaborator

osavkovic
Nov 17, 2025
Collaborator

afs
Nov 17, 2025
Collaborator Author

ajnelson-nist Nov 20, 2025
Collaborator

afs Nov 20, 2025
Collaborator Author

osavkovic
Nov 17, 2025
Collaborator

liviorobaldo
Nov 17, 2025
Collaborator

robert-david
Nov 20, 2025
Collaborator

afs Nov 20, 2025
Collaborator Author

osavkovic
Nov 20, 2025
Collaborator

osavkovic
Nov 20, 2025
Collaborator

afs Nov 24, 2025
Collaborator Author

afs
Nov 24, 2025
Collaborator Author

liviorobaldo Dec 7, 2025
Collaborator