Rules Document content #637
-
Hello! I don’t remember if we are expected to comment on the provisional table of contents above. If not, I’ll just delete this comment 🙂 Anyway, I have three comments, which I also discussed at the latest meeting.
============================
As I mentioned at the meeting, I think this should be specified in "2. Packaging SHACL". In my view, as I proposed in the other discussion group, there should be a "cluster" (or "bundle", or any other suitable name) that groups together some data, a set of shapes, and a set of rules. We should also decide whether shapes are applied before the rules or vice versa, with respect to the two operations infer() and query(). I understood that you and the others were considering this a viable idea, but it is definitely something to discuss with the SHACL Profiling task force.
How should we proceed? I can raise the point at the next WG meeting on Monday, or I could open an issue directly in the Packaging SHACL section (once I figure out how 😅), but I’m not sure if that’s the proper procedure. By the way, the SHACL 1.2 Editor Draft now displays entirely black in my browser, and the githack URLs give a 404, so I’m not sure what’s going on.
============================
Generating new information could also mean creating new blank nodes or literals, which could lead to infinite loops. We discussed at the meeting that we should explain at some point how to prevent these infinite loops (but not in Section 2!). It's not fully clear to me how to formally prevent them in the grammar, because infinite loops are triggered by rules whose antecedents are always satisfied (or satisfied, then not, then satisfied again, and so on, infinitely). Is there a way to constrain the grammar to avoid this? I can't think of one, but I'm happy to learn 😊 Alternatively, we could simply add a disclaimer noting the risk of infinite loops and that it's the user's responsibility to ensure their rule set does not generate them.
============================
For example, with a PhD student of mine, I am developing a system to check compliance with Ghana Petroleum Commission regulations (see this paper). One regulation requires companies operating in Ghana to employ at least 80% Ghanaians among the technical staff after 5 years. To infer whether a company complies, we must: (1) count the total employees; (2) count the Ghanaian employees; (3) check that (2) > 0.8 * (1). This obviously requires aggregate functions (COUNT). I've read more carefully how aggregates are used in SHACL 1.2 Node Expressions, but I don't think they can be directly used here. This isn't a validation problem: the data are valid; we must infer whether the data comply with the regulations or not. Also, these regulations can include exceptions, e.g., companies may be exempt under certain conditions, but evaluating these conditions may require several additional operations, potentially including more aggregates. Therefore, I don't think this is a validation issue; it looks like an inference one.
The problem is indeed more general: some inferences are required only when certain quotas or thresholds are met, or when the sum of some values exceeds a given limit, or in similar situations. Think, for example, of applications in finance. Nevertheless, I'm not expert enough in stratification to know whether the basic stratification method, used for negation-as-failure, can easily be extended to aggregates as well, e.g., evaluating aggregate rules only after all non-aggregate rules have been evaluated.
Cheers,
Livio
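To make steps (1)-(3) concrete, here is roughly how the check looks in plain SPARQL; the property names (:worksFor, :nationality) and the class :TechnicalStaff are placeholders, not the actual vocabulary from the paper:

SELECT ?company
WHERE
{
  ?e a :TechnicalStaff ;
     :worksFor ?company ;
     :nationality ?nat .
}
GROUP BY ?company
HAVING (SUM(IF(?nat = :Ghanaian, 1, 0)) > 0.8 * COUNT(?e))

This selects the compliant companies; the point is that COUNT/SUM are unavoidable, and the result would then need to feed further inferences (exemptions, sanctions), which a single query cannot express.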
-
Ok, let me try... tomorrow :-)
Indeed. I was already told that shapes are like rules, with the main difference being that they produce an error message rather than new triples. So, in principle, everything could be modelled as rules. As you say, it’s something worth exploring. However, even if it works technically, I’m not sure it’s a good idea conceptually. Perhaps we should instead maintain a clear conceptual distinction between validation and inference, using two separate constructs to better emphasize this difference.
Okay about infinite loops. I also thought about the parallel with NAF, but the difference is that an infinite loop can be triggered even by a single rule. However, I can now see that the rule would depend on itself, so what you propose would still work.
As I mentioned in my previous reply, we might need rules to infer new values when certain quotas or thresholds are met, or when the sum of some values exceeds a given limit. I haven’t conducted empirical analyses to determine how often this need arises, but intuitively, it seems like it could be fairly common. I actually worked on a small use case with my PhD student and already ran into this need. Maybe I was "unlucky" and found a rare case, but even if it is seldom needed, I don’t see why we should prevent aggregates in the bodies. Are they really that much harder to handle than negation-as-failure?
Perhaps the problem (my problem) is that I haven’t fully understood node expressions. Can they also be used for rules? From the current draft, I understood that they can only be used for shapes, i.e., for validating the data. How would that work, for example, in the use case I described earlier? Suppose we have two classes, say one for companies and one for compliant companies. Can we use node expressions to state that if at least 80% of a company’s employees are from Ghana, then the company belongs to the second class? With SHACL-SPARQL rules, we could at least specify this through an explicit SPARQL query. As I understand it, with the current grammar we cannot. That’s why I was proposing allowing aggregates in the bodies and then extending stratification to them (evaluating aggregate rules only after all non-aggregate rules have been evaluated). However, there may be some reason (which I don’t know) why stratification works for negation-as-failure but not for aggregates.
-
Hi Oggy,
No, we actually need to store the aggregates in order to reason about them. However, once companies are inferred to be compliant or non-compliant (which may trigger further inferences, such as sanctions or compensations), those aggregates could be discarded. And of course, if the number of a company’s employees changes, they must be recomputed.
As I mentioned, I can also easily imagine use cases in finance that require aggregates. For example, inferring the sum of total profits made by a company and checking whether they exceed a certain threshold (e.g., for tax purposes). It’s not clear to me whether aggregates introduce complications in the inference system that are more difficult to handle than those caused by stratified negation-as-failure. If not, then I think we should allow both.
Cheers,
Livio
-
When would sanctions be removed? Could the sanctions not be kept in a separate graph, with a rule set execution that does not merge the inference graph outcome into the base graph? Alternatively, deletion can be handled by SPARQL Update.
-
Hi Andy,
Theoretically, never. Sanctions can be paid, but we can still keep them in the KB for record-keeping purposes. For example, a company may have received a sanction two years ago and subsequently paid it, but we would still want to retain a record of the event. We might decide to, for instance, delete all data older than five years, though this would depend on the specific application.
This is another option. However, I don’t quite see how this relates to the decision about whether or not to allow aggregates.
Here is my proposal, in response to the three things you said need to be clarified: (1) which aggregates (there are some core ones like COUNT, but also various statistical ones); (2) what about grouping (GROUP BY); (3) what is being aggregated (a body pattern expression, with no inner aggregates, or something less expressive?).

Aggregates are allowed only within the function BIND. We might introduce a two-parameter function SELECT as an argument of BIND. The first parameter of SELECT is a single aggregate function, while the second parameter is a knowledge graph pattern with variables. For instance:

RULE{?x :has-total-profits ?sum.}
WHERE{?x a :Company. BIND(SELECT(SUM(?amount),{?x :has-profit ?amount}) AS ?sum)}

This rule calculates the total profits for each company. For example, if we have two companies, :Livio and :Andy, and the knowledge graph stores the following data:

:Livio a :Company.
:Andy a :Company.
:Livio :has-profit 100.
:Livio :has-profit 50.
:Andy :has-profit 200.
:Andy :has-profit 150.

the rule above generates the triples:

:Livio :has-total-profits 150.
:Andy :has-total-profits 350.

This rule is equivalent to the following SPARQL query:

SELECT ?company (SUM(?profit) AS ?totalProfit)
WHERE
{
  ?company a :Company ;
           :has-profit ?profit .
}
GROUP BY ?company

Note the use of GROUP BY, since the aggregation is performed per company. To compute the sum of all profits across all companies, we could leave the variable unbound, for example:

RULE{?x :has-total-profits ?sum.}
WHERE{?x a :Company. BIND(SELECT(SUM(?amount),{?any :has-profit ?amount}) AS ?sum)}

The variable ?any is now unbound, so the sum is calculated over all possible values of ?any. This rule then produces the following triples:

:Livio :has-total-profits 500.
:Andy :has-total-profits 500.

We can also implement HAVING. The following query:

SELECT ?company (SUM(?profit) AS ?totalProfit)
WHERE
{
  ?company a :Company ;
           :has-profit ?profit .
}
GROUP BY ?company
HAVING (SUM(?profit) > 200)

is equivalent to:

RULE{?x :has-total-profits ?sum.}
WHERE{?x a :Company. BIND(SELECT(SUM(?amount),{?x :has-profit ?amount}) AS ?sum) FILTER(?sum>200)}

Implementing aggregates in this way prevents nested queries (i.e., only one level of nesting is allowed, since we cannot nest a BIND inside a SELECT). Therefore, in response to your question 3, in my approach there are no inner aggregates; these would have to be implemented through two separate rules.

To implement aggregates in this way, the grammar should be modified as follows:

[24] Bind ::= 'BIND' '(' Expression 'AS' Var ')'

becomes:

[24] Bind ::= 'BIND' '(' (Expression | Aggregate) 'AS' Var ')'

where:

[*] Aggregate ::= 'SELECT' '(' ('COUNT' | 'SUM' | ...) '(' Var ')' ',' TriplesTemplateBlock ')'

In response to your question 1, I think we could retain all aggregate functions, though I do not know each of them in detail. If any particular function introduces additional complications, we can exclude it.

Of course, we must apply stratification to aggregates as well. In cases where there are inference rules that generate additional profits, the rule above must be executed after those. There will then be a dependency from the first rule to the second, and we must ensure there are no cycles in these dependencies, exactly as with negation-as-failure. From this perspective, I do not see any conceptual difference between negation-as-failure and aggregates.

The above is just an initial idea. If we agree it is promising, we can investigate it further. As I mentioned, I believe rules should be allowed to infer aggregate values.

Cheers,
Livio
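To connect this back to the Ghana use case from my first comment: with this proposal, the 80% check could be sketched with two aggregates and a FILTER. (This is only an illustration; :worksFor, :nationality, and the class names are hypothetical.)

RULE{?x a :CompliantCompany.}
WHERE{?x a :Company.
      BIND(SELECT(COUNT(?e),{?e :worksFor ?x}) AS ?total)
      BIND(SELECT(COUNT(?g),{?g :worksFor ?x. ?g :nationality :Ghanaian}) AS ?ghanaian)
      FILTER(?ghanaian > 0.8 * ?total)}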
-
Hi All,
@livio, I understand your case now. You want to use those aggregates later in the graph for validation. Ideally, I still think that SHACL itself should have node expressions with aggregates to address your issue, since as you said, the data may be updated continuously and one would need to re-calculate the aggregates.
A small comment on the stratification for aggregates: here stratification is not an ideal word, since we may need a total order on the predicates. That is, predicates defined by means of aggregates cannot mutually depend on one another (regardless of whether the dependencies are positive or not).
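For instance, using the SELECT-in-BIND notation from the previous comment (all predicates here are placeholders), a pair of rules like the following would be excluded, since :p and :q are each defined by an aggregate over the other, regardless of polarity:

RULE{?x :p ?n.} WHERE{?x a :Node. BIND(SELECT(COUNT(?y),{?x :q ?y}) AS ?n)}
RULE{?x :q ?m.} WHERE{?x a :Node. BIND(SELECT(COUNT(?y),{?x :p ?y}) AS ?m)}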
My preferred approach would be to start from an abstract syntax for the rules and then introduce a concrete RDF-like syntax.
@andy, do you think I can work on some parts of the document: #637?
I would really prefer LaTeX here, but I guess that's not possible :)
Best
Ognjen
-
Right - the rule containing an aggregate must not depend on a rule in the same stratum layer, only on ones nearer the base data. The stratification criterion is extended to include the pattern of the aggregation. (We could allow aggregation only over very simple patterns - it pushes the complex expression into another rule, so there is no loss of expressivity.)
(I tend to think of stratification as numbered layers, with the base data being stratum 0.) A layer becomes "closed" after evaluation, in that the output up to that point won't change. This is (I think) the same requirement as NAF; NAF is like aggregation plus a filter of COUNT = 0. Is there a proper name for this? A "strict stratification"?
The evaluation algorithm can't be a blind fixed point, but each stratum's evaluation is monotonic and can be evaluated as pure positive datalog over the graph output of the strata below it. An alternative to consider is the simpler (to define) "aggregate only over the base graph": just run the aggregates first. But if we have NAF, not limited to semi-positive datalog, then it's not necessary to have only the simpler version.
Currently, I think some form of GROUP BY is possible, although not necessary, because the grouping can be done via another rule.
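For instance (placeholder predicates, and using the SELECT-in-BIND notation sketched above), a NAF rule such as

RULE{?x a :Unemployed.} WHERE{?x a :Person. FILTER NOT EXISTS{?x :worksFor ?c}}

behaves like the aggregate-plus-filter rule

RULE{?x a :Unemployed.} WHERE{?x a :Person. BIND(SELECT(COUNT(?c),{?x :worksFor ?c}) AS ?n) FILTER(?n = 0)}

Both need the stratum defining :worksFor to be closed before they run.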
-
I did some searching for terminology, mostly looking at aggregates in ASP (specifically in the DLV system), and there they used the term "aggregate-stratified". People familiar with ASP should be able to recognize it immediately. (https://www.ijcai.org/Proceedings/03/Papers/122.pdf)
In a short ISWC paper a few years back, we used "strict stratification" (though I’m not aware of anyone using it afterwards) for a stricter version of stratification, where a connection between strata can be only positive or only negative. This made validation in recursive SHACL tractable. (https://ceur-ws.org/Vol-2180/paper-11.pdf)
Best
Ognjen
-
Dear both, I'm not expert enough in stratification, so I won't comment on that. I totally agree about aggregating only over very simple patterns and pushing complex expressions into other rules.
That is essentially what I was proposing in my previous post: allow a single aggregate per rule, with aggregates permitted only through simple one-level if-then patterns. This avoids embedded rules and instead distributes complex computations across multiple rules when needed.
I wonder instead about the syntax of the if-then rules inferring aggregates. Is the syntax I sketched in my previous post in the right direction? Anyway, we can talk about this during our bi-weekly meeting tomorrow.
Have a nice evening
-
Regarding the infinite loops caused by generated blank nodes (a rule head with a blank node), I was wondering whether we can solve this by restricting it via stratification: if a triple with a blank node occurs in a head, it may only appear in rule bodies of a lower stratum. Not sure if this was discussed already.
Best,
Robert
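A minimal example of the loop in question (placeholder predicate): under naive fixed-point evaluation, each round re-matches the triples produced in the previous round and mints fresh blank nodes, so evaluation never terminates:

RULE{?x :knows _:b.} WHERE{?x :knows ?y.}

Under the restriction above, :knows occurs both in a blank-node head and in the body of the same rule, so this rule set would be rejected.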
-
I would agree with Andy here. Stratification is a well established word in logic programming and in (computational) logic in general.
In my view, it’s just a matter of finding the right adjective to put in front of the word.
Best
Ognjen
… On 20 Nov 2025, at 15:42, Andy Seaborne ***@***.***> wrote:
On "strict stratification" other possible names - epochs? generations? FWIW, "Stratum" does sound right to me, having seen "stratigraphy" in research articles (between digital forensics and not-digital archaeology).
Tricky. "stratification" is the word used in datalog for the process.
-
We did, in a way. Please note (as I mentioned in one of my previous emails) that "classical" stratification is not enough: we want to apply the same restrictions to the aggregates. So I guess it would be good to have a name for that.
As you suggested, a simple way to solve the problem is to forbid cyclic dependencies on predicates that occur in the head of some rule with generated blank nodes (*).
Note that in the literature there exist more sophisticated conditions to ensure the so-called finiteness of the grounding. However, this is a rather complex problem (undecidable), and I believe condition (*) would be sufficient in practice.
Note that this would not cover cases like the following, where there is still a finite grounding:
(from https://drops.dagstuhl.de/storage/00lipics/lipics-vol017-iclp2012/LIPIcs.ICLP.2012.323/LIPIcs.ICLP.2012.323.pdf)
:a :p :a.
RULE { _:fX :p _:gX } WHERE { ?X :p ?X }.
I used letters f and g here to make it easier to think of them as values produced for every matching X (I hope I got the semantics right).
In that work, as a minimal restriction for finiteness, they consider "finite-domain programs", which look at the predicate + variable position to define the dependency graph. I believe the same condition is implemented in the DLV system (same authors).
On the other hand, if I understood correctly, the blank-node construct behaves at least as "badly" as function symbols that contain all variables from the body, and perhaps one extra to ensure freshness.
For example:
RULE { _:blank :p :a } WHERE { ?x :p ?y. ?y :r ?z. }.
is roughly equivalent to datalog:
p(f(x,y,z,clocktime()), a) :- p(x,y), r(y,z)
so “finite-domain programs” or “finite-domain rules” would fit our intentions.
Alternative names could be: domain-safe rules, safe-domain rules, or simply safe rules, so we can put aggregates there as well.
What do you think?
Best
Ognjen
-
Blank nodes in the head may be distracting from the necessary essence, especially as overall we have set semantics of RDF graphs.
Evaluation (datalog-style, forwards; backwards may be less problematic) needs to know if a rule body outcome changes the solutions (by adding solutions - it can not retract solutions). Normally, this is observed by whether the head produces new triples to add to the inference graph. But it does not have to be that way - instead, keep the body outcomes during evaluation and produce the head triples, with fresh blank nodes, from the solution sequence at the end of the layer, which does it once (trying to use the terminology in the evaluation draft - it's quite concrete, focusing on the "how", not the design goals).
(There are other ways to achieve the separation of evaluation from producing the blank nodes written in the head.)
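As a small illustration (placeholder predicates): for

RULE{?x :p _:b.} WHERE{?x :q ?v.}

over the base data { :a :q 1 . :a :q 2 }, the body solutions are (?x=:a, ?v=1) and (?x=:a, ?v=2) and they never change. Yet change detection based on head triples never stabilizes, because each re-evaluation mints fresh blank nodes. Keeping the solution sequence during the layer and producing the head triples once, at the end of the layer, yields exactly two triples and terminates.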
-
1: Introduction
Terminology from RDF Concepts, SPARQL Query
Namespaces - including shnex:, shrl:, sparql:, xsd:, rdf:
Test suite
2: Shape Rules
Informative section (that is, non-normative) explaining the main features
Basic outline, key features, not all features.
Audience: rules engineer
What are "rules"? A set of conditions on the shape of the data that infer new information from existing data.
Apply rules: rules+data -> new information
A rule set is a collection of rules. Unit of execution.
An execution is rule set + data (base data)
2.1 Structure of a Rule
Head-Body: match the body (the "if") and generate new information ("triples")
Or as: H is true when B
Body is a pattern, with value restriction.
2.2 Evaluation
Match body, get variables, use head as a template.
Makes new information available to other rules.
Execute until no change.
Operations - infer(), query()
A rules system can provide one or both of these operations.
2.3 Rule Dependencies
Rules can depend on rules.
Including recursion.
e.g. :ancestor in terms of :ancestorof and :ancestor, as in the sketch below
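A possible shape of such a recursive definition, in the rule style used elsewhere in this thread (predicate names as in the outline item above):

RULE{?x :ancestor ?y.} WHERE{?x :ancestorof ?y.}
RULE{?x :ancestor ?z.} WHERE{?x :ancestorof ?y. ?y :ancestor ?z.}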
"We say that rule1 depends on rule2 if ..."
2.5 Negation-as-failure
Shape Rules also supports negation-as-failure (NOT EXISTS)
2.6 Assignment
BIND - and why it is beyond datalog
"safe assignment" - previous stratification - is it enough?
3. Shape Rules Abstract Syntax
Normative.
This section is the formal definition of rules and rule set execution.
3.1 Well-formedness Condition for a rule body
Conditions on FILTERs and assignments so that variables are defined before use.
3.2 Dependency Relationship
Definition: rule A depends on rule B if ...
3.3 Stratification
Definition
... and well-formedness conditions (NAF and recursion)
NB EXISTS - with body pattern (a limited graph pattern)
4. Concrete Syntax forms for Shapes Rules
4.1 RDF Rules Syntax
4.2 Compact Rules Syntax
4.2.1 Compact Syntax Abbreviations
4.3 SPARQL function restrictions
No BOUND
No FILTER (NOT) EXISTS (available as a pattern instead)
Explain that the pattern of EXISTS/NOT EXISTS is a body pattern
5. Shape Rules Evaluation
Define evaluation
Stratification.
Necessary if NAF.
6. Workspace named tuples
Possible addition. Tuples as space during execution. Avoids repeating pattern fragments.
Syntax:
TUPLE(name/string, varTerm1, varTerm2, ...) -- include this and maybe a short form: &name(varTerm1, varTerm2, ...)
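A possible usage sketch, on one reading of the syntax above (all names are placeholders): one rule fills the named tuple, another consumes it, so the pattern fragment is not repeated:

RULE{TUPLE("staff", ?x, ?c)} WHERE{?x a :TechnicalStaff. ?x :worksFor ?c.}
RULE{?c :hasTechnicalStaff ?x.} WHERE{&staff(?x, ?c)}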
7. Attaching Rules to Shapes
Do (some) targets as patterns work for a definition?
Appendix A: Shape Rules Grammar
Appendix B: Relationship to SHACL-AF
Appendix C: Relationship to node expressions