Description
Context
The WDP content extraction DSL lets you specify content extraction rules in a JSON file and update those rules dynamically, without altering the code of the WDP itself. The new DSL moves away from the legacy format and makes content extraction rules simpler and more intuitive to modify.
WDP dependency uplift PR - brave/brave-core#30133
Comparison between old and new DSL formats
The following compares how an extraction rule for a piece of data (here, the search query and the widget title) is specified in each format.
Old
"#search": {
  "q": {
    "item": "#rso",
    "type": "searchQuery",
    "etype": "data-async-context",
    "keyName": "q",
    "functionsApplied": [["splitF", "query:", 1]]
  },
  "widgetTitle": {
    "item": "div.EfDVh.viOShc div.ofy7ae, div.EfDVh.viOShc table.torspo_view__table span.tsp-ht",
    "type": "widgetTitle",
    "etype": "textContent",
    "keyName": "wt"
  }
}
New
"#search": {
  "first": {
    "q": {
      "select": "#rso",
      "attr": "data-async-context",
      "transform": [
        ["trySplit", "query:", 1],
        ["decodeURIComponent"]
      ]
    },
    "wt": {
      "select": "div.EfDVh.viOShc div.ofy7ae, div.EfDVh.viOShc table.torspo_view__table span.tsp-ht",
      "attr": "textContent"
    }
  }
},
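To illustrate how the "transform" list in the new format reads, here is a minimal TypeScript sketch that applies the two steps from the "q" rule in order. It is not the WDP implementation. The names `Step`, `transforms`, and `applyTransforms` are hypothetical, the sample attribute value is made up, and the behaviour of `trySplit` (fall back to the unmodified input when the requested part does not exist) is an assumption based on its name; the specification is the authority on the real semantics.

```typescript
// A transform step is a name followed by its arguments,
// e.g. ["trySplit", "query:", 1] or ["decodeURIComponent"].
type Step = [name: string, ...args: (string | number)[]];

// Hypothetical transform table; only the two steps used in the example rule.
const transforms: Record<string, (value: string, ...args: (string | number)[]) => string> = {
  // Assumed semantics: split on the separator and take the part at `index`;
  // if that part does not exist, return the input unchanged.
  trySplit: (value, separator, index) => {
    const parts = value.split(String(separator));
    return parts[Number(index)] ?? value;
  },
  // Delegates to the standard global function of the same name.
  decodeURIComponent: (value) => decodeURIComponent(value),
};

// Applies each step to the result of the previous one, left to right.
function applyTransforms(value: string, steps: Step[]): string {
  return steps.reduce((current, [name, ...args]) => {
    const fn = transforms[name];
    if (!fn) {
      throw new Error(`Unknown transform: ${name}`);
    }
    return fn(current, ...args);
  }, value);
}

// The transform chain from the "q" rule above.
const steps: Step[] = [["trySplit", "query:", 1], ["decodeURIComponent"]];

// Made-up value of the data-async-context attribute.
console.log(applyTransforms("query:hello%20world", steps)); // prints "hello world"
```

Under these assumptions, the chain first isolates the text after "query:" and then decodes any percent-encoded characters in it, which is the step the old "functionsApplied" entry in the example did not include.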
More information about the new DSL can be found in the specification
Testing guidelines for QA
To test the new DSL, launch Brave Nightly version 1.80.123 or later and follow the testing guidelines specified in this comment. The only difference is that the patterns file is now named patterns-v2.json.