Web Discovery Project DSL update #47900

@yshym

Context

The WDP content extraction DSL lets content extraction rules be specified in a JSON file and updated dynamically, without altering the code of WDP itself. The new DSL moves away from the legacy format and makes content extraction rules simpler and more intuitive to modify.

WDP dependency uplift PR - brave/brave-core#30133

Comparison between old and new DSL formats

Here is how an extraction rule for a piece of data (the search query and the widget title in this case) is specified in each format:

Old

"#search": {
  "q": {
    "item": "#rso",
    "type": "searchQuery",
    "etype": "data-async-context",
    "keyName": "q",
    "functionsApplied": [["splitF", "query:", 1]]
  },
  "widgetTitle": {
    "item": "div.EfDVh.viOShc div.ofy7ae, div.EfDVh.viOShc table.torspo_view__table span.tsp-ht",
    "type": "widgetTitle",
    "etype": "textContent",
    "keyName": "wt"
  }
}

New

"#search": {
  "first": {
    "q": {
      "select": "#rso",
      "attr": "data-async-context",
      "transform": [
        [
          "trySplit",
          "query:",
          1
        ],
        [
          "decodeURIComponent"
        ]
      ]
    },
    "wt": {
      "select": "div.EfDVh.viOShc div.ofy7ae, div.EfDVh.viOShc table.torspo_view__table span.tsp-ht",
      "attr": "textContent"
    }
  }
}
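
To make the `transform` list in the new format more concrete, here is a minimal Python sketch of how such a pipeline could be applied to an extracted attribute value. It is not the WDP implementation. It assumes that `trySplit` splits on the delimiter and returns the part at the given index, falling back to the unchanged input when that part does not exist, and that `decodeURIComponent` behaves like percent-decoding. The helper names (`try_split`, `apply_transforms`) and the sample attribute value are hypothetical.

```python
from urllib.parse import unquote


def try_split(text, delimiter, index):
    # Assumed semantics: return the requested part, or the input unchanged
    # if the split does not produce a part at that index.
    parts = text.split(delimiter)
    return parts[index] if 0 <= index < len(parts) else text


def decode_uri_component(text):
    # Approximation of JavaScript's decodeURIComponent via percent-decoding.
    # Unlike the JavaScript function, unquote does not raise on malformed input.
    return unquote(text)


# Hypothetical registry mapping DSL transform names to functions.
TRANSFORMS = {
    "trySplit": try_split,
    "decodeURIComponent": decode_uri_component,
}


def apply_transforms(value, steps):
    # Each step is [name, *args]; steps run in order on the previous result.
    for name, *args in steps:
        value = TRANSFORMS[name](value, *args)
    return value


# The "q" rule from the new-format example above.
rule = {
    "select": "#rso",
    "attr": "data-async-context",
    "transform": [["trySplit", "query:", 1], ["decodeURIComponent"]],
}

# Hypothetical value of the data-async-context attribute on the selected element.
raw_attr = "query:brave%20browser"
print(apply_transforms(raw_attr, rule["transform"]))
```

In this sketch, selecting the element and reading the attribute are left out; only the transform chain is modelled, since that is the part where the old `functionsApplied` and new `transform` fields differ most visibly.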

More information about the new DSL can be found in the specification.

Testing guidelines for QA

To test the new DSL, launch Brave Nightly version 1.80.123 or later and follow the testing guidelines specified in this comment. The only difference is the name of the patterns file, which is now patterns-v2.json.
