Adding the complete architecture for search benchmarking #2740

Merged: 48 commits, Jul 16, 2025

Contributor

@Rahban1 Rahban1 commented Jun 11, 2025

This closes #2417. There is still a lot of work to do; I am putting it up early so everybody can see it and give suggestions. It is nowhere near complete. I will keep working on it and may end up changing a substantial amount, but right now it is just a proof of concept. I would love some feedback on it: what do you think could be improved, and what do you have in mind?
Thank you.

end

# Run the wrapper
result = read(`node -e $wrapper_js`, String)
Contributor

Since we already have a file, can't we just do node file.js?

Contributor Author

But we're replacing the placeholders with their values here:

wrapper_js = replace(wrapper_js, "__SEARCH_INDEX__" => JSON.json(search_index_data))
wrapper_js = replace(wrapper_js, "__QUERY__" => "\"" * query * "\"")
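One caveat with wrapping the query in literal quotes is that a query containing a double quote or a backslash would produce invalid JavaScript. A minimal sketch of the same placeholder substitution, written in JavaScript for illustration, serializes both values with `JSON.stringify` instead. The `template` string and the `fillTemplate` helper are hypothetical and not part of the PR:

```javascript
// Hypothetical template; the real wrapper script from the PR is not shown here.
const template = "const data = __SEARCH_INDEX__;\nconst query = __QUERY__;";

// Replace both placeholders in a single pass with JSON-serialized values.
// A replacer function is used so that "$" sequences inside a value are not
// interpreted as special replacement patterns by String.prototype.replace.
function fillTemplate(source, searchIndex, query) {
  const values = { __SEARCH_INDEX__: searchIndex, __QUERY__: query };
  return source.replace(/__SEARCH_INDEX__|__QUERY__/g, (name) =>
    JSON.stringify(values[name])
  );
}

const filled = fillTemplate(template, [{ title: "add!" }], 'say "hi"');
console.log(filled);
```

Because the values are serialized rather than concatenated, quotes inside the query are escaped and the generated script stays syntactically valid.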

Member

@mortenpi mortenpi left a comment

A couple of thoughts:

Member

This duplicates search.js, so it would be good to avoid the duplication by reusing search.js in the tests. It would likely require some changes to how we handle the search.js file.

let index = new MiniSearch({
  fields: ["title", "text"], // fields to index for full-text search
  storeFields: ["location", "title", "text", "category", "page"], // fields to return with results
  processTerm: (term) => {
    let word = stopWords.has(term) ? null : term;
    if (word) {
      // custom trimmer that doesn't strip @ and !, which are used in julia macro and function names
      word = word
        .replace(/^[^a-zA-Z0-9@!]+/, "")
        .replace(/[^a-zA-Z0-9@!]+$/, "");
      word = word.toLowerCase();
    }
    return word ?? null;
  },
  // add . as a separator, because otherwise "title": "Documenter.Anchors.add!", would not
  // find anything if searching for "add!", only for the entire qualification
  tokenize: (string) => string.split(/[\s\-\.]+/),
  // options which will be applied during the search
  searchOptions: {
    prefix: true,
    boost: { title: 100 },
    fuzzy: 2,
  },
});
index.addAll(data);
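To see what the custom `tokenize` and `processTerm` options do without loading MiniSearch, the two functions can be exercised on their own. This is a minimal sketch: the `stopWords` set is a small stand-in (the real list is not shown in this excerpt), and `indexTerms` is a hypothetical helper that simply chains the two steps.

```javascript
// Stand-in stop word list; the real one in search.js is assumed to be longer.
const stopWords = new Set(["a", "the", "of"]);

// Same tokenizer as in the config above: split on whitespace, "-" and ".".
const tokenize = (string) => string.split(/[\s\-\.]+/);

// Same term processing as above: drop stop words, trim leading and trailing
// characters other than letters, digits, "@" and "!", then lowercase.
const processTerm = (term) => {
  let word = stopWords.has(term) ? null : term;
  if (word) {
    word = word
      .replace(/^[^a-zA-Z0-9@!]+/, "")
      .replace(/[^a-zA-Z0-9@!]+$/, "");
    word = word.toLowerCase();
  }
  return word ?? null;
};

// Hypothetical helper: the terms that survive tokenizing and processing.
const indexTerms = (text) =>
  tokenize(text).map(processTerm).filter((term) => term);

console.log(indexTerms("Documenter.Anchors.add!").join(" ")); // documenter anchors add!
console.log(indexTerms("the @ref macro").join(" ")); // @ref macro
```

Splitting on `.` is what lets a search for `add!` match the qualified title, and keeping `@` and `!` means macro and mutating-function names are indexed as written.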

@mortenpi mortenpi added the Type: Tests and Format: HTML labels Jun 29, 2025
Member

@mortenpi mortenpi left a comment

I think I would be happy to merge this as the first iteration if we could get the duplication & version number sorted as well!

write(io, wrapper_js)
close(io)
cd(@__DIR__) do
result = read(`node $path`, String)
Member

Could we make this use the Node JLL? This means that the user wouldn't have to install Node just to run the benchmarks -- we can pull it in via the Julia package manager.

There's some example usage here: https://github.com/JuliaComputing/MultiDocumenter.jl/blob/e112da8b744f3393d037a7380e544797c2a41953/src/search/pagefind.jl#L31-L55

Comment on lines 257 to 262
- uses: actions/setup-node@v4
  with:
    node-version: '20.x'
- name: Install Node.js dependencies
  run: npm install
  working-directory: test/search
Member

If we switch to the Node JLL, we can remove these.

Suggested change
- uses: actions/setup-node@v4
  with:
    node-version: '20.x'
- name: Install Node.js dependencies
  run: npm install
  working-directory: test/search

@mortenpi mortenpi added the Skip Changelog label Jul 13, 2025
Contributor Author

Rahban1 commented Jul 15, 2025

How's this implementation, @mortenpi?

Member

@mortenpi mortenpi left a comment

@Rahban1 LGTM! I think we should work a bit on getting more test cases in, etc. But I like this infra, so let's get this merged!

@mortenpi mortenpi merged commit de5d587 into JuliaDocs:master Jul 16, 2025
33 of 35 checks passed
Labels
Format: HTML (Related to the default HTML output), Skip Changelog (Allows the CHANGELOG.md check to pass without edit to the file.), Type: Tests
Development

Successfully merging this pull request may close these issues.

Add search benchmarks
3 participants