Keywords search dropdown for Advanced Search #231
Hi! I'm not a Librivox dev, so feel free to ignore my comments, but I do contribute occasionally, so I thought I'd throw in my suggestions 🙂
I like the look of this auto-complete feature a lot, but I have a few small technical questions.
@@ -122,7 +122,7 @@ function advanced_title_search($params)
 $keyword_clause = '';
 if (!empty($params['keywords']))
 {
- $keywords = explode(' ', $params['keywords']); //maybe preg_match if extra spaces cause trouble - thinnk we're ok
+ $keywords = array($params['keywords']);
Previously, users could type in something like "duck sheep" and get items that match either "duck" or "sheep". I'm not sure why it was that way exactly, but I suspect it's because it was hard for users to know which tags were valid, so they probably went "duck ducky duckling ente pato ..." in the hopes of striking lucky.
Now, we'd only be matching items that had the literal tag "duck sheep", but I think this is a good change -- the auto-complete should make it way easier to discover tags. (Might need to label the field "Keyword" rather than "Keywords" though.)
It might be worth running past the admin team, if you haven't already :)
(Comma-separated tags or something would be awesome, but I don't think that the jQuery autocomplete supports it out of the box, so you'd have to do something like this if you wanted that.)
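For what it's worth, the usual trick for comma-separated values is to autocomplete only the last term in the box and splice the chosen suggestion back into the list. Here is a minimal sketch of just that string handling in plain JavaScript. The helper names (`splitTerms`, `lastTerm`, `applySelection`) are made up for illustration; they are not part of jQuery UI or the LibriVox code, and wiring them into the autocomplete callbacks is left out.

```javascript
// Hypothetical helpers; none of these names exist in jQuery UI or in the LibriVox code base.

// Split on commas (plus any whitespace that follows them).
function splitTerms(value) {
  return value.split(/,\s*/);
}

// The term currently being typed, i.e. the text after the last comma.
function lastTerm(value) {
  return splitTerms(value).pop().trim();
}

// Replace the term being typed with the chosen suggestion and leave a trailing ", ".
function applySelection(value, chosen) {
  const terms = splitTerms(value);
  terms.pop();         // drop the partial term
  terms.push(chosen);  // add the selected keyword
  terms.push('');      // empty placeholder so join() ends with ", "
  return terms.join(', ');
}

console.log(lastTerm('french revolution, win'));
console.log(applySelection('french revolution, win', 'winnie-the-pooh'));
```

The idea would be to send `lastTerm(...)` to the backend as the search term and to write `applySelection(...)` back into the input when a suggestion is picked.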
I haven't run this change past the admin team. There didn't seem any point until I could convince myself that I could get it working. What HAS been run past the admin team, though, is another proposal I've currently got under consideration (it's a formal pull request now) that would see the keywords associated with every project displayed on that project's Catalog page. The keywords would be hyperlinked so that clicking a keywords entry immediately displays all the other projects sharing the same keywords term. As part of that change (when it was still a paper proposal) I suggested that the "keywords" terminology was inappropriate, and should be changed to "keyterm". I argued that calling an entry like "keywords" was confusing, and that "keyterm" was a more sensible description. That would leave the word "keyterms" (plural) to signify, say, "political fiction" AND "French revolution". In my proposal, I was going to alter "keywords" on the New Project template page to "keyterms". The administrators argued, I believe, that (a) matters are fine as they stand and (b) the New Project template is built around multilanguage labels, and if we were to change one label like "keywords" on that page, it would be necessary to pay someone to translate this into all the other languages we use for that page.
As for comma-separated tags, MAYBE that could be got to work with enough effort — but I'd prefer to take this in smaller steps. If this dropdown is accepted AND my pull request to display hyperlinked keywords for each project becomes a reality, I think it will become very much more feasible than it has been so far to conduct meaningful searches using keywords. I'm not suggesting this proposal is perfect — only that it's a lot better than what we have at the moment.
> If this dropdown is accepted AND my pull request to display hyperlinked keywords for each project becomes a reality, I think it will become very much more feasible than it has been so far to conduct meaningful searches using keywords.
Very much so, I look forward to seeing it happen!
> I'm not suggesting this proposal is perfect, only that it's a lot better than what we have at the moment.
Noted! If we ever get around to an auto-completing, multi-keyword-entering box, we can use that in the Project Template Generator and here in Advanced Search. Meanwhile, the visibility of what keywords we might search, straight from within the search form, is a big step forward.
> Now, we'd only be matching items that had the literal tag "duck sheep", but I think this is a good change -- the auto-complete should make it way easier to discover tags. (Might need to label the field "Keyword" rather than "Keywords" though.)
This part also seems like at least a slight improvement overall, but comes with a trade-off. If there's a way to alleviate that down-side, great! Otherwise, we may take two steps forward, one step back, and then be happy about it.
The problem that this particular change would solve:
Searching by keyword for "American Revolution" currently shows keywords like "American Humor" and "French Revolution".
Yeah, that's bad. 😅
But, on the other hand:
Searching by keyword for "Winnie the Pooh" currently shows results tagged with "winnie", "winnie-the-pooh", "winnie the pooh", and simply "pooh".
This is good. Dear Pooh may be a silly old bear, but his nose will always lead him somewhere.
(Oh, and this same search will also show the one project with keyword "Thé". Excuse me while I make that not a thing, on live...)
A few projects will certainly have their keywords updated, as those become both more visible and more useful, but overhauling the keywords on our thousands of projects is actually even more work than most power-users would imagine, and is not in the scope of this PR.
If quoting a multi-word "keyword" to keep it together is easier for either of you "hotshots" than it is for me (I've not written so much new code as either of you!), or if you can think of another way to keep the best of both worlds, then I'm all ears. Otherwise, I'll certainly be here for the "running by" when we're sure we know our limitations. 😉
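One way to keep both behaviours would be to treat a double-quoted phrase as a single keyword and split everything else on spaces. A rough sketch of that parsing in JavaScript follows. The function name `parseKeywords` is hypothetical, and the real change would have to live in the PHP search code rather than here.

```javascript
// Hypothetical parser: "quoted phrases" stay together, bare words are split on whitespace.
function parseKeywords(input) {
  const terms = [];
  const pattern = /"([^"]+)"|(\S+)/g;
  let match;
  while ((match = pattern.exec(input)) !== null) {
    // match[1] is a quoted phrase, match[2] is a bare word
    const term = (match[1] !== undefined ? match[1] : match[2]).trim();
    if (term !== '') terms.push(term);
  }
  return terms;
}

console.log(parseKeywords('"american revolution" winnie pooh'));
console.log(parseKeywords('duck sheep'));
```

With this, `"american revolution"` would be searched as one literal keyword, while `winnie pooh` would still fan out into two separate terms as it does today. An unbalanced quote simply falls through to the bare-word branch.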
$query = $this->db->query($sql);
return $query->result_array();
}
I think it would be a good idea to use the escaping methods provided by CodeIgniter for this: https://www.codeigniter.com/userguide3/database/queries.html#escaping-queries
It would look something like this:
public function autocomplete($term)
{
    // Escaping -- https://www.codeigniter.com/userguide3/database/queries.html#escaping-queries
    $escaped_term = $this->db->escape_like_str($term);
    $sql = 'SELECT DISTINCT k.value
        FROM keywords k
        JOIN project_keywords pk ON k.id = pk.keyword_id
        WHERE k.value like "' . $escaped_term . '%" ESCAPE \'!\'
        AND pk.project_id IS NOT NULL
        ORDER BY k.value ASC';
    $query = $this->db->query($sql);
    return $query->result_array();
}
This helps prevent nasty SQL injection attacks.
Keywords are currently being escaped already in Librivox_search.php, thus:
$keyword_clause = '';
if (!empty($params['keywords']))
{
$keywords = array($params['keywords']);
$keywords = array_map('trim', $keywords); // clean it up
$escaped_keywords = [];
foreach ($keywords as $keyword)
$escaped_keywords[] = $this->db->escape($keyword);
$in_keywords = implode(", ", $escaped_keywords);
Are you suggesting we need further protection beyond this?
Doh! I get it now. Did not include a fix for this in my latest update. Will take another look at this tomorrow.
Yup, once the keywords are put into the search, it's fine, but we're putting `$term` directly into the auto-complete SQL itself, which could do bad things too 🙂
Here's the query again:
$sql = 'SELECT DISTINCT k.value
FROM keywords k
JOIN project_keywords pk
ON k.id = pk.keyword_id
WHERE k.value like "' . $term . '%"
ORDER BY k.value ASC';
$query = $this->db->query($sql);
return $query->result_array();
Having the `ORDER BY` bit on a newline stumps my crappy hacking skills, but imagine someone unknowingly removes the newline, changing it to this:
$sql = 'SELECT DISTINCT k.value
FROM keywords k
JOIN project_keywords pk
ON k.id = pk.keyword_id
WHERE k.value like "' . $term . '%" ORDER BY k.value ASC';
$query = $this->db->query($sql);
return $query->result_array();
Now, you could put this in the keyword field and have it dump out all the email addresses in the Librivox database:
" UNION SELECT email AS value FROM users --
The full query becomes something like this:
SELECT DISTINCT k.value
FROM keywords k
JOIN project_keywords pk
ON k.id = pk.keyword_id
WHERE k.value like "" UNION SELECT email AS value FROM users -- %" ORDER BY k.value ASC
So yeah, I don't know how to make a hack like that work with the newline there, but I bet there's someone who can!
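To make the difference concrete, here is a small JavaScript sketch that builds the query text with and without escaping. It only manipulates strings and never touches a database. `buildSql` and `escapeLikeTerm` are hypothetical names, and the escaping is a simplified stand-in for what CodeIgniter's `escape_like_str()` does, not a replacement for it.

```javascript
// Illustration only: builds SQL text, never runs it.

// Pastes the term straight into the query, as in the single-line variant above.
function buildSql(term) {
  return 'SELECT DISTINCT k.value FROM keywords k ' +
         'JOIN project_keywords pk ON k.id = pk.keyword_id ' +
         'WHERE k.value like "' + term + '%" ORDER BY k.value ASC';
}

// Simplified stand-in for LIKE escaping with "!" as the escape character:
// quotes and backslashes get a leading backslash; !, % and _ get a leading "!".
// (A real query using this would also need the ESCAPE '!' clause.)
function escapeLikeTerm(term) {
  return term
    .replace(/[\\"']/g, '\\$&')
    .replace(/[!%_]/g, '!$&');
}

const attack = '" UNION SELECT email AS value FROM users -- ';

console.log(buildSql(attack));                 // the quote closes the string early
console.log(buildSql(escapeLikeTerm(attack))); // the quote stays inside the string
```

In the first output the injected `UNION` sits outside the string literal; in the second the quote is escaped, so the whole payload is just an odd-looking search term. Query bindings (placeholders) sidestep the quoting problem altogether, though `%` and `_` still need LIKE escaping.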
Oh, I was too slow to comment haha, glad you got what I meant 🙂
I've never looked at how the auto-complete works before, but man am I glad you figured it out first hahaha
Seriously though, I think it would be good to write a little docstring to help other contributors in the future. Maybe something like this:
/**
* Set up auto-complete boxes using jQuery Autocomplete.
*
* Input elements need to define the following attributes:
*
* - data-search_func - The backend ajax function to call
* - data-search_field - The name of the field
* - data-search_area - Which area on the page is being auto-completed
* - data-array_index - Used when you've got multiple inputs in one area (e.g. multiple authors)
*
* In the pages where this script is included, you'll also need to define two
* functions:
*
* - autocomplete_assign_vars - Maps backend results to values to show in the drop-down
* - autocomplete_assign_elements - Handles updating inputs when a value is selected
*
* See also: https://api.jqueryui.com/autocomplete/
*/
function set_autocomplete() {
...
}
Which also leads me into the next question -- why not follow how the others are doing it, and define the two javascript functions in `advanced_search.php` rather than modify this file? I know it's ugly, but better to have consistent ugly things rather than special cases. For example, you could drop something like this in:
<!-- I've mostly stolen this from section_compiler/index.js -->
<script type="text/javascript">
function autocomplete_assign_vars(item) {
return item.value;
}
function autocomplete_assign_elements(search_area, ui, array_index) {
switch (search_area) {
case 'keywords':
document.getElementById("keywords").value = ui.item.label;
break;
}
}
</script>
<script type="text/javascript" src="https://librivox.org/js/libs/jquery-1.8.2.js?v=1710057521"></script>
<script type="text/javascript" src="https://librivox.org/js/libs/jquery.validate.js?v=1710057521"></script>
<script type="text/javascript" src="https://librivox.org/js/libs/jquery-ui-1.8.24.custom.min.js?v=1710057521"></script>
<script type="text/javascript" src="https://librivox.org/js/common/autocomplete.js?v=1710057521"></script>
<!-- Remove the `/project_launch/index.js` import, it defines its own autocomplete methods that clobber ours -->
<!-- <script type="text/javascript" src="https://librivox.org/js/public/project_launch/index.js?v=1710057521"></script> -->
<script type="text/javascript" src="https://librivox.org/js/common/jquery.tagsinput.min.js?v=1710057521"></script>
Thank you so much. These all strike me as excellent suggestions. I've implemented them on my development system and they work fine.
@@ -47,7 +49,7 @@
 <div class="control-group">
 <div class="controls center">
 <label for="keywords" ><span class="span2">Keywords:</span>
- <input type="text" class="span4" id="keywords" name="keywords" value="<?= htmlspecialchars($advanced_search_form['keywords']) ?>"/>
+ <input type="text" name="keywords" value="" id="keywords" class="autocomplete" data-search_func="autocomplete_keywords" data-search_field="keywords" data-search_area="keywords" data-array_index="0" style="float:none; vertical-align:middle;margin-top:0px; width: 356px; max-width:500px; margin-left:0; " />
Q: What are the styles for? E.g. why have a specific width/max width?
Seemed to be an essential change when I first worked through this, as changing the class of the field (required to make autocomplete work) appeared to be changing the layout. However, as I've now implemented your approach on my local environment, it seems I thankfully don't need this local styling anymore.
Nice, awesome :) Unlike IDs, an HTML element can have more than one class, so if `span4` is important for the layout, then the input can have both classes:
<input class="span4 autocomplete" ... />
But if you're happy with how it is without the `span4`, then I wouldn't change it!
<script type="text/javascript" src="https://librivox.org/js/libs/jquery-ui-1.8.24.custom.min.js?v=1710057521"></script>
<script type="text/javascript" src="https://librivox.org/js/common/autocomplete.js?v=1710057521"></script>
<script type="text/javascript" src="https://librivox.org/js/public/project_launch/index.js?v=1710057521"></script>
<script type="text/javascript" src="https://librivox.org/js/common/jquery.tagsinput.min.js?v=1710057521"></script>
Q: Are all of these needed? E.g. https://librivox.org/js/public/project_launch/index.js?v=1710057521 seems suspicious?
I did check the effect of removing each one of these individually to see if they were all needed, and had come to the conclusion that they were. In some cases the only side-effect of not including one of these scripts was to cause an exception to be raised and noted in the Javascript console, while the new dropdown function continued to work fine. However, I decided it would be better not to be implementing code that was going to be causing these exceptions to be thrown.
That said, it turns out there was either something wrong with my checking process, or the new way of implementing this that you have suggested does now completely obviate the need for the script you have highlighted. Without it, the dropdown is working fine on my local machine, and not causing exceptions to be thrown.
Ah right, makes sense! And yeah, if the `autocomplete_assign_vars` or `autocomplete_assign_elements` functions weren't set, then you'd get errors as soon as the autocomplete code ran. Bringing in those scripts would set them, which would stop the errors 🙂
application/models/Keyword_model.php
Outdated
FROM keywords k
JOIN project_keywords pk
ON k.id = pk.keyword_id
WHERE k.value like "' . $term . '%" AND pk.project_id IS NOT NULL
Q: Is the `AND pk.project_id IS NOT NULL` significant here? This isn't a `LEFT JOIN`, so unmatched rows should be discarded I think?
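A quick way to convince yourself is to model the two tables as arrays and compare an inner join with a left join. The sample data and function names below are invented for the example and say nothing about the real tables' contents.

```javascript
// Invented sample data: keyword 3 is not attached to any project.
const keywords = [
  { id: 1, value: 'duck' },
  { id: 2, value: 'sheep' },
  { id: 3, value: 'orphan' },
];
const projectKeywords = [
  { keyword_id: 1, project_id: 10 },
  { keyword_id: 2, project_id: 11 },
];

// INNER JOIN: a keyword row only appears if a matching link row exists.
function innerJoin(ks, pks) {
  return ks.flatMap(k =>
    pks.filter(pk => pk.keyword_id === k.id)
       .map(pk => ({ value: k.value, project_id: pk.project_id })));
}

// LEFT JOIN: unmatched keywords are kept, with project_id set to null.
function leftJoin(ks, pks) {
  return ks.flatMap(k => {
    const links = pks.filter(pk => pk.keyword_id === k.id);
    return links.length > 0
      ? links.map(pk => ({ value: k.value, project_id: pk.project_id }))
      : [{ value: k.value, project_id: null }];
  });
}

console.log(innerJoin(keywords, projectKeywords).length); // 2
console.log(leftJoin(keywords, projectKeywords).length);  // 3
```

Only the left join ever produces a null `project_id` from a missing match. With a plain `JOIN`, the extra condition would matter only if link rows can themselves store a NULL `project_id`.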
Agreed. As you can see, I'm no hotshot developer! (Much more a Librivox reader than a Librivox techie...)
It may take me some little time as I have a few social commitments, but I expect I'll be updating my pull request to incorporate most of the changes you've suggested within the next 24 hours.
> I'm no hotshot developer!
Nah, you've done well! The auto-complete stuff is tricky to put together and you got there. The main thing is that you've made it happen, and I'm looking forward to using it.
> Much more a Librivox reader
Well, you're a long way ahead of my score of 2 sections that's for sure 😉
91a3a41 to 4f0a33e
Have made a number of changes now in the light of garethsime's very helpful suggestions.
4f0a33e to 373a18f
Have just added keywords escaping, as suggested by Gareth. Thank you again, Gareth. Bit of a dimwit you're dealing with here!
Also, I know this is completely off-topic for the PR, but I'm curious to know how your local dev setup works? There was a fair amount of discussion a while back on how best to support other devs wanting to make contributions, so I'm always interested in hearing how people went with the setup and what they ended up doing, if you can spare the time 🙂
You're as much a Librivox dev as any of us - as I've mentioned before in various places, the original author has long since moved on, so we're all trying our best to figure out the code base and make incremental improvements, hopefully without breaking anything. Your suggestions and comments are definitely valid, and I welcome your reviews of any PR.
373a18f to a759c3b
Does strike me as possibly overkill, but I've now parameterised the query that looks up keywords in use, following a comment from notartom (which comment has now been deleted, I think?)
Have looked at my old notes on how I did this the first time on a Mac running Ubuntu under Parallels Desktop. Was able to pare down the necessary instructions to a fairly simple procedure, which I was able to verify worked correctly on a second Mac of mine running a similar setup this afternoon. I have posted those instructions in the place where the original discussion took place.
Thé IS a thing in French (tea). Do a quick check before you delete that.
Thanks, that's good to know! But here, it was short for 'Thénardier', having been truncated by accident.
Very good! Yes, sadly, exposing keywords is also going to expose keyblunders; there’s no denying that. I personally believe, though, that the upside outweighs the downside.
Agreed there! When keywords are more useful (as they will be, thanks to your changes here and elsewhere!), the obviously incorrect ones will be corrected over time. But since we can't search the catalog by description, and since everything else has a very finite set of "correct" values, the keywords will remain the only "free-form" discoverability for a project. Since #225 will cover the "exact match" use-case, I'd prefer we still at least have the option of a fuzzy keyword search. I suppose the direction I'm going is: could we do something like the "Exact match" check-box for keywords in Advanced Search? Would that be easier than a multi-keyword-autocomplete box?
I think an "Exact match" checkbox would be a backward step. At the moment, my code is matching all terms that begin with what the user has entered. In practice, this means that if the user enters, say, "french revolution", that term is going to show up as the first hit, but the user will see other hits as well. In effect, my current code is delivering an exact match (if there is one) but also partially fuzzy matches. Let me run past you an approach that you might find an improvement on how I've done this at the moment. Imagine a scenario where the user has entered the search term "roman" and this causes the following SQL to be executed:
I know the "limit" value here is not what we actually use, but just for the purposes of illustration right now we need a high value for "limit" to see the effect I'm after. (Of course we would also need to strip out the "priority" values from what the SQL is returning; I'm not suggesting we're going to display those to our users.) The point is, we can separately display what are, in effect, two lists in one return set, one of which lists is "fuzzier" than the other, while in effect documenting what we're showing the user within the result set itself. Why do this, and not just present the user with a "fully fuzzy" result set? Well, try it using the same example:
It seems to me the result set is much less helpful. As for the
Well, try running the same SQL but with that condition stripped out. Turns out there really are some pretty messy entries in our keywords table that need to be rationalised. Down the track, it might be possible to write a utility page to help with that task, but for now, it might be best to hide some easily excludable bad examples from our users.
Just to be sure we're on the same page: the part I suggest making optional is this change in Librivox_search.php.
When the user intends to search for projects tagged with a particular multi-word keyword, which they've selected from the autocomplete suggestion list... then doing a fuzzy search for projects with other keywords would indeed be a bug! But if a user intends to do a broader search across multiple relevant keywords, that fuzzy matching is the best we have at the moment.
Ok! Suggestion made. It is definitely more work, even if a check-box is the simplest thing I know to ask for. Let me know what you think.
Thanks for this clarification.
Can I ask if you’ve actually had a chance to play around in a system that has implemented this PR to see how quickly and easily it helps you find stuff you might be interested in? While obviously I don’t know this for sure, I can’t help wondering if your request here is based on an old way of doing things that there may not be a lot of point in clinging to if this PR and my other PR about exposing keywords on project catalog pages do make it into production. Would it REALLY be that hard to find what you’re looking for using this new approach? (My bet is that you would actually find this new way ever so much faster and easier — but then, perhaps you have played around with this for some time, as I’ve implemented it, and you really don’t find this new approach all that great?)
To be clear, you appear to be suggesting an approach that would see the autocomplete behaviour I’ve enabled “turned off” at the selection of a checkbox. You’ve not suggested what succinct form of words you’d propose to use as a label for this checkbox that is going to make clear to users what to enter in order to get what results.
Personally, I see very little value in going to all the trouble to implement the following, given how easy keywords searching is going to be without it anyway, but I can picture a “solution” that might satisfy what I personally think is an over-the-top requirement: instead of a checkbox, we would have a dropdown that included the following two options for keywords searching:
“Click a term matching your entry”
“Type comma-separated list”
The first of these would be the default, and would result in the autocomplete dropdown behaviour I have implemented. If someone chose from the dropdown “political fiction”, they would see in their search results only projects that had “political fiction” as a keywords entry. They would not see all projects which have “fiction” as a keywords entry.
The second would allow a user to type, say,
‘french, france, paris, french revolution, white hoods of Paris, winnie-the-pooh, pooh bear’ in the same (now non-autocompleting) field
and see a search result set that included all projects that included any one of those literal comma-separated terms. It would NOT, however, include a project with the keywords ‘revolution’ or ‘bear’.
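The matching rule for that second option can be sketched in a few lines of JavaScript. The project keyword lists and the name `matchesAnyTerm` are made up for illustration, the case-insensitive comparison is an assumption, and the real filter would be done in SQL.

```javascript
// Hypothetical: does a project carry at least one of the comma-separated terms,
// compared literally (assumed case-insensitive) rather than word by word?
function matchesAnyTerm(projectKeywords, userInput) {
  const wanted = new Set(
    userInput.split(',')
      .map(t => t.trim().toLowerCase())
      .filter(t => t !== ''));
  return projectKeywords.some(k => wanted.has(k.trim().toLowerCase()));
}

const input = 'french, france, paris, french revolution, white hoods of Paris, winnie-the-pooh, pooh bear';

console.log(matchesAnyTerm(['french revolution', 'history'], input)); // true
console.log(matchesAnyTerm(['revolution'], input));                   // false
console.log(matchesAnyTerm(['bear'], input));                         // false
```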
I can see that this could be technically possible — but really, is this necessary now, with all the required JavaScript and so on? Have you really found the approach I’ve implemented so frustrating and difficult to use in practice?
Out of interest, I did a check and found that of the 29,162 entries in the scrubbed database keywords table, nearly half are terms which contain a space not in the first position. It is currently impossible to conduct a straightforward keywords search using any of these terms without getting a host of what I regard as spurious results and unexpected results. It does seem to me that if the person who designed this system in the first place had intended the keyword search to operate as it does, I would have expected them to have prevented the use of space characters in keywords at the time they are first entered in the system. There’s certainly nothing in the current user interface that gives a user any clue that searching for “political fiction” is going to return every project that has the keywords “fiction”, and treating a space character as a list item separator is pretty unusual, in my experience. Given all the above, I don’t think it’s unreasonable to describe the current implementation as suffering from a bug.
… On 23 Jun 2024, at 1:46 AM, redrun45 ***@***.***> wrote:
Just to be sure we're on the same page:
I'd suggest a check-box for the actual search, not necessarily for making the keyword autocomplete suggestions more fuzzy. You make good points on autocomplete. The "prioritized" version looks excellent from where I sit, too. 😉
The part I suggest making optional is this change in Librivox_search.php. The opening comment for this PR describes that change as:
Fixing a bug in the catalog's Advanced Search function that currently results in the generation of an incorrect list of results if a user enters a multiterm keywords entry such as "political fiction" or "french revolution"
When the user intends to search for projects tagged with a particular multi-word keyword, which they've selected from the autocomplete suggestion list... then doing a fuzzy search for projects with other keywords would indeed be a bug!
But if a user intends to do a broader search across multiple relevant keywords, that fuzzy matching is the best we have at the moment.
Let's say I want to study the history of France in a particular period. I select the Genre as "*Non-Fiction -> History -> Early Modern". Then instead of selecting just one keyword from a list of suggestions, I check or un-check a box, and then enter a collection of words that might appear in any number of relevant keywords. Say, "french", "france", "paris".
Would it be nice if we could pick several multi-word terms like "french revolution" and "white hoods of paris"? Absolutely! But if we don't (yet) have that, we might want to keep this (otherwise buggy) code as an alternate mode.
Ok! Suggestion made. It is definitely more work, even if a check-box is the simplest thing I know to ask for. Let me know what you think.
|
Fair enough! I'm not going to argue. I only intended to let you know of an objection I thought would likely be raised. I'll make sure folks know this PR is ready for review as-is. |
Thank you! Can I suggest you pause any further review until I have implemented the kind of prioritised list I foreshadowed with that revised SQL? So far, that’s not yet part of this PR. I’d like to add it. Should be done within 48 hours, I expect.
|
2e314e1 to 0f63a86
I have now changed the SQL so that the dropdown lists, as first priority, all keywords that START WITH the letters the user has typed; then a text separator indicating that the entries below it are terms that INCLUDE the letters the user has typed; then, as last priority, terms not already presented that INCLUDE the letter sequence the user has typed. The maximum number of entries shown in the dropdown is 200. I did NOT implement the change I foreshadowed above, which would have excluded entries containing ";" or "," characters. The query term is now both escaped for use in a LIKE clause and parameterised. To get a glimpse of the kind of tidying-up work that may lie ahead (if anyone is brave enough to take it on), try typing "libriv" in the keywords search field on a system where this change has been implemented. |
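For readers who want a feel for the ordering without reading the SQL, here is a rough in-memory sketch of the same idea. It is not the PR's query: the function names, the separator text, keeping the input order within each group, and counting the separator towards the 200-entry limit are all assumptions, as is the use of a backslash as the LIKE escape character.

```php
<?php
// Illustrative sketch only; none of these names come from the PR itself.

// Hypothetical separator label shown between the two groups.
const KEYWORD_SEPARATOR = '--- other terms containing your text ---';

/** Escape LIKE wildcards so typed text is matched literally (assumes "\" is the escape character). */
function like_escape(string $term): string
{
    return strtr($term, array('\\' => '\\\\', '%' => '\\%', '_' => '\\_'));
}

/** Build the two patterns that would be bound as query parameters. */
function like_patterns(string $typed): array
{
    $escaped = like_escape($typed);
    return array('starts_with' => $escaped . '%', 'includes' => '%' . $escaped . '%');
}

/** Order keywords: prefix matches first, then a separator, then other matches. */
function prioritise_keywords(array $keywords, string $typed, int $limit = 200): array
{
    if ($typed === '') {
        return array();
    }
    $starts = array();
    $includes = array();
    foreach ($keywords as $keyword) {
        $pos = stripos($keyword, $typed); // case-insensitive position, or false
        if ($pos === 0) {
            $starts[] = $keyword;
        } elseif ($pos !== false) {
            $includes[] = $keyword;
        }
    }
    // Only show the separator when there is something to put below it.
    $separator = count($includes) > 0 ? array(KEYWORD_SEPARATOR) : array();
    return array_slice(array_merge($starts, $separator, $includes), 0, $limit);
}

// Made-up keyword list.
$sample = array('fiction', 'political fiction', 'fictional places', 'history');
print_r(prioritise_keywords($sample, 'fict'));
print_r(like_patterns('50%_off'));
```

In the real implementation the ordering is done in SQL; the point of the sketch is only the prefix-first grouping and the literal treatment of "%" and "_" in whatever the user types.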
I didn't look too closely until now, in case you intended to make that foreshadowed change. 😉 I was going to suggest adding I had assumed the unused keywords were deleted, so didn't even think to check that, for all my "hammering". I'll add another suggestion to #225. 😅 |
This pull request proposes:
(a) Fixing a bug in the catalog's Advanced Search function that currently results in the generation of an incorrect list of results if a user enters a multiterm keywords entry such as "political fiction" or "french revolution"
(b) Adding an autocomplete style dropdown list to the keywords field of the catalog Advanced Search
I believe this dropdown will make it much easier for users who are prepared to try our Advanced Search function to find an audiobook that matches a specific subject or genre interest.
Anyone reviewing this code will notice that the SQL used to populate the keywords dropdown list does a lookup against the project_keywords table. The purpose of this lookup is to keep orphaned keywords out of the dropdown, that is, terms that appear in the keywords table but are never referenced in the project_keywords table. In the scrubbed database there are currently 2,928 such terms. They can be viewed by running the following SQL against the scrubbed database:
SELECT DISTINCT k.value, k.id
FROM keywords k
WHERE k.id NOT IN (SELECT DISTINCT pk.keyword_id FROM project_keywords pk)
ORDER BY k.id ASC;
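The same orphan check can be pictured on a couple of small arrays. This is only a sketch of the logic: the rows are made up, and the function name is hypothetical rather than anything in the codebase.

```php
<?php
// Illustrative sketch only; sample rows are invented, not catalog data.

/**
 * Return keywords (id => value) that no project_keywords row references.
 * Mirrors the "NOT IN (SELECT keyword_id ...)" idea on plain arrays.
 */
function find_orphan_keywords(array $keywords, array $project_keywords): array
{
    // Collect the referenced keyword ids as array keys for a key-based diff.
    $referenced = array_flip(array_column($project_keywords, 'keyword_id'));
    return array_diff_key($keywords, $referenced);
}

// keywords table: id => value
$keywords = array(1 => 'political fiction', 2 => 'ducks', 3 => 'unused term');

// project_keywords table: one row per (project, keyword) link
$project_keywords = array(
    array('project_id' => 10, 'keyword_id' => 1),
    array('project_id' => 11, 'keyword_id' => 1),
    array('project_id' => 11, 'keyword_id' => 2),
);

print_r(find_orphan_keywords($keywords, $project_keywords));
```

Here only keyword 3 is reported, because keywords 1 and 2 are each linked to at least one project.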