Add ability to find events by slide text & captions in search #1189
Conversation
Looked through the code and tested a bunch but didn't find any obvious issues.
I'll do a final round of testing today and then this can be merged. My comments are of no real concern and definitely no blockers.
First of all, it's good to see this in action, thanks. Some initial observations:
Do you mean "slide text" vs "captions"? Yes, currently both are treated as one thing. Is that different in your current portal?
Correct, and it was already like that before. There will be some improvements there in an upcoming PR, like combining a series with the page listing only that series, as having these as two separate results is fairly useless.
That is also something I'm planning to do in the upcoming PR. I am not sure if I will succeed, as it requires clever design, but yeah: my goal is that it's clear at a glance whether I'm looking at a video (which should be most results), a series, or something else.
Filters are of course planned already, and in fact some basic ones are already implemented. That feature is still hidden though, and will be re-enabled with, you guessed it, my upcoming PR. Apart from that, I would expect most users to just specify more query terms. I can't imagine a scenario where someone wants to find a video that they just remember had "internet" in it. And thanks to the clever ranking, users can just add a bunch of query words that they think are relevant, and the result containing most of these words will be shown first. Not to say we don't want filters -- we do -- but these search engines make filters less necessary, as just adding more search terms usually works out.
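To illustrate the "just add more terms" idea with the Meilisearch JS client (host, index name, and document fields here are assumptions, not Tobira's actual setup):

```ts
import { MeiliSearch } from "meilisearch";

// Sketch only: host and index name are assumptions, not Tobira's config.
const client = new MeiliSearch({ host: "http://localhost:7700" });

async function searchExample(): Promise<void> {
    // Meilisearch's default "words" ranking rule sorts hits matching more
    // of the query terms first, so extra terms act like a soft filter.
    const results = await client.index("events").search("internet protocol lecture");
    console.log(results.hits);
}
```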
Mh, I'm not sure I agree. That's typo tolerance in action. All videos by you (with an exact "schulte" match) are sorted before all other videos. So in my book that's exactly as it should be. And as a last resort, you can always search by
Mh, fair, the image seems useful. We don't always have an image though; especially for search results in captions, it might not be clear what to show. And: would you not show the extracted text at all then? I think it's useful.
So more like the design in your current video portal?
Not sure I understand.
Weird coincidence, but it's similar to what I just said for Paella: it looks like five different fonts for five different text elements.
For reference, here's how Kaltura organises search results in the UZH video portal.
I agree that the UZH results are too open. Olaf's example, where he was looking for "Schulte" and "Schule" also showed up in a video further down, does not bother me.
Me too.
I agree. Additionally:
I spoke too quickly: how would you even fix this? The problematic case is when the matched text is long. I think there are only these options, none of which is optimal:
I guess I can somewhat improve the situation by making the image a bit larger still and reducing the max-width of the tooltip a bit (i.e. cutting off more text). But I don't think I can come up with a perfect solution.
I'm strongly in favour of keeping the text extract even when there is a slide. Accessibility-wise, and for all other users too, the shown text can help to give context.
I'm fine to leave it as is.
I don't get this point. Let's discuss it in our meeting today.
Since the area for showing the text is made a bit smaller, we also use less context in the backend.
Okay, I tried to change what we discussed. However, I decided to have different "tooltip size" behavior depending on whether a preview image is available:
The fixed height and some other changes make hovering over parts of the timeline a lot smoother, I think, i.e. less jumping around of the tooltip. Let me know what you think about the remaining width-flexibility for large text. As a second step, I added a faint icon to the bottom corner of the tooltip to show what kind of result we are talking about. Please also let me know what you think.
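For illustration, the sizing logic is roughly this (a sketch with made-up values, not the actual implementation):

```tsx
import React from "react";

// Sketch of the two tooltip modes (all values made up, not the real code).
const tooltipStyle = (hasPreviewImage: boolean): React.CSSProperties => ({
    // A fixed height prevents the tooltip from jumping around while
    // hovering over adjacent parts of the timeline.
    height: hasPreviewImage ? 120 : 80,
    // Without a preview image, the width may still grow for long text.
    maxWidth: hasPreviewImage ? 280 : 400,
    overflow: "hidden",
});
```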
Good catch, should be fixed now.
Is that a Lucide icon you found there? I could not find it anywhere. I checked Lucide and found these that somewhat match your suggestion:
No, that was me searching Google. I like "letter-text".
This fix was easier than anticipated (once I figured this out anyway). The problem is: giving the `WithTooltip` a `z-index` means that the parent element (containing both the trigger element and the tooltip) creates a new stacking context. And a stacking context and its children behave as one unit with regard to other stacking contexts. So even when setting the z-index of the tooltip itself to 100: two timelines are two different stacking contexts, so the 100 of one tooltip is not rendered in front of the 4 of the trigger elements of another timeline. So there were two options:
- Either introduce another div in the floating components that would have the trigger element as its child, but not the tooltip, and that could get a z-index then. But that would have required changing appkit.
- Remove the z-index from `WithTooltip` to avoid creating stacking contexts.

First I thought the only way to do this is to remove the big clickable link area of the search results. And we might still do that, as it often brings annoying problems. But for this particular problem, the solution turned out easier than I thought.
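To illustrate the trap (hypothetical markup, not the actual appkit code):

```tsx
import React from "react";

// Two sibling timelines; each wrapper has a z-index and thus forms its
// own stacking context.
const Timeline: React.FC<{ label: string }> = ({ label }) => (
    <div style={{ position: "relative", zIndex: 1 }}>
        {/* A trigger element within this timeline. */}
        <div style={{ position: "relative", zIndex: 4 }}>{label}</div>
        {/* This 100 only wins *inside* this stacking context. Compared to
            a sibling timeline, the whole subtree counts as z-index 1, so
            the tooltip can end up behind the other timeline's triggers,
            despite 100 > 4. */}
        <div style={{ position: "absolute", zIndex: 100 }}>tooltip</div>
    </div>
);
```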
Unfortunately, this required adding a new package. `react-icons` has a super old version of lucide icons, even in its newest version. Since almost all of our icons are from Lucide anyway, it makes sense to just use their packages directly.
Yes, I noticed this too :/ I think it's a glitch in some other package and we can't do a whole lot about it. But we can still look into it again. We are currently still trying to fix some weird technical bug. But once that's done, this can be merged, I think.
Luckily, Ole noticed this weird error that only occurred sometimes. I was able to find a repro: on the search page, click one video, go back (via the browser), then hover over some text timeline and click that item. I am not exactly sure what Relay does wrong here, but additionally requesting the ID on the search page makes Relay understand things, so that the video page is not rendered with the incomplete query from the search page.
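A sketch of what that kind of fix looks like (query and field names are hypothetical, not Tobira's actual schema): requesting the `id` lets Relay normalize each result into its store under a stable identity, instead of leaving a partial record for the video page to pick up.

```tsx
import { graphql } from "react-relay";

// Hypothetical query; the relevant part is the added `id` field.
const searchQuery = graphql`
    query SearchPageQuery($q: String!) {
        search(query: $q) {
            id        # <- also requested on the search page now
            title
        }
    }
`;
```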
Nice, so apart from the glitch Lukas and Olaf mentioned above, everybody's happy, and the technical bug also appears to be fixed. I think this is a great improvement. I'll merge this now 👍
Fixes #677
For testers
This PR does not contain any changes to the search page except adding this timeline. This is planned for later. This PR is already big enough.
Also note that the usefulness and the UX of this feature depend a lot on the available data! On our test instance, roughly 2500 events have OCR'ed slide texts, while only very few have subtitles. I will try to upload more videos soon to simply have more videos with subtitles available. Subtitle timespans are usually short (on the order of seconds or tens of seconds), while the timespans associated with slide text can last many minutes.
Questions/discussions
Search terms to get started
While testing myself I found a few good queries to get started. Of course, do try your own ones and also try prefixes of these to see how well it works. Also try multiple query words.
- `open`: big mixed bag
- `meilisearch`: finds two Tobira videos talking about Meilisearch (never mentioned in metadata)
- `tycho`: finds the "Tycho crater" in the NASA moon video subtitles AND text detection
- `crater`: lots of usages in the subtitles of the NASA moon video
- `pyroxene`: finds the mineral in the NASA moon subtitles
- `elasticsearch`: lots of matches in Opencast-related videos
- `postgres`: shows some videos with "postgres" in the title first (makes sense) and only then ones that only mention postgres
- `videoportal`: obvious Tobira videos, but also one unrelated video briefly screen-sharing the old ETH video portal and one mentioning "videoportal" in its slides
- `feynman`: further down, lots of videos just mentioning Feynman

Technical info
This PR has these main parts:
- `event_texts` (for storing all texts belonging to an event)
- `event_texts_queue` and a process to automatically download text assets from Opencast (this runs as part of the worker)

This can mostly be reviewed commit by commit. There are two places where I move a big chunk of code around that was added in a previous commit, but it should be fairly clear what and where.
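For context, roughly the kind of data an `event_texts` row holds (field names here are guesses for illustration; the real schema lives in the migrations):

```ts
// Guessed shape for illustration only; not the actual schema.
type EventText = {
    eventId: string;
    // The timespan this text is attached to, in milliseconds. Subtitle
    // spans are short; slide-text spans can cover many minutes.
    spanStart: number;
    spanEnd: number;
    kind: "slide-text" | "caption";
    text: string;
};
```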
Performance is kind of important for this one, since we are dealing with potentially lots of data. So far, Meili seems to respond within 25ms in all cases I tested. That's fine, but still a big increase from before. We should make sure that we don't accidentally introduce some slowness. Though right now I also have no idea how we would optimize further.
Something I want to improve in a follow-up PR: replace the busy polling in the "download assets" and "update search index" workers with `LISTEN`/`NOTIFY` events from Postgres. Right now, both default to something like 30s, which means that adding an event has quite a round trip (sync + 30s + 30s) before its text assets are searchable. That can be vastly reduced. But again, this PR is already big enough.