Feature request: Search all podcast episodes by term #132
Comments
Episode search by term comes up as a request a lot, but it's got a lot of problems. You'd be searching description and title text, which returns so much noise. There are about 160 million episodes in the Index so even searching for simple phrases like "european union" returns millions of episodes with no way to rank their relevance since we don't have first party data to give any weightings to anything. I'd love to provide such a thing if the community could help me design how it would be more than just noise. I'm open to ideas. |
I would naively suggest sorting by recency and only returning something like 30 results with a paging parameter. I personally don't understand the value of such a search function - I'm trying to get more user info on that. Looping in original issue author for insight - @muctebanesiri, does sorting by latest episode make sense to you? |
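A minimal sketch of that suggestion, assuming an in-memory list of episodes: filter by term, sort newest first, and return 30 results per page. The `Episode` fields and the `search_page` function are hypothetical names for illustration, not part of the existing API.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # Hypothetical minimal episode record; field names are assumptions.
    title: str
    description: str
    date_published: int  # Unix timestamp


def search_page(episodes, term, page=1, page_size=30):
    """Return one page of episodes matching `term`, newest first."""
    needle = term.lower()
    matches = [
        e for e in episodes
        if needle in e.title.lower() or needle in e.description.lower()
    ]
    # Recency is the only ranking signal here; no relevance weighting.
    matches.sort(key=lambda e: e.date_published, reverse=True)
    start = (max(page, 1) - 1) * page_size
    return matches[start:start + page_size]
```

A client would request `page=2`, `page=3`, and so on until a page comes back with fewer than `page_size` results.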
Let me share some screenshots of what I saw, in case they're helpful. |
In order to index cost-effectively (I would need to spin up a new search server), we would probably need to index only a fraction of the full episodes table. If recency is a decent selector, then maybe a rolling window of X months so that we don't end up pulling in very old episode data? I think we would also need a really good stop word list. |
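As a rough sketch of the two ideas above (a rolling window and a stop word list), assuming episodes carry a Unix publish timestamp. The stop word set here is a tiny placeholder, the 30-day month is an approximation, and all names are hypothetical.

```python
import re
import time

# Placeholder list for illustration only; a real list would need careful tuning.
STOP_WORDS = frozenset({"a", "an", "and", "in", "of", "on", "or", "the", "to"})

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # approximation: 30-day months


def within_window(date_published, months, now=None):
    """True if the episode was published within the last `months` months."""
    now = time.time() if now is None else now
    return date_published >= now - months * SECONDS_PER_MONTH


def index_terms(text):
    """Lowercase, split into alphanumeric tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]
```

Only episodes passing `within_window` would be fed to the indexer, and only the output of `index_terms` would be stored, which keeps the search index much smaller than the full episodes table.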
@daveajones A stop word list being a list with words to exclude from indexing like 'and'? (We want a good list, but at the same time, we don't want it to be too extensive.)
If we're implementing a timeframe anyway (and have a limited, recent dataset), might we sort based on relevance? Relevance being how often a word shows up in the title & description of the episode and its parent podcast combined. For those episodes with a transcript, it might be considered too (this would probably favour episodes with transcripts, as they have a higher chance of a match, but I would say that's a good thing, as a policy). For work I'm on a project that allows you to search organisations, but you cannot search on relevance (only creation date, most recent) and it's… not great.
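To make that concrete, here is a naive sketch of such a relevance score: a raw count of how often the query words occur across the episode text, the parent podcast text, and the transcript if there is one. The function and parameter names are hypothetical, and a real search server would more likely use an established ranking scheme than raw counts.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevance(query, episode_title, episode_description,
              podcast_title="", podcast_description="", transcript=""):
    """Count occurrences of the query words across all available text."""
    combined = " ".join([
        episode_title, episode_description,
        podcast_title, podcast_description,
        transcript,  # empty when the episode has no transcript
    ])
    counts = Counter(tokenize(combined))
    # Each distinct query word contributes its total number of occurrences.
    return sum(counts[word] for word in set(tokenize(query)))
```

Because the transcript is simply appended to the searched text, episodes with transcripts naturally score higher on a match, which is the bias described above.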
@muctebanesiri Is that Podcini then? We might have a look where they're searching against (which data source). |
No, it's FocusPodcast, a more recent fork. https://github.com/allentown521/FocusPodcast |
@keunes did some digging through the code... seems they're using iTunes search which allows searching by episode: https://github.com/allentown521/FocusPodcast/blob/main/app/src/main/java/allen/town/podcast/discovery/ItunesEpisodesSearcher.kt#L12 IIRC, AntennaPod also uses iTunes as a data source but only for the top podcasts list |
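For reference, an episode query against the iTunes Search API might be built roughly like this. This is a sketch based on my understanding of the public API (the `term`, `media`, `entity` and `limit` parameters), not a copy of what the linked FocusPodcast file does; the helper name is hypothetical.

```python
from urllib.parse import urlencode

ITUNES_SEARCH_URL = "https://itunes.apple.com/search"


def build_episode_search_url(term, limit=30):
    """Build an iTunes Search API URL that asks for podcast episodes."""
    params = {
        "term": term,
        "media": "podcast",
        "entity": "podcastEpisode",  # assumption: restricts results to episodes
        "limit": limit,
    }
    return f"{ITUNES_SEARCH_URL}?{urlencode(params)}"
```

The function only builds the URL; fetching and parsing the JSON response is left out.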
That's correct. Didn't know iTunes also enabled searching through episodes. |
I have looked over the API docs and found the ability to search for podcasts by term; however, AntennaPod has recently received a request to search for episodes by term across all podcasts – this is apparently a feature some other apps provide (presumably paid apps?)
In order to prevent crawling the entire API or spinning up an intermediate API to handle the indexing/caching required to satisfy this request, I would like to request the addition of an endpoint to search podcast-episodes by term.
If this feature already exists and I've simply overlooked the endpoint, apologies for wasting your time 🙏🏻
AntennaPod/AntennaPod#7486