Feature request: Search all podcast episodes by term #132

Open
TSampley opened this issue Nov 7, 2024 · 8 comments
Labels
enhancement New feature or request question Further information is requested


@TSampley

TSampley commented Nov 7, 2024

I have looked over the API docs and found the ability to search for podcasts by term; however, AntennaPod has recently received a request to search for episodes by term across all podcasts – this is apparently a feature some other apps provide (presumably paid apps?)

To avoid crawling the entire API or spinning up an intermediate API to handle the indexing/caching this request would require, I would like to request an endpoint for searching podcast episodes by term.
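For illustration, here is a minimal sketch of how a client might call such an endpoint, reusing the Podcast Index authentication scheme (the `Authorization` header is the SHA-1 hex digest of API key + API secret + the Unix time sent in `X-Auth-Date`). The path `/search/episodes/byterm` is HYPOTHETICAL: it stands in for the endpoint being requested and does not exist today. The `q` and `max` parameters are assumed to mirror the existing `/search/byterm`, and the User-Agent string is a placeholder.

```python
import hashlib
import time
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.podcastindex.org/api/1.0"
# HYPOTHETICAL path: the Index does not document an episode-by-term endpoint.
# It stands in for the endpoint this issue is asking for.
EPISODE_SEARCH_PATH = "/search/episodes/byterm"


def auth_headers(api_key: str, api_secret: str, now: Optional[int] = None) -> dict:
    """Podcast Index request headers: Authorization is the SHA-1 hex digest
    of api_key + api_secret + the Unix time sent in X-Auth-Date."""
    epoch = str(int(time.time()) if now is None else now)
    digest = hashlib.sha1((api_key + api_secret + epoch).encode("utf-8")).hexdigest()
    return {
        "User-Agent": "AntennaPod-episode-search-sketch",  # placeholder UA string
        "X-Auth-Key": api_key,
        "X-Auth-Date": epoch,
        "Authorization": digest,
    }


def episode_search_url(term: str, max_results: int = 30) -> str:
    """URL for the HYPOTHETICAL endpoint; 'q' and 'max' are assumed to mirror /search/byterm."""
    query = urlencode({"q": term, "max": max_results})
    return f"{API_BASE}{EPISODE_SEARCH_PATH}?{query}"


print(episode_search_url("european union"))
```

Keeping the request shape identical to the podcast search would let clients reuse their existing auth and paging code.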

If this feature already exists and I've simply overlooked the endpoint, apologies for wasting your time 🙏🏻

AntennaPod/AntennaPod#7486

@daveajones
Contributor

Episode search by term comes up as a request a lot, but it's got a lot of problems. You'd be searching description and title text, which returns so much noise. There are about 160 million episodes in the Index, so even searching for simple phrases like "european union" returns millions of episodes with no way to rank their relevance since we don't have first party data to give any weightings to anything.

I'd love to provide such a thing if the community could help me design how it would be more than just noise. I'm open to ideas.

@daveajones daveajones self-assigned this Nov 7, 2024
@daveajones daveajones added enhancement New feature or request question Further information is requested labels Nov 7, 2024
@TSampley
Author

TSampley commented Nov 7, 2024

I would naively suggest sorting by recency and only returning something like 30 results with a paging parameter. I personally don't understand the value of such a search function; I'm trying to get more user info on that.
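A minimal in-memory sketch of the "recency plus paging" suggestion, assuming a simple case-insensitive substring match on title and description. The `Episode` type and the `page`/`page_size` parameter names are hypothetical; a real service would run this inside a search engine over an index rather than scanning a list.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class Episode:
    title: str
    description: str
    published: datetime


def search_by_recency(episodes: List[Episode], term: str,
                      page: int = 0, page_size: int = 30) -> List[Episode]:
    """Case-insensitive substring match on title/description,
    newest first, returning one page of at most page_size results."""
    needle = term.lower()
    matches = [
        e for e in episodes
        if needle in e.title.lower() or needle in e.description.lower()
    ]
    matches.sort(key=lambda e: e.published, reverse=True)  # most recent first
    start = page * page_size
    return matches[start:start + page_size]
```

Page 0 holds the 30 newest matches, page 1 the next 30, and so on; a page past the end comes back empty, which a client can treat as "no more results".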

Looping in the original issue author for insight: @muctebanesiri, does sorting by latest episode make sense to you?

@muctebanesiri

> I would naively suggest sorting by recency and only returning something like 30 results with a paging parameter. I personally don't understand the value of such a search function; I'm trying to get more user info on that.
>
> Looping in the original issue author for insight: @muctebanesiri, does sorting by latest episode make sense to you?

Let me share some screenshots of what I saw; maybe they will be more helpful.

[Screenshot: Imagepipe_0]
This is a paid app, as you correctly guessed, with the feature in action. Episodes are not sorted by recency, but I think sorting by recency would be more helpful.

[Screenshot: Imagepipe_1]
The second app with the exact feature (even the sorting is identical) is actually a fork of AntennaPod.

@daveajones
Contributor

To index cost-effectively (I would need to spin up a new search server), we would probably need to index only a fraction of the full episodes table. If recency is a decent selector, then maybe a rolling window of X months, so that we don't end up pulling in very old episode data?
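The rolling window could be maintained by a periodic job that splits episodes into "keep in the search index" and "evict". A minimal sketch, assuming episodes are dicts with a `published` datetime and approximating a month as 30 days; `split_by_window` is a hypothetical helper, and the window length (the "X months" above) is left as a parameter.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def split_by_window(episodes: List[Dict], months: int,
                    now: datetime) -> Tuple[List[Dict], List[Dict]]:
    """Split episodes into (keep, evict) for a rolling window of `months`.

    A month is approximated as 30 days to keep the sketch dependency-free.
    """
    cutoff = now - timedelta(days=30 * months)
    keep = [e for e in episodes if e["published"] >= cutoff]
    evict = [e for e in episodes if e["published"] < cutoff]
    return keep, evict
```

Run daily, the job would upsert `keep` into the search server and delete `evict`, so index size stays proportional to recent publishing volume rather than to all episodes in the Index.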

I think we would also need a really good stop word list.

@keunes

keunes commented Nov 8, 2024

@daveajones A stop word list being a list with words to exclude from indexing like 'and'?
Here's a good starting point: https://www.ranks.nl/stopwords. I'll send them an email to ask whether we can use it.

(We want a good list, but at the same time, we don't want it to be too extensive.)
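To make the stop-word idea concrete, here is a minimal tokenizer sketch that lowercases text, splits it on runs of word characters and drops stop words before indexing. The `STOP_WORDS` set is a tiny illustrative list chosen for this example, not the ranks.nl list, and `index_tokens` is a hypothetical helper.

```python
import re
from typing import List

# Illustrative only: a tiny hand-picked list, not the ranks.nl list.
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with",
})


def index_tokens(text: str) -> List[str]:
    """Lowercase, split on runs of word characters, drop stop words."""
    return [t for t in re.findall(r"\w+", text.lower()) if t not in STOP_WORDS]


print(index_tokens("The European Union and the Euro"))  # ['european', 'union', 'euro']
```

The same function would need to run on the query as well as on the indexed text, so that "the european union" and "european union" hit the same terms.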

> no way to rank their relevance since we don't have first party data to give any weightings to anything

If we're implementing a timeframe anyway (and have a limited, recent dataset), might we sort by relevance? Relevance would be how often a word shows up in the title and description of the episode and its parent podcast combined. For episodes with a transcript, the transcript could be considered too (this would probably favour episodes with transcripts, as they have a higher chance of a match, but I would say that's a good thing, as a policy).
When, in future, there's a more reliable 'popularity' indication at podcast level (which we're also interested in for AntennaPod, but that's for a later discussion), that could also be considered to adjust the weights of the episodes.
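A minimal sketch of the relevance idea as described: count whole-word, case-insensitive occurrences of the search term across the episode title and description, the parent podcast's title and description, and the transcript when there is one, then multiply by an optional podcast-level popularity weight. The `Candidate` type and the `popularity` field (1.0 meaning neutral) are hypothetical, since no such popularity signal exists yet.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


def count_term(term: str, text: str) -> int:
    """Whole-word, case-insensitive occurrences of term in text."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


@dataclass
class Candidate:
    episode_title: str
    episode_description: str
    podcast_title: str
    podcast_description: str
    transcript: Optional[str] = None
    popularity: float = 1.0  # HYPOTHETICAL podcast-level weight; 1.0 = neutral


def relevance(term: str, c: Candidate) -> float:
    fields = [c.episode_title, c.episode_description,
              c.podcast_title, c.podcast_description]
    if c.transcript:  # transcripts add matches, which favours episodes that have one
        fields.append(c.transcript)
    return sum(count_term(term, f) for f in fields) * c.popularity


def rank(term: str, candidates: List[Candidate]) -> List[Candidate]:
    return sorted(candidates, key=lambda c: relevance(term, c), reverse=True)
```

Raw counts reward long texts, which matches the stated policy of favouring transcripts; a length-normalised scheme such as BM25 would be the usual alternative if that bias turns out to be too strong.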

At work I'm on a project that lets you search organisations, but you cannot sort by relevance (only by creation date, most recent first), and it's… not great.

> The second app with the exact feature (even the sorting is identical) is actually a fork of AntennaPod.

@muctebanesiri Is that Podcini, then? We might look at what they're searching against (which data source).

@muctebanesiri

> @muctebanesiri Is that Podcini, then? We might look at what they're searching against (which data source).

No, it's FocusPodcast, a more recent fork: https://github.com/allentown521/FocusPodcast

@TSampley
Author

TSampley commented Nov 8, 2024

@keunes I did some digging through the code; it seems they're using iTunes search, which allows searching by episode: https://github.com/allentown521/FocusPodcast/blob/main/app/src/main/java/allen/town/podcast/discovery/ItunesEpisodesSearcher.kt#L12

IIRC, AntennaPod also uses iTunes as a data source, but only for the top podcasts list.
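For reference, an episode search against the iTunes Search API, as we understand it, is a GET on `https://itunes.apple.com/search` with `media=podcast` and `entity=podcastEpisode`. The sketch below only builds the URL; the exact parameter set FocusPodcast's `ItunesEpisodesSearcher` sends is not confirmed here, so treat the `entity` value and the `limit` of 30 as assumptions to check against the linked source.

```python
from urllib.parse import urlencode

ITUNES_SEARCH = "https://itunes.apple.com/search"


def itunes_episode_search_url(term: str, limit: int = 30) -> str:
    """Build an iTunes Search API query for podcast episodes (assumed parameters)."""
    params = {
        "term": term,
        "media": "podcast",
        "entity": "podcastEpisode",  # assumption: episode-level results
        "limit": limit,
    }
    return ITUNES_SEARCH + "?" + urlencode(params)


print(itunes_episode_search_url("european union"))
```

The response is JSON with a `results` array, so an app could reuse the same networking layer it already has for the iTunes top podcasts list.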

@keunes

keunes commented Nov 9, 2024

> IIRC, AntennaPod also uses iTunes as a data source, but only for the top podcasts list.

That's correct. Didn't know iTunes also enabled searching through episodes.
