Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a Term Cleanup Feature #815

Open
wants to merge 15 commits into
base: develop
Choose a base branch
from
Open

Add a Term Cleanup Feature #815

wants to merge 15 commits into from

Conversation

dkotter
Copy link
Collaborator

@dkotter dkotter commented Oct 9, 2024

Description of the Change

This PR adds in a new Feature, Term Cleanup, that uses the power of embeddings, to compare all terms against each other and then recommending similar terms that could be merged together if desired. This comparison process can be run entirely in WordPress but once you reach around 500 terms, performance really starts to slow down. Similar to the Smart 404 Feature, support for Elasticsearch (using ElasticPress) has been added to significantly speed things up.

This Feature requires two services in order to function:

  1. A Provider that can generate embeddings (right now we support OpenAI and Azure OpenAI)
  2. A data storage engine that can also run vector queries (currently supports doing this in WordPress only or using Elasticsearch powered by ElasticPress)

Once the Feature is setup, you can start the comparison process which will send all terms off to the embeddings Provider to generate vector data. If configured, this data is then sent to Elasticsearch and vector queries are run there. If not configured, this data remains in WordPress only and comparisons are done there using that data.

Once the process is complete, you'll be given the opportunity to see all the results and decide which terms you want to merge together (or which terms you want to skip). This process can be re-run whenever needed (i.e. as new terms are added).

Closes #795

How to test the Change

  1. Enable the Feature with at least one taxonomy turned on
  2. Go to the Term Cleanup page and start the process for whatever taxonomy you want
  3. Ensure the process completes and results show
  4. Ensure you can merge and skip terms
  5. If desired, set up Elasticsearch/ElasticPress and test using that

Changelog Entry

Added - New Feature, Term Cleanup, which provides the ability to compare all terms from specific taxonomies and showing recommendations for similar terms that can be merged together

Credits

Props @dkotter, @iamdharmesh, @berkod

Checklist:

@dkotter dkotter added this to the 3.2.0 milestone Oct 9, 2024
@dkotter dkotter self-assigned this Oct 9, 2024
@dkotter
Copy link
Collaborator Author

dkotter commented Oct 9, 2024

@iamdharmesh I believe I have this feature fully moved over to ClassifAI but would appreciate if you could take a look since you're most familiar with how this functions. I moved the background processing to use Action Scheduler since we are already using that in ClassifAI but I believe everything else has been left the same.

Note @jeffpaul mentioned there are some UI changes we're hoping to make so we can either leave this PR until those are ready or can merge this in (once it looks good) and open a new PR for those changes, but leaving this in draft for now

iamdharmesh
iamdharmesh previously approved these changes Oct 10, 2024
Copy link
Member

@iamdharmesh iamdharmesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dkotter Thanks for working on this. The PR looks great. I’ve fixed a few issues related to the in-progress job status, canceling the running process, and similarity score for database-based comparison. Please take a quick look at those additional changes when you have time. Otherwise, everything looks good.

@jeffpaul
Copy link
Member

I would prefer to get a quick UX review to see if there are some quick iterations to improve here, but if we're otherwise near ready to ship a major release then I'm fine merging this in to ride along there

@iamdharmesh
Copy link
Member

@jeffpaul @dkotter I worked on UX items from https://github.com/10up/classifai-cleanup/issues/7, https://github.com/10up/classifai-cleanup/issues/20, https://github.com/10up/classifai-cleanup/issues/19 and https://github.com/10up/classifai-cleanup/issues/18 and listed status on each below. Some of these need discussion.

image

  • Right now we show a Count field, which represents how many times this term has been used. I know this is common terminology in WordPress so maybe is fine but I think there could easily be some confusion here thinking this is the count of how many matching terms were found. Could maybe update this to Used: 621 times to be more clear?
  • Often terms don't have a description associated with them. I think we should remove this field if it's empty, rather than showing the Description: label and never having a description shown.
  • I'm assuming when we display results we have whatever score was calculated to determine the match. Could we show that score in these results? That way you know how high of a match it actually was? Something like Score: 79% or Match: 80%
  • Since we only have a single action (Skip) we could probably change the name of that column to Action or just remove that label all together
  • I have some confusion in regards to the merge buttons. At a glance, hard to know what each button does; which term is getting merged? What happens to the other? I think we can make this more clear by either updating the button text or adding some helper text (Need help with finalizing a better wording copy here, if we want to update this.)
  • More personal opinion but I think we could benefit from a slight bump across the board with font sizes. Things seem fairly small to me at the moment
  • In addition to the above, could also maybe use a slight bump in padding between things
  • UX - Adding a link to the posts-by-tag
  • UX - An option to sort the terms alphabetically (This is a quick enhancement but as we are showing all the information together in a column, we need to figure out how we can switch between sort by alphabetically and sort by count.)
  • Add ability to search for specific term results

Thanks

@dkotter
Copy link
Collaborator Author

dkotter commented Oct 11, 2024

I would prefer to get a quick UX review to see if there are some quick iterations to improve here, but if we're otherwise near ready to ship a major release then I'm fine merging this in to ride along there

Things look cleaner to me from the last updates made by @iamdharmesh but I agree could use a more formal review, so I'm fine holding this until then

@dkotter dkotter marked this pull request as ready for review October 11, 2024 18:55
@dkotter dkotter requested review from jeffpaul and a team as code owners October 11, 2024 18:55
@github-actions github-actions bot added the needs:code-review This requires code review. label Oct 11, 2024
…ng third parties to hook in and do things like logging
@github-actions github-actions bot added the needs:refresh This requires a refreshed PR to resolve. label Nov 19, 2024
Copy link

@dkotter thanks for the PR! Could you please rebase your PR on top of the latest changes in the base branch?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs:code-review This requires code review. needs:refresh This requires a refreshed PR to resolve.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add ability to do a "Term Cleanup"
3 participants