search stemming is overzealous #1563

@alexduryee

Following up on the November 4 community call, where we discussed search term stemming in Solr and how it's currently too aggressive. Users have found that results for the following terms are getting buried due to stemming:

  • eugenics matches eugene
  • organs matches organization

There's no way for the user to search exact terms without stemming, since quotation marks only group phrases and won't bypass stemming.

Duke's approach to this was to include an unstemmed index field (https://gitlab.oit.duke.edu/dul-its/dul-arclight/-/blob/main/solr/arclight/conf/solrconfig.xml#L133), which is weighted above the stemmed ones.
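To make the weighting idea concrete for discussion, here is a minimal sketch of how a query could rank an unstemmed field above a stemmed one using Solr's edismax `qf` parameter. This is not Duke's actual configuration: the field names `text_unstemmed` and `text_stemmed` and the boost values are hypothetical, and the real names depend on the schema in use.

```python
from urllib.parse import urlencode

# Hypothetical field names; the real ones depend on the Solr schema in use.
UNSTEMMED_FIELD = "text_unstemmed"
STEMMED_FIELD = "text_stemmed"


def build_search_params(user_query, unstemmed_boost=10.0, stemmed_boost=1.0):
    """Return edismax parameters that weight exact-token matches above stemmed ones.

    The boost values are illustrative assumptions, not tuned recommendations.
    """
    qf = f"{UNSTEMMED_FIELD}^{unstemmed_boost:g} {STEMMED_FIELD}^{stemmed_boost:g}"
    return {"defType": "edismax", "q": user_query, "qf": qf}


params = build_search_params("eugenics")
query_string = urlencode(params)  # suitable for appending to a /select request URL
```

Note that this only changes ranking: documents that match solely through the stemmed field (for example, "eugene" for a "eugenics" search) would still appear, just lower in the results, which is why the second question below matters.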

Questions to discuss:

  • Do alternative Solr stemmers provide a better search experience?
  • Does Duke's approach meet user expectations? Are terms still being buried?
  • How important is exact-term searching via quotation marks? Can that be implemented?
