Update hit term extractor to use scores when considering hit terms (#2304)

documentation pt1

format

documentation pt2

format

set retry to 0 and add log for testing

wordsandscores test

testing

format

fix documentation and add new method

more tests, fix edge case, documentation

test, log, tostring

more methods, more tests

add getArrSize

pass skips on end of cq

squash me

squash me

working except for excerptTest

fix test

squash me

fix before/after

method not cq

upgrade retry

change start offset logic

quicker fail and retry

make scores output more user readable

change output score to 0-1

brackets around whole phrase and add override

scores, no scores, onebest excerpt

clean and test

ln for score and fix skippedword return

one best eps fix

remove google.sets from excerpt test

fix brackets around all phrases in excerpts

clean

change start to int

clean

return null when score is above 90000000

return longest word in brackets on hit in node

recommendations pt1

recommendations pt2

recommendations pt3

dont generate scored excerpt if we dont have to

only check scores of offsets in range

more recommendations

rename and comment and remove i==1

excerpt transform recommendations

comment and clean transform

comment and clean excerpt iterator

the big clean pt1

add PhraseOffset instead of tuples and clean/update related tests

quicksave

some comments

excerpt iterator static and clean

excerpt iterator comments/naming/formatting

wordsandscores except todo

todo

better test

add recommendations

Co-authored-by: hgklohr <[email protected]>
austin007008 and hgklohr authored Sep 12, 2024
1 parent a5d5a6e commit bbfdd3d
Showing 14 changed files with 2,578 additions and 866 deletions.
@@ -1,22 +1,21 @@
package datawave.query.attributes;

import java.io.Serializable;
import java.util.Collection;
import java.util.Iterator;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

import org.apache.commons.lang3.StringUtils;

import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonValue;
import com.google.common.collect.Multimap;

import datawave.query.Constants;
import datawave.query.jexl.JexlASTHelper;
import datawave.query.postprocessing.tf.PhraseIndexes;
import datawave.util.StringUtils;

/**
* Represents a set of fields that have been specified within an #EXCERPT_FIELDS function, as well as their corresponding target offsets that should be used to
@@ -51,7 +50,7 @@ public static ExcerptFields from(String string) {
             return null;
         }
         // Strip whitespaces.
-        string = StringUtils.deleteWhitespace(string);
+        string = PhraseIndexes.whitespacePattern.matcher(string).replaceAll("");
 
         if (string.isEmpty()) {
             return new ExcerptFields();
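
Note on the hunk above: the whitespace stripping moves from `StringUtils.deleteWhitespace` to a regex matcher. The definition of `PhraseIndexes.whitespacePattern` is not shown in this diff, so the sketch below assumes it is a precompiled `\s+` pattern. The class name, the constant, and the sample input are all hypothetical and only illustrate the technique.

```java
import java.util.regex.Pattern;

class WhitespaceStripSketch {
    // Hypothetical stand-in for PhraseIndexes.whitespacePattern (assumed to be \s+).
    // Compiling once and reusing the Pattern avoids recompiling the regex on every call.
    static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Removes every run of regex whitespace from the input string.
    static String strip(String input) {
        return WHITESPACE.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Illustrative input only; not taken from the commit.
        System.out.println(strip(" BODY / 10 , CONTENT / 5 ")); // prints BODY/10,CONTENT/5
    }
}
```

If the real pattern is indeed `\s+`, the two approaches may differ on some inputs: by default `\s` matches only space, tab, newline, vertical tab, form feed, and carriage return, while `deleteWhitespace` relies on `Character.isWhitespace`, which also covers several Unicode space characters.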
@@ -202,9 +201,9 @@ public void deconstructFields() {
      */
     public void expandFields(Multimap<String,String> model) {
         SortedMap<String,SortedMap<Integer,String>> expandedMap = new TreeMap<>();
-        for (String field : fieldMap.keySet()) {
-            SortedMap<Integer,String> offset = fieldMap.get(field);
-            field = field.toUpperCase();
+        for (Map.Entry<String,SortedMap<Integer,String>> entry : fieldMap.entrySet()) {
+            String field = entry.getKey().toUpperCase();
+            SortedMap<Integer,String> offset = entry.getValue();
             // Add the expanded fields.
             if (model.containsKey(field)) {
                 for (String expandedField : model.get(field)) {
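
Note on the `expandFields` hunk: the loop switches from iterating `keySet()` and calling `get(field)` to iterating `entrySet()`. The result appears to be the same, since the old code also fetched the value before upper-casing the key, but the new form avoids a second map lookup per entry and no longer reassigns the loop variable. A minimal, self-contained sketch of the same pattern follows; `EntrySetSketch` and `upperCaseKeys` are hypothetical names, and the value type is simplified to `Integer`.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class EntrySetSketch {
    // Hypothetical helper, not part of ExcerptFields: copies a map while upper-casing its keys.
    static SortedMap<String,Integer> upperCaseKeys(Map<String,Integer> source) {
        SortedMap<String,Integer> result = new TreeMap<>();
        // entrySet() hands back key and value together, so no get(key) call is needed.
        for (Map.Entry<String,Integer> entry : source.entrySet()) {
            result.put(entry.getKey().toUpperCase(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String,Integer> offsets = new TreeMap<>();
        offsets.put("body", 10);
        offsets.put("content", 5);
        System.out.println(upperCaseKeys(offsets)); // prints {BODY=10, CONTENT=5}
    }
}
```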

Large diffs are not rendered by default.
