Sorry for the late reply.
It's a naive filter for non-words. You could check all the characters instead, but then words like "U.S.A." might be filtered out. Usually it won't affect the outcome...
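To make the trade-off concrete, here is a minimal sketch comparing the first-character check with a stricter all-characters check. The class name `WordFilterSketch` and the helper names are hypothetical and not part of the repository; the null/empty guard is also an addition for safety, not something the original method does.

```java
// Hypothetical sketch, not the project's actual Tokenizer code.
class WordFilterSketch {

    // True if c is an ASCII letter (A-Z or a-z).
    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Naive filter: only the first character is inspected.
    // The null/empty guard is an assumption added here to avoid an exception.
    static boolean startsWithLetter(String word) {
        if (word == null || word.isEmpty()) return false;
        return isAsciiLetter(word.charAt(0));
    }

    // Stricter filter: every character must be an ASCII letter,
    // so tokens containing '.' or digits are rejected.
    static boolean isAllLetters(String word) {
        if (word == null || word.isEmpty()) return false;
        for (int i = 0; i < word.length(); i++) {
            if (!isAsciiLetter(word.charAt(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String[] samples = {"hello", "U.S.A.", "abc123", "42"};
        for (String s : samples) {
            System.out.println(s + " first-char=" + startsWithLetter(s)
                    + " all-chars=" + isAllLetters(s));
        }
    }
}
```

With the stricter check, "U.S.A." is rejected because of the periods, while the first-character check keeps it. The flip side is that the first-character check also keeps tokens such as "abc123".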
Hey, I'm just wondering why only the first character of the word is checked in Tokenizer.java?
public boolean ifWords_Eng(String tmpWord)
{
    if (tmpWord.charAt(0)>='A' && tmpWord.charAt(0)<='Z') return true;
    if (tmpWord.charAt(0)>='a' && tmpWord.charAt(0)<='z') return true;
    return false;
}