Please explain why this code:
from flashtext import KeywordProcessor

kwp = KeywordProcessor()
kwp.add_keywords_from_list(['111', '(2)'])
print(kwp.get_all_keywords())
aa = ' 111 (2) '
print(kwp.extract_keywords(aa))
returns [' 111 ', '(2)']
while almost the same code, with '111 ' instead of '111' in the keyword list (note the trailing space):
from flashtext import KeywordProcessor

kwp = KeywordProcessor()
kwp.add_keywords_from_list(['111 ', '(2)'])
print(kwp.get_all_keywords())
aa = ' 111 (2) '
print(kwp.extract_keywords(aa))
returns only one keyword ['111 ']
'(2)' is still in the keyword list and in the string being searched, yet the second snippet does not find it.
My guess is that you use some kind of tokenization, and I would like to synchronize the token pattern between flashtext and my custom procedure that generates the keyword set.
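A likely explanation, based on how flashtext's matching appears to work rather than on any documented guarantee: flashtext does not split the text into tokens. It walks the text character by character through a trie of keywords and treats every character outside KeywordProcessor.non_word_boundaries (by default, apparently ASCII letters, digits and '_') as a boundary. The boundary character that confirms the end of a match seems to be consumed. With '111' that character is the separating space, so '(2)' can still start at '('. With '111 ' the space belongs to the keyword, so the '(' right after it confirms the match and is consumed, and '(2)' can no longer start there. Below is a minimal sketch of a pre-processing step for a custom keyword generator, assuming that default boundary set; the names WORD_CHARS, ends_with_boundary and prepare_keywords are hypothetical and not part of flashtext.

```python
import string

# Assumption: mirrors flashtext's default word characters
# (KeywordProcessor().non_word_boundaries): ASCII letters, digits and '_'.
# Every other character (spaces, parentheses, ...) acts as a boundary.
WORD_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def ends_with_boundary(keyword, word_chars=WORD_CHARS):
    """Return True if the keyword's last character is a boundary character."""
    return bool(keyword) and keyword[-1] not in word_chars


def prepare_keywords(keywords, word_chars=WORD_CHARS):
    """Hypothetical clean-up step for a generated keyword list.

    Strips surrounding whitespace (the separator in the text should not be
    part of a keyword), drops empty strings and duplicates while keeping the
    original order, and reports keywords that still end in a boundary
    character, since those may consume the first character of a keyword
    that follows them immediately.
    """
    cleaned = []
    seen = set()
    for kw in keywords:
        kw = kw.strip()
        if kw and kw not in seen:
            seen.add(kw)
            cleaned.append(kw)
    risky = [kw for kw in cleaned if ends_with_boundary(kw, word_chars)]
    return cleaned, risky


if __name__ == "__main__":
    cleaned, risky = prepare_keywords(['111 ', '(2)'])
    print(cleaned)  # ['111', '(2)']
    print(risky)    # ['(2)']
```

The cleaned list can then be passed to kwp.add_keywords_from_list(cleaned). Keywords reported as risky (such as '(2)') may still miss an adjacent keyword that starts with a boundary character directly after them, e.g. '(2)(3)'.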