fix: catch TypeError instead of ValueError in sent_tokenize #1375

Open
phoneee wants to merge 1 commit into PyThaiNLP:dev from phoneee:fix/sent-tokenize-exception-type
Conversation

@phoneee (Contributor) commented Mar 29, 2026

What do these changes do

Fix sent_tokenize raising an uncaught TypeError when a list input contains non-string items. The existing code catches ValueError, but str.join() raises TypeError for non-string items, so the exception was never handled.

Fixes #1373
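
The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not PyThaiNLP's actual code: `join_guarded` is a hypothetical stand-in that joins a list and returns `[]` when the named exception type is raised, which is assumed here to mirror the fallback the PR is aiming for.

```python
def join_guarded(items, caught):
    """Hypothetical stand-in (not library code).

    Join `items` into one string and wrap it in a list; return []
    if an exception of type `caught` is raised while joining.
    """
    try:
        return ["".join(items)]
    except caught:
        return []


mixed = ["ฉันไป", 123]  # list input containing a non-string item

# str.join() raises TypeError for the int, so a TypeError handler works.
print(join_guarded(mixed, TypeError))

# A ValueError handler does not match, so the TypeError propagates.
try:
    join_guarded(mixed, ValueError)
except TypeError as err:
    print("escaped the ValueError handler:", type(err).__name__)
```

Whether returning `[]` is the right behavior for such input is a separate design question from which exception type to catch.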

  • Passed code styles and structures
  • Passed code linting checks and unit tests

@bact bact added the bug bugs in the library label Mar 29, 2026
@bact bact added this to PyThaiNLP Mar 29, 2026
        )
        with self.assertRaises(ValueError):
            sent_tokenize("ฉันไป กิน", engine="XX")  # engine does not exist
        # Reproduce: list with non-string items should return []
Member

Not sure if this is the expected behavior.

@wannaphong thoughts?

Labels

bug bugs in the library

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

bug: word_detokenize and sent_tokenize crash on edge-case list inputs

2 participants