I've trained a SentencePiece model using Python, which produces a .model and a .vocab file. It is not possible to create a SentencePieceVocab from the latter, since Python does not seem to write it as protobuf but rather as a plain text file. Here's an excerpt of my file:
<unk> 0
<s> 0
</s> 0
▁ -2.29038
s -3.10405
l -3.41047
I didn't find an option in the Python code for creating a protobuf vocab file, so I wrote a parser. Unless I'm mistaken and did something wrong, would you like that code as a PR? I.e. something like:
impl SentencePieceVocab {
    ...
    pub fn from_vocab_txt_file(path: &str) -> Result<SentencePieceVocab, TokenizerError> {
        ...
    }
}
in rust-tokenizers/main/src/vocab/sentence_piece_vocab.rs
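For discussion, here is a minimal, standalone sketch of what such a parser might look like. It is not the code from my branch and does not touch the crate's types: `TextVocab`, `VocabParseError` and `parse_vocab_text` are hypothetical names made up for this example. It assumes each line holds a piece followed by whitespace (a tab, as far as I can tell) and a numeric score, and that a piece's id is simply its line position in the file.

```rust
use std::collections::HashMap;

/// Hypothetical error type for this sketch (not the crate's TokenizerError).
#[derive(Debug, PartialEq)]
pub enum VocabParseError {
    /// The line has no whitespace separating a piece from a score.
    MissingScore { line: usize },
    /// The last field on the line is not a valid float.
    InvalidScore { line: usize },
}

/// Hypothetical container: piece -> id, plus one score per id.
#[derive(Debug, Default)]
pub struct TextVocab {
    pub indices: HashMap<String, i64>,
    pub scores: Vec<f32>,
}

/// Parses the contents of a plain-text .vocab file.
/// Assumption: one "piece<whitespace>score" entry per line, id = entry order.
pub fn parse_vocab_text(text: &str) -> Result<TextVocab, VocabParseError> {
    let mut vocab = TextVocab::default();
    for (line_no, raw_line) in text.lines().enumerate() {
        let line = raw_line.trim_end();
        if line.is_empty() {
            continue; // skip blank lines; they do not consume an id
        }
        // Split at the LAST whitespace so the score is always the final field.
        let (piece, score) = line
            .rsplit_once(char::is_whitespace)
            .ok_or(VocabParseError::MissingScore { line: line_no + 1 })?;
        let score: f32 = score
            .parse()
            .map_err(|_| VocabParseError::InvalidScore { line: line_no + 1 })?;
        let id = vocab.scores.len() as i64;
        vocab.indices.insert(piece.trim_end().to_string(), id);
        vocab.scores.push(score);
    }
    Ok(vocab)
}
```

A `from_vocab_txt_file(path)` along the lines proposed above could then read the file with `std::fs::read_to_string` and map the result into whatever fields SentencePieceVocab actually needs; that mapping is left out here because it depends on the crate's internals.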