Use built in function to access special tokens and ids #16

@MarkusSagen

Description

The current implementation for mapping tokens to their ids caused problems when new words contained "token" in them, since we map from the vocab file all tokens containing the word "token". However, for (at least non-SentencePiece) tokenizers in Hugging Face transformers, there are already two attributes for this:

  • tokenizer.all_special_tokens
  • tokenizer.all_special_ids

Let's test and replace our implementation with these officially supported tokenizer attributes.

`def map_special_tokens_to_ids(`
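A minimal sketch of what the replacement could look like. It assumes only the two documented `PreTrainedTokenizer` attributes named above; the function name is taken from the snippet in this issue, and the `DummyTokenizer` class is a hypothetical stand-in so the example runs without downloading a real tokenizer.

```python
def map_special_tokens_to_ids(tokenizer):
    """Map each special token string to its id using the
    officially supported tokenizer attributes."""
    # Both attributes are parallel lists, so zipping them
    # pairs each special token with its id.
    return dict(zip(tokenizer.all_special_tokens, tokenizer.all_special_ids))


class DummyTokenizer:
    # Hypothetical stand-in exposing the same two attributes as a
    # Hugging Face tokenizer; the token/id values are illustrative.
    all_special_tokens = ["[CLS]", "[SEP]", "[PAD]", "[UNK]"]
    all_special_ids = [101, 102, 0, 100]


mapping = map_special_tokens_to_ids(DummyTokenizer())
```

With a real tokenizer, `tokenizer = AutoTokenizer.from_pretrained(...)` would be passed in instead of the dummy object.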

Metadata

Labels

  • bug: Something isn't working
  • enhancement: New feature or request
  • feature request: New feature or functionality wanted
