System Info
transformers==5.5.4, Python 3.12
vs
transformers==4.57.6, Python 3.12
Who can help?
@ArthurZucker @Cyrilvallez
Information
Tasks
Reproduction
This script:
from transformers import CLIPTokenizer
tok1 = CLIPTokenizer.from_pretrained("black-forest-labs/FLUX.1-Canny-dev", subfolder="tokenizer", local_files_only=True)
tok2 = CLIPTokenizer.from_pretrained("black-forest-labs/FLUX.1-Canny-dev", subfolder="tokenizer")
print(tok1.model_max_length, tok2.model_max_length)
assert tok1.model_max_length == tok2.model_max_length
- empty your ~/.cache/huggingface/hub cache folder
- run with transformers 4.57.6
- observe the expected behavior (the tokenizer fails to load and raises an exception)
- empty your ~/.cache/huggingface/hub cache folder again
- run with transformers 5.5.4
- observe the tokenizer stub being loaded without any exception
Expected behavior
The script above gives different behavior depending on the transformers version. With transformers==4.57.6:
Traceback (most recent call last):
File "/home/oleg.konin/mlapi-glm/test.py", line 3, in <module>
tok1 = CLIPTokenizer.from_pretrained("black-forest-labs/FLUX.1-Canny-dev", subfolder="tokenizer", local_files_only=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/oleg.konin/mlapi-glm/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2113, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "/home/oleg.konin/mlapi-glm/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2359, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/oleg.konin/mlapi-glm/.venv/lib/python3.12/site-packages/transformers/models/clip/tokenization_clip.py", line 306, in __init__
with open(vocab_file, encoding="utf-8") as vocab_handle:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected str, bytes or os.PathLike object, not NoneType
With transformers==5.5.4, tok1 loads silently as a stub (the downloads below belong to tok2):
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████| 705/705 [00:00<00:00, 4.32MB/s]
vocab.json: 1.06MB [00:00, 20.6MB/s]
merges.txt: 525kB [00:00, 7.86MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████| 588/588 [00:00<00:00, 6.32MB/s]
1000000000000000019884624838656 77
Traceback (most recent call last):
File "/home/oleg.konin/mlapi-glm/test.py", line 7, in <module>
assert tok1.model_max_length == tok2.model_max_length
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
In the second case, I think an exception should be thrown; otherwise it makes you think that the tokenizer files are already present on disk.
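For context on the huge number tok1 prints: when no `model_max_length` is found in `tokenizer_config.json`, transformers falls back to the sentinel `VERY_LARGE_INTEGER = int(1e30)` defined in `transformers.tokenization_utils_base`, which is exactly the value shown above. A hedged detection sketch (the helper is my own, not a transformers API):

```python
# Sentinel transformers uses when tokenizer_config.json supplies no
# model_max_length (transformers.tokenization_utils_base.VERY_LARGE_INTEGER).
VERY_LARGE_INTEGER = int(1e30)

def looks_like_stub(model_max_length: int) -> bool:
    """Heuristic: the sentinel max length suggests the config was never loaded."""
    return model_max_length >= VERY_LARGE_INTEGER

print(looks_like_stub(1000000000000000019884624838656))  # tok1's value -> True
print(looks_like_stub(77))                               # tok2's value -> False
```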