
CLIPTokenizer uses 10**30 as model_max_length #45538

@D1-3105

Description


System Info

transformers==5.5.4
python3.12

vs

transformers==4.57.6
python3.12

Who can help?

@ArthurZucker @Cyrilvallez

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

This script:

from transformers import CLIPTokenizer

# Load strictly from the local cache (no network access):
tok1 = CLIPTokenizer.from_pretrained("black-forest-labs/FLUX.1-Canny-dev", subfolder="tokenizer", local_files_only=True)

# Load with network access allowed (downloads the files if missing):
tok2 = CLIPTokenizer.from_pretrained("black-forest-labs/FLUX.1-Canny-dev", subfolder="tokenizer")
print(tok1.model_max_length, tok2.model_max_length)
assert tok1.model_max_length == tok2.model_max_length
  1. Empty your ~/.cache/huggingface/hub cache folder.
  2. Run the script with transformers 4.57.6.
  3. Observe the expected behavior: loading fails with an exception because the files are not on disk.
  4. Empty your ~/.cache/huggingface/hub cache folder again.
  5. Run the script with transformers 5.5.4.
  6. Observe that a stub tokenizer is loaded without any exception being raised.
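Until this is fixed, callers can guard against the silent stub themselves. The sketch below is a hypothetical workaround, not part of the transformers API: the sentinel value int(1e30) is an assumption inferred from the 1000000000000000019884624838656 printed for the stub tokenizer, and the helper name is made up.

```python
# Hypothetical guard against the silent-stub behavior: transformers appears
# to fall back to int(1e30) as model_max_length when no real limit is
# configured. The sentinel value is an assumption inferred from the output
# shown in this issue.
SENTINEL_MAX_LENGTH = int(1e30)

def check_max_length(tokenizer):
    """Raise if the tokenizer carries the sentinel (i.e. unusable) limit."""
    if tokenizer.model_max_length >= SENTINEL_MAX_LENGTH:
        raise ValueError(
            f"model_max_length is the sentinel {tokenizer.model_max_length}; "
            "the tokenizer config was probably not loaded correctly"
        )
    return tokenizer
```

With transformers 4.57.6, a correctly loaded CLIP tokenizer reports model_max_length == 77 and passes the guard; the 5.5.4 stub trips it.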

Expected behavior

The script behaves differently depending on the transformers version:

  • + transformers==4.57.6:
Traceback (most recent call last):
  File "/home/oleg.konin/mlapi-glm/test.py", line 3, in <module>
    tok1 = CLIPTokenizer.from_pretrained("black-forest-labs/FLUX.1-Canny-dev", subfolder="tokenizer", local_files_only=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oleg.konin/mlapi-glm/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2113, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/oleg.konin/mlapi-glm/.venv/lib/python3.12/site-packages/transformers/tokenization_utils_base.py", line 2359, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/oleg.konin/mlapi-glm/.venv/lib/python3.12/site-packages/transformers/models/clip/tokenization_clip.py", line 306, in __init__
    with open(vocab_file, encoding="utf-8") as vocab_handle:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: expected str, bytes or os.PathLike object, not NoneType
  • + transformers==5.5.4:
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████| 705/705 [00:00<00:00, 4.32MB/s]
vocab.json: 1.06MB [00:00, 20.6MB/s]
merges.txt: 525kB [00:00, 7.86MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████████████| 588/588 [00:00<00:00, 6.32MB/s]
1000000000000000019884624838656 77
Traceback (most recent call last):
  File "/home/oleg.konin/mlapi-glm/test.py", line 7, in <module>
    assert tok1.model_max_length == tok2.model_max_length
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

In the second case, an exception should be thrown; silently returning a stub makes it look as though the model is already present on disk.
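A side note on the number itself: the printed value is not exactly 10**30 but int(1e30), i.e. the nearest IEEE 754 double to 10**30 converted to an integer, which explains the odd trailing digits:

```python
# The stub's model_max_length equals int(1e30). The float literal 1e30 is
# the double closest to 10**30, not 10**30 itself, hence the trailing
# ...19884624838656 digits in the printed value.
sentinel = int(1e30)
print(sentinel)            # 1000000000000000019884624838656
print(sentinel == 10**30)  # False: doubles cannot represent 10**30 exactly
```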
