Exception in training (models/matcher.py) #798

@kianwei96

Search before asking

  • I have searched the RF-DETR issues and found no similar bug report.

Bug

This exception was raised midway through a long, previously stable training run using the nano model configuration.

  File "C:\Users\_\repos\_\train_det_rfdetr.py", line 53, in main
    model.train(
  File "C:\Users\_\miniconda3\envs\_\Lib\site-packages\rfdetr\detr.py", line 105, in train
    self.train_from_config(config, **kwargs)
  File "C:\Users\_\miniconda3\envs\_\Lib\site-packages\rfdetr\detr.py", line 281, in train_from_config
    self.model.train(
  File "C:\Users\_\miniconda3\envs\_\Lib\site-packages\rfdetr\main.py", line 416, in train
    train_stats = train_one_epoch(
  File "C:\Users\_\miniconda3\envs\_\Lib\site-packages\rfdetr\engine.py", line 187, in train_one_epoch
    loss_dict = criterion(outputs, new_targets)
  File "C:\Users\_\miniconda3\envs\_\Lib\site-packages\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\_\miniconda3\envs\_\Lib\site-packages\torch\nn\modules\module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\_\miniconda3\envs\_\Lib\site-packages\rfdetr\models\lwdetr.py", line 706, in forward
    indices = self.matcher(outputs_without_aux, targets, group_detr=group_detr)
  File "C:\Users\_\miniconda3\envs\_\Lib\site-packages\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\_\miniconda3\envs\_\Lib\site-packages\torch\nn\modules\module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\_\miniconda3\envs\_\Lib\site-packages\torch\utils\_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\_\miniconda3\envs\_\Lib\site-packages\rfdetr\models\matcher.py", line 186, in forward
    indices_g = [linear_sum_assignment(c[i]) for i, c in enumerate(C_g.split(sizes, -1))]
  File "C:\Users\_\miniconda3\envs\_\Lib\site-packages\rfdetr\models\matcher.py", line 186, in <listcomp>
    indices_g = [linear_sum_assignment(c[i]) for i, c in enumerate(C_g.split(sizes, -1))]

ValueError: matrix contains invalid numeric entries
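
For context, SciPy's `linear_sum_assignment` raises this `ValueError` when the cost matrix contains NaN or negative infinity. The sketch below reproduces the message on a toy matrix and shows one possible guard. It assumes the bad entries in `C_g` are NaN or inf values, which the traceback alone does not confirm. The helper `safe_linear_sum_assignment` and its `fill_value` parameter are hypothetical names for illustration and are not part of rfdetr.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def safe_linear_sum_assignment(cost, fill_value=1e6):
    """Hypothetical guard: replace non-finite costs before matching.

    Non-finite entries (NaN, +inf, -inf) are swapped for a large finite
    cost so the affected pairs are effectively never selected.
    """
    cost = np.asarray(cost, dtype=np.float64)
    finite = np.isfinite(cost)
    if not finite.all():
        cost = np.where(finite, cost, fill_value)
    return linear_sum_assignment(cost)


# Toy cost matrix with one NaN entry (not taken from the actual run).
cost = np.array([[1.0, np.nan],
                 [2.0, 0.5]])

# The unguarded call fails with the same message as in the traceback.
error_message = None
try:
    linear_sum_assignment(cost)
except ValueError as exc:
    error_message = str(exc)

# The guarded call succeeds and avoids the NaN cell.
rows, cols = safe_linear_sum_assignment(cost)
print(error_message)
print(rows.tolist(), cols.tolist())
```

A guard like this only hides the symptom. If the model outputs or losses have already become NaN (for example after a diverging step), the underlying cause still needs to be found, so logging which batch first produces non-finite costs is likely more useful than masking them.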

Environment

Using rfdetr==1.5.2 and torch==2.10.0+cu130 on Python 3.11, Windows 11, with an RTX 5090.

Minimal Reproducible Example

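from rfdetr import RFDETRNano

# Values shown as [     ] were redacted by the reporter.
# `config` is a dict defined elsewhere in the reporter's script.
model = RFDETRNano()
model.train(
    dataset_dir=[     ],
    epochs=150,
    batch_size=16,
    grad_accum_steps=4,
    lr=config['learning_rate'],
    resolution=384,
    output_dir=[     ],
    early_stopping=True,
    early_stopping_patience=20,
    device='cuda',
    wandb=True,
    project=[     ],
)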

Additional

No response

Are you willing to submit a PR?

  • Yes, I'd like to help by submitting a PR!

Labels: bug (Something isn't working)