
Support torch validation and add to spmd tests #3003

Draft

ethanglaser wants to merge 4 commits into uxlfoundation:main from ethanglaser:dev/eglaser-add-torch-spmd-validation

Conversation

@ethanglaser
Contributor

Description


Checklist:

Completeness and readability

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with updates and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended the testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least a summary table with measured data, if performance change is expected.
  • I have provided justification why performance and/or quality metrics have changed or why changes are not expected.
  • I have extended the benchmarking suite and provided a corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

@ethanglaser
Contributor Author

/intelci: run

@codecov

codecov bot commented Mar 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag Coverage Δ
azure 79.63% <ø> (ø)

Flags with carried forward coverage won't be shown.


        xp = array_api_modules[target_df]
        return xp.asarray(obj)
    elif target_df == "torch":
        if hasattr(torch, "xpu") and torch.xpu.is_available():
Contributor


This should probably be documented to prevent confusion.

@icfaust
Contributor

icfaust commented Mar 7, 2026

I assume the dpc runtime mismatch issue between torch and dpnp is solved? otherwise this may make things difficult. good addition

@david-cortes-intel
Contributor

/intelci: run

@david-cortes-intel
Contributor

The CI error:

        elif target_df == "torch":
            if hasattr(torch, "xpu") and torch.xpu.is_available():
                return torch.as_tensor(obj, device="xpu", *args, **kwargs)
            else:
>               return torch.as_tensor(obj, device="cpu", *args, **kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E               TypeError: as_tensor(): argument 'dtype' must be torch.dtype, not type

@ethanglaser
Contributor Author

I assume the dpc runtime mismatch issue between torch and dpnp is solved? otherwise this may make things difficult. good addition

If installing both torch and dpnp without oneapi deps, there are no issues

        return xp.asarray(obj)
    elif target_df == "torch":
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            return torch.as_tensor(obj, device="xpu", *args, **kwargs)
Contributor


It seems we are testing np.float32/np.float64 dtypes in convert_to_dataframe, and torch does not accept these; maybe we need to convert to a torch dtype before passing to as_tensor.

@icfaust
Contributor

icfaust commented Mar 10, 2026

I assume the dpc runtime mismatch issue between torch and dpnp is solved? otherwise this may make things difficult. good addition

If installing both torch and dpnp without oneapi deps, there are no issues

The release schedules do not match, which caused previous issues in one of the oneAPI minor releases. This may break things for a bit at the 2026.0 release w.r.t. torch.

@ethanglaser
Contributor Author

CI Failure Triage Report

Run: 22786779906 | Date: 2026-03-06

Failed Jobs: WindowsNightly/venv Python3.10_Sklearn1.0, LinuxNightly [pytorch,numpy]-cpu test Python3.13_Sklearn1.6

Analysis

CI Failure Analysis

1. Failure Classification: PR-specific

2. Root Cause:

The PR introduces PyTorch tensor support but has dtype conversion incompatibilities between numpy/Python types and PyTorch's expected torch.dtype objects.

3. Evidence:

Linux failure pattern:

TypeError: as_tensor(): argument 'dtype' must be torch.dtype, not type

Windows failure pattern:

AssertionError: Regex pattern did not match.
Regex: 'Input contains infinity.'
Input: 'Input array located on a oneAPI device, but sklearnex installation does not have SYCL support.'

The Linux error shows PyTorch rejecting Python/numpy type objects where it expects torch.dtype. The Windows error suggests SYCL/oneAPI device handling issues when PyTorch tensors are processed without proper SYCL support.

4. Relevant Code Changes:

Primary issue in _dataframes_support.py:

def _convert_to_dataframe(obj, sycl_queue=None, target_df=None, *args, **kwargs):
    # ...
    elif target_df == "torch":
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            return torch.as_tensor(obj, device="xpu", *args, **kwargs)  # ← dtype issue here
        else:
            return torch.as_tensor(obj, device="cpu", *args, **kwargs)   # ← dtype issue here

The *args, **kwargs likely contain numpy dtype objects (like np.float64) that need conversion to PyTorch dtypes (like torch.float64).

Secondary issue:
Adding torch to test filters without ensuring SYCL compatibility:

get_dataframes_and_queues(dataframe_filter_="dpnp,dpctl,torch", device_filter_="gpu")

5. Recommendation:

Fix the dtype conversion in _convert_to_dataframe:

elif target_df == "torch":
    # Normalize numpy dtype specs (class, instance, or string) to torch.dtype.
    # A dict keyed on np.float32 etc. would miss np.dtype instances, whose
    # hashes differ from the scalar types'; np.dtype(...).name is robust and
    # matches torch's attribute names (e.g. "float64" -> torch.float64).
    if "dtype" in kwargs and not isinstance(kwargs["dtype"], torch.dtype):
        import numpy as np

        kwargs["dtype"] = getattr(torch, np.dtype(kwargs["dtype"]).name)

    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.as_tensor(obj, device="xpu", **kwargs)
    else:
        return torch.as_tensor(obj, device="cpu", **kwargs)

Additionally:

  • Consider adding SYCL support checks before enabling torch in GPU device tests
  • Test the conversion function with various numpy dtype inputs
  • Verify that the similar failures on main branch aren't masking this new issue

The failures are directly caused by this PR's PyTorch integration and require dtype handling fixes before merging.


Similar failures in recent runs

| Job | Source | Date | Matching Errors |
| --- | --- | --- | --- |
| WindowsNightly/venv Python3.10_Sklearn1.0 | main | 2026-03-20 | error: JSON report files failed to be produced. |
| WindowsNightly/venv Python3.10_Sklearn1.0 | main | 2026-03-20 | ValueError: Input array located on a oneAPI device, AssertionError: Regex pattern did not match. |
| WindowsNightly/venv Python3.10_Sklearn1.0 | main | 2026-03-20 | ValueError: Input array located on a oneAPI device, AssertionError: Regex pattern did not match. |
| WindowsNightly/venv Python3.10_Sklearn1.0 | renovate/sphinx-9.x | 2026-03-25 | error: JSON report files failed to be produced. |
| WindowsNightly/venv Python3.10_Sklearn1.0 | refactor_neighbor_array_api | 2026-03-25 | error: JSON report files failed to be produced. |
| WindowsNightly/venv Python3.10_Sklearn1.0 | renovate/pytest-9.x | 2026-03-25 | error: JSON report files failed to be produced. |

Generated by CI Triage Bot
