
Support torch validation and add to spmd tests #3003

Draft

ethanglaser wants to merge 4 commits into uxlfoundation:main from ethanglaser:dev/eglaser-add-torch-spmd-validation

Conversation

@ethanglaser
Contributor

Description


Checklist:

Completeness and readability

  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with updates and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended the testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least a summary table with measured data, if performance change is expected.
  • I have provided justification why performance and/or quality metrics have changed or why changes are not expected.
  • I have extended the benchmarking suite and provided a corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

@ethanglaser
Contributor Author

/intelci: run

@codecov

codecov bot commented Mar 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag Coverage Δ
azure 79.63% <ø> (ø)

Flags with carried forward coverage won't be shown.


        xp = array_api_modules[target_df]
        return xp.asarray(obj)
    elif target_df == "torch":
        if hasattr(torch, "xpu") and torch.xpu.is_available():
Contributor


This should probably be documented to prevent confusion.

@icfaust
Contributor

icfaust commented Mar 7, 2026

I assume the dpc runtime mismatch issue between torch and dpnp is solved? otherwise this may make things difficult. good addition

@david-cortes-intel
Contributor

/intelci: run

@david-cortes-intel
Contributor

The CI error:

        elif target_df == "torch":
            if hasattr(torch, "xpu") and torch.xpu.is_available():
                return torch.as_tensor(obj, device="xpu", *args, **kwargs)
            else:
>               return torch.as_tensor(obj, device="cpu", *args, **kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E               TypeError: as_tensor(): argument 'dtype' must be torch.dtype, not type

@ethanglaser
Contributor Author

I assume the dpc runtime mismatch issue between torch and dpnp is solved? otherwise this may make things difficult. good addition

If installing both torch and dpnp without oneapi deps, there are no issues

        return xp.asarray(obj)
    elif target_df == "torch":
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            return torch.as_tensor(obj, device="xpu", *args, **kwargs)
Contributor


It seems we are testing np.float32/np.float64 dtypes in convert_to_dataframe, and torch does not accept these; maybe we need to convert to a torch dtype before passing to as_tensor.

@icfaust
Contributor

icfaust commented Mar 10, 2026

I assume the dpc runtime mismatch issue between torch and dpnp is solved? otherwise this may make things difficult. good addition

If installing both torch and dpnp without oneapi deps, there are no issues

The release schedules do not match, which caused previous issues in one of the oneAPI minor releases. This may break things for a bit at the 2026.0 release w.r.t. torch.

@ethanglaser
Contributor Author

CI Failure Triage Report

Run: 22786779906 | Date: 2026-03-06

Failed Jobs: WindowsNightly/venv Python3.10_Sklearn1.0, LinuxNightly [pytorch,numpy]-cpu test Python3.13_Sklearn1.6

Analysis

CI Failure Analysis

1. Failure Classification: PR-specific

2. Root Cause:

The PR introduces PyTorch tensor support but has dtype conversion incompatibilities between numpy/Python types and PyTorch's expected torch.dtype objects.

3. Evidence:

Linux failure pattern:

TypeError: as_tensor(): argument 'dtype' must be torch.dtype, not type

Windows failure pattern:

AssertionError: Regex pattern did not match.
Regex: 'Input contains infinity.'
Input: 'Input array located on a oneAPI device, but sklearnex installation does not have SYCL support.'

The Linux error shows PyTorch rejecting Python/numpy type objects where it expects torch.dtype. The Windows error suggests SYCL/oneAPI device handling issues when PyTorch tensors are processed without proper SYCL support.

4. Relevant Code Changes:

Primary issue in _dataframes_support.py:

def _convert_to_dataframe(obj, sycl_queue=None, target_df=None, *args, **kwargs):
    # ...
    elif target_df == "torch":
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            return torch.as_tensor(obj, device="xpu", *args, **kwargs)  # ← dtype issue here
        else:
            return torch.as_tensor(obj, device="cpu", *args, **kwargs)   # ← dtype issue here

The *args, **kwargs likely contain numpy dtype objects (like np.float64) that need conversion to PyTorch dtypes (like torch.float64).

Secondary issue:
Adding torch to test filters without ensuring SYCL compatibility:

get_dataframes_and_queues(dataframe_filter_="dpnp,dpctl,torch", device_filter_="gpu")

5. Recommendation:

Fix the dtype conversion in _convert_to_dataframe:

elif target_df == "torch":
    # Normalize numpy dtype specs (class, instance, or string) to torch.dtype.
    # A dict keyed on np.float32 etc. would miss np.dtype instances, whose
    # hashes differ from the scalar types'; np.dtype(...).name is robust and
    # matches torch's attribute names (e.g. "float64" -> torch.float64).
    if "dtype" in kwargs and not isinstance(kwargs["dtype"], torch.dtype):
        import numpy as np

        kwargs["dtype"] = getattr(torch, np.dtype(kwargs["dtype"]).name)

    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return torch.as_tensor(obj, device="xpu", **kwargs)
    else:
        return torch.as_tensor(obj, device="cpu", **kwargs)

Additionally:

  • Consider adding SYCL support checks before enabling torch in GPU device tests
  • Test the conversion function with various numpy dtype inputs
  • Verify that the similar failures on main branch aren't masking this new issue

The failures are directly caused by this PR's PyTorch integration and require dtype handling fixes before merging.


Similar failures in recent runs

| Job | Source | Date | Matching Errors |
| --- | --- | --- | --- |
| WindowsNightly/venv Python3.10_Sklearn1.0 | main | 2026-03-20 | error: JSON report files failed to be produced. |
| WindowsNightly/venv Python3.10_Sklearn1.0 | main | 2026-03-20 | ValueError: Input array located on a oneAPI device, AssertionError: Regex pattern did not match. |
| WindowsNightly/venv Python3.10_Sklearn1.0 | main | 2026-03-20 | ValueError: Input array located on a oneAPI device, AssertionError: Regex pattern did not match. |
| WindowsNightly/venv Python3.10_Sklearn1.0 | renovate/sphinx-9.x | 2026-03-25 | error: JSON report files failed to be produced. |
| WindowsNightly/venv Python3.10_Sklearn1.0 | refactor_neighbor_array_api | 2026-03-25 | error: JSON report files failed to be produced. |
| WindowsNightly/venv Python3.10_Sklearn1.0 | renovate/pytest-9.x | 2026-03-25 | error: JSON report files failed to be produced. |

Generated by CI Triage Bot
