[enhancement] refactor onedal to_table #2151

Merged: 13 commits, Nov 11, 2024

icfaust (Contributor) commented Nov 5, 2024

Description

These changes simplify the interface of to_table and fix circular import problems encountered in #2126 (onedal.validation is used throughout, so adding onedal's assert_all_finite there is not possible). The dpep_helpers are deliberately avoided for now: importing onedal.utils._dpep_helpers first imports onedal.utils, whose __init__.py imports parts of onedal.utils.validation, which creates a circular import, so that import has to be done carefully. A follow-up PR with an associated ticket will revert this change in the future.
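
One common way to break an import cycle like the one described above is to defer one of the imports to call time. The following is a minimal sketch, assuming two hypothetical modules (`demo_validation` and `demo_helpers`, which are not onedal's actual files); it illustrates the general technique and is not necessarily the change made in this PR.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical module that needs a helper from the other module.
VALIDATION_SRC = """
def assert_all_finite(values):
    # Deferred import: resolved when the function is called, by which
    # time this module is already fully initialized.
    from demo_helpers import is_finite
    return all(is_finite(v) for v in values)
"""

# Hypothetical helper module that imports the first module at top level.
HELPERS_SRC = """
import math
import demo_validation  # closes the cycle

def is_finite(value):
    return math.isfinite(value)
"""


def build_demo():
    """Write both modules to a temporary directory and import the first."""
    tmp = Path(tempfile.mkdtemp())
    (tmp / "demo_validation.py").write_text(textwrap.dedent(VALIDATION_SRC))
    (tmp / "demo_helpers.py").write_text(textwrap.dedent(HELPERS_SRC))
    sys.path.insert(0, str(tmp))
    return importlib.import_module("demo_validation")


validation = build_demo()
result = validation.assert_all_finite([1.0, 2.5])
```

Because the helper is only imported inside the function body, neither module depends on the other being fully loaded at import time.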


PR should start as a draft, then move to ready for review state after CI is passed and all applicable checkboxes are closed.
This approach ensures that reviewers don't spend extra time asking for regular requirements.

You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way.
For example, PR with docs update doesn't require checkboxes for performance while PR with any change in actual code should have checkboxes and justify how this code change is expected to affect performance (or justification should be self-evident).

Checklist to comply with before moving PR from draft:

PR completeness and readability

  • I have reviewed my changes thoroughly before submitting this pull request.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have updated the documentation to reflect the changes or created a separate PR with update and provided its number in the description, if necessary.
  • Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
  • I have added a respective label(s) to PR if I have a permission for that.
  • I have resolved any merge conflicts that might occur with the base branch.

Testing

  • I have run it locally and tested the changes extensively.
  • All CI jobs are green or I have provided justification why they aren't.
  • I have extended testing suite if new functionality was introduced in this PR.

Performance

  • I have measured performance for affected algorithms using scikit-learn_bench and provided at least summary table with measured data, if performance change is expected.
  • I have provided justification why performance has changed or why changes are not expected.
  • I have provided justification why quality metrics have changed or why changes are not expected.
  • I have extended benchmarking suite and provided corresponding scikit-learn_bench PR if new measurable functionality was introduced in this PR.

@icfaust icfaust marked this pull request as draft November 5, 2024 13:13
@icfaust icfaust marked this pull request as ready for review November 5, 2024 13:18

icfaust commented Nov 5, 2024

/intelci: run


icfaust commented Nov 5, 2024

/intelci: run

arrays, sycl_usm_ndarrays, and scalars. Tables will use pointers to the
original array data. Scalars will be copies. Arrays may be modified in-
place by oneDAL during computation. This works for data located on CPU and
SYCL-enabled Intel GPUs. Only singular datatypes are allowed.
Contributor

The term "singular datatype" looks strange to me. Could we use a more understandable wording, such as "numeric data types" or "integer and floating point data types"?
Or maybe I misunderstood what is meant here.

Contributor Author

Thank you for the suggestion. I have modified it to be clearer.
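
The docstring under review in this thread says that tables point at the original array data while scalars are copied. As a rough illustration of what that sharing implies, here is a minimal sketch that uses plain NumPy as a stand-in (an assumption; it does not call onedal's to_table): a zero-copy view sees in-place modifications, while an explicit copy does not.

```python
import numpy as np

# Original data owned by the caller.
data = np.arange(6, dtype=np.float64).reshape(3, 2)

# Zero-copy view: shares the buffer with `data`, similar in spirit to a
# table that holds a pointer to the original array.
view = np.asarray(data)

# Explicit copy: independent buffer, similar in spirit to how scalars
# are described as being copied.
copy = np.array(data, copy=True)

# An in-place modification through the view is visible in the original,
# but not in the independent copy.
view[0, 0] = 42.0

shares_buffer = np.shares_memory(data, view)
copy_is_independent = not np.shares_memory(data, copy)
```

This is why the docstring warns that arrays may be modified in place during computation: anything written through the shared buffer is visible to the caller.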

@icfaust icfaust requested a review from Vika-F November 6, 2024 11:48
@Vika-F Vika-F left a comment

LGTM

import dpnp

def _onedal_gpu_table_to_array(table, xp=None):
Contributor

Why does the name mention only GPU? Won't this work with tables on CPU as well?

Contributor Author

Good call, changed to _table_to_array instead.

Comment on lines +81 to +85
#ifdef ONEDAL_DATA_PARALLEL
    if (py::hasattr(obj, "__sycl_usm_array_interface__")) {
        return convert_from_sua_iface(obj);
    }
#endif // ONEDAL_DATA_PARALLEL
Contributor

👍
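
The C++ snippet above dispatches on the presence of the `__sycl_usm_array_interface__` attribute. The same protocol-based dispatch can be sketched in Python as follows. This is only an illustration under stated assumptions: `pick_conversion_path`, its return labels, and `FakeUsmArray` are hypothetical names, and the ordering of the non-SYCL branches is a guess, not the project's actual logic.

```python
import numpy as np


def pick_conversion_path(obj):
    """Return a label for the conversion path a given input would take.

    Hypothetical helper: only the attribute check mirrors the reviewed code.
    """
    # Objects exposing the SYCL USM array protocol are checked first,
    # as in the reviewed C++ snippet.
    if hasattr(obj, "__sycl_usm_array_interface__"):
        return "sycl_usm"
    # Host arrays (assumed branch).
    if isinstance(obj, np.ndarray):
        return "numpy"
    # Python or NumPy scalars (assumed branch).
    if np.isscalar(obj):
        return "scalar"
    raise TypeError(f"unsupported input type: {type(obj).__name__}")


class FakeUsmArray:
    """Stand-in object that only advertises the protocol attribute."""

    __sycl_usm_array_interface__ = {"shape": (2,), "typestr": "<f8"}


paths = [
    pick_conversion_path(FakeUsmArray()),
    pick_conversion_path(np.zeros(3)),
    pick_conversion_path(1.5),
]
```

Checking for the attribute rather than a concrete class means any library that implements the protocol is accepted without importing that library.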

onedal/datatypes/_data_conversion.py
# investigate why dpnp.array(table, copy=False) doesn't work.
# Work around with using dpctl.tensor.asarray.
if xp == dpnp:
    return dpnp.array(dpnp.dpctl.tensor.asarray(table), copy=False)
Contributor

I already asked this on #2126, but let me ask again here.
Is the dpctl module exposed through dpnp an official part of its API? This may work now, but I am not sure it will in future versions of DPNP. It is worth adding a comment here.

Contributor Author

Good question (see https://github.com/IntelPython/dpnp/blob/master/dpnp/__init__.py#L38 ). It looks like they haven't figured out how to use os.add_dll_directory and are importing dpctl to get its file path and add it to the environment path. So as long as they need to find dpctl's shared objects, they will do it this way.
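
If relying on the `dpnp.dpctl` attribute ever becomes a concern, one alternative is to import `dpctl.tensor` directly and guard both optional imports. The following is a minimal sketch under that assumption; `to_namespace_array` is a hypothetical helper, not the function in this PR, and it falls back to NumPy when dpnp or dpctl are not installed.

```python
import numpy as np

try:
    # Import dpctl.tensor directly instead of going through dpnp.dpctl.
    import dpctl.tensor as dpt
    import dpnp
except ImportError:
    # Optional dependencies: fall back to host-only behaviour.
    dpt = None
    dpnp = None


def to_namespace_array(obj, xp=None):
    """Convert `obj` to an array of the requested namespace (hypothetical)."""
    if dpnp is not None and xp is dpnp:
        # Same workaround as in the reviewed code: go through
        # dpctl.tensor.asarray first, then wrap without copying.
        return dpnp.array(dpt.asarray(obj), copy=False)
    if xp is None:
        return np.asarray(obj)
    return xp.asarray(obj)


host = np.arange(3)
same_namespace = to_namespace_array(host, xp=np)
default_namespace = to_namespace_array([1, 2, 3])
```

With the guard in place the module still imports on machines without the data parallel extensions, and the dpnp branch is only taken when that namespace is explicitly requested.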

@icfaust icfaust requested a review from samir-nasibli November 7, 2024 22:51
@samir-nasibli samir-nasibli left a comment

Thank you, @icfaust! Looks better to me.

@icfaust icfaust merged commit e2d7add into uxlfoundation:main Nov 11, 2024
24 of 26 checks passed