Dataclass becomes empty after running ray remote function #28838

Closed
pingsutw opened this issue Sep 28, 2022 · 9 comments · Fixed by #42730
Labels: api-bug (Bug in which APIs behavior is wrong), bug (Something that is supposed to be working; but isn't), core (Issues that should be addressed in Ray Core), P0 (Issues that should be fixed in short order)

Comments

@pingsutw
Contributor

What happened + What you expected to happen

We use dataclass_json.to_json to serialize a dataclass. Serialization fails because the dataclass value becomes empty, for some reason, after running a Ray remote function.

Versions / Dependencies

python==3.9
ray==1.13.0

Reproduction script

from dataclasses import dataclass
from typing import cast
from dataclasses_json import dataclass_json, DataClassJsonMixin
import ray


@dataclass_json
@dataclass
class Datum(object):
    x: int
    y: str


@ray.remote
def ray_create_datum(x: int, y: str):
    return Datum(x=x, y=y)


d = Datum(x=1, y='hello')

print("value", cast(DataClassJsonMixin, d).to_json()) # value {"x": 1, "y": "hello"}
ray.get(ray_create_datum.remote(x=1, y='hello'))
print("value", cast(DataClassJsonMixin, d).to_json()) # value {}

Issue Severity

High: It blocks me from completing my task.

@pingsutw pingsutw added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Sep 28, 2022
@hora-anyscale hora-anyscale added core Issues that should be addressed in Ray Core P1 Issue that should be fixed within a few weeks labels Dec 9, 2022
@rickyyx rickyyx added P0 Issues that should be fixed in short order and removed P1 Issue that should be fixed within a few weeks labels Dec 9, 2022
@rickyyx
Contributor

rickyyx commented Dec 9, 2022

Did some investigation and got stuck, but I will post my findings first.

I am able to reproduce it with current master and dataclasses-json==0.5.7 on Python 3.9.

A few conditions have to be met for this to reproduce:

  1. the remote function ray_create_datum has to return a Datum instance, be it nested or non-nested.
  2. ray.get() has to be called on the ray_create_datum obj ref (not running ray.get, or running ray.get on other things, will not surface the issue).
# This will not surface the issue
d = Datum(x=1, y="hello")
a = ray_create_datum.remote(x=2, y="hello")
ray.get(ray.put(1))  # unrelated object; the issue surfaces only with ray.get(a)
print("value", cast(DataClassJsonMixin, d).to_json())  # value {"x": 1, "y": "hello"}

Root cause (but why?)

After digging into the code a bit, I think this to_json call returns empty because, deep down in the serialization code, an identity check fails (thus failing to retrieve the fields).

https://github.com/python/cpython/blob/main/Lib/dataclasses.py#L1255

return tuple(f for f in fields.values() if f._field_type is _FIELD)

Here, when the issue surfaces, d's fields still report the _FIELD type, but their _field_type is a different object from the module-level _FIELD, so the `is` check fails. It seems that during ray.get() we do some "re-initialization" that messes up other objects here.

Before ray.get

  File "/data/home/rickyx/anaconda3/envs/py3.9/lib/python3.9/dataclasses.py", line 1101, in fields
    traceback.print_stack()
d[x]._field_type=_FIELD
id(d[x]._field_type)=140585947232048
id(_FIELD)=140585947232048

After ray.get

  File "/data/home/rickyx/anaconda3/envs/py3.9/lib/python3.9/dataclasses.py", line 1101, in fields
    traceback.print_stack()
d[x]._field_type=_FIELD
id(d[x]._field_type)=140585824886208
id(_FIELD)=140585947232048
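
The effect of that identity mismatch can be shown without Ray. The following is a minimal sketch, assuming CPython's private dataclasses internals (`dataclasses._FIELD`, `__dataclass_fields__`, `Field._field_type`); it deliberately swaps in a copy of the sentinel and makes no claim about how the copy arises inside Ray, which this thread does not establish.

```python
import copy
import dataclasses
from dataclasses import dataclass, fields


@dataclass
class Datum:
    x: int
    y: str


d = Datum(x=1, y="hello")
names_before = [f.name for f in fields(d)]

# Assumption: copy.copy is used only to build a second object that prints
# as "_FIELD" but is not the module-level sentinel.
clone = copy.copy(dataclasses._FIELD)

# Point every field record at the clone instead of the real sentinel.
for record in Datum.__dataclass_fields__.values():
    record._field_type = clone

# fields() filters with "f._field_type is _FIELD", so nothing passes now,
# even though the instance attributes themselves are untouched.
names_after = [f.name for f in fields(d)]

print(names_before, names_after, vars(d))
```

Any serializer built on `dataclasses.fields()` would then see an empty field list, which matches the empty JSON in the report.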

@rickyyx
Contributor

rickyyx commented Dec 9, 2022

Moving this from P0 back to P1 since I think it's a pretty narrow edge case.

@rickyyx rickyyx added P1 Issue that should be fixed within a few weeks and removed P0 Issues that should be fixed in short order labels Dec 9, 2022
@pingsutw
Contributor Author

Thank you for digging into it. I think we need to somehow revert the change to the dataclass after the Ray task completes.

@rkooo567 rkooo567 removed the triage Needs triage (eg: priority, bug/not-bug, and owning component) label Dec 12, 2022
@rkooo567 rkooo567 added api-bug Bug in which APIs behavior is wrong P1.5 Issues that will be fixed in a couple releases. It will be bumped once all P1s are cleared and removed P1 Issue that should be fixed within a few weeks labels Mar 24, 2023
@amit9oct

Facing the exact same issue. Serializing members of a class (using the 'dataclasses' library) does not work when methods of the class use 'ray' annotations. If this is a rare scenario, there must be some way to circumvent it, but I don't know how as of now.

@guseggert

I am also encountering this issue. My specific scenario is to use dataclasses to pass data around between tasks and then eventually pass them into pandas to construct a dataframe. I can work around this for now by using dataclass_instance.__dict__.
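
A minimal sketch of that `__dict__` workaround, using only the standard library. The helper name `to_plain_dict` is hypothetical, and the approach is shallow: it assumes a regular dataclass (no `slots=True`) and does not recurse into nested dataclasses.

```python
import json
from dataclasses import dataclass


@dataclass
class Datum:
    x: int
    y: str


def to_plain_dict(instance) -> dict:
    """Hypothetical helper: copy the instance attributes without calling
    dataclasses.fields(), so it does not depend on the field sentinel."""
    return dict(vars(instance))


d = Datum(x=1, y="hello")
payload = json.dumps(to_plain_dict(d))
print(payload)
```

The resulting dictionaries can be passed to anything that accepts plain mappings, at the cost of bypassing whatever encoding rules a dataclass-aware serializer would apply.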

@pingsutw
Contributor Author

pingsutw commented Jul 7, 2023

Hi, many customers run into this issue in production. Does anyone know how to resolve it?

@rickyyx
Contributor

rickyyx commented Jul 7, 2023

Hey all - thanks for reporting, I will have time to look into this again.

@alvitawa

alvitawa commented Aug 30, 2023

I have a possibly related issue: Simply initializing a dataclass inside of a ray trainable yields a dataclass with empty fields() (but otherwise the right attributes).

@dataclass
class MainConfig:
    seed: int = 42

Inside the ray tune:

(train_tune pid=230870) (Pdb) fields(MainConfig())
(train_tune pid=230870) ()

So in my case the dataclass breaks before even being serialized. This could also explain the serialization not working.
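
To check whether a given dataclass is in this state, a small diagnostic along these lines might help. It is a sketch that assumes CPython's private dataclasses internals (`dataclasses._FIELD`, `__dataclass_fields__`, `Field._field_type`); the helper name `stale_field_names` is hypothetical.

```python
import dataclasses
from dataclasses import dataclass


def stale_field_names(cls) -> list:
    """Hypothetical helper: names of fields whose marker prints as _FIELD
    but is not the sentinel of the currently imported dataclasses module."""
    records = getattr(cls, "__dataclass_fields__", {})
    return [
        name
        for name, record in records.items()
        if repr(record._field_type) == "_FIELD"
        and record._field_type is not dataclasses._FIELD
    ]


@dataclass
class MainConfig:
    seed: int = 42


# In a healthy process this list is empty; a non-empty list means
# dataclasses.fields() will silently skip those fields.
healthy = stale_field_names(MainConfig)
print(healthy)
```

Running it inside the affected worker (for example from the same pdb session) would show whether `fields()` is empty because of the sentinel mismatch or for another reason.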

@rickyyx rickyyx added P1 Issue that should be fixed within a few weeks and removed P1.5 Issues that will be fixed in a couple releases. It will be bumped once all P1s are cleared labels Jan 11, 2024
@rkooo567 rkooo567 added P0 Issues that should be fixed in short order and removed P1 Issue that should be fixed within a few weeks labels Jan 17, 2024
@anyscalesam anyscalesam added P1 Issue that should be fixed within a few weeks and removed P0 Issues that should be fixed in short order labels Jan 25, 2024
@anyscalesam anyscalesam removed the P1 Issue that should be fixed within a few weeks label Jan 26, 2024
@anyscalesam anyscalesam added the P0 Issues that should be fixed in short order label Jan 26, 2024
@anyscalesam
Contributor

The downgrade was a typo; keeping this at P0 after discussing with @rkooo567.

8 participants