Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ray - protobuf issue #3963

Open
robhheise opened this issue Mar 11, 2024 · 5 comments
Open

Ray - protobuf issue #3963

robhheise opened this issue Mar 11, 2024 · 5 comments
Labels
bug Something isn't working ray

Comments

@robhheise
Copy link

Describe the bug
Using one of the examples, I am getting a protobuf error trace

To Reproduce
(train_stats, preprocessed_data, output_directory) = model.train(
dataset,
model_name="rotten_tomatoes",
output_directory="results_rotten_tomatoes",
)

Please provide code, yaml config file and a sample of data in order to entirely reproduce the issue.
Issues that are not reproducible will be ignored.
config_yaml = """
input_features:
- name: genres
type: set
preprocessing:
tokenizer: comma
- name: content_rating
type: category
- name: top_critic
type: binary
- name: runtime
type: number
- name: review_content
type: text
encoder:
type: embed
output_features:
- name: recommended
type: binary
"""

Screenshots
If applicable, add screenshots to help explain your problem.
Traceback (most recent call last):
File "/Users/robertheise/Documents/SD/kh-accel/model.py", line 52, in
(train_stats, preprocessed_data, output_directory) = model.train(
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ludwig/api.py", line 654, in train
self._tune_batch_size(trainer, training_set, random_seed=random_seed)
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ludwig/api.py", line 882, in _tune_batch_size
tuned_batch_size = trainer.tune_batch_size(
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ludwig/backend/ray.py", line 551, in tune_batch_size
result = runner.run(
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ludwig/backend/ray.py", line 441, in run
return fit_no_exception(trainer)
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ludwig/backend/ray.py", line 335, in fit_no_exception
result_grid = tuner.fit()
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ray/tune/tuner.py", line 292, in fit
return self._local_tuner.fit()
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ray/tune/impl/tuner_internal.py", line 455, in fit
analysis = self._fit_internal(trainable, param_space)
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ray/tune/impl/tuner_internal.py", line 572, in _fit_internal
analysis = run(
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ray/tune/tune.py", line 678, in run
callbacks = _create_default_callbacks(
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ray/tune/utils/callback.py", line 105, in _create_default_callbacks
callbacks.append(TBXLoggerCallback())
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ray/tune/logger/tensorboardx.py", line 165, in init
from tensorboardX import SummaryWriter
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/init.py", line 5, in
from .torchvis import TorchVis
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/torchvis.py", line 11, in
from .writer import SummaryWriter
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/writer.py", line 18, in
from .event_file_writer import EventFileWriter
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/event_file_writer.py", line 28, in
from .proto import event_pb2
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/proto/event_pb2.py", line 16, in
from tensorboardX.proto import summary_pb2 as tensorboardX_dot_proto_dot_summary__pb2
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/proto/summary_pb2.py", line 16, in
from tensorboardX.proto import tensor_pb2 as tensorboardX_dot_proto_dot_tensor__pb2
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/proto/tensor_pb2.py", line 16, in
from tensorboardX.proto import resource_handle_pb2 as tensorboardX_dot_proto_dot_resource__handle__pb2
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/proto/resource_handle_pb2.py", line 36, in
_descriptor.FieldDescriptor(
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/google/protobuf/descriptor.py", line 553, in new
_message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:

  1. Downgrade the protobuf package to 3.20.x or lower.
  2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

Environment (please complete the following information):

  • OS: [e.g. iOS]
    MacOs M1
  • Version [e.g. 22]
    Ventura 13.3.1
  • Python version
    3.10.11
  • Ludwig version
    ludwig v0.10.1

Additional context
Add any other context about the problem here.

@arnavgarg1
Copy link
Contributor

@robhheise May I know what ray version you're using?

@robhheise
Copy link
Author

using the default local Ray version from this code:
backend_config = {
"type": "ray",
"processor": {
"parallelism": 6,
"type": "dask",
},
"trainer": {
"use_gpu": False,
"num_workers": 3,
"resources_per_worker": {
"CPU": 2,
"GPU": 0,
},
},
}
backend = initialize_backend(backend_config)

@robhheise
Copy link
Author

Maybe a clarifying question: do I need to install a running Ray cluster before initializing, and if so, do I have to use the containers available in the ludwig github repo? The documentation
https://ludwig.ai/latest/user_guide/distributed_training/ suggests that I need to install Ray. FYI - the Ray Launcher link in the documentation is giving 404: https://docs.ray.io/en/latest/cluster/launcher.html

@arnavgarg1
Copy link
Contributor

@robhheise You don't need to! Ludwig is able to connect to an existing ray cluster if its already initialized in your environment, otherwise it'll initialize a local ray cluster for you.

Ray should have been installed as a part of ludwig[distributed]

@robhheise
Copy link
Author

Thanks, so it does look the the local ray cluster is being initialized, but still getting the following trace:
Traceback (most recent call last):
File "/Users/robertheise/Documents/SD/kh-accel/model.py", line 52, in
(train_stats, preprocessed_data, output_directory) = model.train(
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ludwig/api.py", line 654, in train
self._tune_batch_size(trainer, training_set, random_seed=random_seed)
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ludwig/api.py", line 882, in _tune_batch_size
tuned_batch_size = trainer.tune_batch_size(
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ludwig/backend/ray.py", line 551, in tune_batch_size
result = runner.run(
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ludwig/backend/ray.py", line 441, in run
return fit_no_exception(trainer)
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ludwig/backend/ray.py", line 335, in fit_no_exception
result_grid = tuner.fit()
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ray/tune/tuner.py", line 292, in fit
return self._local_tuner.fit()
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ray/tune/impl/tuner_internal.py", line 455, in fit
analysis = self._fit_internal(trainable, param_space)
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ray/tune/impl/tuner_internal.py", line 572, in _fit_internal
analysis = run(
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ray/tune/tune.py", line 678, in run
callbacks = _create_default_callbacks(
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ray/tune/utils/callback.py", line 105, in _create_default_callbacks
callbacks.append(TBXLoggerCallback())
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ray/tune/logger/tensorboardx.py", line 165, in init
from tensorboardX import SummaryWriter
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/init.py", line 5, in
from .torchvis import TorchVis
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/torchvis.py", line 11, in
from .writer import SummaryWriter
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/writer.py", line 18, in
from .event_file_writer import EventFileWriter
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/event_file_writer.py", line 28, in
from .proto import event_pb2
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/proto/event_pb2.py", line 16, in
from tensorboardX.proto import summary_pb2 as tensorboardX_dot_proto_dot_summary__pb2
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/proto/summary_pb2.py", line 16, in
from tensorboardX.proto import tensor_pb2 as tensorboardX_dot_proto_dot_tensor__pb2
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/proto/tensor_pb2.py", line 16, in
from tensorboardX.proto import resource_handle_pb2 as tensorboardX_dot_proto_dot_resource__handle__pb2
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/proto/resource_handle_pb2.py", line 36, in
_descriptor.FieldDescriptor(
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/google/protobuf/descriptor.py", line 553, in new
_message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:

  1. Downgrade the protobuf package to 3.20.x or lower.
  2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
(dask:('getitem-bf2c0ba7d80a99c4d574d2369c944368', 0) pid=12348)

@mhabedank mhabedank added the ray label Oct 21, 2024
@mhabedank mhabedank assigned mhabedank and unassigned mhabedank Oct 21, 2024
@mhabedank mhabedank added the bug Something isn't working label Oct 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working ray
Projects
None yet
Development

No branches or pull requests

3 participants