Ray - protobuf issue #3963
@robhheise May I know what Ray version you're using?
I'm using the default local Ray version from this code:
Maybe a clarifying question: do I need to install a running Ray cluster before initializing, and if so, do I have to use the containers available in the Ludwig GitHub repo? The documentation
@robhheise You don't need to! Ludwig is able to connect to an existing Ray cluster if it's already initialized in your environment; otherwise it'll initialize a local Ray cluster for you. Ray should have been installed as part of
Thanks. So it does look like the local Ray cluster is being initialized, but I'm still getting the following trace:
Describe the bug
Using one of the examples, I am getting a protobuf error trace
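Before digging into the trace itself, a quick first check for this class of failure is to list the installed protobuf runtime next to the packages that ship pregenerated `_pb2` code (a sketch; the distribution names are assumed from the trace, where Ray's tune logger imports tensorboardX):

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None
    return versions


# tensorboardX ships pregenerated _pb2 modules that must be compatible
# with the protobuf runtime installed alongside them.
print(installed_versions(("protobuf", "tensorboardX", "ray", "ludwig")))
```

A protobuf 4.x (or >=3.20) runtime next to an older tensorboardX is the usual trigger for the descriptor error quoted below.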
To Reproduce
(train_stats, preprocessed_data, output_directory) = model.train(
    dataset,
    model_name="rotten_tomatoes",
    output_directory="results_rotten_tomatoes",
)
Please provide code, yaml config file and a sample of data in order to entirely reproduce the issue.
Issues that are not reproducible will be ignored.
config_yaml = """
input_features:
  - name: genres
    type: set
    preprocessing:
      tokenizer: comma
  - name: content_rating
    type: category
  - name: top_critic
    type: binary
  - name: runtime
    type: number
  - name: review_content
    type: text
    encoder:
      type: embed
output_features:
  - name: recommended
    type: binary
"""
Screenshots
If applicable, add screenshots to help explain your problem.
Traceback (most recent call last):
File "/Users/robertheise/Documents/SD/kh-accel/model.py", line 52, in
(train_stats, preprocessed_data, output_directory) = model.train(
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ludwig/api.py", line 654, in train
self._tune_batch_size(trainer, training_set, random_seed=random_seed)
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ludwig/api.py", line 882, in _tune_batch_size
tuned_batch_size = trainer.tune_batch_size(
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ludwig/backend/ray.py", line 551, in tune_batch_size
result = runner.run(
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ludwig/backend/ray.py", line 441, in run
return fit_no_exception(trainer)
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ludwig/backend/ray.py", line 335, in fit_no_exception
result_grid = tuner.fit()
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ray/tune/tuner.py", line 292, in fit
return self._local_tuner.fit()
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ray/tune/impl/tuner_internal.py", line 455, in fit
analysis = self._fit_internal(trainable, param_space)
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ray/tune/impl/tuner_internal.py", line 572, in _fit_internal
analysis = run(
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ray/tune/tune.py", line 678, in run
callbacks = _create_default_callbacks(
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ray/tune/utils/callback.py", line 105, in _create_default_callbacks
callbacks.append(TBXLoggerCallback())
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/ray/tune/logger/tensorboardx.py", line 165, in init
from tensorboardX import SummaryWriter
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/init.py", line 5, in
from .torchvis import TorchVis
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/torchvis.py", line 11, in
from .writer import SummaryWriter
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/writer.py", line 18, in
from .event_file_writer import EventFileWriter
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/event_file_writer.py", line 28, in
from .proto import event_pb2
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/proto/event_pb2.py", line 16, in
from tensorboardX.proto import summary_pb2 as tensorboardX_dot_proto_dot_summary__pb2
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/proto/summary_pb2.py", line 16, in
from tensorboardX.proto import tensor_pb2 as tensorboardX_dot_proto_dot_tensor__pb2
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/proto/tensor_pb2.py", line 16, in
from tensorboardX.proto import resource_handle_pb2 as tensorboardX_dot_proto_dot_resource__handle__pb2
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/tensorboardX/proto/resource_handle_pb2.py", line 36, in
_descriptor.FieldDescriptor(
File "/Users/robertheise/Documents/SD/kh-accel/venv/lib/python3.10/site-packages/google/protobuf/descriptor.py", line 553, in new
_message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
1. Downgrade the protobuf package to 3.20.x or lower.
2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
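Concretely, the workarounds the error message points at look like this (a sketch; pick one, and run it inside the same venv that produced the trace):

```shell
# Option 1: pin the protobuf runtime below 3.20 so it still accepts the
# older pregenerated _pb2 files that tensorboardX ships:
#   pip install 'protobuf<3.20'

# Option 2: keep protobuf as-is but force the pure-Python implementation
# (slower parsing, but it skips the generated-code version check):
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python
```

With option 2, the variable must be exported before Python starts, since the check runs at import time.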
Environment (please complete the following information):
OS: macOS Ventura 13.3.1 (Apple M1)
Python 3.10.11
ludwig v0.10.1
Additional context
Add any other context about the problem here.