A step that runs seq2seq LMs in inference mode #119
examples/eval_p3/requirements.txt
Outdated
@@ -0,0 +1 @@
rouge-score
You'll need to add this requirement to `dev-requirements.txt`.
Why?
It's helpful for CI so we can install everything with one command. For example, `pip install -e .[dev,examples]`. Speaking of which, it would be great to have tests for this example just like we have for the `train_gpt2` example.
tango/.github/workflows/ci.yml
Lines 81 to 85 in 8e09b66
- name: GPT2 example
  extras: dev,examples,datasets,torch
  run: |
    cd examples/train_gpt2
    pytest -v --color=yes test.py
I think I will switch this to `torchmetrics`. It's not a new dependency, because it's part of PTL. And as of today, it has Rouge.
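For readers unfamiliar with the metric under discussion, here is a rough, dependency-free sketch of what a ROUGE-1 F-measure computes: clipped unigram overlap between a prediction and a reference. It is not the `torchmetrics` or `rouge-score` implementation; the function name `rouge1_f1` is hypothetical, and it assumes plain whitespace tokenization with no stemming, so its numbers can differ from those libraries.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 in the style of ROUGE-1 (illustrative sketch only)."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: a token counts at most as often as it occurs in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

In the example itself, a maintained implementation is preferable to a hand-rolled one like this, which is the point of relying on an existing dependency.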
But, to make this work, I had to make "examples" into an integration. `torchmetrics` is installed with Lightning, but I want to guarantee the correct version. Since 0.7.0 came out today, it does not yet get installed by default.
Why does "examples" have to be an integration? Why not put this new dependency in `dev-requirements.txt` along with other dependencies for examples?
Lines 54 to 57 in 8e09b66
##################################################
###### Extra dev dependencies for examples #######
##################################################
transformers # needed by: examples
I put it there.
tango/common/logging.py
Outdated
@@ -114,3 +116,78 @@ def initialize_logging(
FILE_FRIENDLY_LOGGING = True
os.environ["FILE_FRIENDLY_LOGGING"] = "true"
click_logger.disabled = True


def logging_tqdm(
How is this different from our `Tqdm` wrapper with `FILE_FRIENDLY_LOGGING` on?
- You don't have to set an environment variable. It's always on.
- It does not redirect `stderr` in a dodgy way.
- Log messages get written to your logger, not the global `"tqdm"` logger.
- It does not depend on implementation details of the original tqdm.
- It is less code.
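The general idea behind the points above can be sketched as a small generator that reports progress through a caller-supplied logger instead of drawing a bar on `stderr`. This is not the `logging_tqdm` code from the PR; the name `log_progress` and its parameters are hypothetical, and it only assumes the standard `logging` module.

```python
import logging
import time
from collections.abc import Sized
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def log_progress(
    items: Iterable[T],
    logger: logging.Logger,
    desc: str = "progress",
    min_interval: float = 10.0,
) -> Iterator[T]:
    """Yield from `items`, logging progress at most every `min_interval` seconds."""
    # Use the length when the iterable has one; otherwise log a bare count.
    total: Optional[int] = len(items) if isinstance(items, Sized) else None
    last_logged = time.monotonic()
    count = 0
    for item in items:
        yield item
        count += 1
        now = time.monotonic()
        if now - last_logged >= min_interval:
            if total is None:
                logger.info("%s: %d", desc, count)
            else:
                logger.info("%s: %d/%d", desc, count, total)
            last_logged = now
    # Always emit a final line so the log shows that the loop finished.
    logger.info("%s: done (%d items)", desc, count)
```

Because every message goes through the logger passed in, the output ends up wherever that logger's handlers send it, with no environment variable and no stream redirection involved.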
Let me rephrase. What I'm really asking is this: (A) why not implement this using our existing `Tqdm` wrapper, and (B) why have two separate approaches for file-friendly progress bars?
For (A), your 2nd point is still valid, but your approach of implementing a new tqdm from scratch adds more code that looks fairly tricky. We should at least have some test coverage here.
But to emphasize (B) again, it seems like there is a lot of overlap in use cases here for your `logging_tqdm` and the existing `Tqdm` wrapper with `FILE_FRIENDLY_LOGGING` on. I'd rather go with one or the other. Maybe our file-friendly version of the `Tqdm` wrapper could use your `logging_tqdm` code instead of `tqdm` internals.
I'm happy to write tests for it, but only if we decide to actually use it.
I looked at our current usage of TQDM, and the only feature that `logging_tqdm` is missing is `wrapattr()`. There are a few others, but those only apply to rendering visual progress bars, so we don't have to care about them.
Overall I think `logging_tqdm()` is the superior solution, but there is a lot to do, and the success of Tango won't hinge on the quality of the progress bars. So I'll take this thing out and replace it with `FILE_FRIENDLY_LOGGING`.
…nto RunWithoutConfig
@@ -153,3 +153,34 @@ def could_be_class_name(name: str) -> bool:

def _is_valid_python_name(name: str) -> bool:
    return bool(name and name[0].isalpha() and name.isalnum())


def threaded_generator(g, queue_size: int = 16):
If you want this in the API docs you can import it in `tango/common/__init__.py`.
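For context, a helper with a signature like `threaded_generator(g, queue_size)` usually runs the wrapped iterable in a background thread and hands items to the consumer through a bounded queue, so producing and consuming can overlap. The body of the PR's function is not shown in this thread, so the following is only a minimal sketch of that pattern under those assumptions; the name `threaded_generator_sketch` is hypothetical.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def threaded_generator_sketch(source: Iterable[T], queue_size: int = 16) -> Iterator[T]:
    """Iterate `source` in a background thread, buffering up to `queue_size` items."""
    q: queue.Queue = queue.Queue(maxsize=queue_size)
    end = object()  # unique marker signalling that the source is exhausted

    def produce() -> None:
        try:
            for item in source:
                q.put((item, None))  # blocks while the queue is full
        except Exception as exc:
            q.put((None, exc))  # forward the error to the consumer
        else:
            q.put((end, None))

    # Daemon thread: if the consumer stops early, the producer may stay blocked
    # on a full queue, but it will not keep the interpreter alive.
    thread = threading.Thread(target=produce, daemon=True)
    thread.start()
    while True:
        item, exc = q.get()
        if exc is not None:
            raise exc
        if item is end:
            break
        yield item
    thread.join()
```

The bounded queue is what keeps memory use predictable: the producer can run at most `queue_size` items ahead of the consumer.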
Yes, I mean, the only reason the […]
Same is true for this step. But even if it could run on multiple devices, it would not alter the results. We could still put in a […]
Is this good to go then?
Ah, I have to call […]
Yep, this looks good other than the merge conflicts and using `resolve_device()` ✅
# Conflicts:
#	CHANGELOG.md
#	tango/common/logging.py
#	tango/local_workspace.py
There is so much stuff in here, it's probably easiest to look at the changelog: https://github.com/allenai/tango/pull/119/files#diff-06572a96a58dc510037d5efa622f9bec8519bc1beab13c9f251e97e657a9d4ed