Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable/disable specific progress bars #1595

Closed
Wauplin opened this issue Aug 16, 2023 · 5 comments
Closed

Enable/disable specific progress bars #1595

Wauplin opened this issue Aug 16, 2023 · 5 comments
Labels
good first issue Good for newcomers

Comments

@Wauplin
Copy link
Contributor

Wauplin commented Aug 16, 2023

At the moment, huggingface_hub overrides tqdm progress bars to be able to enable and disable them globally either using HF_HUB_DISABLE_PROGRESS_BARS environment variable or enable_progress_bars/disable_progress_bars helpers (env variable having priority). These helpers are meant to be reused by other HF libraries built on top of huggingface_hub.

As mentioned by @BenjaminBossan on slack (private), the problem with this approach is that a library can disable progress bars from other libraries which can be misleading. A possible solution would be to add a new parameter group: Optional[str] = None to the progress bars working similarly to logger names. Then, we can enable/disable progress bars only from specific libs:

disable_progress_bars() # disable everything
disable_progress_bars("peft") # disable progress bar in a group starting by "peft" (e.g. `peft.foo.bar`)
disable_progress_bars("peft.foo")

Having something like this could prove useful to enable/disable progress bars when running PEFT, a transformers training or a huggingface_hub push without enabling/disabling other cases (typically disable some progress bars in a local training). And still have HF_HUB_DISABLE_PROGRESS_BARS to disable everything (typically disable all progress bars in a production environment). Such a change would be backward compatible and we could iteratively add groups to the existing progress bars we have.

No need to be as complete as the logging module. We can just work with prefixes and that's all. No need to handle configuration files or more detailed environment variables.


Other suggestion: instead of setting group="peft" manually in the library implementation, we could retrieve it from the stack traceback with a bit of Python magic (when group=None). Haven't tried it myself but it would make the change immediately backward compatible in existing libraries. I'm not so opinionated about it. EDIT: dropping this idea. Explicit is better than implicit.

@lappemic
Copy link
Contributor

Hey @Wauplin i would love to start working on this as well. Since i am not quite sure this is what i understand and how i would proceed implementing it:

  1. Extending tqdm helpers: Make disable_progress_bars, enable_progress_bars and are_progress_bars_disabled support new group parameter and update progress bar ininitialisation. Something like this:
def disable_progress_bars(group: Optional[str] = None):
    if group is None:
        _PROGRESS_BAR_STATES["_global"] = False
    else:
        _PROGRESS_BAR_STATES[group] = False

def enable_progress_bars(group: Optional[str] = None):
    if group is None:
        _PROGRESS_BAR_STATES["_global"] = True
    else:
        _PROGRESS_BAR_STATES[group] = True

def are_progress_bars_disabled(group: Optional[str] = None) -> bool:
    if group is None or group not in _PROGRESS_BAR_STATES:
        return not _PROGRESS_BAR_STATES["_global"]
    return not _PROGRESS_BAR_STATES[group]
  1. Create global registery for progress bar states: Tracks the state of each group of progress bars. Something like
_PROGRESS_BAR_STATES = {"_global": True}  # Default is enabled for all

def _check_env_variable():
    return os.getenv("HF_HUB_DISABLE_PROGRESS_BARS") != "1"

_PROGRESS_BAR_STATES["_global"] = _check_env_variable()
  1. Refactor the existing code to use the new tqdmhelpers

I would love to get started on this and look forward to your guidance and suggestions.

@Wauplin
Copy link
Contributor Author

Wauplin commented Apr 10, 2024

Hey @lappemic! Thanks for your contributions! So yes, that would be the general idea. More in details:

def disable_progress_bars(group: Optional[str] = None):

=> I would rename group to name to be more consistent with the logging module (e.g. logging.getLogger(name=...))

Create global registery for progress bar states: Tracks the state of each group of progress bars.

In general, yes.
In practice, the HF_HUB_DISABLE_PROGRESS_BARS constant should still be defined in constant.py so no need for _check_env_variable. I would keep the "states" dict and HF_HUB_DISABLE_PROGRESS_BARS completely separated (and used both in are_progress_bars_disabled with a priority on the env variable ofc).

Also keep in mind states must be handled in hierarchy. Here is an example of how it is meant to work:

>>> disable_progress_bars("peft.foo")
>>> are_progress_bars_disabled("peft")
False
>>> are_progress_bars_disabled("peft.something")
False
>>> are_progress_bars_disabled("peft.foo")
True
>>> are_progress_bars_disabled("peft.foo.bar")
True
>>> enable_progress_bars("peft.foo.bar")
>>> are_progress_bars_disabled("peft.foo")
True
>>> are_progress_bars_disabled("peft.foo.bar")
False

In practice, I would use nested dicts to handle those and navigate in the tree using parts from name.split("."). Be careful with cases where a node is disabled (must disable all of its children).
I would advice to treat None as "" and to avoid introducing a "_global" private key.

Does that make things clearer for you? Happy to review a draft PR anytime.

@lappemic
Copy link
Contributor

Hey @Wauplin, thanks a lot for your detailed feedback!

I'll definitely align with your suggestion on naming.

As for managing the "states" dict hierarchically, I understand it as follows:

  1. Defining the "states" dict (like progress_bar_states) directly within tqdm.py
  2. Implementing additional utility functions within tqdm.py to manage the hierarchical states (e.g. set_progress_bar_state and get_progress_bar_state).
  3. Integrating this hierarchical logic into the existing utility functions (disable_progress_bars, enable_progress_bars, are_progress_bars_disabled).

Did I get this right? If so, I'm ready to start drafting a PR.

@Wauplin
Copy link
Contributor Author

Wauplin commented Apr 10, 2024

Yes exactly! Not sure the set_progress_bar_state/get_progress_bar_state helpers are necessary but if it helps, please go ahead and create them. We won't expose them publicly though (so please name them as private e.g. _set_progress_bar_state). Otherwise than that I think you're good to go!

@Wauplin
Copy link
Contributor Author

Wauplin commented Apr 29, 2024

Closed by #2217 thanks to @lappemic! 🤗

cc @lhoestq @mariosasko @BenjaminBossan who requested in the past the possibility to disable only certain progress bars. This is now possible!
For example to disable all progress bars generated by huggingace_hub:

from huggingface_hub.utils import disable_progress_bars

disable_progress_bars("huggingface_hub")

# or more specifically
disable_progress_bars("huggingface_hub.lfs_upload")

to use this feature in datasets/peft, you must provide a name to your tqdm init:

from huggingface_hub.utils import tqdm

# Declare a progress bar under the `peft.foo` namespace. 
with tqdm(range(5), name="peft.foo"):
    ...

Namespaces work similarly to logging, with hierarchical structure split by dots.

@Wauplin Wauplin closed this as completed Apr 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

2 participants