Print number of variables in repr #4762

Merged

@Illviljan Illviljan commented Jan 4, 2021

Show the printed and total number of variables in the repr.

  • Passes isort . && black . && mypy . && flake8
import numpy as np
import xarray as xr

a = np.arange(0, 15)
b = np.char.add("long_variable_name", a.astype(str))
c = np.arange(0, 25)
d = np.char.add("attr_", c.astype(str))
e = {k: 2 for k in d}
coords = dict(time=np.array([0, 1]))
data_vars = dict()
for v in b:
    data_vars[v] = xr.DataArray(
        name=v,
        data=np.array([0, 1]).astype(bool),
        dims=["time"],
        coords=coords,
    )
ds1 = xr.Dataset(data_vars)
ds1.attrs = e

# The repr now shows how many data variables and attributes are displayed out of the total:
print(ds1)
Out[10]: 
<xarray.Dataset>
Dimensions:               (time: 2)
Coordinates:
  * time                  (time) int32 0 1
Data variables: (12/15)
    long_variable_name0   (time) bool False True
    long_variable_name1   (time) bool False True
    long_variable_name2   (time) bool False True
    long_variable_name3   (time) bool False True
    long_variable_name4   (time) bool False True
    long_variable_name5   (time) bool False True
                   ...
    long_variable_name9   (time) bool False True
    long_variable_name10  (time) bool False True
    long_variable_name11  (time) bool False True
    long_variable_name12  (time) bool False True
    long_variable_name13  (time) bool False True
    long_variable_name14  (time) bool False True
Attributes: (12/25)
    attr_0:   2
    attr_1:   2
    attr_2:   2
    attr_3:   2
    attr_4:   2
    attr_5:   2
      ...
    attr_19:  2
    attr_20:  2
    attr_21:  2
    attr_22:  2
    attr_23:  2
    attr_24:  2

@Illviljan
Contributor Author

Ouch, forgot about the doctests... Going through them by hand is not happening. Is there any automatic way to do that?

Workaround to avoid having to redo every single doctest: the count is really only necessary when the data rows are limited, although I find it a bit difficult to count the rows quickly past about 7.
@keewis
Collaborator

keewis commented Jan 4, 2021

Unfortunately, I can't find a tool that does that. I guess either someone will have to write that tool or you will have to go through all the files and update them by hand. For the latter, I would recommend waiting until the repr format has been reviewed so you don't have to do that more than once.

@Illviljan
Contributor Author

I worked around it by only showing it when the repr is limited. That's the most important case anyway I think.

No need to limit max_rows now because the if condition handles that.
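
The workaround described above could be sketched roughly as follows. This is a simplified stand-in, not xarray's actual formatting code: `mapping_repr`, its parameters, and the row layout are hypothetical names and assumptions chosen for illustration.

```python
def mapping_repr(mapping, title, max_rows=12):
    """Hypothetical, simplified repr of a mapping with a row limit."""
    summary = [f"{title}:"]
    items = list(mapping.items())
    n = len(items)
    if n > max_rows:
        # Only a truncated listing gets the "(shown/total)" suffix, so
        # short reprs (and the doctests containing them) stay unchanged.
        summary = [f"{summary[0]} ({max_rows}/{n})"]
        first = max_rows // 2 + max_rows % 2
        last = max_rows // 2
        summary += [f"    {k}: {v}" for k, v in items[:first]]
        summary.append("    ...")
        if last:
            summary += [f"    {k}: {v}" for k, v in items[n - last:]]
    else:
        summary += [f"    {k}: {v}" for k, v in items]
    return "\n".join(summary)


short = mapping_repr({"a": 1, "b": 2}, "Attributes")
long = mapping_repr({f"attr_{i}": 2 for i in range(25)}, "Attributes")
```

Because the suffix is added inside the same `if` branch that truncates the rows, no separate check on `max_rows` is needed for the short case.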
@max-sixty
Collaborator

I've used pytest-regtest quite a lot, and it is decent, though it would be another dependency. For tests like this, where the output is easier to review than to create, these "expect" / "snapshot" / "regression" tests are ideal. If anyone wanted to swap out these tests for that, I would strongly support the effort.
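
To illustrate the general idea of a snapshot test, here is a minimal sketch using only the standard library. It does not use or imitate the pytest-regtest API; `check_snapshot` and its `update` flag are hypothetical names invented for this example.

```python
import tempfile
from pathlib import Path


def check_snapshot(name, actual, snapshot_dir, update=False):
    """Compare `actual` with a stored snapshot.

    The snapshot is written on the first run or when `update` is True;
    otherwise the stored text is compared with `actual`.
    """
    path = Path(snapshot_dir) / f"{name}.txt"
    if update or not path.exists():
        path.write_text(actual)
        return True
    return path.read_text() == actual


with tempfile.TemporaryDirectory() as tmp:
    first_run = check_snapshot("repr", "Attributes: (12/25)", tmp)  # records
    unchanged = check_snapshot("repr", "Attributes: (12/25)", tmp)  # matches
    regressed = check_snapshot("repr", "Attributes:", tmp)  # differs
    accepted = check_snapshot("repr", "Attributes:", tmp, update=True)
    after_accept = check_snapshot("repr", "Attributes:", tmp)
```

The reviewer's job then becomes reading the changed snapshot rather than writing the expected output by hand.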

An aside: I tried to make a similar tool that worked for inline results, https://github.com/max-sixty/pytest-accept. Unfortunately, it's not possible to make it work with bare assert statements without swapping out a lot of pytest's internals. For a while I worked a lot on https://github.com/mitsuhiko/insta, an excellent project; it would be great to have the same thing in Python.

@Illviljan Illviljan closed this Jan 4, 2021
@Illviljan Illviljan reopened this Jan 4, 2021
-    if len(mapping) > max_rows:
+    len_mapping = len(mapping)
+    if len_mapping > max_rows:
+        summary = [f"{summary[0]} ({max_rows}/{len_mapping})"]
Collaborator

Does this no longer get the title though?

Collaborator

Should it be +=?

Contributor Author

It gets the title via summary[0] in the f-string. I did this because I want the number to be displayed on the same row as the title, e.g. Attributes: (12/25). If we used +=, the numbers would be shown on a new line.
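
A small standalone sketch of that difference, using illustrative values rather than the real formatting function (the variable names `same_row` and `new_row` are made up for this example):

```python
title = "Attributes"
max_rows, len_mapping = 12, 25

# Rebuilding the first element keeps the count on the title row.
summary = [f"{title}:"]
summary = [f"{summary[0]} ({max_rows}/{len_mapping})"]
same_row = "\n".join(summary)

# Appending with += adds a second list element, which becomes its own
# line once the list is joined with newlines.
summary = [f"{title}:"]
summary += [f"({max_rows}/{len_mapping})"]
new_row = "\n".join(summary)
```

Here `same_row` is a single line, while `new_row` contains the title and the count on separate lines.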

Collaborator

Ah sorry I missed that

@max-sixty
Collaborator

LGTM, any final comments before we merge?

@max-sixty max-sixty merged commit 81ed507 into pydata:master Jan 12, 2021
@max-sixty
Collaborator

Thanks @Illviljan !

dcherian added a commit to TomNicholas/xarray that referenced this pull request Jan 18, 2021
@Illviljan Illviljan deleted the Illviljan-print_len_variables_in_repr branch May 18, 2021 18:17