CI: Add test case for unwanted patterns #30467

Merged
merged 6 commits into from
Jan 2, 2020

Member

@ShaharNaveh ShaharNaveh commented Dec 25, 2019

Can be merged, after merging #30464

@alimcmaster1 alimcmaster1 added the CI Continuous Integration label Dec 25, 2019
Member

@datapythonista datapythonista left a comment

Nice work, added some comments about the structure of the script, and the style, but looks good.

@jbrockmendel
Member

This is really nice, thanks @MomIsBestFriend. I was not looking forward to figuring out the ast stuff.

@datapythonista is there someone on one of the flake/lint projects we might want to talk to about upstreaming this?

@ShaharNaveh
Member Author

ShaharNaveh commented Dec 26, 2019

@datapythonista It's looking like the running version of python in github's action is 2.7.15, is it possible to increase it?

@alimcmaster1
Member

@datapythonista It's looking like the running version of python in github's action is 2.7.15, is it possible to increase it?

Where does it say this out of interest? It must be at least 3.6 as we run validate_docstring.py in github actions?

@ShaharNaveh
Member Author

ShaharNaveh commented Dec 26, 2019

@datapythonista It's looking like the running version of python in github's action is 2.7.15, is it possible to increase it?

Where does it say this out of interest? It must be at least 3.6 as we run validate_docstring.py in github actions?

Line 52 in the github actions under Looking for unwanted patterns section.

[Screenshot: python_version_github_actions]

(Screenshot because I squashed the original commit)

@datapythonista
Member

We never use the system python, we install conda in every build, and create an environment, you'll have to activate it.

I'd yield the values as a namedtuple.
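
The namedtuple suggestion could look roughly like the sketch below. `Finding` and `strings_to_concatenate_demo` are hypothetical names invented for illustration, and the field names simply mirror the ones discussed in this thread; this is not the code from the PR.

```python
from typing import Iterator, NamedTuple


class Finding(NamedTuple):
    """Hypothetical record type; field names mirror those used in this thread."""

    source_path: str
    line_number: int
    start: str
    end: str


def strings_to_concatenate_demo() -> Iterator[Finding]:
    # Hard-coded result standing in for a real scan of a file.
    yield Finding("./path/to/script.py", 1337, '"foo"', '"bar"')


for finding in strings_to_concatenate_demo():
    # Fields are reachable by name, by position, or as a mapping.
    by_name = f"{finding.source_path}:{finding.line_number}"
    source_path, line_number, start, end = finding
    as_mapping = "{source_path}:{line_number}".format(**finding._asdict())
    print(by_name)
```

A namedtuple keeps the positional unpacking of a plain tuple while still allowing `**finding._asdict()` when a format string needs named fields.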

@ShaharNaveh ShaharNaveh force-pushed the CI-unwanted-test-str-concat branch 4 times, most recently from 4d6d179 to d5afed9 Compare December 26, 2019 20:20
@ShaharNaveh
Member Author

and create an environment, you'll have to activate it.

The problem is that conda is activated after the unwanted patterns check runs. Do you have any suggestions?

@datapythonista
Member

Better to move the string check into the linting step, which is a good idea anyway.

Member

@datapythonista datapythonista left a comment

Looks great. I added a few comments on minor things, but good job. Thanks!

argparser.add_argument(
    "--format",
    "-f",
    default="default",
Member

Suggested change
-default="default",
+default="{source_path}:{line_number}:{start}:{end}:{msg}",

So this format is used by default, and if the caller wants a different one, they can call the script like script -f "##vso[task.logissue type=error; {source_path}]{msg}" path. That way you don't need to know the format constants beforehand, just the variables.

I'm unsure whether start and end provide value. Are they the columns of the strings to concatenate? If so, I'd say the line number already makes it clear enough where the problem is.
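
A minimal sketch of the idea in this comment, assuming an argparse-based command line. Only the `--format`/`-f` option and the field names come from the thread; `build_parser`, `render`, and the sample values are hypothetical.

```python
import argparse

# Default format proposed in the review; the field names are the only contract.
DEFAULT_FORMAT = "{source_path}:{line_number}:{start}:{end}:{msg}"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; only the --format/-f option comes from the thread."""
    parser = argparse.ArgumentParser(description="Report unwanted patterns.")
    parser.add_argument("path", help="File or directory to check.")
    parser.add_argument("--format", "-f", default=DEFAULT_FORMAT)
    return parser


def render(output_format: str, **fields) -> str:
    # str.format ignores keyword arguments the template does not mention,
    # so a caller-supplied template may use any subset of the fields.
    return output_format.format(**fields)


# Sample values, made up for illustration.
fields = {
    "source_path": "pandas/__init__.py",
    "line_number": 124,
    "start": "foo",
    "end": "bar",
    "msg": "String unnecessarily split in two.",
}

# No -f given: the default template is used.
args = build_parser().parse_args(["pandas"])
default_line = render(args.format, **fields)

# The caller overrides the template without knowing any predefined constant.
args = build_parser().parse_args(
    ["-f", "##vso[task.logissue type=error; {source_path}]{msg}", "pandas"]
)
azure_line = render(args.format, **fields)

print(default_line)
print(azure_line)
```

Because unused keyword arguments are silently ignored by `str.format`, the custom template can drop `start` and `end` without any change to the calling code.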

Member Author

I'm unsure whether start and end provide value.

Short summary:

start and end are used to pinpoint the unconcatenated strings.


Long summary:

start and end are the two adjacent string literals that need to be concatenated.

Let's say we have a file called script.py located at path/to/script.py, and line 1337 contains:

print("foo" "bar")

When running scripts/validate_string_concatenation.py, the default output looks like this:

./path/to/script.py:1337      BETWEEN "foo" AND "bar"

and the yielded dictionary contains:

source_path: "./path/to/script.py"
line_number: 1337
start: "foo"
end: "bar"
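
One way such a pair could be detected is sketched below with the standard `tokenize` module. This is an illustration under assumptions, not necessarily how scripts/validate_string_concatenation.py does it: the function name `adjacent_strings` is hypothetical, and the rule used here (two consecutive string tokens on the same line) is a simplification.

```python
import io
import tokenize
from typing import Iterator, Tuple


def adjacent_strings(source: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, start, end) for string literals sitting side by side.

    Hypothetical helper: flags two consecutive STRING tokens on one line.
    """
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    previous = None
    for token in tokens:
        if (
            previous is not None
            and previous.type == tokenize.STRING
            and token.type == tokenize.STRING
            and previous.end[0] == token.start[0]  # same physical line
        ):
            # token.string keeps the quotes, matching the example above.
            yield token.start[0], previous.string, token.string
        previous = token


found = list(adjacent_strings('print("foo" "bar")\n'))
print(found)
```

Strings that are deliberately split across several lines are separated by a newline token, so this sketch does not report them.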

@ShaharNaveh ShaharNaveh force-pushed the CI-unwanted-test-str-concat branch 2 times, most recently from ecd2e93 to b0732c5 Compare December 29, 2019 10:52
):
    for values in strings_to_concatenate(os.path.join(subdir, file_name)):
        is_failed = True
        print(output_format.format(**values))
Member

The output of this will be something like:

pandas/__init__.py:124:    BETWEEN first part of the string AND with a second part in another string

I don't think people will understand what's going on when seeing this error in the CI. I think we should display something like:

pandas/__init__.py:124:String unnecessarily split in two by black. Please merge them manually.

Member Author

I'm not sure about this. Won't the message be too long?
And where do I put source_path, line_number, start, and end?

Member

The message being long is not a problem, neither for the CI nor for the code.

From my example, the format would be:

{source_path}:{line_number}:String unnecessarily split in two by black. Please merge them manually.

As I mentioned earlier, I don't find start and end particularly useful for the reader of the error. Feel free to leave them if you disagree, but it would be better to rename them, since their names are misleading. string1 and string2 are not great names, but they're better. At least they don't create the wrong impression that those are the positions of the strings.

Member Author

I can see what you mean now. Fixed.

Member

Cool, looks much better.

Sorry, I got that wrong: the format should actually be {source_path}:{line_number}:{msg}, and the error message is decided here in the code. If this function could detect more than one kind of error, the format string could not specify which error message to show.

So this could become:

Suggested change
-print(output_format.format(**values))
+msg = "String unnecessarily split in two by black. Please merge them manually."
+print(output_format.format(source_path=source_path, line_number=line_number, msg=msg))

And since the strings_to_concatenate function now just returns two values, I think it makes more sense to return them as a tuple instead of a dict, so the for loop above would be:

for source_path, line_number in strings_to_concatenate(os.path.join(subdir, file_name)):
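
Put together, the loop suggested above might look like the following sketch. The detection rule inside `strings_to_concatenate` is a deliberately naive stand-in (it only looks for `" "` in a line), and `check` and `MSG` are hypothetical names; only the tuple-yielding shape and the `{source_path}:{line_number}:{msg}` format come from the review.

```python
import os
from typing import Iterator, List, Tuple

MSG = "String unnecessarily split in two by black. Please merge them manually."


def strings_to_concatenate(source_path: str) -> Iterator[Tuple[str, int]]:
    """Stand-in detector (an assumption): flags lines that contain '" "'."""
    with open(source_path) as file_obj:
        for line_number, line in enumerate(file_obj, start=1):
            if '" "' in line:
                yield source_path, line_number


def check(root: str, output_format: str) -> List[str]:
    """Walk root and return one formatted report line per finding."""
    report = []
    for subdir, _, file_names in os.walk(root):
        for file_name in file_names:
            if not file_name.endswith(".py"):
                continue
            path = os.path.join(subdir, file_name)
            for source_path, line_number in strings_to_concatenate(path):
                # The message is chosen here, not in the format string.
                report.append(
                    output_format.format(
                        source_path=source_path, line_number=line_number, msg=MSG
                    )
                )
    return report
```

Calling `check("pandas", "{source_path}:{line_number}:{msg}")` would return the report lines, and a caller could treat a non-empty list as a failure.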

@ShaharNaveh ShaharNaveh force-pushed the CI-unwanted-test-str-concat branch 2 times, most recently from 7bdcd14 to 10df407 Compare December 29, 2019 14:41
Member

@datapythonista datapythonista left a comment

Cool, looks great now. Just a couple of minor things, and I think we can get it merged. Thanks!

@@ -86,7 +104,7 @@ def strings_to_concatenate(source_path: str) -> Generator[Dict[str, str], None,

 Yields
 ------
-dict of {str: str}
+Tuple
Member

Just list the two returns, instead of saying it's a tuple.

@ShaharNaveh
Member Author

Can be merged after merging #30579

@datapythonista
Member

Merged #30579, you can update your branch, so the CI passes

Member

@datapythonista datapythonista left a comment

Just one small typo, but lgtm

@jreback jreback added this to the 1.0 milestone Jan 1, 2020
Member

@datapythonista datapythonista left a comment

lgtm, thanks @MomIsBestFriend

@ShaharNaveh
Member Author

ShaharNaveh commented Jan 1, 2020

Thank you for the very close guidance @datapythonista

@jreback jreback merged commit 27f406f into pandas-dev:master Jan 2, 2020
@jreback
Contributor

jreback commented Jan 2, 2020

thanks @MomIsBestFriend

Labels
CI Continuous Integration
Development

Successfully merging this pull request may close these issues.

CI: code check for " " introduced by black
5 participants