-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CI: Add test case for unwanted patterns #30467
CI: Add test case for unwanted patterns #30467
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, added some comments about the structure of the script, and the style, but looks good.
This is really nice, thanks @MomIsBestFriend. I was not looking forward to figuring out the @datapythonista is there someone on one of the flake/lint projects we might want to talk to about upstreaming this? |
1 similar comment
This is really nice, thanks @MomIsBestFriend. I was not looking forward to figuring out the @datapythonista is there someone on one of the flake/lint projects we might want to talk to about upstreaming this? |
3ae2f2b
to
6f2738f
Compare
@datapythonista It's looking like the running version of python in github's action is |
Where does it say this out of interest? It must be at least 3.6 as we run |
Line 52 in the github actions under (Screenshot because I squashed the original commit) |
a34b8ba
to
e7c1a7f
Compare
We never use the system python, we install conda in every build, and create an environment, you'll have to activate it. I'd yield the values as a namedtuple. |
4d6d179
to
d5afed9
Compare
The problem is, the activation of conda is happening after it's running the unwanted patterns check. Do you have any suggestions? |
Better move the string thing in the linting, which is a good idea anyway. |
d5afed9
to
ea3f0e7
Compare
ea3f0e7
to
a90ca90
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, added few comments with minor things, but good job. Thanks!
a90ca90
to
fc0def6
Compare
argparser.add_argument( | ||
"--format", | ||
"-f", | ||
default="default", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default="default", | |
default="{source_path}:{line_number}:{start}:{end}:{msg}", |
So by default we use this format, and if the caller ones a different format, can call the script like script -f "##vso[task.logissue type=error; {source_path}]{msg}" path
. So you don't need to know the constants with the formats beforehand, just the variables.
I'm unsure whether start
and end
provide value, are they the columns of the strings to concatenate? I'd say the line number should make clear enough where the problem is, if that's the case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure whether
start
andend
provide value.
Short summary:
start
and end
are used to pinpoint the unconcatenated string.
Long summary:
start
and end
are the two unconcatenated strings that needs to be concatenated.
lets say we have a file called script.py
and it's located in path/to/script.py
and on line 1337
there is a line of:
print("foo" "bar")
When running scripts/validate_string_concatenation.py
The default the output is looking like this:
./path/to/script.py:1337 BETWEEN "foo" AND "bar"
and the yielded dictionary is mapped like this:
source_path: "./path/to/script.py"
line_number: 1337
start: "foo"
end: "bar"
ecd2e93
to
b0732c5
Compare
b0732c5
to
72f3dc3
Compare
): | ||
for values in strings_to_concatenate(os.path.join(subdir, file_name)): | ||
is_failed = True | ||
print(output_format.format(**values)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The output of this will be something like:
pandas/__init__.py:124: BETWEEN first part of the string AND with a second part in another string
I don't think people will understand what's going on when seeing this error in the CI. I think we should display something like:
pandas/__init__.py:124:String unnecessarily split in two by black. Please merge them manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this, will the message be too long?
and where do I put the source_path, line_number, start, end
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message being long is not a problem. Neither for the CI or for the code.
From my example, the format would be:
{source_path}:{line_number}:String unnecessarily split in two by black. Please merge them manually.
As I mentioned eariler, I don't find start
and end
particularly useful for the reader of the error. Feel free to leave them if you disagree, but better rename them, their name is misleading. string1
and string2
are not great names, but they're better. At least they don't create the wrong impression those are the positions of the strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can now see what you mean, Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, looks much better.
Sorry I said it wrong, but the format should actually be {source_path}:{line_number}:{msg}
. We'll decide the error here. Imagine that this function can detect more than one error, we can't say in the format which is the error we want to receive.
So this could become:
print(output_format.format(**values)) | |
msg = "String unnecessarily split in two by black. Please merge them manually." | |
print(output_format.format(source_path=source_path, line_number=line_number, msg=msg)) |
And since now the strings_to_concatenate
function is just returning two values, I think it makes more sense to return a tuple with them, instead of a dict, so the for above
would be:
for source_path, line_number in strings_to_concatenate(os.path.join(subdir, file_name)):
72f3dc3
to
280d103
Compare
7bdcd14
to
10df407
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, looks great now. Just couple of minor things, and I think we can get it merged. Thanks!
@@ -86,7 +104,7 @@ def strings_to_concatenate(source_path: str) -> Generator[Dict[str, str], None, | |||
|
|||
Yields | |||
------ | |||
dict of {str: str} | |||
Tuple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just list the two returns, instead of saying it's a tuple.
83e9246
to
5e272bc
Compare
Can be merged after merging #30579 |
Merged #30579, you can update your branch, so the CI passes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just on small typo, but lgtm
afd9e0b
to
0530d02
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks @MomIsBestFriend
Thank you for the very close guidance @datapythonista |
thanks @MomIsBestFriend |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Can be merged, after merging #30464