CI: Add test case for unwanted patterns #30467

ShaharNaveh · 2019-12-25T19:29:47Z

closes CI: code check for " " introduced by black #30454
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Can be merged after #30464 is merged.

datapythonista

Nice work. I added some comments about the structure and style of the script, but it looks good.

scripts/validate_string_concatenation.py

jbrockmendel · 2019-12-26T00:51:48Z

This is really nice, thanks @MomIsBestFriend. I was not looking forward to figuring out the ast stuff.

@datapythonista is there someone on one of the flake/lint projects we might want to talk to about upstreaming this?

ShaharNaveh · 2019-12-26T17:04:26Z

@datapythonista It looks like the Python version used by the GitHub Actions job is 2.7.15. Is it possible to upgrade it?

alimcmaster1 · 2019-12-26T17:39:46Z

Out of interest, where does it say this? It must be at least 3.6, since we run validate_docstring.py in GitHub Actions?

ShaharNaveh · 2019-12-26T18:07:37Z

Line 52 of the GitHub Actions log, under the "Looking for unwanted patterns" section.

(Screenshot, because I squashed the original commit.)

datapythonista · 2019-12-26T19:57:37Z

We never use the system Python. We install conda in every build and create an environment, so you'll have to activate it.

I'd yield the values as a namedtuple.
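A minimal sketch of what yielding a namedtuple could look like, assuming the fields mirror the dict keys the script used at this point (`source_path`, `line_number`, `start`, `end`). The `Match` type and the `findings` generator are hypothetical stand-ins, not the script's code. A namedtuple gives attribute access and tuple unpacking, while `_asdict()` keeps it compatible with `output_format.format(**...)`.

```python
from collections import namedtuple
from typing import Iterator

# Hypothetical record type; field names mirror the dict keys discussed in this PR.
Match = namedtuple("Match", ["source_path", "line_number", "start", "end"])


def findings() -> Iterator[Match]:
    # Hypothetical stand-in for the real detector: yields one hard-coded match.
    yield Match("./path/to/script.py", 1337, '"foo"', '"bar"')


for match in findings():
    # Attribute access reads better than dict lookups...
    print(f"{match.source_path}:{match.line_number}")
    # ...and _asdict() still works with str.format(**kwargs) templates.
    print("{source_path}:{line_number} BETWEEN {start} AND {end}".format(**match._asdict()))
```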

ShaharNaveh · 2019-12-26T20:26:10Z

and create an environment, you'll have to activate it.

The problem is that conda is activated only after the unwanted patterns check runs. Do you have any suggestions?

datapythonista · 2019-12-26T20:37:02Z

Better to move the string check into the linting step, which is a good idea anyway.

scripts/validate_string_concatenation.py

datapythonista

Looks great. I added a few comments about minor things, but good job. Thanks!

scripts/validate_string_concatenation.py

ci/code_checks.sh

scripts/validate_string_concatenation.py

datapythonista · 2019-12-28T21:28:43Z

scripts/validate_string_concatenation.py

+    argparser.add_argument(
+        "--format",
+        "-f",
+        default="default",


Suggested change

-        default="default",
+        default="{source_path}:{line_number}:{start}:{end}:{msg}",

So by default we use this format, and if the caller wants a different format, they can call the script like script -f "##vso[task.logissue type=error; {source_path}]{msg}" path. That way you don't need to know predefined format constants beforehand, only the variable names.
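The idea of a caller-supplied template can be sketched with `argparse` and `str.format`. The `--format`/`-f` option and the default template come from the suggestion above; the `path` positional, `build_parser` and `render` are hypothetical names used only for illustration.

```python
import argparse

# Default template from the suggestion above; callers can override it with -f.
DEFAULT_FORMAT = "{source_path}:{line_number}:{start}:{end}:{msg}"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Check for unconcatenated strings.")
    # Hypothetical positional argument for the directory to check.
    parser.add_argument("path", nargs="?", default=".", help="Source path to check")
    parser.add_argument("--format", "-f", default=DEFAULT_FORMAT, help="Output format template")
    return parser


def render(output_format: str, **values) -> str:
    # str.format ignores unused keyword arguments, so a template may use any subset
    # of the placeholders; the caller only needs to know their names.
    return output_format.format(**values)


values = {
    "source_path": "pandas/__init__.py",
    "line_number": 124,
    "start": '"foo"',
    "end": '"bar"',
    "msg": "String unnecessarily split in two by black. Please merge them manually.",
}

args = build_parser().parse_args([])
print(render(args.format, **values))

# An Azure Pipelines-style template passed on the command line.
args = build_parser().parse_args(["-f", "##vso[task.logissue type=error; {source_path}]{msg}", "pandas"])
print(render(args.format, **values))
```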

I'm unsure whether start and end provide value. Are they the columns of the strings to concatenate? If so, I'd say the line number makes it clear enough where the problem is.

I'm unsure whether start and end provide value.

Short summary:

start and end are used to pinpoint the unconcatenated string.

Long summary:

start and end are the two unconcatenated strings that need to be concatenated.

Let's say we have a file located at path/to/script.py, and on line 1337 there is the line:

print("foo" "bar")

When running scripts/validate_string_concatenation.py, the default output looks like this:

./path/to/script.py:1337 BETWEEN "foo" AND "bar"

and the yielded dictionary maps like this:

source_path: "./path/to/script.py"
line_number: 1337
start: "foo"
end: "bar"
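One way to detect this is with the standard library `tokenize` module: implicit concatenation shows up as two consecutive `STRING` tokens. This is a sketch under that assumption, not necessarily how the script works (the thread also mentions `ast`). For easy testing it takes the source text directly plus a `source_path` label, whereas the real function takes a path. It only flags strings that are adjacent on the same line, as in the `"foo" "bar"` example, because a newline inside brackets adds an `NL` token between them.

```python
import io
import token
import tokenize
from typing import Dict, Iterator, Union


def strings_to_concatenate(source: str, source_path: str) -> Iterator[Dict[str, Union[str, int]]]:
    """Yield a dict for every pair of adjacent string literals in ``source``."""
    tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
    for current, following in zip(tokens, tokens[1:]):
        # Two STRING tokens in a row mean Python will implicitly concatenate them.
        if current.type == following.type == token.STRING:
            yield {
                "source_path": source_path,
                "line_number": current.start[0],  # start is (row, col); keep the row
                "start": current.string,
                "end": following.string,
            }


code = 'print("foo" "bar")\n'
for values in strings_to_concatenate(code, "./path/to/script.py"):
    print("{source_path}:{line_number} BETWEEN {start} AND {end}".format(**values))
```

This prints `./path/to/script.py:1 BETWEEN "foo" AND "bar"`; the line number is 1 only because the snippet is a single line.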

ci/code_checks.sh

scripts/validate_string_concatenation.py

datapythonista · 2019-12-29T13:40:37Z

scripts/validate_string_concatenation.py

+            ):
+                for values in strings_to_concatenate(os.path.join(subdir, file_name)):
+                    is_failed = True
+                    print(output_format.format(**values))


The output of this will be something like:

pandas/__init__.py:124: BETWEEN first part of the string AND with a second part in another string

I don't think people will understand what's going on when seeing this error in the CI. I think we should display something like:

pandas/__init__.py:124:String unnecessarily split in two by black. Please merge them manually.

I'm not sure about this; will the message be too long?
And where do I put source_path, line_number, start and end?

A long message is not a problem, neither for the CI nor for the code.

From my example, the format would be:

{source_path}:{line_number}:String unnecessarily split in two by black. Please merge them manually.

As I mentioned earlier, I don't find start and end particularly useful for the reader of the error. Feel free to leave them if you disagree, but better rename them, since their names are misleading. string1 and string2 are not great names, but they're better. At least they don't give the wrong impression that those are the positions of the strings.

I can see what you mean now. Fixed.

Cool, looks much better.

Sorry, I said it wrong: the format should actually be {source_path}:{line_number}:{msg}. We'll decide the error message here. Imagine that this function could detect more than one kind of error; we can't specify in the format which error we want to receive.

So this could become:

Suggested change

-                    print(output_format.format(**values))
+                    msg = "String unnecessarily split in two by black. Please merge them manually."
+                    print(output_format.format(source_path=source_path, line_number=line_number, msg=msg))

And since the strings_to_concatenate function now returns just two values, I think it makes more sense to return them as a tuple instead of a dict, so the for loop above would become:

for source_path, line_number in strings_to_concatenate(os.path.join(subdir, file_name)):
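Taken together, the suggestions above (a `{source_path}:{line_number}:{msg}` format, the message chosen by the script, and a tuple instead of a dict) could look roughly like the sketch below. It is not the merged script: detection via the standard `tokenize` module, the `main` helper, the `.py` filter and the 0/1 return code are assumptions made for illustration. The docstring lists the two yielded values individually, numpydoc style, rather than just saying "Tuple".

```python
import os
import token
import tokenize
from typing import Generator, Tuple

MSG = "String unnecessarily split in two by black. Please merge them manually."


def strings_to_concatenate(source_path: str) -> Generator[Tuple[str, int], None, None]:
    """
    Yield the location of every pair of adjacent string literals.

    Parameters
    ----------
    source_path : str
        Path to a single Python source file.

    Yields
    ------
    source_path : str
        Path of the file containing the strings.
    line_number : int
        Line number of the first of the two strings.
    """
    with open(source_path, "r", encoding="utf-8") as f:
        tokens = list(tokenize.generate_tokens(f.readline))
    for current, following in zip(tokens, tokens[1:]):
        # Assumption: two consecutive STRING tokens mark an implicit concatenation.
        if current.type == following.type == token.STRING:
            yield source_path, current.start[0]


def main(root: str, output_format: str = "{source_path}:{line_number}:{msg}") -> int:
    """Print one line per finding under ``root``; return 1 if anything was found."""
    is_failed = False
    for subdir, _, files in os.walk(root):
        for file_name in sorted(files):
            if not file_name.endswith(".py"):
                continue
            for source_path, line_number in strings_to_concatenate(os.path.join(subdir, file_name)):
                is_failed = True
                print(output_format.format(source_path=source_path, line_number=line_number, msg=MSG))
    return int(is_failed)
```

In a command-line script, something like `sys.exit(main("pandas"))` would turn the return value into the CI exit status.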

datapythonista

Cool, looks great now. Just a couple of minor things, and I think we can get it merged. Thanks!

datapythonista · 2019-12-30T00:55:08Z

scripts/validate_string_concatenation.py

@@ -86,7 +104,7 @@ def strings_to_concatenate(source_path: str) -> Generator[Dict[str, str], None,

    Yields
    ------
-    dict of {str: str}
+    Tuple


Just list the two returned values instead of saying it's a tuple.

scripts/validate_string_concatenation.py

ShaharNaveh · 2019-12-31T12:12:29Z

Can be merged after #30579 is merged.

datapythonista · 2019-12-31T13:15:38Z

Merged #30579; you can update your branch so the CI passes.

datapythonista

Just one small typo, but lgtm

scripts/validate_string_concatenation.py

datapythonista

lgtm, thanks @MomIsBestFriend

ShaharNaveh · 2020-01-01T21:58:25Z

Thank you for the very close guidance @datapythonista

jreback · 2020-01-02T01:10:38Z

thanks @MomIsBestFriend

alimcmaster1 added the CI Continuous Integration label Dec 25, 2019

datapythonista requested changes Dec 25, 2019

ShaharNaveh force-pushed the CI-unwanted-test-str-concat branch from 3ae2f2b to 6f2738f on December 26, 2019 17:02

ShaharNaveh mentioned this pull request Dec 26, 2019

STY: Removed unconcatenated strings #30464 (merged)

ShaharNaveh force-pushed the CI-unwanted-test-str-concat branch from a34b8ba to e7c1a7f on December 26, 2019 19:52

ShaharNaveh force-pushed the CI-unwanted-test-str-concat branch 4 times, most recently from 4d6d179 to d5afed9 on December 26, 2019 20:20

ShaharNaveh force-pushed the CI-unwanted-test-str-concat branch from d5afed9 to ea3f0e7 on December 26, 2019 20:53

ShaharNaveh commented Dec 26, 2019

scripts/validate_string_concatenation.py

ShaharNaveh requested a review from datapythonista December 26, 2019 21:20

ShaharNaveh force-pushed the CI-unwanted-test-str-concat branch from ea3f0e7 to a90ca90 on December 26, 2019 21:23

datapythonista reviewed Dec 26, 2019

ShaharNaveh force-pushed the CI-unwanted-test-str-concat branch from a90ca90 to fc0def6 on December 28, 2019 14:50

ShaharNaveh requested a review from datapythonista December 28, 2019 15:27

ShaharNaveh commented Dec 28, 2019

scripts/validate_string_concatenation.py

datapythonista reviewed Dec 28, 2019

ShaharNaveh force-pushed the CI-unwanted-test-str-concat branch 2 times, most recently from ecd2e93 to b0732c5 on December 29, 2019 10:52

ShaharNaveh requested a review from datapythonista December 29, 2019 10:53

ShaharNaveh force-pushed the CI-unwanted-test-str-concat branch from b0732c5 to 72f3dc3 on December 29, 2019 11:28

datapythonista reviewed Dec 29, 2019

ShaharNaveh force-pushed the CI-unwanted-test-str-concat branch from 72f3dc3 to 280d103 on December 29, 2019 13:52

ShaharNaveh requested a review from datapythonista December 29, 2019 13:56

ShaharNaveh force-pushed the CI-unwanted-test-str-concat branch 2 times, most recently from 7bdcd14 to 10df407 on December 29, 2019 14:41

datapythonista reviewed Dec 30, 2019

ShaharNaveh force-pushed the CI-unwanted-test-str-concat branch from 83e9246 to 5e272bc on December 31, 2019 11:10

datapythonista reviewed Dec 31, 2019

scripts/validate_string_concatenation.py

ShaharNaveh force-pushed the CI-unwanted-test-str-concat branch from afd9e0b to 0530d02 on January 1, 2020 13:24

MomIsBestFriend added 6 commits January 1, 2020 15:25

CI: Add test case for unwanted patterns

272099b

Refactored the code to use a generator

7a588bc

Fixes for datapythonista's review

c19a2a5

Fixes for review

2b67601

Fixes for review

605e70c

Remove incorrect default value from docstring

0530d02

ShaharNaveh requested a review from datapythonista January 1, 2020 14:00

jreback added this to the 1.0 milestone Jan 1, 2020

datapythonista approved these changes Jan 1, 2020

jreback merged commit 27f406f into pandas-dev:master Jan 2, 2020

ShaharNaveh deleted the CI-unwanted-test-str-concat branch January 2, 2020 01:31

ShaharNaveh mentioned this pull request Jan 6, 2020

CI: Unify code_checks whitespace checking #30755 (merged)

This was referenced Jan 14, 2020

STY: concat strings #30991 (merged)

CI: Adding script to validate consistent and correct capitalization among headings in documentation (#26941) #31114 (merged)
