CI: Adding script to validate consistent and correct capitalization among headings in documentation (#26941) #31114

Merged 63 commits on Mar 7, 2020
Changes from 42 commits
Commits
85e3fe6
changed name
Jan 13, 2020
e2d5354
adding sphinx extension
Jan 14, 2020
f089c0c
Starting builder
Jan 14, 2020
2dd5791
experimenting with builder
Jan 14, 2020
2294331
before running build
Jan 14, 2020
c06c951
126 warnings?
Jan 14, 2020
2ffeee0
contributors
Jan 14, 2020
1364f86
experimenting
Jan 14, 2020
bb535ae
update
Jan 14, 2020
6b51df6
parser created
Jan 14, 2020
d6198a6
italics working
Jan 14, 2020
21693b6
found a way to collect all heading strings from doctree
Jan 14, 2020
0810c09
testing script
Jan 15, 2020
30c4f8c
modified validation script
Jan 15, 2020
aabd136
command line arguments possible for validation script
Jan 15, 2020
4c83edb
validation script needs better commenting
Jan 16, 2020
50661c3
added line number to validation script
Jan 17, 2020
f513f29
edited code_checks.sh
Jan 17, 2020
2d3cfe7
argument parser correctly implemented
Jan 17, 2020
4ceea5e
Added comments
Jan 17, 2020
11556b7
Validate consistency of title capitalization in documentation script …
Jan 17, 2020
e55776f
Merge remote-tracking branch 'upstream/master' into new-feature
Jan 17, 2020
9fc312a
Adding script to validate consistency of title capitalization (#26941)
Jan 17, 2020
635163d
Adding validate_rst_title_capitalization.py (#26941)
Jan 17, 2020
c4ff8bd
Testing validate_rst_capitalization.py script (#26941)
Jan 18, 2020
927e3ed
Merge remote-tracking branch 'upstream/master' into new-feature
Jan 18, 2020
83f778c
Edited validate script (#26941)
Jan 18, 2020
1907d45
Added parameter and return value information in docstrings
Jan 18, 2020
de06ec8
Edited validate_rst_title_capitalization.py for review (#29641)
Jan 18, 2020
b7c0bfd
Edited validate_rst_title_capitalization.py for review (#26941)
Jan 18, 2020
3d3a7f4
Merge remote-tracking branch 'upstream/master' into new-feature
Jan 18, 2020
7ea58df
Checking if stderr output will be suppressed (#26941)
Jan 19, 2020
d71be41
Merge remote-tracking branch 'upstream/master' into new-feature
Jan 19, 2020
60d8db9
Simplified validate_rst_title_capitalization.py to print correctly (#…
Jan 19, 2020
0e344ad
Testing script on doc/source/development/contributing.rst (#26941)
Jan 19, 2020
3757712
validate_rst_title_capitalization.py MomIsBestFriend edits (#26941)
Jan 19, 2020
3d95777
Merge remote-tracking branch 'upstream/master' into new-feature
Jan 19, 2020
56bfc44
Created method to correct title capitalization (#26941)
Jan 21, 2020
0ec38e2
Merge remote-tracking branch 'upstream/master' into new-feature
Jan 21, 2020
deddc2d
Ran black on validate_rst_title_capitalization (#26941)
Jan 21, 2020
0311fe0
Edit: titles with non-word character as first character are not valid
Jan 22, 2020
3256615
Merge remote-tracking branch 'upstream/master' into new-feature
Jan 22, 2020
df01730
Simplified validate_rst_title_capitalization main method (#26941)
Jan 22, 2020
c1e3abb
Merge remote-tracking branch 'upstream/master' into new-feature
Jan 22, 2020
9a9a57a
Edited parameter and return value description of main function
Jan 22, 2020
bafbf96
Merge remote-tracking branch 'upstream/master' into new-feature
Jan 22, 2020
dd5c983
Merge remote-tracking branch 'upstream/master' into new-feature
Jan 23, 2020
5f0f84a
Added glob module to script 01-27-2020
Jan 27, 2020
2fc019f
Merge remote-tracking branch 'upstream/master' into new-feature
Jan 27, 2020
ee45f98
Edited len(line) != 0 correction
Feb 4, 2020
88dfc46
Merge remote-tracking branch 'upstream/master' into new-feature
Feb 4, 2020
78a49c1
Edited find_titles method
Feb 4, 2020
95d3488
Merge remote-tracking branch 'upstream/master' into new-feature
Feb 22, 2020
1c7de87
Merge remote-tracking branch 'upstream/master' into new-feature
Feb 26, 2020
f4ffd32
edited contributing.rst to have no errors
Feb 26, 2020
3d2e9ce
Merge remote-tracking branch 'upstream/master' into new-feature
Mar 2, 2020
687053f
modified validation script
Mar 6, 2020
ed3cdc6
Merge remote-tracking branch 'upstream/master' into new-feature
Mar 6, 2020
c690281
fix linting errors
Mar 6, 2020
c9775cc
black pandas-dev change
Mar 6, 2020
66c651a
modified changes to comments
Mar 6, 2020
ac5c5b7
Merge remote-tracking branch 'upstream/master' into new-feature
Mar 6, 2020
ceedac5
Merge remote-tracking branch 'upstream/master' into new-feature
Mar 7, 2020
4 changes: 4 additions & 0 deletions ci/code_checks.sh
@@ -318,6 +318,10 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
    $BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=GL03,GL04,GL05,GL06,GL07,GL09,GL10,SS04,SS05,PR03,PR04,PR05,PR10,EX04,RT01,RT04,RT05,SA02,SA03,SA05
    RET=$(($RET + $?)) ; echo $MSG "DONE"

    MSG='Validate correct capitalization among titles in documentation' ; echo $MSG
    $BASE_DIR/scripts/validate_rst_title_capitalization.py $BASE_DIR/doc/source/development/contributing.rst
    RET=$(($RET + $?)) ; echo $MSG "DONE"

fi

### DEPENDENCIES ###
272 changes: 272 additions & 0 deletions scripts/validate_rst_title_capitalization.py
@@ -0,0 +1,272 @@
#!/usr/bin/env python
"""
Validate that the titles in the rst files follow the proper capitalization convention.

Print the titles that do not follow the convention.

Usage::
    ./scripts/validate_rst_title_capitalization.py doc/source/development/contributing.rst
    ./scripts/validate_rst_title_capitalization.py doc/source/

"""
import argparse
import sys
import re
import os
from typing import Tuple, Generator, List


CAPITALIZATION_EXCEPTIONS = {
    "pandas",
    "Python",
    "IPython",
    "PyTables",
    "Excel",
    "JSON",
    "HTML",
    "SAS",
    "SQL",
    "BigQuery",
    "STATA",
    "Interval",
    "PEP8",
    "Period",
    "Series",
    "Index",
    "DataFrame",
    "C",
    "Git",
    "GitHub",
    "NumPy",
    "Apache",
    "Arrow",
    "Parquet",
    "MultiIndex",
    "NumFOCUS",
    "sklearn",
    "Docker",
}

CAP_EXCEPTIONS_DICT = {word.lower(): word for word in CAPITALIZATION_EXCEPTIONS}
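
The mapping above keys each exception by its lowercase form, so a word can be looked up no matter how it was capitalized in a heading. A minimal sketch of that lookup follows; the three-word subset and the helper name `canonical_form` are hypothetical and are not part of the PR.

```python
# Hypothetical subset of the exceptions set, for illustration only.
EXCEPTIONS = {"pandas", "DataFrame", "GitHub"}

# Lowercase key -> canonical spelling, built the same way as CAP_EXCEPTIONS_DICT.
EXCEPTIONS_BY_LOWER = {word.lower(): word for word in EXCEPTIONS}


def canonical_form(word: str) -> str:
    """Return the canonical spelling of an exception word, else the word unchanged."""
    return EXCEPTIONS_BY_LOWER.get(word.lower(), word)
```

Because the lookup is case-insensitive, both "dataframe" and "DATAFRAME" resolve to the canonical spelling, while ordinary words pass through untouched.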

bad_title_dict = {}

err_msg = "Heading capitalization formatted incorrectly. Please correctly capitalize"
Review comment (Member): Since we're using this just once, I'd move it to the function where it's being used.



def correct_title_capitalization(title: str) -> str:
    """
    Return the correctly capitalized version of a given title.

    Parameters
    ----------
    title : str
        Heading string to correct

    Returns
    -------
    correct_title : str
Review comment (Member): Suggested change: replace "correct_title : str" with "str".
        Correctly capitalized heading

    """

    # Strip leading non-word characters, then sentence-case the title.
    correct_title: str = re.sub(r"^\W*", "", title).capitalize()

    # Ignore words inside <http...> links when looking for exception words.
    removed_https_title = re.sub(r"<https?:\/\/.*[\r\n]*>", "", correct_title)

    word_list = re.split(r"\W", removed_https_title)

    # Restore the canonical spelling of every exception word (e.g. "DataFrame").
    for word in word_list:
        if word.lower() in CAP_EXCEPTIONS_DICT:
            correct_title = re.sub(
                r"\b" + word + r"\b", CAP_EXCEPTIONS_DICT[word.lower()], correct_title
            )

    return correct_title
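

The correction above works in two steps: sentence-case the heading, then restore the canonical spelling of any exception word. The sketch below replays those two steps on a few sample headings. It is a simplified illustration, not the PR's function: the name `sketch_correct_title` and the three-entry mapping are hypothetical, and the link-stripping step is left out.

```python
import re

# Hypothetical three-entry stand-in for CAP_EXCEPTIONS_DICT.
EXCEPTIONS_BY_LOWER = {
    "pandas": "pandas",
    "github": "GitHub",
    "dataframe": "DataFrame",
}


def sketch_correct_title(title: str) -> str:
    """Sentence-case a heading, then restore exception words."""
    # Step 1: drop leading non-word characters and sentence-case the rest.
    corrected = re.sub(r"^\W*", "", title).capitalize()
    # Step 2: put back the canonical spelling of each known exception word.
    for word in re.split(r"\W+", corrected):
        canonical = EXCEPTIONS_BY_LOWER.get(word.lower())
        if canonical is not None:
            corrected = re.sub(r"\b" + re.escape(word) + r"\b", canonical, corrected)
    return corrected
```

Note that an exception word in first position keeps its canonical form, so "Pandas dataframe basics" becomes "pandas DataFrame basics" rather than starting with a capital letter.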


def is_following_capitalization_convention(title: str) -> bool:
    """
    Return whether a given title is capitalized correctly.

    Parameters
    ----------
    title : str
        Heading string to validate

    Returns
    -------
    bool
        True if title capitalized correctly, False if not

    """

    correct_title = correct_title_capitalization(title)

    return title == correct_title


def find_titles(rst_file: str) -> Generator[Tuple[str, int], None, None]:
    """
    Algorithm to identify particular text that should be considered headings in an
    RST file

    See <https://thomas-cokelaer.info/tutorials/sphinx/rest_syntax.html> for details
    on what constitutes a string as a heading in RST

    Parameters
    ----------
    rst_file : str
        RST file to scan through for headings

    Yields
    ------
    title : str
        A heading found in the rst file

Review comment (Member): No blank line here

    line_number : int
        The corresponding line number of the heading

    """

    with open(rst_file, "r") as file_obj:
        lines = file_obj.read().split("\n")

    # One pattern per heading symbol: a line made up solely of that symbol.
    regex = {
        "*": r"^(?:\*{1})*$",
        "=": r"^(?:={1})*$",
        "-": r"^(?:-{1})*$",
        "^": r"^(?:\^{1})*$",
        "~": r"^(?:~{1})*$",
        "#": r"^(?:#{1})*$",
        '"': r'^(?:"{1})*$',
    }

    # Inline markup characters removed from a heading before it is validated.
    table = str.maketrans("", "", "*`_")

    for line_no in range(1, len(lines)):
        if len(lines[line_no]) != 0 and len(lines[line_no - 1]) != 0:
            for key in regex:
                match = re.search(regex[key], lines[line_no])
                if match is not None:
                    if line_no >= 2:
                        # Heading with a matching overline and underline.
                        if lines[line_no] == lines[line_no - 2]:
                            if len(lines[line_no]) == len(lines[line_no - 1]):
                                yield lines[line_no - 1].translate(table), line_no
                            break
                    # Heading with an underline only.
                    if len(lines[line_no]) >= len(lines[line_no - 1]):
                        yield lines[line_no - 1].translate(table), line_no
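

The detection rule above boils down to this: a non-empty line followed by a line made of a single repeated heading symbol, at least as long as the text, is a heading. The sketch below is a simplified variant of that rule under stated assumptions, not the PR's exact logic: it works on a string instead of a file, does not strip inline markup, and the names `iter_headings` and `UNDERLINE_CHARS` are hypothetical.

```python
from typing import Iterator, Tuple

# Symbols accepted as heading underlines, matching the keys of the regex dict.
UNDERLINE_CHARS = set("*=-^~#\"")


def _is_underline(line: str) -> bool:
    """True if the line is non-empty and made of one repeated heading symbol."""
    return len(set(line)) == 1 and line[0] in UNDERLINE_CHARS


def iter_headings(text: str) -> Iterator[Tuple[str, int]]:
    """Yield (heading text, 1-based line number of the heading) pairs."""
    lines = text.split("\n")
    for index in range(1, len(lines)):
        line, previous = lines[index], lines[index - 1]
        if not line or not previous:
            continue
        if (
            _is_underline(line)
            and not _is_underline(previous)
            and len(line) >= len(previous)
        ):
            # index is the 0-based position of the underline, which equals
            # the 1-based line number of the heading text just above it.
            yield previous, index
```

Reporting the 0-based index of the underline is what makes the emitted number line up with the heading's line in an editor.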


def fill_bad_title_dict(rst_file: str) -> None:
    """
    Fill bad_title_dict with the incorrectly capitalized headings of a file.

    Parameters
    ----------
    rst_file : str
        Path of a .rst file

    """

    # Skip files that have already been processed.
    if rst_file in bad_title_dict:
        return

    for title, line_number in find_titles(rst_file):
        if not is_following_capitalization_convention(title):
            bad_title_dict.setdefault(rst_file, []).append((title, line_number))


def find_rst_files(source_paths: List[str]) -> Generator[str, None, None]:
    """
    Given the paths passed on the command line, yield the paths of the
    .rst files that they point to or contain.

    Parameters
    ----------
    source_paths : list of str
        List of files/directories to validate, provided through command line
        arguments

    Yields
    ------
    directory_address : str
        Path of a .rst file found under the command line argument paths

Review comment (Member): Better remove this blank line (we're not validating this docstring, but this blank line would make it fail if we do).

    """

    for directory_address in source_paths:
        if not os.path.exists(directory_address):
            raise ValueError(
                "Please enter a valid path, pointing to a valid file/directory."
            )
        elif directory_address.endswith(".rst"):
            yield directory_address
        else:
            for (dirpath, _, filenames) in os.walk(directory_address):
                for file in filenames:
                    if file.endswith(".rst"):
                        yield os.path.join(dirpath, file)
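

The file discovery above accepts a mix of files and directories and walks directories recursively. A self-contained sketch of the same idea, exercised on a throwaway directory tree, could look like this. The function name `iter_rst_paths` and the sample file names are hypothetical; sorting the file names is an addition here to keep the output order stable.

```python
import os
import tempfile
from typing import Iterator, List


def iter_rst_paths(paths: List[str]) -> Iterator[str]:
    """Yield every .rst file named by, or found under, the given paths."""
    for path in paths:
        if not os.path.exists(path):
            raise ValueError(f"Path does not exist: {path}")
        if os.path.isfile(path):
            if path.endswith(".rst"):
                yield path
            continue
        # os.walk visits the directory itself and all of its subdirectories.
        for dirpath, _, filenames in os.walk(path):
            for name in sorted(filenames):
                if name.endswith(".rst"):
                    yield os.path.join(dirpath, name)


# Usage: build a temporary tree and list the .rst files under it.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for relative in ("a.rst", "b.txt", os.path.join("sub", "c.rst")):
        with open(os.path.join(root, relative), "w") as handle:
            handle.write("Title\n=====\n")
    found = sorted(os.path.relpath(p, root) for p in iter_rst_paths([root]))
```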


def main(source_paths: List[str], output_format: str) -> int:
    """
    The main method to print all headings with incorrect capitalization

    Parameters
    ----------
    source_paths : list of str
        List of files/directories to validate, provided through command line
        arguments
    output_format : str
        Output format of the script (accepted, but not used by this version).

    Returns
    -------
    number_of_errors : int
        Number of incorrectly capitalized headings that were printed

    """

    number_of_errors: int = 0

    directory_list = find_rst_files(source_paths)

    for filename in directory_list:
        fill_bad_title_dict(filename)
Review comment (Member): This feels overcomplicated. Just loop over find_titles here, and if title != correct_title_capitalization(title), yield a tuple with the file name, and the correct and incorrect titles. I think you can get rid of a lot of the code by doing that, and things will be much clearer.

    if len(bad_title_dict) == 0:
        return number_of_errors

    for key in bad_title_dict:
        for line in bad_title_dict[key]:
            print(
                f"""{key}:{line[1]}:{err_msg} "{line[0]}" to "{
                correct_title_capitalization(line[0])}" """
            )
            number_of_errors += 1

    return number_of_errors
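

The reviewer's suggestion in main (loop over the titles directly and yield the offending ones instead of filling a module-level dict) could be sketched roughly as below. This is one possible reading of the comment, not the code that was eventually merged. The name `iter_bad_titles` is hypothetical, and the title finder and corrector are passed in as parameters only so that the sketch stays self-contained.

```python
from typing import Callable, Iterable, Iterator, Tuple


def iter_bad_titles(
    rst_files: Iterable[str],
    find_titles: Callable[[str], Iterable[Tuple[str, int]]],
    correct_title: Callable[[str], str],
) -> Iterator[Tuple[str, int, str, str]]:
    """Yield (file, line number, title, corrected title) for each bad heading."""
    for rst_file in rst_files:
        for title, line_number in find_titles(rst_file):
            corrected = correct_title(title)
            if title != corrected:
                yield rst_file, line_number, title, corrected
```

With a generator like this, the caller can print each tuple and count the results in a single loop, so the global dictionary and the separate fill step are no longer needed.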


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Validate heading capitalization")

    parser.add_argument(
        "paths", nargs="+", default=".", help="Source paths of file/directory to check."
    )

    parser.add_argument(
        "--format",
        "-f",
        default="{source_path}:{line_number}:{msg}:{heading}",
        help="Output format of incorrectly capitalized titles",
    )

    args = parser.parse_args()

    sys.exit(main(args.paths, args.format))
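

The script hands the error count straight to sys.exit. On POSIX systems only the low 8 bits of an exit status reach the parent process, so a count that is an exact multiple of 256 would be seen by ci/code_checks.sh as success. A minimal sketch of one way to guard against that, assuming a hypothetical helper named `exit_status` that is not part of the PR:

```python
def exit_status(number_of_errors: int) -> int:
    """Map an error count to an exit status that cannot wrap around to 0."""
    # Any positive number of errors becomes 1; no errors stays 0.
    return 1 if number_of_errors > 0 else 0
```

Under that assumption the last line of the script would become `sys.exit(exit_status(main(args.paths, args.format)))`, and the per-heading messages printed by main still tell the user how many headings need fixing.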