[Typing] Introduce mypy type checking in Paddle's CI for the example code in API docstrings #63901

Merged
merged 39 commits into from
May 31, 2024
39 commits
b8c2201
add type_hints for ci
megemini Apr 25, 2024
5dab33d
add type_hints unittest
megemini Apr 26, 2024
c11b093
tmp test for type hints
megemini Apr 26, 2024
436175f
change mypy version
megemini Apr 26, 2024
f6927c2
from __future__ import annotations
megemini Apr 26, 2024
c4ba2bf
tmp math.py docstring trigger ci
megemini Apr 26, 2024
9fba61a
tmp trigger ci
megemini Apr 26, 2024
36ba294
tmp debug mypy
megemini Apr 26, 2024
68354cb
fix paddle_build.sh
megemini Apr 27, 2024
7c2a715
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
megemini Apr 27, 2024
9654c26
setup with pyi
megemini Apr 27, 2024
732aff8
force reinstall
megemini Apr 27, 2024
458802a
setup.py type hints
megemini Apr 27, 2024
363ec84
restore math.py
megemini May 2, 2024
41aa797
update print_signatures.py for trigger type annotation ci
megemini May 2, 2024
c8775e3
update print_signatures.py member_dict for trigger type annotation ci
megemini May 2, 2024
0dae862
restore print_signatures.py
megemini May 2, 2024
5c996a0
get_api_md5 with ArgSpec & update unittest
megemini May 2, 2024
ce613b2
change math.py type annotation
megemini May 2, 2024
fb6cef6
change math.py type annotation return
megemini May 2, 2024
edc4b23
change math.py type annotation scale & stanh
megemini May 2, 2024
9857d86
update paddle_build.sh
megemini May 5, 2024
a7ed18c
[Update] type checker
megemini May 8, 2024
51fda55
tmp math.py, test=type_checking
megemini May 8, 2024
98dc1df
tmp math.py, test=type_checking
megemini May 9, 2024
eb1f468
tmp math.py, test=type_checking
megemini May 9, 2024
c75c574
tmp math.py, test=type_checking
megemini May 9, 2024
495d0b7
tmp math.py & fix paddle_build.sh, test=type_checking
megemini May 9, 2024
40b66c7
type checking on title
megemini May 10, 2024
0181f23
reduce log
megemini May 10, 2024
e7be07d
change mypy cache dir abspath
megemini May 10, 2024
618c3b9
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
megemini May 20, 2024
72067ed
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
megemini May 29, 2024
37ae0ab
[Change] paddle_build.sh func
megemini May 29, 2024
cf37661
Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into…
megemini May 30, 2024
5cc0c4b
[Update] filter api
megemini May 30, 2024
f9c381e
[Update] pyproject.toml & process pool for run
megemini May 30, 2024
06dee11
[Update] restore math.py
megemini May 31, 2024
ae07a13
[Update] restore math.py
megemini May 31, 2024
129 changes: 102 additions & 27 deletions paddle/scripts/paddle_build.sh
@@ -3488,7 +3488,6 @@ function build_document_preview() {
sh /paddle/tools/document_preview.sh ${PORT}
}


# origin name: example
function exec_samplecode_test() {
if [ -d "${PADDLE_ROOT}/build/pr_whl" ];then
@@ -3502,17 +3501,86 @@ function exec_samplecode_test() {

cd ${PADDLE_ROOT}/tools
if [ "$1" = "cpu" ] ; then
python sampcd_processor.py --debug --mode cpu; example_error=$?
python sampcd_processor.py --mode cpu; example_error=$?
elif [ "$1" = "gpu" ] ; then
SAMPLE_CODE_EXEC_THREADS=${SAMPLE_CODE_EXEC_THREADS:-2}
python sampcd_processor.py --threads=${SAMPLE_CODE_EXEC_THREADS} --debug --mode gpu; example_error=$?
python sampcd_processor.py --threads=${SAMPLE_CODE_EXEC_THREADS} --mode gpu; example_error=$?
fi
if [ "$example_error" != "0" ];then
echo "Code instance execution failed" >&2
exit 5
fi
}

function need_type_checking() {
set +x

# check pr title
TITLE_CHECK=`curl -s https://github.com/PaddlePaddle/Paddle/pull/${GIT_PR_ID} | grep "<title>" | grep -i "typing" || true`

if [[ ${TITLE_CHECK} ]]; then
set -x
return 0
else
set -x
return 1
fi
}

function exec_type_checking() {
if [ -d "${PADDLE_ROOT}/build/pr_whl" ];then
pip install ${PADDLE_ROOT}/build/pr_whl/*.whl
else
echo "WARNING: PR wheel is not found. Use develop wheel !!!"
pip install ${PADDLE_ROOT}/build/python/dist/*.whl
fi

python -c "import paddle;print(paddle.__version__);paddle.version.show()"

cd ${PADDLE_ROOT}/tools

# check all sample code
TITLE_CHECK_ALL=`curl -s https://github.com/PaddlePaddle/Paddle/pull/${GIT_PR_ID} | grep "<title>" | grep -i "typing all" || true`

if [[ ${TITLE_CHECK_ALL} ]]; then
python type_checking.py --full-test; type_checking_error=$?
Member

A small question about the mypy cache here: if I run this the way the shell script does, the cache path ends up as tools/.mypy_cache rather than ${PADDLE_ROOT}/.mypy_cache. Is that intentional?

Contributor Author

Does that have any impact? It can be configured in pyproject.toml:

[tool.mypy]
python_version = "3.8"
cache_dir = ".mypy_cache"

Also, I happen to have a question for you!

The run_type_checking function I added to paddle_build.sh is modeled on check_run_sot_ci:

    # use "git commit -m 'message, test=sot'" to force ci to run
    COMMIT_RUN_CI=$(git log -1 --pretty=format:"%s" | grep -w "test=sot" || true)
    # check pr title
    TITLE_RUN_CI=$(curl -s https://github.com/PaddlePaddle/Paddle/pull/${GIT_PR_ID} | grep "<title>" | grep -i "sot" || true)
    if [[ ${COMMIT_RUN_CI} || ${TITLE_RUN_CI} ]]; then
        set -x
        return
    fi

However, $(git log -1 --pretty=format:"%s" | grep -w "test=type_checking" || true) has no effect in CI, so I removed it and kept only the title check.

Locally, $(git log -1 --pretty=format:"%s" | grep -w "test=type_checking" || true) works fine.

What is going on here, and is there a way around it?

Member

@gouzil gouzil May 10, 2024

> Does that have any impact? It can be configured in pyproject.toml:
>
> [tool.mypy]
> python_version = "3.8"
> cache_dir = ".mypy_cache"

The impact is that two separate caches get generated and maintained:

mypy python/paddle/tensor/math.py # cache location: ${PADDLE_ROOT}/.mypy_cache

cd tools/
python type_checking.py --full-test # cache location: ${PADDLE_ROOT}/tools/.mypy_cache

> Also, I happen to have a question for you!

You can change git log -1 to git log -10 or a larger number: before it runs, CI does a git pull upstream develop, so the most recent commit is no longer the one we want to match. (I'll fix the sot one myself; thanks for spotting the problem.)
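
The suggestion above, scanning the last several commit subjects instead of only the newest one, can be sketched in Python. This is only an illustration under stated assumptions: `has_ci_marker` and `recent_subjects` are hypothetical names that do not appear in the PR, and the lookaround regex is an approximation of `grep -w`, not an exact reimplementation.

```python
import re
import subprocess


def has_ci_marker(subjects, marker="test=type_checking"):
    """Return True if any commit subject contains `marker` as a whole word.

    Roughly mirrors `grep -w`: the marker must not be glued to other
    word characters (letters, digits, underscore) on either side.
    """
    pattern = re.compile(r"(?<!\w)" + re.escape(marker) + r"(?!\w)")
    return any(pattern.search(subject) for subject in subjects)


def recent_subjects(count=10):
    """Hypothetical helper: subjects of the last `count` commits via git."""
    out = subprocess.run(
        ["git", "log", f"-{count}", "--pretty=format:%s"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return out.splitlines()


# Once CI has merged upstream develop, the PR commit is no longer the newest.
subjects = [
    "Merge branch 'develop' into pr-branch",
    "tmp math.py, test=type_checking",
]
print(has_ci_marker(subjects[:1]))  # False: only the newest commit is checked
print(has_ci_marker(subjects))      # True: a wider window finds the marker
```

In a real checkout, `has_ci_marker(recent_subjects(10))` would play the role of `git log -10 ... | grep -w ...`.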

Contributor Author

> The impact is that two separate caches get generated and maintained

😅 That was careless of me; I didn't think that far at the time.

I'll switch to an absolute path in type_checking.py then.

Thanks!

Contributor Author

@gouzil Updated ~
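
The fix agreed on in this thread, an absolute cache path that does not depend on the working directory, might look roughly like the sketch below. The diff for type_checking.py is not shown here, so this is an assumption about the approach rather than the PR's code: `mypy_cache_dir` and `mypy_command` are hypothetical names, and the sketch assumes the script sits one directory below the repository root. `--cache-dir` is a standard mypy command-line option.

```python
import os
from pathlib import Path


def mypy_cache_dir(script_path):
    """Cache directory at the repository root, independent of the cwd.

    Assumes the script lives one level below the root (e.g. tools/).
    """
    tools_dir = Path(os.path.abspath(script_path)).parent
    return tools_dir.parent / ".mypy_cache"


def mypy_command(script_path, target):
    """Build a mypy command line that pins the cache location."""
    return ["mypy", "--cache-dir", str(mypy_cache_dir(script_path)), target]


print(mypy_command("/repo/tools/type_checking.py", "example.py"))
```

Inside a script, `script_path` would typically be `__file__`, so running from `tools/` or from the repository root resolves to the same cache.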

else
python type_checking.py; type_checking_error=$?
fi

if [ "$type_checking_error" != "0" ];then
echo "Example code type checking failed" >&2
exit 5
fi
}
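
The two title checks used by need_type_checking and exec_type_checking (`grep "<title>" | grep -i "typing"` and `... "typing all"`) can be restated as one small Python function. This is a sketch for illustration only: `type_checking_mode` and its return values are hypothetical, and the HTML strings in the usage lines are made up.

```python
def type_checking_mode(page_html):
    """Decide the type-checking mode from a PR page's <title> lines.

    Returns "full" when a title line contains "typing all", "changed" when
    one only contains "typing", and None when type checking is not requested.
    Matching is case-insensitive, like `grep -i`.
    """
    titles = [
        line.lower() for line in page_html.splitlines() if "<title>" in line
    ]
    if any("typing all" in title for title in titles):
        return "full"
    if any("typing" in title for title in titles):
        return "changed"
    return None


print(type_checking_mode("<title>[Typing] add annotations</title>"))
print(type_checking_mode("<title>[Typing all] add annotations</title>"))
print(type_checking_mode("<title>Fix a bug</title>\n<p>typing</p>"))
```

The third call prints `None` because only lines containing `<title>` are inspected.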


function exec_samplecode_checking() {
example_info_gpu=""
example_code_gpu=0
if [ "${WITH_GPU}" == "ON" ] ; then
{ example_info_gpu=$(exec_samplecode_test gpu 2>&1 1>&3 3>/dev/null); } 3>&1
example_code_gpu=$?
fi
{ example_info=$(exec_samplecode_test cpu 2>&1 1>&3 3>/dev/null); } 3>&1
example_code=$?

# TODO(megemini): type_checking should be enabled by default once type annotation is done.
need_type_checking
type_checking_status=$?

if [[ ${type_checking_status} -eq 0 ]]; then
{ type_checking_info=$(exec_type_checking 2>&1 1>&3 3>/dev/null); } 3>&1
type_checking_code=$?
fi

summary_check_example_code_problems $[${example_code_gpu} + ${example_code}] "${example_info_gpu}\n${example_info}"

if [[ ${type_checking_status} -eq 0 ]]; then
summary_type_checking_problems $type_checking_code "$type_checking_info"
fi
}
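
The redirection `2>&1 1>&3 3>/dev/null` inside `$( ... )`, combined with the outer `3>&1`, captures only stderr into the variable while stdout still reaches the console. A rough Python analogue, assuming nothing about Paddle's scripts (the function name and the child command are made up), is:

```python
import subprocess
import sys


def run_capturing_stderr(argv):
    """Run `argv`, letting stdout pass through and capturing only stderr."""
    proc = subprocess.run(argv, stderr=subprocess.PIPE, text=True)
    return proc.returncode, proc.stderr


# Made-up child process: writes to both streams and exits with status 5.
child = (
    "import sys; print('progress'); "
    "print('problem', file=sys.stderr); sys.exit(5)"
)
code, info = run_capturing_stderr([sys.executable, "-c", child])
print(code, repr(info))
```

Here `code` corresponds to `example_code` or `type_checking_code` and `info` to the captured `*_info` text that the summary functions print.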


function collect_ccache_hits() {
ccache -s
@@ -3553,10 +3621,11 @@ function test_model_benchmark() {
bash ${PADDLE_ROOT}/tools/test_model_benchmark.sh
}

function summary_check_problems() {
function summary_check_example_code_problems() {
set +x
local example_code=$1
local example_info=$2

if [ $example_code -ne 0 ];then
echo "==============================================================================="
echo "*****Example code error***** Please fix the error listed in the information:"
@@ -3579,6 +3648,33 @@ function summary_check_problems() {
}


function summary_type_checking_problems() {
set +x
local type_checking_code=$1
local type_checking_info=$2

if [ $type_checking_code -ne 0 ];then
echo "==============================================================================="
echo "*****Example code type checking error***** Please fix the error listed in the information:"
echo "==============================================================================="
echo "$type_checking_info"
echo "==============================================================================="
echo "*****Example code type checking FAIL*****"
echo "==============================================================================="
exit $type_checking_code
else
echo "==============================================================================="
echo "*****Example code type checking info*****"
echo "==============================================================================="
echo "$type_checking_info"
echo "==============================================================================="
echo "*****Example code type checking PASS*****"
echo "==============================================================================="
fi
set -x
}


function reuse_so_cache() {
get_html="https://api.github.com/repos/PaddlePaddle/Paddle"
curl -X GET ${get_html}/commits -H "authorization: token ${GITHUB_API_TOKEN}" >tmp.txt
@@ -4262,15 +4358,7 @@ function main() {
check_sequence_op_unittest
generate_api_spec ${PYTHON_ABI:-""} "PR"
set +e
example_info_gpu=""
example_code_gpu=0
if [ "${WITH_GPU}" == "ON" ] ; then
{ example_info_gpu=$(exec_samplecode_test gpu 2>&1 1>&3 3>/dev/null); } 3>&1
example_code_gpu=$?
fi
{ example_info=$(exec_samplecode_test cpu 2>&1 1>&3 3>/dev/null); } 3>&1
example_code=$?
summary_check_problems $[${example_code_gpu} + ${example_code}] "${example_info_gpu}\n${example_info}"
exec_samplecode_checking
assert_api_spec_approvals
;;
build_and_check_cpu)
@@ -4282,15 +4370,7 @@ function main() {
;;
build_and_check_gpu)
set +e
example_info_gpu=""
example_code_gpu=0
if [ "${WITH_GPU}" == "ON" ] ; then
{ example_info_gpu=$(exec_samplecode_test gpu 2>&1 1>&3 3>/dev/null); } 3>&1
example_code_gpu=$?
fi
{ example_info=$(exec_samplecode_test cpu 2>&1 1>&3 3>/dev/null); } 3>&1
example_code=$?
summary_check_problems $[${example_code_gpu} + ${example_code}] "${example_info_gpu}\n${example_info}"
exec_samplecode_checking
assert_api_spec_approvals
;;
check_whl_size)
@@ -4533,11 +4613,6 @@ function main() {
build ${parallel_number}
build_document_preview
;;
api_example)
{ example_info=$(exec_samplecode_test cpu 2>&1 1>&3 3>/dev/null); } 3>&1
example_code=$?
summary_check_problems $example_code "$example_info"
;;
test_op_benchmark)
test_op_benchmark
;;
29 changes: 29 additions & 0 deletions pyproject.toml
@@ -131,3 +131,32 @@ known-first-party = ["paddle"]
"test/dygraph_to_static/test_loop.py" = ["C416", "F821"]
# Ignore unnecessary lambda in dy2st unittest test_lambda
"test/dygraph_to_static/test_lambda.py" = ["PLC3002"]

[tool.mypy]
python_version = "3.8"
cache_dir = ".mypy_cache"
# Miscellaneous strictness flags
allow_redefinition = true
local_partial_types = true
strict = false
# Untyped definitions and calls
check_untyped_defs = true
# Import discovery
follow_imports = "normal"
# Miscellaneous
warn_unused_configs = true
# Configuring warnings
warn_redundant_casts = true
warn_unused_ignores = true
warn_no_return = true
# Configuring error messages
show_column_numbers = true

[[tool.mypy.overrides]]
module = [
"astor",
"cv2",
"scipy",
"xlsxwriter"
]
ignore_missing_imports = true
1 change: 1 addition & 0 deletions python/unittest_py/requirements.txt
@@ -19,3 +19,4 @@ wandb>=0.13 ; python_version<"3.12"
xlsxwriter==3.0.9
xdoctest==1.1.1
ubelt==1.3.3 # just for xdoctest
mypy==1.10.0
64 changes: 35 additions & 29 deletions tools/sampcd_processor_utils.py
@@ -12,6 +12,8 @@
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import annotations

import argparse
import inspect
import logging
@@ -48,6 +50,12 @@
API_DIFF_SPEC_FN = 'dev_pr_diff_api.spec'
TEST_TIMEOUT = 10

PAT_API_SPEC_MEMBER = re.compile(r'\((paddle[^,]+)\W*document\W*([0-9a-z]{32})')
# include ArgSpec so that changing an API's type annotation can trigger the CI
PAT_API_SPEC_SIGNATURE = re.compile(
r'^(paddle[^,]+)\s+\((ArgSpec.*),.*document\W*([0-9a-z]{32})'
)


class Result:
# name/key for result
@@ -66,7 +74,7 @@ class Result:
order: int = 0

@classmethod
def msg(cls, count: int, env: typing.Set) -> str:
def msg(cls, count: int, env: set) -> str:
"""Message for logging with api `count` and running `env`."""
raise NotImplementedError

@@ -85,8 +93,8 @@ class MetaResult(type):
def __new__(
mcs,
name: str,
bases: typing.Tuple[type, ...],
namespace: typing.Dict[str, typing.Any],
bases: tuple[type, ...],
namespace: dict[str, typing.Any],
) -> type:
cls = super().__new__(mcs, name, bases, namespace)
if issubclass(cls, Result):
@@ -104,7 +112,7 @@ def get(mcs, name: str) -> type:
return mcs.__cls_map.get(name)

@classmethod
def cls_map(mcs) -> typing.Dict[str, Result]:
def cls_map(mcs) -> dict[str, Result]:
return mcs.__cls_map
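
The switch from `typing.Dict[...]`, `typing.List[...]` and `typing.Tuple[...]` to the builtin generics relies on the `from __future__ import annotations` line added at the top of the file. With that import, annotations are stored as strings and not evaluated when the function is defined, so `dict[str, int]` in a signature is accepted even on Python 3.8, where subscripting the builtin `dict` at runtime would fail. A minimal sketch (the `tally` function is a made-up example, not code from the PR):

```python
from __future__ import annotations


def tally(words: list[str]) -> dict[str, int]:
    """Count occurrences of each word."""
    counts: dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


print(tally(["a", "b", "a"]))
# The annotation is kept as a plain string rather than an evaluated type.
print(tally.__annotations__["return"])
```

Note that this only covers annotations; an expression such as `list[int]()` in ordinary code would still need Python 3.9 or newer.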


@@ -290,7 +298,7 @@ def prepare(self, test_capacity: set) -> None:
"""
pass

def run(self, api_name: str, docstring: str) -> typing.List[TestResult]:
def run(self, api_name: str, docstring: str) -> list[TestResult]:
"""Extract codeblocks from docstring, and run the test.
Run only one docstring at a time.

@@ -304,7 +312,7 @@ def run(self, api_name: str, docstring: str) -> typing.List[TestResult]:
raise NotImplementedError

def print_summary(
self, test_results: typing.List[TestResult], whl_error: typing.List[str]
self, test_results: list[TestResult], whl_error: list[str]
) -> None:
"""Post process test results and print test summary.

@@ -333,17 +341,17 @@ def get_api_md5(path):
API_spec = os.path.abspath(os.path.join(os.getcwd(), "..", path))
if not os.path.isfile(API_spec):
return api_md5
pat = re.compile(r'\((paddle[^,]+)\W*document\W*([0-9a-z]{32})')
patArgSpec = re.compile(
r'^(paddle[^,]+)\s+\(ArgSpec.*document\W*([0-9a-z]{32})'
)

with open(API_spec) as f:
for line in f.readlines():
mo = pat.search(line)
if not mo:
mo = patArgSpec.search(line)
mo = PAT_API_SPEC_MEMBER.search(line)

if mo:
api_md5[mo.group(1)] = mo.group(2)
else:
mo = PAT_API_SPEC_SIGNATURE.search(line)
api_md5[mo.group(1)] = f'{mo.group(2)}, {mo.group(3)}'

return api_md5
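
To see how the two patterns divide the work, the sketch below applies them to two spec lines. The sample lines and the digest are invented purely to fit the regexes and may not match the real API spec format; `parse_spec_line` is a hypothetical helper. Unlike the hunk above, it also guards the second match, on the assumption that a line might match neither pattern.

```python
import re

PAT_API_SPEC_MEMBER = re.compile(r'\((paddle[^,]+)\W*document\W*([0-9a-z]{32})')
PAT_API_SPEC_SIGNATURE = re.compile(
    r'^(paddle[^,]+)\s+\((ArgSpec.*),.*document\W*([0-9a-z]{32})'
)


def parse_spec_line(line):
    """Return (api_name, fingerprint), or None if neither pattern matches."""
    mo = PAT_API_SPEC_MEMBER.search(line)
    if mo:
        return mo.group(1), mo.group(2)
    mo = PAT_API_SPEC_SIGNATURE.search(line)
    if mo:
        # The fingerprint includes the ArgSpec, so a signature change shows up.
        return mo.group(1), f'{mo.group(2)}, {mo.group(3)}'
    return None


MD5 = "0123456789abcdef0123456789abcdef"  # made-up digest
# Hypothetical spec lines, written only to fit the two patterns.
member_line = f"paddle.Tensor.abs (paddle.tensor.math.abs, ('document', '{MD5}'))"
signature_line = f"paddle.abs (ArgSpec(args=['x'], defaults=None), ('document', '{MD5}'))"

print(parse_spec_line(member_line))
print(parse_spec_line(signature_line))
print(parse_spec_line("not a spec line"))
```

Because the ArgSpec text is part of the stored value for signature lines, editing only a type annotation changes the value even when the docstring digest stays the same.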


@@ -397,18 +405,6 @@ def get_full_api_from_pr_spec():
get_full_api_by_walk()


def get_full_api():
"""
get all the apis
"""
global API_DIFF_SPEC_FN # readonly
from print_signatures import get_all_api_from_modulelist

member_dict = get_all_api_from_modulelist()
with open(API_DIFF_SPEC_FN, 'w') as f:
f.write("\n".join(member_dict.keys()))


def extract_code_blocks_from_docstr(docstr, google_style=True):
"""
extract code-blocks from the given docstring.
@@ -599,9 +595,16 @@ def get_test_capacity(run_on_device="cpu"):
return sample_code_test_capacity


def get_docstring(full_test=False):
def get_docstring(
full_test: bool = False,
filter_api: typing.Callable[[str], bool] | None = None,
):
'''
this function will get the docstring for test.

Args:
full_test: get all APIs.
filter_api: a function that filters APIs; if it returns `True`, the API is skipped and not added to `docstrings_to_test`.
'''
import paddle
import paddle.static.quantization # noqa: F401
@@ -616,6 +619,9 @@ def get_docstring(full_test=False):
with open(API_DIFF_SPEC_FN) as f:
for line in f.readlines():
api = line.replace('\n', '')
if filter_api is not None and filter_api(api.strip()):
continue

try:
api_obj = eval(api)
except AttributeError:
@@ -637,7 +643,7 @@ def get_docstring(full_test=False):
return docstrings_to_test, whl_error
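
The `filter_api` hook has inverted-looking semantics that are easy to misread: a `True` result means the API is skipped. A self-contained sketch of that convention follows; `select_apis` and `skip_private` are hypothetical and stand in for the loop inside get_docstring and for whatever predicate a caller passes.

```python
import typing


def select_apis(
    lines: typing.Iterable[str],
    filter_api: typing.Optional[typing.Callable[[str], bool]] = None,
) -> typing.List[str]:
    """Keep the API names for which `filter_api` does not return True."""
    selected = []
    for line in lines:
        api = line.replace('\n', '')
        if filter_api is not None and filter_api(api.strip()):
            continue  # a True result means "skip this API"
        selected.append(api)
    return selected


def skip_private(api: str) -> bool:
    """Hypothetical predicate: skip names with an underscore-prefixed part."""
    return any(part.startswith('_') for part in api.split('.'))


print(select_apis(["paddle.abs\n", "paddle._C_ops.abs\n"], skip_private))
```

With no predicate, `select_apis(lines)` keeps every line, matching the `filter_api is not None` guard in the diff.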


def check_old_style(docstrings_to_test: typing.Dict[str, str]):
def check_old_style(docstrings_to_test: dict[str, str]):
old_style_apis = []
for api_name, raw_docstring in docstrings_to_test.items():
for codeblock in extract_code_blocks_from_docstr(
@@ -715,8 +721,8 @@ def exec_gen_doc():


def get_test_results(
doctester: DocTester, docstrings_to_test: typing.Dict[str, str]
) -> typing.List[TestResult]:
doctester: DocTester, docstrings_to_test: dict[str, str]
) -> list[TestResult]:
"""Get test results from doctester with docstrings to test."""
_test_style = (
doctester.style