Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

helper: split_markdown_front_matter #443

Merged
merged 6 commits into from
Jun 20, 2021

Conversation

Hritik14
Copy link
Collaborator

helper for istio and mozilla importers
The following two alternatives are also proposed:

def split_markdown_front_matter(lines: str) -> Tuple[str, str]:
    fmlines = []
    mdlines = []
    splitter = mdlines

    for index, line in enumerate(lines.split("\n")):
        if index == 0 and line.strip().startswith("---"):
            splitter = fmlines
        elif line.strip().startswith("---"):
            splitter = mdlines
        else:
            splitter.append(line)

    return "\n".join(fmlines), "\n".join(mdlines)

and the one present here under MPL license.

Signed-off-by: Hritik Vijay hritikxx8@gmail.com

@Hritik14 Hritik14 mentioned this pull request Apr 18, 2021
6 tasks
@Hritik14 Hritik14 force-pushed the front_matter_splitter branch 2 times, most recently from f80d122 to 209a853 Compare April 19, 2021 08:32
@sbs2001
Copy link
Collaborator

sbs2001 commented Apr 22, 2021

This looks good to me, could you swap this with the thing we have in istio and run tests. That way we'll cover some more testcases and the refactor would also be done.

@sbs2001
Copy link
Collaborator

sbs2001 commented Apr 22, 2021

As far as the alternative approaches are concerned, I would stick with something dumb and simple (implementation wise).
The mozilla one, and the one in comments seems the simplest. I'm not sure that the one in the comments would work in case the frontmatter doesn't start from first line. Does that even happen ?

@Hritik14
Copy link
Collaborator Author

swap this with the thing we have in istio and run tests

Done.

The mozilla one, and the one in comments seems the simplest

This PR actually originally offers the same code with a minor improvement.

I'm not sure that the one in the comments would work in case the frontmatter doesn't start from first line. Does that even happen ?

I'm not aware of any instance where front matter doesn't start at the very beginning. I looked at some implementations and it appears that it is rather a strict requirement. (Opening fence must begin on the first line of the markdown string/file ref)
Also, as --- is a valid markdown, I don't think it is allowed for it to mark the beginning of front matter without being at the very top.
I've verified this assumption with github gists and it appears to be true.
Here are the results: https://gist.github.com/Hritik14/7d88ed5d9789c0f5d726746d654590dd (github shows front matter in a nice table)

@sbs2001
Copy link
Collaborator

sbs2001 commented Apr 26, 2021

@Hritik14 thanks for the research and the refactor.

Let's use the one you suggested in the comments then. Though the current implementation is concise, it's not super obvious.

@Hritik14
Copy link
Collaborator Author

@sbs2001 I've updated it with the one in comments though I'd really appreciate @pombredanne's insight here as he was really in favor of the partition (d8ea801) approach.

@Hritik14
Copy link
Collaborator Author

Hritik14 commented May 3, 2021

@sbs2001 @pombredanne ping

Copy link
Collaborator

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! See some comments inline for your consideration.

vulnerabilities/helpers.py Outdated Show resolved Hide resolved
vulnerabilities/importers/istio.py Outdated Show resolved Hide resolved
vulnerabilities/importers/istio.py Outdated Show resolved Hide resolved
vulnerabilities/importers/istio.py Outdated Show resolved Hide resolved
vulnerabilities/helpers.py Outdated Show resolved Hide resolved
vulnerabilities/helpers.py Outdated Show resolved Hide resolved
vulnerabilities/helpers.py Show resolved Hide resolved
vulnerabilities/helpers.py Outdated Show resolved Hide resolved
vulnerabilities/helpers.py Outdated Show resolved Hide resolved

lines = lines.replace("\r\n", "\n")
for index, line in enumerate(lines.split("\n")):
if index == 0 and line.strip().startswith("---"):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand what you are doing here with your "splitter" variable and this is sleek. But this may not be easy to read and understand at least it was not obvious to me at first.

What could you do to make this easier to read?

  1. the first test can be done outside of the loop since if the document starts with --- it must be on the first line which can be safely discarded (the first line).
  2. with that done, then the only thing left is that this is frontmatter until the line is --- and at this stage you switch to markdown/body. Or you could partition the remaining joined text on ---?

Or may be at least finding a better name than splitter would help? something like bucket?

Here are some alternative ways for your consideration:

frontmatter = []
markdown = []
lines = text.splitlines(False)
if lines[0] == "---":
    lines = lines[1:]
current_bucket = frontmatter
for line in lines:
    if line == "---":
        current_bucket = markdown
        continue
    current_bucket.append(line)

or this one not using any temp lists:

lines = text.splitlines(False)
# discard the first separator
if lines[0] == "---":
    lines = lines[1:]
text = "\n".join(lines)
frontmatter, _, markdown = text.partition("---\n")

The point I am trying to make is that the important thing is to make the code readable and easy to understand such that 6 months from now you or anyone can get what you meant easily. So naming and being explicit and clear matters a lot IMHO.

Copy link
Collaborator Author

@Hritik14 Hritik14 May 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took your second implementation and now it looks like

    lines = text.splitlines()
    if lines[0] == "---":
        lines = lines[1:]
        text = "\n".join(lines)
        frontmatter, _, markdown = text.partition("\n---\n")
        return frontmatter, markdown

    return "", text

Let me know what you think

In case the loop method is favorable, I'll make the necessary amends.

Hritik14 added a commit to Hritik14/vulnerablecode that referenced this pull request May 9, 2021
review: aboutcode-org#443 (review)

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
@Hritik14
Copy link
Collaborator Author

Hritik14 commented May 9, 2021

Thanks @pombredanne. I've incorporated the latest changes. Do have a look.

Hritik14 added a commit to Hritik14/vulnerablecode that referenced this pull request May 9, 2021
Better documentation and more readable function structrue
review: aboutcode-org#443 (review)

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
@Hritik14 Hritik14 requested a review from pombredanne May 9, 2021 08:38
Hritik14 added a commit to Hritik14/vulnerablecode that referenced this pull request May 19, 2021
Better documentation and more readable function structrue
review: aboutcode-org#443 (review)

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Hritik14 added a commit to Hritik14/vulnerablecode that referenced this pull request Jun 8, 2021
Better documentation and more readable function structrue
review: aboutcode-org#443 (review)

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
@Hritik14
Copy link
Collaborator Author

Hritik14 commented Jun 8, 2021

@pombredanne ping

Copy link
Collaborator

@pombredanne pombredanne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See a minor nit FYI. Otherwise LGTM!

helper for istio and mozilla importers

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
also, sort imports

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Better documentation and more readable function structrue
review: aboutcode-org#443 (review)

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
@Hritik14 Hritik14 merged commit 366d3a1 into aboutcode-org:main Jun 20, 2021
Pushpit07 pushed a commit to Pushpit07/vulnerablecode that referenced this pull request Jul 27, 2021
Better documentation and more readable function structrue
review: aboutcode-org#443 (review)

Signed-off-by: Hritik Vijay <hritikxx8@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants