Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parsing files with multiple line continuations #937

Merged
merged 4 commits into from
Sep 20, 2023

Conversation

bprather
Copy link
Collaborator

@bprather bprather commented Sep 6, 2023

Fixes #936

PR Checklist

  • Code passes cpplint
  • New features are documented.
  • Adds a test for any bugs fixed. Adds tests for new features.
  • Code is formatted
  • Changes are summarized in CHANGELOG.md
  • CI has been triggered on Darwin for performance regression tests.
  • Docs build
  • (@lanl.gov employees) Update copyright on changed files

@bprather bprather changed the title Refresh some string buffers when written Fix parsing files with multiple line continuations Sep 6, 2023
Copy link
Collaborator

@lroberts36 lroberts36 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I agree that it might be desirable to re-write this at some point (although I am sure no one will ever have time for that).

Copy link
Collaborator

@Yurlungur Yurlungur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand how this change resolves the above issue, but seems like it soils be pretty easy to add a test to catch this, right?

@pgrete
Copy link
Collaborator

pgrete commented Sep 18, 2023

I now stared at this for some time and also don't fully understand how this fixes the issue.
@bprather , assuming that you've now understood the logic (as you fixed it), would you mind adding additional comments to the existing code so that future us have an easier time to parse the logic (and review)?

@bprather
Copy link
Collaborator Author

I gave it a shot?

I'm honestly not too sure how I could make it clearer or what I could say, sorry. The multiline_x vars are buffers used for line continuations. These buffers were not cleared when treating multiple line continuations (they reside outside the loop over lines in the file), so they accumulated more data over time, every time there was a line continuation. Clearing them fixes the problem.

This is a very common first mistake when writing state-machine parsers (I've made it myself). At risk of sounding sanctimonious, I think this is decent evidence for why we shouldn't be maintaining our own file parser when the problem's been solved well in so many other places.

@pgrete
Copy link
Collaborator

pgrete commented Sep 19, 2023

I was referring to

    if (continuing || continuation) {   // <--------------------- this logic
      multiline_name += param_name;
      multiline_value += param_value;
      multiline_comment += param_comment;
      continuing = true;
    }
    if (continuing && !continuation) { // <--------------------- and that logic (in combination with what happens inside those statements)
      continuing = false;
      param_name = multiline_name;
      param_value = multiline_value;
      param_comment = multiline_comment;
      // Clear the multiline_x buffers to hold the next multi-line value
      multiline_name = "";
      multiline_value = "";
      multiline_comment = "";
    }

For example, I'm reading if (continuing && !continuation) as "if sth is continuing and at the same time it's not a continuation then..." and cannot really tell what's going on.
Maybe it's just wording, but it seems @Yurlungur also had similar issues parsing the logic in that function.

@bprather
Copy link
Collaborator Author

continuing means we are continuing, in the parser, from a previous line. if (continuing && !continuation) means we've continued from the previous line, but this line does not contain a continuation character. Therefore, it indicates we should flush the current continuation buffer into the line data, because there aren't more lines to read.

The conditionals handle the 2 extra states possible when parsing a line, for a total of 3:

  1. Normal line, no continuation
  2. Continuation appears at the end of this line, we should append contents to a buffer
  3. No continuation appears at the end of this line, we should append contents and then flush the buffer

This is a bog-standard 3-state parser, like CS101 assignment simple, which it looks like Josh threw together very early in Parthenon development. I heavily expect that if either of you spend a few minutes with it, you could understand exactly what's going on, as I did to fix the bug flushing buffers. I do agree the variables could be named better, and I'll add basic comments outlining the states so it can be understood a little more easily.

Copy link
Collaborator

@Yurlungur Yurlungur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bprather the naming is much clearer now. I know we don't want to put too much effort into this, as it's just a simpler parser, but I do think some unit tests are in order if anyone ever has time.

(Or a rewrite, as @lroberts36 suggests.)

Copy link
Collaborator

@pgrete pgrete left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @bprather
The updated comments make this indeed much easier to parse (pun intended ;))

@pgrete pgrete enabled auto-merge (squash) September 20, 2023 15:23
@pgrete pgrete merged commit 9b452e5 into develop Sep 20, 2023
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ParameterInput does not support multiple line continuations
4 participants