Introducing Parameter Tables #414

mabruzzo · 2024-08-05T15:57:15Z

Overview

This PR introduces some minor miscellaneous parameter-refactoring (we start using std::string_view during parsing) and introduces the idea of Parameter Tables to the parameter file.

About Parameter Tables

This is an idea that shows up in a number of different file-formats (in Enzo-E we call these parameter-groups). In particular, I'm borrowing the nomenclature and conventions from toml files¹.

The idea is that you declare a table by putting the table name between brackets. Consider the following snippet

key1=4

[table]
key1=5
key2=false

[table.take2]
key1=6

In this snippet, "key", "table.key1", "table.key2", and "table.take2.key1" all refer to independent parameters. (Note: the table.take2.key1 parameter can be defined even if the [table] section was completely removed).

We use the dotted-parameter notation in the codebase and on the command-line to specify the full name of a parameter. Technically, TOML allows you to use the dotted notation inside a parameter-file, but we don't support that (yet -- it's a little complicated to do that).

I tried to match all the TOML conventions. There are 2 key consequences:

If you already explicitly defined a table, it is an error to explicitly define the parameter again
If you define a dotted table-name, like [my.table], the file acts as though you have implicitly defined the [my] table if you haven't already defined it. It is OK to explicitly define the [my] table after you explicitly define the [my.table] table

Why is this useful

From an organizational perspective, I think this is very useful. Athena++ and Enzo-E do something comparable, and it makes the meaning of parameters simpler to understand. This is something I wanted to add for a while, but there wasn't a strong motivation for doing it.

In fact there is also a practical usage -- provided by the new ParameterMap::Enforce_Table_Content_Uniform_Access_Status method.

The impetus for doing this was refactoring some cooling/chemistry stuff (I'll post that PR immediately afterwords). While I was doing that, I encountered the common situation where some chemistry/cooling solvers need some extra parameters that aren't required by other solvers. I wanted to raise some kind of error if an unused parameter was specified.
I realized if we could logically group together the chemistry/cooling parameters, and we could leverage the fact that we track which parameters are accessed from the parameter-file, we could automatically raise a new error.

In general, I firmly believe we should be evolving cholla's parameter-file format to be a strict-subset of TOML. TOML is nice because it is well-specified (unlike INI) and it has nice support for shallowly nested parameter tables/group (for deeper nesting, you probably want something more complex). Importantly it is also mostly compatible with the existing format (the main obstacle, is that toml expects all string arguments to be enclosed in quotes -- I'm confident we can migrate to that). ↩

mabruzzo · 2024-08-09T14:04:36Z

I cannot replicate the single failing test when I manually run the tests on the CRC.

mabruzzo · 2024-08-09T15:34:42Z

The particular test in question that is failing, see here, totally baffles me!

There is no part of this test that is remotely related to parameter-file parser. In detail, this test consists of:

passing in a single cell with $\rho = 1, \rho v_x = 0, \rho v_y = 0, \rho v_z = 0, E = 1, \gamma = 5/3, \delta x = \delta y = \delta z = 1$ to the function that computes the inverse timestep (without any CFL number).
You can actually work this out by hand. Since $v_x = v_y= v_z =0$, this is $dt^{-1} = c_s/ \delta x = \delta x^{-1} \sqrt{\gamma (\gamma - 1) E / \rho} = \sqrt{10}/3$. This is exactly the expected result (1.054...).
The test is failing and saying that we get a value that is 50% larger. This frankly makes no sense to me how that could possibly happen! (It has absolutely nothing to do with parameter parsing -- unless we are somewhere reading in parameters from within the test-runner and those properties are bleeding over into tests in unexpected ways).
The most frustrating part is I absolutely can't replicate this issue myself on ppc-n0. I have tried running it a few times myself.

I have restarted this pipeline a few times. The very first time it ran, it started to hang on a few MHD tests. The second time
it ran, this error was the only error. Subsequently I tried running it again and more things started to fail. Finally I tried running it a 4th time and got a message that the whole node went down.

When ppc-n4 comes back up, I guess I will try pushing new versions of this test and start printing things out in order to try to debug it that way (unless somebody knows of a better way to do it¹ -- @bcaddy, do you have any ways?)

for example, circleci offers the capability for you to ssh into a test write after it fails. From the little bit of searching that I did, I don't think Jenkins supports doing that ↩

mabruzzo · 2024-08-09T15:36:21Z

The weirdest part is that #415 directly includes the changes from this PR and the tests all pass there

bcaddy · 2024-08-09T15:46:17Z

PR #397 is having the same issue with the same test failing with the same incorrect test value. I have no clue what the issue is. I tried running the tests manually on C-3PO, PPC-n0 and PPC-n4 and they all passed when run manually. At this point my only guess is that it's either an issue with Jenkins or a race condition. If it was a race condition I would expect it to fail intermittently but so far it's been very consistent. If it was a hardware fault I would expect it to fail on every PR, not just these two. I think @evaneschneider was looking into it but I don't know what the status of that is.

Printing is the best way I've found to interrogate the state within a Jenkins run, there might be a Jenkins plugin that helps but idk. When I'm having issues with a single thing like this I'll disable all the matrix elements except the problematic one so it runs a little faster. I did look into CircleCI when we were first doing tests, they wanted like $3k a month for what we needed.

bcaddy · 2024-08-09T15:47:39Z

The issue does appear to be attached to specific PRs and their contents. When I reverted everything in #397 it ran fine but as soon as I added the changes back it failed. That PR only changes the names of some variables that are totally unrelated so it really shouldn't impact anything

mabruzzo · 2024-08-09T15:50:20Z

PR #397 is having the same issue with the same test failing with the same incorrect test value. I have no clue what the issue is. I tried running the tests manually on C-3PO, PPC-n0 and PPC-n4 and they all passed when run manually. At this point my only guess is that it's either an issue with Jenkins or a race condition. If it was a race condition I would expect it to fail intermittently but so far it's been very consistent. If it was a hardware fault I would expect it to fail on every PR, not just these two. I think @evaneschneider was looking into it but I don't know what the status of that is.

Gotcha. That makes me feel a lot better about this issue (I thought I broke something)

Printing is the best way I've found to interrogate the state within a Jenkins run, there might be a Jenkins plugin that helps but idk. When I'm having issues with a single thing like this I'll disable all the matrix elements except the problematic one so it runs a little faster. I did look into CircleCI when we were first doing tests, they wanted like $3k a month for what we needed.

Yeah, I would expect CircleCI to be much too expensive for anything GPU-related

mabruzzo added 7 commits August 3, 2024 12:00

minor refactor.

bd3f1af

introducing new tests for parameter-tables.

7c3b984

intermediate commit

bdff8ab

another incremental commit.

ed7b425

significant cleanup

2b0eddd

some simplification.

ff2f2db

introduce the Enforce_Table_Content_Uniform_Access_Status method.

c73c2fb

mabruzzo mentioned this pull request Aug 5, 2024

Make the cooling-choice a runtime parameter #415

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introducing Parameter Tables #414

Introducing Parameter Tables #414

mabruzzo commented Aug 5, 2024

mabruzzo commented Aug 9, 2024 •

edited

Loading

mabruzzo commented Aug 9, 2024

mabruzzo commented Aug 9, 2024 •

edited

Loading

bcaddy commented Aug 9, 2024

bcaddy commented Aug 9, 2024

mabruzzo commented Aug 9, 2024

Introducing Parameter Tables #414

Are you sure you want to change the base?

Introducing Parameter Tables #414

Conversation

mabruzzo commented Aug 5, 2024

Overview

About Parameter Tables

Why is this useful

Footnotes

mabruzzo commented Aug 9, 2024 • edited Loading

mabruzzo commented Aug 9, 2024

Footnotes

mabruzzo commented Aug 9, 2024 • edited Loading

bcaddy commented Aug 9, 2024

bcaddy commented Aug 9, 2024

mabruzzo commented Aug 9, 2024

mabruzzo commented Aug 9, 2024 •

edited

Loading

mabruzzo commented Aug 9, 2024 •

edited

Loading