Alter the default behavior for organizing outputs #367
… the older behavior where all outputs are written to a single file.
This PR affects all downstream analysis. I agree it's "better", but it will have substantial downstream impact (either analysis scripts will need to be changed to reflect the output structure, or scripts will need to be used to move files around). I think the change to serial file naming is terrific, if that matters.
Yes for sure! This was on the CAAR to-do list for a long time and we never got around to it. But since @mabruzzo added a flag to maintain the legacy behavior for now, I think the impact should be minimal for current users. This is a good reminder, though, that we should double-check the concatenation and plotting examples in the Python scripts directory to make sure they are consistent.
I think that this will break the concatenation scripts. I'm happy to help/advise on how to update them, but I think that should be part of the same PR, or at least those changes should be in flight before this is merged.
If it would be less controversial, we could make the legacy behavior the default choice. With that said, it would probably hinder efforts to gradually phase out the old system. It's unclear why the continuous integration failed. There don't seem to be any useful error messages when I click details.
If the CI stuff fails randomly, try just rerunning it. Sometimes errors like that are just the file system or Jenkins having issues.
It should just be a matter of changing a single line of code in each of them, right? If so, I'm fine with just doing that as part of this PR. Although, from a quick glance, it's actually not clear to me that the concatenation scripts themselves would break, since they take the source directory as an input. It seems like it's the dask templates that would need to be edited.
Things will break since they try to iterate over all of the outputs. But fixing that is easy enough. I'll make a point of doing that.
Is there any way to do that without pushing another commit? |
Ah yes good point, I'm not as familiar with the newer behavior. Thanks! |
Code looks good to me. Excellent use of std::filesystem.
…her cholla files were written in a flat directory-structure or if files from different simulation cycles were written to separate directories
Overview
This PR changes the way that Cholla organizes output files.
Previously all outputs were written into a single output directory.
This PR changes the default output behavior. Now we group all outputs written at a single simulation cycle into a single directory. To use the older behavior, you can set the new legacy_flat_outdir runtime parameter to 1 (in the future, I think it would be good to remove the older behavior completely).
More details
Output file paths traditionally followed this template:
where each curly-braced token represents a different variable. In detail:
- {outdir} is the outdir parameter from the parameter file. Following the historical behavior (which we currently maintain), if this is non-empty, all characters following the last '/' are treated as a prefix to the output file name (if there aren't any '/' characters, the whole string is effectively a prefix).
- {nfile} is the current file-output count.
- {pre_extension_suffix} is the pre-hdf5-extension suffix: the suffix that immediately precedes the file extension ({extension}).
- {extension} is the filename extension. Examples include ".h5", ".bin", and ".txt".
- {proc_id} is the id of the process that held the data being written to this file. In non-MPI runs, it is omitted.

The new default behavior creates paths from a new template,
where the significance of each curly-braced token is largely unchanged. There are 2 things worth noting:
- all files written at a single simulation cycle are now grouped in a single directory
- {outdir} never specifies a file prefix. When {outdir} is empty, it is treated as "./". Otherwise, we effectively append '/' to the end of {outdir}.
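The difference between the two layouts can be illustrated with a small C++ sketch. It is a hypothetical, simplified stand-in for the FnameTemplate class described under Implementation Details, not Cholla's actual code: the class name FnameTemplateSketch, the format_outfile_name method, the convention that a negative proc_id marks a non-MPI run, and the exact new-layout template {outdir}/{nfile}/{nfile}{pre_extension_suffix}{extension}.{proc_id} are all assumptions made for illustration.

```cpp
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for FnameTemplate (NOT Cholla's code).
// The new-layout template used below is an assumption.
class FnameTemplateSketch {
 public:
  FnameTemplateSketch(bool legacy_flat_outdir, std::string outdir)
      : legacy_flat_outdir_(legacy_flat_outdir), outdir_(std::move(outdir)) {}

  // Returns the path of one output file. A negative proc_id stands in for a
  // non-MPI run, in which case the ".{proc_id}" suffix is omitted.
  std::string format_outfile_name(int nfile, const std::string& pre_extension_suffix,
                                  const std::string& extension, int proc_id) const {
    std::string fname = std::to_string(nfile) + pre_extension_suffix + extension;
    if (proc_id >= 0) fname += "." + std::to_string(proc_id);

    if (legacy_flat_outdir_) {
      // Legacy flat layout: outdir is pasted directly in front of the name, so any
      // characters after its last '/' act as a filename prefix.
      return outdir_ + fname;
    }

    // New default (assumed template): outdir is always a directory ("./" when
    // empty, otherwise '/'-terminated), and every file from cycle nfile goes
    // into the subdirectory {nfile}/.
    std::string dir = outdir_;
    if (dir.empty()) {
      dir = "./";
    } else if (dir.back() != '/') {
      dir += '/';
    }
    return dir + std::to_string(nfile) + "/" + fname;
  }

 private:
  bool legacy_flat_outdir_;
  std::string outdir_;
};
```

For example, with outdir set to "data/run1_", the legacy layout yields "data/run1_5_proj.h5.0" for cycle 5, whereas the grouped layout with outdir "data" yields "data/5/5_proj.h5.0". Because the per-cycle directory may not exist yet, a writer would also need to create it before opening the file; std::filesystem::create_directories(std::filesystem::path(name).parent_path()) is one way to do that.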
Implementation Details
While doing all of this, I took the opportunity to consolidate the logic for formatting output filenames. Previously the logic was scattered throughout the codebase. Now, the logic is encapsulated by a class called FnameTemplate.
I acknowledge that it looks a little funny that we construct an FnameTemplate instance everywhere we want to determine an output filename (rather than just defining a function). I mostly did this in anticipation of some future refactoring of the io section of the codebase that I plan to do (basically, it involves storing the output-related parameters outside of the global Parameter struct). In the short term, I think it would actually be nice to pass around a single FnameTemplate instance to all of the functions that use it, but I'm trying not to change function interfaces too much in this PR (I need to backport this change to the particle-feedback branch I have, which branched off near the start of last summer). If you strongly dislike this design choice, I can change it.
Other Thoughts:
Currently, the output filenames differ between a simulation compiled with MPI and run with a single process and a simulation compiled without MPI. All of the filenames written by the MPI version end in .h5.0, while the non-MPI version writes filenames ending in .h5. Going forward, I think it would be better to terminate the filenames with .h5.0 in the non-MPI case too (for the sake of consistency). But I think that's a topic for another day.