[ENH] support for sinks in csv format for RAMSES frontend #4424
Conversation
Hi! Welcome, and thanks for opening this pull request. We have some guidelines for new pull requests, and soon you'll hear back about the results of our tests and continuous integration checks. Thank you for your contribution!
```python
field_offsets = {}
_pfields = {}

import pandas as pd
```
We have in yt a way of supporting imports of external packages using

```python
from yt.utilities.on_demand_imports import _pandas as pd
```

This will fail gracefully if pandas is not installed, with a useful message.
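For illustration, a minimal sketch of the on-demand import in use (the function name and the csv arguments are just an example, not part of the PR):

```python
from yt.utilities.on_demand_imports import _pandas as pd

def read_sink_csv(fname):
    # pd behaves like the pandas module; if pandas is missing, the first
    # attribute access raises an informative error instead of failing here
    return pd.read_csv(fname, delimiter=",", skiprows=2, header=None)
```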
yt/frontends/ramses/fields.py
Outdated
```diff
@@ -131,6 +131,22 @@ class RAMSESFieldInfo(FieldInfoContainer):
     ("particle_metallicity", ("", [], None)),
     ("particle_family", ("", [], None)),
     ("particle_tag", ("", [], None)),
+    # sink field parameters
+    ("particle_msink", ("code_mass", [], None)),
```
Is there any reason not to use the canonical `particle_mass` here and below?
Also, the convention in yt is to use explicit names rather than common shorthand notations, e.g. `lx`, `ly`, `lz` should be `angular_momentum_{xyz}`, `rho` should be `density`, `cs**` should be `sound_speed_square(d)`, `tform` should be `formation_time`, ....
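For example, the renamed tuples could look like this (a sketch; `sink_known_fields` is a hypothetical name, and the empty unit strings simply mirror the neighbouring field definitions):

```python
# hypothetical explicit names for the sink csv columns
sink_known_fields = (
    ("particle_mass", ("code_mass", [], None)),
    ("particle_angular_momentum_x", ("", [], None)),
    ("particle_angular_momentum_y", ("", [], None)),
    ("particle_angular_momentum_z", ("", [], None)),
    ("particle_formation_time", ("", [], None)),
)
```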
Ok, I see. Here, I use the format `"particle_"` + name of the header column in the csv file.
The column names are: `msink`, `x`, `y`, `z`, `vx`, ..., `lx`, ...
Maybe using the default convention of yt is better.
yt/frontends/ramses/io.py
Outdated
```diff
@@ -92,6 +92,27 @@ def _ramses_particle_file_handler(fname, foffsets, data_types, subset, fields, c
     tr = {}
     ds = subset.domain.ds
     current_time = ds.current_time.in_units("code_time").v
+    # test if this is a sink csv file -> not the same reader as the classic Fortran files
```
I would suggest moving that piece of code to its own function (`_ramses_particle_csv_file_handler`?) and renaming the current function `_ramses_particle_file_handler` to `_ramses_particle_binary_file_handler`.
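A sketch of the suggested split (signatures follow the existing handler; bodies elided):

```python
def _ramses_particle_binary_file_handler(fname, foffsets, data_types, subset, fields, count):
    """Read particle fields from the classic Fortran binary files."""
    ...  # current reader logic, unchanged

def _ramses_particle_csv_file_handler(fname, foffsets, data_types, subset, fields, count):
    """Read sink particle fields from the single csv file of an output."""
    ...  # the csv-specific branch, moved out of the binary handler
```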
Ok, that is probably clearer.
yt/frontends/ramses/io.py
Outdated
```diff
@@ -192,6 +213,9 @@ def _read_particle_fields(self, chunks, ptf, selector):
     for ax in "xyz":
         if pn % ax not in field_list:
             fields.append((ptype, pn % ax))
+    # attention: read only one chunk as it's a global file; don't know whether this is the right way to do it
```
I would suggest removing this piece of code and instead modifying the following chunk (currently sitting at line 260):

```python
tr.update(
    _ramses_particle_file_handler(
        fname, foffsets, data_types, subset, subs_fields, count=count
    )
)
```

with something like

```python
tr.update(
    ph.reader(
        fname, foffsets, data_types, subset, subs_fields, count=count
    )
)
```

The `reader` attribute could then be a method of the particle readers rather than a stand-alone function.
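For instance, a sketch of the handler side (class names here are illustrative, not necessarily the exact ones in the frontend):

```python
class ParticleFileHandler:
    def reader(self, fname, foffsets, data_types, subset, fields, count):
        # default: classic Fortran binary reader
        return _ramses_particle_binary_file_handler(
            fname, foffsets, data_types, subset, fields, count=count
        )

class SinkCSVFileHandler(ParticleFileHandler):
    def reader(self, fname, foffsets, data_types, subset, fields, count):
        # sinks: one global csv file per output
        return _ramses_particle_csv_file_handler(
            fname, foffsets, data_types, subset, fields, count=count
        )
```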
Ah yes, that would definitely solve our problem, and it would make it much easier to later implement a reader for any sort of file.
I can try to implement this if you want.
But actually, if you keep something like this:

```python
tr.update(
    ph.reader(
        fname, foffsets, data_types, subset, subs_fields, count=count
    )
)
```

you are still updating the value of `tr` by appending the variable at the end, iterating over the chunks concerned.
To keep using this method, you need to check that the particles you are adding are indeed in the subset when you read them from the csv file, i.e. checking the particle positions with a few `if` statements. I don't know if it is fast (but here the files contain fewer than 1e3 sinks, so it shouldn't be too slow).
A second method would be to read all the data of the csv file once; in this case, you need to avoid entering the iteration at line 240 in `_read_particle_fields`:

```python
for chunk in chunks:
    for subset in chunk.objs:
```

In this case, the class would store all the sinks produced, independently of their position.
It requires writing some condition in `_read_particle_fields` before going into the iteration.
Maybe we can implement this with the `_read_particle_data_file` method?
It might be difficult to check whether a particle is in the current subset though.
I think we have two options here that avoid having code specific to sinks here:

- we update `_read_particle_fields` so that it returns a boolean that informs whether all the data has been read (sink case) or not (all other cases);
- we update the particle handlers so that they carry a flag that says "all data is in one file, please only read one chunk" (see the sketch after this comment).

At the end of the day, the reason for this (slight) mess is that the index assumes that the data is spread over `Ncpu` different domains, each of which forms one chunk. The best solution would require having a second index I guess, but that would be a massive change, especially for a simple case like this one.
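A minimal sketch of the second option, with a hypothetical flag name:

```python
class SinkCSVFileHandler(ParticleFileHandler):
    # hypothetical flag: all sink data lives in a single global file,
    # so the IO handler should read it from one chunk only
    single_global_file = True
```

`_read_particle_fields` could then break out of the chunk loop after the first read whenever the handler sets this flag.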
> we update `_read_particle_fields` so that it returns a boolean that informs whether all the data has been read (sink case) or not (all other cases).

In that case, we need to deeply modify the class `BaseIOHandler`, as we modify the output of `_read_particle_fields`, no?

> we update the particle handlers so that they carry a flag that says "all data is in one file, please only read one chunk".

Yes, that could be the simplest idea, no?

What I wanted to say is that in `BaseIOHandler` there is a `_read_particle_data_file` method. I don't get it, but there is a `raise NotImplementedError`, whereas it is not implemented in the `IOHandlerRAMSES` class. Maybe we can implement a method for the data file like this, so that when reading the csv data, the called function is this one and not the former `_read_particle_fields`. The ideal would be a function set as an attribute of the particle class, but that might require modifying the class `BaseIOHandler`, I think.
You may be right. I don't remember the inheritance of `BaseIOHandler`. In that case, updating the particle handlers (they're specific to RAMSES and already encode all the specifics of each particle output file anyway) seems the best route.
Hum, yes, that seems like a good idea. But for now, I can't manage the `yield` of `_read_particle_fields`.
Do you know if this can be called from another function like this (this code does not work; I don't know a lot about generators/coroutines in Python):

```python
def _read_particle_fields(self, chunks, ptf, selector):
    pn = "particle_position_%s"
    chunks = list(chunks)
    fields = [
        (ptype, fname) for ptype, field_list in ptf.items() for fname in field_list
    ]
    for ptype, field_list in sorted(ptf.items()):
        for ax in "xyz":
            if pn % ax not in field_list:
                fields.append((ptype, pn % ax))
    # note: as written, this call just creates the generator and discards it
    self._read_particle_from_binary_file(chunks, ptf, selector, fields)

def _read_particle_from_binary_file(self, chunks, ptf, selector, fields):
    pn = "particle_position_%s"
    for chunk in chunks:
        for subset in chunk.objs:
            rv = self._read_particle_subset(subset, fields)
            for ptype, field_list in sorted(ptf.items()):
                x, y, z = (np.asarray(rv[ptype, pn % ax], "=f8") for ax in "xyz")
                mask = selector.select_points(x, y, z, 0.0)
                if mask is None:
                    mask = []
                for field in field_list:
                    data = np.asarray(rv.pop((ptype, field))[mask], "=f8")
                    print(data)
                    yield (ptype, field), data
```

In this case, the function `_read_particle_from_binary_file` would be an attribute of the `ParticleHandler` class.
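For reference, a generator can delegate to another generator with `yield from`; a minimal sketch of the pattern (just the Python mechanism, not PR code):

```python
def _read_particle_fields(self, chunks, ptf, selector):
    fields = [
        (ptype, fname) for ptype, field_list in ptf.items() for fname in field_list
    ]
    # "yield from" re-yields every ((ptype, field), data) pair produced by
    # the delegated generator; a bare call would just discard the generator
    yield from self._read_particle_from_binary_file(chunks, ptf, selector, fields)
```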
@Lenoble-lab when you are done pushing new changes, please take a minute to address the linting errors reported by pre-commit.ci. All of them seem to be auto-fixable (see the note at the bottom of this page: https://results.pre-commit.ci/run/github/88776493/1683098596.TWKhnqatSLSZQeSp2akaZw)
Hello guys,
Let's try to get this across the finish line. The priority would be to fix failing checks.
I'm on holidays, so please feel free to merge without me. I won't be able to review code in the coming weeks.

17 Jul 2023 12:12:01 Clément Robert:

> Let's try to get this across the finish line. The priority would be to fix failing checks.
>
> - runtime test failure: https://tests.yt-project.org/job/yt_py38_git/7104/testReport/junit/yt.frontends.ramses.tests/test_outputs/test_ramses_part_count/
> - style checks: https://results.pre-commit.ci/run/github/88776493/1689259288.1fuat6kqReilI3lnYx0SMw
Hi, I don't understand the errors from https://results.pre-commit.ci/run/github/88776493/1690275703.l2NvkRYcQf62QbImBS2O4g. Do you know what the problems are?
All the linting errors on your branch appear to be auto-fixable (that's why the error messages are minimal).
There's a merge conflict to resolve now. Would you allow me to rebase your branch, or would you prefer to resolve it yourself?
Hum, as you want. I was eating, so I didn't see it. I don't have time now, but later in the afternoon I can have a deeper look. I ran `pre-commit run --all-files`, but it deeply changes some files.
Co-authored-by: Corentin Cadiou <contact@cphyc.me> Co-authored-by: Clément Robert <cr52@protonmail.com>
Fixed. I condensed the branch's history to eliminate back-and-forth edits. Here are the commands you need to run to update your local clone of the repo:

```
git checkout main
git branch -D sink_csv
git fetch
git checkout sink_csv
```

Now we should be able to focus on getting tests to pass.
Here are a couple of lightweight questions and suggestions to improve clarity and readability.
yt/frontends/ramses/io.py
Outdated
```python
for chunk in chunks:
    for subset in chunk.objs:

        if ptype != "sink_csv":
```
I'm confused. Shouldn't the guard clause be `if ptype == "sink_csv":` here?
There is an `if` and then an `else`, so it's a bit equivalent, isn't it?
I was forced to add this condition, as the reader is really not the same between the Fortran file and the csv file. I didn't manage to find a way without it.
Indeed there's an `else` in there, and having this `if/else` branching isn't an issue, but I would expect the old code to correspond to the previously supported cases (i.e. `if ptype != "sink_csv"`). Here, it seems to me that the old code runs specifically in the `else` clause, which matches the case you are adding. Am I missing anything?
Ok, I see. I will change it along this idea.
Co-authored-by: Clément Robert <cr52@protonmail.com>
Co-authored-by: Clément Robert <cr52@protonmail.com>
Co-authored-by: Clément Robert <cr52@protonmail.com>
Co-authored-by: Clément Robert <cr52@protonmail.com>
Co-authored-by: Clément Robert <cr52@protonmail.com>
yt/frontends/ramses/io.py
Outdated
```python
dat = pd.read_csv(
    fname, delimiter=",", usecols=[ind], skiprows=2, header=None
)  # read only selected fields
```
I'm wondering if we could call `pd.read_csv` just once ahead of the loop instead of doing multiple (costly) reading operations? If there was a chance of memory errors, I would expect the current implementation to hit those anyway.
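For instance, a sketch of a single read covering all selected columns (`inds`, `fields`, and `tr` are assumed from the surrounding loop):

```python
# one read for all selected columns; with header=None the resulting
# columns are labeled by their integer index from `usecols`
dat = pd.read_csv(fname, delimiter=",", usecols=inds, skiprows=2, header=None)
for field, ind in zip(fields, inds):
    tr[field] = dat[ind].to_numpy()
```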
Ok, I'll change it now.
But it's far from being optimized: I need to read the entire file in `_read_part_csv_file_descriptor` to get the number of lines (to then initialize `self.local_particle_count`), and then I read it again (but only the columns I'm interested in).
I think reading a csv file is too different from reading the Fortran binary file, so trying to use the same structure is not ideal.
Co-authored-by: Clément Robert <cr52@protonmail.com>
Co-authored-by: Clément Robert <cr52@protonmail.com>
Almost there! Just a couple more small nits.
Co-authored-by: Clément Robert <cr52@protonmail.com>
Co-authored-by: Clément Robert <cr52@protonmail.com>
Co-authored-by: Clément Robert <cr52@protonmail.com>
Sorry for the delay, I was away and I completely forgot...
No need to apologise, and thank you for your contribution!
Hooray! Congratulations on your first merged pull request! We hope we keep seeing you around! 🎆
Hi everyone,

This pull request implements a reader for the output of the RAMSES sink particle algorithm.
For now, the sink output from the standard RAMSES (the one from the bitbucket) cannot be read; this is only another format.
In each output_*****, the sink algorithm writes a single sink_****.csv file with the properties of the sinks. I have added an example as an attached file.
We need to read a single file per output folder in the csv format.
I think the ideal would be to have a single ptype corresponding to the sinks (i.e. not create a 'sink_csv' type). The read_header would have to be modified and specified as appropriate. For now, I have implemented a new name 'sink_csv' to differentiate it from the first class.
The main issue I had is `IOHandlerRAMSES._read_particle_fields`, which reads every chunk concerned, whereas here there is only one single file to read. I hacked this with the `if ptype == 'sink_csv'` at line 217 in io.py, but this is really not ideal. Thinking about it, I see several solutions:

- add a flag in the sink class to specify that it should only be read once -> add an `if` in `IOHandlerRAMSES._read_particle_fields` to say that, in this case, only one read is enough, and we avoid entering the `for chunk in chunks` loop;
- create a new `IOHandlerRAMSES._read_sink_fields` which reads the csv file only once, directly in `IOHandlerRAMSES`. It is possible to add a field type that would not be separated into cpus (I don't know if other RAMSES features also write a single file per output). But then you lose the inheritance of the particle class (or re-create another one?).

sink_00010.csv

Thanks in advance!
Romain Lenoble