Post process add mondo utils #22
….py and post_process.py, and a test50genes folder
…malco into post_process
…ostprocessing. Running `pheval run -i test50genes_input -r "malcorunner" -o test50genes_output -t tests` will generate plot. Make sure to comment out everything but the plot part in runner.py
…s, and cleaned up some analysis
…directly. Languages are defined in runner as a tuple, plotting is done as a last step in a separate function. Still slow, needs to be sped up
May make sense to include input phenopackets as test fixtures, in |
Plus the corresponding prompts, of course. |
Also looks like the |
I ran this on two phenopackets and it didn't get to the point of producing a plot, but it did produce results for en and es. Results below:
label term score rank
PMID_15266616_3.json_en-prompt.txt MONDO:0016512 1.0 1
PMID_15266616_3.json_en-prompt.txt MONDO:0018997 0.5 2
PMID_15266616_3.json_en-prompt.txt MONDO:0016033 0.3333333333333333 3
PMID_15266616_3.json_en-prompt.txt MONDO:0019188 0.25 4
PMID_15266616_3.json_en-prompt.txt MONDO:0008678 0.2 5
PMID_15266616_3.json_en-prompt.txt MONDO:0009341 0.16666666666666666 6
PMID_15266616_3.json_en-prompt.txt MONDO:0008965 0.14285714285714285 7
PMID_10874631_II.2.json_en-prompt.txt MONDO:0019200 1.0 1
PMID_10874631_II.2.json_en-prompt.txt MONDO:0018998 0.5 2
PMID_10874631_II.2.json_en-prompt.txt MONDO:0019501 0.3333333333333333 3
PMID_10874631_II.2.json_en-prompt.txt MONDO:0015993 0.25 4
PMID_10874631_II.2.json_en-prompt.txt MONDO:0019262 0.2 5
label term score rank
PMID_10874631_II.2.json_es-prompt.txt MONDO:0019200 1.0 1
PMID_10874631_II.2.json_es-prompt.txt MONDO:0019501 0.5 2
PMID_10874631_II.2.json_es-prompt.txt MONDO:0018998 0.3333333333333333 3
PMID_10874631_II.2.json_es-prompt.txt MONDO:0015993 0.25 4
PMID_10874631_II.2.json_es-prompt.txt MONDO:0015229 0.2 5
PMID_15266616_3.json_es-prompt.txt MONDO:0018997 1.0 1
PMID_15266616_3.json_es-prompt.txt MONDO:0008678 0.5 2
PMID_15266616_3.json_es-prompt.txt MONDO:0016033 0.3333333333333333 3
PMID_15266616_3.json_es-prompt.txt MONDO:0016512 0.25 4
PMID_15266616_3.json_es-prompt.txt MONDO:0019188 0.2 5
PMID_15266616_3.json_es-prompt.txt MONDO:0010035 0.16666666666666666 6
PMID_15266616_3.json_es-prompt.txt MONDO:0018923 0.14285714285714285 7
PMID_15266616_3.json_es-prompt.txt MONDO:0008564 0.125 8
PMID_15266616_3.json_es-prompt.txt MONDO:0009341 0.1111111111111111 9
This is confusing me, though - perhaps I'm missing something about how the scores are calculated. But according to the source phenopacket, the correct diagnosis is |
These results come from this function: whereas the comparison with the correct result is done here: Does this answer your question? |
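Judging only from the tables above, the score column looks like the reciprocal of the rank (1.0, 0.5, 0.333..., and so on). A minimal sketch of that relationship, assuming this is how the scores are derived; the function name `reciprocal_rank_scores` is hypothetical and not taken from the repo:

```python
def reciprocal_rank_scores(terms):
    """Assign score = 1 / rank to an ordered list of terms (ranks start at 1)."""
    return [
        {"term": term, "score": 1.0 / rank, "rank": rank}
        for rank, term in enumerate(terms, start=1)
    ]


# First three terms from the en results for PMID_15266616_3 above.
rows = reciprocal_rank_scores(["MONDO:0016512", "MONDO:0018997", "MONDO:0016033"])
for row in rows:
    print(row["term"], row["score"], row["rank"])
```

If this reading is right, the score carries no information beyond the rank itself, so whether the diagnosis is correct has to come from the separate comparison step.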
Sorry about this. I added the new .jar in the right place, see 2a30bb1.
Not sure I understand this correctly; maybe this is related to your first comment. It does compute everything that is in prompts, but the prepare step runs phenopacket2prompt.jar, which populates the /prompts/{language} folder from whatever is in phenopacket_store_path. That path points to a folder named phenopacket-store inside the input_directory, which is passed to pheval as a command-line argument. Thus, one can place any set of phenopackets in that folder and run the whole thing effortlessly. We could provide a test_inputdir, shipped directly with the code, containing:
We discussed this with Justin and Peter and agreed that we want to change it: move from setup_phenopackets(), which downloads the phenopackets from the internet (currently from the latest phenopacket store release), to having the phenopackets saved directly in the GitHub repo :) Let me know if this makes sense!
… be further tested. See #23
No worries - I had the full collection of pregenerated prompts in my |
Just catching up on this thread - I think the code needs to be tidied up a bit. Glad to help with this of course. It's pretty difficult to run from scratch in a new directory. As far as I can tell, you have to:
(similar error for Things work better when I start in the base project directory where everything already exists. The run starts, but I see many errors like this: |
That one's on me: it's an OntoGPT warning I keep trying, unsuccessfully, to squash. It shouldn't affect the output here, though.
A few quick comments (via phone), since I may not be able to work on this until after ESHG. Regarding the output directory and config.yaml: this is a quick fix, we should simply have them there, ready to go when one clones the repo. Tool_input_commands is a PhEval thing; I had no issues with it, and it gets generated automatically in the output folder, if I remember correctly. The test data dir is also a PhEval thing, and I don't think we are using it. Regarding the first errors: /prompts should be in the project top directory (or, I guess, wherever os.system(java...) is run from), whereas the phenopacket-store folder should have a subfolder containing the phenopackets. This is because in the store they are always grouped in folders by gene. I can check with @pnrobinson whether we can change it so that it simply (recursively) takes all .json files it finds within the folder it is given, without the need for a subdirectory. The Italian errors are related to the missing Italian translation in the public versions; I circulated a file on Slack that also contains translations in Italian and German. Hope this helps in my absence.
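The recursive lookup proposed above could look roughly like this. It is only an illustration of the idea in Python: the actual change would live in the Java code behind phenopacket2prompt.jar, and `find_phenopackets` is a hypothetical helper, not something in the repo.

```python
import tempfile
from pathlib import Path


def find_phenopackets(root):
    """Return every .json file under root, at any depth, in sorted order."""
    return sorted(Path(root).rglob("*.json"))


# Demo: one file directly in the folder, one inside a per-gene subfolder.
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "phenopacket-store"
    (store / "GENE1").mkdir(parents=True)  # placeholder gene folder name
    (store / "top_level.json").write_text("{}")
    (store / "GENE1" / "nested.json").write_text("{}")
    (store / "GENE1" / "notes.txt").write_text("not a phenopacket")
    found_names = sorted(p.name for p in find_phenopackets(store))
    print(found_names)
```

With this approach, both the flat layout and the grouped-by-gene layout of the store would be picked up without special handling.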
@justaddcoffee please try to run the whole pipeline with two (or a few) phenopackets.
You may have to create test input and output folders to pass as CLI arguments to pheval. The input folder should contain a folder called phenopacket-store, which in turn contains another folder holding the 2 phenopackets.
The jar file may cause issues. I have been working on the Java code and can substantially update the jar soon.
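The folder layout described above could be set up with something like the sketch below. The names `test_input`, `test_output`, `GENE1`, and `example_N.json` are placeholders, and the files written here are empty JSON stubs, not real phenopackets; in practice you would copy two real phenopackets into the inner folder.

```python
import json
import tempfile
from pathlib import Path


def make_test_dirs(base, gene_folder="GENE1", n_packets=2):
    """Create <base>/test_input/phenopacket-store/<gene_folder>/ and <base>/test_output/."""
    base = Path(base)
    input_dir = base / "test_input"
    output_dir = base / "test_output"
    packet_dir = input_dir / "phenopacket-store" / gene_folder
    packet_dir.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(parents=True, exist_ok=True)
    for i in range(1, n_packets + 1):
        # Stub content only; replace with real phenopacket files.
        (packet_dir / f"example_{i}.json").write_text(json.dumps({}))
    return input_dir, output_dir


with tempfile.TemporaryDirectory() as tmp:
    input_dir, output_dir = make_test_dirs(tmp)
    created = sorted(p.name for p in input_dir.rglob("*.json"))
    has_store = (input_dir / "phenopacket-store" / "GENE1").is_dir()
    has_output = output_dir.is_dir()
    print(created)
```

The two resulting folders would then be given to `pheval run` via `-i` and `-o`, as in the command quoted in the commit message earlier in this thread.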