The genetics ETL aggregates l2g predictions and study data to build disease/target evidence. This dataset is picked up by the platform ETL, which integrates it with other evidence sources to build disease/target associations. To evaluate the performance of l2g predictions it is desirable to work with the association data; however, that requires a full ETL run, which makes l2g iteration very slow.
The aim of this issue is to add a step to gentropy, and subsequently add one more task to the ETL orchestration, to build direct and indirect association datasets.
Direct associations: take the evidence, group by target/disease and apply a harmonic sum to the l2g scores.
Indirect associations: take the evidence, join it with the disease index, explode the parent terms, group by target/parent disease and apply a harmonic sum to the l2g scores.
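The two aggregations can be sketched in plain Python as below. This is only an illustration under stated assumptions, not the gentropy implementation: the field names `targetId`, `diseaseId` and `score`, the `disease_parents` mapping standing in for the disease index, and the function names are all hypothetical. The harmonic sum is assumed to be the usual form, with scores sorted in descending order and the i-th score divided by i squared; any normalisation or cap on the number of terms is left out.

```python
from collections import defaultdict


def harmonic_sum(scores):
    """Sum of scores sorted descending, each divided by its squared rank (assumed form)."""
    ranked = sorted(scores, reverse=True)
    return sum(score / (rank ** 2) for rank, score in enumerate(ranked, start=1))


def direct_associations(evidence):
    """Group evidence by (target, disease) and harmonic-sum the l2g scores."""
    grouped = defaultdict(list)
    for row in evidence:
        grouped[(row["targetId"], row["diseaseId"])].append(row["score"])
    return {key: harmonic_sum(scores) for key, scores in grouped.items()}


def indirect_associations(evidence, disease_parents):
    """Explode each evidence row to its parent terms, then aggregate as above.

    `disease_parents` is a hypothetical stand-in for the disease index:
    a dict mapping a disease ID to a list of its parent term IDs.
    Only parent terms are used here, following the description literally.
    """
    grouped = defaultdict(list)
    for row in evidence:
        for parent in disease_parents.get(row["diseaseId"], []):
            grouped[(row["targetId"], parent)].append(row["score"])
    return {key: harmonic_sum(scores) for key, scores in grouped.items()}


# Toy data, made up for illustration only.
evidence = [
    {"targetId": "T1", "diseaseId": "D1", "score": 0.5},
    {"targetId": "T1", "diseaseId": "D1", "score": 0.5},
    {"targetId": "T1", "diseaseId": "D2", "score": 1.0},
]
disease_parents = {"D1": ["P1"], "D2": ["P1"]}

direct = direct_associations(evidence)
indirect = indirect_associations(evidence, disease_parents)
```

In the real step the same logic would presumably be expressed as Spark group-by aggregations over the evidence dataset, with the resulting dataframes written out as parquet.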
These two datasets need to be saved as parquet files together with the other ETL outputs. Important: these datasets are not ingested by the platform ETL.