Annotation plugin support #426
VEP plugins
Supported language: Perl
Approach: Plugins are run for each line of input, before anything is printed to the output file. In addition, the variant allele and the overlapping genomic features are provided in an object. Plugins need to implement
Implementation concerns: Directly supporting Perl-based plugins would require either integrating a Perl FFI interface into Rust (complicated) or looking into Perl-to-WASM compilation, which might work (https://perlwasm.github.io/).
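A minimal Rust sketch of the per-line model described above, assuming a hypothetical `AnnotationPlugin` trait. The names `AnnotationPlugin`, `AlleleContext`, `header_info`, `run`, `FeatureCount`, and `annotate_line` are illustrative only; they are neither VEP's Perl API nor an existing mehari interface. Each plugin declares its output fields up front and is called once per input record, before the record is written.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the per-line object VEP hands a plugin:
/// the variant allele plus the genomic features it overlaps.
#[derive(Debug, Clone)]
pub struct AlleleContext {
    pub chrom: String,
    pub pos: u64,
    pub ref_allele: String,
    pub alt_allele: String,
    /// IDs of overlapping features (e.g. transcripts); illustrative only.
    pub overlapping_features: Vec<String>,
}

/// Hypothetical plugin trait, loosely modelled on the VEP per-line approach.
pub trait AnnotationPlugin {
    /// Output fields this plugin adds (name -> description), so they can be
    /// written to the output header before any record is printed.
    fn header_info(&self) -> BTreeMap<String, String>;
    /// Called once per input line; returns the field values to attach.
    fn run(&self, ctx: &AlleleContext) -> BTreeMap<String, String>;
}

/// Toy plugin: reports how many features the allele overlaps.
pub struct FeatureCount;

impl AnnotationPlugin for FeatureCount {
    fn header_info(&self) -> BTreeMap<String, String> {
        BTreeMap::from([(
            "FEATURE_COUNT".to_string(),
            "Number of overlapping features".to_string(),
        )])
    }

    fn run(&self, ctx: &AlleleContext) -> BTreeMap<String, String> {
        BTreeMap::from([(
            "FEATURE_COUNT".to_string(),
            ctx.overlapping_features.len().to_string(),
        )])
    }
}

/// Runs every plugin on one record and merges their fields; the caller
/// writes the record only after this returns.
pub fn annotate_line(
    plugins: &[Box<dyn AnnotationPlugin>],
    ctx: &AlleleContext,
) -> BTreeMap<String, String> {
    let mut fields = BTreeMap::new();
    for plugin in plugins {
        // Later plugins overwrite earlier ones on field-name clashes.
        fields.extend(plugin.run(ctx));
    }
    fields
}
```

The toy `AlleleContext` glosses over exactly the hard part discussed in this thread: exposing data structures as rich as the ones VEP passes to its plugins.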
open-cravat plugins
Supported language: Python
Approach: open-cravat supports modular annotators for a large number of annotation scores. Otherwise, its plugin functionality is fairly similar to VEP's.
Implementation concerns: pyo3 can be used. Compiling Python to WASM is not well supported either.
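The "modular annotators" idea can be sketched as independent modules that each contribute their own columns. This is a Rust sketch under assumptions, not open-cravat's actual API: `Annotator`, `TableScore`, `VariantKey`, and `run_annotators` are hypothetical, and namespacing columns as `<module>.<column>` is our own choice to keep modules from clashing.

```rust
use std::collections::BTreeMap;

/// Hypothetical key identifying a variant.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct VariantKey {
    pub chrom: String,
    pub pos: u64,
    pub ref_allele: String,
    pub alt_allele: String,
}

/// Hypothetical modular annotator: one module per score or data source.
pub trait Annotator {
    fn name(&self) -> &str;
    /// Returns this module's columns for the variant, or None if it has no data.
    fn annotate(&self, v: &VariantKey) -> Option<BTreeMap<String, String>>;
}

/// Toy score module backed by an in-memory table.
pub struct TableScore {
    pub name: String,
    pub scores: BTreeMap<VariantKey, f64>,
}

impl Annotator for TableScore {
    fn name(&self) -> &str {
        &self.name
    }

    fn annotate(&self, v: &VariantKey) -> Option<BTreeMap<String, String>> {
        self.scores
            .get(v)
            .map(|s| BTreeMap::from([("score".to_string(), format!("{s:.3}"))]))
    }
}

/// Runs all enabled modules and prefixes each column with the module name.
pub fn run_annotators(mods: &[Box<dyn Annotator>], v: &VariantKey) -> BTreeMap<String, String> {
    let mut out = BTreeMap::new();
    for m in mods {
        if let Some(cols) = m.annotate(v) {
            for (col, val) in cols {
                out.insert(format!("{}.{}", m.name(), col), val);
            }
        }
    }
    out
}
```

Because modules only see a variant key and return columns, adding a new score means adding a module, not touching the core.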
Design
Look into https://perldoc.perl.org/perlembed and https://github.com/PyO3/pyo3. We might be able to get some very simple VEP and open-cravat extensions running. Afterwards, we should compare their performance against, e.g., WASM-based extensions and potentially offer that as the main plugin approach.
I looked into this a bit and now wonder how many plugins we can actually get to run. For example, VEP plugins often rely on the "tva" argument (a TranscriptVariationAllele object), which is a complex data structure. See NMD for a simple VEP plugin. It might be easier to provide some infrastructure for tabix lookups, implement some plugins ourselves, and crowdsource from there on (after publication). Overall, our native interface could pass the current VCF record, say transcript info, and the VCF header as JSON (serde is really cool) and receive the changed record back as JSON.
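A dependency-free Rust sketch of that JSON boundary, assuming a hypothetical envelope with `header`, `record`, and `transcripts` keys; `PluginInput`, `json_str`, `PluginFn`, and `call_plugin` are illustrative names. The encoder is hand-written only to keep the sketch self-contained; with serde one would derive `Serialize` and call `serde_json::to_string` instead.

```rust
use std::collections::BTreeMap;

/// Escapes a string for embedding in a JSON string literal.
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Hypothetical envelope handed to a plugin: VCF header lines, the current
/// record, and the IDs of overlapping transcripts.
pub struct PluginInput {
    pub header: Vec<String>,
    pub chrom: String,
    pub pos: u64,
    pub ref_allele: String,
    pub alt_allele: String,
    pub info: BTreeMap<String, String>,
    pub transcripts: Vec<String>,
}

impl PluginInput {
    pub fn to_json(&self) -> String {
        let header: Vec<String> = self.header.iter().map(|h| json_str(h)).collect();
        let info: Vec<String> = self
            .info
            .iter()
            .map(|(k, v)| format!("{}:{}", json_str(k), json_str(v)))
            .collect();
        let tx: Vec<String> = self.transcripts.iter().map(|t| json_str(t)).collect();
        format!(
            "{{\"header\":[{}],\"record\":{{\"chrom\":{},\"pos\":{},\"ref\":{},\"alt\":{},\"info\":{{{}}}}},\"transcripts\":[{}]}}",
            header.join(","),
            json_str(&self.chrom),
            self.pos,
            json_str(&self.ref_allele),
            json_str(&self.alt_allele),
            info.join(","),
            tx.join(",")
        )
    }
}

/// At this boundary a plugin is just "JSON string in, JSON string out",
/// which is easy to carry across an FFI or WASM call.
pub type PluginFn = fn(&str) -> String;

pub fn call_plugin(plugin: PluginFn, input: &PluginInput) -> String {
    plugin(&input.to_json())
}
```

The string-in/string-out signature is the point here: it is the same shape whether the plugin is native Rust, Python behind pyo3, or a WASM module.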
What about the following: we create a native plugin system based on Extism, which allows writing plugins in WASM. We pass data through the interfaces as JSON for simplicity and model the interfaces on VEP and cravat. We implement some core plugins, such as annotation based on annonars/dbSNP, in Rust, and provide a reference implementation of the VEP plugin NMD in Rust and in Python compiled to WASM.

We then explore how we can build a wrapper in the WASM layer that allows running the VEP plugin NMD and some basic cravat plugin in Python. We will be able to create the native interface and the NMD demo in Python and Rust. The exploration can be time-boxed to one day and postponed if needed. I don't know whether we will be able to expose all of the VEP data structures needed by the plugins. The strategy above lets us implement something that should work easily enough, with 98% confidence, while the wrapper layer can be postponed or dropped.
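A rough Rust sketch of what an NMD reference plugin's core decision might look like. The escape rules below (variant in the last exon, in the last 50 nt of the penultimate exon, in the first 100 coding bases, or in a single-exon transcript) are our assumption of the commonly cited heuristics and must be checked against the actual VEP NMD plugin; `StopContext` and `escapes_nmd` are hypothetical names.

```rust
/// Hypothetical, simplified position of a stop-gained variant within its
/// transcript. Indices and positions are 1-based.
pub struct StopContext {
    /// Number of exons in the transcript.
    pub exon_count: u32,
    /// Index of the exon containing the variant.
    pub exon_index: u32,
    /// Distance in nt from the variant to the 3' end of its exon.
    pub dist_to_exon_end: u32,
    /// Position of the variant within the coding sequence.
    pub cds_pos: u32,
}

/// Returns true if the premature stop is predicted to escape NMD under the
/// assumed heuristics listed above.
pub fn escapes_nmd(c: &StopContext) -> bool {
    let intronless = c.exon_count == 1;
    let last_exon = c.exon_index == c.exon_count;
    // The `>= 2` check short-circuits before `exon_count - 1` can underflow.
    let end_of_penultimate = c.exon_count >= 2
        && c.exon_index == c.exon_count - 1
        && c.dist_to_exon_end < 50;
    let early_cds = c.cds_pos <= 100;
    intronless || last_exon || end_of_penultimate || early_cds
}
```

In the proposed design, this pure function would sit behind the JSON plugin boundary, with the transcript geometry computed by mehari rather than by the plugin.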
Sounds good. This keeps the overhead to a minimum and allows us to create a clean plugin interface.
I have implemented a dummy plugin, plus a call to it from mehari, in the plugin-system branch, just to get a feeling for Extism.
Is your feature request related to a problem? Please describe.
Both VEP and open-cravat support plugins, which can extend annotation capabilities without requiring these to be directly integrated into the core software.
Describe the solution you'd like
mehari should offer a plugin interface with at least the features offered by VEP. Ideally, it should be VEP-compatible.
Describe alternatives you've considered
Most software supports annotating with custom TSV files, but this might be too limited for most use cases.
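To make the limitation concrete, here is a small Rust sketch of custom-TSV annotation, assuming a hypothetical five-column layout (chrom, pos, ref, alt, value); `load_tsv`, `lookup`, and `VariantKey` are illustrative names. It can only attach one precomputed value per exact variant match, with no access to transcripts or any per-variant logic.

```rust
use std::collections::HashMap;

/// Hypothetical variant key: (chrom, pos, ref, alt).
pub type VariantKey = (String, u64, String, String);

/// Parses a tab-separated file with columns chrom, pos, ref, alt, value.
/// Comment lines starting with '#' and malformed lines are skipped.
pub fn load_tsv(text: &str) -> HashMap<VariantKey, String> {
    let mut table = HashMap::new();
    for line in text.lines() {
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let cols: Vec<&str> = line.split('\t').collect();
        if cols.len() < 5 {
            continue;
        }
        let pos = match cols[1].parse::<u64>() {
            Ok(p) => p,
            Err(_) => continue,
        };
        let key = (cols[0].to_string(), pos, cols[2].to_string(), cols[3].to_string());
        table.insert(key, cols[4].to_string());
    }
    table
}

/// Exact-match lookup: the only "annotation logic" a TSV can provide.
pub fn lookup<'a>(
    table: &'a HashMap<VariantKey, String>,
    chrom: &str,
    pos: u64,
    ref_allele: &str,
    alt_allele: &str,
) -> Option<&'a str> {
    let key = (chrom.to_string(), pos, ref_allele.to_string(), alt_allele.to_string());
    table.get(&key).map(String::as_str)
}
```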
Additional context
First, we will need to investigate the approaches taken by VEP and open-cravat for plugin support. Something like wasmer might help: several Rust projects use WASM as an intermediate step to allow easy plugin integration without putting strong constraints on either the programming language or the environment.