medkit


medkit is a toolkit for a learning health system, developed by the HeKA research team.

This Python library aims at:

Facilitating the manipulation of healthcare data of various modalities (e.g., structured, text, audio data) for the extraction of relevant features.
Developing supervised models from these various modalities for decision support in healthcare.

Installation

To install medkit with basic functionalities:

pip install medkit-lib

To install medkit with all its optional features:

pip install 'medkit-lib[all]'
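
To confirm that the installation succeeded, the installed distribution can be queried from Python. The snippet below is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and not part of medkit.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "medkit-lib" is the distribution name used in the pip commands above.
medkit_version = installed_version("medkit-lib")
print(medkit_version or "medkit-lib is not installed in this environment")
```

If the second message is printed, the package was probably installed into a different Python environment than the one running the check.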

Example

A basic named-entity recognition pipeline using medkit:

# 1. Define individual operations.
from medkit.text.preprocessing import CharReplacer, LIGATURE_RULES, SIGN_RULES
from medkit.text.segmentation import SentenceTokenizer, SyntagmaTokenizer
from medkit.text.context.negation_detector import NegationDetector
from medkit.text.ner.hf_entity_matcher import HFEntityMatcher

# Preprocessing
char_replacer = CharReplacer(rules=LIGATURE_RULES + SIGN_RULES)
# Segmentation
sent_tokenizer = SentenceTokenizer(output_label="sentence")
synt_tokenizer = SyntagmaTokenizer(output_label="syntagma")
# Negation detection
neg_detector = NegationDetector(output_label="is_negated")
# Entity recognition
entity_matcher = HFEntityMatcher(model="my-BERT-model", attrs_to_copy=["is_negated"])

# 2. Combine operations into a pipeline.
from medkit.core.pipeline import Pipeline, PipelineStep

ner_pipeline = Pipeline(
    input_keys=["full_text"],
    output_keys=["entities"],
    steps=[
        PipelineStep(char_replacer, input_keys=["full_text"], output_keys=["clean_text"]),
        PipelineStep(sent_tokenizer, input_keys=["clean_text"], output_keys=["sentences"]),
        PipelineStep(synt_tokenizer, input_keys=["sentences"], output_keys=["syntagmas"]),
        PipelineStep(neg_detector, input_keys=["syntagmas"], output_keys=[]),
        PipelineStep(entity_matcher, input_keys=["syntagmas"], output_keys=["entities"]),
    ],
)

# 3. Run the NER pipeline on BRAT documents.
from medkit.io import BratInputConverter

docs = BratInputConverter().load(path="/path/to/dataset/")
entities = ner_pipeline.run([doc.raw_segment for doc in docs])
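
The pipeline above wires steps together through keys: each step reads the data stored under its input keys and stores its result under its output keys. The standalone sketch below illustrates that idea in plain Python. It does not use medkit and is not how medkit implements `Pipeline`; `ToyStep`, `run_toy_pipeline`, and the two text functions are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToyStep:
    """Hypothetical stand-in for a pipeline step (not medkit's PipelineStep)."""

    func: Callable[[List[str]], List[str]]
    input_key: str
    output_key: str


def run_toy_pipeline(
    steps: List[ToyStep], input_key: str, output_key: str, data: List[str]
) -> List[str]:
    # Each key names a slot that holds intermediate data between steps.
    slots: Dict[str, List[str]] = {input_key: data}
    for step in steps:
        slots[step.output_key] = step.func(slots[step.input_key])
    return slots[output_key]


def replace_ligatures(texts: List[str]) -> List[str]:
    # Toy preprocessing: expand a single ligature.
    return [text.replace("œ", "oe") for text in texts]


def split_sentences(texts: List[str]) -> List[str]:
    # Toy segmentation: split on periods and drop empty pieces.
    return [part.strip() for text in texts for part in text.split(".") if part.strip()]


steps = [
    ToyStep(replace_ligatures, input_key="full_text", output_key="clean_text"),
    ToyStep(split_sentences, input_key="clean_text", output_key="sentences"),
]
result = run_toy_pipeline(steps, "full_text", "sentences", ["No œdema. Dry cough."])
print(result)  # ['No oedema', 'Dry cough']
```

Naming the intermediate results this way lets a later step consume the output of any earlier step, not only the one immediately before it, which is what allows both the negation detector and the entity matcher in the medkit example to read from "syntagmas".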

Getting started

To get started with medkit, please check out our documentation.

This documentation also contains tutorials and examples showcasing the use of medkit for different tasks.

Contributing

Thank you for your interest in medkit!

We'd be happy to get your input!

If your problem has not been reported by another user, please open an issue, whether it's for:

reporting a bug,
discussing the current state of the code,
submitting a fix,
proposing new features,
or contributing to the documentation.

If you want to propose a pull request, please read CONTRIBUTING.md first.

Contact

Feel free to contact us by sending an email to medkit-maintainers@inria.fr.
