Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: allow simphenotype to accept TR PGENs without --repeats #263

Merged
merged 2 commits into from
Nov 27, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions docs/api/data.rst
Original file line number Diff line number Diff line change
Expand Up @@ -183,6 +183,8 @@ The following methods from the :class:`Genotypes` class are disabled, however.
1. ``check_biallelic``
2. ``check_maf``

The constructor of the :class:`GenotypesTR` class also includes a :code:`vcftype` parameter. This can be helpful when the type of the TR file cannot be inferred automatically. Refer to `the TRTools docs <https://trtools.readthedocs.io/en/stable/trtools.utils.tr_harmonizer.html#trtools.utils.tr_harmonizer.VcfTypes>`_ for a list of accepted types.

.. _api-data-genotypestr:

GenotypesPLINK
Expand Down
11 changes: 11 additions & 0 deletions docs/formats/genotypes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -37,3 +37,14 @@ To convert a VCF containing tandem repeats to PGEN, use this command, instead.
.. code-block:: bash

plink2 --vcf-half-call m --make-pgen 'pvar-cols=vcfheader,qual,filter,info' --vcf input.vcf --make-pgen --out output

Tandem repeats
~~~~~~~~~~~~~~
VCFs containing tandem repeats usually have a *type* indicating the tool from which they originated. We support whichever types are supported by `TRTools <https://trtools.readthedocs.io/en/stable/CALLERS.html>`_.

We do our best to infer the *type* of a TR VCF automatically. However, there will be some instances when it cannot be inferred.
Users of TRTools know to specify :code:`--vcftype` in that situation. However, most haptools commands do not yet support the :code:`--vcftype` parameter. Instead, you can modify the header of your VCF to trick haptools into being able to infer the *type*.

For example, if you're using HipSTR, you can add :code:`##command=hipstr...`. Refer to `this code in TRTools <https://trtools.readthedocs.io/en/stable/trtools.utils.tr_harmonizer.html#trtools.utils.tr_harmonizer.InferVCFType>`_ for more details.

Please note that all of this also applies to PVAR files created from TR VCFs.
8 changes: 6 additions & 2 deletions haptools/sim_phenotype.py
Original file line number Diff line number Diff line change
Expand Up @@ -405,8 +405,12 @@ def simulate_pt(
# check if these are all repeat IDs, haplotype IDs, or a mix of them
if len(hp.type_ids["R"]) >= len(haplotype_ids) and repeats is None:
# if they're all repeat IDs or --repeats was specified
log.info("Loading TR genotypes")
gt = GenotypesTR(fname=genotypes, log=log)
if genotypes.suffix == ".pgen":
log.info("Loading TR genotypes from PGEN file")
gt = GenotypesPLINKTR(fname=genotypes, log=log, chunk_size=chunk_size)
else:
log.info("Loading TR genotypes from VCF/BCF file")
gt = GenotypesTR(fname=genotypes, log=log)
load_as_haps = False
else:
# the genotypes variable must contain haplotype genotypes
Expand Down
11 changes: 11 additions & 0 deletions tests/test_simphenotype.py
Original file line number Diff line number Diff line change
Expand Up @@ -670,6 +670,17 @@ def test_repeat(self, capfd):
assert captured.out
assert result.exit_code == 0

def test_repeat_pgen(self, capfd):
gt_file = DATADIR / "simple-tr.pgen"
hp_file = DATADIR / "simple_tr.hap"

cmd = f"simphenotype --id 1:10114:GTT {gt_file} {hp_file}"
runner = CliRunner()
result = runner.invoke(main, cmd.split(" "), catch_exceptions=False)
captured = capfd.readouterr()
assert captured.out
assert result.exit_code == 0

def test_repeat_with_hapgts(self, capfd):
tmp_transform = Path("temp-transform.vcf")
with open(tmp_transform, "w") as file:
Expand Down
Loading