Distance matrix as output? #11

Open
jzrapp opened this issue Jan 20, 2023 · 1 comment

jzrapp commented Jan 20, 2023

Hi @plpla,

I was wondering whether it would be possible to retrieve, as an output, the distance matrix that your tool creates as a temporary file? It would be great to have for other analyses!

Thanks a lot,
Josephine

nvaulin commented Jul 19, 2023

Hi @jzrapp,
I am not affiliated with the authors of CRISPRStudio, but the answer to your question is yes. Are you talking about the distance matrix returned by the table_to_distances(table, pairwise_distance_fn) function on lines 415-422? If so, here is what could be done:

According to the scikit-bio docs, the DistanceMatrix object has a to_file method.

So simply change the function mentioned above from this:

    def table_to_distances(table, pairwise_distance_fn):
        sample_ids = table.columns
        num_samples = len(sample_ids)
        data = zeros((num_samples, num_samples))
        for i, sample1_id in enumerate(sample_ids):
            for j, sample2_id in enumerate(sample_ids[:i]):
                data[i,j] = data[j,i] = pairwise_distance_fn(table, sample1_id, sample2_id)
        return DistanceMatrix(data, sample_ids)

To this:

    def table_to_distances(table, pairwise_distance_fn):
        sample_ids = table.columns
        num_samples = len(sample_ids)
        data = zeros((num_samples, num_samples))
        for i, sample1_id in enumerate(sample_ids):
            for j, sample2_id in enumerate(sample_ids[:i]):
                data[i,j] = data[j,i] = pairwise_distance_fn(table, sample1_id, sample2_id)
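        dm = DistanceMatrix(data, sample_ids)
        # YOUR_FILENAME is a placeholder for the output path, e.g. "distances.txt".
        # Note: newer scikit-bio releases expose this method as dm.write(...) instead.
        dm.to_file(YOUR_FILENAME)
        return dm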
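
To show what ends up in such a file, here is a minimal standard-library sketch of the same idea: fill a symmetric matrix from a pairwise function, then write it as tab-separated text with the sample IDs as header and row labels. The names pairwise_distances and write_distance_table are made up for this illustration and are not part of CRISPRStudio or scikit-bio, and the layout is only an assumption about a convenient format for downstream tools.

```python
import csv
from itertools import combinations


def pairwise_distances(sample_ids, distance_fn):
    """Build a symmetric matrix (list of lists) with a zero diagonal."""
    n = len(sample_ids)
    data = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Compute each pair once and mirror it across the diagonal.
        data[i][j] = data[j][i] = distance_fn(sample_ids[i], sample_ids[j])
    return data


def write_distance_table(path, sample_ids, data):
    """Write the matrix as tab-separated text, labelled by sample ID."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Header row: an empty corner cell followed by the sample IDs.
        writer.writerow([""] + list(sample_ids))
        for sample_id, row in zip(sample_ids, data):
            writer.writerow([sample_id] + list(row))
```

A file written this way can be opened in a spreadsheet or read back with any tab-separated reader, which may be handy if the scikit-bio writer is not available in your environment.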

To make these edits, I suppose it is easiest to do so before installation:

    git clone https://github.com/moineaulab/CRISPRStudio.git
    cd CRISPRStudio
    nano CRISPR_Studio_1.0.py # edit the lines here; you can use any text editor you prefer
    ./Install.sh

It may also be worth exposing this filename as an additional parameter of the tool. If you are interested, I can comment on that.
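
As a rough idea of what that could look like, here is a small sketch assuming the script parses its options with argparse (I have not checked how CRISPRStudio actually does it). The flag name --distance-matrix-out and the helper maybe_save are hypothetical, chosen only for this example.

```python
import argparse


def build_parser():
    """Parser with one optional output path for the distance matrix."""
    parser = argparse.ArgumentParser(description="Sketch of an extra output option")
    parser.add_argument(
        "--distance-matrix-out",  # hypothetical flag, not an existing CRISPRStudio option
        default=None,
        metavar="PATH",
        help="if given, also save the distance matrix to PATH",
    )
    return parser


def maybe_save(dm, out_path):
    """Save dm only when a path was requested; return True if it was saved."""
    if out_path is None:
        return False
    # Same call as in the edited function (dm.write in newer scikit-bio).
    dm.to_file(out_path)
    return True
```

With something like this, table_to_distances could stay unchanged, and the caller would pass args.distance_matrix_out to maybe_save right after the matrix is built, so the default behaviour is the same when the flag is not given.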

Sincerely,
Nikita
