copairs

Find pairs and compute metrics between them.

Installation

pip install git+https://github.com/cytomining/copairs.git@v0.4.1

Usage

Data

Say you have a dataset with 20 samples taken in 3 plates p1, p2, p3, each plate is composed of 5 wells w1, w2, w3, w4, w5, and each well has one or more labels (t1, t2, t3, t4) assigned.

import pandas as pd
import random

random.seed(0)
n_samples = 20
dframe = pd.DataFrame({
    'plate': [random.choice(['p1', 'p2', 'p3']) for _ in range(n_samples)],
    'well': [random.choice(['w1', 'w2', 'w3', 'w4', 'w5']) for _ in range(n_samples)],
    'label': [random.choice(['t1', 't2', 't3', 't4']) for _ in range(n_samples)]
})
dframe = dframe.drop_duplicates()
dframe = dframe.sort_values(by=['plate', 'well', 'label'])
dframe = dframe.reset_index(drop=True)

	plate	well	label
0	p1	w2	t4
1	p1	w3	t2
2	p1	w3	t4
3	p1	w4	t1
4	p1	w4	t3
5	p2	w1	t1
6	p2	w2	t1
7	p2	w3	t1
8	p2	w3	t2
9	p2	w3	t3
10	p2	w4	t2
11	p2	w5	t1
12	p2	w5	t3
13	p3	w1	t3
14	p3	w1	t4
15	p3	w4	t2
16	p3	w5	t2
17	p3	w5	t4

Getting valid pairs

To get pairs of samples that share the same label but comes from different plates at different well positions:

from copairs import Matcher
matcher = Matcher(dframe, ['plate', 'well', 'label'], seed=0)
pairs_dict = matcher.get_all_pairs(sameby=['label'], diffby=['plate', 'well'])

pairs_dict is a label_id: pairs dictionary containing the list of valid pairs for every unique value of labels

{'t4': [(0, 17), (0, 14), (17, 2), (2, 14)],
 't2': [(1, 16), (1, 10), (1, 15), (8, 16), (8, 15), (10, 16)],
 't1': [(3, 11), (3, 5), (3, 6), (3, 7)],
 't3': [(9, 4), (9, 13), (13, 4), (13, 12), (4, 12)]}

Getting valid pairs from a multilabel column

For eficiency reasons, you may not want to have duplicated rows. You can group all the labels in a single row and use MatcherMultilabel to find the corresponding pairs:

dframe_multi = dframe.groupby(['plate', 'well'])['label'].unique().reset_index()

	plate	well	label
0	p1	w2	['t4']
1	p1	w3	['t2', 't4']
2	p1	w4	['t1', 't3']
3	p2	w1	['t1']
4	p2	w2	['t1']
5	p2	w3	['t1', 't2', 't3']
6	p2	w4	['t2']
7	p2	w5	['t1', 't3']
8	p3	w1	['t3', 't4']
9	p3	w4	['t2']
10	p3	w5	['t2', 't4']

from copairs import MatcherMultilabel
matcher_multi = MatcherMultilabel(dframe_multi,
                                  columns=['plate', 'well', 'label'],
                                  multilabel_col='label',
                                  seed=0)
pairs_multi = matcher_multi.get_all_pairs(sameby=['label'],
                                          diffby=['plate', 'well'])

pairs_multi is also a label_id: pairs dictionary with the same structure discussed before:

{'t4': [(0, 10), (0, 8), (10, 1), (1, 8)],
 't2': [(1, 10), (1, 6), (1, 9), (5, 10), (5, 9), (6, 10)],
 't1': [(2, 7), (2, 3), (2, 4), (2, 5)],
 't3': [(5, 2), (5, 8), (8, 2), (8, 7), (2, 7)]}

Name		Name	Last commit message	Last commit date
Latest commit History 114 Commits
.github/workflows		.github/workflows
src/copairs		src/copairs
tests		tests
.gitignore		.gitignore
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

copairs

Installation

Usage

Data

Getting valid pairs

Getting valid pairs from a multilabel column

About

Releases

Packages

Languages

	plate	well	label
0	p1	w2	t4
1	p1	w3	t2
2	p1	w3	t4
3	p1	w4	t1
4	p1	w4	t3
5	p2	w1	t1
6	p2	w2	t1
7	p2	w3	t1
8	p2	w3	t2
9	p2	w3	t3
10	p2	w4	t2
11	p2	w5	t1
12	p2	w5	t3
13	p3	w1	t3
14	p3	w1	t4
15	p3	w4	t2
16	p3	w5	t2
17	p3	w5	t4

	plate	well	label
0	p1	w2	t4
1	p1	w3	t2
2	p1	w3	t4
3	p1	w4	t1
4	p1	w4	t3
5	p2	w1	t1
6	p2	w2	t1
7	p2	w3	t1
8	p2	w3	t2
9	p2	w3	t3
10	p2	w4	t2
11	p2	w5	t1
12	p2	w5	t3
13	p3	w1	t3
14	p3	w1	t4
15	p3	w4	t2
16	p3	w5	t2
17	p3	w5	t4

johnarevalo/copairs

Folders and files

Latest commit

History

Repository files navigation

copairs

Installation

Usage

Data

Getting valid pairs

Getting valid pairs from a multilabel column

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages

	plate	well	label
0	p1	w2	t4
1	p1	w3	t2
2	p1	w3	t4
3	p1	w4	t1
4	p1	w4	t3
5	p2	w1	t1
6	p2	w2	t1
7	p2	w3	t1
8	p2	w3	t2
9	p2	w3	t3
10	p2	w4	t2
11	p2	w5	t1
12	p2	w5	t3
13	p3	w1	t3
14	p3	w1	t4
15	p3	w4	t2
16	p3	w5	t2
17	p3	w5	t4