Continous Integration Tests #129

jcharkow · 2024-10-31T15:13:11Z

This adds continuous integration tests (using the existing pyprophet tests) to pyprophet using github actions.

Note this uses pytest instead of nose because nose is no longer being maintained.

Many tests are currently failing so ideally we can address why they are failing before merging.

I also migrated setup to using a .toml file as I was having difficulties with setup.py not having a numpy requirement. Note that because cython is used a setup.py is still required.

convert to .toml setup for building on github

both are required because of cython, works locally now

singjc · 2024-10-31T16:34:22Z

I think some of the issues may be related to numpy 2.0. I was looking into updating pyprophet to using numpy 2.0 a month ago or so, but got busy with other work, so had to put it on hold.

If you have time, would you be interested in taking this on or working on it together?

jcharkow · 2024-10-31T20:29:09Z

We can work on it together! I will take a look at tests/test_pyprophet_score.py and tests/test_stats.py to start with

jcharkow · 2024-10-31T21:28:36Z

I am having trouble with test_osw_4. It seems to be working fine with xgboost 1.7.3 (even with numpy 2.0) however when I upgrade to xgboost 2.0 it breaks. Any ideas?

note tests still fail though, possibly because random seed is different?

update the tests to remove old parameters and fix tests that were failing

jcharkow · 2024-11-19T23:24:37Z

Looking more into test_osw_4 it seems that it fails due to a change in XGBoost default parameters. From XGBoost 2.0.0 onwards the default tree_method is 'hist' instead of 'exact' which breaks the test. I am not sure if we should hardcode using 'exact' trees instead of the 'hist' tree to preserve the original functionality. I think that hist trees are a lot quicker though.

singjc · 2024-11-20T01:53:39Z

Looking more into test_osw_4 it seems that it fails due to a change in XGBoost default parameters. From XGBoost 2.0.0 onwards the default tree_method is 'hist' instead of 'exact' which breaks the test. I am not sure if we should hardcode using 'exact' trees instead of the 'hist' tree to preserve the original functionality. I think that hist trees are a lot quicker though.

Could we just re-run the test to regenerate the reg test outputs with using the latest version of xgboost, and then fix the version to >= 2.0.0?

jcharkow · 2024-11-20T02:30:24Z

Looking more into test_osw_4 it seems that it fails due to a change in XGBoost default parameters. From XGBoost 2.0.0 onwards the default tree_method is 'hist' instead of 'exact' which breaks the test. I am not sure if we should hardcode using 'exact' trees instead of the 'hist' tree to preserve the original functionality. I think that hist trees are a lot quicker though.

Could we just re-run the test to regenerate the reg test outputs with using the latest version of xgboost, and then fix the version to >= 2.0.0?

The problem is not that the regtest fails but rather that the command crashes. In XGBoost 2.0.0 the classifier outputs np.nan values causing a crash when trying to plot the histogram.

singjc · 2024-11-20T03:25:30Z

Looking more into test_osw_4 it seems that it fails due to a change in XGBoost default parameters. From XGBoost 2.0.0 onwards the default tree_method is 'hist' instead of 'exact' which breaks the test. I am not sure if we should hardcode using 'exact' trees instead of the 'hist' tree to preserve the original functionality. I think that hist trees are a lot quicker though.

Could we just re-run the test to regenerate the reg test outputs with using the latest version of xgboost, and then fix the version to >= 2.0.0?

The problem is not that the regtest fails but rather that the command crashes. In XGBoost 2.0.0 the classifier outputs np.nan values causing a crash when trying to plot the histogram.

Does it output other valid values, or is everything np.nan? If it's specifically the plot that's failing, we may need to adjust how that handles the np.nan.

jcharkow · 2024-11-20T13:23:14Z

Upon further investigation, this occurs because of faulty normalization. The outputted XGBoost model does not fail however, I'm assuming because there are very few points, using the hist method results in all of the decoys having the exact same score. Thus when normalization of the decoy distribution is done, the standard deviation is computed as 0, resulting in everything having either positive or negative inf

I suggest either:
A. Create a new test file that does not have this error
B. If --test flag is set then make the use the 'exact' method instead of the 'hist' method. Note that using the 'exact' method still causes the regtest to fail so we would have to update that.

I am in favor of "B" because it is easier and in real use cases where there are many more features and scores I believe XGBoost hist method will be appropriate.

This is appropriate since mean and standard deviation are always computed in the context of normalization and cannot normalize if the standard deviation is 0

jcharkow · 2024-11-20T13:49:55Z

update: test_osw_5 after fixing bugs with hyperopt now fails due to the same error. Option "B" solves this test as well however regtest would have to be updated.

jcharkow · 2024-11-21T16:45:32Z

I updated the snapshots for osw_4 and osw_5 and all tests pass now! I think this is ready to merge?

this fixes the versions so tests do not fail. Will add dependabot to update the reuqirements file

singjc

Thanks for setting the CI up! I like the change to pyproject.toml, but kind of don't like that we still have to depend/keep the setup.py for the cython stuff :\

I made some suggestions/comments, should be good to merge after these are addressed.

singjc · 2024-11-21T17:00:23Z

.github/workflows/ci.yml

+    strategy:
+      matrix:
+        os: [ubuntu-latest]
+          #os: [ubuntu-latest, windows-latest, macos-latest] # remove mac tests


Is it possible to run the CI on windows and mac as well, or does it not work?

singjc · 2024-11-21T17:08:35Z

pyproject.toml

+    "duckdb",
+    "duckdb-extensions",
+    "duckdb-extension-sqlite-scanner",


duckdb is currently only used for OSW to parquet exporting right? I'm thinking if we can create a separate dependency version, so that if someone wants to be able to export to parquet, then they can install pyprophet[parquet] or something? Just so that we reduce the number of dependencies for the main library for just performing regular scoring tsv exporting? What do you think?

From my initial tests duckdb tends to speed up sqlite statements with many table joins so I was thinking of extending its usage to scoring and tsv exporting as it is minimal changes required to do this.

pyprophet/classifiers.py

singjc · 2024-11-21T17:14:12Z

pyprophet/levels_contexts.py

@@ -184,7 +189,7 @@ def infer_proteins(infile, outfile, context, parametric, pfdr, pi0_lambda, pi0_m
    con.close()

    if context == 'run-specific':
-        data = data.groupby('run_id').apply(statistics_report, outfile, context, "protein", parametric, pfdr, pi0_lambda, pi0_method, pi0_smooth_df, pi0_smooth_log_pi0, lfdr_truncate, lfdr_monotone, lfdr_transformation, lfdr_adj, lfdr_eps, color_palette).reset_index()
+        data = data.groupby('run_id').apply(statistics_report, outfile, context, "protein", parametric, pfdr, pi0_lambda, pi0_method, pi0_smooth_df, pi0_smooth_log_pi0, lfdr_truncate, lfdr_monotone, lfdr_transformation, lfdr_adj, lfdr_eps, color_palette)


Is reset_index no longer needed?

Removing reset index is required to prevent the error

File "/home/joshua/mambaforge/envs/pyprophet_dev/lib/python3.11/site-packages/pandas/core/frame.py", line 5158, in insert raise ValueError(f"cannot insert {column}, already exists") ValueError: cannot insert run_id, already exists

Same as below. Must be a change to pandas groupby functionality

Yeah seems like it, some groupby deprecations occured for Pandas v2.2.0

Deprecated the Grouping attributes group_index, result_index, and group_arraylike; these will be removed in a future version of pandas (GH 56148)

If you don't mind, would you be able to test with a version prior to pandas v2.2.0, to see if the old code works with the .reset_index(), just so we know for sure that is the change.

pyprophet/levels_contexts.py

pyprophet/main.py

pyprophet/stats.py

currently both mac and windows tests are failing so just keep ubuntu

replace semi-supervised learning normalization with sklearn which can handle cases where std = 0

sklearn transofmration does not use degrees of freedom. Keep the normalize_score_by_decoys method but use numpy std() method

…'double' need to cast to np.float_32

singjc

Looks good for the most part. Just had one question/comment about the random_state being set to 42 instead of using the random number generator in the classifiers code? Should be good to merge after that

pyprophet/classifiers.py

jcharkow added 10 commits October 30, 2024 14:41

add continous integration

78fbcea

preinstall numpy

b34300b

remove numpy from setup

955e1f1

install numpy in setup script

a684136

convert to .toml setup

ff5804c

convert to .toml setup for building on github

remove numpy from requirement

4b6750a

just ubuntu for now

122753d

fix setup.py and .toml

b9f35af

both are required because of cython, works locally now

add line to build extension

be62dfb

Merge branch 'master' into ci

065dc28

jcharkow added 2 commits October 31, 2024 16:58

fix: stats tests

15dd65b

add pytest-regtest to workflow

c99db26

jcharkow added 3 commits November 1, 2024 10:42

update autotunning so does not fail

fdb5513

note tests still fail though, possibly because random seed is different?

fix: fix level context tests

3c8bfbd

update export-parquet tests and fix tests

d320945

update the tests to remove old parameters and fix tests that were failing

raise error if standard deviation computed is 0

71804e4

This is appropriate since mean and standard deviation are always computed in the context of normalization and cannot normalize if the standard deviation is 0

jcharkow added 2 commits November 21, 2024 11:31

test: set tree method as exact for tests

516389e

update snapshot tests

ca4a14e

jcharkow added 2 commits November 21, 2024 12:07

update actions to dependent on reuiqrements file

0704475

this fixes the versions so tests do not fail. Will add dependabot to update the reuqirements file

add dependabot

ce075f3

singjc requested changes Nov 21, 2024

View reviewed changes

jcharkow added 10 commits November 21, 2024 12:46

add tests for windows and mac

a0235e0

remove default_rng

88291fa

remove mac tests

08df7cd

remove copy of np arrays

0c342e9

remove windows tests

527c365

currently both mac and windows tests are failing so just keep ubuntu

refactor: new function for normlaizing score to decoys

9c64665

replace semi-supervised learning normalization with sklearn

9be0a9a

replace semi-supervised learning normalization with sklearn which can handle cases where std = 0

revert to numpy std

aa70dbe

sklearn transofmration does not use degrees of freedom. Keep the normalize_score_by_decoys method but use numpy std() method

minor updates to pyprophet.toml

005af6c

fix: ValueError: Buffer dtype mismatch, expected 'DATA_TYPE' but got …

81f78bf

…'double' need to cast to np.float_32

jcharkow requested a review from singjc November 28, 2024 13:41

singjc requested changes Nov 28, 2024

View reviewed changes

pyprophet/classifiers.py Show resolved Hide resolved

jcharkow requested a review from singjc November 28, 2024 23:26

singjc approved these changes Nov 29, 2024

View reviewed changes

singjc merged commit fe296fd into PyProphet:master Nov 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Continous Integration Tests #129

Continous Integration Tests #129

jcharkow commented Oct 31, 2024 •

edited

Loading

singjc commented Oct 31, 2024

jcharkow commented Oct 31, 2024

jcharkow commented Oct 31, 2024

jcharkow commented Nov 19, 2024

singjc commented Nov 20, 2024

jcharkow commented Nov 20, 2024

singjc commented Nov 20, 2024 •

edited

Loading

jcharkow commented Nov 20, 2024

jcharkow commented Nov 20, 2024

jcharkow commented Nov 21, 2024

singjc left a comment

singjc Nov 21, 2024

singjc Nov 21, 2024

jcharkow Nov 21, 2024

singjc Nov 21, 2024

jcharkow Nov 21, 2024

singjc Nov 28, 2024 •

edited

Loading

singjc left a comment

Continous Integration Tests #129

Continous Integration Tests #129

Conversation

jcharkow commented Oct 31, 2024 • edited Loading

singjc commented Oct 31, 2024

jcharkow commented Oct 31, 2024

jcharkow commented Oct 31, 2024

jcharkow commented Nov 19, 2024

singjc commented Nov 20, 2024

jcharkow commented Nov 20, 2024

singjc commented Nov 20, 2024 • edited Loading

jcharkow commented Nov 20, 2024

jcharkow commented Nov 20, 2024

jcharkow commented Nov 21, 2024

singjc left a comment

Choose a reason for hiding this comment

singjc Nov 21, 2024

Choose a reason for hiding this comment

singjc Nov 21, 2024

Choose a reason for hiding this comment

jcharkow Nov 21, 2024

Choose a reason for hiding this comment

singjc Nov 21, 2024

Choose a reason for hiding this comment

jcharkow Nov 21, 2024

Choose a reason for hiding this comment

singjc Nov 28, 2024 • edited Loading

Choose a reason for hiding this comment

singjc left a comment

Choose a reason for hiding this comment

jcharkow commented Oct 31, 2024 •

edited

Loading

singjc commented Nov 20, 2024 •

edited

Loading

singjc Nov 28, 2024 •

edited

Loading