
Overarching design for new functional testing #38

Open
fcooper8472 opened this issue Jan 26, 2021 · 14 comments

Comments

@fcooper8472
Member

Some thoughts, very much open to discussion. Most relevant to @MichaelClerx @martinjrobins @ben18785.

  • Move the actual functional tests to the main PINTS repo. This will ensure all PINTS specific code is version controlled with PINTS, not somewhere else. I would imagine this being a new top level directory in the PINTS repo. This also means a single commit hash gives you a complete view into PINTS + functional testing state at once.
  • Functional testing repo would just contain code for running those tests, helping to separate code that uses PINTS from the infrastructure that runs the functional test.
  • Use GitHub pages to host a new (hugo) website for showing results, using this template: each model (mcmc_banana, nested_normal, etc) would have top-level navigation sections, with specific tests (mcmc_banana_DifferentialEvolutionMCMC, mcmc_banana_DreamMCMC, ...) being individual pages containing plots.
  • Frontpage would include an overview of any currently failing tests (with links).
  • Outputs could be customised on a per-test basis because each test would write to a dedicated table in the sqlite database: presumably functional testing would pass a database connection to the test. This eliminates the need for a hacky random JSON object, and instead each test would have a nice simple flat data table containing all relevant info.
  • Plots would use altair, would be interactive, and allow clicking through to commits, hovering over points, etc.
  • Each test would be responsible for plotting: for instance by returning a list of vega-lite specifications.
  • The tests could be run on Skip, via GitHub actions: each run would cause a fresh checkout of functional testing and PINTS, and would refer to a hardcoded database location (which will grow and is not suitable for versioning).
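To make the per-test table idea concrete, here's a rough sketch of what a test writing to its own dedicated sqlite table might look like (the function name, table name, columns, and result values are all purely illustrative, not actual PINTS code):

```python
import sqlite3

def run_mcmc_banana_demcmc(conn, seed):
    """Hypothetical test: writes its results to its own dedicated table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mcmc_banana_DifferentialEvolutionMCMC"
        " (commit_hash TEXT, seed INTEGER, kld REAL, ess REAL)"
    )
    # ... the actual PINTS run would happen here ...
    kld, ess = 0.02, 150.0  # placeholder results
    conn.execute(
        "INSERT INTO mcmc_banana_DifferentialEvolutionMCMC VALUES (?, ?, ?, ?)",
        ("abc1234", seed, kld, ess),
    )
    conn.commit()

# The functional testing runner would own the connection and pass it in
conn = sqlite3.connect(":memory:")
run_mcmc_banana_demcmc(conn, seed=1)
```

Each test then gets a simple flat table with exactly the columns it needs, rather than a shared JSON blob.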

Anyone have any major thoughts/comments on this as a basic structure?

@ben18785

Thanks @fcooper8472 -- that all sounds good to me. The one thing I'm less sure about is having a separate table for each test, since I see a lot of overlap between their outputs. That said, if having separate tables means it'll be easier to add tests that return quite different measures, then perhaps this is easiest.

@fcooper8472
Member Author

Yep, there's a lot of overlap, e.g. the git commit hash, but at the moment everything's just shoved into a single table as extra lines, so you're not storing any extra data by having different tables.

I would imagine having some kind of payload object with all the overlapping fields filled in by the base class to enforce some kind of extensible uniformity. Haven't thought through all the details yet though.
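For illustration, a minimal sketch of what that payload/base-class idea could look like (all class names, methods, and fields here are hypothetical):

```python
import datetime

class FunctionalTest:
    """Hypothetical base class: fills in the fields shared by all tests."""

    def run(self, seed):
        payload = {
            "commit_hash": self._get_commit_hash(),
            "seed": seed,
            "date": datetime.datetime.now().isoformat(),
        }
        payload.update(self._run(seed))  # add the test-specific fields
        return payload

    def _get_commit_hash(self):
        # Placeholder: a real implementation would ask git for the PINTS hash
        return "abc1234"

    def _run(self, seed):
        raise NotImplementedError

class BananaTest(FunctionalTest):
    def _run(self, seed):
        # Only the test-specific results; shared fields come from the base
        return {"kld": 0.02}
```

The base class enforces the uniform part of the schema, while each subclass stays free to return whatever measures make sense for that test.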

@MichaelClerx
Member

Thanks @fcooper8472 ! Does sound very good!

The only bit I'm unsure about is the first point:

  • Won't the tests require packages that aren't included in PINTS proper? E.g. vega, or some form of db access? Or, if they don't do any db writing, presumably they need to implement an interface from the functional testing package?
  • Some other things look pints-specific to me, e.g. knowing where to look for tests, knowing the mcmc_ naming scheme, knowing what the table looks like?

Or are you thinking these will all be "settings" in the PINTS installation of the FT project?

@MichaelClerx
Member

MichaelClerx commented Jan 26, 2021

(Incidentally, I wouldn't do away with storing the commit hash of FT. Still keep it as meta-data. But yeah having it as a dual key system was probably overkill.)

@MichaelClerx
Member

It would greatly add to the value of FT, I agree, if it were something you could readily add to any project :D

@fcooper8472
Member Author

Having thought about this some more, here is my next iteration of thoughts:

  • Let's keep the tests separate from PINTS
  • Functional testing needs to be run from the PINTS repo, or there is no sane way to run it only on pushes to master of PINTS. This would mean writing a functional testing GitHub workflow on PINTS that just checks out functional-testing, with a token if necessary to push changes
  • I want to try and let GitHub run the functional tests (rather than running them on Skip). Each run can go for up to 6 hours on GH and I think if we're not well inside that we're probably doing something wrong!
  • Let's stop over-thinking how we store results. Instead of having a database that sits somewhere that we have to interact with, let's just keep a data directory in functional testing, with the results in plain text csv files that get versioned with functional testing. Each test will have a csv file with whatever columns make sense for that test. Common information between tests (commit hash, seed etc) can go in a main csv, and cross-referencing between files can be via commit hash.
  • Each run of functional testing, run from PINTS, will checkout functional-testing, run the tests, add (currently 4) rows to the csv files, rebuild the hugo website, and push all the changes.
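A rough sketch of how that csv layout could be joined up, with cross-referencing via commit hash (file names, columns, and values are just illustrative; using pandas here for the join):

```python
import pandas as pd

# main.csv: one row per run, holding the columns common to all tests
main = pd.DataFrame({"commit": ["abc1234", "def5678"], "seed": [1, 2]})

# mcmc_banana_DreamMCMC.csv: whatever columns make sense for this test
banana = pd.DataFrame({"commit": ["abc1234", "def5678"], "kld": [0.02, 0.03]})

# Cross-reference the two files via the commit hash
results = banana.merge(main, on="commit")
```

In practice each DataFrame would be read from its versioned csv with `pd.read_csv`, but the join works the same way.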

The main problem I'm having now is navigating the existing functional testing code. I just cannot understand how it's supposed to work. I think you'll have to give me a tutorial @MichaelClerx.

@ben18785

ben18785 commented Jan 31, 2021 via email

@MichaelClerx
Member

Thanks Fergus!
I'm a bit hesitant to go back to CSVs, after we changed from CSV (one per test run) to a DB, but perhaps one file per test is more workable. Ideally we'd still be able to combine results from multiple nodes, though; maybe we could just include a hostname in each file or something, and then load multiple files in (if available) during analysis?

Re: tour. Happy to! But the current code needs some heavy reworking anyway, I find any time I touch it :D

@MichaelClerx
Member

I'm free after 11 today or else we can do it tomorrow morning?

@iamleeg
Contributor

iamleeg commented Feb 1, 2021

Let's stop over-thinking how we store results.

In defence of my original level of thinking: the point of the database was to be able to correlate test results longitudinally, so that statistical measures could be obtained for tests that rely on some random input. You could achieve that with a collection of CSVs keyed by git hash, though it'd be more work to use the git tree and CSV files to reconstruct the history.
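For example, a hypothetical reconstruction of longitudinal statistics from per-run results keyed by git hash, using pandas (the data and column names are made up):

```python
import pandas as pd

# Two hypothetical runs at the same commit with different random seeds
runs = [
    pd.DataFrame({"commit": ["abc1234"], "seed": [1], "kld": [0.02]}),
    pd.DataFrame({"commit": ["abc1234"], "seed": [2], "kld": [0.04]}),
]
history = pd.concat(runs, ignore_index=True)

# Longitudinal statistics per commit, pooled across random seeds
stats = history.groupby("commit")["kld"].agg(["mean", "std"])
```

With a database this is a single GROUP BY query; with CSVs you first have to gather and concatenate the right files, which is the extra work referred to above.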

@fcooper8472
Member Author

The main problem I'm suggesting we try to solve is that we currently have a database that needs to exist somewhere.

If we want to run a test on GitHub Actions, the database has to be somewhere that we can read from and write to. And every physical machine that might want to run tests will need access to wherever the database is kept.

We could stick the database in GitHub, but it will change every run and need to be stored entirely every time. So it seems like some plaintext format is the way to go: we would only be versioning the next set of results.

I'm not very familiar at all with databases: what kind of operations are you thinking of that are better suited to a database than, say, csv + pandas?

@iamleeg
Contributor

iamleeg commented Feb 1, 2021

Like I say, you'll be able to do it either way, but if you want to correlate test results across runs you'll need a location for writable storage that's available between runs, whatever format you're storing.

@martinjrobins
Member

It doesn't even have to be a fancy database; it could be a Google/MS spreadsheet(s)?

@fcooper8472
Member Author

The problem isn't whether it's SQLite vs Excel vs Google Sheets; it's whether we can easily read from and write to whatever that source is.

I'm suggesting versioning the data with the functional testing repo is the obvious (and simplest) solution. But I'm very much open to suggestions if there's a simple (& free) way of doing it another way.
