
Sequential Design of Experiments Test Plan


The development of each of the Uniform Space Filling (USF), Non-Uniform Space Filling (NUSF), and Input-Response Space Filling (IRSF) capabilities in the SDoE module followed the same four-stage pattern:

  1. New methodology and supporting code were developed by the team of LANL statisticians. The code was written in R and thoroughly tested to confirm that it produced the desired results.

  2. Once the R code was working and validated, the LANL team translated it to Python. The translated code was tested by comparing its results to those obtained in R; testing was performed by Christine (the R code developer) and Towfiq (the Python code developer).

  3. The validated Python code was transferred to LLNL for GUI development and for connecting the internal Python code to the GUI.

  4. The completed version of the module, with working code and GUI, was returned to the LANL team, who performed extensive testing of the module's different capabilities to verify that it could generate a diverse set of designs. Testing was performed by two developers and two internal users.

Testing in Stage 2 involved the following:

  • Running examples with low input dimensions (2 inputs) and very small candidate sets to verify that R and Python produce exactly the same answer

  • For various input dimensions and different sized candidate sets, running each scenario 3 times with the R code and 3 times with the Python code (a comparison sketch follows this list):

    • Compare criterion values between R and Python
    • Compare the overall look of the designs between R and Python
    • Perform more detailed comparisons of individual functions within the code if there are systematic differences in performance
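
To make the Stage 2 comparison concrete, here is a minimal sketch of such a cross-check, assuming each implementation writes its per-repeat criterion values and final design to CSV files; the file names, the `criterion` column name, and the helper functions are hypothetical illustrations, not part of the module's actual code.

```python
import numpy as np
import pandas as pd

def compare_criteria(r_csv: str, py_csv: str, exact: bool) -> None:
    """Compare criterion values from repeated R and Python runs.

    For tiny deterministic cases (2 inputs, very small candidate set) the
    values should match exactly; for larger randomized runs, compare the
    best value found across repeats instead of individual values.
    """
    r = pd.read_csv(r_csv)["criterion"].to_numpy()
    py = pd.read_csv(py_csv)["criterion"].to_numpy()
    if exact:
        assert np.array_equal(np.sort(r), np.sort(py)), "R/Python criteria differ"
    else:
        print(f"best criterion  R: {r.max():.6f}  Python: {py.max():.6f}")

def design_difference(r_csv: str, py_csv: str) -> float:
    """Compare the 'look' of two designs, sorting rows so the order in
    which points were selected does not matter; 0.0 means identical."""
    r = pd.read_csv(r_csv)
    py = pd.read_csv(py_csv)
    r = r.sort_values(by=list(r.columns)).to_numpy()
    py = py.sort_values(by=list(py.columns)).to_numpy()
    return float(np.abs(r - py).max())
```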

Testing in Stage 4 focused on the following:

  • Comparing results from Stage 2 (R and Python code) to current results for several examples

  • Running examples with low input dimensions (2 inputs) and very small candidate sets to verify that exactly the same answer is obtained across multiple trials

  • Running examples with a variety of input dimensions (2, 3, 4, 5, 6) and different sized candidate sets (50–10,000 points); this can be achieved by starting from a single example with a high input dimension and a large candidate set, then selecting only a subset of the columns and/or rows (see the subsetting sketch after this list)

    • Verify consistency of timing
    • Look at consistency of the criterion value (run the same design set-up multiple times and see how consistently the algorithm finds the best, or nearly best, design(s))
    • Verify that the design has appropriate properties (good space filling in all of the regions)
  • Having multiple testers create candidate sets and generate designs from scratch, to test robustness to variations in how inputs are presented

  • Testing design creation when given an atypical candidate set, such as a nonrectangular one or one with large areas removed (see the nonrectangular-candidate sketch after this list)

  • Looking for proper handling of errors (reasonable error messages and an elegant exit from the function) in the following cases (see the validation sketch after this list):

    • Missing inputs: the candidate set or previous data (required for all of USF, NUSF, and IRSF), the weight column (NUSF), or the response column (IRSF)
    • Non-standard inputs
    • Mismatch of columns between candidate set and previous data
    • Missing values
    • Checking/unchecking “include” for various columns
    • Using the undo feature to move through the sequence of choices in non-linear order
  • Evaluating how many random starts are needed to generate a “nearly best” design. This involves trying several numbers of random starts, each run several times, and comparing the resulting design criteria (see the random-starts sketch after this list). We used these results to set the recommended default number of random starts.

  • Examining the consistency and desirability of the presentation of the results, which consists of a summary table and graphical summaries of the created design

  • Verifying the estimated timing provided by FOQUS for different numbers of random starts

  • Checking that the design ordering tool works for all types of designs

  • Checking that designs built from the same candidate set change as expected when particular design choices are altered, such as

    • The design size
    • The optimality method (in USF)
    • The method of weight scaling and choice of maximum weight ratio (in NUSF)
    • The number of random starts selected when creating the design
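
The subsetting bullet above can be illustrated with a short sketch: a single large example is cut down to many smaller test cases by taking column subsets (input dimension) and row subsets (candidate set size), and a simple minimum-distance check stands in for the space-filling inspection. The synthetic candidate set and the `min_pairwise_distance` helper are illustrative assumptions, not the module's actual test harness.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist

# Stand-in for a real high-dimensional candidate set loaded from file:
# 10,000 candidate points in 6 inputs.
rng = np.random.default_rng(0)
full = pd.DataFrame(rng.random((10_000, 6)),
                    columns=[f"x{i}" for i in range(1, 7)])

# Smaller test cases are subsets of the columns and rows of the one example.
for n_inputs in (2, 3, 4, 5, 6):
    for n_cands in (50, 500, 5_000):
        cand = full.iloc[:n_cands, :n_inputs]
        # ... run the SDoE search on `cand` here, recording wall-clock
        # time and the best criterion value for each repeat ...

def min_pairwise_distance(design: np.ndarray) -> float:
    """Simple space-filling check: after scaling each input to [0, 1],
    no two design points should be much closer together than the rest."""
    lo, hi = design.min(axis=0), design.max(axis=0)
    return float(pdist((design - lo) / (hi - lo)).min())
```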
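
For the atypical-candidate-set tests, a nonrectangular candidate set is easy to construct by removing a large region from a regular grid; the grid resolution and output file name below are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# A uniform 40 x 40 grid of candidate points over the unit square ...
x1, x2 = np.meshgrid(np.linspace(0, 1, 40), np.linspace(0, 1, 40))
cand = pd.DataFrame({"x1": x1.ravel(), "x2": x2.ravel()})

# ... with the upper-right quadrant carved out, leaving an L-shaped
# (nonrectangular) input region with a large empty area.
cand = cand[~((cand["x1"] > 0.5) & (cand["x2"] > 0.5))]
cand.to_csv("lshaped_candidates.csv", index=False)
```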
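
The error-handling checks amount to verifying that the module fails early, with a readable message, rather than crashing deep inside the search code. A sketch of the kind of validation being exercised is below; the function name, signature, and messages are hypothetical, not the module's actual API.

```python
from typing import Optional

import pandas as pd

def validate_inputs(candidates: pd.DataFrame,
                    previous: Optional[pd.DataFrame],
                    design_type: str,
                    weight_col: Optional[str] = None,
                    response_col: Optional[str] = None) -> None:
    """Raise a readable error for the missing- or mismatched-input cases."""
    if candidates is None or candidates.empty:
        raise ValueError("A non-empty candidate set is required for all design types.")
    if candidates.isna().any().any():
        raise ValueError("The candidate set contains missing values.")
    if previous is not None and list(previous.columns) != list(candidates.columns):
        raise ValueError("Previous-data columns do not match the candidate set columns.")
    if design_type == "NUSF" and weight_col not in candidates.columns:
        raise ValueError("NUSF requires a weight column in the candidate set.")
    if design_type == "IRSF" and response_col not in candidates.columns:
        raise ValueError("IRSF requires a response column in the candidate set.")
```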
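
Finally, the random-starts study has the shape sketched below: for each candidate number of random starts, repeat the search several times and compare the spread of the best criterion values. `best_criterion` is a toy stand-in for one full design search, used only to show the structure of the study.

```python
import numpy as np

def best_criterion(n_random_starts: int, rng: np.random.Generator) -> float:
    """Toy stand-in for one design search: the best of n random draws.
    The real study runs the SDoE search and records the criterion of
    the design it returns."""
    return float(rng.random(n_random_starts).max())

rng = np.random.default_rng(1)
for n_starts in (10, 30, 100, 300):
    repeats = [best_criterion(n_starts, rng) for _ in range(5)]
    print(f"{n_starts:4d} starts: mean best criterion {np.mean(repeats):.3f}, "
          f"spread across repeats {np.ptp(repeats):.3f}")
```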