[DRAFT] Issue 221/plot refactoring #224

ndphillips · 2024-05-24T11:19:14Z

Addressing #221

Function status

Checked means

Function created
Implemented in plot.FFTrees()
Legacy code removed

ndphillips · 2024-05-24T19:56:27Z

Phew, I knew this PR would be a long slog, but I didn't realize it would be this much! There are a lot of unexpected processes going on in this function, which makes it hard to cleanly separate the plot into individual plotting functions.

That said, I've been able to create some initial functions for all of the plotting elements listed above and integrate them into plot.FFTrees().

The next big step is to make the individual functions more user-friendly and to give them a consistent API. There are lots of decisions to be made about what the minimal arguments to each should be. For example, it's not clear to me which functions should require FFTrees objects as arguments and which should take data frames.

Here's where I'm currently at with plot_fft(). In my first version, the main argument is a data frame of level stats (which I pull out of an FFTrees object using a new pluck_level_stats() helper function):

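# Load the package (pluck_level_stats() and plot_fft() are new in this PR branch):
library(FFTrees)

# Create an FFTrees object from the heartdisease data:
heart_fft <- FFTrees(formula = diagnosis ~ .,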
                     data = heart.train,
                     data.test = heart.test,
                     decision.labels = c("Healthy", "Disease"))

heart_fft |>
  pluck_level_stats(data = "test", tree = 1) |>
  plot_fft()
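
To make the idea of the helper more concrete, here is a minimal sketch of what a function like pluck_level_stats() might do. This is not the code from this PR: the function name pluck_level_stats_sketch(), the mock object, and the assumed nesting (trees, then level_stats, then one data frame per data set) are all illustrative assumptions.

```r
# Mock stand-in for an FFTrees object, holding only what this sketch needs.
# The nesting (trees -> level_stats -> test) and all values are assumptions.
mock_fft <- list(
  trees = list(
    level_stats = list(
      test = data.frame(
        tree  = c(1, 1, 1, 2, 2),
        level = c(1, 2, 3, 1, 2),
        cue   = c("thal", "cp", "ca", "cp", "thal"),
        hi    = c(20, 35, 40, 30, 38),
        fa    = c(2, 6, 10, 8, 12),
        stringsAsFactors = FALSE
      )
    )
  )
)

# Hypothetical helper: return the level stats of one tree for one data set.
pluck_level_stats_sketch <- function(x, data = "train", tree = 1) {
  stats <- x$trees$level_stats[[data]]
  if (is.null(stats)) {
    stop("No level stats found for data = '", data, "'")
  }
  # Keep only the rows that belong to the requested tree.
  out <- stats[stats$tree == tree, , drop = FALSE]
  rownames(out) <- NULL
  out
}

tree_1 <- pluck_level_stats_sketch(mock_fft, data = "test", tree = 1)
```

The point of the sketch is that the plotting function downstream only ever sees a plain data frame with one row per tree level, not the full object.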

I'll keep working on this when I find the time.

hneth · 2024-05-26T08:13:53Z

Yes, it's quite a challenge to break up the older plotting function into functional parts. But it's great that you're wondering about clean input arguments, rather than just dropping the entire FFTrees object into every function!

A tricky aspect of the complex plot is that some parts only require the FFT definition, whereas others require aspects of the current data or the stats from a particular evaluation. If the plotting functions only get the input parts they actually need, this requires helper functions that extract or translate parts of the FFTrees object into simpler data structures. I like the approach you're taking with pluck_level_stats(), and I think it is similar to what I tried to do with read_fft_df(), which selects a single FFT from a set of FFT definitions and converts it into a tidy data format (in which each row represents a node). Having such helper functions allows us to access and manipulate aspects of FFTs that otherwise remain encapsulated in the FFTrees object.
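
As a rough illustration of the "one row per node" idea, the sketch below converts a compact one-line FFT definition into a tidy data frame. It is not the actual read_fft_df() implementation: the function name tidy_fft_sketch(), the column names, and the ";"-separated layout of the definition are assumptions made for this example.

```r
# One FFT definition in a compact, one-row format.
# The ";"-separated layout and the column names are assumptions.
fft_def <- data.frame(
  tree       = 1,
  cues       = "thal;cp;ca",
  directions = "=;=;>",
  thresholds = "rd,fd;a;0",
  exits      = "1;0;0.5",
  stringsAsFactors = FALSE
)

# Hypothetical converter: split each field and return one row per node.
# Expects `def` to contain exactly one row (one FFT definition).
tidy_fft_sketch <- function(def, sep = ";") {
  split_field <- function(field) {
    strsplit(def[[field]], sep, fixed = TRUE)[[1]]
  }
  cues <- split_field("cues")
  data.frame(
    tree      = def$tree,
    node      = seq_along(cues),
    cue       = cues,
    direction = split_field("directions"),
    threshold = split_field("thresholds"),
    exit      = split_field("exits"),
    stringsAsFactors = FALSE
  )
}

node_df <- tidy_fft_sketch(fft_def)
```

In this tidy form, editing a tree (reordering nodes, swapping a cue, flipping an exit) becomes ordinary data-frame manipulation.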

The plot_fft() function looks great, of course, but at first I was surprised that it requires level_stats as an input, rather than just an FFT definition like the tidy data frame obtained from read_fft_df(). But then I realized that we often want to enrich the basic FFT representation with stats from its evaluation (like the icon arrays at exits). In a completely modular setup, plotting the tree itself could be entirely separated from its evaluation. This would allow for even simpler tree plotting functions, but would require additional functions for adding aspects of tree performance to the plot. Hence, using level_stats as input may be a good intermediate level of abstraction?

On a similar note, we should add an option to the plot for representing how NA values in the data were handled. (For instance, we could note their number for each predictor at its node, and somehow mark or separate the icons that were classified from NA cases on the final node.) But as long as level_stats preserve their counts, this should be implementable with your current approach?
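
A very simple sketch of the counting part might look as follows. The helper name count_na_per_node() and the toy data are made up for illustration, and the sketch simplifies by counting NA values over all cases, rather than only over the cases that actually reach each node.

```r
# Toy data with missing values (made up for illustration).
toy_data <- data.frame(
  thal = c("rd", NA, "fd", "normal", NA),
  cp   = c("a", "ta", NA, "a", "np"),
  ca   = c(0, 1, 2, 0, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of NA values for the cue used at each node.
# `cues` lists the cue names in the order of the tree's nodes.
count_na_per_node <- function(data, cues) {
  data.frame(
    node = seq_along(cues),
    cue  = cues,
    n_na = unname(colSums(is.na(data[, cues, drop = FALSE]))),
    stringsAsFactors = FALSE
  )
}

na_counts <- count_na_per_node(toy_data, cues = c("thal", "cp", "ca"))
```

Counts like these could be joined to the level stats by node, so that a plotting function can annotate each node without needing access to the raw data.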

ndphillips · 2024-05-28T21:58:02Z

Related to this PR, I just drafted a design doc describing how we could refine the structure of FFTrees internal objects to improve consistency and facilitate better object <-> function mappings: https://github.com/ndphillips/FFTrees/wiki/%5BDraft%5D-FFTrees-Object-Design

hneth · 2024-05-30T08:03:33Z

Thanks for a great analysis and overview (at #226)! But I'm also wondering whether we really want to risk a major rewrite at this point. Basically, I see a general trade-off between objects, functions, and related data structures. It's certainly true that the current FFTrees object is large, partially inconsistent, and opaque, but it also contains all the information needed in convenient list structures. Extracting and modifying parts of an object is now possible through dedicated helper functions (like the ones presently used to extract and edit FFT definitions). Moving towards more of an OOP approach would have long-term benefits, of course, but would also require a complete rewrite of the object's interactions (in creating and repeatedly evaluating FFTs on data). Although I applaud your enthusiasm, I'm also skeptical for 2 reasons:

1. After the last major overhaul, many parts of the original version (e.g., applying new FFT definitions to data) were broken and had to be restored. I fear that revising the object structure would take tremendous effort and have similar repercussions.
2. Implementing a new structure of objects would still require managing and changing those objects. For instance, I've recently been experimenting with additional exits (e.g., for "undecided" cases), which would then require modifying the node, outcome, and cost aspects of objects and their corresponding interactions.

Overall, I see the deficits of our current setup, but as I'm also happy with many parts of FFTrees v2.0.0, I'm reluctant to jump into a major overhaul. Despite some enticing benefits, I feel that there are plenty of features that can still be added and improved within the current framework. And as I fear that restoring all present functionality would take months, rather than weeks, I'm wondering: Do we really want to fundamentally change a working system?

ndphillips · 2024-05-30T12:18:19Z

Thanks for the thoughts, Hans. I hear your concerns and definitely appreciate them.

Some quick initial responses to your 2 main concerns:

1. Losing functionality. We definitely don't want to lose important functionality! Can you confirm whether all of our critical functionality is captured within tests? If not, then perhaps a first step should be to update our tests!
2. Managing and changing objects. It seems like you're concerned that this proposed rewrite would prevent or hinder future development. Is that right? If so, I want to make sure we've captured those concerns in the design and reduced that risk. My strong hope is that this redesign will make future development easier, not harder. Can you capture some specific examples of functionality, here or in https://github.com/ndphillips/FFTrees/wiki/%5B80%25%5D-FFTrees-Object-Design, that you are concerned would be affected?

hneth · 2024-05-30T19:39:17Z

Just a quick reply to your 2 questions:

1. No, the most recent additions of functionality (i.e., A: extracting and editing tree descriptions and applying them to data, and B: creating and evaluating FFTs for datasets with NA values) are not covered by tests yet. Parts of A are used in examples and vignettes, I believe, but B is still too recent to be properly evaluated and documented.
2. I actually think that well-designed objects would facilitate future changes, rather than make them more difficult. But I am afraid of the repercussions of overhauling the entire package, given its complexity and our scarcity of resources (mostly limited time).

The objects you envisage on https://github.com/ndphillips/FFTrees/wiki/%5B80%25%5D-FFTrees-Object-Design appear quite comprehensive, of course. Some additional slots for recording NA values (when they occur at nodes and eventually result in decisions) could easily be added. However, many of the listed objects could be defined and stored more sparsely. For instance, whenever we have the 4 outcomes of the 2x2 matrix (the numbers of hi, mi, fa, and cr), all accuracy-related measures (simple ones like n_decide, n_correct, n_error, all variants of acc, and the more specific measures of sens, spec, ppv, npv, dprime, etc.) can easily be derived from them. Hence, I don't think we always need to generate and store all of them in the objects; we could rather provide functions for obtaining them when needed.
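
To illustrate the "derive on demand" idea, here is a minimal sketch using the standard definitions of these measures. The function name derive_accuracy_stats() and the example counts are hypothetical, and the sketch applies no correction for cells with zero counts, so dprime becomes infinite when sens or spec is exactly 0 or 1.

```r
# Hypothetical helper: derive accuracy measures from the 4 cells of the
# 2x2 matrix (hits, misses, false alarms, correct rejections).
derive_accuracy_stats <- function(hi, mi, fa, cr) {
  n    <- hi + mi + fa + cr
  sens <- hi / (hi + mi)  # sensitivity (hit rate)
  spec <- cr / (cr + fa)  # specificity (correct rejection rate)
  list(
    n         = n,
    n_correct = hi + cr,
    n_error   = mi + fa,
    acc       = (hi + cr) / n,
    bacc      = (sens + spec) / 2,  # balanced accuracy
    sens      = sens,
    spec      = spec,
    ppv       = hi / (hi + fa),     # positive predictive value
    npv       = cr / (cr + mi),     # negative predictive value
    # d' = z(hit rate) - z(false alarm rate), uncorrected
    dprime    = qnorm(sens) - qnorm(1 - spec)
  )
}

# Example counts (made up):
stats_2x2 <- derive_accuracy_stats(hi = 40, mi = 10, fa = 5, cr = 45)
```

With a function like this, an object would only need to store the four counts per tree (and per level), and everything else could be computed when printing or plotting.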

ndphillips · 2024-05-31T18:39:29Z

Thanks Hans!

1. I created issue #227 (Test needed for extracting and editing tree descriptions and applying to data) to capture that functionality, which is currently missing tests. I believe it's super important that we have tests capturing functionality we don't want to break!
2. I hear you that changes like the ones I propose in https://github.com/ndphillips/FFTrees/wiki/%5B80%25%5D-FFTrees-Object-Design seem daunting and risky. However, as long as our package tests capture our core functionality and nothing gets merged until it's reviewed, I'm confident that I'm up to the task!
3. Thanks for the ideas on how to make the proposal sparser. I've added a note in https://github.com/ndphillips/FFTrees/wiki/%5B80%25%5D-FFTrees-Object-Design#notes

Let's continue this discussion at #226.

ndphillips added 2 commits May 23, 2024 09:28

ran styler on plotfftrees

e3dc38b

created plot_level_bar() helper fun and used in plotFFTrees for signal and noise levels

eb5c356

ndphillips marked this pull request as draft May 24, 2024 11:19

replaced all level bars in plot.FFTrees() with new helper function

ddd5103

ndphillips mentioned this pull request May 24, 2024

plot.FFTrees() is not written in a modular way and should be refactored #221

Open

ndphillips added 8 commits May 24, 2024 10:17

created plot_icon_arrays() function and replaced legacy code

a076e71

created plot_fft() and used in plot.FFTrees()

a340509

created plot_confusion()

691dcdd

created plot_roc

b8cd45e

more updates to plot_roc

c1672dd

lots of plotting updates. moved some level_stats data wrangling from plot.FFTrees() to fftrees_apply()

4f1deb4

added assertthat to imports

a913f13

bug fix

67475bb

ndphillips mentioned this pull request May 29, 2024

Ideas for refining the structure of FFTrees objects #226

Open

updates to n_per_icon

efa83c1

ndphillips mentioned this pull request May 31, 2024

Test needed for extracting and editing tree descriptions and applying to data #227

Closed
