Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOC] Better commenting for component selection algorithm in selcomps #91

Merged
merged 5 commits into from
Jul 21, 2018

Conversation

handwerkerd
Copy link
Member

Progress towards #84 .

Changes proposed in this pull request:

  • The first half of the selcomps function is commented in much greater detail. There is more commenting and documenting work to be done in selcomps. Still, since this doesn't affect a single line of code, I figured it is better to submit a PR now, so that others can have easier access to this information.

Copy link
Member

@emdupre emdupre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much, @handwerkerd, for going through this ! I've pointed out a few minor typos, but this is hugely helpful.

Do you want to leave this open as a [WIP] PR, if you're planning to come back to it ? Or do you want someone else to pick up from here ?

My one general question: there's some information in here that I think we should highlight in the readthedocs site, rather than only in the code base itself. WDYT about including those changes in this PR ?

are part of the decision tree. Certain comments are prefaced with METRIC
and variable names to make clear which are metrics and others are prefaced
with SELECTION to make clear which are for applying metrics. METRICs tend
to be summary values that containt a signal number per component. These
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "containt" --> "contain"

"""

"""
There are sections of this code that calculate metrics that are used in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of moving these metrics into the sphinx documentation, rather than as code comments ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's the eventual goal, but it's a bit tricky when the code is only half-commented. Until there's full documentation of how each metric is or isn't used, making something seem definitive int he sphinx documentation could cause more harm than help. For example, there are some metrics which I thought would be used one way, but they were transformed into another metric & used in more component selection steps than I realized. In that case, saying Metric X is used in selection step Y would be wrong. My hope is, once the full process is better commented in the code, it will be relatively easy to summarize those comments in end-user documentation.

"""
countsigFS0 = seldict['F_S0_clmaps'].sum(0)
countsigFR2 = seldict['F_R2_clmaps'].sum(0)
countnoise = np.zeros(len(nc))

"""
Make table of dice values
METRICS: dice_tbl
dice_FR2, dice_FS0 are calculated for each component and the concatinated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "concatinated" --> "concatenated"

Vz: The z-scored the natural log of the non-normalized variance explained
of each component
Ktz: The z-scored natural log of the Kappa values
(the '/ 2' doesnot seem necessary beacuse it will be removed by z-scoring)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "doesnot" --> "does not"

# handwerkerd: This actually seems like a way to avoid using the theoretically
# most liberal threshold only when there was a bad estimate and _mod is
# is more inclusive. My one concern is that it's an odd way to test that
# the _mod elbow is any better. WHy not at least see if _mod < _cons?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "WHy" --> "Why"

cond1 = getelbow_cons(seldict['Kappas']) > KRcut * 2
cond2 = getelbow_mod(seldict['Kappas'], val=True) < F01
if cond1 and cond2:
Kcut = getelbow_mod(seldict['Kappas'], val=True)
else:
Kcut = getelbow_cons(seldict['Kappas'], val=True)
# only use inclusive when exclusive is extremely exclusive - half KRcut
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to remove this comment ? It might be helpful to have the original explanation, even if we decide that it doesn't make sense (for us).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to expand the purpose of that comment since it's not clear how inclusive & exclusive are defined. I felt like the following blocks of comments added more detail and replaced this one. If you think the line I deleted contains unique and relevant info, I can add it back in.
https://github.com/handwerkerd/tedana/blob/4c65baf8873ebbc4965820101b006c2cc061c4b7/tedana/selection/select_comps.py#L399-L409
https://github.com/handwerkerd/tedana/blob/4c65baf8873ebbc4965820101b006c2cc061c4b7/tedana/selection/select_comps.py#L415-L419
https://github.com/handwerkerd/tedana/blob/4c65baf8873ebbc4965820101b006c2cc061c4b7/tedana/selection/select_comps.py#L426-L432

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about keeping it with something like "Originally written by PK:" at the start of the comment ? That way, if in finishing commenting this code we decide that it adds something helpful, we can work it back in -- and otherwise delete it. Does that seem reasonable, or do you think it's confusing to have even now ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I also removed some comments on Step numbers since it was completely unclear to me why a bunch of stuff was or wasn't considered part of each step. I'll add those back in as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems reasonable to me, at least for the time being (where we're denoting them as "originally written by" or whatever you think best). Thanks again, @handwerkerd !

rejB = ncl[utils.andb([tt_table[ncl, 0] < 0,
seldict['varex'][ncl] > np.median(seldict['varex']), ncl > KRcut]) == 3]
rej = np.union1d(rej, rejB)
# adjust ncl again to only contain the remainng non-rejected components
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "remainng" --> "remaining"

ncl = np.setdiff1d(ncl, rej)

# This is where handwerkerd has paused in hypercommenting the function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤣

@emdupre
Copy link
Member

emdupre commented Jul 16, 2018

Also, I (finally) fixed the circle error. Would you be able to merge in the current master branch ? LMK if I can help in that !

@handwerkerd
Copy link
Member Author

I'd rather merge before the commenting is complete because there's no harm to adding it now and it would open up this commenting effort for others' contributions. I'd love if others continued this commenting, but I'll try to pick it up again next time I have a block of time to focus on this. I will correct the typos, do one more read-through for clarity, and add some detail to the opening docstring before merging the code.

@rmarkello
Copy link
Member

I just wanted to jump in to say how unbelievable this is, @handwerkerd ! 🎉 💯 Having tried to work through pieces of selcomps() before I know how cumbersome it can be, so to see this level of detail is really amazing !! This will be a huge boon as we figure out how best to modularize / modify the function. 👩‍💻

@codecov
Copy link

codecov bot commented Jul 20, 2018

Codecov Report

Merging #91 into master will not change coverage.
The diff coverage is 0%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master      #91   +/-   ##
=======================================
  Coverage   27.95%   27.95%           
=======================================
  Files          25       25           
  Lines        1381     1381           
=======================================
  Hits          386      386           
  Misses        995      995
Impacted Files Coverage Δ
tedana/model/fit.py 8.08% <ø> (ø) ⬆️
tedana/selection/select_comps.py 0% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2fa0136...37a5431. Read the comment docs.

@handwerkerd
Copy link
Member Author

I've added info to the doctoring and done another round of proofreading. If no one sees any additional issues, it should be possible to merge.

Copy link
Member

@emdupre emdupre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, @handwerkerd ! Merging.

@emdupre emdupre merged commit bfaec06 into ME-ICA:master Jul 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants