Abbreviate coefficient names and similar in MCMC diagnostics by default? #500
Comments
Thanks @krivit. Much as I like the cleanness created in the tables by the short terms, they seem a bit hard to parse for the average user. For instance, where does "ce3" come from? I see that it's value 3 of that nodefactor, but ce? I'm also left with lots of other questions for more complex cases. What if there are multiple nodefactor terms? What if the names of the levels are strings rather than integers <10? Etc. What about all the many hundreds of ergm terms, many with similar names? How would you fit all of these into 4-element character strings? |
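To get a rough feel for how far a 4-character target can compress these names, one option is to try base R's `abbreviate()` on the coefficient names of the example model. This is only a sketch: it is not the algorithm behind the abbreviations shown in this thread, and `coef_names` and `abbr` are names made up for illustration.

```r
# Coefficient names as they appear in the example model's diagnostics output.
coef_names <- c("edges",
                "nodefactor.atomic type.2",
                "nodefactor.atomic type.3",
                "gwesp.fixed.0.5")

# Base R abbreviation with a target of 4 characters. With the default
# strict = FALSE, abbreviate() lengthens results where needed to keep them
# unique, so the 4-character target is not guaranteed to be met.
abbr <- abbreviate(coef_names, minlength = 4)

# A lookup table lets a reader trace each short name back to its full term.
print(data.frame(abbreviation = unname(abbr), term = coef_names))
```

A lookup table of this kind is one way to address the "where does this come from?" question, at the cost of extra output.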
I agree with @sgoodreau -- such abbreviations might create even more confusion. BUT, I do recognize the problem. I think the tables are acceptable whenever the terms appear in rows. Perhaps we can transpose those in which terms are in columns to avoid breaking wide tables to multiple row sets. For example:
With the above the output will be longer but with much less table-wrapping (not reader-friendly). What do you think? |
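The transposition idea can be sketched in base R with `t()`. The numbers below are copied from the "Are sample statistics significantly different from observed?" table posted later in this thread; the object names `wide` and `long` are illustrative only, and this is not how `mcmc.diagnostics()` builds its output.

```r
# Terms in columns: long names force wide tables that wrap in the console.
wide <- rbind(
  "diff."      = c(0.798, 0.724, 0.47, 1.7948, NA),
  "test stat." = c(1.922, 1.898, 1.49, 2.8021, 9.604),
  "P-val."     = c(0.055, 0.058, 0.14, 0.0051, 0.053)
)
colnames(wide) <- c("edges", "nodefactor.atomic type.2",
                    "nodefactor.atomic type.3", "gwesp.fixed.0.5", "(Omni)")

# Terms in rows: long names only widen the first column, and the number of
# columns stays fixed no matter how many terms the model has.
long <- t(wide)
print(long)
```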
... and all the very best and congratulations to everybody on the occasion of opening the half-millennial issue of the package ergm 🥇 🎆 🥳 😃 |
Wait, what? |
I agree with both @sgoodreau and @mbojan in terms of the name abbreviations. I like the suggestion to reduce output to 3 digits after the decimal. |
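In base R, "3 digits after the decimal" corresponds to `round()`, whereas limiting significant figures corresponds to `signif()`. The sketch below contrasts the two on the mean and SD values from the summary table posted later in this thread; the object names are illustrative and nothing here reflects the package's internals.

```r
# Mean and SD per term, copied from the summary table in this thread.
stats <- rbind(
  "edges"                    = c(Mean = 0.7984, SD = 6.474),
  "nodefactor.atomic type.2" = c(Mean = 0.7243, SD = 5.950),
  "nodefactor.atomic type.3" = c(Mean = 0.4650, SD = 4.868),
  "gwesp.fixed.0.5"          = c(Mean = 1.7948, SD = 9.985)
)

# Fixed number of decimal places: every entry keeps 3 digits after the point.
by_decimals <- round(stats, 3)

# Fixed number of significant figures: larger values keep fewer decimals.
by_sigfigs <- signif(stats, 3)

print(by_decimals)
print(by_sigfigs)
```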
@martinamorris , I meant this is the issue number 500. |
Which output is more compact depends on the number of columns we can fit. That having been said, I think I chose the abbreviation settings poorly. How about this?
suppressPackageStartupMessages(library(ergm))
dummy <- capture.output(suppressMessages(example(anova.ergm)))
mcmc.diagnostics(fit2, which="text", compact=4)
#> Sample statistics summary:
#>
#> Iterations = 7168:131072
#> Thinning interval = 512
#> Number of chains = 1
#> Sample size per chain = 243
#>
#> 1. Empirical mean and standard deviation for each variable,
#>    plus standard error of the mean:
#>
#>                            Mean    SD Naive SE Time-series SE
#> edges                    0.7984 6.474   0.4153         0.4153
#> nodefactor.atomic type.2 0.7243 5.950   0.3817         0.3817
#> nodefactor.atomic type.3 0.4650 4.868   0.3123         0.3123
#> gwesp.fixed.0.5          1.7948 9.985   0.6405         0.6405
#>
#> 2. Quantiles for each variable:
#>
#>                            2.5% 25%    50%   75% 97.5%
#> edges                    -11.00  -4 1.0000 5.000 13.00
#> nodefactor.atomic type.2 -10.95  -3 0.0000 5.000 11.00
#> nodefactor.atomic type.3  -8.00  -3 0.0000 3.000 10.00
#> gwesp.fixed.0.5          -14.39  -6 0.9418 8.947 24.41
#>
#>
#> Are sample statistics significantly different from observed?
#>             edgs n.t.2 n.t.3   g..0 (Omni)
#> diff.      0.798 0.724  0.47 1.7948     NA
#> test stat. 1.922 1.898  1.49 2.8021  9.604
#> P-val.     0.055 0.058  0.14 0.0051  0.053
#>
#> Sample statistics cross-correlations:
#>       edgs n.t.2 n.t.3 g..0
#> edgs  1.00  0.78  0.72 0.89
#> n.t.2 0.78  1.00  0.35 0.75
#> n.t.3 0.72  0.35  1.00 0.60
#> g..0  0.89  0.75  0.60 1.00
#>
#> Sample statistics auto-correlation:
#> Chain 1
#>            edgs   n.t.2   n.t.3    g..0
#> Lag 0     1.000  1.0000  1.0000  1.0000
#> Lag 512   0.029 -0.0011 -0.0337  0.0022
#> Lag 1024  0.059  0.1120  0.0138  0.1011
#> Lag 1536 -0.034  0.0392 -0.0576 -0.0443
#> Lag 2048 -0.075 -0.1001 -0.1065 -0.0707
#> Lag 2560 -0.028  0.0029  0.0028 -0.0222
#>
#> Sample statistics burn-in diagnostic (Geweke):
#> Chain 1
#>
#> Fraction in 1st window = 0.1
#> Fraction in 2nd window = 0.5
#>
#>  edgs n.t.2 n.t.3  g..0
#>  1.09  0.19  1.12  0.70
#>
#> Individual P-values (lower = worse):
#>  edgs n.t.2 n.t.3  g..0
#>  0.27  0.85  0.26  0.48
#> Joint P-value (lower = worse): 0.35
#>
#> Note: MCMC diagnostics shown here are from the last round of
#> simulation, prior to computation of final parameter estimates.
#> Because the final estimates are refinements of those used for this
#> simulation run, these diagnostics may understate model performance.
#> To directly assess the performance of the final model on in-model
#> statistics, please use the GOF command: gof(ergmFitObject,
#> GOF=~model).
Created on 2023-01-19 with reprex v2.0.2 |
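IMHO it compounds the earlier problem described by @sgoodreau ... What about:
1. Transposing, as I mentioned earlier
2. Using "footnotes" for the longer parameter names, similarly to what pillar does when printing tibbles with long column names? I'm not really convinced it would help without creating its own problems, but perhaps it's worth considering
People work on larger and larger screens and resolutions, so the R console gets wider. Perhaps we're trying to fix a non-problem really? Outputs of lm() and glm() also sometimes get wrapped because of long variable names. |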
Like the idea, still not the proposed implementation. But if there are
others not as concerned with interpretation (which is what the coef names
facilitate), maybe make it an argument to ergm()?
On Thu, Jan 19, 2023 at 4:12 AM Michał Bojanowski wrote:
IMHO it compounds the earlier problem described by @sgoodreau ...
What about:
1. Transposing I mentioned earlier
2. Using "footnotes" for the longer parameter names similarly to what
*pillar* does when printing tibbles with long column names? I'm not
really convinced it would help without creating its own problems, but
perhaps it's worth considering
People work on larger and larger screens and resolutions so R console gets
wider. Perhaps we're trying to fix a non-problem really? Outputs of lm() and
glm() also sometimes get wrapped because of long variable names.
|
The problem is that there is a wall of diagnostic output---particularly if there are many threads---and it's a pain to find what you need. For matrix type output, breaking across columns also breaks up their structure.
This is only for MCMC diagnostics, for which interpretation is not as important, as long as parameter names can be identified. |
I agree it is a lot of output. One thing is navigation (finding what you need), the other is breaking tables/matrices so that some columns are not side by side. The abbreviations look smart to me but I'm afraid it might be a failed quest that will result in forcing unwilling users to decipher cryptic symbol-like names. I'm not sure what it is you don't like about the transposing idea? :) It would solve the problem e.g. for this table provided that we round the numbers to eg 6 or even 3 digits:
Perhaps for navigation we can consider printing component by component, i.e. if
Show just the means and SDs:
The inspiration is the printing style of the objects from e.g. |
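One way to picture component-by-component printing is to hold the diagnostics as a named list and print only the piece that is wanted. The sketch below is hypothetical: `diag_parts` and its component names are invented for illustration and do not describe the object that `mcmc.diagnostics()` actually returns. The numbers are copied from the summary output posted earlier in this thread.

```r
terms <- c("edges", "nodefactor.atomic type.2",
           "nodefactor.atomic type.3", "gwesp.fixed.0.5")

# Hypothetical container: one named component per block of diagnostic output.
diag_parts <- list(
  statistics = matrix(
    c(0.7984, 6.474, 0.4153, 0.4153,
      0.7243, 5.950, 0.3817, 0.3817,
      0.4650, 4.868, 0.3123, 0.3123,
      1.7948, 9.985, 0.6405, 0.6405),
    nrow = 4, byrow = TRUE,
    dimnames = list(terms, c("Mean", "SD", "Naive SE", "Time-series SE"))),
  quantiles = matrix(
    c(-11.00, -4, 1.0000, 5.000, 13.00,
      -10.95, -3, 0.0000, 5.000, 11.00,
       -8.00, -3, 0.0000, 3.000, 10.00,
      -14.39, -6, 0.9418, 8.947, 24.41),
    nrow = 4, byrow = TRUE,
    dimnames = list(terms, c("2.5%", "25%", "50%", "75%", "97.5%")))
)

# Show just the means and SDs instead of the whole wall of output.
means_sds <- diag_parts$statistics[, c("Mean", "SD")]
print(means_sds)
```

With terms in rows and only two numeric columns, this view stays narrow even with the full coefficient names.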
I've implemented an argument in mcmc.diagnostics() to abbreviate the coefficient names and reduce the number of significant figures when printing correlation matrices and similar, to make the output more concise. Here, I have the uncompacted output and one compacted to a target of 4 characters:
Created on 2023-01-09 with reprex v2.0.2
@CarterButts , @drh20drh20 , @martinamorris , @sgoodreau , @mbojan , @ anyone else, any thoughts about what the default should be?