Add predict methods #24
- no spaces in class names
- don't use same class name for different data structures
@vspinu thanks for the contribution. give me some time to review the changes |
out <- predict_Medoids(newdata, MEDOIDS = object$medoids,
                       distance_metric = object$distance_metric,
                       fuzzy = fuzzy, threads = threads)
if (fuzzy)
in the predict_Medoids() function when the user specifies fuzzy = TRUE then the function returns 3 objects:
- clusters
- fuzzy_clusters
- dissimilarity
Is your intention to follow the same output format of the predict() function as is the case with other algorithms that return either a vector or a matrix (in case of binary classification for instance)?
When fuzzy = TRUE the returned object is a matrix with class probabilities. This is consistent with methods in other packages which return a matrix of probabilities/memberships.
We can add an extra argument clusters = TRUE which would return a full list. This would be similar to the se.fit = TRUE argument of predict.glm, for example. I don't think this is strictly necessary though, given that the workhorse predict_Medoids will still stay.
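To make the proposed return shapes concrete, here is a minimal sketch of a predict method that returns a plain vector of cluster labels by default and a membership matrix when fuzzy = TRUE. The class name ToyMedoids, the Euclidean distance and the inverse-distance memberships are assumptions made for illustration only; this is not the code of the PR or of predict_Medoids.

```r
# Hypothetical toy class; NOT ClusterR's implementation.
predict.ToyMedoids <- function(object, newdata, fuzzy = FALSE, ...) {
  newdata <- as.matrix(newdata)
  medoids <- object$medoids
  # Euclidean distance from every row of newdata to every medoid
  d <- sapply(seq_len(nrow(medoids)), function(k) {
    sqrt(rowSums(sweep(newdata, 2, medoids[k, ])^2))
  })
  d <- matrix(d, nrow = nrow(newdata))
  if (!fuzzy) {
    # hard assignment: index of the nearest medoid, one label per row
    return(max.col(-d, ties.method = "first"))
  }
  # soft memberships (assumed scheme): inverse-distance weights, rows sum to 1
  w <- 1 / (d + 1e-12)
  w / rowSums(w)
}

fit <- structure(list(medoids = rbind(c(0, 0), c(10, 10))),
                 class = "ToyMedoids")
newdata <- rbind(c(1, 1), c(9, 8))

hard <- predict(fit, newdata)                 # vector of labels
soft <- predict(fit, newdata, fuzzy = TRUE)   # n x k membership matrix
```

A caller who needs the full list of results would, under this design, still go through the lower-level function rather than through predict().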
ok, then there is no need for an extra parameter. I initially thought that the previous predict functions would be deprecated. So the 'predict.MedoidsCluster()' function is fine.
@vspinu thanks again for the additions and the corrections (regarding the classes). I'd be glad if you also added tests and updated the vignette. It would also be nice to have the print, plot and summary methods.
Ok. I will spend some time on this.
Of course. I didn't mean to deprecate those. |
This is Robo-lampros because the Human-lampros is lazy. This PR has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs. |
I've added a message so that the PR is not closed from the github-actions-bot till the changes are applied (tests, etc.) |
Sorry for stalling this. Things started piling on my weekends suddenly, will try to resume this week. |
This is Robo-lampros because the Human-lampros is lazy. This PR has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs. |
I have re-added the comment, added tests and adjusted the vignette to use
You can modify those as you see fit. One problem is Regarding the UI, for consistency and simplicity, I would propose to actually have only 3 top level functions: |
First of all thanks for all your implementations. Let me see the changes in the next days. Regarding the UI, I think it's a good idea but we have to add deprecation messages for the previous functions so that the users can get accustomed to the new functions. |
This is Robo-lampros because the Human-lampros is lazy. This PR has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs. |
I've added a message so that this PR is not closed. |
thanks
Done! I have also bumped the dev version. |
This is Robo-lampros because the Human-lampros is lazy. This PR has been automatically marked as stale because it has not had recent activity. It will be closed after 7 days if no further activity occurs. |
thank you for all the changes, I'll need a few days to do the following:
@vspinu, |
Yes. It has to do with the fact that internally C++ is using positional references instead of names. So each time one modifies the fields of the result, the internal code would break. I have fixed that by addressing the fields by name. I had to rename a few internal slots for that. This also makes the internal names consistent with the user-level output. I don't think you will object to that, but please have a look at the last commit.
thank you @vspinu I'll have a look today in the afternoon |
Had to rename internal slots for internal consistency reasons:
- bst_sample_silhouette_matrix -> silhouette_matrix
- bst_dissimilarity -> best_dissimilarity
- bst_sample_dissimilarity_matrix -> dissimilarity_matrix
This is also consistent with user-level field names.
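The fragility of positional references can be illustrated at the R level with a small sketch. The field names below are made up for the example, and the actual breakage discussed above was in the C++ code, so treat this only as an analogue.

```r
# Hypothetical result list with illustrative field names.
res_old <- list(medoids = 1:2, clusters = c(1, 1, 2))

# A later change prepends a new field, shifting every position by one.
res_new <- c(list(call = "added later"), res_old)

by_pos_old <- res_old[[2]]           # the cluster labels
by_pos_new <- res_new[[2]]           # now the medoids: silently wrong

by_name_old <- res_old[["clusters"]] # the cluster labels
by_name_new <- res_new[["clusters"]] # still the cluster labels
```

Access by name keeps working when fields are added or reordered, which is the property the renaming commit relies on.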
a40c69d to 3b8d87c
thank you very much @vspinu for all the changes. Give me a few days to update the version on CRAN.
Let me know. thanks. |
Sure. Thanks! I still think my contribution is a bit too slim for a contributor tag, but ok. I will try to get some more stuff in as I go with using the package. |
I intended to submit the new version of the ClusterR package to CRAN today, but the build fails:
> system("R CMD build ClusterR")
* checking for file ‘ClusterR/DESCRIPTION’ ... OK
* preparing ‘ClusterR’:
* checking DESCRIPTION meta-information ... OK
* cleaning src
* installing the package to build vignettes
* creating vignettes ... ERROR
--- re-building ‘the_clusterR_package.Rmd’ using rmarkdown
Quitting from lines 189-198 (the_clusterR_package.Rmd)
Error: processing vignette 'the_clusterR_package.Rmd' failed with diagnostics:
no applicable method for 'predict' applied to an object of class "c('KMeansCluster', 'k-means clustering')"
--- failed re-building ‘the_clusterR_package.Rmd’
SUMMARY: processing the following file failed:
‘the_clusterR_package.Rmd’
Error: Vignette re-building failed.
Execution halted
Can you reproduce this error locally with the cloned repository? It seems that it's related to this pull request. |
I explained in another PR on how I build and check an R package that includes Rcpp code. Therefore, if you want to reproduce the error can you follow these steps? |
I am pretty sure everything is ok and it's some strange local issue. Those methods are registered in the NAMESPACE file. Make sure you don't introduce any diffs to NAMESPACE with your process.
I have also triggered a winbuild (don't be surprised to see the email with a result). |
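The error message in the build log is what S3 dispatch reports when no predict method can be found for the object's class. A minimal sketch outside any package reproduces the message and shows it disappearing once a method is visible; the object layout and the nearest-centroid body are assumptions for illustration, not ClusterR code.

```r
# Toy object carrying the same class vector as in the build log.
km <- structure(
  list(centroids = rbind(c(0, 0), c(5, 5))),
  class = c("KMeansCluster", "k-means clustering")
)
newdata <- rbind(c(0.5, -0.2), c(4.8, 5.1))

# No method is visible yet, so dispatch fails.
before <- tryCatch(predict(km, newdata),
                   error = function(e) conditionMessage(e))

# Hypothetical method: assign each row to the nearest centroid.
predict.KMeansCluster <- function(object, newdata, ...) {
  newdata <- as.matrix(newdata)
  apply(newdata, 1, function(x) {
    which.min(colSums((t(object$centroids) - x)^2))
  })
}

after <- predict(km, newdata)
```

Inside a package the method is not found through the global environment but through its S3method() entry in NAMESPACE, which is why a NAMESPACE diff can produce exactly this failure.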
thanks @vspinu, the winbuild looks ok (see the attached). the github action of this PR worked too. I'm wondering why I get this error. I'll have a look again and notify you. |
you were right about the differences in the NAMESPACE file, but I do not understand why I get these diffs. I'm not experienced with S3 methods. What I do is make a clean build, check and install after I remove the .Rd files of the 'man' folder and create the documentation using
S3method(predict,GMMCluster)
S3method(predict,KMeansCluster)
S3method(predict,MedoidsCluster)
S3method(print,GMMCluster)
S3method(print,KMeansCluster)
S3method(print,MedoidsCluster)
in the NAMESPACE and after I use
S3method(print,GMMCluster)
S3method(print,KMeansCluster)
S3method(print,MedoidsCluster)
Do we have to use Collate in the DESCRIPTION file now that the ClusterR package makes use of S3 methods?
I also just use |
I use the latest versions of 'devtools' (2.4.2) and 'roxygen2' (7.1.2) on Linux with R version 4.1.1 (2021-08-10). Which operating system do you use? Is it possible for you to run the 'devtools::document()' function on a Linux machine? It seems to me that it has to do with how we document the S3 methods (a similar SO issue). When I run
S3method(print,GMMCluster)
S3method(print,KMeansCluster)
S3method(print,MedoidsCluster)
export(print.GMMCluster)
export(predict.GMMCluster)
export(predict.KMeansCluster)
export(predict.MedoidsCluster)
whereas currently in this repo we have,
S3method(predict,GMMCluster)
S3method(predict,KMeansCluster)
S3method(predict,MedoidsCluster)
S3method(print,GMMCluster)
S3method(print,KMeansCluster)
S3method(print,MedoidsCluster)
The I know that 'devtools::document' is not a requirement but if I'm not able to re-create the .Rd files then I might have issues in the future in case that I want to update any function of the package |
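For reference, the two NAMESPACE variants above differ in whether roxygen2 treats predict.GMMCluster as an S3 method or as an ordinary function. A hedged sketch of documenting the method explicitly with roxygen2's @method tag follows; the function body and the object layout are placeholders invented for the example, not the PR's code.

```r
#' Predict the mixture component for new observations
#'
#' Toy body for illustration: picks the component whose mean is nearest,
#' ignoring covariances.
#'
#' @param object a fitted object of class "GMMCluster"
#' @param newdata a numeric matrix of new observations
#' @param ... currently unused
#' @method predict GMMCluster
#' @export
predict.GMMCluster <- function(object, newdata, ...) {
  newdata <- as.matrix(newdata)
  apply(newdata, 1, function(x) {
    which.min(colSums((t(object$means) - x)^2))
  })
}

# Usage with a hypothetical fitted object
gmm <- structure(list(means = rbind(c(0, 0), c(3, 3))), class = "GMMCluster")
labels <- predict(gmm, rbind(c(0.1, 0.2), c(2.9, 3.3)))
```

With @method present alongside @export, roxygen2 is expected to write S3method(predict,GMMCluster) rather than export(predict.GMMCluster); whether that would have changed the outcome in this thread is not established by the discussion.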
But that's exactly what is needed. It fixes your original issue, right? The I am on linux. |
I'm still receiving the error,
> system("R CMD build ClusterR")
* checking for file ‘ClusterR/DESCRIPTION’ ... OK
* preparing ‘ClusterR’:
* checking DESCRIPTION meta-information ... OK
* cleaning src
* installing the package to build vignettes
* creating vignettes ... ERROR
--- re-building ‘the_clusterR_package.Rmd’ using rmarkdown
Quitting from lines 189-198 (the_clusterR_package.Rmd)
Error: processing vignette 'the_clusterR_package.Rmd' failed with diagnostics:
no applicable method for 'predict' applied to an object of class "c('KMeansCluster', 'k-means clustering')"
--- failed re-building ‘the_clusterR_package.Rmd’
SUMMARY: processing the following file failed:
‘the_clusterR_package.Rmd’
Error: Vignette re-building failed.
Execution halted
and my NAMESPACE file includes the following,
# Generated by roxygen2: do not edit by hand
S3method(print,GMMCluster)
S3method(print,KMeansCluster)
S3method(print,MedoidsCluster)
export(AP_affinity_propagation)
export(AP_preferenceRange)
export(Clara_Medoids)
export(Cluster_Medoids)
export(GMM)
export(KMeans_arma)
export(KMeans_rcpp)
export(MiniBatchKmeans)
export(Optimal_Clusters_GMM)
export(Optimal_Clusters_KMeans)
export(Optimal_Clusters_Medoids)
export(Silhouette_Dissimilarity_Plot)
export(center_scale)
export(distance_matrix)
export(external_validation)
export(plot_2d)
export(predict.GMMCluster)
export(predict.KMeansCluster)
export(predict.MedoidsCluster)
export(predict_GMM)
export(predict_KMeans)
export(predict_MBatchKMeans)
export(predict_Medoids)
import(gtools)
importFrom(Rcpp,evalCpp)
importFrom(ggplot2,aes)
importFrom(ggplot2,element_blank)
importFrom(ggplot2,geom_point)
importFrom(ggplot2,ggplot)
importFrom(ggplot2,scale_shape_manual)
importFrom(ggplot2,scale_size_manual)
importFrom(ggplot2,theme)
importFrom(gmp,as.bigz)
importFrom(gmp,asNumeric)
importFrom(gmp,chooseZ)
importFrom(grDevices,dev.cur)
importFrom(grDevices,dev.off)
importFrom(graphics,abline)
importFrom(graphics,axis)
importFrom(graphics,barplot)
importFrom(graphics,legend)
importFrom(graphics,lines)
importFrom(graphics,mtext)
importFrom(graphics,par)
importFrom(graphics,plot)
importFrom(graphics,text)
importFrom(graphics,title)
importFrom(stats,na.omit)
importFrom(utils,globalVariables)
importFrom(utils,setTxtProgressBar)
importFrom(utils,txtProgressBar)
useDynLib(ClusterR, .registration = TRUE)
I don't know why, and you can't reproduce it although you are on Linux too.
That's a bit crazy. I have a hunch. Could you add Do you have stats package in your search path when you run devtools::document? For me |
In this comment I modified your function to make the
I tried this as well but I receive the same error.
Although the Although I had the chance to go through the S3 and S4 classes in the past, I preferred in general the R6 classes due to their simplicity and familiarity with Python classes. One of the reasons that I'm now not in a position to recognize the source of the problem that arises from the
I submitted the updated version to CRAN (1.2.6). This was actually a problem with the R version I used. After updating to 4.1.1 the issue was resolved. |
Perfectos! Happy to hear it solved by itself :) |
I love this package because it fills the bill for all clustering methods in one place. But the UI is rather non-R-ish.
As a starter I am adding predict methods for cluster objects. For this I
cluster medoids silhouette) are really non-standard and are cumbersome to work with when defining generics.
If these changes are ok with you I will add tests, update the vignette etc. Also, time permitting, I will add print, plot and summary methods to make the UI of the package more user friendly for seasoned R users.
With this change one can do:
instead of the current