Should we change the prediction value of Clusterers from integers to strings? #255
Replies: 1 comment
I really like signaling to the users that the "cluster number" is not really a number. That might avoid many mistakes and help them stay focused on addressing the problem the right way. The memory consumption idea is sadly not in line with the way PHP works. The integer uses a zval (that is stored in the object properties and is about 24 bytes in size, as a C union for all the other base types). The string will use the storage of the zval too plus a pointer to a |
Instead of cluster numbers as integers 1, 2, 3, ... 99, we could do cluster "names" as base-26 strings such as 'a', 'b', 'c' ... 'aa', ... 'zzz'.
The motivation for this change would be two-fold. First, it would be more in line with the library convention of strings being categorical data and integers and/or floating point numbers being continuous data. Second, in most cases, it would use less memory, since the first 26 clusters only need 8 bits rather than 64 bits. The second 26 clusters need 16 bits vs. 64 bits. It's not until the 209th cluster that this naming scheme starts to be less memory efficient than an integer.
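To make the naming scheme concrete, here is a minimal sketch of how a cluster number could be turned into such a name. The function `clusterName()` is hypothetical and not part of the library. It assumes cluster numbers start at 1 and that the names follow spreadsheet-style (bijective) base-26, so that 'z' is followed by 'aa'.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not a library API): map a 1-based cluster number
 * to a lowercase bijective base-26 name, e.g. 1 => 'a', 26 => 'z', 27 => 'aa'.
 */
function clusterName(int $cluster): string
{
    if ($cluster < 1) {
        throw new InvalidArgumentException('Cluster number must be at least 1.');
    }

    $name = '';

    while ($cluster > 0) {
        // Shift to 0-based so that 26 maps to 'z' instead of rolling over.
        $cluster--;

        // Prepend the letter for the current least significant digit.
        $name = chr(ord('a') + $cluster % 26) . $name;

        $cluster = intdiv($cluster, 26);
    }

    return $name;
}
```

With this sketch, `clusterName(1)` returns 'a', `clusterName(26)` returns 'z', and `clusterName(27)` returns 'aa'. Under the same assumptions, the last three-letter name 'zzz' corresponds to cluster 18278.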
Thoughts?