Skip to content

Outputs some statistics about how many features and classes are present in the file

Notifications You must be signed in to change notification settings

mricherzhagen/analyze-libsvm-file

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

analyze-libsvm-file

Outputs some statistics about how many features and classes are present in the libsvm training/test data file.

Example output for the heart_scale file that is distributed with libsvm:

Classes: 
  1.0  44.444%  120
 -1.0  55.556%  150
Features:
    #   Count%   Count n-count        Min       Mean        Max   Variance
    1  97.407%     263       7  -1.000000   0.061312   1.000000   0.147243
    2 100.000%     270          -1.000000   0.355556   1.000000   0.873580
    3 100.000%     270          -1.000000   0.449383   1.000000   0.399701
    4 100.000%     270          -1.000000  -0.295388   1.000000   0.113156
    5 100.000%     270          -1.000000  -0.435346   1.000000   0.055495
    6 100.000%     270          -1.000000  -0.703704   1.000000   0.504801
    7  99.259%     268       2  -1.000000   0.022388   1.000000   0.999499
    8 100.000%     270          -1.000000   0.201187   1.000000   0.124623
    9 100.000%     270          -1.000000  -0.340741   1.000000   0.883896
   10  99.630%     269       1  -1.000000  -0.663749   1.000000   0.134841
   11  54.815%     148     122  -1.000000  -0.756757   1.000000   0.427319
   12 100.000%     270          -1.000000  -0.553086   1.000000   0.394507
   13 100.000%     270          -1.000000  -0.151852   1.000000   0.938052

  ALL  96.239%    3378     132  -1.000000  -0.216215   1.000000   0.461286

How to read the output:

Classes: 1.0 44.444% 120 means, there are 120 samples (44.444% of all samples) in the class with the label 1.0

Features: 1 97.407% 263 7 means, that 263 samples contain the feature and 7 samples don't contain it. So 97.407% of the samples contain feature 1. If all samples contain the feature the number of samples that are missing the feature is omited.

The last four numbers show the minimum, average, maximum and variance of the feature values.

The last line is a summary for all the features. The count%, minimum, average, maximum and variance are averaged.

About

Outputs some statistics about how many features and classes are present in the file

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published