GoBayes

A web api providing memory-based naïve bayesian text classification.

Why?

Bayesian text classification is often used for things like
spam detection, sentiment determination, or general categorization.

Essentially you collect samples of text that you know are of a certain
"type" or "category," then you use it to train a bayesian classifier.
Once you have trained the classifier with many samples of various
categories, you can begin to classify and/or score text samples to see
which category they fit best in.

You could, for instance, set up classification of sentiment by finding
samples of text that are happy, sad, angry, sarcastic, and so on, then
train a classifier using those samples. Once your classifier is trained,
you can begin to classify other text into one of those categories.

What a classifier does is look at text and tell you how much that text
"looks like" other categories of text that it has been trained for.

Installation

$ go get github.com/hickeroar/gobayes

Usage

$ gobayes
Server is listening on port 8000.

$ gobayes -help
Usage of gobayes:
  -port string
        The port the server should listen on. (default "8000")

$ gobayes -port 8181
Server is listening on port 8181.

Using the Classifier

Training the Classifier

Endpoint:

/train/<string:category>
Example: /train/spam
Accepts: POST

The result is of content-type "application/json" and contains a breakdown of each trained category including the total text tokens that category contains, and the probabilities of any given token existing in that category vs other categories.

{
    "Success": true,
    "Categories": {
        "ham": {
            "TokenTally": 1864,
            "ProbNotInCat": 0.540093757710338,
            "ProbInCat": 0.459906242289662
        },
        "spam": {
            "TokenTally": 2189,
            "ProbNotInCat": 0.4599062422896619,
            "ProbInCat": 0.5400937577103381
        }
    }
}

The POST payload should contain the raw text that will train the classifier.
You can train a category as many times as you want.

Untraining the Classifier

Endpoint:

/untrain/<string:category>
Example: /untrain/spam
Accepts: POST

The result is of content-type "application/json" and contains a breakdown of each trained category including the total text tokens that category contains, and the probabilities of any given token existing in that category vs other categories.

{
    "Success": true,
    "Categories": {
        "ham": {
            "TokenTally": 1864,
            "ProbNotInCat": 0.540093757710338,
            "ProbInCat": 0.459906242289662
        },
        "spam": {
            "TokenTally": 2189,
            "ProbNotInCat": 0.4599062422896619,
            "ProbInCat": 0.5400937577103381
        }
    }
}

The POST payload should contain the raw text that will untrain the classifier.
If there are no remaining tokens in a category, that category will be removed.

Getting Classifier Status

Endpoint:

/info
Accepts: GET

The result is of content-type "application/json" and contains a breakdown of each trained category including the total text tokens that category contains, and the probabilities of any given token existing in that category vs other categories.

{
    "Categories": {
        "ham": {
            "TokenTally": 1864,
            "ProbNotInCat": 0.540093757710338,
            "ProbInCat": 0.459906242289662
        },
        "spam": {
            "TokenTally": 2189,
            "ProbNotInCat": 0.4599062422896619,
            "ProbInCat": 0.5400937577103381
        }
    }
}

No payload or parameters are expected.

Classifying Text

Endpoint:

/classify
Accepts: POST

The result is of content-type "application/json" and contains a simple object showing the Category classification and the classification Score that was calculated. The Score should not be relied on as any kind of specific indicator, as it's a relative score and varies based on the number of tokens your categories have been trained with.

{
    "Category": "spam",
    "Score": 43.48754443957434
}

The POST payload should contain the raw text that you want to classify.

Scoring Text

Endpoint

/score
Accepts: POST

The result of content-type "application/json" and contains is a simple object showing the scores achieved by each category. As with classification, the scores should not be relied on as any kind of specific indicator, as it's a relative score and varies based on the number of tokens your categories have been trained with.

{
    "ham": 5.512455560425657,
    "spam": 43.48754443957434
}

The POST payload should contain the raw text that you want to score.

Flushing Training Data

Endpoint

/flush
Accepts: POST

The result of content-type "application/json" and contains a the list of categories that have been trained. This list should be empty, indicating that the classifier has been flushed.

{
    "Success": true,
    "Categories": {}
}

No payload or parameters are expected.

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
.vscode		.vscode
bayes		bayes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
gobayes.go		gobayes.go
responses.go		responses.go
trash.yml		trash.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GoBayes

Why?

Installation

Usage

Using the Classifier

Training the Classifier

Endpoint:

Untraining the Classifier

Endpoint:

Getting Classifier Status

Endpoint:

Classifying Text

Endpoint:

Scoring Text

Endpoint

Flushing Training Data

Endpoint

About

Releases

Packages

Languages

License

hickeroar/gobayes

Folders and files

Latest commit

History

Repository files navigation

GoBayes

Why?

Installation

Usage

Using the Classifier

Training the Classifier

Endpoint:

Untraining the Classifier

Endpoint:

Getting Classifier Status

Endpoint:

Classifying Text

Endpoint:

Scoring Text

Endpoint

Flushing Training Data

Endpoint

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages