Read this README in Portuguese
A multilayer perceptron (MLP) is an artificial neural network with at least three layers of nodes: an input layer, a hidden layer, and an output layer. Except for the input nodes, each node is a neuron that applies a nonlinear activation function. An MLP is trained with a supervised learning technique called backpropagation. Its multiple layers and nonlinear activations distinguish it from a linear perceptron: it can classify data that is not linearly separable.
This repository contains a parallel implementation of an MLP.
An artificial neuron receives input signals and weights. Each weight reflects the influence of its input. The neuron computes the weighted sum of its inputs and then applies an activation function to obtain a signal that is transmitted to the next neuron.
The MLP training procedure can be divided into four steps. The first step is to assign random values to the weights and the threshold. The second step is to compute the values of the neurons in the hidden layer and the output layer (the forward pass). The third step is to compute the error of the output-layer neurons and correct their weights, and then compute the error of the hidden-layer neurons and correct their weights; after this procedure, the weights of both the output layer and the hidden layer have been updated. The last step is to repeat the previous steps over the training set; this iterative process is backpropagation training.
The dataset was created by Nick Street in 1995 for diagnosing breast cancer. Its features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass and describe characteristics of the cell nuclei present in the image. The original results were obtained using the Multisurface Method-Tree (MSM-T), a classification method that uses linear programming to construct a decision tree.
The dataset consists of 569 instances with 32 attributes (ID, diagnosis, and 30 real-valued input features):
- ID -> identifies each patient by a code
- diagnosis -> can be M (malignant) or B (benign)
- real-valued features -> computed for each cell nucleus
The task is to predict field 2, the diagnosis: B (benign) or M (malignant). The classes are linearly separable using all 30 input features. The original classifier achieved 97.5% accuracy and had diagnosed 176 consecutive new patients as of November 1995.
Creators:
Dr. William H. Wolberg, General Surgery Dept., University of
Wisconsin, Clinical Sciences Center, Madison, WI 53792
wolberg@eagle.surgery.wisc.edu
W. Nick Street, Computer Sciences Dept., University of
Wisconsin, 1210 West Dayton St., Madison, WI 53706
street@cs.wisc.edu 608-262-6619
Olvi L. Mangasarian, Computer Sciences Dept., University of
Wisconsin, 1210 West Dayton St., Madison, WI 53706
olvi@cs.wisc.edu
In order to use the code, first clone this repository:
git clone github.com/viniciusvviterbo/Multilayer-Perceptron
cd ./Multilayer-Perceptron
In this project, the first line of a pattern file holds the main information, followed by an empty line (optional, included for readability) and then the data itself. Example:
[NUMBER OF CASES] [NUMBER OF INPUTS] [NUMBER OF OUTPUTS]
[INPUT 1] [INPUT 2] ... [INPUT N] [OUTPUT 1] [OUTPUT 2] ... [OUTPUT N]
[INPUT 1] [INPUT 2] ... [INPUT N] [OUTPUT 1] [OUTPUT 2] ... [OUTPUT N]
[INPUT 1] [INPUT 2] ... [INPUT N] [OUTPUT 1] [OUTPUT 2] ... [OUTPUT N]
For testing the code, this repository includes the dataset for the XOR logic gate (pattern_logic-port.in), which can be used to better understand the required format.
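As a sketch of that format, an XOR pattern file would look like this: 4 cases, 2 inputs, and 1 output declared on the first line, followed by the optional blank line and the four XOR truth-table rows (the exact contents of pattern_logic-port.in may differ slightly).

```
4 2 1

0 0 0
0 1 1
1 0 1
1 1 0
```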
A normalized dataset is preferred because the results it yields at the end of training are close to binary: 0 or 1. To normalize the dataset, execute:
g++ ./normalizeDataset.cpp -o ./normalizeDataset
./normalizeDataset < PATTERN_FILE > NORMALIZED_PATTERN_FILE
Example:
g++ ./normalizeDataset.cpp -o ./normalizeDataset
./normalizeDataset < pattern_breast-cancer.in > normalized_pattern_breast-cancer.in
Compile the source code using OpenMP:
g++ ./mlp.cpp -o ./mlp -fopenmp
In this code, the informed dataset is split in half. The first half is used only for training; the second half is used for testing, so the network sees the latter half as new content and tries to predict the correct result.
The expected results and the ones obtained by the MLP are printed for comparison.
To execute, the command needs some parameters:
./mlp HIDDEN_LAYER_LENGTH TRAINING_RATE THRESHOLD < PATTERN_FILE
- HIDDEN_LAYER_LENGTH refers to the number of neurons in the network hidden layer;
- TRAINING_RATE refers to the network's rate of training, a floating point number used during the correction phase of backpropagation;
- THRESHOLD refers to the maximum error admitted by the network in order to obtain an acceptably correct result;
- PATTERN_FILE refers to the normalized pattern file.
Example:
./mlp 5 0.2 1e-5 < ./normalized_pattern_breast-cancer.in
Fabrício Goés Youtube Channel - by Dr. Luis Goés
Eitas Tutoriais - by Espaço de Inovação Tecnológica Aplicada e Social - PUC Minas
Breast Cancer Wisconsin (Diagnostic) Data Set - from UCI Machine Learning Repository
Koliko - by Alex Frukta & Vladimir Tomin