
This repo is part of an attempt to develop various neural network models from scratch in Python and to provide alternative implementations of them for devices with CUDA-enabled GPUs.


  • NumPy 1.20.3
  • PyCUDA 2021.1


pip install numpy pycuda


common activation functions (e.g. sigmoid, tanh, ReLU, softmax) and their derivatives
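
As a rough illustration of what such activation functions and their derivatives look like in NumPy, here is a minimal sketch. The function names, the leaky-ReLU slope of 0.01, and the one-sample-per-column layout are assumptions made for this example and are not taken from the repo's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def tanh_prime(z):
    # derivative of np.tanh
    return 1.0 - np.tanh(z) ** 2

def relu(z):
    return np.maximum(0.0, z)

def relu_prime(z):
    # subgradient: 1 where z > 0, else 0
    return (z > 0).astype(float)

def leaky_relu(z, slope=0.01):
    # slope for negative inputs is an assumed value
    return np.where(z > 0, z, slope * z)

def leaky_relu_prime(z, slope=0.01):
    return np.where(z > 0, 1.0, slope)

def softmax(z):
    # assumes each column is one sample; subtracting the column max
    # avoids overflow in exp without changing the result
    shifted = z - z.max(axis=0, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=0, keepdims=True)
```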

implements a layer in an artificial neural network. supports common weight initialization schemes like glorot normal and glorot uniform.
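
To make the idea of a layer concrete, the following is a minimal sketch of a fully connected layer with a sigmoid activation and glorot uniform initialization. The class name `DenseLayer`, its methods, and the one-sample-per-column layout are hypothetical and chosen for illustration; the repo's layer class may be organized differently.

```python
import numpy as np

class DenseLayer:
    """Hypothetical fully connected layer; each input column is one sample."""

    def __init__(self, in_dim, out_dim, rng):
        # glorot uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))
        limit = np.sqrt(6.0 / (in_dim + out_dim))
        self.W = rng.uniform(-limit, limit, size=(out_dim, in_dim))
        self.b = np.zeros((out_dim, 1))

    def forward(self, A_prev):
        # cache the input and output for use in the backward pass
        self.A_prev = A_prev
        self.Z = self.W @ A_prev + self.b
        self.A = 1.0 / (1.0 + np.exp(-self.Z))  # sigmoid activation
        return self.A

    def backward(self, dA):
        # dA is dLoss/dA; any averaging over the batch is assumed to have
        # been applied already by the loss derivative
        dZ = dA * self.A * (1.0 - self.A)
        self.dW = dZ @ self.A_prev.T
        self.db = dZ.sum(axis=1, keepdims=True)
        return self.W.T @ dZ  # gradient with respect to the layer input

rng = np.random.default_rng(0)
layer = DenseLayer(4, 3, rng)
out = layer.forward(rng.standard_normal((4, 5)))
grad_in = layer.backward(np.ones_like(out))
```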

implements common loss functions (e.g. binary cross-entropy, MSE, categorical cross-entropy) and their derivatives
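
The loss functions named above can be sketched in NumPy as follows. This is an illustrative version only: the function names, the clipping constant `EPS`, and the choice to average over all elements are assumptions, not the repo's actual definitions.

```python
import numpy as np

EPS = 1e-12  # assumed clipping constant to keep log() finite

def mse(y_true, y_pred):
    return np.mean((y_pred - y_true) ** 2)

def mse_prime(y_true, y_pred):
    # derivative of the mean squared error with respect to y_pred
    return 2.0 * (y_pred - y_true) / y_true.size

def binary_crossentropy(y_true, y_pred):
    p = np.clip(y_pred, EPS, 1.0 - EPS)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

def binary_crossentropy_prime(y_true, y_pred):
    # d/dp of -(y log p + (1 - y) log(1 - p)) is (p - y) / (p (1 - p))
    p = np.clip(y_pred, EPS, 1.0 - EPS)
    return (p - y_true) / (p * (1.0 - p)) / y_true.size

def categorical_crossentropy(y_true, y_pred):
    # assumes one-hot y_true with one sample per column
    p = np.clip(y_pred, EPS, 1.0)
    return -np.sum(y_true * np.log(p)) / y_true.shape[1]
```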

contains the implementation of the ANN model with support for vector output. three batching modes are implemented: batch gradient descent, stochastic gradient descent, and mini-batch gradient descent. optimizers based on the 2nd moment (adagrad and RMSprop) are implemented, as well as Adam, which uses both the 1st and 2nd moments.
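
For readers unfamiliar with these optimizers, the update rules can be sketched as below. These are the textbook forms of adagrad, RMSprop, and Adam; the function names and the hyperparameter defaults (`beta1`, `beta2`, `beta`, `eps`) are assumptions for this example, and the repo may use different values or a different structure.

```python
import numpy as np

def adagrad_step(w, grad, g2, alpha=0.1, eps=1e-8):
    # accumulate the sum of squared gradients (2nd moment, no decay)
    g2 = g2 + grad ** 2
    return w - alpha * grad / (np.sqrt(g2) + eps), g2

def rmsprop_step(w, grad, v, alpha=0.1, beta=0.9, eps=1e-8):
    # exponentially decaying average of squared gradients
    v = beta * v + (1.0 - beta) * grad ** 2
    return w - alpha * grad / (np.sqrt(v) + eps), v

def adam_step(w, grad, m, v, t, alpha=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # 1st moment (mean of gradients) and 2nd moment (mean of squares)
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    # bias correction; t is the 1-based step counter
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    return w - alpha * m_hat / (np.sqrt(v_hat) + eps), m, v

# toy usage: minimize f(w) = w^2, whose gradient is 2w
w = np.array([5.0])
m = np.zeros(1)
v = np.zeros(1)
for t in range(1, 201):
    w, m, v = adam_step(w, 2.0 * w, m, v, t)
```

Each function returns the updated weights together with the updated optimizer state, so the caller keeps one state array per parameter array.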


model = network.Network(inputDim=inputDim, initializationScheme=initializationScheme)

      inputDim: (int) the number of features in input (also the number of neurons in input layer)
      initializationScheme: (string) default: randn. can be any of the values below:
            1. randn: random normal with mean 0
            2. he : he initialization
            3. glorot-normal
            4. glorot-uniform
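
The four scheme names above correspond to standard initializers, which can be sketched as follows. The helper `init_weights`, the `(fan_out, fan_in)` weight shape, and the unit standard deviation used for `randn` are assumptions for illustration; only the scheme strings come from the documentation above.

```python
import numpy as np

def init_weights(fan_in, fan_out, scheme="randn", rng=None):
    """Hypothetical helper mapping a scheme name to a weight matrix."""
    rng = np.random.default_rng() if rng is None else rng
    shape = (fan_out, fan_in)
    if scheme == "randn":
        # standard normal: mean 0, standard deviation 1 (assumed)
        return rng.standard_normal(shape)
    if scheme == "he":
        # He initialization: normal with std sqrt(2 / fan_in)
        return rng.standard_normal(shape) * np.sqrt(2.0 / fan_in)
    if scheme == "glorot-normal":
        # normal with std sqrt(2 / (fan_in + fan_out))
        return rng.standard_normal(shape) * np.sqrt(2.0 / (fan_in + fan_out))
    if scheme == "glorot-uniform":
        # uniform on [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out))
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return rng.uniform(-limit, limit, size=shape)
    raise ValueError(f"unknown initialization scheme: {scheme!r}")
```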

model.addLayer(dim=dim, activation=activation)

      dim: (int) the number of neurons in the layer
      activation: (string) the activation function of the layer. can be any one of the following:
            1. relu
            2. tanh
            3. sigmoid
            4. leaky-relu
            5. softmax

model.compile(loss=loss_func, optimizer=optimizer, batch_type=batch_type, batch_size=batch_size)

      loss: (string) the loss function of the network. default: binary-crossentropy. can be any one of the following:
            1. binary-crossentropy
            2. mse
            3. cat-crossentropy
      optimizer: (string) the gradient descent optimizer for reducing loss. default: gd. can be any one of the following:
            1. gd: gradient descent
            2. adagrad
            3. rmsprop
            4. adam
      batch_type: (string) the batching for gradient descent. default: bgd. can be any one of the following:
            1. bgd: batch gradient descent
            2. sgd: stochastic gradient descent
            3. mbgd: mini-batch gradient descent
      batch_size: (int) batch size if mbgd is used for batching
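
The three `batch_type` values differ only in how many samples contribute to each gradient update. A minimal sketch of that idea is shown below; the generator `iterate_batches`, its `rng` argument, and the one-sample-per-column layout are hypothetical and not part of the repo's API.

```python
import numpy as np

def iterate_batches(X, y, batch_type="bgd", batch_size=32, rng=None):
    """Yield (X_batch, y_batch) pairs; assumes each column is one sample."""
    n = X.shape[1]
    if batch_type == "bgd":
        size = n            # one update per epoch using every sample
    elif batch_type == "sgd":
        size = 1            # one update per sample
    elif batch_type == "mbgd":
        size = batch_size   # one update per mini-batch
    else:
        raise ValueError(f"unknown batch_type: {batch_type!r}")
    # shuffle only when a random generator is supplied
    order = np.arange(n) if rng is None else rng.permutation(n)
    for start in range(0, n, size):
        idx = order[start:start + size]
        yield X[:, idx], y[:, idx]

X_demo = np.zeros((4, 70))
y_demo = np.zeros((1, 70))
sizes = [xb.shape[1] for xb, _ in iterate_batches(X_demo, y_demo, "mbgd", 32)]
```

With 70 samples and a batch size of 32, the last mini-batch is smaller than the others because the sample count is not a multiple of the batch size.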

model.train(X_train, y_train, epochs=epochs, alpha=alpha, verbose=verbose)

      epochs: (int) number of epochs to train the neural network. default: 100
      alpha: (float) learning rate for gradient descent. default: 0.1
      verbose: (boolean) default: false

pred, accuracy = model.predict(X, y)


  1. pred: (numpy array) the prediction matrix
  2. accuracy: (float) the accuracy of the predictions (if the target variable is categorical)
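
One plausible way to compute such an accuracy from a matrix of predicted probabilities is sketched below. The function name, the 0.5 threshold for a single sigmoid output, and the one-sample-per-column layout are assumptions; the repo's `predict` may compute accuracy differently.

```python
import numpy as np

def accuracy_score(y_true, y_prob, threshold=0.5):
    """Fraction of correctly classified samples; columns are samples."""
    if y_prob.shape[0] == 1:
        # single output unit: threshold the probability (binary target)
        pred = (y_prob >= threshold).astype(int)
        true = y_true.astype(int)
    else:
        # several output units: pick the most probable class (one-hot target)
        pred = np.argmax(y_prob, axis=0)
        true = np.argmax(y_true, axis=0)
    return float(np.mean(pred == true))

y_true_demo = np.array([[1, 0, 1, 1]])
y_prob_demo = np.array([[0.9, 0.2, 0.4, 0.7]])
acc = accuracy_score(y_true_demo, y_prob_demo)
```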

overall code to train and test:

import network

model = network.Network(inputDim=4, initializationScheme='glorot-uniform')
model.addLayer(dim=3, activation='sigmoid')
model.addLayer(dim=1, activation='sigmoid')
model.compile(optimizer='adam', batch_type='mbgd', batch_size=32)
model.train(X_train.T, y_train.T, alpha=0.1, epochs=100)


using a 2-layer ANN with 300 hidden units and 784 input units (28 x 28 images)

  • achieved 97.78% test-set accuracy after training on grayscale images of handwritten letters
  • achieved around 96% accuracy on the MNIST digits dataset


deep-learning algorithms written from scratch





