Skip to content

maxsch3/keras-batchflow

Repository files navigation

Keras-batchflow

Keras batchflow is a pipeline-friendly batch generator framework for Keras. The framework is generating batches for Keras's fit_generator and predict_generator functions in all sorts of less standard scenarios: multi-input and multi-output scenarios, scenarios employing dynamic data augmentation with dependencies between variables, etc.

The framework bridges gaps between keras, pandas and sklearn. With Keras batchflow you can use pandas dataframe directly as a datasource for your keras model. You can use all breadth of standard sklearn encoders to transform columns of a dataframe into a numpy arrays.

Read the documenatation here

Meanwhile, quick tester example of what the framework is capable of

import pandas as pd
from sklearn import LabelEncoder, LabelBinarizer
from keras_batchflow.batch_generators import BatchGenerator

df = pd.DataFrame({
    'var1': ['Leaf', 'Flower', 'Leaf', 'Branch', 'Green', 'Yellow', 'Red', 'Brown'],
    'var2': ['Green', 'Yellow', 'Red', 'Brown', 'Green', 'Yellow', 'Red', 'Brown'],
    'label': ['Class 0', 'Class 1', 'Class 0', 'Class 2', 'Class 0', 'Class 1', 'Class 0', 'Class 2'],
})

#prefit sklearn encoders
var1_enc = LabelEncoder().fit(df['var1'])
var2_enc = LabelEncoder().fit(df['var2'])
label_enc = LabelBinarizer().fit(df['label'])

# define a batch generator
train_gen = BatchGenerator(
    df,
    x_structure=[('var1', var1_enc), ('var2', var2_enc)],
    y_structure=('label', label_enc),
    batch_size=4,
    train_mode=True
)

The generator returns batches of format (x_structure, y_structure) and the shape of the batches is:

>>> train_gen.shape
([(1, ), (1, )], (3, ))

The first element is a x_structure and it is a list if two inputs. Both of them are outputs of LabelEncoders, that return integer ids of categorical variables, hence the dimension has just 1 column. The y_structure is a single output produced by one-hot encoder, hence the dimension has 3 columns.

Now you can define a neural network and use above generator in fit_generator to train it

from keras.layers import Dense, Concatenate, Embedding, Input, Lambda
from keras.models import Model
import keras.backend as K

metadata_x, metadata_y = bg.metadata

var1_input = Input(shape=metadata_x[0]['shape'])
var2_input = Input(shape=metadata_x[0]['shape'])
var1_emb = Embedding(metadata_x[0]['n_classes'], 10)(var1_input)
var2_emb = Embedding(metadata_x[1]['n_classes'], 10)(var2_input)
features = Concatenate()([var1_emb, var2_emb])
# features has dimensions (None, 1, 10), so I remove the second one  
features = Lambda(lambda x: K.squeeze(x, axis=1))(features)
classes = Dense(metadata_y['shape'][0], activation='softmax')(features)

model = Model([var1_input, var2_input], classes)

model.compile('adam', 'categorical_crossentropy')

model.fit_generator(bg)

Installation

pip install git+https://github.com/maxsch3/keras-batchflow.git