In the example https://github.com/deeplearning4j/dl4j-examples/blob/master/dl4j-spark-examples/dl4j-spark/src/main/java/org/deeplearning4j/mlp/MnistMLPExample.java there is this code:
//Load the data into memory then parallelize
//This isn't a good approach in general - but is simple to use for this example
DataSetIterator iterTrain = new MnistDataSetIterator(batchSizePerWorker, true, 12345);
DataSetIterator iterTest = new MnistDataSetIterator(batchSizePerWorker, false, 12345);   //second arg false -> test set
List<DataSet> trainDataList = new ArrayList<>();
List<DataSet> testDataList = new ArrayList<>();
while (iterTrain.hasNext()) {
    trainDataList.add(iterTrain.next());
}
while (iterTest.hasNext()) {
    testDataList.add(iterTest.next());
}
I know this approach is limited by the machine's memory.
Do you have any advice on a better way to parallelize the data?
OK, so there are two ways (rough sketches of both below):
(a) use SparkContext.parallelize (a standard Spark op) - easy, but bad performance (all preprocessing happens on the master)
(b) write a better data pipeline that does the reading + conversion properly, in parallel
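For (a), a minimal sketch following the pattern in the linked MnistMLPExample. It assumes a JavaSparkContext sc, a MultiLayerConfiguration conf, and the batchSizePerWorker / numEpochs values and trainDataList / testDataList from the snippet above:

import org.apache.spark.api.java.JavaRDD;
import org.deeplearning4j.spark.api.TrainingMaster;
import org.deeplearning4j.spark.impl.multilayer.SparkDl4jMultiLayer;
import org.deeplearning4j.spark.impl.paramavg.ParameterAveragingTrainingMaster;
import org.nd4j.linalg.dataset.DataSet;

//Simple but slow: the whole dataset is built on the driver and then shipped to the executors
JavaRDD<DataSet> trainData = sc.parallelize(trainDataList);
JavaRDD<DataSet> testData = sc.parallelize(testDataList);

//Parameter averaging; the Builder argument is the number of examples per DataSet object in the RDD
TrainingMaster tm = new ParameterAveragingTrainingMaster.Builder(batchSizePerWorker)
    .batchSizePerWorker(batchSizePerWorker)
    .averagingFrequency(5)
    .workerPrefetchNumBatches(2)
    .build();

SparkDl4jMultiLayer sparkNet = new SparkDl4jMultiLayer(sc, conf, tm);
for (int i = 0; i < numEpochs; i++) {
    sparkNet.fit(trainData);
}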
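For (b), a rough sketch of one possible pipeline using the DataVec Spark helpers, assuming the data has been exported as a CSV file on HDFS (the path, labelIndex and the sparkNet variable are placeholders, not part of the original example). The reading and conversion run inside map functions on the executors, so the driver never has to hold the full dataset in memory:

import java.util.List;
import org.apache.spark.api.java.JavaRDD;
import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.writable.Writable;
import org.datavec.spark.transform.misc.StringToWritablesFunction;
import org.deeplearning4j.spark.datavec.DataVecDataSetFunction;
import org.nd4j.linalg.dataset.DataSet;

//Read the raw lines on the cluster - each executor reads its own partitions
JavaRDD<String> rddLines = sc.textFile("hdfs:///path/to/mnist_train.csv");   //placeholder path

//Parse each line to Writables, then convert to a one-example DataSet, all on the executors
RecordReader recordReader = new CSVRecordReader();
JavaRDD<List<Writable>> rddWritables = rddLines.map(new StringToWritablesFunction(recordReader));

int labelIndex = 0;     //column holding the label (assumption for this CSV layout)
int numClasses = 10;    //10 digit classes for MNIST
JavaRDD<DataSet> trainData = rddWritables.map(new DataVecDataSetFunction(labelIndex, numClasses, false));

//Each DataSet in this RDD holds a single example, so build the TrainingMaster with
//new ParameterAveragingTrainingMaster.Builder(1) instead of batchSizePerWorker
sparkNet.fit(trainData);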