You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
It looks the operator counter of the aggregated output is the 2x of real value.
I used below gluon model (mainly copied from https://mxnet.incubator.apache.org/tutorials/gluon/gluon.html, contains 2 convolutions for each forward pass) and did profiling, the aggregated convolution counter is 4 instead of 2, the counter doubled for other OPs as well.
# import dependencies
from __future__ import print_function
import numpy as np
import mxnet as mx
import mxnet.ndarray as F
import mxnet.gluon as gluon
from mxnet.gluon import nn
from mxnet import autograd
class Net(gluon.Block):
def __init__(self, **kwargs):
super(Net, self).__init__(**kwargs)
with self.name_scope():
# layers created in name_scope will inherit name space
# from parent layer.
self.conv1 = nn.Conv2D(6, kernel_size=5)
self.pool1 = nn.MaxPool2D(pool_size=(2,2))
self.conv2 = nn.Conv2D(16, kernel_size=5)
self.pool2 = nn.MaxPool2D(pool_size=(2,2))
self.fc1 = nn.Dense(120)
self.fc2 = nn.Dense(84)
self.fc3 = nn.Dense(10)
def forward(self, x):
x = self.pool1(F.relu(self.conv1(x)))
x = self.pool2(F.relu(self.conv2(x)))
# 0 means copy over size from corresponding dimension.
# -1 means infer size from the rest of dimensions.
x = x.reshape((0, -1))
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
x = self.fc3(x)
return x
net = Net()
# Initialize on CPU. Replace with `mx.gpu(0)`, or `[mx.gpu(0), mx.gpu(1)]`,
# etc to use one or more GPUs.
net.collect_params().initialize(mx.init.Xavier(), ctx=mx.cpu())
mx.profiler.set_config(aggregate_stats=True)
mx.profiler.set_state('run')
data = mx.nd.random_normal(shape=(10, 1, 32, 32)) # dummy data
output = net(data)
output.wait_to_read()
print(mx.profiler.dumps())
The profile output is below:
[jinhuang@mlt-ace image-classification]$ python test_gluon.py
Profile Statistics.
Note that counter items are counter values and not time units.
Device Storage
=================
Name Total Count Time (ms) Min Time (ms) Max Time (ms) Avg Time (ms)
---- ----------- --------- ------------- ------------- -------------
Memory: cpu/0 65 535.0320 0.0240 726.1520 363.0640
MXNET_C_API
=================
Name Total Count Time (ms) Min Time (ms) Max Time (ms) Avg Time (ms)
---- ----------- --------- ------------- ------------- -------------
MXNDArrayReshape64 1 0.0140 0.0140 0.0140 0.0140
MXNDArrayFree 21 0.1100 0.0000 0.0290 0.0052
MXAutogradMarkVariables 10 0.1080 0.0080 0.0180 0.0108
MXNDArrayCreateEx 10 0.0340 0.0020 0.0070 0.0034
MXNDArrayGetDType 15 0.0000 0.0000 0.0000 0.0000
MXSymbolSetAttr 47 0.0920 0.0010 0.0130 0.0020
MXNDArrayGetContext 13 0.0070 0.0000 0.0010 0.0005
MXNDArrayWaitToRead 1 12.0930 12.0930 12.0930 12.0930
MXSymbolInferShape 7 0.5700 0.0490 0.2320 0.0814
MXNet C API Calls 230 0.2300 0.0010 0.2300 0.1145
MXNet C API Concurrency 459 0.0010 0.0000 0.0010 0.0005
MXSymbolCreateAtomicSymbol 7 0.3780 0.0190 0.1540 0.0540
MXAutogradSetIsRecording 10 0.0140 0.0000 0.0070 0.0014
MXAutogradSetIsTraining 10 0.0040 0.0000 0.0010 0.0004
MXImperativeInvokeEx 52 3.7600 0.0150 1.7250 0.0723
MXNDArrayGetShape 25 0.0190 0.0000 0.0030 0.0008
operator
=================
Name Total Count Time (ms) Min Time (ms) Max Time (ms) Avg Time (ms)
---- ----------- --------- ------------- ------------- -------------
WaitForVar 2 0.0080 0.0040 0.0040 0.0040
FullyConnected 6 0.7770 0.0590 0.2380 0.1295
Pooling 4 0.5280 0.0890 0.1750 0.1320
_zeros 20 0.6220 0.0020 0.1410 0.0311
ResourceParallelRandomSetSeed 2 14.7190 7.3580 7.3610 7.3595
DeleteVariable 40 0.1130 0.0010 0.0070 0.0028
relu 8 0.2220 0.0070 0.0770 0.0278
_random_normal 2 10.8890 5.4430 5.4460 5.4445
_random_uniform 10 0.7370 0.0100 0.2740 0.0737
_full 10 0.0350 0.0010 0.0080 0.0035
Convolution 4 12.5590 0.5260 5.7530 3.1398
zeros_like 20 0.3800 0.0030 0.1490 0.0190
CopyCPU2CPU 20 0.4690 0.0030 0.1530 0.0235
Reorder 4 0.0110 0.0010 0.0050 0.0027
The text was updated successfully, but these errors were encountered:
It looks the operator counter of the aggregated output is the 2x of real value.
I used below gluon model (mainly copied from https://mxnet.incubator.apache.org/tutorials/gluon/gluon.html, contains 2 convolutions for each forward pass) and did profiling, the aggregated convolution counter is 4 instead of 2, the counter doubled for other OPs as well.
The profile output is below:
The text was updated successfully, but these errors were encountered: