diff --git a/docs/tutorial/data.md b/docs/tutorial/data.md index 40605f7cd73..3bf7d932eda 100644 --- a/docs/tutorial/data.md +++ b/docs/tutorial/data.md @@ -10,15 +10,15 @@ New input types are supported by developing a new data layer -- the rest of the This data layer definition - layers { + layer { name: "mnist" - # DATA layer loads leveldb or lmdb storage DBs for high-throughput. - type: DATA + # Data layer loads leveldb or lmdb storage DBs for high-throughput. + type: "Data" # the 1st top is the data itself: the name is only convention top: "data" # the 2nd top is the ground truth: the name is only convention top: "label" - # the DATA layer configuration + # the Data layer configuration data_param { # path to the DB source: "examples/mnist/mnist_train_lmdb" @@ -46,9 +46,9 @@ The (data, label) pairing is a convenience for classification models. **Transformations**: data preprocessing is parametrized by transformation messages within the data layer definition. - layers { + layer { name: "data" - type: DATA + type: "Data" [...] transform_param { scale: 0.1 diff --git a/docs/tutorial/layers.md b/docs/tutorial/layers.md index c4529e6afc0..ff2ee491244 100644 --- a/docs/tutorial/layers.md +++ b/docs/tutorial/layers.md @@ -23,7 +23,7 @@ In contrast, other layers (with few exceptions) ignore the spatial structure of #### Convolution -* LayerType: `CONVOLUTION` +* Layer type: `Convolution` * CPU implementation: `./src/caffe/layers/convolution_layer.cpp` * CUDA GPU implementation: `./src/caffe/layers/convolution_layer.cu` * Parameters (`ConvolutionParameter convolution_param`) @@ -43,15 +43,15 @@ In contrast, other layers (with few exceptions) ignore the spatial structure of - `n * c_o * h_o * w_o`, where `h_o = (h_i + 2 * pad_h - kernel_h) / stride_h + 1` and `w_o` likewise. * Sample (as seen in `./examples/imagenet/imagenet_train_val.prototxt`) - layers { + layer { name: "conv1" - type: CONVOLUTION + type: "Convolution" bottom: "data" top: "conv1" - blobs_lr: 1 # learning rate multiplier for the filters - blobs_lr: 2 # learning rate multiplier for the biases - weight_decay: 1 # weight decay multiplier for the filters - weight_decay: 0 # weight decay multiplier for the biases + # learning rate and decay multipliers for the filters + param { lr_mult: 1 decay_mult: 1 } + # learning rate and decay multipliers for the biases + param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 96 # learn 96 filters kernel_size: 11 # each filter is 11x11 @@ -67,11 +67,11 @@ In contrast, other layers (with few exceptions) ignore the spatial structure of } } -The `CONVOLUTION` layer convolves the input image with a set of learnable filters, each producing one feature map in the output image. +The `Convolution` layer convolves the input image with a set of learnable filters, each producing one feature map in the output image. #### Pooling -* LayerType: `POOLING` +* Layer type: `Pooling` * CPU implementation: `./src/caffe/layers/pooling_layer.cpp` * CUDA GPU implementation: `./src/caffe/layers/pooling_layer.cu` * Parameters (`PoolingParameter pooling_param`) @@ -87,9 +87,9 @@ The `CONVOLUTION` layer convolves the input image with a set of learnable filter - `n * c * h_o * w_o`, where h_o and w_o are computed in the same way as convolution. 
* Sample (as seen in `./examples/imagenet/imagenet_train_val.prototxt`) - layers { + layer { name: "pool1" - type: POOLING + type: "Pooling" bottom: "conv1" top: "pool1" pooling_param { @@ -101,7 +101,7 @@ The `CONVOLUTION` layer convolves the input image with a set of learnable filter #### Local Response Normalization (LRN) -* LayerType: `LRN` +* Layer type: `LRN` * CPU Implementation: `./src/caffe/layers/lrn_layer.cpp` * CUDA GPU Implementation: `./src/caffe/layers/lrn_layer.cu` * Parameters (`LRNParameter lrn_param`) @@ -115,7 +115,7 @@ The local response normalization layer performs a kind of "lateral inhibition" b #### im2col -`IM2COL` is a helper for doing the image-to-column transformation that you most likely do not need to know about. This is used in Caffe's original convolution to do matrix multiplication by laying out all patches into a matrix. +`Im2col` is a helper for doing the image-to-column transformation that you most likely do not need to know about. This is used in Caffe's original convolution to do matrix multiplication by laying out all patches into a matrix. ### Loss Layers @@ -123,19 +123,19 @@ Loss drives learning by comparing an output to a target and assigning cost to mi #### Softmax -* LayerType: `SOFTMAX_LOSS` +* Layer type: `SoftmaxWithLoss` The softmax loss layer computes the multinomial logistic loss of the softmax of its inputs. It's conceptually identical to a softmax layer followed by a multinomial logistic loss layer, but provides a more numerically stable gradient. #### Sum-of-Squares / Euclidean -* LayerType: `EUCLIDEAN_LOSS` +* Layer type: `EuclideanLoss` The Euclidean loss layer computes the sum of squares of differences of its two inputs, $$\frac 1 {2N} \sum_{i=1}^N \| x^1_i - x^2_i \|_2^2$$. #### Hinge / Margin -* LayerType: `HINGE_LOSS` +* Layer type: `HingeLoss` * CPU implementation: `./src/caffe/layers/hinge_loss_layer.cpp` * CUDA GPU implementation: none yet * Parameters (`HingeLossParameter hinge_loss_param`) @@ -149,17 +149,17 @@ The Euclidean loss layer computes the sum of squares of differences of its two i * Samples # L1 Norm - layers { + layer { name: "loss" - type: HINGE_LOSS + type: "HingeLoss" bottom: "pred" bottom: "label" } # L2 Norm - layers { + layer { name: "loss" - type: HINGE_LOSS + type: "HingeLoss" bottom: "pred" bottom: "label" top: "loss" @@ -172,15 +172,15 @@ The hinge loss layer computes a one-vs-all hinge or squared hinge loss. #### Sigmoid Cross-Entropy -`SIGMOID_CROSS_ENTROPY_LOSS` +`SigmoidCrossEntropyLoss` #### Infogain -`INFOGAIN_LOSS` +`InfogainLoss` #### Accuracy and Top-k -`ACCURACY` scores the output as the accuracy of output with respect to target -- it is not actually a loss and has no backward step. +`Accuracy` scores the output as the accuracy of output with respect to target -- it is not actually a loss and has no backward step. ### Activation / Neuron Layers @@ -193,7 +193,7 @@ In general, activation / Neuron layers are element-wise operators, taking one bo #### ReLU / Rectified-Linear and Leaky-ReLU -* LayerType: `RELU` +* Layer type: `ReLU` * CPU implementation: `./src/caffe/layers/relu_layer.cpp` * CUDA GPU implementation: `./src/caffe/layers/relu_layer.cu` * Parameters (`ReLUParameter relu_param`) @@ -201,66 +201,66 @@ In general, activation / Neuron layers are element-wise operators, taking one bo - `negative_slope` [default 0]: specifies whether to leak the negative part by multiplying it with the slope value rather than setting it to 0. 
* Sample (as seen in `./examples/imagenet/imagenet_train_val.prototxt`) - layers { + layer { name: "relu1" - type: RELU + type: "ReLU" bottom: "conv1" top: "conv1" } -Given an input value x, The `RELU` layer computes the output as x if x > 0 and negative_slope * x if x <= 0. When the negative slope parameter is not set, it is equivalent to the standard ReLU function of taking max(x, 0). It also supports in-place computation, meaning that the bottom and the top blob could be the same to preserve memory consumption. +Given an input value x, The `ReLU` layer computes the output as x if x > 0 and negative_slope * x if x <= 0. When the negative slope parameter is not set, it is equivalent to the standard ReLU function of taking max(x, 0). It also supports in-place computation, meaning that the bottom and the top blob could be the same to preserve memory consumption. #### Sigmoid -* LayerType: `SIGMOID` +* Layer type: `Sigmoid` * CPU implementation: `./src/caffe/layers/sigmoid_layer.cpp` * CUDA GPU implementation: `./src/caffe/layers/sigmoid_layer.cu` * Sample (as seen in `./examples/imagenet/mnist_autoencoder.prototxt`) - layers { + layer { name: "encode1neuron" bottom: "encode1" top: "encode1neuron" - type: SIGMOID + type: "Sigmoid" } -The `SIGMOID` layer computes the output as sigmoid(x) for each input element x. +The `Sigmoid` layer computes the output as sigmoid(x) for each input element x. #### TanH / Hyperbolic Tangent -* LayerType: `TANH` +* Layer type: `TanH` * CPU implementation: `./src/caffe/layers/tanh_layer.cpp` * CUDA GPU implementation: `./src/caffe/layers/tanh_layer.cu` * Sample - layers { + layer { name: "layer" bottom: "in" top: "out" - type: TANH + type: "TanH" } -The `TANH` layer computes the output as tanh(x) for each input element x. +The `TanH` layer computes the output as tanh(x) for each input element x. #### Absolute Value -* LayerType: `ABSVAL` +* Layer type: `AbsVal` * CPU implementation: `./src/caffe/layers/absval_layer.cpp` * CUDA GPU implementation: `./src/caffe/layers/absval_layer.cu` * Sample - layers { + layer { name: "layer" bottom: "in" top: "out" - type: ABSVAL + type: "AbsVal" } -The `ABSVAL` layer computes the output as abs(x) for each input element x. +The `AbsVal` layer computes the output as abs(x) for each input element x. #### Power -* LayerType: `POWER` +* Layer type: `Power` * CPU implementation: `./src/caffe/layers/power_layer.cpp` * CUDA GPU implementation: `./src/caffe/layers/power_layer.cu` * Parameters (`PowerParameter power_param`) @@ -270,11 +270,11 @@ The `ABSVAL` layer computes the output as abs(x) for each input element x. - `shift` [default 0] * Sample - layers { + layer { name: "layer" bottom: "in" top: "out" - type: POWER + type: "Power" power_param { power: 1 scale: 1 @@ -282,16 +282,16 @@ The `ABSVAL` layer computes the output as abs(x) for each input element x. } } -The `POWER` layer computes the output as (shift + scale * x) ^ power for each input element x. +The `Power` layer computes the output as (shift + scale * x) ^ power for each input element x. 
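The sample above uses the default values (`power: 1`, `scale: 1`, `shift: 0`), which make the layer an identity map. For a more concrete illustration, here is a sketch of a `Power` layer that squares a scaled, shifted input; the layer name and blob names are hypothetical and the snippet is not taken from any shipped model:

    layer {
      name: "square"    # hypothetical name
      bottom: "in"
      top: "out"
      type: "Power"
      power_param {
        power: 2   # exponent
        scale: 2   # multiplied with the input
        shift: 1   # added to the scaled input
      }
    }

With these settings the output is (1 + 2 * x) ^ 2 for each input element x.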
#### BNLL -* LayerType: `BNLL` +* Layer type: `BNLL` * CPU implementation: `./src/caffe/layers/bnll_layer.cpp` * CUDA GPU implementation: `./src/caffe/layers/bnll_layer.cu` * Sample - layers { + layer { name: "layer" bottom: "in" top: "out" @@ -309,7 +309,7 @@ Common input preprocessing (mean subtraction, scaling, random cropping, and mirr #### Database -* LayerType: `DATA` +* Layer type: `Data` * Parameters - Required - `source`: the name of the directory containing the database @@ -322,7 +322,7 @@ Common input preprocessing (mean subtraction, scaling, random cropping, and mirr #### In-Memory -* LayerType: `MEMORY_DATA` +* Layer type: `MemoryData` * Parameters - Required - `batch_size`, `channels`, `height`, `width`: specify the size of input chunks to read from memory @@ -331,7 +331,7 @@ The memory data layer reads data directly from memory, without copying it. In or #### HDF5 Input -* LayerType: `HDF5_DATA` +* Layer type: `HDF5Data` * Parameters - Required - `source`: the name of the file to read from @@ -339,7 +339,7 @@ The memory data layer reads data directly from memory, without copying it. In or #### HDF5 Output -* LayerType: `HDF5_OUTPUT` +* Layer type: `HDF5Output` * Parameters - Required - `file_name`: name of file to write to @@ -348,7 +348,7 @@ The HDF5 output layer performs the opposite function of the other layers in this #### Images -* LayerType: `IMAGE_DATA` +* Layer type: `ImageData` * Parameters - Required - `source`: name of a text file, with each line giving an image filename and label @@ -360,17 +360,17 @@ The HDF5 output layer performs the opposite function of the other layers in this #### Windows -`WINDOW_DATA` +`WindowData` #### Dummy -`DUMMY_DATA` is for development and debugging. See `DummyDataParameter`. +`DummyData` is for development and debugging. See `DummyDataParameter`. ### Common Layers #### Inner Product -* LayerType: `INNER_PRODUCT` +* Layer type: `InnerProduct` * CPU implementation: `./src/caffe/layers/inner_product_layer.cpp` * CUDA GPU implementation: `./src/caffe/layers/inner_product_layer.cu` * Parameters (`InnerProductParameter inner_product_param`) @@ -387,13 +387,13 @@ The HDF5 output layer performs the opposite function of the other layers in this - `n * c_o * 1 * 1` * Sample - layers { + layer { name: "fc8" - type: INNER_PRODUCT - blobs_lr: 1 # learning rate multiplier for the filters - blobs_lr: 2 # learning rate multiplier for the biases - weight_decay: 1 # weight decay multiplier for the filters - weight_decay: 0 # weight decay multiplier for the biases + type: "InnerProduct" + # learning rate and decay multipliers for the weights + param { lr_mult: 1 decay_mult: 1 } + # learning rate and decay multipliers for the biases + param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 1000 weight_filler { @@ -409,15 +409,15 @@ The HDF5 output layer performs the opposite function of the other layers in this top: "fc8" } -The `INNER_PRODUCT` layer (also usually referred to as the fully connected layer) treats the input as a simple vector and produces an output in the form of a single vector (with the blob's height and width set to 1). +The `InnerProduct` layer (also usually referred to as the fully connected layer) treats the input as a simple vector and produces an output in the form of a single vector (with the blob's height and width set to 1). #### Splitting -The `SPLIT` layer is a utility layer that splits an input blob to multiple output blobs. This is used when a blob is fed into multiple output layers. 
+The `Split` layer is a utility layer that splits an input blob to multiple output blobs. This is used when a blob is fed into multiple output layers. #### Flattening -The `FLATTEN` layer is a utility layer that flattens an input of shape `n * c * h * w` to a simple vector output of shape `n * (c*h*w) * 1 * 1`. +The `Flatten` layer is a utility layer that flattens an input of shape `n * c * h * w` to a simple vector output of shape `n * (c*h*w)` #### Reshape @@ -460,67 +460,67 @@ As another example, specifying `reshape_param { shape { dim: 0 dim: -1 } }` make #### Concatenation -* LayerType: `CONCAT` +* Layer type: `Concat` * CPU implementation: `./src/caffe/layers/concat_layer.cpp` * CUDA GPU implementation: `./src/caffe/layers/concat_layer.cu` * Parameters (`ConcatParameter concat_param`) - Optional - - `concat_dim` [default 1]: 0 for concatenation along num and 1 for channels. + - `axis` [default 1]: 0 for concatenation along num and 1 for channels. * Input - `n_i * c_i * h * w` for each input blob i from 1 to K. * Output - - if `concat_dim = 0`: `(n_1 + n_2 + ... + n_K) * c_1 * h * w`, and all input `c_i` should be the same. - - if `concat_dim = 1`: `n_1 * (c_1 + c_2 + ... + c_K) * h * w`, and all input `n_i` should be the same. + - if `axis = 0`: `(n_1 + n_2 + ... + n_K) * c_1 * h * w`, and all input `c_i` should be the same. + - if `axis = 1`: `n_1 * (c_1 + c_2 + ... + c_K) * h * w`, and all input `n_i` should be the same. * Sample - layers { + layer { name: "concat" bottom: "in1" bottom: "in2" top: "out" - type: CONCAT + type: "Concat" concat_param { - concat_dim: 1 + axis: 1 } } -The `CONCAT` layer is a utility layer that concatenates its multiple input blobs to one single output blob. Currently, the layer supports concatenation along num or channels only. +The `Concat` layer is a utility layer that concatenates its multiple input blobs to one single output blob. #### Slicing -The `SLICE` layer is a utility layer that slices an input layer to multiple output layers along a given dimension (currently num or channel only) with given slice indices. +The `Slice` layer is a utility layer that slices an input layer to multiple output layers along a given dimension (currently num or channel only) with given slice indices. * Sample - layers { + layer { name: "slicer_label" - type: SLICE + type: "Slice" bottom: "label" ## Example of label with a shape N x 3 x 1 x 1 top: "label1" top: "label2" top: "label3" slice_param { - slice_dim: 1 - slice_point: 1 - slice_point: 2 + axis: 1 + slice_point: 1 + slice_point: 2 } } -`slice_dim` indicates the target dimension and can assume only two values: 0 for num or 1 for channel; `slice_point` indicates indexes in the selected dimension (the number of indexes must be equal to the number of top blobs minus one). +`axis` indicates the target axis; `slice_point` indicates indexes in the selected dimension (the number of indices must be equal to the number of top blobs minus one). #### Elementwise Operations -`ELTWISE` +`Eltwise` #### Argmax -`ARGMAX` +`ArgMax` #### Softmax -`SOFTMAX` +`Softmax` #### Mean-Variance Normalization diff --git a/docs/tutorial/loss.md b/docs/tutorial/loss.md index aac561774bb..d2d0e77fbed 100644 --- a/docs/tutorial/loss.md +++ b/docs/tutorial/loss.md @@ -10,30 +10,30 @@ Hence, the goal of learning is to find a setting of the weights that *minimizes* The loss in Caffe is computed by the Forward pass of the network. Each layer takes a set of input (`bottom`) blobs and produces a set of output (`top`) blobs. 
Some of these layers' outputs may be used in the loss function. -A typical choice of loss function for one-versus-all classification tasks is the `SOFTMAX_LOSS` function, used in a network definition as follows, for example: +A typical choice of loss function for one-versus-all classification tasks is the `SoftmaxWithLoss` function, used in a network definition as follows, for example: - layers { + layer { name: "loss" - type: SOFTMAX_LOSS + type: "SoftmaxWithLoss" bottom: "pred" bottom: "label" top: "loss" } -In a `SOFTMAX_LOSS` function, the `top` blob is a scalar (dimensions $$1 \times 1 \times 1 \times 1$$) which averages the loss (computed from predicted labels `pred` and actuals labels `label`) over the entire mini-batch. +In a `SoftmaxWithLoss` function, the `top` blob is a scalar (empty shape) which averages the loss (computed from predicted labels `pred` and actuals labels `label`) over the entire mini-batch. ### Loss weights -For nets with multiple layers producing a loss (e.g., a network that both classifies the input using a `SOFTMAX_LOSS` layer and reconstructs it using a `EUCLIDEAN_LOSS` layer), *loss weights* can be used to specify their relative importance. +For nets with multiple layers producing a loss (e.g., a network that both classifies the input using a `SoftmaxWithLoss` layer and reconstructs it using a `EuclideanLoss` layer), *loss weights* can be used to specify their relative importance. -By convention, Caffe layer types with the suffix `_LOSS` contribute to the loss function, but other layers are assumed to be purely used for intermediate computations. +By convention, Caffe layer types with the suffix `Loss` contribute to the loss function, but other layers are assumed to be purely used for intermediate computations. However, any layer can be used as a loss by adding a field `loss_weight: ` to a layer definition for each `top` blob produced by the layer. -Layers with the suffix `_LOSS` have an implicit `loss_weight: 1` for the first `top` blob (and `loss_weight: 0` for any additional `top`s); other layers have an implicit `loss_weight: 0` for all `top`s. -So, the above `SOFTMAX_LOSS` layer could be equivalently written as: +Layers with the suffix `Loss` have an implicit `loss_weight: 1` for the first `top` blob (and `loss_weight: 0` for any additional `top`s); other layers have an implicit `loss_weight: 0` for all `top`s. +So, the above `SoftmaxWithLoss` layer could be equivalently written as: - layers { + layer { name: "loss" - type: SOFTMAX_LOSS + type: "SoftmaxWithLoss" bottom: "pred" bottom: "label" top: "loss" diff --git a/docs/tutorial/net_layer_blob.md b/docs/tutorial/net_layer_blob.md index 1f0966f88a4..e8b7bd316a9 100644 --- a/docs/tutorial/net_layer_blob.md +++ b/docs/tutorial/net_layer_blob.md @@ -11,22 +11,20 @@ We will go over the details of these components in more detail. ## Blob storage and communication -A Blob is a wrapper over the actual data being processed and passed along by Caffe, and also under the hood provides synchronization capability between the CPU and the GPU. Mathematically, a blob is a 4-dimensional array that stores things in the order of (Num, Channels, Height and Width), from major to minor, and stored in a C-contiguous fashion. The main reason for putting Num (the name is due to legacy reasons, and is equivalent to the notation of "batch" as in minibatch SGD). 
+A Blob is a wrapper over the actual data being processed and passed along by Caffe, and also under the hood provides synchronization capability between the CPU and the GPU. Mathematically, a blob is an N-dimensional array stored in a C-contiguous fashion. -Caffe stores and communicates data in 4-dimensional arrays called blobs. Blobs provide a unified memory interface, holding data e.g. batches of images, model parameters, and derivatives for optimization. +Caffe stores and communicates data using blobs. Blobs provide a unified memory interface holding data; e.g., batches of images, model parameters, and derivatives for optimization. Blobs conceal the computational and mental overhead of mixed CPU/GPU operation by synchronizing from the CPU host to the GPU device as needed. Memory on the host and device is allocated on demand (lazily) for efficient memory usage. -The conventional blob dimensions for data are number N x channel K x height H x width W. Blob memory is row-major in layout so the last / rightmost dimension changes fastest. For example, the value at index (n, k, h, w) is physically located at index ((n * K + k) * H + h) * W + w. +The conventional blob dimensions for batches of image data are number N x channel K x height H x width W. Blob memory is row-major in layout, so the last / rightmost dimension changes fastest. For example, in a 4D blob, the value at index (n, k, h, w) is physically located at index ((n * K + k) * H + h) * W + w. - Number / N is the batch size of the data. Batch processing achieves better throughput for communication and device processing. For an ImageNet training batch of 256 images B = 256. - Channel / K is the feature dimension e.g. for RGB images K = 3. -Note that although we have designed blobs with its dimensions corresponding to image applications, they are named purely for notational purpose and it is totally valid for you to do non-image applications. For example, if you simply need fully-connected networks like the conventional multi-layer perceptron, use blobs of dimensions (Num, Channels, 1, 1) and call the InnerProductLayer (which we will cover soon). +Note that although many blobs in Caffe examples are 4D with axes for image applications, it is totally valid to use blobs for non-image applications. For example, if you simply need fully-connected networks like the conventional multi-layer perceptron, use 2D blobs (shape (N, D)) and call the InnerProductLayer (which we will cover soon). -Caffe operations are general with respect to the channel dimension / K. Grayscale and hyperspectral imagery are fine. Caffe can likewise model and process arbitrary vectors in blobs with singleton. That is, the shape of blob holding 1000 vectors of 16 feature dimensions is 1000 x 16 x 1 x 1. - -Parameter blob dimensions vary according to the type and configuration of the layer. For a convolution layer with 96 filters of 11 x 11 spatial dimension and 3 inputs the blob is 96 x 3 x 11 x 11. For an inner product / fully-connected layer with 1000 output channels and 1024 input channels the parameter blob is 1 x 1 x 1000 x 1024. +Parameter blob dimensions vary according to the type and configuration of the layer. For a convolution layer with 96 filters of 11 x 11 spatial dimension and 3 inputs the blob is 96 x 3 x 11 x 11. For an inner product / fully-connected layer with 1000 output channels and 1024 input channels the parameter blob is 1000 x 1024. For custom data it may be necessary to hack your own input preparation tool or data layer. 
However once your data is in your job is done. The modularity of layers accomplishes the rest of the work for you. @@ -95,9 +93,9 @@ A simple logistic regression classifier is defined by name: "LogReg" - layers { + layer { name: "mnist" - type: DATA + type: "Data" top: "data" top: "label" data_param { @@ -105,18 +103,18 @@ is defined by batch_size: 64 } } - layers { + layer { name: "ip" - type: INNER_PRODUCT + type: "InnerProduct" bottom: "data" top: "ip" inner_product_param { num_output: 2 } } - layers { + layer { name: "loss" - type: SOFTMAX_LOSS + type: "SoftmaxWithLoss" bottom: "ip" bottom: "label" top: "loss" @@ -135,19 +133,19 @@ Model initialization is handled by `Net::Init()`. The initialization mainly does I0902 22:52:17.935807 2079114000 data_layer.cpp:135] Opening leveldb input_leveldb I0902 22:52:17.937155 2079114000 data_layer.cpp:195] output data size: 64,1,28,28 I0902 22:52:17.938570 2079114000 net.cpp:103] Top shape: 64 1 28 28 (50176) - I0902 22:52:17.938593 2079114000 net.cpp:103] Top shape: 64 1 1 1 (64) + I0902 22:52:17.938593 2079114000 net.cpp:103] Top shape: 64 (64) I0902 22:52:17.938611 2079114000 net.cpp:67] Creating Layer ip I0902 22:52:17.938617 2079114000 net.cpp:394] ip <- data I0902 22:52:17.939177 2079114000 net.cpp:356] ip -> ip I0902 22:52:17.939196 2079114000 net.cpp:96] Setting up ip - I0902 22:52:17.940289 2079114000 net.cpp:103] Top shape: 64 2 1 1 (128) + I0902 22:52:17.940289 2079114000 net.cpp:103] Top shape: 64 2 (128) I0902 22:52:17.941270 2079114000 net.cpp:67] Creating Layer loss I0902 22:52:17.941305 2079114000 net.cpp:394] loss <- ip I0902 22:52:17.941314 2079114000 net.cpp:394] loss <- label I0902 22:52:17.941323 2079114000 net.cpp:356] loss -> loss # set up the loss and configure the backward pass I0902 22:52:17.941328 2079114000 net.cpp:96] Setting up loss - I0902 22:52:17.941328 2079114000 net.cpp:103] Top shape: 1 1 1 1 (1) + I0902 22:52:17.941328 2079114000 net.cpp:103] Top shape: (1) I0902 22:52:17.941329 2079114000 net.cpp:109] with loss weight 1 I0902 22:52:17.941779 2079114000 net.cpp:170] loss needs backward computation. I0902 22:52:17.941787 2079114000 net.cpp:170] ip needs backward computation. diff --git a/examples/mnist/readme.md b/examples/mnist/readme.md index ef7f5da67d5..269e53ab9b9 100644 --- a/examples/mnist/readme.md +++ b/examples/mnist/readme.md @@ -38,9 +38,9 @@ Specifically, we will write a `caffe::NetParameter` (or in python, `caffe.proto. Currently, we will read the MNIST data from the lmdb we created earlier in the demo. This is defined by a data layer: - layers { + layer { name: "mnist" - type: DATA + type: "Data" data_param { source: "mnist_train_lmdb" backend: LMDB @@ -57,14 +57,14 @@ Specifically, this layer has name `mnist`, type `data`, and it reads the data fr Let's define the first convolution layer: - layers { + layer { name: "conv1" - type: CONVOLUTION - blobs_lr: 1. - blobs_lr: 2. + type: "Convolution" + param { lr_mult: 1 } + param { lr_mult: 2 } convolution_param { num_output: 20 - kernelsize: 5 + kernel_size: 5 stride: 1 weight_filler { type: "xavier" @@ -81,15 +81,15 @@ This layer takes the `data` blob (it is provided by the data layer), and produce The fillers allow us to randomly initialize the value of the weights and bias. For the weight filler, we will use the `xavier` algorithm that automatically determines the scale of initialization based on the number of input and output neurons. For the bias filler, we will simply initialize it as constant, with the default filling value 0. 
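Before moving on to the learning-rate multipliers, a quick shape check for this convolution layer may help; it uses the output-size formula from the layer catalogue and assumes the 28 x 28 MNIST inputs produced by the data layer (N is the batch size). This is a comment-only sketch, not additional prototxt to add:

    # conv1 shape check (illustrative only)
    #   input  "data":  N x 1  x 28 x 28
    #   h_o = (28 + 2*0 - 5) / 1 + 1 = 24, and w_o = 24 likewise
    #   output "conv1": N x 20 x 24 x 24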
-`blobs_lr` are the learning rate adjustments for the layer's learnable parameters. In this case, we will set the weight learning rate to be the same as the learning rate given by the solver during runtime, and the bias learning rate to be twice as large as that - this usually leads to better convergence rates. +`lr_mult`s are the learning rate adjustments for the layer's learnable parameters. In this case, we will set the weight learning rate to be the same as the learning rate given by the solver during runtime, and the bias learning rate to be twice as large as that - this usually leads to better convergence rates. ### Writing the Pooling Layer Phew. Pooling layers are actually much easier to define: - layers { + layer { name: "pool1" - type: POOLING + type: "Pooling" pooling_param { kernel_size: 2 stride: 2 @@ -107,11 +107,11 @@ Similarly, you can write up the second convolution and pooling layers. Check `$C Writing a fully connected layer is also simple: - layers { + layer { name: "ip1" - type: INNER_PRODUCT - blobs_lr: 1. - blobs_lr: 2. + type: "InnerProduct" + param { lr_mult: 1 } + param { lr_mult: 2 } inner_product_param { num_output: 500 weight_filler { @@ -125,15 +125,15 @@ Writing a fully connected layer is also simple: top: "ip1" } -This defines a fully connected layer (for some legacy reason, Caffe calls it an `innerproduct` layer) with 500 outputs. All other lines look familiar, right? +This defines a fully connected layer (known in Caffe as an `InnerProduct` layer) with 500 outputs. All other lines look familiar, right? ### Writing the ReLU Layer A ReLU Layer is also simple: - layers { + layer { name: "relu1" - type: RELU + type: "ReLU" bottom: "ip1" top: "ip1" } @@ -142,11 +142,11 @@ Since ReLU is an element-wise operation, we can do *in-place* operations to save After the ReLU layer, we will write another innerproduct layer: - layers { + layer { name: "ip2" - type: INNER_PRODUCT - blobs_lr: 1. - blobs_lr: 2. + type: "InnerProduct" + param { lr_mult: 1 } + param { lr_mult: 2 } inner_product_param { num_output: 10 weight_filler { @@ -164,9 +164,9 @@ After the ReLU layer, we will write another innerproduct layer: Finally, we will write the loss! - layers { + layer { name: "loss" - type: SOFTMAX_LOSS + type: "SoftmaxWithLoss" bottom: "ip2" bottom: "label" } @@ -178,7 +178,7 @@ The `softmax_loss` layer implements both the softmax and the multinomial logisti Layer definitions can include rules for whether and when they are included in the network definition, like the one below: - layers { + layer { // ...layer definition... include: { phase: TRAIN } } @@ -190,7 +190,7 @@ In the above example, this layer will be included only in `TRAIN` phase. If we change `TRAIN` with `TEST`, then this layer will be used only in test phase. By default, that is without layer rules, a layer is always included in the network. Thus, `lenet_train_test.prototxt` has two `DATA` layers defined (with different `batch_size`), one for the training phase and one for the testing phase. -Also, there is an `ACCURACY` layer which is included only in `TEST` phase for reporting the model accuracy every 100 iteration, as defined in `lenet_solver.prototxt`. +Also, there is an `Accuracy` layer which is included only in `TEST` phase for reporting the model accuracy every 100 iteration, as defined in `lenet_solver.prototxt`. 
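As a sketch of such a rule in the new format, an `Accuracy` layer restricted to the test phase looks like the following; see `lenet_train_test.prototxt` for the exact definition used by this example (the blob names here follow the layers defined above):

    layer {
      name: "accuracy"
      type: "Accuracy"
      bottom: "ip2"
      bottom: "label"
      top: "accuracy"
      include { phase: TEST }
    }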
## Define the MNIST Solver diff --git a/examples/siamese/readme.md b/examples/siamese/readme.md index ce98ec10819..83db8c94395 100644 --- a/examples/siamese/readme.md +++ b/examples/siamese/readme.md @@ -39,13 +39,19 @@ exactly the same as the [LeNet model](mnist.html), the only difference is that we have replaced the top layers that produced probabilities over the 10 digit classes with a linear "feature" layer that produces a 2 dimensional vector. - layers { + layer { name: "feat" - type: INNER_PRODUCT + type: "InnerProduct" bottom: "ip2" top: "feat" - blobs_lr: 1 - blobs_lr: 2 + param { + name: "feat_w" + lr_mult: 1 + } + param { + name: "feat_b" + lr_mult: 2 + } inner_product_param { num_output: 2 } @@ -64,17 +70,19 @@ earlier. Each entry in this database contains the image data for a pair of images (`pair_data`) and a binary label saying if they belong to the same class or different classes (`sim`). - layers { + layer { name: "pair_data" - type: DATA + type: "Data" top: "pair_data" top: "sim" - data_param { - source: "examples/siamese/mnist-siamese-train-leveldb" + include { phase: TRAIN } + transform_param { scale: 0.00390625 + } + data_param { + source: "examples/siamese/mnist_siamese_train_leveldb" batch_size: 64 } - include: { phase: TRAIN } } In order to pack a pair of images into the same blob in the database we pack one @@ -83,16 +91,16 @@ so we add a slice layer after the data layer. This takes the `pair_data` and slices it along the channel dimension so that we have a single image in `data` and its paired image in `data_p.` - layers { - name: "slice_pair" - type: SLICE - bottom: "pair_data" - top: "data" - top: "data_p" - slice_param { - slice_dim: 1 - slice_point: 1 - } + layer { + name: "slice_pair" + type: "Slice" + bottom: "pair_data" + top: "data" + top: "data_p" + slice_param { + slice_dim: 1 + slice_point: 1 + } } ### Building the First Side of the Siamese Net @@ -105,17 +113,17 @@ parameters allows Caffe to share the parameters between layers on both sides of the siamese net. In the definition this looks like: ... - param: "conv1_w" - param: "conv1_b" + param { name: "conv1_w" ... } + param { name: "conv1_b" ... } ... - param: "conv2_w" - param: "conv2_b" + param { name: "conv2_w" ... } + param { name: "conv2_b" ... } ... - param: "ip1_w" - param: "ip1_b" + param { name: "ip1_w" ... } + param { name: "ip1_b" ... } ... - param: "ip2_w" - param: "ip2_b" + param { name: "ip2_w" ... } + param { name: "ip2_b" ... } ... ### Building the Second Side of the Siamese Net @@ -133,9 +141,9 @@ an Invariant Mapping". This loss function encourages matching pairs to be close together in feature space while pushing non-matching pairs apart. This cost function is implemented with the `CONTRASTIVE_LOSS` layer: - layers { + layer { name: "loss" - type: CONTRASTIVE_LOSS + type: "ContrastiveLoss" contrastive_loss_param { margin: 1.0 } diff --git a/matlab/caffe/hdf5creation/demo.m b/matlab/caffe/hdf5creation/demo.m index f554b87e5f6..4f9f7b5a454 100644 --- a/matlab/caffe/hdf5creation/demo.m +++ b/matlab/caffe/hdf5creation/demo.m @@ -52,9 +52,9 @@ fprintf('HDF5 filename listed in %s \n', 'list.txt'); % NOTE: In net definition prototxt, use list.txt as input to HDF5_DATA as: -% layers { +% layer { % name: "data" -% type: HDF5_DATA +% type: "HDF5Data" % top: "data" % top: "labelvec" % hdf5_data_param {
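For reference, a complete `HDF5Data` layer in the new format looks like the following sketch; this is illustrative only, with the blob names and `list.txt` source taken from the comment above, while the batch size is a hypothetical value rather than part of `demo.m`:

    layer {
      name: "data"
      type: "HDF5Data"
      top: "data"
      top: "labelvec"
      hdf5_data_param {
        source: "list.txt"   # text file listing the HDF5 files to read
        batch_size: 64       # hypothetical value
      }
    }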