ArgMaxLayer with top k predictions #615

kloudkl · 2014-07-04T09:29:20Z

To meet the need for multiple top predictions in #499 and #598, the ArgMaxLayer is extended to output the top k results.

Unlike the implementation of the top-k accuracy layer in #531, the argmax layer doesn't assume that the input probabilities are sorted. It picks the top k results with a priority queue.

Candidate reviewers:
@sguada who authored #421 Argmax layer
@robwhess who authored #531 Top-k accuracy
@shuokay who asked #598 How to make top-k prediction

sguada · 2014-07-04T16:54:11Z

src/caffe/test/test_argmax_layer.cpp

-        blob_top_(new Blob<Dtype>()) {
+      : blob_bottom_(new Blob<Dtype>(10, 20, 1, 1)),
+        blob_top_(new Blob<Dtype>()),
+        top_k_(10) {


I would use top_k_(5)

kloudkl · 2014-07-05T09:21:08Z

@sguada, thanks for your reviewing efforts! The new functionality is tested more thoroughly and ready to be merged!

robwhess · 2014-07-08T04:51:54Z

src/caffe/layers/argmax_layer.cpp

+      for (int j = 0; j < top_k_; ++j) {
+        top_data[i * 2 * top_k_ + (top_k_ - 1 - j) * 2] =
+            top_k_results.top().first;
+        top_data[i * 2 * top_k_ + (top_k_ - 1 - j) * 2 + 1] =


@kloudkl, is there a cleaner way to index into top_data here and down on line 73? E.g. using Blob::offset() seems like it'd be a lot cleaner/easier to read.
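A sketch of what the Blob::offset() suggestion amounts to, assuming the top blob is shaped (num, top_k, 2, 1) so that each sample stores top_k interleaved (index, value) pairs. BlobShape only mimics the row-major formula behind Caffe's Blob::offset(n, c, h, w); it is not Caffe's class, and manual_index and offset_index are hypothetical helpers that compute the same position both ways.

```cpp
#include <cassert>

// Hypothetical stand-in for Caffe's Blob::offset(n, c, h, w), assuming the
// usual row-major (num, channels, height, width) layout.
struct BlobShape {
  int num, channels, height, width;
  int offset(int n, int c = 0, int h = 0, int w = 0) const {
    return ((n * channels + c) * height + h) * width + w;
  }
};

// Hand-written index from the diff: slot j of sample i, component 0 (index)
// or 1 (value), with top_k (index, value) pairs stored per sample.
inline int manual_index(int i, int j, int component, int top_k) {
  return i * 2 * top_k + (top_k - 1 - j) * 2 + component;
}

// Same position expressed through offset(), assuming a top shape of
// (num, top_k, 2, 1): channel = slot, height = component.
inline int offset_index(const BlobShape& top, int i, int j, int component) {
  return top.offset(i, top.channels - 1 - j, component);
}
```

Under that assumed shape, the hand-written expression for slot j becomes offset(i, top_k - 1 - j, 0) for the index and offset(i, top_k - 1 - j, 1) for the value.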

robwhess · 2014-07-08T05:10:40Z

@kloudkl, this may just be a biased preference for my own code, but for the sake of consistent implementation of the same functionality, I'd rather see the truncated insertion sort from the AccuracyLayer (accuracy_layer.cpp lines 46-55) used to find the top k probabilities and their corresponding indices here as well instead of the priority queue. It seems strange to me to implement the same thing two different ways in the same project, and I have a slight aesthetic preference for the code I wrote :)
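For readers unfamiliar with the technique, a minimal sketch of a truncated insertion sort for top-k selection. This is not the AccuracyLayer code referenced above; top_k_insertion is a hypothetical name, and the sketch assumes the scores arrive in a std::vector.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Truncated insertion sort: keep the k largest (score, index) pairs seen so
// far in descending order. A new score is inserted only if it beats the
// current k-th best, so the cost is O(n * k) comparisons.
template <typename Dtype>
std::vector<std::pair<Dtype, int> > top_k_insertion(
    const std::vector<Dtype>& probs, int k) {
  std::vector<std::pair<Dtype, int> > best;  // sorted, largest first
  if (k <= 0) {
    return best;
  }
  best.reserve(k);
  for (int i = 0; i < static_cast<int>(probs.size()); ++i) {
    const bool full = static_cast<int>(best.size()) == k;
    if (full && probs[i] <= best.back().first) {
      continue;  // not better than the current k-th best
    }
    if (full) {
      best.back() = std::make_pair(probs[i], i);  // evict the k-th best
    } else {
      best.push_back(std::make_pair(probs[i], i));
    }
    // Shift the new entry left until the list is descending again.
    for (int j = static_cast<int>(best.size()) - 1;
         j > 0 && best[j].first > best[j - 1].first; --j) {
      std::swap(best[j], best[j - 1]);
    }
  }
  return best;
}
```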

sguada · 2014-07-08T05:18:03Z

I agree with @robwhess: don't reinvent new code; try to reuse as much as possible.

Contrary to your claim, #531 doesn't assume that the inputs are ordered.

kloudkl · 2014-07-08T12:19:57Z

The sorting algorithm is only a small part of the whole system, so it may not be worth much optimization. In terms of readability and conciseness, std::sort is no doubt the winner. If you agree, I'd like to unify all the sorting code to reuse it directly rather than "reinvent new code".

Insertion sort

Best case: O(n) (in general O(n + d), where d is the number of inversions)
Average case: O(n * n)
Worst case: O(n * n)
Memory: O(1)

Heapsort

Best case: O(n * log n) 
Average case: O(n * log n)
Worst case: O(n * log n)
Memory: O(1)

std::sort

O(N·log(N)) comparisons on average, where N = std::distance(first, last) (until C++11)
O(N·log(N)) comparisons, where N = std::distance(first, last) (since C++11)
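A sketch of the full-sort approach proposed here: pair every score with its index, std::sort the whole vector in descending order, and keep the first k entries. top_k_full_sort is a hypothetical name; the cost is O(n log n) comparisons no matter how small k is, which is the point raised in the next reply.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Full sort, then keep the first k indices.
template <typename Dtype>
std::vector<int> top_k_full_sort(const std::vector<Dtype>& probs, int k) {
  std::vector<std::pair<Dtype, int> > scored;  // (score, index)
  for (int i = 0; i < static_cast<int>(probs.size()); ++i) {
    scored.push_back(std::make_pair(probs[i], i));
  }
  // std::greater on pairs compares .first, then .second: descending score,
  // with ties broken toward the larger index.
  std::sort(scored.begin(), scored.end(),
            std::greater<std::pair<Dtype, int> >());
  std::vector<int> indices;
  for (int j = 0; j < k && j < static_cast<int>(scored.size()); ++j) {
    indices.push_back(scored[j].second);
  }
  return indices;
}
```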

robwhess · 2014-07-08T16:06:06Z

@kloudkl we don't need to sort all of the probabilities, just pick the top 5 (or k). That's what you're doing with the priority queue, and that's what I'm doing with the truncated insertion sort in AccuracyLayer. Both of our approaches are O(n), though I doubt algorithmic complexity is a huge concern for this layer. I'm just suggesting we use the same approach both places, and I prefer the truncated insertion sort because it's less bulky (no extra comparator class), and I think easier to read and understand.
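The priority-queue approach can be sketched as a bounded min-heap holding at most k (score, index) pairs. This is one common way to do it, not necessarily how the original commit was written; top_k_heap is a hypothetical name. It runs in O(n log k) time with O(k) extra memory, so for a fixed k it is linear in n, consistent with the O(n) claim above.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Bounded min-heap: the root is the weakest of the current top k and is
// evicted whenever a better score arrives.
template <typename Dtype>
std::vector<int> top_k_heap(const std::vector<Dtype>& probs, int k) {
  typedef std::pair<Dtype, int> Entry;  // (score, index)
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > heap;
  for (int i = 0; i < static_cast<int>(probs.size()); ++i) {
    if (static_cast<int>(heap.size()) < k) {
      heap.push(Entry(probs[i], i));
    } else if (k > 0 && probs[i] > heap.top().first) {
      heap.pop();
      heap.push(Entry(probs[i], i));
    }
  }
  // Popping yields ascending scores, so fill the result from the back.
  std::vector<int> indices(heap.size());
  for (int j = static_cast<int>(indices.size()) - 1; j >= 0; --j) {
    indices[j] = heap.top().second;
    heap.pop();
  }
  return indices;
}
```

Because a min-heap pops in ascending order, results have to be written back to front, which may explain the (top_k_ - 1 - j) indexing in the earlier diff excerpt.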

shuokay · 2014-07-09T09:02:32Z

@kloudkl, @robwhess, I think std::partial_sort can help.
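A minimal sketch of the std::partial_sort approach: only the first k positions end up sorted, the remaining elements are left in unspecified order, and the cost is roughly O(n log k) comparisons. The pairs are (index, score), matching the std::pair<int, Dtype> used in the PR; top_k_partial_sort and pair_score_greater are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Higher score first.
template <typename Dtype>
bool pair_score_greater(const std::pair<int, Dtype>& a,
                        const std::pair<int, Dtype>& b) {
  return a.second > b.second;
}

template <typename Dtype>
std::vector<int> top_k_partial_sort(const std::vector<Dtype>& probs, int k) {
  std::vector<std::pair<int, Dtype> > scored;  // (index, score)
  for (int i = 0; i < static_cast<int>(probs.size()); ++i) {
    scored.push_back(std::make_pair(i, probs[i]));
  }
  // Clamp k to [0, n] so begin() + k stays inside the vector.
  k = std::max(0, std::min(k, static_cast<int>(scored.size())));
  std::partial_sort(scored.begin(), scored.begin() + k, scored.end(),
                    pair_score_greater<Dtype>);
  std::vector<int> indices;
  for (int j = 0; j < k; ++j) {
    indices.push_back(scored[j].first);
  }
  return indices;
}
```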

kloudkl · 2014-07-09T09:31:05Z

@shuokay, very cool!

robwhess · 2014-07-09T21:26:40Z

include/caffe/common_layers.hpp

+template<typename Dtype>
+bool int_Dtype_pair_greater(std::pair<int, Dtype> a,
+                            std::pair<int, Dtype> b) {
+  return a.second > b.second || (a.second == b.second && a.first > b.first);


@kloudkl is there a reason we need a stable sort here? Can't we drop the second term of the OR here? It seems both unlikely that two classes will have exactly the same probability and, if they do, unimportant that we keep them in the (arbitrary) order of their assigned indices. The second term just seems to me to add extra clutter to the code.

Oh, also, I just realized this was in common_layers.hpp. That doesn't seem like the right place for this. Can we just put it in argmax_layer.cpp as _int_Dtype_pair_greater()? I think that'd make more sense.
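The effect of the second term of the OR can be seen in a small sketch. With it, the comparator orders any two pairs with distinct indices, so tied scores come out deterministically (larger index first); without it, tied entries may appear in either order, because std::partial_sort is not stable. The comparators are renamed copies (taking const references) of the one quoted above, and top_1_index is a hypothetical helper.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// As quoted in the diff: higher score first, ties broken toward larger index.
template <typename Dtype>
bool greater_with_tiebreak(const std::pair<int, Dtype>& a,
                           const std::pair<int, Dtype>& b) {
  return a.second > b.second || (a.second == b.second && a.first > b.first);
}

// Variant with the second term dropped, as suggested in review.
template <typename Dtype>
bool greater_score_only(const std::pair<int, Dtype>& a,
                        const std::pair<int, Dtype>& b) {
  return a.second > b.second;
}

// Index ranked first after partial_sort; scored must be non-empty.
template <typename Dtype>
int top_1_index(std::vector<std::pair<int, Dtype> > scored,
                bool (*cmp)(const std::pair<int, Dtype>&,
                            const std::pair<int, Dtype>&)) {
  std::partial_sort(scored.begin(), scored.begin() + 1, scored.end(), cmp);
  return scored[0].first;
}
```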

robwhess · 2014-07-09T21:28:25Z

Yes, I like this solution quite a bit better than both the original solutions. I didn't know about partial_sort(). Would it be appropriate to make a corresponding change to the AccuracyLayer in this PR?

Other than that, the only concern I have with this PR is the line comment I made above.

robwhess · 2014-07-09T22:16:40Z

Oh, one more thing. There's a lint error introduced:

src/caffe/layers/argmax_layer.cpp:39:  Add #include <utility> for pair<>  [build/include_what_you_use] [4]

@kloudkl, can you add #include <utility> to argmax_layer.cpp?

kloudkl · 2014-07-10T01:39:54Z

@robwhess, everything you requested is in place.

robwhess · 2014-07-10T19:51:20Z

@kloudkl, that all looks good. One more very minor request: can you make the int_Dtype_pair_greater() functions static and begin their names with an underscore so it's clear that they only have file scope?

kloudkl · 2014-07-11T02:24:20Z

In Caffe, there is not a single function or method with a leading or trailing underscore. It's better to follow the convention.

robwhess · 2014-07-11T03:48:26Z

OK. That's fine, though to be fair, there also aren't any static functions in the src/caffe/ portion of the code. It's just a personal preference, though. I'll run tests tomorrow.

robwhess · 2014-07-11T21:16:30Z

Cool, tests are all passing for me.

@sguada, @shelhamer, @jeffdonahue, @longjon, @sergeyk, I think this PR can be merged.

robwhess · 2014-07-11T21:18:45Z

Oh, wait, never mind, don't merge yet. @kloudkl, the lint error I mentioned above now also applies to accuracy_layer.cpp. Can you add #include <utility> there before this is merged?

kloudkl · 2014-07-12T16:31:35Z

@robwhess, it is added. Thanks for your help!

robwhess · 2014-07-12T17:42:51Z

Cool. I think we're ready to merge here.

bhack · 2014-07-13T16:23:58Z

Could this be extended to support a vector of bottom label blobs instead of a single blob?

robwhess · 2014-07-14T02:20:10Z

@bhack I think that's out of the scope of this PR. This should be merged, as is.

shelhamer · 2014-07-14T08:10:51Z

Ok looks good to me, but I'm traveling and only took a quick glance so @longjon please review and merge.

Thanks for your work everybody!

bhack · 2014-07-14T11:11:12Z

@kloudkl @robwhess Handling a vector of bottom blobs (i.e., multiple softmax outputs) could be useful for #596 and #680.

longjon · 2014-07-14T20:10:09Z

src/caffe/layers/accuracy_layer.cpp

    }
+    std::partial_sort(
+        bottom_data_vector.begin(), bottom_data_vector.begin() + top_k_,
+        bottom_data_vector.end(), int_Dtype_pair_greater<Dtype>);
    // check if true label is in top k predictions
    for (int k = 0; k < top_k_; k++)


Please use curly braces for loop bodies. (Although the Google C++ style guide doesn't require them for single-line statements, as far as I know we always use explicit curly braces in Caffe.)
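For reference, a sketch of the top-k check with the loop body braced. It follows the shape of the excerpt (partial_sort over (index, score) pairs, then a scan of the first top_k entries for the true label) but is a standalone function; is_label_in_top_k, pair_greater and their parameters are hypothetical, not Caffe's AccuracyLayer API.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// File-scope comparator: higher score first.
template <typename Dtype>
static bool pair_greater(const std::pair<int, Dtype>& a,
                         const std::pair<int, Dtype>& b) {
  return a.second > b.second;
}

// Returns true if true_label is among the top_k highest scores.
template <typename Dtype>
bool is_label_in_top_k(const std::vector<Dtype>& scores, int true_label,
                       int top_k) {
  std::vector<std::pair<int, Dtype> > bottom_data_vector;
  for (int j = 0; j < static_cast<int>(scores.size()); ++j) {
    bottom_data_vector.push_back(std::make_pair(j, scores[j]));
  }
  top_k = std::max(0, std::min(top_k, static_cast<int>(scores.size())));
  std::partial_sort(
      bottom_data_vector.begin(), bottom_data_vector.begin() + top_k,
      bottom_data_vector.end(), pair_greater<Dtype>);
  // check if true label is in top k predictions
  for (int k = 0; k < top_k; k++) {
    if (bottom_data_vector[k].first == true_label) {
      return true;
    }
  }
  return false;
}
```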

longjon · 2014-07-14T21:27:46Z

Looks good except as noted. I like using std::partial_sort; building vectors is probably far from an optimally efficient implementation, but there's no reason to worry about that for new functionality that's not a bottleneck.

kloudkl · 2014-07-20T06:43:23Z

All done. Any more concerns?

longjon · 2014-07-20T10:24:22Z

Merged. I took the liberty of removing some now-unused includes. Thanks @kloudkl and @robwhess for getting this done; I think we ended up with a nice, tight implementation.

kloudkl mentioned this pull request Jul 4, 2014

How to make top-k predict #598

Closed

sguada reviewed Jul 4, 2014

robwhess reviewed Jul 8, 2014

robwhess reviewed Jul 9, 2014

shelhamer assigned longjon Jul 14, 2014

longjon reviewed Jul 14, 2014

kloudkl added 10 commits July 20, 2014 00:26

Extend the ArgMaxLayer to output top k results

7722514

Add the test cases for the mulitple top predictions argmax layer

b9a9c58

Simplify the top-k argmax layer using std::sort

dfe69b2

Use std::partial_sort in the ArgMaxLayer as suggested by @shuokay

4697b3f

Move compararing function from common_layers to argmax_layer

7b1a2ca

Refactor the accuracy layer with std::partial_sort

907c78a

Add more test cases for the accuracy layer

6abdb00

Limit the comparison functions to have file scope

cf26171

Include <utility> for pair in the accuracy layer

fe3f6aa

Fix style issues in accuracy & argmax layer

a928148

longjon merged commit a928148 into BVLC:dev Jul 20, 2014

longjon added a commit that referenced this pull request Jul 20, 2014

Merge pull request #615 from kloudkl/top-k-argmax

4bcd614

kloudkl deleted the top-k-argmax branch July 21, 2014 09:16

mitmul pushed a commit to mitmul/caffe that referenced this pull request Sep 30, 2014

Merge pull request BVLC#615 from kloudkl/top-k-argmax

0757478

RazvanRanca pushed a commit to RazvanRanca/caffe that referenced this pull request Nov 4, 2014

Merge pull request BVLC#615 from kloudkl/top-k-argmax

ae1e861

PENGUINLIONG mentioned this pull request Sep 15, 2017

Test accuracy changes with test batch size #5621

Open
