ArgMaxLayer with top k predictions #615

Merged
merged 10 commits into from
Jul 20, 2014

Conversation

kloudkl
Contributor

@kloudkl kloudkl commented Jul 4, 2014

To meet the need for multiple top predictions raised in #499 and #598, the argmax layer is extended to output the top k results.

Unlike the implementation of the top k accuracy layer in #531, the argmax layer doesn't assume the input probabilities to be sorted. It picks the top k results with a priority queue.
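For readers who want to see the idea in isolation, here is a minimal sketch of top-k selection with a priority queue over unsorted input. It is not the code from this PR: the function name `TopKWithHeap` and the (value, index) pair layout are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Returns the k largest (value, index) pairs of `scores`, best first.
// Illustrative sketch only; the names are hypothetical, not Caffe's.
template <typename Dtype>
std::vector<std::pair<Dtype, int> > TopKWithHeap(
    const std::vector<Dtype>& scores, std::size_t k) {
  typedef std::pair<Dtype, int> Entry;
  // Min-heap: the weakest of the current candidates sits at top().
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry> > heap;
  for (std::size_t i = 0; i < scores.size(); ++i) {
    heap.push(Entry(scores[i], static_cast<int>(i)));
    if (heap.size() > k) {
      heap.pop();  // Drop the weakest candidate so at most k remain.
    }
  }
  // Popping yields ascending order, so fill the result back to front.
  std::vector<Entry> result(heap.size());
  for (std::size_t j = result.size(); j > 0; --j) {
    result[j - 1] = heap.top();
    heap.pop();
  }
  return result;
}
```

Because the heap never holds more than k + 1 entries, the scan costs O(n log k) and does not require the input to be sorted.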

Candidate reviewers:
@sguada who authored #421 Argmax layer
@robwhess who authored #531 Top-k accuracy
@shuokay who asked #598 How to make top-k prediction

@kloudkl kloudkl mentioned this pull request Jul 4, 2014
blob_top_(new Blob<Dtype>()) {
: blob_bottom_(new Blob<Dtype>(10, 20, 1, 1)),
blob_top_(new Blob<Dtype>()),
top_k_(10) {

I would use top_k_(5)

@kloudkl
Contributor Author

kloudkl commented Jul 5, 2014

@sguada, thanks for the review! The new functionality is now tested more thoroughly and is ready to be merged.

for (int j = 0; j < top_k_; ++j) {
top_data[i * 2 * top_k_ + (top_k_ - 1 - j) * 2] =
top_k_results.top().first;
top_data[i * 2 * top_k_ + (top_k_ - 1 - j) * 2 + 1] =

@kloudkl, is there a cleaner way to index into top_data here and down on line 73? E.g. using Blob::offset() seems like it'd be a lot cleaner/easier to read.
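To make the suggestion concrete, the sketch below compares the hand-written flat index with an `offset(n, c, h, w)` style helper. Everything here is an assumption for illustration: `Shape4` is a hypothetical stand-in for a 4-D blob (it is not Caffe's `Blob` class), and the top blob shape (num, top_k, 2, 1) is assumed, since the thread does not state it.

```cpp
// Hypothetical stand-in for a 4-D blob shape; not Caffe's Blob class.
struct Shape4 {
  int num, channels, height, width;
  // Row-major flat index of element (n, c, h, w).
  int offset(int n, int c = 0, int h = 0, int w = 0) const {
    return ((n * channels + c) * height + h) * width + w;
  }
};

// The manual index arithmetic from the snippet under review:
// item i, output position `rank`, slot 0 or 1 within the pair.
int ManualIndex(int i, int rank, int slot, int top_k) {
  return i * 2 * top_k + rank * 2 + slot;
}
```

Under the assumed shape, `shape.offset(i, rank, slot)` and `ManualIndex(i, rank, slot, top_k)` refer to the same element, so the helper only changes readability, not the memory layout.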

@robwhess

robwhess commented Jul 8, 2014

@kloudkl, this may just be a biased preference for my own code, but for the sake of consistent implementation of the same functionality, I'd rather see the truncated insertion sort from the AccuracyLayer (accuracy_layer.cpp lines 46-55) used to find the top k probabilities and their corresponding indices here as well instead of the priority queue. It seems strange to me to implement the same thing two different ways in the same project, and I have a slight aesthetic preference for the code I wrote :)

@sguada
Contributor

sguada commented Jul 8, 2014

I agree with @robwhess: don't reinvent code; reuse as much as possible.

Contrary to your claim, #531 doesn't assume that the inputs are ordered.

@kloudkl
Contributor Author

kloudkl commented Jul 8, 2014

The sorting algorithm is only a small part of the whole system, so it may not be worth much optimization. In terms of readability and conciseness, std::sort is no doubt the winner. If you agree, I'd like to unify all the sorting code to reuse it directly rather than "reinvent new code".

Insertion sort

Best case: O(n + d), where d is the number of inversions
Average case: O(n * n)
Worst case: O(n * n)
Memory: O(1)

Heapsort

Best case: O(n * log n) 
Average case: O(n * log n)
Worst case: O(n * log n)
Memory: O(1)

std::sort

O(N·log(N)), where N = std::distance(first, last) comparisons on average. (until C++11)
O(N·log(N)), where N = std::distance(first, last) comparisons. (since C++11)

@robwhess

robwhess commented Jul 8, 2014

@kloudkl we don't need to sort all of the probabilities, just pick the top 5 (or k). That's what you're doing with the priority queue, and that's what I'm doing with the truncated insertion sort in AccuracyLayer. Both of our approaches are O(n) for a fixed k, though I doubt algorithmic complexity is a huge concern for this layer. I'm just suggesting we use the same approach in both places, and I prefer the truncated insertion sort because it's less bulky (no extra comparator class) and, I think, easier to read and understand.
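As a rough illustration of what a truncated insertion sort looks like, here is a minimal sketch. It is not the AccuracyLayer code referenced above; the function name `TopKTruncatedInsertion` and the (value, index) layout are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Keeps a descending list of at most k (value, index) pairs while scanning
// `scores` once. Illustrative sketch, not the AccuracyLayer implementation.
template <typename Dtype>
std::vector<std::pair<Dtype, int> > TopKTruncatedInsertion(
    const std::vector<Dtype>& scores, std::size_t k) {
  std::vector<std::pair<Dtype, int> > best;
  if (k == 0) {
    return best;
  }
  for (std::size_t i = 0; i < scores.size(); ++i) {
    if (best.size() == k && scores[i] <= best.back().first) {
      continue;  // Not better than the current k-th best; skip it.
    }
    if (best.size() < k) {
      best.push_back(std::make_pair(scores[i], static_cast<int>(i)));
    } else {
      best.back() = std::make_pair(scores[i], static_cast<int>(i));
    }
    // Move the new entry toward the front until the list is sorted again.
    for (std::size_t j = best.size() - 1;
         j > 0 && best[j].first > best[j - 1].first; --j) {
      std::swap(best[j], best[j - 1]);
    }
  }
  return best;
}
```

Each element is compared against at most k kept entries, so the cost is O(n * k), which is linear in n when k is a small constant such as 5.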

@shuokay

shuokay commented Jul 9, 2014

@kloudkl, @robwhess, I think std::partial_sort can help.
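A minimal sketch of how `std::partial_sort` can pick the top k entries is shown here. The helper names `PairValueGreater` and `TopKIndices` are hypothetical and chosen for illustration; only `std::partial_sort` itself is a standard library call.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Orders (index, value) pairs by descending value.
template <typename Dtype>
bool PairValueGreater(const std::pair<int, Dtype>& a,
                      const std::pair<int, Dtype>& b) {
  return a.second > b.second;
}

// Returns the indices of the k largest scores, best first.
template <typename Dtype>
std::vector<int> TopKIndices(const std::vector<Dtype>& scores, std::size_t k) {
  std::vector<std::pair<int, Dtype> > indexed;
  indexed.reserve(scores.size());
  for (std::size_t i = 0; i < scores.size(); ++i) {
    indexed.push_back(std::make_pair(static_cast<int>(i), scores[i]));
  }
  k = std::min(k, indexed.size());  // Guard against k > number of scores.
  // Only the first k positions end up sorted; the rest stay unordered.
  std::partial_sort(indexed.begin(), indexed.begin() + k, indexed.end(),
                    PairValueGreater<Dtype>);
  std::vector<int> top;
  for (std::size_t j = 0; j < k; ++j) {
    top.push_back(indexed[j].first);
  }
  return top;
}
```

For example, `TopKIndices(scores, 5)` would return the indices of the five highest scores without fully sorting the input.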

@kloudkl
Contributor Author

kloudkl commented Jul 9, 2014

@shuokay, very cool!

template<typename Dtype>
bool int_Dtype_pair_greater(std::pair<int, Dtype> a,
std::pair<int, Dtype> b) {
return a.second > b.second || (a.second == b.second && a.first > b.first);

@kloudkl is there a reason we need a stable sort here? Can't we drop the second term of the OR? It seems both unlikely that classes will have exactly the same probability and, if they do, unimportant that we keep them in the (arbitrary) order of the assigned indices. The second term seems to me to just add extra clutter to the code.
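The two comparator variants under discussion can be sketched side by side. The names `GreaterByValue` and `GreaterByValueThenIndex` are hypothetical; the second body mirrors the expression in the snippet being reviewed.

```cpp
#include <utility>

// Value-only ordering: the relative order of equal values is unspecified.
template <typename Dtype>
bool GreaterByValue(const std::pair<int, Dtype>& a,
                    const std::pair<int, Dtype>& b) {
  return a.second > b.second;
}

// Value ordering with an index tie-break: among equal values, the pair
// with the larger index is ordered first.
template <typename Dtype>
bool GreaterByValueThenIndex(const std::pair<int, Dtype>& a,
                             const std::pair<int, Dtype>& b) {
  return a.second > b.second || (a.second == b.second && a.first > b.first);
}
```

Strictly speaking, the second term makes the result deterministic when values tie rather than making the sort stable. Dropping it only changes the output when two classes have exactly equal scores.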

Oh, also, I just realized this was in common_layers.hpp. That doesn't seem like the right place for this. Can we just put it in argmax_layer.cpp as _int_Dtype_pair_greater()? I think that'd make more sense.

@robwhess

robwhess commented Jul 9, 2014

Yes, I like this solution quite a bit better than both the original solutions. I didn't know about partial_sort(). Would it be appropriate to make a corresponding change to the AccuracyLayer in this PR?

Other than that, the only concern I have with this PR is the line comment I made above.

@robwhess

robwhess commented Jul 9, 2014

Oh, one more thing. There's a lint error introduced:

src/caffe/layers/argmax_layer.cpp:39:  Add #include <utility> for pair<>  [build/include_what_you_use] [4]

@kloudkl, can you add #include <utility> to argmax_layer.cpp?

@kloudkl
Contributor Author

kloudkl commented Jul 10, 2014

@robwhess, everything you requested is in place.

@robwhess

@kloudkl, that all looks good. One more very minor request: can you make the int_Dtype_pair_greater() functions static and begin their names with an underscore so it's clear that they only have file scope?

@kloudkl
Contributor Author

kloudkl commented Jul 11, 2014

In Caffe, there is not a single function or method with a leading or trailing underscore. It's better to follow the convention.

@robwhess

OK. That's fine, though to be fair, there also aren't any static functions in the src/caffe/ portion of the code. It's just a personal preference, though. I'll run tests tomorrow.

@robwhess

Cool, tests are all passing for me.

@sguada, @shelhamer, @jeffdonahue, @longjon, @sergeyk, I think this PR can be merged.

@robwhess

Oh, wait, never mind, don't merge yet. @kloudkl, the lint error I mentioned above now also applies to accuracy_layer.cpp. Can you add #include <utility> there before this is merged?

@kloudkl
Contributor Author

kloudkl commented Jul 12, 2014

@robwhess, it is added. Thanks for your help!

@robwhess

Cool. I think we're ready to merge here.

@bhack
Contributor

bhack commented Jul 13, 2014

Could this be extended to support a vector of bottom blob labels instead of a single blob?

@robwhess

@bhack I think that's out of the scope of this PR. This should be merged, as is.

@shelhamer
Member

OK, looks good to me, but I'm traveling and only took a quick glance, so @longjon please review and merge.

Thanks for your work everybody!

@bhack
Contributor

bhack commented Jul 14, 2014

@kloudkl @robwhess Handling a vector of bottom blobs (i.e. multiple softmax outputs) could be useful for #596 and #680.

}
std::partial_sort(
bottom_data_vector.begin(), bottom_data_vector.begin() + top_k_,
bottom_data_vector.end(), int_Dtype_pair_greater<Dtype>);
// check if true label is in top k predictions
for (int k = 0; k < top_k_; k++)

Please use curly braces for loop bodies. (Although Google C++ style guide doesn't require it for single line statements, as far as I know we always use explicit curly braces in Caffe.)

@longjon
Contributor

longjon commented Jul 14, 2014

Looks good except as noted. I like using std::partial_sort; building vectors is probably far from an optimally efficient implementation, but there's no reason to worry about that for new functionality that's not a bottleneck.

@kloudkl
Contributor Author

kloudkl commented Jul 20, 2014

All done. Any more concerns?

@longjon longjon merged commit a928148 into BVLC:dev Jul 20, 2014
longjon added a commit that referenced this pull request Jul 20, 2014
@longjon
Contributor

longjon commented Jul 20, 2014

Merged. I took the liberty of removing some now-unused includes. Thanks @kloudkl and @robwhess for getting this done; I think we ended up with a nice, tight implementation.

@kloudkl kloudkl deleted the top-k-argmax branch July 21, 2014 09:16
mitmul pushed a commit to mitmul/caffe that referenced this pull request Sep 30, 2014
RazvanRanca pushed a commit to RazvanRanca/caffe that referenced this pull request Nov 4, 2014

7 participants