
int->size_t to support large datasets more than 2G instances #2473

Closed
wants to merge 1 commit into from

Conversation

buaaliyi
Contributor

When I was trying to use Caffe to train my large dataset (billions of instances), I found that the class 'SyncedMemory' uses the data type 'size_t' to allocate memory, while blob.count_ and blob.capacity_ are of type 'int'. As a result, the allocation size is cut off below 2G, and my experiment failed due to the resulting overflow.
This patch changes the data-size-related types from int to size_t, which guarantees the correct size on a 64-bit machine even when the dataset size is over 2 billion.

Thanks for the review.
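A minimal sketch of the overflow described above (the shape values are made up for illustration, not taken from the patch): when the element count is accumulated in an int, the product of the blob dimensions wraps once it exceeds INT_MAX, whereas promoting to size_t first keeps the full count on a 64-bit machine.

```cpp
#include <climits>
#include <cstddef>
#include <cstdio>

int main() {
  // Hypothetical shape for a blob holding billions of instances.
  const int num = 2000000000;  // already close to INT_MAX
  const int channels = 2, height = 1, width = 1;

  // With an int count_ (as in the current Blob), this product overflows:
  // int bad_count = num * channels * height * width;  // undefined behavior

  // Promoting to size_t before multiplying preserves the true element
  // count on a 64-bit machine, which SyncedMemory can then allocate.
  std::size_t count =
      static_cast<std::size_t>(num) * channels * height * width;
  std::printf("count = %zu elements (INT_MAX = %d)\n", count, INT_MAX);
  return 0;
}
```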

@mynameisfiber

Excited to see this patch pass the Travis tests... I'm running into the same issue!

tags
TAGS
*.tags
cscope.*
Contributor


This should not be part of this patch.

@jeffdonahue
Contributor

Thanks for the PR @buaaliyi, and for the reviewing efforts @flx42. I think we do eventually want to increase the blob size limit. A couple of comments:

  • In Blobs are N-D arrays (for N not necessarily equals 4) #1970 (comment) we effectively set a ceiling of INT64_MAX on future blob size -- see the discussion there for reasoning on not using unsigned ints (in short, there are places in the code that use negative dimensions as special values). So unless there is further discussion with good reasoning otherwise, this should use int64_t rather than size_t. (Maybe typedef'ing is the way to go? blob_size_t is so verbose though... a sketch of that idea follows below the list.)
  • There are a lot more places in the codebase that need to be updated before a PR increasing the blob size limit could be merged -- I think almost every C source file currently has ints based on the current max size of blobs lurking about.
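A minimal sketch of the typedef idea mentioned in the first point above, assuming a signed 64-bit type so that negative dimensions remain usable as special values; the name blob_size_t is only the placeholder floated in the comment, not a decided API, and this is not the actual Caffe Blob.

```cpp
#include <stdint.h>
#include <cstddef>
#include <vector>

// Hypothetical alias: signed, so sentinel values such as -1 still work,
// but wide enough for counts above INT_MAX (ceiling of INT64_MAX).
typedef int64_t blob_size_t;

// Only the bookkeeping that would change: the element count is
// accumulated in the wider signed type instead of int.
inline blob_size_t count_from_shape(const std::vector<blob_size_t>& shape) {
  blob_size_t count = 1;
  for (std::size_t i = 0; i < shape.size(); ++i) {
    count *= shape[i];  // exact as long as the product stays <= INT64_MAX
  }
  return count;
}
```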

@flx42
Contributor

flx42 commented May 17, 2015

Why not use size_t? Functions malloc and cudaMalloc* take size_t as the size argument. If int64_t is used instead, it would require additional range checks before allocating memory, especially on a 32-bit architecture, where we likely have sizeof(size_t) != sizeof(int64_t).
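An illustration of the extra check alluded to here, a sketch under the assumption that sizes were stored as int64_t; the helper name is hypothetical.

```cpp
#include <stdint.h>
#include <cstddef>
#include <cstdlib>
#include <limits>

// If blob sizes were int64_t, a 32-bit build (where size_t is 32 bits)
// would need to verify the value fits before calling malloc/cudaMalloc.
void* alloc_checked(int64_t byte_count) {
  if (byte_count < 0 ||
      static_cast<uint64_t>(byte_count) >
          std::numeric_limits<std::size_t>::max()) {
    return NULL;  // too large for this platform's allocator
  }
  return std::malloc(static_cast<std::size_t>(byte_count));
}
```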

@buaaliyi
Contributor Author

This patch has been updated based on @flx42's comments.

Thank you @jeffdonahue and @flx42 for your advice. Let me do a further check to fix the other places that depend on the current blob max size, besides MemoryDataLayer.

@buaaliyi buaaliyi force-pushed the master branch 2 times, most recently from 1552c20 to 2581f18 Compare May 18, 2015 07:40
@jmcq

jmcq commented Oct 6, 2015

I need this same change. I had just filed #3159 in error because I did not search properly, and will close it if I can.

I would resolve this by using ssize_t (signed size_t) everywhere you use int to hold a size but are willing to forgo the highest bit in order to keep special negative values available. Wherever you are using unsigned int, you could use size_t. This should be a straightforward change, at least for g++ on Linux, though it will probably touch most files.
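A small sketch of that substitution rule, assuming g++ on Linux as noted; the struct and field names are illustrative only, not actual Caffe code.

```cpp
#include <sys/types.h>  // ssize_t is POSIX; available with g++ on Linux
#include <cstddef>      // std::size_t

// Hypothetical bookkeeping struct, only to show the proposed rule.
struct BlobBookkeeping {
  // Was `int`: may hold a negative sentinel, so it becomes ssize_t
  // (one bit of range given up, values like -1 stay legal).
  ssize_t count_;
  // Was `unsigned int`: a genuine non-negative size, so it becomes size_t.
  std::size_t capacity_;

  BlobBookkeeping() : count_(-1), capacity_(0) {}
};
```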

I can prepare a CPU-tested change set for a pull request if one of the developers is willing to consider it.
