
int->size_t to support large datasets more than 2G instances #2473

Closed
wants to merge 1 commit into from

Conversation

buaaliyi
Contributor

When I was trying to use Caffe to train my large dataset (billions of instances), I found that the class 'SyncedMemory' uses the data type 'size_t' to allocate memory, while blob.count_ and blob.capacity_ are of type 'int'. As a result, the allocation size is cut off below 2G, and my experiment failed due to the resulting overflow.
This patch changes the data-size-related types from int to size_t, which guarantees the correct size on a 64-bit machine even when the dataset size is over 2 billion.

Thanks for the review.
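A minimal sketch of the overflow described above (the shape values are made up for illustration, not taken from the patch): when the element count is accumulated in an int, the product of the blob dimensions wraps once it exceeds INT_MAX, whereas promoting to size_t first keeps the full count on a 64-bit machine.

```cpp
#include <climits>
#include <cstddef>
#include <cstdio>

int main() {
  // Hypothetical shape for a blob holding billions of instances.
  const int num = 2000000000;  // already close to INT_MAX
  const int channels = 2, height = 1, width = 1;

  // With an int count_ (as in the current Blob), this product overflows:
  // int bad_count = num * channels * height * width;  // undefined behavior

  // Promoting to size_t before multiplying preserves the true element
  // count on a 64-bit machine, which SyncedMemory can then allocate.
  std::size_t count =
      static_cast<std::size_t>(num) * channels * height * width;
  std::printf("count = %zu elements (INT_MAX = %d)\n", count, INT_MAX);
  return 0;
}
```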

@mynameisfiber

Excited to see this patch pass the Travis tests... I'm running into the same issue!

tags
TAGS
*.tags
cscope.*
Contributor


This should not be part of this patch.

@jeffdonahue
Contributor

Thanks for the PR @buaaliyi, and for the reviewing efforts @flx42. I think we do eventually want to increase the blob size limit. A couple of comments:

  • In Blobs are N-D arrays (for N not necessarily equals 4) #1970 (comment) we effectively set a ceiling of INT64_MAX on future blob size -- see the discussion there for reasoning on not using unsigned ints (in short, there are places in the code that use negative dimensions as special values). So unless there is further discussion with good reasoning otherwise, this should use int64_t rather than size_t. (Maybe typedef'ing is the way to go? blob_size_t is so verbose though... a sketch of that idea follows below the list.)
  • There are a lot more places in the codebase that need to be updated before a PR increasing the blob size limit could be merged -- I think almost every C source file currently has ints based on the current max size of blobs lurking about.
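A minimal sketch of the typedef idea mentioned in the first point above, assuming a signed 64-bit type so that negative dimensions remain usable as special values; the name blob_size_t is only the placeholder floated in the comment, not a decided API, and this is not the actual Caffe Blob.

```cpp
#include <stdint.h>
#include <cstddef>
#include <vector>

// Hypothetical alias: signed, so sentinel values such as -1 still work,
// but wide enough for counts above INT_MAX (ceiling of INT64_MAX).
typedef int64_t blob_size_t;

// Only the bookkeeping that would change: the element count is
// accumulated in the wider signed type instead of int.
inline blob_size_t count_from_shape(const std::vector<blob_size_t>& shape) {
  blob_size_t count = 1;
  for (std::size_t i = 0; i < shape.size(); ++i) {
    count *= shape[i];  // exact as long as the product stays <= INT64_MAX
  }
  return count;
}
```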

@flx42
Contributor

flx42 commented May 17, 2015

Why not use size_t? Functions malloc and cudaMalloc* take size_t as the size argument. If int64_t is used instead, it would require additional range checks before allocating memory, especially on a 32-bit architecture, where we likely have sizeof(size_t) != sizeof(int64_t).
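An illustration of the extra check alluded to here, a sketch under the assumption that sizes were stored as int64_t; the helper name is hypothetical.

```cpp
#include <stdint.h>
#include <cstddef>
#include <cstdlib>
#include <limits>

// If blob sizes were int64_t, a 32-bit build (where size_t is 32 bits)
// would need to verify the value fits before calling malloc/cudaMalloc.
void* alloc_checked(int64_t byte_count) {
  if (byte_count < 0 ||
      static_cast<uint64_t>(byte_count) >
          std::numeric_limits<std::size_t>::max()) {
    return NULL;  // too large for this platform's allocator
  }
  return std::malloc(static_cast<std::size_t>(byte_count));
}
```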

@buaaliyi
Contributor Author

This patch has been updated based on @flx42's comments.

Thank you @jeffdonahue and @flx42 for your advice. Let me do a further check to fix the other places that depend on the current blob max size, besides MemoryDataLayer.

@buaaliyi buaaliyi force-pushed the master branch 2 times, most recently from 1552c20 to 2581f18 Compare May 18, 2015 07:40
@jmcq

jmcq commented Oct 6, 2015

I need this same change. I had just filed #3159 in error because I did not search properly, and will close it if I can.

I would resolve this by using ssize_t (signed size_t) everywhere you use int to hold a size but are willing to forgo the highest bit in order to keep special negative values available. Wherever you are using unsigned int, you could use size_t. This should be a straightforward change, at least for g++ on Linux, though it will probably touch most files.
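A small sketch of that substitution rule, assuming g++ on Linux as noted; the struct and field names are illustrative only, not actual Caffe code.

```cpp
#include <sys/types.h>  // ssize_t is POSIX; available with g++ on Linux
#include <cstddef>      // std::size_t

// Hypothetical bookkeeping struct, only to show the proposed rule.
struct BlobBookkeeping {
  // Was `int`: may hold a negative sentinel, so it becomes ssize_t
  // (one bit of range given up, values like -1 stay legal).
  ssize_t count_;
  // Was `unsigned int`: a genuine non-negative size, so it becomes size_t.
  std::size_t capacity_;

  BlobBookkeeping() : count_(-1), capacity_(0) {}
};
```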

I can prepare a CPU-tested change set for a pull request if one of the developers is willing to consider it.
