What is the max # of repos I can put in a single data repo? Mostly curious: 20, 200, 2000, 20000, etc.?

Replies: 1 comment

(Assuming what you are asking is how many Git repos can run through Nosey Parker using a single datastore.) We have internally run through several different multi-terabyte input corpora. The one I'm most familiar with is about 2 TiB from 8k Git repos, resulting in a 4.3 GB datastore database file with 105k findings and 4.2M matches. To scan that, I use machines with between 12 and 32 vCPUs and 64-128 GB of RAM. The limit that would likely be hit first is main memory use: at present, a set of SHA-1 hashes for every file encountered is held in main memory. That's at least 20 bytes per distinct file, so ~20 GiB if you saw 1 billion distinct files. I would expect that Nosey Parker would scale to tens of thousands of input Git repos in a single datastore.
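To make that back-of-envelope arithmetic concrete, here is a minimal Rust sketch of the memory lower bound. This is not Nosey Parker's actual code; the function name and the flat 20-bytes-per-digest figure are illustrative assumptions that ignore hash-set and allocator overhead.

```rust
// Lower-bound estimate of the memory needed to hold one 20-byte SHA-1
// digest per distinct file. Real usage is higher once hash-set and
// allocator overhead are counted.
fn sha1_set_lower_bound_bytes(distinct_files: u64) -> u64 {
    const SHA1_DIGEST_BYTES: u64 = 20;
    distinct_files * SHA1_DIGEST_BYTES
}

fn main() {
    let distinct_files: u64 = 1_000_000_000; // 1 billion distinct files
    let bytes = sha1_set_lower_bound_bytes(distinct_files);
    let gib = bytes as f64 / (1024.0 * 1024.0 * 1024.0);
    // 1e9 files * 20 bytes = 20 GB, which is about 18.6 GiB, on the
    // order of the ~20 GiB figure quoted above.
    println!("{distinct_files} distinct files -> at least {gib:.1} GiB of SHA-1 hashes");
}
```

The takeaway is that memory scales with the number of distinct files seen rather than with the repo count itself, so distinct-file volume is the number to watch when sizing a machine for a single datastore.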