-
-
Notifications
You must be signed in to change notification settings - Fork 43
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Improve readStream to support more records
Previously, the WOF importer loaded all records into memory in one stream, and then processed and indexed the records in Elasticsearch in a second stream after the first stream was done. This has several problems: * It requires that all data can fit into memory. While this is not _so_bad for WOF admin data, where a reasonably new machine can handle things just fine, it's horrible for venue data, where there are already 10s of millions of records. * Its slower: by separating the disk and network I/O sections, they can't be interleaved to speed things up. * It doesn't give good feedback when running the importer that something is happening: the importer sits for several minutes loading records before the dbclient progress logs start displaying This change fixes all those issues, by processing all records in a single stream, starting at the highest hierarchy level, and finishing at the lowest, so that all records always have the admin data they need to be processed. Fixes #101 Fixes #7 Connects #94
- Loading branch information
1 parent
4b2e1a6
commit 7215708
Showing
4 changed files
with
85 additions
and
49 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters