A Scanner is similar to Java's Stream but targeted at common operations for working with databases. Datarouter uses scanners internally and often returns them so the application can chain more operations to them.
A Scanner can be converted to a single-use Iterable with `.iterable()` or to a Stream with `.stream()`.
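For example, a minimal sketch of building a Scanner and converting it (illustrative values, assuming the dependency below is on the classpath):

```java
import java.util.List;

import io.datarouter.scanner.Scanner;

// Convert to a single-use Iterable
for(String letter : Scanner.of(List.of("a", "b", "c")).iterable()){
    System.out.println(letter);
}

// Or convert to a java.util.stream.Stream
long count = Scanner.of(List.of("a", "b", "c"))
        .stream()
        .count();// 3
```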
```xml
<dependency>
    <groupId>io.datarouter</groupId>
    <artifactId>datarouter-scanner</artifactId>
    <version>0.0.126</version>
</dependency>
```
These methods share behavior with those in Stream but are implemented independently:
- `map`
- `distinct`
- `sort` (Stream's `sorted`)
- `limit`
- `skip`
- `forEach`
- `reduce`
- `findMin` (Stream's `min`)
- `findMax` (Stream's `max`)
- `count`
- `anyMatch`
- `allMatch`
- `noneMatch`
- `findFirst`
- `empty`
- `of`
- `toArray`
- `concat`
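These chain just like their Stream counterparts. A small sketch with illustrative values:

```java
import java.util.List;

import io.datarouter.scanner.Scanner;

List<Integer> result = Scanner.of(List.of(3, 1, 2, 3, 1))
        .distinct()// drop duplicates
        .sort()// natural ordering
        .limit(2)// keep the first two items
        .list();// collect to a List: [1, 2]
```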
Compared to Stream:
- Not built into Java, so you must call `Scanner.of(something)` instead of `something.stream()`
- No primitive support
- Less overhead, as there are fewer objects involved
- Less focused on behind-the-scenes parallelism, for simplicity
- Scanner is missing `findAny()`, as it's equivalent to `findFirst()`
- More explicit parallelism on a step-by-step basis
  - Specify an executor and thread count for each parallel step
Scanner adds methods not found in Stream:
- `hasAny` - returns true when the first item is seen
- `isEmpty` - returns true if the scanner completes without seeing any items
- `findLast` - returns `Optional<T>` with the last item, if any found
- `collect` - uses the `Supplier<Collection>` to create a collection, then `add`s each item to it
  - equivalent to `stream.collect(Collectors.toCollection(TreeSet::new))`
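A sketch of these terminal operations (illustrative values):

```java
import java.util.List;
import java.util.Optional;
import java.util.TreeSet;

import io.datarouter.scanner.Scanner;

boolean hasAny = Scanner.of(List.of(1, 2, 3)).hasAny();// true
boolean isEmpty = Scanner.of(List.of()).isEmpty();// true
Optional<Integer> last = Scanner.of(List.of(1, 2, 3)).findLast();// Optional[3]
TreeSet<Integer> sorted = Scanner.of(List.of(3, 1, 2)).collect(TreeSet::new);
```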
- `list` - collect all items to a `List`
  - equivalent to `stream.toList()`
- `listTo` - collect all items to a `List` and pass it to a `Function`
  - equivalent to `stream.collect(Collectors.collectingAndThen(Collectors.toList(), function))`
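For example, `listTo` is convenient for handing the collected items to another method (sketch):

```java
import java.util.List;

import io.datarouter.scanner.Scanner;

List<String> letters = Scanner.of(List.of("a", "b")).list();// [a, b]
String joined = Scanner.of(List.of("a", "b"))
        .listTo(items -> String.join(",", items));// "a,b"
```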
- `toMap` - collect all items to a `Map`. By default existing values will be overwritten
  - `keyFunction` - required to extract the map key
- `groupBy` - collect all items to a `Map` where each value is a `Collection`, with 4 variants
  - `keyFunction` - required to extract the map key
  - `valueFunction` - optionally transform each item before collecting it in the map
  - `mapSupplier` - optional `Supplier<Map>` to replace the default `HashMap::new`
  - `collectionSupplier` - optional `Supplier<Collection>` to replace the default `ArrayList::new`
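A sketch of the single-argument variants (illustrative values; the exact generic return types may vary by variant):

```java
import java.util.List;
import java.util.Map;

import io.datarouter.scanner.Scanner;

// Key each word by its length; later values overwrite earlier ones
Map<Integer,String> byLength = Scanner.of(List.of("a", "bb", "cc"))
        .toMap(String::length);// {1=a, 2=cc}

// Group words by length, keeping every value
Map<Integer,List<String>> grouped = Scanner.of(List.of("a", "bb", "cc"))
        .groupBy(String::length);// {1=[a], 2=[bb, cc]}
```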
- `each` - each item is passed to a `Consumer`
  - unlike `Stream::peek`, all items are guaranteed to be consumed
- `flush` - all items are collected to a `List` and passed to a `Consumer`
  - the `Scanner` can be continued with the logic unchanged
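A sketch showing both, with illustrative values:

```java
import java.util.ArrayList;
import java.util.List;

import io.datarouter.scanner.Scanner;

List<String> log = new ArrayList<>();
List<Integer> result = Scanner.of(List.of(1, 2, 3))
        .each(i -> log.add("saw " + i))// every item is consumed, unlike Stream::peek
        .flush(all -> log.add("flushed " + all.size() + " items"))// sees the full List
        .list();// the scanner continues unchanged: [1, 2, 3]
```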
- `include` - keep items matching the `Predicate`
  - equivalent to `Stream::filter`
- `exclude` - discard items matching the `Predicate`
- `distinctBy` - remove items where the output of the function has already been seen
- `deduplicateConsecutive` - remove consecutive duplicates
  - as opposed to `distinct()`, which removes all duplicates
- `deduplicateConsecutiveBy` - remove items where the function maps to the previously mapped value
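A sketch of the filtering operations (illustrative values):

```java
import java.util.List;

import io.datarouter.scanner.Scanner;

List<Integer> consecutive = Scanner.of(List.of(1, 1, 2, 2, 1))
        .deduplicateConsecutive()
        .list();// [1, 2, 1] - only consecutive duplicates removed

List<Integer> evens = Scanner.of(List.of(1, 2, 3, 4))
        .include(i -> i % 2 == 0)// like Stream::filter
        .list();// [2, 4]
```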
- `advanceUntil` - terminate the Scanner when the `Predicate` passes
- `advanceWhile` - terminate the Scanner when the `Predicate` fails
  - equivalent to `Stream::takeWhile`
- `concat` - similar to Stream's `flatMap` or `concat`
  - outputs the contents of the first scanner, followed by the second, third, etc
  - efficient, requiring no memory buffering
- `collate` - no equivalent in Stream
  - assuming the input scanners are sorted, merges them into a sorted output scanner, useful for scanning partitioned tables
  - the first item of each scanner must be in memory (sometimes triggering a batch of items loaded into memory), potentially making this expensive with many input scanners
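A sketch of the flatMap-style `concat`, assuming it accepts a `Function` that returns a Scanner and that `Scanner.of` accepts a `List`:

```java
import java.util.List;

import io.datarouter.scanner.Scanner;

// Flatten a Scanner of Lists into a single Scanner of items
List<Integer> flattened = Scanner.of(List.of(List.of(1, 2), List.of(3, 4)))
        .concat(Scanner::of)// like Stream::flatMap
        .list();// [1, 2, 3, 4]
```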
- `take` - collect N items to a `List`
- `batch` - convert `Scanner<T>` to `Scanner<List<T>>` with batch size N
- `sample` - return every Nth item
- `retain` - convert `Scanner<T>` to `Scanner<RetainingGroup<T>>`, which gives access to the previous N items
- `prefetch` - load the next N items using the provided `ExecutorService`
- `shuffle` - collect the items internally and randomly select one of the remaining items on each `advance()`
- `splitBy` - split `Scanner<T>` into `Scanner<Scanner<T>>` based on the provided mapper `Function<T,R>`
- `apply` - apply the provided `Function`, which returns another Scanner; the returned Scanner is responsible for consuming the input Scanner
- `then` - pass the Scanner to a method that accepts it, and invoke the method; the method is responsible for terminating the Scanner
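For example, `batch` is handy for paged database writes (a sketch with illustrative values):

```java
import java.util.List;

import io.datarouter.scanner.Scanner;

// Group items into fixed-size batches
List<List<Integer>> batches = Scanner.of(List.of(1, 2, 3, 4, 5))
        .batch(2)
        .list();// [[1, 2], [3, 4], [5]]

// take collects the first N items directly to a List
List<Integer> firstTwo = Scanner.of(List.of(1, 2, 3, 4, 5)).take(2);// [1, 2]
```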
`Scanner` supports Java's comprehensive `Collector` library by internally converting to `Stream` before collecting.
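For example, assuming a `collect(Collector)` overload backs this behavior:

```java
import java.util.List;
import java.util.stream.Collectors;

import io.datarouter.scanner.Scanner;

String joined = Scanner.of(List.of("a", "b", "c"))
        .collect(Collectors.joining("-"));// "a-b-c"
```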
Calling `.parallelOrdered(..)` or `.parallelUnordered(..)` returns a `ParallelScanner`, which executes the next operation in an executor, useful when the operations are CPU or IO intensive.
- Specifying unordered allows the results to be reordered for better throughput, not blocking executor slots on a slow item.
- The `Threads` param gives the Scanner an `ExecutorService` and a thread count, where the thread count may be much lower than the underlying executor's max capacity, facilitating sharing an executor between many callers such as web requests.
- The `enabled` flag allows enabling or disabling the parallelism via a runtime setting.
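A sketch of an ordered parallel map; the `Threads` constructor and its import path are assumptions here, not confirmed API:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import io.datarouter.scanner.Scanner;
import io.datarouter.scanner.Threads;

ExecutorService exec = Executors.newFixedThreadPool(8);
List<Integer> squares = Scanner.of(List.of(1, 2, 3, 4))
        .parallelOrdered(new Threads(exec, 4))// use at most 4 of the executor's threads
        .map(i -> i * i)// runs in the executor
        .list();// [1, 4, 9, 16] - ordering preserved
exec.shutdown();
```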
It has these methods:
- `map`
- `exclude`
- `include`
- `each`
- `forEach`
This library is licensed under the Apache License, Version 2.0 - see LICENSE for details.