Auntie


Auntie, my dear ultra-fast module for untying/splitting/counting a stream of data by a chosen separator sequence.

It uses Bop under the hood, a Boyer-Moore parser, optimized for sequence lengths up to 255 bytes.
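To give a feel for what a Boyer-Moore style search does, here is a minimal sketch of the simplified Horspool variant (bad-character rule only). It is not Bop's code: `horspoolSearch` is a hypothetical name, and storing the shifts in a `Uint8Array` is only an illustrative choice that happens to work because a pattern of at most 255 bytes never needs a shift larger than 255.

```javascript
// Illustrative Boyer-Moore-Horspool search over Buffers.
// Hypothetical helper, NOT the Bop implementation.
function horspoolSearch(pattern, data) {
  const m = pattern.length;
  if (m === 0 || m > 255) {
    throw new RangeError('pattern length must be between 1 and 255');
  }
  // Bad-character table: how far the window may slide for each byte value.
  const shift = new Uint8Array(256).fill(m);
  for (let i = 0; i < m - 1; i++) {
    shift[pattern[i]] = m - 1 - i;
  }
  const found = [];
  let pos = 0;
  while (pos + m <= data.length) {
    // Compare the window against the pattern from right to left.
    let j = m - 1;
    while (j >= 0 && data[pos + j] === pattern[j]) j--;
    if (j < 0) found.push(pos);
    // Slide according to the byte aligned with the pattern's last position.
    pos += shift[data[pos + m - 1]];
  }
  return found;
}

const hits = horspoolSearch(Buffer.from('\r\n'), Buffer.from('a\r\nbb\r\nc'));
console.log(hits); // [ 1, 5 ]
```

The table lets the search skip ahead by more than one byte whenever the byte under the window's last position cannot end a match there.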


Install

$ npm install auntie [-g]

require:

const Auntie  = require( 'auntie' );

Run Tests

to run all test files, install devDependencies:

 $ cd auntie/
 # install or update devDependencies
 $ npm install 
 # run tests
 $ npm test

to execute a single test file, simply do:

 $ node test/file-name.js

output example and running time:

...
- current path is 'test'.
- time elapsed: 106.596 secs.

  26 test files were loaded.
  26 test files were launched.
  1272671 assertions succeeded.

Run Benchmarks

$ cd auntie/
$ npm run bench

to execute a single bench file, simply do:

 $ node bench/file-name.js

Constructor

Arguments between [ ] are optional.

Auntie( [ Buffer | String | Number sequence ] )

or

new Auntie( [ Buffer | String | Number sequence ] )

NOTE: default is the CRLF sequence \r\n.

Properties

NOTE: do not modify these properties directly.

The current sequence for splitting data
Auntie.seq : Buffer
the Boyer-Moore parser, under the hood.
Auntie.bop : Bop
a Boyer-Moore parser, to search for generic (sub)sequences
Auntie.gbop : Bop
the remaining data, without any match found.
Auntie.snip : Buffer
the remaining data, used for counting.
Auntie.csnip : Buffer
the current number of matches, min/max distance, remaining bytes.
Auntie.cnt : Array

Methods

name    description
count   count (only) how many times the sequence appears in the current data.
dist    count occurrences, min and max distance between sequences and remaining bytes.
do      split data or a stream of data by the current sequence.
flush   flush the remaining data, resetting internal state/counters.
set     set a new sequence for splitting data.
comb    search for a char or a sequence in the current data.

Arguments between [ ] are optional.

Auntie.count

the fastest/lightest way to count how many times the sequence appears in the current data.
/*
 * it returns an Array with the current number of occurrences.
 * 
 * NOTE: it saves the minimum necessary data that does not contain
 * the sequence, for the next #count call with fresh data (to check
 * for single occurrences between 2 chunks of data).
 */
'count' : function ( Buffer data ) : Array
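The chunk-boundary behaviour described in the note can be sketched with nothing but Node's built-in `Buffer` API. This is not Auntie's implementation: `createCounter` is a hypothetical helper, and keeping at most `seq.length - 1` trailing bytes is an assumption about what "the minimum necessary data" means.

```javascript
// Hypothetical helper, NOT part of Auntie: counts separator occurrences
// across successive chunks using only the built-in Buffer API.
function createCounter(seq = Buffer.from('\r\n')) {
  let tail = Buffer.alloc(0); // unmatched bytes carried over from the last call
  let total = 0;
  return function count(chunk) {
    const data = Buffer.concat([tail, chunk]);
    let from = 0;
    let at;
    while ((at = data.indexOf(seq, from)) !== -1) {
      total += 1;
      from = at + seq.length; // count non-overlapping matches
    }
    // Keep at most seq.length - 1 unmatched bytes: just enough to complete
    // a separator that is split between this chunk and the next one.
    const keepFrom = Math.max(from, data.length - (seq.length - 1));
    tail = data.subarray(keepFrom);
    return [total];
  };
}

const count = createCounter();
const first = count(Buffer.from('a,b\r\nc,d\r')); // CR arrives here...
const second = count(Buffer.from('\ne,f'));       // ...LF arrives here
console.log(first, second); // [ 1 ] [ 2 ]
```

The second call reports 2 because the trailing `\r` saved from the first chunk completes a CRLF with the `\n` that opens the second chunk.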

Auntie.dist

count occurrences, min and max distance between sequences and remaining bytes.
/*
 * it returns an Array with:
 * - the current number of occurrences 
 * - the minimum distance, in bytes, between any 2 sequences
 * - the maximum distance, in bytes, between any 2 sequences
 * - the remaining bytes to the end of data (without any matching sequence)
 * 
 * NOTE:
 * - the distance from index 0 to the first match is also considered
 * - it saves the remaining data that does not contain the sequence,
 *   for the next #dist call with fresh data (to check for occurrences
 *   between chunks).
 */
'dist' : function ( Buffer data ) : Array
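As a rough illustration of the four values listed above, the sketch below computes them for a single Buffer with the built-in `Buffer.indexOf`. It is not Auntie's code and makes several assumptions: `dist` here is a standalone hypothetical function, "distance" is taken to mean the number of bytes between the end of one match and the start of the next, nothing is carried over between calls, and `-1` is used for min and max when nothing matches.

```javascript
// Hypothetical single-buffer sketch, NOT Auntie's implementation.
// Returns [ occurrences, min distance, max distance, remaining bytes ].
function dist(seq, data) {
  let occurrences = 0;
  let min = Infinity;
  let max = -1;
  let from = 0; // index just past the previous match (0 before any match)
  let at;
  while ((at = data.indexOf(seq, from)) !== -1) {
    const gap = at - from; // bytes between the previous match and this one
    if (gap < min) min = gap;
    if (gap > max) max = gap;
    occurrences += 1;
    from = at + seq.length;
  }
  const remaining = data.length - from; // trailing bytes with no match
  return [occurrences, occurrences > 0 ? min : -1, max, remaining];
}

const stats = dist(Buffer.from('\r\n'), Buffer.from('ab\r\ncdef\r\n\r\ngh'));
console.log(stats); // [ 3, 0, 4, 2 ]
```

In this input the gaps are 2 (`ab`, measured from index 0), 4 (`cdef`) and 0 (two adjacent CRLF sequences), and 2 bytes (`gh`) remain after the last match.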

Auntie.do

split data or a stream of data by the current sequence
/*
 * if collect is true, it returns an Array of data slices; otherwise, it
 * emits a 'snap' event for every slice; then, once it has finished
 * parsing the data, it emits a 'snip' event with the remaining data that
 * does not contain the sequence ( the current Auntie.snip property ).
 *
 * NOTE: it saves the remaining data that does not contain the
 * sequence, for the next #do call on fresh data (to check for
 * occurrences between chunks).
 */
'do' : function ( Buffer data [, Boolean collect ] ) : [ Array ]
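The collect-style behaviour (return the slices, keep the unmatched remainder for the next call) can be sketched with the built-in `Buffer` API. This is only an illustration under stated assumptions, not Auntie's code: `createSplitter`, `split` and `rest` are hypothetical names, and the whole unmatched tail is kept between calls.

```javascript
// Hypothetical sketch, NOT Auntie's implementation: split successive chunks
// by a separator, carrying the unmatched tail over to the next call.
function createSplitter(seq = Buffer.from('\r\n')) {
  let snip = Buffer.alloc(0); // remaining data with no match yet
  return {
    split(chunk) {
      const data = Buffer.concat([snip, chunk]);
      const slices = [];
      let from = 0;
      let at;
      while ((at = data.indexOf(seq, from)) !== -1) {
        slices.push(data.subarray(from, at)); // the data before the separator
        from = at + seq.length;
      }
      snip = data.subarray(from); // saved for the next call
      return slices;
    },
    rest() {
      return snip;
    },
  };
}

const splitter = createSplitter();
const a = splitter.split(Buffer.from('one\r\ntwo\r')).map(String);
const b = splitter.split(Buffer.from('\nthree')).map(String);
console.log(a, b, splitter.rest().toString()); // [ 'one' ] [ 'two' ] three
```

The slice `two` only appears on the second call, because its terminating CRLF is split across the two chunks.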

Auntie.flush

flush the remaining data, resetting internal state/counters
/*
 * if collect is true, it returns a Buffer; otherwise, it emits
 * a 'snip' event with the data. The snip never contains the
 * sequence (no match). It is equivalent to getting and resetting
 * the internal snip property (me.snip).
 */
'flush' : function ( [ Boolean collect ] ) : [ Buffer ]

Auntie.set

set a new sequence for splitting data.
// the default sequence is CRLF ('\r\n').
'set' : function ( [ Buffer | String | Number sequence ] ) : Auntie

Auntie.comb

search for a char or a sequence in the current data.
/*
 * parse current data for a generic sequence. It returns an Array of indexes.
 * NOTE: it doesn't affect the current streaming parser and it doesn't save
 * any data. It simply parses a chunk of data for the specified sequence,
 * optionally from a starting index and limiting results to a specified number
 * of occurrences (like Bop.parse does).
 */
'comb' : function ( Buffer | String seq, Buffer data [, Number from [, Number limit ] ] ) : Array
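A stateless search with an optional starting index and result limit can be approximated with `Buffer.indexOf`. The sketch below is a hypothetical stand-in, not Auntie's or Bop's code; in particular, reporting only non-overlapping matches and returning an empty Array for an empty sequence are assumptions.

```javascript
// Hypothetical stand-in, NOT Auntie's implementation: collect the indexes
// of a sequence in a Buffer, optionally from an offset and up to a limit.
function comb(seq, data, from = 0, limit = Infinity) {
  const needle = Buffer.isBuffer(seq) ? seq : Buffer.from(String(seq));
  const indexes = [];
  if (needle.length === 0) return indexes; // nothing to search for
  let at = data.indexOf(needle, from);
  while (at !== -1 && indexes.length < limit) {
    indexes.push(at);
    at = data.indexOf(needle, at + needle.length); // skip past this match
  }
  return indexes;
}

const csv = Buffer.from('a,b,c,d');
console.log(comb(',', csv));       // [ 1, 3, 5 ]
console.log(comb(',', csv, 2));    // [ 3, 5 ]
console.log(comb(',', csv, 0, 2)); // [ 1, 3 ]
```

Because no state is kept between calls, the same data can be searched repeatedly for different sequences.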

Events

Auntie emits only 2 types of events: snap and snip.

snap: a result (one slice of data).
'snap' : function ( Buffer result )
snip: the current remaining data (with no match found).
'snip' : function ( Buffer result )

NOTE: if the 'collect' switch is set (true) for do/flush, no event is emitted.
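The event flow can be mimicked with Node's own `EventEmitter`. The class below is a hypothetical stand-in, not Auntie itself: `LineEmitter` and `feed` are illustrative names, and only the 'snap' and 'snip' event names come from this document.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in, NOT Auntie: emits 'snap' for every slice found
// and 'snip' with whatever is left once a chunk has been parsed.
class LineEmitter extends EventEmitter {
  constructor(seq = Buffer.from('\r\n')) {
    super();
    this.seq = seq;
    this.rest = Buffer.alloc(0); // unmatched data kept between chunks
  }

  feed(chunk) {
    const data = Buffer.concat([this.rest, chunk]);
    let from = 0;
    let at;
    while ((at = data.indexOf(this.seq, from)) !== -1) {
      this.emit('snap', data.subarray(from, at));
      from = at + this.seq.length;
    }
    this.rest = data.subarray(from);
    this.emit('snip', this.rest);
  }
}

const emitter = new LineEmitter();
const events = [];
emitter.on('snap', (buf) => events.push(['snap', buf.toString()]));
emitter.on('snip', (buf) => events.push(['snip', buf.toString()]));
emitter.feed(Buffer.from('a\r\nb\r\nc'));
console.log(events); // [ [ 'snap', 'a' ], [ 'snap', 'b' ], [ 'snip', 'c' ] ]
```

Listeners receive Buffers, so converting to a string (as done here for logging) is left to the consumer.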


Examples

split lines from a CSV file (CRLF):

count lines from a file (CRLF):

snap event and collect (CRLF):

See All examples.

MIT License

Copyright (c) 2017-present < Guglielmo Ferri : 44gatti@gmail.com >

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the 'Software'), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
