A utility for splitting any text file in several different ways. The largest file confirmed to work was 60 GB.

Datakido/TextFileSplitter

TextFileSplitter

This utility was originally created as part of an integration between internal databases at Ancestry.com and DREAMmail, an external email campaign management system, and has since evolved into what it is today. It was originally hosted at the SystemWidgets website, which died after one too many problems. The Datakido organization has taken over its care.

The GUI cannot be open sourced because it contains commercial controls. It will be hosted at http://datakido.com once that site is configured.

ALERT!

You must have .NET 4.0 installed in order for this utility to work.

Command Line:

Usage: TextFileSplitter [options] -i= -o=

Options:

  • -h= Tells the processor to insert the header into each file chunk. If no number is assigned to this parameter, one header line is assumed.
  • -splitstrategy: Tells the processor which strategy to use:
      • ls: Use the split by line strategy.
      • kbs: Use the split by size strategy.
      • boundary: Use the split by text boundary strategy.
      • regex: Tells the processor to check for a regex boundary.
      • topchunk: Use the split off one chunk strategy.
  • -filepattern= Tells the processor to name each file chunk using this pattern.
  • -boundaryasfilename Used in conjunction with the boundary and regex strategies. Uses the boundary as the filename.
  • -omitboundary Used with the regex and boundary strategies. Omits the boundary text/line.
  • -testcontains Used in conjunction with the boundary strategy. The match should be partial.
  • -testliteral Used in conjunction with the boundary strategy. The match should be literal.

Example command line for splitting a file by size:

    -i=C:\Temp\MonthlyUpdate20070625.txt -o=C:\Temp -h -splitstrategy:kbs:100000

Example command line for splitting a file by line count:

    -i=C:\Temp\MonthlyUpdate20070625.txt -o=C:\Temp -h -splitstrategy:ls:1000

Example command line for splitting a file by line count with 3 header lines:

    -i=C:\Temp\MonthlyUpdate20070625.txt -o=C:\Temp -h=3 -splitstrategy:ls:1000
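To illustrate what the line strategy combined with -h does, here is a minimal Python sketch. It is not the utility's actual code: the function name split_by_lines, its parameters, and the choice not to count header lines toward the chunk size are all assumptions made for illustration. Output names follow the dash-and-sequence convention described in this README.

```python
from itertools import islice
from pathlib import Path


def split_by_lines(src, out_dir, lines_per_chunk, header_lines=0):
    """Hypothetical helper: write chunks of `lines_per_chunk` data lines,
    repeating the first `header_lines` lines at the top of every chunk."""
    src, out_dir = Path(src), Path(out_dir)
    written = []
    # newline="" keeps the original line endings untouched.
    with src.open("r", encoding="utf-8", newline="") as reader:
        header = list(islice(reader, header_lines))
        sequence = 0
        while True:
            # Stream one chunk at a time so large files never sit fully in memory.
            body = list(islice(reader, lines_per_chunk))
            if not body:
                break
            sequence += 1
            target = out_dir / f"{src.stem}-{sequence}{src.suffix}"
            with target.open("w", encoding="utf-8", newline="") as writer:
                writer.writelines(header)
                writer.writelines(body)
            written.append(target)
    return written
```

Calling split_by_lines("MonthlyUpdate20070625.txt", ".", 1000, header_lines=3) would roughly correspond to -h=3 -splitstrategy:ls:1000 under these assumptions.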

Example command line for splitting a file using a text boundary with partial match:

    -i=C:\Temp\MonthlyUpdate20070625.txt -o=C:\Temp -splitstrategy:boundary:A001 -testcontains

Example command line for splitting a file using a text boundary with literal match:

    -i=C:\Temp\MonthlyUpdate20070625.txt -o=C:\Temp -splitstrategy:boundary:A001 -testliteral
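The difference between -testcontains and -testliteral can be sketched as follows. This README does not spell out the exact matching rules, so the sketch assumes that a partial match means the boundary text appears anywhere in a line, that a literal match means the line equals the boundary text exactly, and that a boundary line starts a new chunk. The function name split_on_boundary and its parameters are hypothetical.

```python
def split_on_boundary(lines, boundary, partial=True, omit_boundary=False):
    """Hypothetical helper: group lines into chunks, starting a new chunk
    at every line that matches `boundary`."""
    chunks = []
    current = []
    for line in lines:
        text = line.rstrip("\r\n")
        # Assumed semantics: partial = substring test, literal = whole-line equality.
        is_boundary = (boundary in text) if partial else (text == boundary)
        if is_boundary:
            if current:
                chunks.append(current)
            # With omit_boundary the matching line itself is dropped.
            current = [] if omit_boundary else [line]
        else:
            current.append(line)
    if current:
        chunks.append(current)
    return chunks
```

With the lines "A001 first", "x", "A001 second", "y", a partial match on A001 gives two chunks, while a literal match gives one, because no line is exactly A001.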

Example command line for splitting a file using a regular expression:

    -i=C:\Temp\MonthlyUpdate20070625.txt -o=C:\Temp -splitstrategy:regex:^A001|

Example command line using file name patterns:

    -i=C:\Temp\MonthlyUpdate20070625.txt -o=C:\Temp -filepattern:[FILENAME]_[SEQUENCE:0000]

Result: MonthlyUpdate20070625_0001

Example command line for splitting a file using a regular expression and the boundary as the file name:

    -i=C:\Temp\MonthlyUpdate20070625.txt -o=C:\Temp -splitstrategy:regex:^[ -boundaryasfilename

Each chunk's file name is the original name followed by a dash and the chunk's number in the sequence. Using the filename above, it would look like this:

    MonthlyUpdate20070625-1.txt
    MonthlyUpdate20070625-2.txt
    MonthlyUpdate20070625-3.txt

File naming conventions can use the following tokens to customize the filename for each chunk. THESE ARE CASE SENSITIVE!

  • [FILENAME] - The name of the file without the extension
  • [SEQUENCE:0] - The file chunk number. Each 0 is a placeholder for a digit.
  • [DATE:format] - This will allow you to add a date in the format that you specify.
  • [EXT] - The file's extension.
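As a rough illustration of how these tokens might expand, the Python sketch below substitutes each token in a pattern. It is only a sketch: the function name expand_pattern is hypothetical, [EXT] is assumed to exclude the leading dot, and [DATE:format] is handled with Python strftime codes as a stand-in, since the format syntax the utility actually accepts is not documented here.

```python
import re
from datetime import date
from pathlib import Path

# Tokens are matched case-sensitively, as the README requires.
TOKEN = re.compile(r"\[(FILENAME|EXT|SEQUENCE|DATE)(?::([^\]]*))?\]")


def expand_pattern(pattern, source_name, sequence, today=None):
    """Hypothetical helper: expand the naming tokens for one chunk."""
    source = Path(source_name)
    today = today or date.today()

    def replace(match):
        token, arg = match.group(1), match.group(2)
        if token == "FILENAME":
            return source.stem
        if token == "EXT":
            return source.suffix.lstrip(".")  # assumption: no leading dot
        if token == "SEQUENCE":
            # Each 0 in the argument is one digit of zero padding.
            return str(sequence).zfill(len(arg or "0"))
        # DATE: strftime codes are an assumption of this sketch.
        return today.strftime(arg or "%Y%m%d")

    return TOKEN.sub(replace, pattern)
```

For example, expand_pattern("[FILENAME]_[SEQUENCE:0000]", "MonthlyUpdate20070625.txt", 1) returns MonthlyUpdate20070625_0001, matching the result shown in the file name pattern example.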

Parameters can be prefixed with /, -, or --. For example, -i, --i, and /i are all valid.

Parameter values can be delimited using = or :. For example, h=3 and h:3 are both valid.
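The prefix and delimiter rules can be captured in a small parser. The Python sketch below is an illustration only, not the utility's parser; the name parse_argument is hypothetical. It splits on the first = or : so that values such as C:\Temp or ls:1000 keep their own colons.

```python
import re

# Prefix is --, - or /; the name is letters only; the value is optional
# and starts after the first = or :.
ARGUMENT = re.compile(r"^(?:--|-|/)([A-Za-z]+)(?:[=:](.*))?$")


def parse_argument(raw):
    """Hypothetical helper: return (name, value); value is None for bare flags."""
    match = ARGUMENT.match(raw)
    if match is None:
        raise ValueError(f"not a recognised parameter: {raw!r}")
    return match.group(1), match.group(2)
```

Under these rules, -h=3 and /h:3 both parse to ("h", "3"), and -splitstrategy:ls:1000 parses to ("splitstrategy", "ls:1000").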
