I wanted to share a solution I recently developed to have allow parsing of Flat File (CSV) and application of different operations:
- reading/skipping some lines
- transposing (row-oriented to column-oriented)
- getting list of unique values from column
- intelligent computation of indicators
!!! All the CSV files need to be comma separated. !!!
Please download the latest library, create a custom library (Scripting):
Then deploy the library in your runtime.
All the scenario take the assumptions that the current Document contains Data in Flat File format.
You can use iparse.readFixedLines([inputStream],[first line index],[last line index])
For instance:
import java.util.Properties;
import java.io.InputStream;
import com.boomi.proserv.iparser.*;
import java.io.ByteArrayInputStream;
IParser iparser = new IParser();
for( int i = 0; i < dataContext.getDataCount(); i++ ) {
InputStream is = dataContext.getStream(i);
Properties props = dataContext.getProperties(i);
is = iparser.transpose(is);
dataContext.storeStream(is, props);
}
You can use iparse.transpose()
For instance:
import java.util.Properties;
import java.io.InputStream;
import com.boomi.proserv.iparser.*;
import java.io.ByteArrayInputStream;
IParser iparser = new IParser();
for( int i = 0; i < dataContext.getDataCount(); i++ ) {
InputStream is = dataContext.getStream(i);
Properties props = dataContext.getProperties(i);
is = iparser.transpose(is);
dataContext.storeStream(is, props);
}
Set<String> listOfPossibleValues = iparser.getPossiblesValues(is, "A");
!!! All the CSV files need to be comma separated. !!!
Sample of Data to processed
A:Gender | B:AgeCategory | C:Departement | D:FTETotal | E:TraineeTotal |
---|---|---|---|---|
MAN | age<30 Yrs | iPaaS | 10 | 15 |
MAN | age>30 Yrs | iPaaS | 70 | 2 |
WOMAN | age<30 Yrs | iPaaS | 15 | 7 |
WOMAN | age>30 Yrs | iPaaS | 70 | 0 |
WOMAN | age>30 Yrs | Data Science | 1 | 1 |
MAN | age<30 Yrs | ETL | 5 | 1 |
MAN | age>30 Yrs | ETL | 35 | 0 |
WOMAN | age<30 Yrs | ETL | 7 | 0 |
WOMAN | age>30 Yrs | ETL | 25 | 0 |
Configuration file
Type | Name | Label | Filter | Operation | OperationDetails |
---|---|---|---|---|---|
Indicator | ID1 | Number of men with age below 30 years old | A=MAN,B=.*<30 Yrs | sum | |
Indicator | ID2 | Number of women with age below 30 years old | A=WOMAN,B=.*<30 Yrs | sum | |
Virtual | ID3 | Total number with age below 30 years old | compute | ID1+ID2 | |
Indicator | ID4 | Total number of trainees | sum(E) | ||
Indicator | ID5 | Total number of trainees in iPaaS | C=iPaaS.* | sum(E) | |
Indicator | ID6 | Total number of trainees not in iPaaS | C=(?!.iPaaS.).* | sum(E) |
- Type: Indicator or Virtual (value is used for display only - not used for processing)
- Name: Name of the indicator
- Label: Description of the descriptor
- Filter: Regular expression (one or several separated with comma) You have to put the column letter and the regular expression to apply (column A equals to MAN will be A=MAN)
- Operation: sum, div, avg, min, max, count or compute (for Virtual indicator) If no parameter is provided (for instance sum) the operation will be applied to all the data column, otherwise a column need to be provided (for instance sum(D))
- OperationDetails: only for Virtual indicator, when an calculation on severals Indicator (or Virtual Indicator) need to be applied
is = iparser.processFile([configurationFile location], inputstream, [dataColumnStart], [dataColumnEnd]);
import java.util.Properties;
import java.io.InputStream;
import com.boomi.proserv.iparser.*;
import java.io.ByteArrayInputStream;
IParser iparser = new IParser();
for( int i = 0; i < dataContext.getDataCount(); i++ ) {
InputStream is = dataContext.getStream(i);
Properties props = dataContext.getProperties(i);
is = iparser.processFile(
"work/iParser_Configuration_Headcount_Indicators.csv"
, is, "D", "E");
dataContext.storeStream(is, props);
}
Output:
"Indicator","ID1",31.0
"Indicator","ID2",29.0
"Virtual","ID3",60.0
"Indicator","ID4",26.0
"Indicator","ID5",24.0
"Indicator","ID6",2.0
You can processFileGroupBy and provide an additional parameter with is the GroupByColumn, here "C"
is = iparser.processFileGroupBy(
"work/iParser_Configuration_Headcount_Indicators.csv"
, is, "D", "E",
"C");
Output:
"Indicator","ID1","C=Data Science",0.0
"Indicator","ID2","C=Data Science",0.0
"Virtual","ID3","C=Data Science",0.0
"Indicator","ID4","C=Data Science",1.0
"Indicator","ID5","C=Data Science",1.0
"Indicator","ID6","C=Data Science",1.0
"Indicator","ID1","C=iPaaS",25.0
"Indicator","ID2","C=iPaaS",22.0
"Virtual","ID3","C=iPaaS",47.0
"Indicator","ID4","C=iPaaS",24.0
"Indicator","ID5","C=iPaaS",24.0
"Indicator","ID6","C=iPaaS",24.0
"Indicator","ID1","C=ETL",6.0
"Indicator","ID2","C=ETL",7.0
"Virtual","ID3","C=ETL",13.0
"Indicator","ID4","C=ETL",1.0
"Indicator","ID5","C=ETL",1.0
"Indicator","ID6","C=ETL",1.0
You can use a filter with standard equals:
A=MAN
You can have a filter with different conditions (all need to be satisfied)
A=MAN,B=.*<30 Yrs
You can have a filter with a negation using negative lookahead ?! using the following syntax: (?!.expressionNotEqualTo.).*
C=(?!.*iPaaS.*).*