Releases: alibaba/Alink
Alink version 1.5.1
- Improve the performance of the deep learning (dl) module.
- Resolve many issues on the Windows platform.
- Add an incremental training mode for LR, Softmax, etc.
- Improve the performance of graph-based random walk algorithms.
Alink version 1.5.0
Alink version 1.4.0
- Adapt to Flink 1.13.
- Fix some bugs.
- Add some feature engineering methods.
- Refine the documentation of BatchOp/StreamOp.
- Add Java demos.
Alink version 1.3.2
Alink version 1.3.1
- Adapt to Flink 1.12.
- Add the Kafka plugin.
- Add the S3 file system.
- Add the ODPS catalog.
- Fix Poisson and add GLM model info.
- Support multiple files in the pipeline loader and local predictor loader.
- Use the legacy serializer for compatibility with the old Ak format.
- Change the vector type to CompositeType and the sparse vector to a POJO type.
- Remove REGEXP_REPLACE from the SQL selector for Flink 1.12.
Alink version 1.3.0
- Add more model info batch ops and support printing model info in pipeline models.
- Add a recommendation module.
- Supported recommenders are:
- ALS
- Factorization Machines
- ItemCF
- UserCF
- Other supported functions for the recommendation module are:
- Leave k-object out
- Leave top k-object out
- Ranking evaluation
- Multi-Label evaluation
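The leave-k-object-out split listed above can be sketched in a few lines: for each user, hold out k interactions for evaluation and train on the rest. This is a generic illustration of the idea, assuming a dict of user-to-items input; it is not Alink's implementation or API.

```python
import random

def leave_k_out(user_items, k, seed=0):
    """Hold out k items per user for testing; keep the rest for training.
    A generic sketch of leave-k-object-out, not Alink's implementation."""
    rng = random.Random(seed)
    train, test = {}, {}
    for user, items in user_items.items():
        items = list(items)
        rng.shuffle(items)          # randomize which items are held out
        test[user] = items[:k]      # k held-out items per user
        train[user] = items[k:]     # remaining items for training
    return train, test
```

Ranking evaluation then compares each recommender's top-k list against the held-out items.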
- Add online learning algorithms.
- FTRL model filter
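The FTRL family of online learners referenced above follows the FTRL-Proximal update of McMahan et al. The sketch below shows one per-example step for logistic loss; it is a minimal illustration of the algorithm, not Alink's FtrlTrainStreamOp, and all names here are illustrative.

```python
import math

def ftrl_update(z, n, w, x, y, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
    """One FTRL-Proximal step for logistic loss.
    z, n: per-coordinate accumulators; w: lazily computed weights;
    x: sparse example as {feature: value}; y: label in {0, 1}."""
    # lazily recompute weights for the active features (L1 gives sparsity)
    for i in x:
        if abs(z[i]) <= l1:
            w[i] = 0.0
        else:
            sign = 1.0 if z[i] > 0 else -1.0
            w[i] = -(z[i] - sign * l1) / ((beta + math.sqrt(n[i])) / alpha + l2)
    # predict with the current weights
    p = 1.0 / (1.0 + math.exp(-sum(w[i] * v for i, v in x.items())))
    # update accumulators with the gradient of the logistic loss
    for i, v in x.items():
        g = (p - y) * v
        sigma = (math.sqrt(n[i] + g * g) - math.sqrt(n[i])) / alpha
        z[i] += g - sigma * w[i]
        n[i] += g * g
    return p
```

In a streaming setting, this step runs once per arriving example, which is what makes incremental/online training possible.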
- Add a series of similarity algorithms.
- VectorNearestNeighbor
- TextSimilarity
- TextNearestNeighbor
- TextApproxNearestNeighbor
- StringSimilarity
- StringNearestNeighbor
- StringApproxNearestNeighbor
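String similarity and nearest-neighbor operators like those above are typically built on edit-distance-style metrics. As a concrete example, here is the classic dynamic-programming Levenshtein distance; this is a textbook sketch of one such metric, not the specific distance Alink uses.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b, computed row by row
    so only two rows of the DP table are kept in memory."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                    # deletion
                           cur[j - 1] + 1,                 # insertion
                           prev[j - 1] + (ca != cb)))      # substitution
        prev = cur
    return prev[-1]
```

A nearest-neighbor op then ranks candidate strings by this distance; the "Approx" variants trade exactness for speed on large candidate sets.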
- Add DocWordCountBatchOp, KeywordsExtractionBatchOp, TfidfBatchOp and WordCountBatchOp
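The TF-IDF weighting behind ops like TfidfBatchOp scores a word by its frequency in a document, discounted by how many documents contain it. The sketch below uses a common smoothed IDF variant; the exact formula Alink uses may differ, so treat this as an illustration of the idea only.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF per (document, word) over tokenized docs.
    docs: list of token lists. Returns one {word: score} dict per doc."""
    n = len(docs)
    df = Counter()                       # document frequency per word
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        scores.append({w: (c / total) * math.log((1 + n) / (1 + df[w]))
                       for w, c in tf.items()})
    return scores
```

Words appearing in every document get a score of zero under this smoothing, while rarer words are weighted up.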
- Add KNN
- Add GeoKMeans and streaming KMeans
- Add model selection algorithms.
- RandomSearchCV
- RandomSearchTVSplit
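Random search, the strategy behind RandomSearchCV and RandomSearchTVSplit, samples parameter combinations at random and keeps the best-scoring one. The sketch below shows the core loop under the assumption of a user-supplied train/evaluate callback; it is not Alink's API.

```python
import random

def random_search(train_eval, param_space, n_trials=20, seed=0):
    """Sample n_trials random parameter combinations and return the best.
    train_eval: callable taking a params dict and returning a score
                (higher is better);
    param_space: dict mapping each param name to a list of candidate values."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {k: rng.choice(v) for k, v in param_space.items()}
        score = train_eval(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

The CV variant scores each sample by cross-validation; the TVSplit variant scores it on a single train/validation split, which is cheaper.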
- Add plugin support for file systems and catalogs. Add catalogs for Hive, MySQL, Derby and SQLite.
- PyAlink:
- Align with new functionalities on the Java side, including new operators, catalogs, the plugin mechanism, and so on.
- For Flink version 1.9, PyAlink now depends on PyFlink directly, which enables flink run and table-related operations.
- Fix some issues, optimize performance and add more parameters in linear and tree models
- Add test utils module and optimize performance of unit tests.
- Remove the db module.
- Refine the save/load in pipeline and pipeline model. Use Ak as the default format for save/load.
- Support loading a LocalPredictor from an Ak file saved on a filesystem. This avoids a collect when loading the LocalPredictor. See #78 #79
- Add multi-threading in all mappers
- Optimize memory usage of batch prediction.
- Add pseudoInverse for matrices
- Support sparse vectors without a specified size
- Fix a sequencing issue when calling linkFrom on the model info batch op
- Optimize the format of lazy print.
- Add Stopwatch and TimeSpan
- Add serialVersionUID in all serializable classes.
Alink version 1.2.0
- Adapt to Flink 1.11
- Add Factorization Machines classification and regression #115
- Support Lazy APIs for higher user interactivity and richer information.
Lazy APIs enable intermediate outputs of the ML pipeline to be printed, collected, and post-processed along with the main stream of data processing. Such intermediate outputs include: ML models and training information, evaluation metrics, data statistics, etc.
- PyAlink supported
- Support Lazy APIs for BatchOperators and related methods in EstimatorBase/TransformerBase #116
- Add model information:
- Linear model #118 #132
- Tree model #125
- PCA #117
- ChisqSelector #117
- VectorChisqSelector #117
- KMeans #120
- BisectingKMeans #120
- NaiveBayes #122
- Lda #122
- GaussianMixture #120
- OneHotEncoder #120
- QuantileDiscretizer #120
- MinMaxScaler #122
- VectorMinMaxScaler #122
- MaxAbsScaler #122
- VectorMaxAbsScaler #122
- StandardScaler #122
- VectorStandardScaler #122
- Add training information:
- word2vec #125
- Add statistics:
- Add EvaluationMetrics #124
- Add FileSystem APIs. #126
Using the FileSystem APIs, users can process files on different file systems with a unified and friendly experience. Such operations include exists, isDir, list, read, write, and other common file functions. Supported file systems are:
- HDFS
- OSS
- Local
- Add Ak source/sink and Csv source/sink supporting the new FileSystem APIs. #126
Ak is a file format that stores data together with its schema and can be written to a filesystem. It takes advantage of a compressed, tabular data representation. The supported APIs are shown in the table below:

| | HDFS | OSS | Local |
| --- | --- | --- | --- |
| Ak source | ✔️ | ✔️ | ✔️ |
| Ak sink | ✔️ | ✔️ | ✔️ |
| Csv source | ✔️ | ✔️ | ✔️ |
| Csv sink | ✔️ | ✔️ | ✔️ |

- Support EqualWidthDiscretizer. #123
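Equal-width discretization splits a numeric column's range into a fixed number of same-sized intervals and maps each value to its bin index. A minimal sketch of that idea follows; it is not the EqualWidthDiscretizer API, and the bin boundaries Alink fits may be handled differently (e.g. for out-of-range values at predict time).

```python
def equal_width_bins(values, num_bins):
    """Map each value to a bin index in [0, num_bins) over equal-width
    intervals spanning [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0   # avoid /0 when all values equal
    # clamp the maximum value into the last bin
    return [min(int((v - lo) / width), num_bins - 1) for v in values]
```

Contrast with QuantileDiscretizer, which instead chooses boundaries so each bin holds roughly the same number of rows.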
- Feature enhancements and API unification in clustering. #121
- Refine code of QuantileDiscretizer and OneHotEncoder #111
- Fix predict stream op in alspredictstreamop.md #104
Alink version 1.1.2
- Add transformers among formats Vector, CSV, Json, KV, Columns and Triple #93
• Support AnyToAny transformation
• Unified transformation params for ease of use.
- Support SQL select statements in the Pipeline and LocalPredictor #61
• Support flink planner built-in functions regarding individual rows: comparison, logical, arithmetic, string, temporal, conditional, type conversion, hash, etc.
• Add alink_shaded/shaded_protobuf_java to support usage of native Calcite.
- Support Hive source and sink #96
• Support Batch/Stream source&sink of Hive.
• Support partition of table.
• Simplify the dependence of Hive jar.
• Support multiple Hive versions: 2.0, 2.1, 2.2, 2.3, 3.0
- Fix PyAlink startup and UDF issues on Windows #76, #77
- Support BigInteger type in MySql source #86
- Add open and close in mapper. #92
- Add open function in SegmentMapper and StopwordsRemoverMapper #94
- Unify HandleInvalid Params #95
Alink version 1.1.1
Enhancements & New Features
- Optimize conversion between operators and DataFrames
- Auto-detect localIp when using useRemoteEnv
- Add enum type parameter #65
• Adapt enum type params in quantile, distance and decision tree. #67
• Change linear model training params to enum #71
• Add enum parameters to Kafka, StringIndexer and Join #72
• Adapt enum type params in PCA, chi-square test, GLM and correlation. #73
- Support window group-by in stream operators #68
- Add operators to parse strings in CSV, JSON and KV formats to columns #70
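The string-to-columns parsing mentioned above is simplest to see for the KV format: split on a column delimiter, then on a key-value delimiter. The sketch below shows the idea with hypothetical delimiters and function names; Alink's actual operators handle type casting, schemas, and invalid-value policies on top of this.

```python
def kv_to_columns(s, col_delim=",", kv_delim="="):
    """Parse a 'k1=v1,k2=v2' style string into a {column: value} dict.
    A sketch of the KV-to-columns idea, not the Alink operator's API."""
    out = {}
    for pair in s.split(col_delim):
        if not pair:
            continue                     # skip empty segments
        k, _, v = pair.partition(kv_delim)
        out[k] = v                       # values stay strings here;
    return out                           # real ops cast to column types
```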
- Tokenizer supports string split with multiple spaces #69
- Make error message clear when selected columns are not found #66
- Add an FTRL example #64
Fix & Refinements
Alink version 1.1.0
Enhancements & New Features
- Improve UDF/UDTF operators; Java and PyAlink now have consistent usage and behavior. #32 #44
- Publish to maven central and PyPI.
- Support Flink 1.10 and Flink 1.9. #46
- Support more Kafka connectors. #41.
API change
- Modify the Naive Bayes algorithm to be a text classifier. #47
- Modify and enhance the parameters and models of QuantileDiscretizer, OneHotEncoder and Bucketizer. #48
Documentation
Fix & Refinements
- Fix the problem in the LDA online method and refine comments in FeatureLabelUtil. #29
- Fix the bug that the initial data of KMeansAssignCluster is not cleared. #31
- Fix the int overflow bug when reading large CSV files, and add test cases for CsvFileInputSplit. See #27
- Clean up some code. #15
- Remove a redundant test case whose data source is inaccessible. See #28
- Fix the NPE in PCA. See #42
PyPI support
- Support PyAlink installation via pip install pyalink
Maven Dependencies
Alink is now synchronized to the Maven central repository, so you can easily add it to Maven projects.
With Flink-1.10
<dependency>
<groupId>com.alibaba.alink</groupId>
<artifactId>alink_core_flink-1.10_2.11</artifactId>
<version>1.1.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_2.11</artifactId>
<version>1.10.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner_2.11</artifactId>
<version>1.10.0</version>
</dependency>
With Flink-1.9
<dependency>
<groupId>com.alibaba.alink</groupId>
<artifactId>alink_core_flink-1.9_2.11</artifactId>
<version>1.1.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-scala_2.11</artifactId>
<version>1.9.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-table-planner_2.11</artifactId>
<version>1.9.0</version>
</dependency>