Releases: FederatedAI/FATE
Release v2.2.0
By downloading, installing or using the software, you accept and agree to be bound by all of the terms and conditions of the LICENSE and DISCLAIMER.
Major Features and Improvements
Deployment
- Upgrade from Python 3.8 to Python 3.10
- Upgrade the PyTorch version to 2.x
Release v2.1.1
By downloading, installing or using the software, you accept and agree to be bound by all of the terms and conditions of the LICENSE and DISCLAIMER.
Major Features and Improvements
Component
- Support server model saving in Homo-NN
ML
- Aggregator supports aggregation of the torch.bfloat16 data type
Release v2.1.0
By downloading, installing or using the software, you accept and agree to be bound by all of the terms and conditions of the LICENSE and DISCLAIMER.
Major Features and Improvements
Arch
- Fixed some bugs in the Spark computing engine
Component
- Unified IO keys naming format for all components
- Add LLMLoader to support running FATE-LLM v2.0 with pipeline
OSX
- Compatible with eggroll-v2.x
- Add 2.x API backport support
- Bug fixes
- Improved the display of output data.
- Enhanced the PyPI package: configuration files have been relocated to the user's home directory, and relative paths for uploading data are now resolved against the user's home directory.
- Added support for running FATE algorithms with Spark + Hadoop.
- Fixed an issue where failed tasks could not be retried.
- Fixed an issue where the system could not run when the number of task cores exceeded the system's total cores.
- Pipeline: add support for FATE-LLM 2.0
  - Added LLMModelLoader, LLMDatasetLoader, and LLMDataFuncLoader
  - Added configuration parsing for seq2seq_runner and ot_runner
- Pipeline: unified input interface of components
- Adapt to the FATE v2.0 framework:
  - Migrate parameter-efficient fine-tuning training methods and models.
  - Migrate Standard Offsite-Tuning and Extended Offsite-Tuning (Federated Offsite-Tuning+).
  - New trainer, dataset, and data_processing function design.
- New FedKSeed federated tuning algorithm: trains large language models in a federated learning setting with extremely low communication cost
- Add support for job runtime configuration
Release v2.0.0
By downloading, installing or using the software, you accept and agree to be bound by all of the terms and conditions of the LICENSE and DISCLAIMER.
Major Features and Improvements
FATE 2.0
Arch 2.0: Building Unified and Standardized API for Heterogeneous Computing Engines Interconnection
- Introduce `Context` to manage useful APIs for developers, such as `Distributed Computing`, `Federation`, `Cipher`, `Tensor`, `Metrics`, and `IO`.
- Introduce the `Tensor` data structure to handle local and distributed matrix operations, with built-in heterogeneous acceleration support.
  - Abstracted PHETensor, enabling smooth switching between different underlying PHE implementations through a standard interface
- Introduce `DataFrame`, a 2D tabular data structure for data IO and simple feature engineering.
  - Added a data block manager to support mixed-type columns and feature anonymization
  - Added 30+ operator interfaces for statistics, including comparison, indexing, data binning, and transformation
- Refactor `Federation`, a unified interface for federated communication, with unified Serdes control and a more user-friendly API.
- Introduce `Config`, a unified configuration for FATE, including safety restrictions, system configuration, and algorithm configuration.
- Refactor `logger`, providing customizable logging for different use cases and flavors.
- Introduce `Launcher`, a simple tool for federated program execution, especially useful for standalone and local debugging.
- Framework: PSI-ECDH protocol support, single entry for histogram statistical computation
- DeepSpeed integration: support distributed training using DeepSpeed with Eggroll.
- Protocol: Support for SSHE (mixed MPC and homomorphic encryption protocol), ECDH, and Secure Aggregation protocols
- Experimental: integrate `CrypTen` for SMPC support; more protocols and features will be added in the future
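The Secure Aggregation protocol listed above rests on a simple idea: each pair of clients shares a random mask that one client adds and the other subtracts, so the server only learns the sum of the updates. The sketch below illustrates pairwise masking in plain Python. It is not FATE's implementation: the names `pairwise_mask`, `mask_update`, and `aggregate` are hypothetical, and the shared seeds are hard-coded, whereas a real protocol would agree on them through a key exchange and handle client dropouts.

```python
import random

MODULUS = 2**32  # masked integer arithmetic wraps modulo this value


def pairwise_mask(seed: int, length: int) -> list[int]:
    """Derive a deterministic mask vector from a seed both parties share."""
    rng = random.Random(seed)
    return [rng.randrange(MODULUS) for _ in range(length)]


def mask_update(party: int, update: list[int],
                seeds: dict[tuple[int, int], int]) -> list[int]:
    """Add the mask for pairs where `party` is the lower id, subtract it otherwise."""
    masked = list(update)
    for (i, j), seed in seeds.items():
        if party not in (i, j):
            continue
        mask = pairwise_mask(seed, len(update))
        sign = 1 if party == i else -1
        masked = [(m + sign * r) % MODULUS for m, r in zip(masked, mask)]
    return masked


def aggregate(masked_updates: list[list[int]]) -> list[int]:
    """Server-side sum: every pairwise mask cancels, leaving the sum of raw updates."""
    return [sum(column) % MODULUS for column in zip(*masked_updates)]


# Hypothetical integer-encoded model updates from three clients.
updates = {0: [1, 2, 3], 1: [10, 20, 30], 2: [100, 200, 300]}
seeds = {(0, 1): 11, (0, 2): 22, (1, 2): 33}  # illustration only; normally derived via key agreement
masked = [mask_update(p, u, seeds) for p, u in updates.items()]
print(aggregate(masked))  # [111, 222, 333]
```

Each individual masked vector looks random to the server, yet the aggregate is exact because every mask appears once with a plus sign and once with a minus sign.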
Components 2.0: Building Standardized Algorithm Components for different Scheduling Engines
- Introduce a components toolbox to wrap ML modules as standard executable programs
- spec and loader expose clear APIs for smooth internal extension and external system integration
- Provide several CLI tools to interact with and execute components
- Input-Output: Further decoupling of FATE-Flow, providing standardized black-box calling processes
- Component Definition: Support for typing-based definition, automatic checking for component parameters, support for multiple types of data and model input and output, in addition to multiple inputs
ML 2.0: Major functionality migration from FATE-v1.x, decoupling call hierarchy
- Data preprocessing: Added DataFrame Transformer; Reader, Union and DataSplit migration completed
- Feature Engineering: Migrated HeteroFederatedBinning, HeteroFeatureSelection, DataStatistics, Sampling, FeatureScale and Pearson Correlation
- Federated Training: migrated HeteroSecureBoost, HomoNN, HeteroCoordinatedLogisticRegression, HeteroCoordinatedLinearRegression, SSHE-LogisticRegression, and SSHE-LinearRegression
- Federated Training: added
  - SSHE-HeteroNN: based on a mixed MPC and homomorphic encryption protocol
  - FedPASS-HeteroNN: based on the FedPass protocol
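Both HeteroFederatedBinning and the DataFrame binning operators start from per-feature split points. As a rough, local-only sketch, equal-frequency (quantile) binning of a single feature can look like the code below. The function names are hypothetical and this is not FATE's algorithm; federated binning additionally involves cross-party steps that are not shown here.

```python
from bisect import bisect_left


def quantile_split_points(values: list[float], bin_num: int) -> list[float]:
    """Pick up to bin_num - 1 split points at evenly spaced ranks of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for k in range(1, bin_num):
        idx = min(n - 1, (k * n) // bin_num)
        points.append(ordered[idx])
    # Repeated values can produce duplicate split points; keep each only once.
    return sorted(set(points))


def assign_bins(values: list[float], split_points: list[float]) -> list[int]:
    """Bins are (-inf, p0], (p0, p1], ...: the index is the count of points below the value."""
    return [bisect_left(split_points, v) for v in values]


feature = [float(v) for v in range(1, 11)]
points = quantile_split_points(feature, bin_num=4)
print(points)                        # [3.0, 6.0, 8.0]
print(assign_bins(feature, points))  # [0, 0, 0, 1, 1, 1, 2, 2, 3, 3]
```

Deduplicating split points means a heavily repeated feature may end up with fewer bins than `bin_num`, a trade-off any equal-frequency scheme has to make.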
Algorithm Performance Improvements (Comparison with FATE-v1.11.*)
- PSI (Private Set Intersection): tested on a dataset of 100 million samples with an intersection of 100 million, 1.8x+ the performance of FATE-v1.11.4
- Hetero-SSHE-LR: tested on guest data of 100k samples × 30 features and host data of 100k samples × 300 features, 4.3x+ the performance of FATE-v1.11.4
- Hetero-NN (based on the FedPass protocol): tested on guest data of 100k samples × 30 features and host data of 100k samples × 300 features, roughly on par with plaintext performance, 143x+ the performance of FATE-v1.11.4
- Hetero-Coordinated-LR: tested on guest data of 100k samples × 30 features and host data of 100k samples × 300 features, 1.2x+ the performance of FATE-v1.11.4
- Hetero-Feature-Binning: tested on guest data of 100k samples × 30 features and host data of 100k samples × 300 features, 1.5x+ the performance of FATE-v1.11.4
OSX(Open Site Exchange) 1.0: Building Open Platform for Cross-Site Communication Interconnection
- Implements the transmission interface in accordance with the "Technical Specification for Financial Industry Privacy Computing Interconnection Platform". The transmission interface is compatible with both FATE 1.X and FATE 2.X
- Supports gRPC synchronous and streaming transmission, supports the TLS secure transmission protocol, and is compatible with the FATE 1.X rollsite component
- Supports HTTP/1.X transmission and the TLS secure transmission protocol
- Supports message-queue-mode transmission, replacing the RabbitMQ and Pulsar components of FATE 1.X
- Supports Eggroll and Spark computing engines
- Supports networking as an Exchange component, with support for FATE 1.X and FATE 2.X access
- Compared to the rollsite component, it improves the exception handling logic during transmission and provides more accurate log output for quickly locating exceptions.
- The routing configuration is basically consistent with the original rollsite, reducing the difficulty of porting
- Supports modifying routing tables via an HTTP interface and provides simple permission verification
- Improved network connection management logic, reduced connection leakage risk, and improved transmission efficiency
- Uses different ports to handle access requests from inside and outside the cluster, making it easy to apply different security policies to each port
FATE Flow 2.0: Building Open and Standardized Scheduling Platform for Scheduling Interconnection
- Adapted to new scalable and standardized federated DSL IR
- Built an interconnected scheduling-layer framework supporting the BFIA protocol
- Optimized process scheduling, with scheduling separated and customizable, and added priority scheduling
- Optimized algorithm component scheduling: support container-level algorithm loading, enhancing support for cross-platform heterogeneous scenarios
- Optimized multi-version algorithm component registration, supporting registration for mode of components
- Federated DSL IR extension enhancement: supports multi-party asymmetric scheduling
- Optimized client authentication logic, supporting permission management for multiple clients
- Optimized RESTful interface, making parameter fields and types, return fields, and status codes clearer
- Added OFX(Open Flow Exchange) module: encapsulated scheduling client to allow cross-platform scheduling
- Supported the new communication engine OSX, while remaining compatible with all engines from FATE Flow 1.x
- Decoupled the System Layer and the Algorithm Layer, with system configuration moved from the FATE repository to the Flow repository
- Published FATE Flow package to PyPI and added service-level CLI for service management
- Migrated major functionality from FATE Flow 1.x
FATE-Client 2.0: Building Scalable Federated DSL for Application Layer Interconnection And Providing Tools For Fast Federated Modeling
- Introduce a new scalable and standardized federated DSL IR (Intermediate Representation) for federated modeling jobs
- Compile Python client code into DSL IR
- Federated DSL IR extension enhancement: supports multi-party asymmetric scheduling.
- Support mutual translation between Standardized Fate-2.0.0 DSL IR and UnionPay's BFIA protocol.
- Support components with UnionPay's BFIA protocol through adapter mode
- Flow CLI and PipeLine share configuration
FATE-Test: FATE Automated Testing Tool
- Migrated automated testing for functionality, performance, and correctness
FATE-Board 2.0
- Refactoring DAG components, adding support for stage status, and displaying dynamic ports.
- Update the cache structure to optimize issues such as user timeout handling and duplicate storage of configuration information.
- Optimize some interactive functions.
- Update the style theme.
Eggroll 3.0
Enhancements in the JVM Part:
- Core Component Reconstruction: The `cluster-manager` and `node-manager` components have been entirely rebuilt in Java, ensuring uniformity and enhanced performance.
- Transport Component Modification: The `rollsite` transport component has been removed and replaced with the more efficient `osx` component.
- Improved Process Management: Advanced logic has been implemented to manage processes more effectively, significantly reducing the risk of process leakage.
- Enhanced Data Storage Logic: Data storage mechanisms have been refined for better performance and reliability.
- Concurrency Control Improvements: We've upgraded the logic for concurrency control in the original components, leading to performance boosts.
- Visualization Component: A new visualization component has been added for convenient monitoring of computational information.
- Refined Logging: The logging system has been enhanced for more precise outputs, aiding in rapid anomaly detection.
Upgrades in the Python Part:
- Reconstruction of `roll_pair` and `egg_pair`: These components now support serialization and partition methods controlled by the caller. Serialization safety is uniformly managed by the caller.
- Automated Cleanup of Intermediate Tables: The issue of automatic cleaning for intermediate tables between federation and computing has been resolved, eliminating the need for extra operations by the caller.
- Unified Configuration Control: A flexible co...
Release v1.11.4
By downloading, installing or using the software, you accept and agree to be bound by all of the terms and conditions of the LICENSE and DISCLAIMER.
Major Features and Improvements
FederatedML
- Unified the key length configuration of encryption algorithms and updated the default key length to 2048.
Bug-Fix
- Modify hessian computation of softmax cross entropy in SecureBoost, to align with LightGBM.
- Fix model initialization error in the Homo Neural Network prediction process.
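The SecureBoost fix above aligns the softmax cross-entropy hessian with LightGBM. As a sketch of that convention, based on our reading of LightGBM's multiclass objective rather than FATE's code: for K classes with predicted probabilities p_k and one-hot labels y_k, the gradient is p_k - y_k and the diagonal hessian is p_k(1 - p_k) scaled by K/(K - 1). The function names are hypothetical, and we have not verified that FATE applies exactly this scaling factor.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_ce_grad_hess(scores: list[float], label: int) -> tuple[list[float], list[float]]:
    """Per-class gradient and diagonal hessian of softmax cross-entropy for one sample."""
    k = len(scores)
    if k < 2:
        raise ValueError("multiclass softmax needs at least two classes")
    probs = softmax(scores)
    factor = k / (k - 1)  # LightGBM-style scaling of the diagonal hessian (assumption)
    grad = [p - (1.0 if c == label else 0.0) for c, p in enumerate(probs)]
    hess = [factor * p * (1.0 - p) for p in probs]
    return grad, hess


grad, hess = softmax_ce_grad_hess([0.0, 0.0, 0.0], label=0)
print(grad)  # approximately [-0.667, 0.333, 0.333]
print(hess)  # approximately [0.333, 0.333, 0.333]
```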
Release v2.0.0-beta
By downloading, installing or using the software, you accept and agree to be bound by all of the terms and conditions of the LICENSE and DISCLAIMER.
Major Features and Improvements
Arch 2.0: Building Unified and Standardized API for Heterogeneous Computing Engines Interconnection
- Framework: PSI-ECDH protocol support, single entry for histogram statistical computation
- Protocol: Support for ECDH, Secure Aggregation protocols
- Tensor: abstracted PHETensor, smooth switch between various underlying PHE implementations through standard interface
- DataFrame: New data block manager supports mixed-type columns & feature anonymization; added 30+ operator interfaces for statistics, including comparison, indexing, data binning, and transformation, etc.
- Enhanced workflow: Support for Cross Validation workflow
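The PHETensor item above describes callers switching PHE implementations behind a standard interface. The sketch below shows that design pattern in miniature, with two interchangeable backends: a textbook Paillier scheme (additively homomorphic) and a no-op backend for debugging. `PHEBackend`, `ToyPaillier`, `IdentityBackend`, and `encrypted_sum` are hypothetical names, not FATE's API, and the hard-coded primes are far too small for real use.

```python
import math
import random
from abc import ABC, abstractmethod


class PHEBackend(ABC):
    """Hypothetical standard interface: callers depend only on these methods."""

    @abstractmethod
    def encrypt(self, m: int) -> int: ...

    @abstractmethod
    def decrypt(self, c: int) -> int: ...

    @abstractmethod
    def add(self, c1: int, c2: int) -> int: ...


class ToyPaillier(PHEBackend):
    """Textbook Paillier with g = n + 1; tiny primes, illustration only."""

    def __init__(self, p: int = 293, q: int = 433, seed: int = 0):
        self.n = p * q
        self.n2 = self.n * self.n
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
        self.mu = pow(self.lam, -1, self.n)  # valid because g = n + 1
        self.rng = random.Random(seed)  # not a CSPRNG

    def encrypt(self, m: int) -> int:
        while True:
            r = self.rng.randrange(1, self.n)
            if math.gcd(r, self.n) == 1:
                break
        return (pow(self.n + 1, m, self.n2) * pow(r, self.n, self.n2)) % self.n2

    def decrypt(self, c: int) -> int:
        x = pow(c, self.lam, self.n2)
        return ((x - 1) // self.n) * self.mu % self.n

    def add(self, c1: int, c2: int) -> int:
        # Multiplying ciphertexts adds the underlying plaintexts.
        return (c1 * c2) % self.n2


class IdentityBackend(PHEBackend):
    """No-op backend for debugging: 'ciphertexts' are just plaintexts."""

    def encrypt(self, m: int) -> int:
        return m

    def decrypt(self, c: int) -> int:
        return c

    def add(self, c1: int, c2: int) -> int:
        return c1 + c2


def encrypted_sum(backend: PHEBackend, values: list[int]) -> int:
    """Sum values under encryption, decrypting only the final result."""
    acc = backend.encrypt(0)
    for v in values:
        acc = backend.add(acc, backend.encrypt(v))
    return backend.decrypt(acc)


print(encrypted_sum(ToyPaillier(), [5, 7, 30]))      # 42
print(encrypted_sum(IdentityBackend(), [5, 7, 30]))  # 42
```

Because `encrypted_sum` only sees the abstract interface, swapping the backend changes the cryptography without touching the calling code, which is the property the PHETensor abstraction aims for.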
Components 2.0: Building Standardized Algorithm Components for different Scheduling Engines
- Input-Output: Further decoupling of FATE-Flow, providing standardized black-box calling processes
- Component Definition: Support for typing-based definition, automatic checking for component parameters, support for multiple types of data and model input and output, in addition to multiple inputs
ML 2.0: Major functionality migration from FATE-v1.x, decoupling call hierarchy
- Data preprocessing: Added DataFrame Transformer, Union and DataSplit migration completed
- Feature Engineering: Migrated HeteroFederatedBinning, HeteroFeatureSelection, DataStatistics, Sampling, FeatureScale
- Federated Training: Migrated HeteroSecureBoost, HomoNN, vertical CoordinatedLogisticRegression, and CoordinatedLinearRegression
- Evaluation: Migrated Evaluation
OSX(Open Site Exchange) 1.0: Building Open Platform for Cross-Site Communication Interconnection
- Improved HTTP/1.X protocol support, with support for gRPC-to-HTTP transmission
- Support for TLS secure transmission protocol
- Added routing table configuration interface
- Added routing table connectivity automatic check
- Improved transmission function in cluster mode
- Enhanced flow control in cluster mode
- Support for simple interface authentication
FATE Flow 2.0: Building Open and Standardized Scheduling Platform for Scheduling Interconnection
- Migrated functions: data upload/download, process scheduling, component output data/model/metric management, multi-storage adaptation for models, authentication, authorization, feature anonymization, multi-computing/storage/communication engine adaptation, and system high availability
- Optimized process scheduling, with scheduling separated and customizable, and added priority scheduling
- Optimized algorithm component scheduling, dividing execution steps into preprocessing, running, and post-processing
- Optimized multi-version algorithm component registration, supporting registration for mode of components
- Optimized client authentication logic, supporting permission management for multiple clients
- Optimized RESTful interface, making parameter fields and types, return fields, and status codes clearer
- Decoupled the system layer from the algorithm layer, with system configuration moved from the FATE repository to the Flow repository
- Published FATE Flow package to PyPI and added service-level CLI for service management
Fate-Client 2.0: Building Scalable Federated DSL for Application Layer Interconnection And Providing Tools For Fast Federated Modeling.
- Migrated Flow CLI and Flow SDK
- Updated federated DSL IR: enhance IR, add DataWarehouse and ModelWarehouse to load data and model from other sources
- Update component definitions to support Fate-v2.0.0-beta
- Flow CLI and PipeLine share configuration
Fate-Test: FATE Automated Testing Tool
- Migrated automated testing for functionality, performance, and correctness
Release v1.11.3
By downloading, installing or using the software, you accept and agree to be bound by all of the terms and conditions of the LICENSE and DISCLAIMER.
Major Features and Improvements
FederatedML
- FedAVGTrainer: updated code structure to support OffsiteTuningTrainer
- FedAVGTrainer: updated log format to report batch progress instead of batch index
Release v1.11.2
By downloading, installing or using the software, you accept and agree to be bound by all of the terms and conditions of the LICENSE and DISCLAIMER.
Major Features and Improvements
FederatedML
- Integrate DeepSpeed, support distributed training of FATE-LLM
- Separated the upgraded FATE-LLM from FATE into the new "FATE-LLM" GitHub repository
- HomoNN now supports data collator and distributed sampler
- Hetero SecureBoost supports running multiple boosting rounds in complete secure mode with the `complete_secure` option
Bug-Fix
- Fix hessian computation of softmax cross entropy in SecureBoost
Release v1.11.1
By downloading, installing or using the software, you accept and agree to be bound by all of the terms and conditions of the LICENSE and DISCLAIMER.
Major Features and Improvements
FederatedML
- Support Homo Graph Neural Network
- PSI-DH protocol enhancement: use Oakley MODP modulus groups
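PSI-DH relies on commutative blinding: each party raises hashed IDs to its own secret exponent, the other party raises them again, and because (H(x)^a)^b = (H(x)^b)^a mod p, matching IDs produce matching doubly blinded values. The sketch below shows that flow in plain Python under stated assumptions: the class and function names are hypothetical, the modulus is the Mersenne prime 2**127 - 1 rather than one of the standardized Oakley MODP groups (see RFC 3526) that this release adopts, and the hash-to-group step is simplified.

```python
import hashlib
import math
import random

# Toy modulus: the Mersenne prime 2**127 - 1, NOT an Oakley MODP group.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Map an ID to an integer mod P via SHA-256 (simplified hash-to-group)."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % P


class Party:
    """One side of the exchange, holding its IDs and a secret exponent."""

    def __init__(self, ids: list[str], seed: int):
        rng = random.Random(seed)  # not a CSPRNG; use `secrets` in practice
        while True:
            secret = rng.randrange(2, P - 1)
            # An exponent coprime to P - 1 makes x -> x**secret a bijection,
            # so distinct nonzero hashes stay distinct after blinding.
            if math.gcd(secret, P - 1) == 1:
                break
        self.secret = secret
        self.ids = ids

    def blind_own(self) -> list[int]:
        """First pass: H(id)^secret for each of this party's IDs."""
        return [pow(hash_to_group(i), self.secret, P) for i in self.ids]

    def reblind(self, values: list[int]) -> list[int]:
        """Second pass over the other party's values; order is preserved."""
        return [pow(v, self.secret, P) for v in values]


def dh_psi(guest: Party, host: Party) -> list[str]:
    # Guest IDs become H(x)^(a*b); host IDs become H(y)^(b*a).
    guest_double = host.reblind(guest.blind_own())
    host_double = set(guest.reblind(host.blind_own()))
    # Exponentiation commutes, so shared IDs yield equal doubly blinded values.
    return [i for i, v in zip(guest.ids, guest_double) if v in host_double]


guest = Party(["alice", "bob", "carol"], seed=1)
host = Party(["bob", "carol", "dave"], seed=2)
print(dh_psi(guest, host))  # ['bob', 'carol']
```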
Release v1.11.0
By downloading, installing or using the software, you accept and agree to be bound by all of the terms and conditions of the LICENSE and DISCLAIMER.
Major Features and Improvements
FederatedML
- Support FATE-LLM (Federated Large Language Models)
- Integration of LLM for federated learning: BERT, ALBERT, RoBERTa, GPT-2, BART, DeBERTa, and DistilBERT. Please note that if using such pretrain-models, compliance with their licenses is needed.
- Integration of Parameter-efficient tuning methods for federated learning: Bottleneck Adapters (including Houlsby, Pfeiffer, Parallel schemes), Invertible Adapters, LoRA, IA3, and Compacter.
- Improved Homo Federated Trainer class, allowing CUDA device specification and DataParallel acceleration for multi-GPU devices.
- TokenizerDataset feature upgrade, better adaptation to HuggingFace Tokenizer.
Bug-Fix
- Fix inconsistent `bin_num` display of Hetero Feature Binning for data containing missing values
- Fix inconsistency when transforming selected columns of Hetero Feature Binning while using ModelLoader
- Fix `exclusive_data_type` not taking effect in DataTransform when meta for input data is missing
- Fix weighted loss calculation and feature importance display issues in Tree-Based models
- Fix sample id display of NN