Release Notes
Changes relative to 7.2 (including features from the 8.0b1 and 8.0b2 releases)
- Support for Latest Dependencies
  - Compatible with the latest `protobuf` Python package, which improves serialization latency.
  - Support for `torch 2.4.0`, `numpy 2.0`, and `scikit-learn 1.5`.
- Support for stateful Core ML models
  - Updates the converter to produce Core ML models with the State Type (a new type introduced in iOS 18 / macOS 15).
  - Adds a toy stateful attention example model to show how to use an in-place KV cache.
- Increased conversion support coverage for models produced by `torch.export`
  - Op translation support is at 56% parity with our mature `torch.jit.trace` converter.
  - Representative deep learning models (mobilebert, deeplab, edsr, mobilenet, vit, inception, resnet, wav2letter, emformer) are supported.
  - Representative foundation models (llama, stable diffusion) are supported.
  - Models quantized by `ct.optimize.torch` can be exported by `torch.export` and then converted.
- New Compression Features
  - `coremltools.optimize`
    - Support for compression at more granularities: blockwise quantization and grouped channelwise palettization.
    - 4-bit weight quantization and 3-bit palettization.
    - Support for joint compression modes (8-bit look-up tables for palettization, pruning + quantization/palettization).
    - Vector palettization by setting `cluster_dim > 1`, and palettization with per-channel scale by setting `enable_per_channel_scale=True`.
    - Experimental activation quantization (takes a W16A16 Core ML model and produces a W8A8 model).
  - API updates for `coremltools.optimize.coreml` and `coremltools.optimize.torch`
    - Support for some models quantized by `torchao` (including ops produced by torchao, such as `_weight_int4pack_mm`).
    - Support for more ops in the `quantized_decomposed` namespace, such as `embedding_4bit`.
- Support for new ops and bug fixes for existing ops
  - Compression-related ops: `constexpr_blockwise_shift_scale`, `constexpr_lut_to_dense`, `constexpr_sparse_to_dense`, etc.
  - Updates to the GRU op.
  - SDPA op `scaled_dot_product_attention`.
  - `clip` op.
- Updated the model loading API
  - Support for `optimizationHints`.
  - Support for loading specific functions for prediction.
- New utilities in `coremltools.utils`
  - `coremltools.utils.MultiFunctionDescriptor` and `coremltools.utils.save_multifunction`, for creating an mlprogram with multiple functions that can share weights.
  - `coremltools.models.utils.bisect_model` can break a large Core ML model into two smaller models of similar size.
  - `coremltools.models.utils.materialize_dynamic_shape_mlmodel` can convert a flexible-input-shape model into a static-input-shape model.
- Various other bug fixes, enhancements, cleanups, and optimizations
- Special thanks to our external contributors for this release: @sslcandoit @FL33TW00D @dpanshu @timsneath @kasper0406 @lamtrinhdev @valfrom @teelrabbit @igeni @Cyanosite