Releases · huawei-noah/bolt
v1.5.2
v1.5.1
Added
- Support Python API
- Support AVX-VNNI and ARMv9 instruction set
- Support Intel Desktop GPU (float16 and float32)
- Support Windows on Arm platforms
- Support more operators: Random, Sin, Cos, Einsum, Elu, UnPooling, Flatten, ConvertColor, BilateralSliceApply, Lut
- Support more networks: ViTAE, CMT, EfficientFormer, ConvTT, Wenet, NFM, AFM, ONN, wide&deep, DeepFM, MMOE, etc.
- Improve multi-threaded parallel inference performance on CPU
- Add a Simplified Chinese deployment guide
- Support model file compatibility
- Support using external memory (a CPU array or an OpenCL cl_mem) via the SetInputOutput API
- Support data type and data format conversion via the C API (see the sketch after this list)
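A rough picture of what such a format conversion does, using the NCHWC8 blocked layout that appears elsewhere in these notes as an example. This is a standalone sketch in plain C, not bolt's actual C API; the function name and the choice of layout are illustrative only:

```c
#include <stddef.h>
#include <string.h>

/* Illustrative repack from NCHW to NCHWC8 (channels grouped in blocks of 8).
 * This mirrors the kind of transform a format-conversion API performs,
 * but it is not bolt source code. */
static void nchw_to_nchwc8(const float *src, float *dst,
                           int n, int c, int h, int w)
{
    const int cb = (c + 7) / 8;   /* number of 8-channel blocks, tail zero-padded */
    memset(dst, 0, (size_t)n * cb * 8 * h * w * sizeof(float));
    for (int in = 0; in < n; in++) {
        for (int ic = 0; ic < c; ic++) {
            for (int ih = 0; ih < h; ih++) {
                for (int iw = 0; iw < w; iw++) {
                    size_t s = (((size_t)in * c + ic) * h + ih) * w + iw;
                    size_t d = ((((size_t)in * cb + ic / 8) * h + ih) * w + iw) * 8 + ic % 8;
                    dst[d] = src[s];
                }
            }
        }
    }
}
```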
Changed
- The length of TensorDesc's dims array is changed to 20 (see the sketch after this list).
- Remove FILE macro usage and warning logs in release mode
- Change enum data and operator parameter sizes
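To make the first item concrete: a tensor descriptor carries a fixed-size dimension array, and its capacity is now 20 entries. The sketch below is a hedged stand-in; the field names are assumptions for illustration, and the real TensorDesc definition lives in bolt's headers:

```c
#include <stddef.h>

/* Illustrative descriptor only. Field names may differ from bolt's TensorDesc;
 * the dims capacity of 20 is the point made in the release notes. */
typedef struct {
    int          dt;        /* data type tag                          */
    int          df;        /* data format tag                        */
    unsigned int nDims;     /* how many entries of dims are valid     */
    unsigned int dims[20];  /* capacity raised to 20 in this release  */
} ExampleTensorDesc;

/* Total element count implied by a descriptor. */
static size_t tensor_elements(const ExampleTensorDesc *desc)
{
    size_t count = 1;
    for (unsigned int i = 0; i < desc->nDims; i++) {
        count *= desc->dims[i];
    }
    return count;
}
```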
Fixed
- Fix GPU resize bug
- Fix GPU concurrent inference bug
- Fix ONNX converter bug
- Add the missing Chinese automatic speech recognition model
v1.3.0
Added
- Support on-device training for MLP, CNN (LeNet, ResNet50, MobileNetV1), and Transformer/BERT (text-to-speech)
- Support changing model input and output names in X2bolt
- Support more graph optimizations: Transpose+Convolution, Swish, Quantization, Power+Scale
- Support dynamic-output operators: Shape, ConstantOfShape, GenerateProposals, NonZero, NonMaxSuppression, Reshape, etc.
- Support more operators: GridSample, CumSum, OneHot, Round, Floor, Ceil
- Support more networks on CPU: yolov2, yolov3, yolov4, yolov5, faster-rcnn, mask-rcnn, retinanet, dfsmn, frill, conformer, unet, etc.
- Support Armv8 int8 to accelerate NLP networks
- Improve inference performance on AVX2 CPUs
- Support Netron to visualize bolt models
- Support running without binding to a CPU core (see the affinity sketch after this list)
- Add C API MemoryCheck to check for bolt memory leaks
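On the core-binding item: binding an inference thread to a core goes through the operating system's affinity API, and "not binding" simply skips that step. A standalone Linux/pthread sketch of what binding amounts to (generic OS code, not bolt's scheduler):

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Pin the calling thread to a single CPU core; running unbound skips this call. */
static int bind_current_thread_to_core(int core_id)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core_id, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main(void)
{
    if (bind_current_thread_to_core(0) != 0) {
        fprintf(stderr, "binding failed, running unbound\n");
    }
    /* ... run inference work here ... */
    return 0;
}
```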
Changed
- X2bolt adds -I and -O options to change model input and output names.
- X2bolt adds a -t option to convert models for on-device training.
- C API CreateModel and AllocAllResultHandle return NULL on failure.
- install.sh adds a --neon option to disable Arm NEON acceleration on old platforms.
- Some operator parameter definitions are changed.
Fixed
- Fix GPU depth2space and deconv bugs
- Fix GPU preprocessing tool bug on the Armv8 platform
- Fix x86 Sigmoid precision
- Fix C API CloneResultHandle bug
- Fix MobileNetV1 int8 inference
- Fix Java API build bug on Windows
- Fix ONNX converter deconv and pooling parameter bugs
Removed
- Equal operator is replaced with Check.
v1.2.1
Added
- Support more graph optimizations: Convolution+Convolution, LayerNorm
- Support more operators: ROIAlign, GenerateProposals, Reciprocal, Not, Log, ReductionL2, InstanceNorm, Expand, Gather, Scatter
- Support more operators (PReLU) processing NCHW input data.
- Support ONNX weight sharing between Linear, MatMul, Gemm and Gather
- Support more networks on CPU: vision transformers (ViT, TNT), recommendation networks
- Support more networks on GPU: ASR, Faster_RCNN
- Support Armv7 int8 to accelerate NLP networks (50%+ speed-up)
- Support x86 AVX512 int8 to accelerate NLP networks (3x+ speed-up)
- Support using images on Qualcomm GPU, and add GPU image management methods
- Improve inference performance on Qualcomm GPU
- Add more kit Android/iOS demos: Chinese ASR, Face Detection, Sentiment Analysis
- Try to bind a CPU core when using the GPU
Changed
- Replace the mali option with gpu in the install shell script, and remove the default target option setting
- Change GPU data format from NCWHC4 to NCHWC4
- Simplify the tensor padding method with OclMemory for GPU
- The preprocess_ocl tool previously produced a separate algofile and xxxlib.so; the algofile is now packaged into xxxlib.so
- Add a BNN_FP16 option to the X2bolt tool to convert ONNX 1-bit models
- Replace the original INT8 option with INT8_FP16 in the post_training_quantization tool to convert int8+float16 hybrid inference models, and add an INT8_FP32 option to convert int8+float32 hybrid inference models.
- Add the shell environment variable BOLT_INT8_STORAGE_ERROR_THRESHOLD to control whether post_training_quantization converts a model to int8 storage; the default value is 0.002. post_training_quantization uses int8 storage when the quantization error is lower than BOLT_INT8_STORAGE_ERROR_THRESHOLD (see the sketch after this list).
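The threshold drives a keep-or-drop decision: a tensor is stored as int8 only if the round-trip quantization error stays below 0.002 (the default). The notes do not spell out the exact error metric, so the sketch below uses symmetric per-tensor quantization and a mean relative error purely as an illustrative stand-in:

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Symmetric per-tensor int8 quantization followed by a round-trip error check.
 * The 0.002 default comes from the release notes; the mean-relative-error
 * metric below is an assumption for illustration, not bolt's exact formula. */
static int should_store_as_int8(const float *w, size_t n, float threshold)
{
    float maxAbs = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = fabsf(w[i]);
        if (a > maxAbs) maxAbs = a;
    }
    if (maxAbs == 0.0f) return 1;
    float scale = maxAbs / 127.0f;

    double errSum = 0.0, refSum = 0.0;
    for (size_t i = 0; i < n; i++) {
        int32_t q = (int32_t)lrintf(w[i] / scale);
        if (q > 127) q = 127;
        if (q < -127) q = -127;
        float back = (float)q * scale;
        errSum += fabsf(back - w[i]);
        refSum += fabsf(w[i]);
    }
    return (errSum / refSum) < threshold;
}

int main(void)
{
    float w[4] = {0.11f, -0.52f, 0.37f, 0.90f};
    printf("store as int8: %d\n", should_store_as_int8(w, 4, 0.002f));
    return 0;
}
```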
Fixed
- Fix PReLU 2D and 3D support
- Fix Resize bug in some modes
- Fix ONNX converter bug reading Squeeze, Unsqueeze and Deconv parameters
- Fix Arm Sigmoid precision
- Fix ONNX RNN optimizer, and add support for NCHWC8 input data
- Fix Concat with weight tensor in the ONNX converter
- Simplify C API example
v1.2.0
Added
- Support x86 compilation and cross-compilation for iOS/Android on macOS
- Support x86 compilation and cross-compilation for Android on Windows
- Support MTK Armv7 cross-compilation toolchains on Linux by using the linux-armv7_blank target
- Add a GitBook for user reference
- Support image nearest Resize and align_corners Resize (see the coordinate-mapping sketch after this list)
- Support more graph optimizations: Transpose+Concat+Transpose, Padding+Transpose, HardSwish-Fusion, Relu6-Fusion, Resize-Fusion, SwapTransposeEltwise, SwapPadTranspose, Convolution+Eltwise, Transpose+Matmul
- Support more operators: 3D-convolution, Where, SoftPlus, Exp, Split, Tdnn, Dropout, TopK, SpaceToBatchNd, BatchToSpaceNd, Abs, Equal, Sign, Resize (more modes)
- Support more networks on CPU: Reactnet, Tdnn, ShuffleNet, DenseNet, HRNet, EfficientNet, Noah KWS2.0
- Support more networks on Mali GPU: TinyBert, NMT
- Add more kit Android/iOS demos: Simple-Image-Classification, Image-SuperResolution, Image-Classification
- Support float16, int8 model storage on any hardware
- Add Flow Java API
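The two Resize variants listed above differ only in how an output coordinate maps back to the source grid. A standalone sketch of the standard coordinate mappings (generic resize math, not bolt source):

```c
#include <stdio.h>

/* Map an output coordinate back to the source grid.
 * align_corners keeps the first and last samples aligned; the half-pixel
 * variant is the usual default in ONNX/TensorFlow-style resize. */
static float source_coord(int dst, int dstLen, int srcLen, int alignCorners)
{
    if (alignCorners) {
        if (dstLen == 1) return 0.0f;
        return (float)dst * (float)(srcLen - 1) / (float)(dstLen - 1);
    }
    float scale = (float)srcLen / (float)dstLen;
    return ((float)dst + 0.5f) * scale - 0.5f;
}

int main(void)
{
    for (int x = 0; x < 4; x++) {
        printf("dst %d -> src %.3f (align_corners) vs %.3f (half-pixel)\n",
               x, source_coord(x, 4, 8, 1), source_coord(x, 4, 8, 0));
    }
    return 0;
}
```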
Changed
- Change the install and GPU library processing shell scripts
- Optimize TfSlice with 75%+ speed-up on CPU
- Optimize Concat with 50%+ speed-up on CPU
- Optimize Deconvolution with 10%+ speed-up on CPU
- Optimize YoloDetection network with 15%+ speed-up on CPU
- Optimize ResNet50 from 90ms+ to 70ms+ on x86, faster than OpenVINO
- Optimize MobileNet v1/v2 with 10%+ speed-up on x86
- Optimize tts-melgan network from 200ms+ to 160ms on x86
- Optimize model read time
- Change the Java API package name to com.huawei.noah, and split the single API file into 6 files.
Fixed
- Fix bug where op/tensor names longer than 128 characters were not supported
- Fix Caffe input dims extraction bug
- Fix Concat with a single input in the ONNX converter
- Fix bug where padding (NHWC) was not supported
- Fix ReLU6 insertion in the TFLite converter
- Fix GRU, LSTM and LBR_GRU model converter and inference bugs
- Fix x86 convolution and fully-connected operator inference bugs
Removed
- Remove third-party library FFTW and use FFTS for the ASR example
v1.1.0
- Update the installation script on Linux
v1.0.0
- Support fp32 on X86 AVX2 CPU
- Support multi-threaded parallel execution for some fp32 operators (convolution, LSTM); see the sketch after this list
- Support Tensorflow model
- Support more networks (PointNet, ...)
- Support int8 inference for more networks (TinyBert, NMT, ASR)
- Support time-series data acceleration
- Support Apple iOS phones
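On the multi-threaded item: per-operator parallelism typically splits independent output channels across threads. A conceptual OpenMP sketch of that idea (illustrative only; bolt's kernels are hand-optimized and structured differently):

```c
#include <omp.h>

/* Naive 1x1 convolution with the output-channel loop split across threads.
 * Input is laid out [inChannels][spatial], weight [outChannels][inChannels],
 * output [outChannels][spatial]. */
static void conv1x1_parallel(const float *in, const float *weight, float *out,
                             int inChannels, int outChannels, int spatial)
{
    #pragma omp parallel for
    for (int oc = 0; oc < outChannels; oc++) {
        for (int s = 0; s < spatial; s++) {
            float acc = 0.0f;
            for (int ic = 0; ic < inChannels; ic++) {
                acc += weight[oc * inChannels + ic] * in[ic * spatial + s];
            }
            out[oc * spatial + s] = acc;
        }
    }
}
```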
v0.3.0
- Optimize fp16 on Arm Mali GPU
- Support fp32 on Armv7 CPU
- Support int8 PTQ calibration
- Support more networks (SSD, ASR, TTS)
- Support the image classification task on Arm Mali GPU
v0.2.0
- Support fp32 on Arm CPU
- Support fp16 on Arm Mali GPU
- Support memory reuse for feature maps and weight-sharing between operators
- Support dynamic input size
- Support CPU affinity setting
- Support convolution algorithm auto-tuning (runtime or full parameter space search); see the sketch after this list
- Support Java and C API
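The auto-tuning item comes down to timing each candidate convolution algorithm on the target device and keeping the winner (bolt caches the choice in an algorithm file, per the preprocess_ocl note above). A generic benchmark-and-pick sketch, not bolt's tuner:

```c
#include <time.h>

typedef void (*conv_algo_fn)(void);   /* candidate implementations */

/* Wall-clock time for `repeats` runs of one candidate. */
static double time_algo(conv_algo_fn fn, int repeats)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < repeats; i++) fn();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

/* Run every candidate and return the index of the fastest one;
 * a real tuner would persist this choice for later inference runs. */
static int pick_fastest(conv_algo_fn *candidates, int count, int repeats)
{
    int best = 0;
    double bestTime = time_algo(candidates[0], repeats);
    for (int i = 1; i < count; i++) {
        double t = time_algo(candidates[i], repeats);
        if (t < bestTime) { bestTime = t; best = i; }
    }
    return best;
}
```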
v0.1.0
- Support Caffe/ONNX/TFLite
- Support fp16/int8/binary
- Support Sequential/CNN/LSTM (common models of CV and NLP)