microgemm

General matrix multiplication with custom configuration in Rust.
Supports no_std and no_alloc environments.

The implementation is based on the BLIS microkernel approach.

Content

Install
Usage
Benchmarks
- f32
License

Install

cargo add microgemm

Usage

The Kernel trait is the main abstraction of microgemm. You can implement it yourself or use one of the kernels provided out of the box.

gemm

use microgemm::{kernels::GenericKernel8x8, Kernel as _, MatMut, MatRef, PackSizes};

fn main() {
    let kernel = GenericKernel8x8::<f32>::new();
    let [m, k, n] = [100, 380, 250];

    let [mc, kc, nc] = [m, k / 2, n];
    let pack_sizes = PackSizes { mc, kc, nc };
    let mut packing_buf = vec![0.0; pack_sizes.buf_len()];

    let (alpha, beta) = (2.0, -3.0);

    let a = vec![2.0; m * k];
    let b = vec![3.0; k * n];
    let mut c = vec![4.0; m * n];

    let a = MatRef::row_major(m, k, &a);
    let b = MatRef::row_major(k, n, &b);
    let mut c = MatMut::row_major(m, n, &mut c);

    // c <- alpha a b + beta c
    kernel.gemm(alpha, a, b, beta, &mut c, pack_sizes, &mut packing_buf);
    println!("{:?}", c.as_slice());
}

See also the no_alloc example for usage without Vec.
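
To sanity-check the output of the call above, a naive triple loop computing the same update can serve as a reference. This is a minimal sketch in plain Rust over row-major slices; `gemm_reference` is a hypothetical helper name and is not part of microgemm.

```rust
/// Naive reference for `c <- alpha * a * b + beta * c` on row-major slices.
/// `gemm_reference` is a hypothetical helper, not part of microgemm.
pub fn gemm_reference(
    m: usize,
    k: usize,
    n: usize,
    alpha: f32,
    a: &[f32],
    b: &[f32],
    beta: f32,
    c: &mut [f32],
) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);

    for i in 0..m {
        for j in 0..n {
            // Dot product of row i of a with column j of b.
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = alpha * acc + beta * c[i * n + j];
        }
    }
}
```

With the inputs from the example (a filled with 2.0, b with 3.0, c with 4.0, k = 380, alpha = 2.0, beta = -3.0), every entry works out to 2 * (2 * 3 * 380) - 3 * 4 = 4548, so the printed slice should contain that value throughout.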

Implemented Kernels

Name                                   Scalar Types                      Target
GenericKernelNxN (N: 2, 4, 8, 16, 32)  T: Copy + Zero + One + Mul + Add  Any
NeonKernel4x4                          f32                               aarch64 and target feature neon
NeonKernel8x8                          f32                               aarch64 and target feature neon

Custom Kernel Implementation

use microgemm::{typenum::U4, Kernel, MatMut, MatRef};

struct CustomKernel;

impl Kernel for CustomKernel {
    type Scalar = f64;
    type Mr = U4;
    type Nr = U4;

    // dst <- alpha lhs rhs + beta dst
    fn microkernel(
        &self,
        alpha: f64,
        lhs: MatRef<f64>,
        rhs: MatRef<f64>,
        beta: f64,
        dst: &mut MatMut<f64>,
    ) {
        // lhs is col-major
        assert_eq!(lhs.row_stride(), 1);
        assert_eq!(lhs.nrows(), Self::MR);

        // rhs is row-major
        assert_eq!(rhs.col_stride(), 1);
        assert_eq!(rhs.ncols(), Self::NR);

        // dst is col-major
        assert_eq!(dst.row_stride(), 1);
        assert_eq!(dst.nrows(), Self::MR);
        assert_eq!(dst.ncols(), Self::NR);

        // your microkernel implementation...
    }
}
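
The assertions in the skeleton pin down the layouts a microkernel can rely on: a col-major lhs panel with MR rows, a row-major rhs panel with NR columns, and a col-major MR x NR dst tile. One way to fill in the body is a sum of rank-1 updates over the shared dimension. The sketch below works on plain slices rather than on MatRef/MatMut, and it assumes all three buffers are contiguous (in particular, that the dst column stride equals MR). The assertions above do not guarantee that contiguity, and `microkernel_4x4` is a hypothetical name used only for illustration.

```rust
const MR: usize = 4;
const NR: usize = 4;

/// Sketch of a 4x4 microkernel body: `dst <- alpha * lhs * rhs + beta * dst`.
/// Assumed layouts (contiguous, for illustration only):
/// - `lhs`: MR x k, col-major  -> element (i, p) at `lhs[p * MR + i]`
/// - `rhs`: k x NR, row-major  -> element (p, j) at `rhs[p * NR + j]`
/// - `dst`: MR x NR, col-major -> element (i, j) at `dst[j * MR + i]`
pub fn microkernel_4x4(alpha: f64, lhs: &[f64], rhs: &[f64], beta: f64, dst: &mut [f64]) {
    assert_eq!(lhs.len() % MR, 0);
    let k = lhs.len() / MR;
    assert_eq!(rhs.len(), k * NR);
    assert_eq!(dst.len(), MR * NR);

    // Accumulate the MR x NR tile as a sum of k rank-1 updates.
    // acc[j][i] mirrors the col-major layout of dst.
    let mut acc = [[0.0f64; MR]; NR];
    for p in 0..k {
        let lhs_col = &lhs[p * MR..(p + 1) * MR];
        let rhs_row = &rhs[p * NR..(p + 1) * NR];
        for j in 0..NR {
            for i in 0..MR {
                acc[j][i] += lhs_col[i] * rhs_row[j];
            }
        }
    }

    // Scale the accumulated tile and blend it with the existing dst values.
    for j in 0..NR {
        for i in 0..MR {
            dst[j * MR + i] = alpha * acc[j][i] + beta * dst[j * MR + i];
        }
    }
}
```

Keeping the accumulator in a small fixed-size array lets the compiler hold the whole tile in registers, which is the main idea behind the microkernel approach; a SIMD kernel would replace the inner loops with vector instructions.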

Benchmarks

All benchmarks run on a single thread, on square matrices of dimension n.

f32

PackSizes { mc: n, kc: n, nc: n }

aarch64 (M1)

   n  NeonKernel8x8           faer matrixmultiply
 128         64.6µs        256.3µs         49.5µs
 256        419.5µs          3.2ms        518.2µs
 512          2.9ms         16.3ms          2.8ms
1024           23ms        132.7ms         22.5ms
2048        185.5ms             1s        182.8ms
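
To compare rows across sizes, the timings can be converted to throughput. The helper below is a small sketch that assumes the conventional count of 2 * n^3 floating-point operations for an n x n matrix product; `gflops` is a hypothetical name, not part of the benchmark suite.

```rust
/// Throughput in GFLOP/s for an n x n x n matrix product,
/// assuming the conventional 2 * n^3 floating-point operations.
pub fn gflops(n: u64, seconds: f64) -> f64 {
    let flops = 2.0 * (n as f64).powi(3);
    flops / seconds / 1e9
}
```

For example, `gflops(1024, 23e-3)` for the NeonKernel8x8 entry at n = 1024 comes out to roughly 93 GFLOP/s.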

License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.
