
Add sandbox for heap based sparse linear algebra #3969

Draft · wants to merge 16 commits into base: master

Conversation

@HechtiDerLachs (Collaborator) commented Jul 24, 2024

What is this about?

Over a coffee break at the ICMS, @flenzen and I discussed sparse linear algebra a bit. He mentioned that in some applications in TDA (topological data analysis) there was a data type for sparse vectors based on heaps which had proven advantageous in various settings where Gauss-like algorithms are applied to simplify complexes (see e.g. this paper). I became interested because of my implementation of simplify for complexes of modules: in the computation of derived pushforwards for coherent sheaves, the Gauss algorithm is the bottleneck at the moment.

I suggested giving it a try together in OSCAR. This PR is our sandbox for collaboration.

What's the state here?

I couldn't hold myself back and started implementing a naive version of what I understood the architecture to be. As a test case I am using matrices over the integers and implemented a simple algorithm to determine an upper-triangular form (upper_triangular_form!). When running the tests it becomes clear that my naive implementation of heap-based data structures for sparse linear algebra clearly loses against the existing implementation via SRow and SMat.

However, I think this PR can still be useful since it provides a blueprint for how to contribute a new experimental section to OSCAR and @flenzen might also have ideas and try to play with and improve the code.

In fact, he already suggested to me that realizing heaps via linked binary trees is probably a dumb idea. Seems like he was right. But hey: if you know better, let's try it out! For the moment we keep this code around for comparison, to build benchmarks against.

Update: We now have an implementation of heap-dict based sparse rows, as suggested by @fingolfin. I also made the HeapSMat type generic so that it can be used with arbitrary concrete types of sparse rows. With these new data types I now get slightly better performance in many cases compared with the classical SMat and SRow.

It might still be the case that I'm not using enough in-place operations for the latter, though. Feel free to take a stand for the classical data types and improve their benchmark!
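
To make the heap-dict idea concrete, here is a minimal, self-contained sketch of the technique (hypothetical names such as ToyHeapDictRow; the PR's actual HeapDictSRow differs in detail). A Dict maps index to coefficient, while a min-heap of indices answers pivot queries lazily, discarding stale entries on demand:

using DataStructures: BinaryMinHeap

struct ToyHeapDictRow{T}
    coeffs::Dict{Int,T}          # index => coefficient; zeros may linger
    indices::BinaryMinHeap{Int}  # candidate indices, possibly stale
end

function ToyHeapDictRow(pairs::Vector{Tuple{Int,T}}) where {T}
    coeffs = Dict{Int,T}()
    heap = BinaryMinHeap{Int}()
    for (i, c) in pairs
        coeffs[i] = get(coeffs, i, zero(c)) + c
        push!(heap, i)
    end
    return ToyHeapDictRow{T}(coeffs, heap)
end

# w += c*v: one dictionary update per entry of v; new indices go onto
# the heap. No sorted merge of entire rows is required.
function toy_addmul!(w::ToyHeapDictRow{T}, v::ToyHeapDictRow{T}, c::T) where {T}
    for (i, a) in v.coeffs
        if haskey(w.coeffs, i)
            w.coeffs[i] += c * a
        else
            w.coeffs[i] = c * a
            push!(w.indices, i)
        end
    end
    return w
end

# Pivot query: pop stale or cancelled indices until a live one remains.
function toy_pivot!(w::ToyHeapDictRow)
    while !isempty(w.indices)
        i = first(w.indices)
        haskey(w.coeffs, i) && !iszero(w.coeffs[i]) && return i
        pop!(w.indices)
        delete!(w.coeffs, i)
    end
    return 0  # the zero row
end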

Should I care?

If you're interested or an expert in sparse linear algebra over rings, feel free to chime in. Comments are welcome! Otherwise, we will notify the relevant people once we get to the point that we feel that we have produced something useful, or just close this PR.

@HechtiDerLachs marked this pull request as draft July 24, 2024 10:45
@HechtiDerLachs (Collaborator, Author) commented:

I pushed a first naive implementation. This code can be used for testing:

julia> l1 = [(rand(1:10000000), ZZ(rand(Int))) for i in 1:10000];

julia> l2 = [(rand(1:10000000), ZZ(rand(Int))) for i in 1:10000];

julia> uu = Oscar.HeapSRow(ZZ, l1);

julia> vv = Oscar.HeapSRow(ZZ, l2);

julia> @time ww = addmul!(uu, vv, ZZ(5));
  0.000078 seconds (6 allocations: 128 bytes)

julia> u = sparse_row(ZZ, l1);

julia> v = sparse_row(ZZ, l2);

julia> @time w = u + 5*v;
  0.015723 seconds (20.03 k allocations: 1.249 MiB)

julia> ww2 = Oscar.HeapSRow(w);

julia> @time ww == ww2

The lazy evaluation allows for quite fast in-place operations. However, many other operations are, of course, expensive.
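
To illustrate the lazy evaluation at play here: with linked heap nodes, two rows can be melded in O(1) by pointer surgery alone, pairing-heap style. The following is a hypothetical sketch (not this PR's code), omitting the cofactor bookkeeping that additionally delays the scalar multiplication in addmul!:

mutable struct ToyNode{T}
    index::Int
    value::T
    child::Union{ToyNode{T},Nothing}    # first child
    sibling::Union{ToyNode{T},Nothing}  # next sibling in the child list
end

# O(1) meld: the root with the larger index is hung below the other
# root; no entry is copied or evaluated. The price is paid later, when
# extracting the minimum has to restructure the child list.
function meld(a::Union{ToyNode{T},Nothing}, b::Union{ToyNode{T},Nothing}) where {T}
    a === nothing && return b
    b === nothing && return a
    if b.index < a.index
        a, b = b, a
    end
    b.sibling = a.child
    a.child = b
    return a
end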

@flenzen: Let me know what you think. Is this going in the right direction in any reasonable sense?

@joschmitt (Member) commented:

In invariant theory, one needs to compute kernels of huge, but very sparse, matrices over fields. I assume that this is in the scope of the "Gauss-like" things you are trying to improve here? If so, I would be very interested in trying out your approach!

@flenzen commented Jul 29, 2024:

Yes, people doing cohomology computations for clique complexes have had good experiences with this way of implementing sparse matrices for kernel computations. See, for example, https://doi.org/10.1016/j.jsc.2016.03.008 for an overview of this application and for run times in a persistent homology context.

@joschmitt (Member) commented:

> Yes, people doing cohomology computations for clique complexes have had good experiences with this way of implementing sparse matrices for kernel computations. See, for example, https://doi.org/10.1016/j.jsc.2016.03.008 for an overview of this application and for run times in a persistent homology context.

Thanks, I will have a look at that!

    # following subtree shall be multiplied
    # with `cofactor` to get the actual values.
    left::Union{HeapNode{T}, Nothing}
    right::Union{HeapNode{T}, Nothing}
Member commented:

Just to say, storing a heap as an actual tree like this is extremely inefficient. A classic inline binary heap implemented as a Vector will beat this easily (my guess would be: by at least an order of magnitude in terms of performance, possibly more in terms of allocations). However, it won't be able to accommodate your cofactor.

Are you by chance trying to use a heap to speed up repeated pivot reduction (as is common in e.g. GB computations)? If so, have you considered using a heap (storing the terms) plus a dictionary (storing the coefficients)? This technique is described e.g. in section 4 of https://arxiv.org/abs/1206.6940.
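
For concreteness, a minimal sketch of such an inline heap (hypothetical names; a production version would rather reuse DataStructures.jl): entries live flat in a Vector and the tree structure is pure index arithmetic, so there are no per-node allocations and no pointer chasing.

struct ToyVecHeap{T}
    data::Vector{Tuple{Int,T}}  # heap-ordered by the Int index; parent of i is i >> 1
end
ToyVecHeap{T}() where {T} = ToyVecHeap{T}(Tuple{Int,T}[])

function Base.push!(h::ToyVecHeap{T}, entry::Tuple{Int,T}) where {T}
    push!(h.data, entry)
    i = length(h.data)
    while i > 1 && h.data[i >> 1][1] > h.data[i][1]  # sift up
        h.data[i], h.data[i >> 1] = h.data[i >> 1], h.data[i]
        i >>= 1
    end
    return h
end

function Base.pop!(h::ToyVecHeap)  # assumes the heap is non-empty
    top = h.data[1]
    h.data[1] = h.data[end]
    pop!(h.data)
    i = 1
    while true  # sift down
        l, r, m = 2i, 2i + 1, i
        l <= length(h.data) && h.data[l][1] < h.data[m][1] && (m = l)
        r <= length(h.data) && h.data[r][1] < h.data[m][1] && (m = r)
        m == i && break
        h.data[i], h.data[m] = h.data[m], h.data[i]
        i = m
    end
    return top
end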

Reply:

Thank you for the idea. Can't we store row index-cofactor pairs in a vector? Or can pairs not be stored contiguously? I'm not very experienced with memory layout in Julia.
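
A quick check along these lines (standard Julia behavior): a Vector of isbits tuples is laid out contiguously, but ZZRingElem is a mutable wrapper around a GMP/FLINT integer, so tuples containing one are not isbits and the actual integer data stays behind a reference.

julia> isbitstype(Tuple{Int, Int})
true

julia> isbitstype(Tuple{Int, ZZRingElem})
false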

Contributor commented:

I am happy to play, but a few questions:

  • What operations are supported?
  • Should this not be done in Hecke, where the sparse stuff is, or in AA, where it should be?
  • How does it compare with Magma? In my experience that is the fastest sparse linear algebra available in general, and it does not use plain Gauss.
  • Is this any good at finding kernel vectors? I am in favour of speeding up linear algebra, as this is the bottleneck for cohomology... not over F_2, though.

@HechtiDerLachs (Collaborator, Author) commented Jul 30, 2024:

> What operations are supported?

Probably the usual ones, plus pivot, pop!, and friends on the rows? I think it might also be nice to have support for lazy evaluation of linear combinations.

> Should this not be done in Hecke, where the sparse stuff is, or in AA, where it should be?

If we get somewhere, then probably yes. We can move it later, right?

> Is this any good at finding kernel vectors?

No idea. Actually, I don't even recall how kernels are computed over the integers.

function add!(v::HeapSRow{T}, w::HeapSRow{T}) where {T}
  @assert base_ring(v) === base_ring(w)
  R = base_ring(v)
  isempty(v) && return w
Contributor commented:

This is not an in-place addition; the return value should, if at all possible, be v.

@fieker (Contributor) commented Jul 30, 2024:

Looking at the benchmark: this is not fair. It should use the add!... functions for the old SRow as well, or implement the binary ops for the heap. The interface is about choosing memory optimization or not.

@joschmitt (Member) commented:

> Yes, people doing cohomology computations for clique complexes have had good experiences with this way of implementing sparse matrices for kernel computations. See, for example, https://doi.org/10.1016/j.jsc.2016.03.008 for an overview of this application and for run times in a persistent homology context.


In this paper, they just work over $\mathbb F_2$ (or $\mathbb Z_2$ for the topologists) from what I can see. I am mainly interested in characteristic zero, which is a different business in my experience. Still, if you get a fast row (or column) reduction (for "generic" fields/rings) implemented, I'm happy to try it.

@HechtiDerLachs (Collaborator, Author) commented:

> Looking at the benchmark: this is not fair. It should use the add!... functions for the old SRow as well

Completely agreed. Yet the old SRow wins nevertheless. I didn't use add! here because I didn't find the methods. Maybe because they're not exported from Hecke?
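
If memory serves, Hecke does ship mutating helpers for SRow; they are just unexported, so they need the Hecke. prefix. Name and argument order should be double-checked against Hecke's sources, but something along these lines should make the benchmark fairer:

julia> @time Hecke.add_scaled_row!(v, u, ZZ(5));  # expected: u = u + 5*v, in place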

@HechtiDerLachs (Collaborator, Author) commented Jul 30, 2024:

Thanks for all the contributions to the discussion!

From my point of view I would try the following:

  • I really like the idea of combined heap-dict vectors suggested by @fingolfin. The authors of the linked paper report that once they use this, it no longer matters whether they use heaps, tournament trees, or anything else, so it seems to be a reasonable choice.
  • From the paper linked by @flenzen I would take the idea of implementing a generic type for sparse matrices which can take arbitrary (coherent?) types of sparse vectors as rows.
  • It should be useful (for some applications, e.g. computing the base change matrices?) to have a type which allows for lazy evaluation. In my prototype this was realized by just merging trees without summing up nodes with the same index. But that doesn't seem very practical. If we allow for various concrete types of sparse rows, then we could, for instance, introduce a sparse row which holds an arbitrary linear combination of other sparse rows without evaluating them; see the sketch after this list. Pivoting could also be supported on these with reasonable runtime.
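
A sketch of what such a lazily evaluated row could look like (hypothetical names, not part of this PR):

struct ToyLazyRow{T,RowT}
    summands::Vector{Tuple{T,RowT}}  # represents sum_k c_k * r_k, unevaluated
end

# Forming a linear combination is cheap: just record the pair.
add_scaled!(v::ToyLazyRow{T,RowT}, c::T, r::RowT) where {T,RowT} =
    (push!(v.summands, (c, r)); v)

# Evaluation is deferred until the entries are actually needed.
function evaluate(v::ToyLazyRow{T,RowT}) where {T,RowT}
    acc = Dict{Int,T}()
    for (c, r) in v.summands
        for (i, a) in r  # assumes rows iterate as (index, value) pairs
            acc[i] = get(acc, i, zero(c)) + c * a
        end
    end
    return filter!(p -> !iszero(p.second), acc)
end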

Comment on lines +84 to +101
@testset "heap-dict vectors" begin
  k = 100000
  i = unique!([rand(1:10*k) for i in 1:k])
  v = [ZZ(rand(1:10))^10000 for i in 1:length(i)]
  a = Oscar.HeapDictSRow(ZZ, i, v)
  aa = sparse_row(a)
  i = unique!([rand(1:10*k) for i in 1:k])
  v = [ZZ(rand(1:10))^10000 for i in 1:length(i)]
  b = Oscar.HeapDictSRow(ZZ, i, v)
  bb = sparse_row(b)

  # For comparison of timings do
  # julia> c = copy(a); @time w = add!(c, b);
  #   0.042619 seconds (27.40 k allocations: 41.765 MiB, 17.17% gc time)
  #
  # julia> cc = copy(aa); @time cc += bb;
  #   0.226098 seconds (192.12 k allocations: 510.190 MiB, 5.40% gc time)

@HechtiDerLachs (Collaborator, Author) commented:

Looks like we are slowly getting somewhere. But this only works for RingElems which are big, in the sense that copying them around in memory is expensive. For small elements it seems that the original SRow is still winning; in that case the rehashing while merging the dictionaries is the expensive part.
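
To reproduce the small-element regime described here, one can rerun the timings above with machine-word-sized entries instead of the roughly 10000-digit numbers, e.g.

v = [ZZ(rand(1:10)) for _ in 1:length(i)]  # small entries: copying is cheap, hashing dominates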

Reply:

@joschmitt There are other implementations that use other finite fields with the same approach, e.g. https://link.springer.com/article/10.1007/s41468-021-00071-5. I have no idea how the heap-based approach behaves in characteristic zero, but I'd try it. Also, the cohomology computation where this approach is beneficial is quite specific (clique complexes), so it may not generalize. @HechtiDerLachs I'll look at your code soon and will play around with the underlying heap.

codecov bot commented Jul 30, 2024:

Codecov Report

Attention: Patch coverage is 81.62345% with 163 lines in your changes missing coverage. Please review.

Project coverage is 84.30%. Comparing base (bfcfee3) to head (d9dace8).
Report is 4 commits behind head on master.

Files Patch % Lines
experimental/HeapSparseLA/src/HeapSparseLA.jl 79.43% 123 Missing ⚠️
experimental/HeapSparseLA/src/HeapDictSRow.jl 84.98% 38 Missing ⚠️
experimental/HeapSparseLA/src/Types.jl 94.28% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3969      +/-   ##
==========================================
- Coverage   84.30%   84.30%   -0.01%     
==========================================
  Files         596      599       +3     
  Lines       81894    82781     +887     
==========================================
+ Hits        69041    69788     +747     
- Misses      12853    12993     +140     
Files Coverage Δ
src/Oscar.jl 68.75% <100.00%> (+0.39%) ⬆️
experimental/HeapSparseLA/src/Types.jl 94.28% <94.28%> (ø)
experimental/HeapSparseLA/src/HeapDictSRow.jl 84.98% <84.98%> (ø)
experimental/HeapSparseLA/src/HeapSparseLA.jl 79.43% <79.43%> (ø)

... and 1 file with indirect coverage changes

@HechtiDerLachs (Collaborator, Author) commented Jul 30, 2024:

Now upper_triangular_form! and the sparse matrices run for the heap-dict sparse rows.

It is my impression that the new data structure performs slightly better, but not by orders of magnitude. Still: I managed to get something almost as performant as stuff done by the experts.

@lgoettgens (Member) commented:

The error in the 1.6-short-ubuntu CI job seems to be due to some Julia feature not being available in this version. But I hope that this has a trivial fix.

@fieker (Contributor) commented Aug 6, 2024:

The advantage of the heap actually does not make too much sense: the sorting is only invoked on creation of a row, if asked for. Once the rows are there, addition just iterates through both, keeping the order...
But eventually, when I'm back, we'll benchmark it together, possibly in Max's workshop on benchmarking.

@flenzen commented Aug 6, 2024 via email

@HechtiDerLachs (Collaborator, Author) commented:

> The advantage of the heap actually does not make too much sense: the sorting is only invoked on creation of a row, if asked for. Once the rows are there, addition just iterates through both, keeping the order...

That's true. But it is my impression that even the in-place addition of SRows, a += b, needs to allocate a Vector{Tuple{Int, T}} of size size(a) + size(b) to do the merging. Merging cannot be done in place: for each insertion of an element c into the vector of a, one would have to move the whole tail of that vector, i.e. whatever comes after c in a. That would also be expensive.

With the dictionary approach, merging seems to come at the cost of eventually rehashing. When the ring elements are big (polynomials over QQ rather than elements of GF(p)), rehashing tends to be cheaper than merging two lists of Tuple{Int, T}. Hence I do expect a speedup from the implementation here.

> But eventually, when I'm back, we'll benchmark it together, possibly in Max's workshop on benchmarking.

That should be a great opportunity! Looking forward to this.
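
For reference, here are the two merge strategies compared above as stand-alone sketches (hypothetical helper names). The sorted-vector merge has to allocate the result, while the dictionary merge mutates its first argument and pays mainly for hashing:

function merge_sorted(a::Vector{Tuple{Int,T}}, b::Vector{Tuple{Int,T}}) where {T}
    c = Tuple{Int,T}[]
    sizehint!(c, length(a) + length(b))  # the allocation discussed above
    i = j = 1
    while i <= length(a) && j <= length(b)
        if a[i][1] < b[j][1]
            push!(c, a[i]); i += 1
        elseif b[j][1] < a[i][1]
            push!(c, b[j]); j += 1
        else
            s = a[i][2] + b[j][2]
            iszero(s) || push!(c, (a[i][1], s))  # drop cancellations
            i += 1; j += 1
        end
    end
    append!(c, @view a[i:end])
    append!(c, @view b[j:end])
    return c
end

function merge_dict!(a::Dict{Int,T}, b::Dict{Int,T}) where {T}
    for (i, v) in b
        a[i] = get(a, i, zero(v)) + v  # cost: hashing, plus occasional rehash on growth
    end
    return a
end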
