
Add sandbox for heap based sparse linear algebra #3969

Draft · wants to merge 16 commits into base: master

Conversation

@HechtiDerLachs (Collaborator) commented Jul 24, 2024

What is this about?

Over a coffee break at the ICMS, @flenzen and I discussed sparse linear algebra a bit. He mentioned that in some applications in TDA (topological data analysis) there was a data type for sparse vectors based on heaps which had proven advantageous in various settings where Gauss-like algorithms are applied to simplify complexes (see e.g. this paper). I became interested because of my implementation of simplify for complexes of modules: in the computation of derived pushforwards for coherent sheaves, the Gauss algorithm is the bottleneck at the moment.

I suggested giving it a try together in OSCAR. This PR is our sandbox for collaboration.

What's the state here?

I couldn't hold myself back and started implementing a naive version of what I understood the architecture to be. As a test case I am using matrices over the integers and implemented a simple algorithm to determine an upper-triangular form (upper_triangular_form!). When running the tests it becomes clear that my naive implementation of heap-based data structures for sparse linear algebra clearly loses against the existing implementation via SRow and SMat.

However, I think this PR can still be useful since it provides a blueprint for how to contribute a new experimental section to OSCAR and @flenzen might also have ideas and try to play with and improve the code.

In fact, he already suggested to me that realizing heaps via linked binary trees is probably a dumb idea. Seems like he was right. But hey: if you know better, let's try it out! For the moment we keep this code around for comparison, to build benchmarks against.

Update: We now have an implementation of heap-dict based sparse rows, as suggested by @fingolfin. I also made the HeapSMat type generic so that it can be used with arbitrary concrete types of sparse rows. With these new data types I now get slightly better performance in many cases compared with the classical SMat and SRow.

It might still be the case that I'm not using enough in-place operations for the latter, though. Feel free to take a stand for the classical data types and improve their benchmark!
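
To make the heap-dict idea concrete, here is a minimal, self-contained sketch of the technique (hypothetical names such as ToyHeapDictRow; the PR's actual HeapDictSRow differs in detail). A Dict maps index to coefficient, while a min-heap of indices answers pivot queries lazily, discarding stale entries on demand:

using DataStructures: BinaryMinHeap

struct ToyHeapDictRow{T}
    coeffs::Dict{Int,T}          # index => coefficient; zeros may linger
    indices::BinaryMinHeap{Int}  # candidate indices, possibly stale
end

function ToyHeapDictRow(pairs::Vector{Tuple{Int,T}}) where {T}
    coeffs = Dict{Int,T}()
    heap = BinaryMinHeap{Int}()
    for (i, c) in pairs
        coeffs[i] = get(coeffs, i, zero(c)) + c
        push!(heap, i)
    end
    return ToyHeapDictRow{T}(coeffs, heap)
end

# w += c*v: one dictionary update per entry of v; new indices go onto
# the heap. No sorted merge of entire rows is required.
function toy_addmul!(w::ToyHeapDictRow{T}, v::ToyHeapDictRow{T}, c::T) where {T}
    for (i, a) in v.coeffs
        if haskey(w.coeffs, i)
            w.coeffs[i] += c * a
        else
            w.coeffs[i] = c * a
            push!(w.indices, i)
        end
    end
    return w
end

# Pivot query: pop stale or cancelled indices until a live one remains.
function toy_pivot!(w::ToyHeapDictRow)
    while !isempty(w.indices)
        i = first(w.indices)
        haskey(w.coeffs, i) && !iszero(w.coeffs[i]) && return i
        pop!(w.indices)
        delete!(w.coeffs, i)
    end
    return 0  # the zero row
end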

Should I care?

If you're interested or an expert in sparse linear algebra over rings, feel free to chime in. Comments are welcome! Otherwise, we will notify the relevant people once we get to the point that we feel that we have produced something useful, or just close this PR.

@HechtiDerLachs marked this pull request as draft July 24, 2024 10:45
@HechtiDerLachs (Collaborator, Author) commented:

I pushed a first naive implementation. This code can be used for testing:

julia> l1 = [(rand(1:10000000), ZZ(rand(Int))) for i in 1:10000];

julia> l2 = [(rand(1:10000000), ZZ(rand(Int))) for i in 1:10000];

julia> uu = Oscar.HeapSRow(ZZ, l1);

julia> vv = Oscar.HeapSRow(ZZ, l2);

julia> @time ww = addmul!(uu, vv, ZZ(5));
  0.000078 seconds (6 allocations: 128 bytes)

julia> u = sparse_row(ZZ, l1);

julia> v = sparse_row(ZZ, l2);

julia> @time w = u + 5*v;
  0.015723 seconds (20.03 k allocations: 1.249 MiB)

julia> ww2 = Oscar.HeapSRow(w);

julia> @time ww == ww2

The lazy evaluation allows for quite fast in-place operations. However, many other operations are, of course, expensive.
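
To illustrate the lazy evaluation at play here: with linked heap nodes, two rows can be melded in O(1) by pointer surgery alone, pairing-heap style. The following is a hypothetical sketch (not this PR's code), omitting the cofactor bookkeeping that additionally delays the scalar multiplication in addmul!:

mutable struct ToyNode{T}
    index::Int
    value::T
    child::Union{ToyNode{T},Nothing}    # first child
    sibling::Union{ToyNode{T},Nothing}  # next sibling in the child list
end

# O(1) meld: the root with the larger index is hung below the other
# root; no entry is copied or evaluated. The price is paid later, when
# extracting the minimum has to restructure the child list.
function meld(a::Union{ToyNode{T},Nothing}, b::Union{ToyNode{T},Nothing}) where {T}
    a === nothing && return b
    b === nothing && return a
    if b.index < a.index
        a, b = b, a
    end
    b.sibling = a.child
    a.child = b
    return a
end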

@flenzen: Let me know what you think. Is this going in the right direction in any reasonable sense?

@joschmitt (Member) commented:

In invariant theory, one needs to compute kernels of huge, but very sparse, matrices over fields. I assume that this is in the scope of the "Gauss-like" things you are trying to improve here? If so, I would be very interested in trying out your approach!

@flenzen commented Jul 29, 2024:

Yes, people doing cohomology computations for clique complexes have had good experiences with this way of implementing sparse matrices for kernel computations. See, for example, https://doi.org/10.1016/j.jsc.2016.03.008 for an overview of this application and for run times in a persistent homology context.

@joschmitt (Member) commented:

> Yes, people doing cohomology computations for clique complexes have had good experiences with this way of implementing sparse matrices for kernel computations. See, for example, https://doi.org/10.1016/j.jsc.2016.03.008 for an overview of this application and for run times in a persistent homology context.

Thanks, I will have a look at that!

    # following subtree shall be multiplied
    # with `cofactor` to get the actual values.
    left::Union{HeapNode{T}, Nothing}
    right::Union{HeapNode{T}, Nothing}
Member commented:

Just to say, storing a heap as an actual tree like this is extremely inefficient. A classic inline binary heap implemented as a Vector will beat this easily (my guess would be: by at least an order of magnitude in terms of performance, possibly more in terms of allocations). However, it won't be able to accommodate your cofactor.

Are you by chance trying to use a heap to speed up repeated pivot reduction (as is common in e.g. GB computations)? If so, have you considered using a heap (storing the terms) plus a dictionary (storing the coefficients)? This technique is described e.g. in section 4 of https://arxiv.org/abs/1206.6940.
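
For concreteness, a minimal sketch of such an inline heap (hypothetical names; a production version would rather reuse DataStructures.jl): entries live flat in a Vector and the tree structure is pure index arithmetic, so there are no per-node allocations and no pointer chasing.

struct ToyVecHeap{T}
    data::Vector{Tuple{Int,T}}  # heap-ordered by the Int index; parent of i is i >> 1
end
ToyVecHeap{T}() where {T} = ToyVecHeap{T}(Tuple{Int,T}[])

function Base.push!(h::ToyVecHeap{T}, entry::Tuple{Int,T}) where {T}
    push!(h.data, entry)
    i = length(h.data)
    while i > 1 && h.data[i >> 1][1] > h.data[i][1]  # sift up
        h.data[i], h.data[i >> 1] = h.data[i >> 1], h.data[i]
        i >>= 1
    end
    return h
end

function Base.pop!(h::ToyVecHeap)  # assumes the heap is non-empty
    top = h.data[1]
    h.data[1] = h.data[end]
    pop!(h.data)
    i = 1
    while true  # sift down
        l, r, m = 2i, 2i + 1, i
        l <= length(h.data) && h.data[l][1] < h.data[m][1] && (m = l)
        r <= length(h.data) && h.data[r][1] < h.data[m][1] && (m = r)
        m == i && break
        h.data[i], h.data[m] = h.data[m], h.data[i]
        i = m
    end
    return top
end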

Reply:

Thank you for the idea. Can't we store row index-cofactor pairs in a vector? Or can pairs not be stored contiguously? I'm not very experienced with memory layout in Julia.
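
A quick check along these lines (standard Julia behavior): a Vector of isbits tuples is laid out contiguously, but ZZRingElem is a mutable wrapper around a GMP/FLINT integer, so tuples containing one are not isbits and the actual integer data stays behind a reference.

julia> isbitstype(Tuple{Int, Int})
true

julia> isbitstype(Tuple{Int, ZZRingElem})
false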

Contributor commented:

I am happy to play, but a few questions:

  • What operations are supported?
  • Should this not be done in Hecke, where the sparse stuff is, or in AA, where it should be?
  • How does it compare with Magma? In my experience that is the fastest sparse linear algebra available in general, and it does not use plain Gauss.
  • Is this any good at finding kernel vectors? I am in favour of speeding up linear algebra, as this is the bottleneck for cohomology... not over F_2, though.

@HechtiDerLachs (Collaborator, Author) commented Jul 30, 2024:

> What operations are supported?

Probably the usual ones, plus pivot, pop!, and friends on the rows? I think it might also be nice to have support for lazy evaluation of linear combinations.

> Should this not be done in Hecke, where the sparse stuff is, or in AA, where it should be?

If we get somewhere, then probably yes. We can move it later, right?

> Is this any good at finding kernel vectors?

No idea. Actually, I don't even recall how kernels are computed over the integers.

function add!(v::HeapSRow{T}, w::HeapSRow{T}) where {T}
  @assert base_ring(v) === base_ring(w)
  R = base_ring(v)
  isempty(v) && return w
Contributor commented:

This is not an in-place addition; the return value should, if at all possible, be v.

@fieker (Contributor) commented Jul 30, 2024:

Looking at the benchmark: this is not fair. It should use the add!... functions for the old SRow as well, or implement the binary ops for the heap. The interface is about choosing memory optimization or not.

@joschmitt (Member) commented:

> Yes, people doing cohomology computations for clique complexes have had good experiences with this way of implementing sparse matrices for kernel computations. See, for example, https://doi.org/10.1016/j.jsc.2016.03.008 for an overview of this application and for run times in a persistent homology context.


In this paper, they just work over $\mathbb F_2$ (or $\mathbb Z_2$ for the topologists) from what I can see. I am mainly interested in characteristic zero, which is a different business in my experience. Still, if you get a fast row (or column) reduction (for "generic" fields/rings) implemented, I'm happy to try it.

@HechtiDerLachs (Collaborator, Author) commented:

> Looking at the benchmark: this is not fair. It should use the add!... functions for the old SRow as well

Completely agreed. Yet the old SRow wins nevertheless. I didn't use add! here because I didn't find the methods. Maybe because they're not exported from Hecke?
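
If memory serves, Hecke does ship mutating helpers for SRow; they are just unexported, so they need the Hecke. prefix. Name and argument order should be double-checked against Hecke's sources, but something along these lines should make the benchmark fairer:

julia> @time Hecke.add_scaled_row!(v, u, ZZ(5));  # expected: u = u + 5*v, in place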

@HechtiDerLachs (Collaborator, Author) commented Jul 30, 2024:

Thanks for all the contributions to the discussion!

From my point of view I would try the following:

  • I really like the idea of combined heap-dict vectors suggested by @fingolfin. The authors of the linked paper report that once they use this, it no longer matters whether they use heaps, tournament trees, or anything else, so it seems to be a reasonable choice.
  • From the paper linked by @flenzen I would take the idea of implementing a generic type for sparse matrices which can take arbitrary (coherent?) types of sparse vectors as rows.
  • It should be useful (for some applications, e.g. computing the base change matrices?) to have a type which allows for lazy evaluation. In my prototype this was realized by just merging trees without summing up nodes with the same index. But that doesn't seem very practical. If we allow for various concrete types of sparse rows, then we could, for instance, introduce a sparse row which holds an arbitrary linear combination of other sparse rows without evaluating them; see the sketch after this list. Pivoting could also be supported on these with reasonable runtime.
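
A sketch of what such a lazily evaluated row could look like (hypothetical names, not part of this PR):

struct ToyLazyRow{T,RowT}
    summands::Vector{Tuple{T,RowT}}  # represents sum_k c_k * r_k, unevaluated
end

# Forming a linear combination is cheap: just record the pair.
add_scaled!(v::ToyLazyRow{T,RowT}, c::T, r::RowT) where {T,RowT} =
    (push!(v.summands, (c, r)); v)

# Evaluation is deferred until the entries are actually needed.
function evaluate(v::ToyLazyRow{T,RowT}) where {T,RowT}
    acc = Dict{Int,T}()
    for (c, r) in v.summands
        for (i, a) in r  # assumes rows iterate as (index, value) pairs
            acc[i] = get(acc, i, zero(c)) + c * a
        end
    end
    return filter!(p -> !iszero(p.second), acc)
end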

Comment on lines +84 to +101
@testset "heap-dict vectors" begin
  k = 100000
  i = unique!([rand(1:10*k) for i in 1:k])
  v = [ZZ(rand(1:10))^10000 for i in 1:length(i)]
  a = Oscar.HeapDictSRow(ZZ, i, v)
  aa = sparse_row(a)
  i = unique!([rand(1:10*k) for i in 1:k])
  v = [ZZ(rand(1:10))^10000 for i in 1:length(i)]
  b = Oscar.HeapDictSRow(ZZ, i, v)
  bb = sparse_row(b)

  # For comparison of timings do
  # julia> c = copy(a); @time w = add!(c, b);
  #   0.042619 seconds (27.40 k allocations: 41.765 MiB, 17.17% gc time)
  #
  # julia> cc = copy(aa); @time cc += bb;
  #   0.226098 seconds (192.12 k allocations: 510.190 MiB, 5.40% gc time)

@HechtiDerLachs (Collaborator, Author) commented:

Looks like we are slowly getting somewhere. But this only works for RingElems which are big, in the sense that copying them around in memory is expensive. For small elements it seems that the original SRow is still winning; in that case the rehashing while merging the dictionaries is the expensive part.
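
To reproduce the small-element regime described here, one can rerun the timings above with machine-word-sized entries instead of the roughly 10000-digit numbers, e.g.

v = [ZZ(rand(1:10)) for _ in 1:length(i)]  # small entries: copying is cheap, hashing dominates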

Reply:

@joschmitt There are other implementations that use other finite fields with the same approach, e.g. https://link.springer.com/article/10.1007/s41468-021-00071-5. I have no idea how the heap-based approach behaves in characteristic zero, but I'd try it. Also, the cohomology computation where this approach is beneficial is quite specific (clique complexes), so it may not generalize. @HechtiDerLachs I'll look at your code soon and will play around with the underlying heap.

codecov bot commented Jul 30, 2024:

Codecov Report

Attention: Patch coverage is 81.62345% with 163 lines in your changes missing coverage. Please review.

Project coverage is 84.30%. Comparing base (bfcfee3) to head (d9dace8).
Report is 4 commits behind head on master.

Files Patch % Lines
experimental/HeapSparseLA/src/HeapSparseLA.jl 79.43% 123 Missing ⚠️
experimental/HeapSparseLA/src/HeapDictSRow.jl 84.98% 38 Missing ⚠️
experimental/HeapSparseLA/src/Types.jl 94.28% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3969      +/-   ##
==========================================
- Coverage   84.30%   84.30%   -0.01%     
==========================================
  Files         596      599       +3     
  Lines       81894    82781     +887     
==========================================
+ Hits        69041    69788     +747     
- Misses      12853    12993     +140     
Files Coverage Δ
src/Oscar.jl 68.75% <100.00%> (+0.39%) ⬆️
experimental/HeapSparseLA/src/Types.jl 94.28% <94.28%> (ø)
experimental/HeapSparseLA/src/HeapDictSRow.jl 84.98% <84.98%> (ø)
experimental/HeapSparseLA/src/HeapSparseLA.jl 79.43% <79.43%> (ø)

... and 1 file with indirect coverage changes

@HechtiDerLachs (Collaborator, Author) commented Jul 30, 2024:

Now upper_triangular_form! and the sparse matrices run for the heap-dict sparse rows.

It is my impression that the new data structure performs slightly better, but not by orders of magnitude. Still: I managed to get something almost as performant as stuff done by the experts.

@lgoettgens (Member) commented:

The error in the 1.6-short-ubuntu CI job seems to be due to some Julia feature not being available in this version. But I hope that this has a trivial fix.

@fieker (Contributor) commented Aug 6, 2024:

The advantage of the heap actually does not make too much sense: the sorting is only invoked on creation of a row, if asked for. Once the rows are there, addition just iterates through both, keeping the order...
But eventually, when I'm back, we'll benchmark it together, possibly in Max's workshop on benchmarking.

@flenzen commented Aug 6, 2024 via email

@HechtiDerLachs (Collaborator, Author) commented:

> The advantage of the heap actually does not make too much sense: the sorting is only invoked on creation of a row, if asked for. Once the rows are there, addition just iterates through both, keeping the order...

That's true. But it is my impression that even the in-place addition of SRows, a += b, needs to allocate a Vector{Tuple{Int, T}} of size size(a) + size(b) to do the merging. Merging cannot be done in place: for each insertion of an element c into the vector of a, one would have to move the whole tail of that vector, i.e. whatever comes after c in a. That would also be expensive.

With the dictionary approach, merging seems to come at the cost of eventually rehashing. When the ring elements are big (polynomials over QQ rather than elements of GF(p)), rehashing tends to be cheaper than merging two lists of Tuple{Int, T}. Hence I do expect a speedup from the implementation here.

> But eventually, when I'm back, we'll benchmark it together, possibly in Max's workshop on benchmarking.

That should be a great opportunity! Looking forward to this.
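
For reference, here are the two merge strategies compared above as stand-alone sketches (hypothetical helper names). The sorted-vector merge has to allocate the result, while the dictionary merge mutates its first argument and pays mainly for hashing:

function merge_sorted(a::Vector{Tuple{Int,T}}, b::Vector{Tuple{Int,T}}) where {T}
    c = Tuple{Int,T}[]
    sizehint!(c, length(a) + length(b))  # the allocation discussed above
    i = j = 1
    while i <= length(a) && j <= length(b)
        if a[i][1] < b[j][1]
            push!(c, a[i]); i += 1
        elseif b[j][1] < a[i][1]
            push!(c, b[j]); j += 1
        else
            s = a[i][2] + b[j][2]
            iszero(s) || push!(c, (a[i][1], s))  # drop cancellations
            i += 1; j += 1
        end
    end
    append!(c, @view a[i:end])
    append!(c, @view b[j:end])
    return c
end

function merge_dict!(a::Dict{Int,T}, b::Dict{Int,T}) where {T}
    for (i, v) in b
        a[i] = get(a, i, zero(v)) + v  # cost: hashing, plus occasional rehash on growth
    end
    return a
end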
