Tracking progress for merging the dofhandlers #629

Closed · kimauth opened this issue Mar 22, 2023 · 15 comments

kimauth (Member) commented Mar 22, 2023

Before merging the dofhandlers, the MixedDofHandler must be able to do everything that the DofHandler does and should ideally be equally fast. This issue keeps track of how far we've come along that way.

Progress tracking

The focus here is to compare how a MixedDofHandler performs on a grid with a single concrete cell type and all fields defined on the full domain, i.e. how well MixedDofHandler works as a drop-in replacement for DofHandler.

Benchmarking code should go in the comments, ideally one category at a time. However, if a method does not work / does not perform well, open a separate issue about it and reference this one. That way it will be easier to keep an overview and do small reviewable PRs to fix issues.

Syntax: Given syntax works and yields correct results for MixedDofHandler
Performance: performance with MixedDofHandler is comparable to DofHandler + there is a benchmark of it!

| Category | Method | Syntax | Performance |
| --- | --- | --- | --- |
| Construction | `DofHandler(grid)` | | |
| | `add!(dh, name, dim[, ip])` | | |
| | `close!(dh)` | | |
| Basic functionality | `ndofs(dh)` | | |
| | `ndofs_per_cell(dh[, cell])` | | |
| | `dof_range(dh, field_name)` | | |
| | `celldofs(dh, i)` | | |
| | `celldofs!(dofs, dh, i)` | | |
| Renumbering | `renumber!(dh, order)` | | |
| | `renumber!(dh, DofOrder.FieldWise())` | | |
| | `renumber!(dh, DofOrder.ComponentWise())` | | |
| Sparsity pattern | `create_sparsity_pattern(dh)` | | |
| | `create_sparsity_pattern(dh; coupling)` | | |
| | `create_symmetric_sparsity_pattern(dh)` | | |
| | `create_symmetric_sparsity_pattern(dh; coupling)` | | |
| Constraints | `ConstraintHandler(dh)` | | |
| | `add!(ch, dbc::Dirichlet)` | | |
| | `add!(ch, ac::AffineConstraint)` | | |
| | `add!(ch, pdbc::PeriodicDirichlet)` | | |
| | `close!(ch)` | | |
| | `apply_analytical!(a, dh, fieldname, f, cellset)` | | |
| CellIterator | `CellCache(dh)` | | |
| | `reinit!(cc, i)` | | |
| | `CellIterator(dh, cellset)` | | |
| Postprocessing | `get_point_values(ph, dh, dof_values[, fieldname])` | | |
| | `reshape_to_nodes(dh, u, fieldname)` | | |
| | `vtk_point_data(vtk, dh, u)` | | |

Note that not all benchmarks are equally important. Based on the results, we should discuss where regressions are acceptable and where they aren't.

Benchmarking

A base set-up for benchmarking can look like this:

using Ferrite

grid = generate_grid(Quadrilateral, (10, 10))

dh = DofHandler(grid)
add!(dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(dh)

mixed_dh = MixedDofHandler(grid)
add!(mixed_dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(mixed_dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(mixed_dh)
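
For mutating calls (e.g. add! and close!), the benchmarks in the comments below use BenchmarkTools' setup/evals=1 pattern so that every evaluation gets a freshly constructed handler. A minimal sketch of that idiom, reusing the grid from above (ip_v is just an example interpolation):

```
using BenchmarkTools

ip_v = Lagrange{2,RefCube,2}()
# A fresh, un-closed DofHandler is built for every evaluation, so add! never sees a
# handler that already contains the field.
@btime add!(dh, :v, 2, $ip_v) setup=(dh = DofHandler($grid)) evals=1;
```
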
kimauth (Member, Author) commented Mar 22, 2023

Construction

using Ferrite
using BenchmarkTools

grid = generate_grid(Quadrilateral, (1000, 1000))
ip_v = Lagrange{2,RefCube,2}()
ip_s = Lagrange{2,RefCube,1}()

@btime DofHandler($grid); # 124.676 ns (7 allocations: 320 bytes)
@btime MixedDofHandler($grid); # 136.625 μs (8 allocations: 15.26 MiB)

@btime add!(Ref(dh)[], $(:v), $2, $ip_v) setup=(dh=DofHandler($grid)) evals=1; # 41.000 ns (3 allocations: 240 bytes)
@btime add!(dh, $(:v), $2, $ip_v) setup=(dh=MixedDofHandler($grid)) evals=1; # 11.676 ms (11 allocations: 18.00 MiB)

function setup_close(grid, T)
    dh = T(grid)
    add!(dh, :v, 2, Lagrange{2,RefCube,2}()) 
    add!(dh, :s, 1, Lagrange{2,RefCube,1}()) 
    return dh
end

@btime close!(dh) setup=(dh=setup_close($grid, $DofHandler)) evals=1; # 647.299 ms (224 allocations: 565.04 MiB)
@btime close!(dh) setup=(dh=setup_close($grid, $MixedDofHandler)) evals=1; # 873.991 ms (297 allocations: 791.71 MiB)

Edit: Updated construction

After @fredrikekre's efforts last night (#637, #639, #642, #643), the full construction of both dofhandlers is now about twice as fast while consuming roughly half as much memory.

The same set-up as above gives

function full_construction(grid, T)
    dh = T(grid)
    add!(dh, :v, 2, Lagrange{2,RefCube,2}()) 
    add!(dh, :s, 1, Lagrange{2,RefCube,1}()) 
    close!(dh)
end

@btime full_construction($grid, $DofHandler); # 346.665 ms (131 allocations: 363.19 MiB)
@btime full_construction($grid, $MixedDofHandler); # 356.083 ms (146 allocations: 496.37 MiB)

(Note that the fine-grained benchmarks above are indeed not that meaningful, since some memory allocations happen in different functions for DofHandler and MixedDofHandler.)

termi-official (Member) commented Mar 22, 2023

Constraints

using Ferrite
using BenchmarkTools

grid = generate_grid(Quadrilateral, (1000, 1000));
∂Ω = union(
    getfaceset(grid, "left"),
    getfaceset(grid, "right"),
    getfaceset(grid, "top"),
    getfaceset(grid, "bottom"),
);
dbc = Dirichlet(:v, ∂Ω, (x, t) -> [0, 0]);
ip_v = Lagrange{2,RefCube,2}();
ip_s = Lagrange{2,RefCube,1}();

function setup_dhclosed(grid, T)
    dh = T(grid)
    add!(dh, :v, 2, Lagrange{2,RefCube,2}()) 
    add!(dh, :s, 1, Lagrange{2,RefCube,1}())
    close!(dh)
    return dh
end

function setup_ch(grid, T)
    dh = setup_dhclosed(grid, T)
    return ConstraintHandler(dh)
end

function setup_ch2(grid, T, dbc)
    dh = setup_dhclosed(grid, T)
    ch = ConstraintHandler(dh)
    add!(ch, dbc)
    return ch
end

@btime ConstraintHandler(dh)  setup=(dh=setup_dhclosed($grid, $DofHandler)) evals=1; # 9.560 μs (12 allocations: 992 bytes)
@btime ConstraintHandler(dh)  setup=(dh=setup_dhclosed($grid, $MixedDofHandler)) evals=1; # 10.130 μs (12 allocations: 992 bytes)

@btime add!(ch, $dbc) setup=(ch = setup_ch($grid, $DofHandler)) evals=1; # 38.962 ms (8108 allocations: 21.91 MiB)
@btime add!(ch, $dbc) setup=(ch = setup_ch($grid, $MixedDofHandler)) evals=1; # 3.617 ms (8124 allocations: 4.04 MiB)

@btime close!(ch) setup=(ch = setup_ch2($grid, $DofHandler, $dbc)) evals=1; # 2.288 s (12071 allocations: 288.21 MiB)
@btime close!(ch) setup=(ch = setup_ch2($grid, $MixedDofHandler, $dbc)) evals=1; # 2.198 s (12071 allocations: 288.21 MiB)

termi-official (Member) commented Mar 22, 2023

CellIterator

Microbenchmark

using Ferrite
using BenchmarkTools

grid = generate_grid(Quadrilateral, (1000, 1000));
function setup_dhclosed(grid, T)
    dh = T(grid)
    add!(dh, :v, 2, Lagrange{2,RefCube,2}()) 
    add!(dh, :s, 1, Lagrange{2,RefCube,1}())
    close!(dh)
    return dh
end

function setup_cc(grid, T, flags)
    dh = T(grid)
    add!(dh, :v, 2, Lagrange{2,RefCube,2}()) 
    add!(dh, :s, 1, Lagrange{2,RefCube,1}())
    close!(dh)
    return CellCache(dh, flags)
end

@btime CellCache(dh) setup=(dh=setup_dhclosed($grid, $DofHandler)); # 1.062 μs (4 allocations: 480 bytes)
@btime CellCache(dh) setup=(dh=setup_dhclosed($grid, $MixedDofHandler)); # 629.000 ns (4 allocations: 480 bytes)

@btime CellIterator(dh) setup=(dh=setup_dhclosed($grid, $DofHandler)); # 857.000 ns (4 allocations: 480 bytes)
@btime CellIterator(dh) setup=(dh=setup_dhclosed($grid, $MixedDofHandler)); # 3.763 μs (4 allocations: 480 bytes)

@btime reinit!(cc, 1) setup=(cc=setup_cc($grid, $DofHandler, $UpdateFlags(true, true, true))); # 54.031 ns (0 allocations: 0 bytes)
@btime reinit!(cc, 1) setup=(cc=setup_cc($grid, $MixedDofHandler, $UpdateFlags(true, true, true))); # 57.543 ns (0 allocations: 0 bytes)

Integrated test (Poisson)

using Ferrite, SparseArrays, BenchmarkTools

function assemble_element!(Ke::Matrix, fe::Vector, cellvalues::CellScalarValues)
    n_basefuncs = getnbasefunctions(cellvalues)
    fill!(Ke, 0)
    fill!(fe, 0)
    for q_point in 1:getnquadpoints(cellvalues)
        dΩ = getdetJdV(cellvalues, q_point)
        for i in 1:n_basefuncs
            δu  = shape_value(cellvalues, q_point, i)
            ∇δu = shape_gradient(cellvalues, q_point, i)
            fe[i] += δu * dΩ
            for j in 1:n_basefuncs
                ∇u = shape_gradient(cellvalues, q_point, j)
                Ke[i, j] += (∇δu ⋅ ∇u) * dΩ
            end
        end
    end
    return Ke, fe
end

function assemble_global(cellvalues::CellScalarValues, K::SparseMatrixCSC, dh::Union{DofHandler, MixedDofHandler})
    n_basefuncs = getnbasefunctions(cellvalues)
    Ke = zeros(n_basefuncs, n_basefuncs)
    fe = zeros(n_basefuncs)
    f = zeros(ndofs(dh))
    assembler = start_assemble(K, f)
    for cell in CellIterator(dh)
        reinit!(cellvalues, cell)
        assemble_element!(Ke, fe, cellvalues)
        assemble!(assembler, celldofs(cell), Ke, fe)
    end
    return K, f
end

function assemble_heat(T)
    grid = generate_grid(Quadrilateral, (100, 100));

    dim = 2
    ip = Lagrange{dim, RefCube, 1}()
    qr = QuadratureRule{dim, RefCube}(2)
    cellvalues = CellScalarValues(qr, ip);

    dh = T(grid)
    add!(dh, :u, 1)
    close!(dh);

    K = create_sparsity_pattern(dh)
    ch = ConstraintHandler(dh);

    ∂Ω = union(
        getfaceset(grid, "left"),
        getfaceset(grid, "right"),
        getfaceset(grid, "top"),
        getfaceset(grid, "bottom"),
    );
    dbc = Dirichlet(:u, ∂Ω, (x, t) -> 0)
    add!(ch, dbc);
    close!(ch)

    @btime assemble_global($cellvalues, $K, $dh);
end

assemble_heat(DofHandler) # 4.724 ms (12 allocations: 80.69 KiB)
assemble_heat(MixedDofHandler) # 4.711 ms (12 allocations: 80.69 KiB)

termi-official (Member) commented:

I think bringing the constructor time for MixedDofHandler(grid) down to that of DofHandler(grid) is not worth it. We can move the allocation into close!, but it won't change the total time.

fredrikekre (Member) commented:

Yeah, and I am also not sure it is very useful to compare with such granularity. I would set up a benchmark that does all of it: constructing, adding fields, and distributing the dofs in close!. The time distribution between those doesn't really matter, since you will always do all of them. I guess it might be useful for tracking down differences though, but as #629 (comment) shows, it is all in close! (which makes sense; the constructor and adding fields are more or less no-ops).
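
A sketch of such an end-to-end benchmark (this mirrors the full_construction helper in the edited comment above; the helper name is just illustrative):

```
using Ferrite, BenchmarkTools

function build_dh(grid, DH)  # DH is DofHandler or MixedDofHandler
    dh = DH(grid)
    add!(dh, :v, 2, Lagrange{2,RefCube,2}())
    add!(dh, :s, 1, Lagrange{2,RefCube,1}())
    close!(dh)
    return dh
end

grid = generate_grid(Quadrilateral, (1000, 1000))
@btime build_dh($grid, $DofHandler);
@btime build_dh($grid, $MixedDofHandler);
```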

termi-official (Member) commented:

I'm not sure I can fully agree here. I think it makes sense to at least check that we do not have severe performance regressions in some simple operations (e.g. due to type instability or unwanted allocations).
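
One way to spot-check that, using the dh from the base set-up in the issue description (a minimal sketch with standard Julia tooling, not something prescribed in this thread):

```
using Test  # for @inferred and @test

dofs = zeros(Int, ndofs_per_cell(dh, 1))
celldofs!(dofs, dh, 1)                            # warm up / compile first
@inferred celldofs!(dofs, dh, 1)                  # errors if the inferred return type does not match the actual one
@test (@allocated celldofs!(dofs, dh, 1)) == 0    # a hot-loop call should not allocate
```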

fredrikekre added this to the 0.4.0 milestone Mar 23, 2023
kimauth (Member, Author) commented Mar 23, 2023

Basic functionality

using Ferrite
using BenchmarkTools

grid = generate_grid(Quadrilateral, (10, 10))

dh = DofHandler(grid)
add!(dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(dh)

mixed_dh = MixedDofHandler(grid)
add!(mixed_dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(mixed_dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(mixed_dh)

# does the same thing anyways
@btime ndofs($dh); # 2.125 ns (0 allocations: 0 bytes)
@btime ndofs($mixed_dh); # 2.125 ns (0 allocations: 0 bytes)

@btime ndofs_per_cell($dh); # 2.125 ns (0 allocations: 0 bytes)
@btime ndofs_per_cell($mixed_dh); # 2.083 ns (0 allocations: 0 bytes)

@btime dof_range($dh, $(:v)); # 37.298 ns (0 allocations: 0 bytes)
@btime dof_range($mixed_dh, $(:v)); # 47.781 ns (0 allocations: 0 bytes)

@btime celldofs($dh, $15); # 51.756 ns (1 allocation: 240 bytes)
@btime celldofs($mixed_dh, $15); # 46.249 ns (1 allocation: 240 bytes)

dofs = Vector{Int}(undef, ndofs_per_cell(dh, 15)); 
@btime celldofs!($dofs, $dh, $15); # 7.675 ns (0 allocations: 0 bytes)
@btime celldofs!($dofs, $mixed_dh, $15); # 24.072 ns (0 allocations: 0 bytes)

Edit: The time difference between the celldofs! methods is fixed either by using Julia 1.9-rc1 or by #636.

kimauth (Member, Author) commented Mar 23, 2023

Postprocessing

using Ferrite
using BenchmarkTools

grid = generate_grid(Quadrilateral, (1000, 1000))

dh = DofHandler(grid)
add!(dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(dh)

mixed_dh = MixedDofHandler(grid)
add!(mixed_dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(mixed_dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(mixed_dh)

u = rand(ndofs(dh))

# point evaluation
points = [2*rand(Vec{2})-ones(Vec{2}) for _ in 1:1000]
ph = PointEvalHandler(grid, points)
@btime get_point_values($ph, $dh, $u, $(:v)); # 98.417 μs (28 allocations: 17.44 KiB)
@btime get_point_values($ph, $mixed_dh, $u, $(:v)); # 142.250 μs (31 allocations: 17.56 KiB)

# reshaping from dof-order to nodal order (part of vtk export)
@btime reshape_to_nodes($dh, $u, $(:v)); # 22.863 ms (6 allocations: 22.93 MiB)
@btime reshape_to_nodes($mixed_dh, $u, $(:v)) # 234.516 ms (10 allocations: 22.93 MiB)

# vtk export
filename = joinpath(tempdir(), "test")
@btime vtk_point_data(vtk, $dh, $u) setup=(vtk=vtk_grid($filename, $grid)) evals=1; # 852.222 ms (147 allocations: 52.54 MiB)
@btime vtk_point_data(vtk, $mixed_dh, $u) setup=(vtk=vtk_grid($filename, $grid)) evals=1; # 1.268 s (158 allocations: 52.54 MiB)

Edit: Fixing #631 is likely to fix the performance gaps in reshape_to_nodes and vtk_point_data.

fredrikekre (Member) commented Mar 24, 2023

> @btime celldofs!($dofs, $dh, $15); # 7.675 ns (0 allocations: 0 bytes)
> @btime celldofs!($dofs, $mixed_dh, $15); # 24.072 ns (0 allocations: 0 bytes)

Those are equally fast for me:

@btime celldofs!($dofs, $dh, $15); # 6.619 ns (0 allocations: 0 bytes)
@btime celldofs!($dofs, $mixed_dh, $15); # 6.393 ns (0 allocations: 0 bytes)

termi-official (Member) commented:

Can this be explained by a difference in Julia version or in the machines used?
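
Recording the environment alongside the numbers would make this easy to check; a minimal sketch using standard Julia/Pkg calls (nothing Ferrite-specific):

```
using InteractiveUtils, Pkg

versioninfo()          # Julia version, OS, CPU, word size, thread count
Pkg.status("Ferrite")  # the Ferrite version used for the run
```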

kimauth (Member, Author) commented Mar 25, 2023

Sparsity pattern

using Ferrite
using BenchmarkTools

grid = generate_grid(Quadrilateral, (100, 100))

dh = DofHandler(grid)
add!(dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(dh)

mixed_dh = MixedDofHandler(grid)
add!(mixed_dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(mixed_dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(mixed_dh)

# without coupling
@btime create_sparsity_pattern($dh); # 34.338 ms (24 allocations: 246.05 MiB)
@btime create_sparsity_pattern($mixed_dh); # 34.367 ms (24 allocations: 246.05 MiB)

@btime create_symmetric_sparsity_pattern($dh); # 18.820 ms (24 allocations: 130.69 MiB)
@btime create_symmetric_sparsity_pattern($mixed_dh); # 18.739 ms (24 allocations: 130.69 MiB)

# with coupling
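# NOTE: `field_coupling` is not defined in the original snippet. It is a Bool matrix
# specifying which fields couple; a hypothetical choice for the two fields (:v, :s):
field_coupling = [true true; true false]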

@btime create_sparsity_pattern($dh; coupling=$field_coupling); # 26.282 ms (31 allocations: 175.80 MiB)
@btime create_sparsity_pattern($mixed_dh; coupling=$field_coupling); # 27.622 ms (36 allocations: 175.80 MiB)

@btime create_symmetric_sparsity_pattern($dh; coupling=$field_coupling); # 15.894 ms (31 allocations: 95.57 MiB)
@btime create_symmetric_sparsity_pattern($mixed_dh; coupling=$field_coupling); # 15.278 ms (36 allocations: 95.57 MiB)

Edit: Coupling benchmarks updated after #650.

kimauth (Member, Author) commented Mar 25, 2023

CellIterator II

> @btime CellCache(dh) setup=(dh=setup_dhclosed($grid, $DofHandler)); # 1.062 μs (4 allocations: 480 bytes)
> @btime CellCache(dh) setup=(dh=setup_dhclosed($grid, $MixedDofHandler)); # 629.000 ns (4 allocations: 480 bytes)

I can't reproduce the difference in constructing CellCache:

using Ferrite
using BenchmarkTools

grid = generate_grid(Quadrilateral, (1000, 1000));

dh = DofHandler(grid)
add!(dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(dh)

mixed_dh = MixedDofHandler(grid)
add!(mixed_dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(mixed_dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(mixed_dh)

@btime CellCache($dh); # 74.359 ns (4 allocations: 480 bytes)
@btime CellCache($mixed_dh); # 73.665 ns (4 allocations: 480 bytes)

@btime CellIterator($dh); # 74.820 ns (4 allocations: 480 bytes)
@btime CellIterator($mixed_dh); # 9.708 μs (4 allocations: 480 bytes)

cc_dh = CellCache(dh);
cc_mixed_dh = CellCache(mixed_dh)
@btime reinit!($cc_dh, $1); # 20.938 ns (0 allocations: 0 bytes)
@btime reinit!($cc_mixed_dh, $1); # 22.503 ns (0 allocations: 0 bytes)

kimauth (Member, Author) commented Mar 25, 2023

Constraints: apply_analytical!, affine constraints, periodic BCs

f(x) = x ⋅ x
u = zeros(ndofs(dh))

@btime apply_analytical!($u, $dh, $(:s), $f); # 25.878 ms (14 allocations: 2.27 KiB)
@btime apply_analytical!($u, $mixed_dh, $(:s), $f); # 323.590 ms (76 allocations: 34.50 MiB)

lc = AffineConstraint(1, [2 => 5.0, 3 => 3.0], 1.0)
@btime add!(ch, $lc) setup=(ch = ConstraintHandler($dh)); # 11.596 ns (0 allocations: 0 bytes)
@btime add!(ch, $lc) setup=(ch = ConstraintHandler($mixed_dh)); # 11.177 ns (0 allocations: 0 bytes)

φ(x) = x - Vec{2}((1.0, 0.0))
face_mapping = collect_periodic_faces(grid, "left", "right", φ)
pdbc = PeriodicDirichlet(:v, face_mapping, [1, 2])
# Add the constraint to the constraint handler
@btime add!(ch, $pdbc) setup=(ch=ConstraintHandler($dh)); # 737.750 μs (4091 allocations: 1.24 MiB)
@btime add!(ch, $pdbc) setup=(ch=ConstraintHandler($mixed_dh)); # 735.458 μs (4091 allocations: 1.24 MiB)

The difference in apply_analytical! seems to be caused in significant part by poor memory alignment, and should thus mostly be fixed by fixing #631.

kimauth (Member, Author) commented Mar 25, 2023

Renumbering

using Ferrite
using BenchmarkTools

grid = generate_grid(Quadrilateral, (100, 100))

dh = DofHandler(grid)
add!(dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(dh)

mixed_dh = MixedDofHandler(grid)
add!(mixed_dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(mixed_dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(mixed_dh)

@btime renumber!($dh, $(ndofs(dh):-1:1)); # 596.500 μs (4 allocations: 22.56 KiB)
@btime renumber!($mixed_dh, $(ndofs(mixed_dh):-1:1)); # 598.459 μs (4 allocations: 22.56 KiB)

@btime renumber!($dh, $(DofOrder.FieldWise())); # 6.330 ms (94 allocations: 5.87 MiB)
@btime renumber!($mixed_dh, $(DofOrder.FieldWise())); # 6.067 ms (95 allocations: 5.87 MiB)

@btime renumber!($dh, $(DofOrder.ComponentWise())); # 6.016 ms (113 allocations: 4.33 MiB)
@btime renumber!($mixed_dh, $(DofOrder.ComponentWise())); # 5.875 ms (114 allocations: 4.33 MiB)

Edit: Updated with new benchmarks after #645.

fredrikekre added a commit that referenced this issue Mar 29, 2023
This changes `FieldHandler.cellset` to be a sorted `OrderedSet` instead
of a `Set`. This ensures that loops over sub-domains are done in
ascending cell order.

Since e.g. cells, node coordinates, and dofs are stored in ascending
cell order this gives a significant performance boost to loops over
sub-domains, i.e. assembly-style loops. In particular, this removes the
performance gap between `MixedDofHandler` and `DofHandler` in the
`create_sparsity_pattern` benchmark in #629.

This is a minimal/initial step towards #625 that can be done before the
DofHandler merge and rework of FieldHandler/SubDofHandler.
fredrikekre added a commit that referenced this issue Mar 29, 2023
This changes `FieldHandler.cellset` to be a `BitSet` (which is sorted)
instead of a `Set`. This ensures that loops over sub-domains are done in
ascending cell order.

Since e.g. cells, node coordinates and dofs are stored in ascending cell
order this gives a significant performance boost to loops over
sub-domains, i.e. assembly-style loops. In particular, this removes the
performance gap between `MixedDofHandler` and `DofHandler` in the
`create_sparsity_pattern` benchmark in #629.

This is a minimal/initial step towards #625 that can be done before the
`DofHandler` merge and rework of `FieldHandler`/`SubDofHandler`.
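
As a minimal plain-Julia illustration of the iteration-order point (not the actual Ferrite internals):

```
cellset = Set([5, 2, 9, 1])
collect(cellset)          # order is unspecified, e.g. [9, 5, 2, 1]
collect(BitSet(cellset))  # always ascending: [1, 2, 5, 9]
```
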
fredrikekre added a commit that referenced this issue Mar 31, 2023
This patch uses `BitSet` in `apply_analytical!` and `reshape_to_nodes`
for `MixedDofHandler`. The benefit here is twofold: computing the
intersection is much faster (basically just bitwise `&`) and the
subsequent looping over the cells are done in ascending cell order.

This closes the performance gap between `MixedDofHandler` and
`DofHandler` in benchmarks from #629 of `apply_analytical!`,
`reshape_to_nodes`, and `vtk_point_data`. For example, here is the
benchmark results for `apply_analytical!`:
```
387.853 ms (72 allocations: 34.50 MiB)  # MixedDofHandler master
 55.262 ms (38 allocations: 553.45 KiB) # MixedDofHandler patch
 41.861 ms (14 allocations: 2.27 KiB)   # DofHandler master/patch
```
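
A minimal plain-Julia illustration of why the BitSet intersection is cheap and why the result is visited in ascending cell order (the cell sets here are made up, not the actual Ferrite code):

```
fh_cells  = BitSet(1:2:10_000)              # cells of a FieldHandler (hypothetical)
requested = BitSet(5_000:10_000)            # cellset passed by the user (hypothetical)
common    = intersect(fh_cells, requested)  # chunk-wise bitwise & on the underlying bits
collect(common)[1:3]                        # [5001, 5003, 5005] -- ascending order
```
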
fredrikekre (Member) commented:

After #660 I think this issue can be closed, since the MixedDofHandler is now equally performant (at least where it really matters). 🎉
