Tracking progress for merging the dofhandlers #629
**Construction**

```julia
using Ferrite
using BenchmarkTools

grid = generate_grid(Quadrilateral, (1000, 1000))
ip_v = Lagrange{2,RefCube,2}()
ip_s = Lagrange{2,RefCube,1}()

@btime DofHandler($grid);      # 124.676 ns (7 allocations: 320 bytes)
@btime MixedDofHandler($grid); # 136.625 μs (8 allocations: 15.26 MiB)

@btime add!(Ref(dh)[], $(:v), $2, $ip_v) setup=(dh=DofHandler($grid)) evals=1; # 41.000 ns (3 allocations: 240 bytes)
@btime add!(dh, $(:v), $2, $ip_v) setup=(dh=MixedDofHandler($grid)) evals=1;   # 11.676 ms (11 allocations: 18.00 MiB)

function setup_dh(grid, T)
    dh = T(grid)
    add!(dh, :v, 2, Lagrange{2,RefCube,2}())
    add!(dh, :s, 1, Lagrange{2,RefCube,1}())
    return dh
end

@btime close!(dh) setup=(dh=setup_dh($grid, $DofHandler)) evals=1;      # 647.299 ms (224 allocations: 565.04 MiB)
@btime close!(dh) setup=(dh=setup_dh($grid, $MixedDofHandler)) evals=1; # 873.991 ms (297 allocations: 791.71 MiB)
```

Edit: After @fredrikekre's efforts last night (#637, #639, #642, #643) the full construction of both dofhandlers is about twice as fast now while consuming roughly half as much memory. The same set-up as above gives:

```julia
function full_construction(grid, T)
    dh = T(grid)
    add!(dh, :v, 2, Lagrange{2,RefCube,2}())
    add!(dh, :s, 1, Lagrange{2,RefCube,1}())
    close!(dh)
end

@btime full_construction($grid, $DofHandler);      # 346.665 ms (131 allocations: 363.19 MiB)
@btime full_construction($grid, $MixedDofHandler); # 356.083 ms (146 allocations: 496.37 MiB)
```

(Note that the fine-grained benchmarks from above are indeed not that meaningful, as some memory allocations happen in different functions for the two dofhandlers.)
**Constraints**

```julia
using Ferrite
using BenchmarkTools

grid = generate_grid(Quadrilateral, (1000, 1000));
∂Ω = union(
    getfaceset(grid, "left"),
    getfaceset(grid, "right"),
    getfaceset(grid, "top"),
    getfaceset(grid, "bottom"),
);
dbc = Dirichlet(:v, ∂Ω, (x, t) -> [0, 0]);
ip_v = Lagrange{2,RefCube,2}();
ip_s = Lagrange{2,RefCube,1}();

function setup_dhclosed(grid, T)
    dh = T(grid)
    add!(dh, :v, 2, Lagrange{2,RefCube,2}())
    add!(dh, :s, 1, Lagrange{2,RefCube,1}())
    close!(dh)
    return dh
end
function setup_ch(grid, T)
    dh = setup_dhclosed(grid, T)
    return ConstraintHandler(dh)
end
function setup_ch2(grid, T, dbc)
    dh = setup_dhclosed(grid, T)
    ch = ConstraintHandler(dh)
    add!(ch, dbc)
    return ch
end

@btime ConstraintHandler(dh) setup=(dh=setup_dhclosed($grid, $DofHandler)) evals=1;      # 9.560 μs (12 allocations: 992 bytes)
@btime ConstraintHandler(dh) setup=(dh=setup_dhclosed($grid, $MixedDofHandler)) evals=1; # 10.130 μs (12 allocations: 992 bytes)

@btime add!(ch, $dbc) setup=(ch = setup_ch($grid, $DofHandler)) evals=1;      # 38.962 ms (8108 allocations: 21.91 MiB)
@btime add!(ch, $dbc) setup=(ch = setup_ch($grid, $MixedDofHandler)) evals=1; # 3.617 ms (8124 allocations: 4.04 MiB)

@btime close!(ch) setup=(ch = setup_ch2($grid, $DofHandler, $dbc)) evals=1;      # 2.288 s (12071 allocations: 288.21 MiB)
@btime close!(ch) setup=(ch = setup_ch2($grid, $MixedDofHandler, $dbc)) evals=1; # 2.198 s (12071 allocations: 288.21 MiB)
```
**CellIterator Microbenchmark**

```julia
using Ferrite
using BenchmarkTools

grid = generate_grid(Quadrilateral, (1000, 1000));

function setup_dhclosed(grid, T)
    dh = T(grid)
    add!(dh, :v, 2, Lagrange{2,RefCube,2}())
    add!(dh, :s, 1, Lagrange{2,RefCube,1}())
    close!(dh)
    return dh
end
function setup_cc(grid, T, flags)
    dh = setup_dhclosed(grid, T)
    return CellCache(dh, flags)
end

@btime CellCache(dh) setup=(dh=setup_dhclosed($grid, $DofHandler));      # 1.062 μs (4 allocations: 480 bytes)
@btime CellCache(dh) setup=(dh=setup_dhclosed($grid, $MixedDofHandler)); # 629.000 ns (4 allocations: 480 bytes)

@btime CellIterator(dh) setup=(dh=setup_dhclosed($grid, $DofHandler));      # 857.000 ns (4 allocations: 480 bytes)
@btime CellIterator(dh) setup=(dh=setup_dhclosed($grid, $MixedDofHandler)); # 3.763 μs (4 allocations: 480 bytes)

@btime reinit!(cc, 1) setup=(cc=setup_cc($grid, $DofHandler, $(UpdateFlags(true, true, true))));      # 54.031 ns (0 allocations: 0 bytes)
@btime reinit!(cc, 1) setup=(cc=setup_cc($grid, $MixedDofHandler, $(UpdateFlags(true, true, true)))); # 57.543 ns (0 allocations: 0 bytes)
```
**Integrated test (Poisson)**

```julia
using Ferrite, SparseArrays
using BenchmarkTools

function assemble_element!(Ke::Matrix, fe::Vector, cellvalues::CellScalarValues)
    n_basefuncs = getnbasefunctions(cellvalues)
    fill!(Ke, 0)
    fill!(fe, 0)
    for q_point in 1:getnquadpoints(cellvalues)
        dΩ = getdetJdV(cellvalues, q_point)
        for i in 1:n_basefuncs
            δu = shape_value(cellvalues, q_point, i)
            ∇δu = shape_gradient(cellvalues, q_point, i)
            fe[i] += δu * dΩ
            for j in 1:n_basefuncs
                ∇u = shape_gradient(cellvalues, q_point, j)
                Ke[i, j] += (∇δu ⋅ ∇u) * dΩ
            end
        end
    end
    return Ke, fe
end

function assemble_global(cellvalues::CellScalarValues, K::SparseMatrixCSC, dh::Union{DofHandler, MixedDofHandler})
    n_basefuncs = getnbasefunctions(cellvalues)
    Ke = zeros(n_basefuncs, n_basefuncs)
    fe = zeros(n_basefuncs)
    f = zeros(ndofs(dh))
    assembler = start_assemble(K, f)
    for cell in CellIterator(dh)
        reinit!(cellvalues, cell)
        assemble_element!(Ke, fe, cellvalues)
        assemble!(assembler, celldofs(cell), Ke, fe)
    end
    return K, f
end

function assemble_heat(T)
    grid = generate_grid(Quadrilateral, (100, 100))
    dim = 2
    ip = Lagrange{dim, RefCube, 1}()
    qr = QuadratureRule{dim, RefCube}(2)
    cellvalues = CellScalarValues(qr, ip)
    dh = T(grid)
    add!(dh, :u, 1)
    close!(dh)
    K = create_sparsity_pattern(dh)
    ch = ConstraintHandler(dh)
    ∂Ω = union(
        getfaceset(grid, "left"),
        getfaceset(grid, "right"),
        getfaceset(grid, "top"),
        getfaceset(grid, "bottom"),
    )
    dbc = Dirichlet(:u, ∂Ω, (x, t) -> 0)
    add!(ch, dbc)
    close!(ch)
    @btime assemble_global($cellvalues, $K, $dh);
end

assemble_heat(DofHandler)      # 4.724 ms (12 allocations: 80.69 KiB)
assemble_heat(MixedDofHandler) # 4.711 ms (12 allocations: 80.69 KiB)
```
I think bringing down the constructor times for …
Yea, and also I am not sure it is very useful to compare with such granularity. I would set up a benchmark that does all of constructing, adding fields, distribute in …
Not sure if I can fully agree here. I think it makes sense to at least check that we do not have severe performance regressions in some simple operations (e.g. due to type instability or unwanted allocations).
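For concreteness, such an end-to-end benchmark could look like the sketch below. It only reuses calls that already appear in this thread; no timings were recorded for it:

```julia
using Ferrite, BenchmarkTools

# Time the whole pipeline per handler type instead of each step in isolation:
# construction, adding fields, dof distribution in close!, constraints, and
# the sparsity pattern.
function full_pipeline(grid, T)
    dh = T(grid)
    add!(dh, :v, 2, Lagrange{2,RefCube,2}())
    add!(dh, :s, 1, Lagrange{2,RefCube,1}())
    close!(dh)
    ch = ConstraintHandler(dh)
    add!(ch, Dirichlet(:v, getfaceset(grid, "left"), (x, t) -> [0, 0]))
    close!(ch)
    return create_sparsity_pattern(dh)
end

grid = generate_grid(Quadrilateral, (100, 100))
@btime full_pipeline($grid, DofHandler);
@btime full_pipeline($grid, MixedDofHandler);
```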
**Basic functionality**

```julia
using Ferrite
using BenchmarkTools

grid = generate_grid(Quadrilateral, (10, 10))

dh = DofHandler(grid)
add!(dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(dh)

mixed_dh = MixedDofHandler(grid)
add!(mixed_dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(mixed_dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(mixed_dh)

# does the same thing anyways
@btime ndofs($dh);       # 2.125 ns (0 allocations: 0 bytes)
@btime ndofs($mixed_dh); # 2.125 ns (0 allocations: 0 bytes)

@btime ndofs_per_cell($dh);       # 2.125 ns (0 allocations: 0 bytes)
@btime ndofs_per_cell($mixed_dh); # 2.083 ns (0 allocations: 0 bytes)

@btime dof_range($dh, $(:v));       # 37.298 ns (0 allocations: 0 bytes)
@btime dof_range($mixed_dh, $(:v)); # 47.781 ns (0 allocations: 0 bytes)

@btime celldofs($dh, $15);       # 51.756 ns (1 allocation: 240 bytes)
@btime celldofs($mixed_dh, $15); # 46.249 ns (1 allocation: 240 bytes)

dofs = Vector{Int}(undef, ndofs_per_cell(dh, 15));
@btime celldofs!($dofs, $dh, $15);       # 7.675 ns (0 allocations: 0 bytes)
@btime celldofs!($dofs, $mixed_dh, $15); # 24.072 ns (0 allocations: 0 bytes)
```

Edit: The time difference between the …
**Postprocessing**

```julia
using Ferrite
using BenchmarkTools

grid = generate_grid(Quadrilateral, (1000, 1000))

dh = DofHandler(grid)
add!(dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(dh)

mixed_dh = MixedDofHandler(grid)
add!(mixed_dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(mixed_dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(mixed_dh)

u = rand(ndofs(dh))

# point evaluation
points = [2 * rand(Vec{2}) - ones(Vec{2}) for _ in 1:1000]
ph = PointEvalHandler(grid, points)
@btime get_point_values($ph, $dh, $u, $(:v));       # 98.417 μs (28 allocations: 17.44 KiB)
@btime get_point_values($ph, $mixed_dh, $u, $(:v)); # 142.250 μs (31 allocations: 17.56 KiB)

# reshaping from dof-order to nodal order (part of vtk export)
@btime reshape_to_nodes($dh, $u, $(:v));       # 22.863 ms (6 allocations: 22.93 MiB)
@btime reshape_to_nodes($mixed_dh, $u, $(:v)); # 234.516 ms (10 allocations: 22.93 MiB)

# vtk export
filename = joinpath(tempdir(), "test")
@btime vtk_point_data(vtk, $dh, $u) setup=(vtk=vtk_grid($filename, $grid)) evals=1;       # 852.222 ms (147 allocations: 52.54 MiB)
@btime vtk_point_data(vtk, $mixed_dh, $u) setup=(vtk=vtk_grid($filename, $grid)) evals=1; # 1.268 s (158 allocations: 52.54 MiB)
```

Edit: Fixing #631 is likely to fix the performance gaps in …
Those are equally fast for me: …
Can this be explained by a difference in Julia versions or the machines used?
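To rule that out, both environments could be reported (a minimal sketch using standard Julia tooling):

```julia
using InteractiveUtils, Pkg

versioninfo()         # Julia version, OS, CPU, thread count
Pkg.status("Ferrite") # installed Ferrite version
```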
**Sparsity pattern**

```julia
using Ferrite
using BenchmarkTools

grid = generate_grid(Quadrilateral, (100, 100))

dh = DofHandler(grid)
add!(dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(dh)

mixed_dh = MixedDofHandler(grid)
add!(mixed_dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(mixed_dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(mixed_dh)

# without coupling
@btime create_sparsity_pattern($dh);       # 34.338 ms (24 allocations: 246.05 MiB)
@btime create_sparsity_pattern($mixed_dh); # 34.367 ms (24 allocations: 246.05 MiB)
@btime create_symmetric_sparsity_pattern($dh);       # 18.820 ms (24 allocations: 130.69 MiB)
@btime create_symmetric_sparsity_pattern($mixed_dh); # 18.739 ms (24 allocations: 130.69 MiB)

# with coupling (`field_coupling` was not defined in the original snippet;
# assumed here to be a Bool matrix saying which fields couple, e.g.:)
field_coupling = [true true; true false]
@btime create_sparsity_pattern($dh; coupling=$field_coupling);       # 26.282 ms (31 allocations: 175.80 MiB)
@btime create_sparsity_pattern($mixed_dh; coupling=$field_coupling); # 27.622 ms (36 allocations: 175.80 MiB)
@btime create_symmetric_sparsity_pattern($dh; coupling=$field_coupling);       # 15.894 ms (31 allocations: 95.57 MiB)
@btime create_symmetric_sparsity_pattern($mixed_dh; coupling=$field_coupling); # 15.278 ms (36 allocations: 95.57 MiB)
```

Edit: Coupling benchmarks updated after #650.
**CellIterator II**

Can't reproduce the difference in constructing:

```julia
using Ferrite
using BenchmarkTools

grid = generate_grid(Quadrilateral, (1000, 1000));

dh = DofHandler(grid)
add!(dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(dh)

mixed_dh = MixedDofHandler(grid)
add!(mixed_dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(mixed_dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(mixed_dh)

@btime CellCache($dh);       # 74.359 ns (4 allocations: 480 bytes)
@btime CellCache($mixed_dh); # 73.665 ns (4 allocations: 480 bytes)

@btime CellIterator($dh);       # 74.820 ns (4 allocations: 480 bytes)
@btime CellIterator($mixed_dh); # 9.708 μs (4 allocations: 480 bytes)

cc_dh = CellCache(dh);
cc_mixed_dh = CellCache(mixed_dh)
@btime reinit!($cc_dh, $1);       # 20.938 ns (0 allocations: 0 bytes)
@btime reinit!($cc_mixed_dh, $1); # 22.503 ns (0 allocations: 0 bytes)
```
**Constraints: apply_analytical, affine constraints, periodic bcs**

```julia
# continues from the set-up above (grid, dh, mixed_dh)
f(x) = x ⋅ x
u = zeros(ndofs(dh))
@btime apply_analytical!($u, $dh, $(:s), $f);       # 25.878 ms (14 allocations: 2.27 KiB)
@btime apply_analytical!($u, $mixed_dh, $(:s), $f); # 323.590 ms (76 allocations: 34.50 MiB)

lc = AffineConstraint(1, [2 => 5.0, 3 => 3.0], 1.0)
@btime add!(ch, $lc) setup=(ch = ConstraintHandler($dh));       # 11.596 ns (0 allocations: 0 bytes)
@btime add!(ch, $lc) setup=(ch = ConstraintHandler($mixed_dh)); # 11.177 ns (0 allocations: 0 bytes)

φ(x) = x - Vec{2}((1.0, 0.0))
face_mapping = collect_periodic_faces(grid, "left", "right", φ)
pdbc = PeriodicDirichlet(:v, face_mapping, [1, 2])
# Add the constraint to the constraint handler
@btime add!(ch, $pdbc) setup=(ch=ConstraintHandler($dh));       # 737.750 μs (4091 allocations: 1.24 MiB)
@btime add!(ch, $pdbc) setup=(ch=ConstraintHandler($mixed_dh)); # 735.458 μs (4091 allocations: 1.24 MiB)
```

The difference in …
**Renumbering**

```julia
using Ferrite
using BenchmarkTools

grid = generate_grid(Quadrilateral, (100, 100))

dh = DofHandler(grid)
add!(dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(dh)

mixed_dh = MixedDofHandler(grid)
add!(mixed_dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
add!(mixed_dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
close!(mixed_dh)

@btime renumber!($dh, $(ndofs(dh):-1:1));             # 596.500 μs (4 allocations: 22.56 KiB)
@btime renumber!($mixed_dh, $(ndofs(mixed_dh):-1:1)); # 598.459 μs (4 allocations: 22.56 KiB)

@btime renumber!($dh, $(DofOrder.FieldWise()));       # 6.330 ms (94 allocations: 5.87 MiB)
@btime renumber!($mixed_dh, $(DofOrder.FieldWise())); # 6.067 ms (95 allocations: 5.87 MiB)

@btime renumber!($dh, $(DofOrder.ComponentWise()));       # 6.016 ms (113 allocations: 4.33 MiB)
@btime renumber!($mixed_dh, $(DofOrder.ComponentWise())); # 5.875 ms (114 allocations: 4.33 MiB)
```

Edit: Updated with new benchmarks after #645.
This changes `FieldHandler.cellset` to be a sorted `OrderedSet` instead of a `Set`. This ensures that loops over sub-domains are done in ascending cell order. Since e.g. cells, node coordinates, and dofs are stored in ascending cell order this gives a significant performance boost to loops over sub-domains, i.e. assembly-style loops. In particular, this removes the performance gap between `MixedDofHandler` and `DofHandler` in the `create_sparsity_pattern` benchmark in #629. This is a minimal/initial step towards #625 that can be done before the `DofHandler` merge and rework of `FieldHandler`/`SubDofHandler`.
This changes `FieldHandler.cellset` to be a `BitSet` (which is sorted) instead of a `Set`. This ensures that loops over sub-domains are done in ascending cell order. Since e.g. cells, node coordinates and dofs are stored in ascending cell order this gives a significant performance boost to loops over sub-domains, i.e. assembly-style loops. In particular, this removes the performance gap between `MixedDofHandler` and `DofHandler` in the `create_sparsity_pattern` benchmark in #629. This is a minimal/initial step towards #625 that can be done before the `DofHandler` merge and rework of `FieldHandler`/`SubDofHandler`.
This patch uses `BitSet` in `apply_analytical!` and `reshape_to_nodes` for `MixedDofHandler`. The benefit here is twofold: computing the intersection is much faster (basically just bitwise `&`) and the subsequent looping over the cells is done in ascending cell order. This closes the performance gap between `MixedDofHandler` and `DofHandler` in the benchmarks from #629 of `apply_analytical!`, `reshape_to_nodes`, and `vtk_point_data`. For example, here are the benchmark results for `apply_analytical!`:

```
387.853 ms (72 allocations: 34.50 MiB)  # MixedDofHandler master
 55.262 ms (38 allocations: 553.45 KiB) # MixedDofHandler patch
 41.861 ms (14 allocations: 2.27 KiB)   # DofHandler master/patch
```
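For intuition, a minimal standalone sketch (not Ferrite code) of why `BitSet` helps here: intersection reduces to bitwise operations on the packed chunks, and iteration always yields elements in ascending order, unlike `Set`:

```julia
using BenchmarkTools

a, b = Set(1:1_000_000), Set(500_000:1_500_000)
A, B = BitSet(1:1_000_000), BitSet(500_000:1_500_000)

@btime intersect($a, $b); # hash-based, element by element
@btime intersect($A, $B); # bitwise & on the packed chunks, much faster

# Iterating a BitSet is always in ascending order:
collect(Iterators.take(intersect(A, B), 3)) # [500000, 500001, 500002]
```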
After #660 I think this issue can be closed since the `MixedDofHandler` is now equally performant (at least where it really matters). 🎉
Before merging the dofhandlers, the `MixedDofHandler` must be able to do everything that the `DofHandler` does and should ideally be equally fast. This issue keeps track of how far we've come along that way.

**Progress tracking**

The focus here is to compare how a `MixedDofHandler` performs on a concrete grid with all fields on the full domain, i.e. how well `MixedDofHandler` works as a drop-in replacement for `DofHandler`.

Benchmarking code should go in the comments, ideally one category at a time. However, if a method does not work / does not perform well, open a separate issue about it and reference this one. That way it will be easier to keep an overview and do small reviewable PRs to fix issues.
Syntax: the given syntax works and yields correct results for `MixedDofHandler`.
Performance: performance with `MixedDofHandler` is comparable to `DofHandler` + there is a benchmark of it!

- `DofHandler(grid)`
- `add!(dh, name, dim[, ip])`
- `close!(dh)`
- `ndofs(dh)`
- `ndofs_per_cell(dh[, cell])`
- `dof_range(dh, field_name)`
- `celldofs(dh, i)`
- `celldofs!(dofs, dh, i)`
- `renumber!(dh, order)`
- `renumber!(dh, DofOrder.FieldWise())`
- `renumber!(dh, DofOrder.ComponentWise())`
- `create_sparsity_pattern(dh)`
- `create_sparsity_pattern(dh; coupling)`
- `create_symmetric_sparsity_pattern(dh)`
- `create_symmetric_sparsity_pattern(dh; coupling)`
- `ConstraintHandler(dh)`
- `add!(ch, dbc::Dirichlet)`
- `add!(ch, ac::AffineConstraint)`
- `add!(ch, pdbc::PeriodicDirichlet)`
- `close!(ch)`
- `apply_analytical!(a, dh, fieldname, f, cellset)`
- `CellCache(dh)`
- `reinit!(cc, i)`
- `CellIterator(dh, cellset)`
- `get_point_values(ph, dh, dof_values[, fieldname])`
- `reshape_to_nodes(dh, u, fieldname)`
- `vtk_point_data(vtk, dh, u)`
Note that not all benchmarks are equally important. We should discuss, based on the results, where regressions are acceptable and where they aren't.
**Benchmarking**

A base set-up for benchmarking can look like this:
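(A minimal sketch, assuming the same two-field set-up used throughout the comments:)

```julia
using Ferrite
using BenchmarkTools

grid = generate_grid(Quadrilateral, (1000, 1000))

function setup_dhclosed(grid, T)
    dh = T(grid)
    add!(dh, :v, 2, Lagrange{2,RefCube,2}()) # quadratic vector field
    add!(dh, :s, 1, Lagrange{2,RefCube,1}()) # linear scalar field
    close!(dh)
    return dh
end

dh = setup_dhclosed(grid, DofHandler)
mixed_dh = setup_dhclosed(grid, MixedDofHandler)
```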