-- immediate --
* Rank items in this list according to importance
Debug images >= 3
Remove nspawn
.h to .cxx
Ewald on multipoles
Search source files for unnecessary functions
Search TODO list and extract priority tasks
-- later --
Add back time regression
Fix parallel Biot-Savart
Fix nightly
Test debug option
Remove 1e-5 from bound_box
- PetIGA no macro
- Helmholtz breaks for very low P
- Helmholtz breaks for -D
- With strumpack
- Use EXAFMM_CLUSTER in md_distributed
- Add kernel.cxx vec.cxx hss/laplace.cxx hss/helmholtz.cxx to nightly test
- Full simdList, threadList, configList in nightly.py
- Remove make cleanall from build
- Extend list based traversal to adaptive tree (UVWX-list)
-- MPI --
- LBT MPI <- ICELL is being calculated from localBounds
MPI-3.0 lock all
cycle counter based weights
2:1 refinement for precomputation
non-orthogonal recursive bisection
-- automake --
- Add m4/ax_mpi variants from PVFMM
Use docker to create virtual environment for buildbot
Representative case plot generator
-- types --
- logger.h split to print.h
- __attribute__((always_inline))
bodies -> bodies + fields, bodyPos -> bodies, bodyAcc -> fields
AoS, SoA union by Strzodka
Use compressed Cell struct of Bonsai
Remove M, L from Cell struct
Calculate EPS from test
-- tree build --
- Separate key manipulation namespace, e.g. interleaveMorton(), deinterleaveHilbert() (controllable key_t)
- Morton, Hilbert in build_tree folder
Separate different algorithms into different files (create folder for build tree, partition, etc.)
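
Note on the key manipulation item above: a minimal sketch of what a separate key namespace could look like, assuming a 3-D tree and up to 21 levels per dimension for a 64-bit key. The namespace name `morton`, the template parameter `Key` (standing in for the controllable key_t) and the function signatures are illustrative assumptions, not the existing interface. The loop form is chosen for clarity; a bit-mask version would be faster.

```cpp
#include <cstdint>

// Hypothetical namespace for key manipulation (name is an assumption)
namespace morton {
  // Interleave the lowest `levels` bits of (ix, iy, iz) into one key.
  // Bit l of ix goes to bit 3*l, iy to 3*l+1, iz to 3*l+2.
  template<typename Key>
  Key interleaveMorton(uint32_t ix, uint32_t iy, uint32_t iz, int levels) {
    Key key = 0;
    for (int l = 0; l < levels; l++) {
      key |= static_cast<Key>((ix >> l) & 1u) << (3 * l);
      key |= static_cast<Key>((iy >> l) & 1u) << (3 * l + 1);
      key |= static_cast<Key>((iz >> l) & 1u) << (3 * l + 2);
    }
    return key;
  }

  // Inverse of interleaveMorton: recover (ix, iy, iz) from a key.
  template<typename Key>
  void deinterleaveMorton(Key key, int levels,
                          uint32_t & ix, uint32_t & iy, uint32_t & iz) {
    ix = iy = iz = 0;
    for (int l = 0; l < levels; l++) {
      ix |= static_cast<uint32_t>((key >> (3 * l)) & 1) << l;
      iy |= static_cast<uint32_t>((key >> (3 * l + 1)) & 1) << l;
      iz |= static_cast<uint32_t>((key >> (3 * l + 2)) & 1) << l;
    }
  }
}
```

Keeping the key type a template parameter would let the same code serve 32-bit keys (up to 10 levels) and 64-bit keys (up to 21 levels). A Hilbert variant could sit next to it with the same signatures.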
-- kernels --
- Calculate Flops, Cycle counter kernel timer
- Separate LaplaceCartesianRecursion, LaplaceCartesianTemplate
- Precomputation of translation matrix (import from PVFMM)
- Hack vecmathlib and import essential features
- Add back Stokes kernels
- Theoretical error bound using kernel.cxx
Use getIndex for NO_P2P
Kahan + fixed precision
Solid harmonics kernel
FX10 sin, cos, exp intrinsics
Use kvec3 in Biot-Savart
-- driver --
- System benchmarking, HPCC, SPEC, for CPU, GPU, MPI performance
- TBB -> OpenMP with atomics -> flush
- OMP_PLACES {threads,cores,sockets}
MemAxes: Visualizing Memory Traffic
Periodic B.C. by one precomputed translation matrix
Teng's BH MAC with M2P option during DTT
charmm2: remove repartition inside Ewald & VdW
cutoff based traversal for md_distributed
-- GPU --
CUDA 6.0 debug
Unify dataset
.h -> .cxx/.cu
MPI Bonsai
Ewald, VdW
Zero softening
-- comparisons --
DTT vs. UVWX-list
Separate +- tree vs. Single tree
Cartesian vs. Spherical vs. Planewave
ORB vs. HOT
OpenMP vs. TBB vs. Cilk vs. MThreads
Geometric vs. Algebraic Mat-Vec
-- things that were removed --
TBB, Cilk, MassiveThreads
GPU
2d
Rmax
Ropt
Mass
-- documentation --
- JOSS paper
IPython notebook tutorial (from Andreas)