Description
The 2D and 3D shape computations are massive switch statements on the element type and the order.
In that form it likely does not make sense to inline them for (potential) performance gains.
But we could switch the signature (or duplicate it, at least during a transition period) from
template <FEFamily T>
Real fe_lagrange_2D_shape(const ElemType type,
                          const Elem * elem,
                          const Order order,
                          const unsigned int i,
                          const Point & p)
to
template <FEFamily T, Order order, ElemType type>
inline
Real fe_2D_shape(const Elem * elem,
                 const unsigned int i,
                 const Point & p)
and get rid of most of the switch statements. I don't think it would be much more code overall.
If we templated on i as well, we could get rid of all the switches. That is potentially cool for vectorizing and for GPUs.
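A minimal sketch of what the templated form could look like, assuming first-order Lagrange on two element types only. `Real`, `Point`, `FEFamily`, `Order` and `ElemType` below are simplified stand-ins so the snippet compiles on its own; they are not the real libMesh definitions. The `elem` argument is dropped for brevity, and `fe_2D_shape_i` is a hypothetical name for the variant that is also templated on i.

```cpp
#include <cassert>

// Stand-ins for the libMesh types (assumptions, for illustration only).
using Real = double;
enum class FEFamily { LAGRANGE };
enum class Order { FIRST, SECOND };
enum class ElemType { QUAD4, TRI3 };
struct Point { Real x, y; };

// 1D linear Lagrange basis on the reference interval [-1, 1].
constexpr Real lagrange_1D_linear(unsigned int i, Real xi)
{
  return i == 0 ? 0.5 * (1. - xi) : 0.5 * (1. + xi);
}

// Element type and order are template parameters, so the branch on them
// is resolved at compile time instead of by a runtime switch.
template <FEFamily T, Order order, ElemType type>
constexpr Real fe_2D_shape(const unsigned int i, const Point & p)
{
  static_assert(T == FEFamily::LAGRANGE && order == Order::FIRST,
                "this sketch only covers first-order Lagrange");

  if constexpr (type == ElemType::QUAD4)
  {
    // Tensor product of 1D functions, nodes numbered counter-clockwise.
    constexpr unsigned int i0[] = {0, 1, 1, 0};
    constexpr unsigned int i1[] = {0, 0, 1, 1};
    return lagrange_1D_linear(i0[i], p.x) * lagrange_1D_linear(i1[i], p.y);
  }
  else
  {
    static_assert(type == ElemType::TRI3, "element type not covered by this sketch");
    // Barycentric coordinates on the reference triangle.
    return i == 0 ? 1. - p.x - p.y : (i == 1 ? p.x : p.y);
  }
}

// Variant that is also templated on the shape function index. The index is
// a compile-time constant here, so the compiler can fold the remaining
// selection on i away once the call is inlined.
template <FEFamily T, Order order, ElemType type, unsigned int I>
constexpr Real fe_2D_shape_i(const Point & p)
{
  static_assert(I < (type == ElemType::QUAD4 ? 4u : 3u),
                "shape function index out of range");
  return fe_2D_shape<T, order, type>(I, p);
}

// Compile-time sanity checks: each shape function is 1 at its own node.
static_assert(fe_2D_shape<FEFamily::LAGRANGE, Order::FIRST, ElemType::QUAD4>(
                  0, Point{-1., -1.}) == 1.);
static_assert(fe_2D_shape_i<FEFamily::LAGRANGE, Order::FIRST, ElemType::TRI3, 1>(
                  Point{1., 0.}) == 1.);
```

A caller that still receives the type and order at runtime would pick the instantiation once (for example, once per element) and then loop over i and the quadrature points with no further branching on type or order.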
The potential gains downstream in MOOSE are about 1-2% in 2D (for now; other costs are going down, so that 1-2% share might go up). The gains are possibly larger in 3D, and potentially smaller with higher order.
   flat  flat%   sum%      cum   cum%
  9.12s 22.57% 22.57%    9.35s 23.14%  tcmalloc::CentralFreeList::Populate
  1.87s  4.63% 27.20%    2.24s  5.54%  MatSetValues_SeqAIJ
  1.69s  4.18% 31.38%   39.78s 98.44%  [libsasl2.2.dylib]
  1.61s  3.98% 35.36%    2.56s  6.34%  MooseMesh::cacheInfo
  1.42s  3.51% 38.88%    2.75s  6.81%  libMesh::BoundaryInfo::boundary_ids
  1.07s  2.65% 41.52%    1.07s  2.65%  libMesh::fe_lagrange_1D_linear_shape
  0.99s  2.45% 43.97%    0.99s  2.45%  hypre_BoomerAMGRelaxHybridGaussSeidel_core
  0.91s  2.25% 46.23%    0.97s  2.40%  libMesh::Elem::which_child_am_i
  0.85s  2.10% 48.33%    1.48s  3.66%  MooseVariableData::computeValuesInternal
  0.61s  1.51% 49.84%    0.61s  1.51%  libMesh::FEMap::compute_single_point_map
  0.60s  1.48% 51.32%    0.60s  1.48%  libMesh::H1FETransformation::map_dphi
  0.55s  1.36% 52.68%    0.60s  1.48%  libMesh::Elem::contains_vertex_of
  0.54s  1.34% 54.02%    1.61s  3.98%  (anonymous namespace)::fe_lagrange_2D_shape
  0.54s  1.34% 55.36%    0.99s  2.45%  Kernel::computeResidual
  0.47s  1.16% 56.52%    0.47s  1.16%  libMesh::Face::dim
  0.46s  1.14% 57.66%    0.46s  1.14%  MooseMesh::getNodeBlockIds
  0.40s  0.99% 58.65%    1.14s  2.82%  libMesh::FEMap::compute_affine_map
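The 3.98% cumulative (1.61s) for fe_lagrange_2D_shape is 2.65% (1.07s) from the 1D shape calculation plus 1.34% (0.54s) from the 2D function itself, which is just the switch statement (there's nothing else there). The percentages sum to 3.99% rather than 3.98% because of rounding.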