OpenFOAM Parallel Numerical Linear Algebra Post
Goal
We provide background information allowing an informed discussion of the parallel performance of the basic solvers laplacianFOAM, scalarTransportFOAM and potentialFOAM. Most of our discussion carries over to more complex solvers that adopt a segregated solution approach; the solver simpleFOAM is an example. The discussion of these more complex solvers is postponed to a later stage. We wish to study at which locations in the code the global parallel communications occur. Our motivation is to identify ways to reduce the amount of global communication (by reducing the number of iterations required to reach convergence or by replacing a convergence test by a predefined number of iterations) or to mitigate its effect (by overlapping communication and computation, by using pipelined versions of Krylov methods or by replacing BiCG with modern alternatives such as IDR, for instance).
We aim at comparing the performance of the basic solvers of OpenFOAM with a package that provides similar basic solvers and that is built on a modern library for parallel scientific computations. An example of a solver that provides capabilities similar to laplacianFOAM and scalarTransportFOAM is the first tutorial example provided with the libMesh library.
We aim at obtaining the following:
- for laplacianFOAM: a genuine algebraic multigrid with a fast (aggressive, inaccurate) coarsening scheme as a preconditioner for CG, possibly a pipelined CG implementation; need to look into what can be done for parallel computations; study the reuse of the preconditioner in case of a non-linear problem with (default) explicit treatment;
- for scalarTransportFOAM: an ILU preconditioner allowing some level of fill-in, combined with an AMD reordering strategy, as a preconditioner for GMRES; possibly a pipelined GMRES implementation; possibly using IDR instead; need to look into what can be done for parallel computations;
- for potentialFOAM: an implementation that allows the construction of the preconditioner to be placed outside the solver iteration; possibly move the construction of the preconditioner from PCG to the construction of the solver object; possibly allow an outer Krylov acceleration; need to look into what can be done for parallel computations;
- look into whether the arrangement of data in the vector and matrix classes comes with a penalty or not.
Our ultimate goal is to render OpenFOAM more versatile in solving large scale problems in parallel by providing more advanced preconditioned iterative solvers.
These notes are structured in the following parts:
1/15: Non-Overlapping Domain Decomposition, Partitioned Vectors and Partitioned Matrices: we discuss the partitioning of the computational domain into non-overlapping subdomains and the subsequent partitioning of vectors and matrices into subvectors and submatrices;
2/15: Pstream Communication Software Layer: we discuss the encapsulation of MPI into a Pstreams software layer that OpenFOAM adopts to perform the parallel communication between subdomains;
3/15: Parallel Mesh Distribution
5/15: Interprocess Interfaces
5/15: Parallel Assembly of Matrix and Right-Hand Side Vector
6/15: Passing Matrix and Right-Hand Side Vector to Linear System Solver
7/15: Parallel Vector Inner Product and Norm using gSumProd and gSumMag: we discuss how parallel vectors are defined and how the norm and inner product of parallel decomposed vectors are computed;
8/15: The Templated Class LduMatrix and Non-Templated Class lduMatrix: we discuss how parallel matrices are defined;
9/15: Parallel Matrix-Vector product using lduMatrix member function Amul: we discuss how the parallel matrix-vector product is defined;
10/15: Unpreconditioned CG algorithm: we discuss how the above is used in solving linear systems that arise in laplacianFOAM;
11/15: Unpreconditioned BiCGStab algorithm: we discuss how the above is used in solving linear systems that arise in scalarTransportFOAM;
12/15: Preconditioners, including the GAMG Solver
13/15: Non-Linear Problems, Multiple Fields and Miscellaneous Issues
14/15: Parallel Write of Solution to file
15/15: References
1/15: Non-Overlapping Domain Decomposition, Partitioned Vectors and Partitioned Matrices
1/1/15: Non-Overlapping Domain Decomposition
In a non-overlapping domain decomposition approach, the global computational mesh on the domain is decomposed into nsubs local submeshes on non-overlapping subdomains. To this end OpenFOAM provides both a so-called simple decomposition algorithm and interfaces to packages such as Zoltan and Metis. Each local submesh is typically owned by a different processor. Communication between subdomains will be required during computations. Assume that the mesh on \Omega (polyMesh in OpenFOAM) is decomposed into submeshes on \Omega_i where 1 <= i <= nsubs. Below we will use subdomains and submeshes as synonyms.
Distinct subdomains \Omega_i and \Omega_j may share an interface \Gamma_{ij}. This interface consists of a set of intervals in case the computational domain is 2D and of a set of polygons in the 3D case. OpenFOAM employs a cell-centered discretization. For i <> j, \Omega_i and \Omega_j do not share cell centers.
OpenFOAM employs a cell-centered discretization in which the discretization is performed by a loop over the faces in \Omega (more on discretization later in these notes) (how are faces on the boundary taken into account?). The loop over all faces can be decomposed into nsubs loops over the faces of each subdomain \Omega_i for 1 <= i <= nsubs. Let us here adopt a view centered on \Omega_i. The subdomain \Omega_i contains a set of faces that lie either in the interior of \Omega_i, on an interface with some \Omega_j or on the boundary of the computational domain. A loop over all faces in \Omega_i includes all of these faces. We assume that, in performing a loop over these faces, all the geometrical information required to construct the discrete operator (discrete diffusion and discrete convection-diffusion operators) is stored on processor i. Assume x_k to be the coordinate of a cell center adjacent to the interface \Gamma_{ij} belonging to the neighboring subdomain \Omega_j. The requirement that this information be available during the discretization process can be met by storing the coordinates of all such x_k on processor i. This can be arranged in a pre-processing stage prior to the computations. The discretization of a linear diffusion or convection-diffusion differential operator can thus be performed without *any* communication of information between subdomains.
OpenFOAM employs preconditioned Krylov subspace methods to solve the linear systems. The methods employ vector (BLAS Level 1) and matrix-vector (BLAS Level 2) operations. Required vector operations are the vector inner product, the vector norm and the vector updates. Required matrix operations are the matrix-vector multiplication with the coefficient matrix and the linear system solve with the preconditioner. To describe these operations in more detail, we will distinguish global and local communication between the subdomains.
A global communication is a communication that involves all processors charged with the computation. An example is the computation of the inner product of two parallel distributed vectors. In this operation each processor computes the inner product of the subvectors that it owns and sends this intermediate result to a single processor (typically processor 0, the master node). The master process then sums up all the individual contributions. Conversely, in a vector update such as y += \alpha x, the value of \alpha is broadcast from the master process to all processors.
A local communication is a communication that involves only a subset of the processors charged with the computation. An example is the parallel sparse matrix-vector product. In this computation information needs to be sent from \Omega_i to its neighbors \Omega_j (and vice versa). Local communications exploit the interconnection between the subdomains via their interfaces. This interconnection is coded by listing, for each subdomain \Omega_i, the set of its neighbors \Omega_j. OpenFOAM distinguishes between neighbors with index j < i and j > i, the so-called lower and upper neighbors, respectively. To further describe the local communication, let as before \Gamma_{ij} denote the interface between the subdomains \Omega_i and \Omega_j. Let x_k denote the cell center of a cell adjacent to the interface \Gamma_{ij} lying outside of \Omega_i. Let X_{ij} denote the set of all these cell centers. Assume u to be the scalar field solved for and let U_{ij} be the set of values of u evaluated at X_{ij}. The parallel matrix-vector multiplication requires the local communication of the sets X_{ij} from \Omega_j to \Omega_i in a pre-processing stage (and of a similar set in the reverse direction from \Omega_i to all its neighbors \Omega_j); during each multiplication the corresponding value sets U_{ij} are exchanged in the same way. A sketch of such an exchange is given below.
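The following is a minimal, self-contained sketch of the non-blocking exchange pattern that such a local communication typically uses (plain MPI and C++, not OpenFOAM code; the rank list neighbours and the buffers sendBuf/recvBuf are assumed to have been filled from the interface data X_{ij} or U_{ij}):

    #include <mpi.h>
    #include <vector>

    // Exchange interface values with all neighbouring subdomains.
    // sendBuf[n] holds the values of the local cells adjacent to interface n;
    // recvBuf[n] receives the corresponding values from rank neighbours[n].
    void exchangeInterfaceValues
    (
        const std::vector<int>& neighbours,
        const std::vector<std::vector<double>>& sendBuf,
        std::vector<std::vector<double>>& recvBuf
    )
    {
        std::vector<MPI_Request> requests(2*neighbours.size());
        for (std::size_t n = 0; n < neighbours.size(); ++n)
        {
            // post the receive for the neighbour's interface values
            MPI_Irecv(recvBuf[n].data(), int(recvBuf[n].size()), MPI_DOUBLE,
                      neighbours[n], 0, MPI_COMM_WORLD, &requests[2*n]);
            // send this subdomain's interface values to the neighbour
            MPI_Isend(sendBuf[n].data(), int(sendBuf[n].size()), MPI_DOUBLE,
                      neighbours[n], 0, MPI_COMM_WORLD, &requests[2*n + 1]);
        }
        MPI_Waitall(int(requests.size()), requests.data(), MPI_STATUSES_IGNORE);
    }

In OpenFOAM this pattern is hidden behind the Pstream layer and the processor interfaces discussed later in these notes.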
2/1/15: Parallel Partitioned Vectors
The domain decomposition results in the decomposition of a global vector (volScalarField in OpenFOAM) into local (per-subdomain) subvectors. The cell-centered discretization approach implies that a global vector v is decomposed into local subvectors v_i for 1 <= i <= nsubs in such a way that the local subvectors v_i do not share components. The global enumeration of the degrees of freedom is thus decomposed into nsubs *disjoint* subsets. Each processor i owns the subvector v_i of the vector v.
3/1/15: Parallel Partitioned Matrices
The domain decomposition results in the decomposition of a global matrix A (lduMatrix in OpenFOAM) into submatrices A_{ij}, where 1 <= i,j <= nsubs. Each subdomain \Omega_i thus has a local matrix A_{ii} and coupling matrices A_{ij} for j <> i to the other subdomains. For i < j (i > j) the subblock A_{ij} belongs to the strictly upper (lower) triangular part of A. OpenFOAM partitions the matrix according to its rows over the various processors (as PETSc does as well). By the above discussion, and assuming linear operators, processor i is able to construct the subblocks A_{ij} for 1 <= j <= nsubs of the matrix A *without* any parallel communication. Each processor i owns the subblocks A_{ij} for 1 <= j <= nsubs during the course of the simulation. A block-partitioned view for nsubs = 2 is sketched below.
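As an illustration (our own, not taken from the OpenFOAM documentation), for nsubs = 2 the row-wise block partitioning reads, in LaTeX notation,

    \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
    \begin{pmatrix} v_1 \\ v_2 \end{pmatrix}
    =
    \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} ,

where processor 1 owns the first block row (A_{11} and A_{12}) together with v_1 and b_1, and processor 2 owns the second block row together with v_2 and b_2. The off-diagonal blocks A_{12} and A_{21} carry the coupling across the interface \Gamma_{12} and drive the local communication in the matrix-vector product.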
2/15: Pstream Communication Software Layer
A domain decomposition approach requires communication between the nsubs processors that each own a subdomain. An illustration of this process is given here: https://www.comsol.com/blogs/using-the-domain-decomposition-solver-in-comsol-multiphysics. OpenFOAM encapsulates the parallel communication in the Pstream set of functions. These functions form a layer of software between the Message Passing Interface (MPI, lower in the hierarchy) and OpenFOAM (higher in the hierarchy). The Pstream layer has at least two functions. The first is to hide the details of MPI from the OpenFOAM programmer. The second is to allow the interchangeable use of various implementations of MPI. Pstream defines the namespace PstreamGlobals that encapsulates the MPI communicator among other variables. It also defines the templated allReduce function; its use will be illustrated in the section on vector operations below. Pstream furthermore provides functionality for the parallel input (IPstream) and output (OPstream) of data. The Pstream set of functions is defined here: https://github.com/OpenFOAM/OpenFOAM-dev/tree/master/src/Pstream/mpi
For example, if you were conducting experiments to compare several MPI libraries, such as Open-MPI 1.4.3 and 1.5.3 vs MPICH2 1.4.1, you would then have the following folders in "$FOAM_LIBBIN":
- dummy
- openmpi-1.4.3
- openmpi-1.5.3
- mpich2-1.4.1
Inside each one you will then have a dedicated Pstream library, along with other MPI-dependent libraries.
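As a small illustration of the encapsulation described above, solver-level OpenFOAM code talks to the Pstream layer rather than to MPI directly. A hedged fragment (to be placed inside an OpenFOAM solver; the calls reduce, sumOp, Pstream::nProcs and Pstream::myProcNo are standard OpenFOAM, while the field myField is a placeholder):

    // global sum of a local contribution via the Pstream layer (MPI is wrapped underneath)
    scalar localSum = sum(myField);        // local contribution on this processor
    reduce(localSum, sumOp<scalar>());     // global reduction; result available on all processors

    // basic parallel queries provided by Pstream
    Info<< "Running on " << Pstream::nProcs()
        << " processors; this is processor " << Pstream::myProcNo() << endl;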
Further reading
- Sequential case: loop over matrix elements via a loop over faces. Parallel case: use of boundaryCoeffs and internalCoeffs on processor patches to handle parallelism; what is a patch of a boundary mesh: https://www.cfd-online.com/Forums/openfoam-programming-development/216676-ldumatrix-building-indexing-parallel.html#post732053
- mesh.cellCells() gets the neighbour cell list of the cell given as input. No send/receive across processors is used: https://www.cfd-online.com/Forums/openfoam-programming-development/101429-accessing-neighbour-cells-parallel-computing.html
- example of send and receive across processors: https://www.cfd-online.com/Forums/openfoam-programming-development/139210-how-do-communication-across-processors.html
- another example of send and receive across processors: https://www.cfd-online.com/Forums/openfoam-programming-development/77465-combining-lists-distributed-amongst-processors.html
3/15: Parallel Mesh Distribution and Local-to-Global Numbering
Describe how the mesh is distributed and written to file per processor. Interfaces are implemented at the level of the mesh. fvMesh inherits its interfaces from lduMesh (more details needed here). We imagine that the lduMesh class holds, for each processor i, a set of interfaces \Gamma_{ij}. The number of interfaces equals the number of processors j that share an inter-processor boundary with processor i. The interface \Gamma_{ij} holds the coordinates of the cell centers (and global cell indices?) of the cells on processor j that share a face with processor i.
1/3/15: Local-to-Global Numbering
A methodology to recover the local-to-global numbering of the cells, based on a field holding the global cell IDs, is described here: https://www.cfd-online.com/Forums/openfoam-programming-development/212702-keeping-global-ldu-addressing-using-decomposepar.html
An extension of the above to boundaries, cells, faces and points is described here https://www.cfd-online.com/Forums/openfoam-programming-development/161366-global-index-cells-facess-parallel-computation.html#post571164
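As an alternative to the field-based approach of the threads above, a local-to-global cell numbering can also be built directly from the per-processor cell counts. A minimal sketch in plain MPI (not the OpenFOAM machinery), assuming the global numbering simply concatenates the per-processor cell ranges:

    #include <mpi.h>

    // Compute the global offset of this rank's cell range; the global index of
    // local cell celli is then offset + celli.
    long long globalCellOffset(long long nLocalCells)
    {
        long long offset = 0;
        // exclusive prefix sum of the local cell counts over the ranks
        MPI_Exscan(&nLocalCells, &offset, 1, MPI_LONG_LONG, MPI_SUM, MPI_COMM_WORLD);
        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) offset = 0;   // MPI_Exscan leaves the result undefined on rank 0
        return offset;
    }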
2/3/15: Further reading
Get the local mesh on each processor and gather the meshes into a PtrList on the master processor.
- source: https://www.cfd-online.com/Forums/openfoam-programming-development/130762-read-parallel-meshes.html
4/15: Interprocess Interfaces
Interprocessor interfaces are required both at the level of the mesh and at the level of the matrix. They play a role in both the discretization and the linear solve process.
LduInterface: list here the inheritance diagram, the private and public member functions and the operations on these member functions;
LduInterfaceField: list here the inheritance diagram, the private and public member functions and the operations on these member functions; an interface field and a virtual update-matrix function; more;
Two functions, initMatrixInterfaces and updateMatrixInterfaces (https://www.openfoam.com/documentation/guides/latest/api/lduMatrixUpdateMatrixInterfaces_8C_source.html).
1/4/15: Further reading
- https://www.cfd-online.com/Forums/openfoam-programming-development/212702-keeping-global-ldu-addressing-using-decomposepar.html
- patchNeighbourField operators and how to transfer data between processes. It is suggested to study the classes fvPatchField https://github.com/OpenFOAM/OpenFOAM-dev/blob/8ca408bf6c83999c71db180f5b2f306472f20b5c/src/finiteVolume/fields/fvPatchFields/fvPatchField/fvPatchField.H and coupledFvPatchField because they provide the information needed to assemble the matrices
- https://www.cfd-online.com/Forums/openfoam/92046-boundary-conditions-fvmatrix-multiplication-make-me-despair.html
- interesting small example: https://www.cfd-online.com/Forums/openfoam/81200-fvmatrix-internalcoeffs_-boundarycoeffs_.html
- https://www.cfd-online.com/Forums/openfoam/90148-coupleboucoeffs-coupleintcoeffs-interfaces-oh-my.html
5/15: Parallel Assembly of Matrix and Right-Hand Side Vector
Describe the discretization of the Laplacian, the divergence and the gradient. Describe the explicit and implicit discretization of the source term.
1/5/15: Data structure
The LDU storage format facilitates the cell-centered finite volume discretization process in OpenFOAM. Indeed, all entries of D can be constructed by a single loop over the cells, and all entries of L and U can be constructed by a single loop over the faces. The LDU matrix format holds a reference to the mesh, so the required memory for L, D and U can easily be allocated.
2/5/15: Orthogonal Mesh
This is well explained in Moukalled-Darwish for the single-processor case. The Laplacian is constructed using a loop over the faces owned by the processor to construct the lower and upper off-diagonal entries. A halo is used to be able to treat faces on the processor boundaries, and negSumDiag is used to construct the diagonal entries. For multiple processors, a face can lie either on a physical boundary or on an inter-processor boundary. For an inter-processor boundary the coordinates of the neighboring cells are stored in the inter-processor interface. For multiple processors, more details on how the inter-process interfaces are taken into account need to be provided. A serial sketch of the assembly is given below.
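The following serial sketch (plain C++, not the OpenFOAM implementation; fvm::laplacian and negSumDiag are the actual entry points) illustrates the face loop on an orthogonal mesh. The per-face data owner[f], neighbour[f], magSf[f] (face area) and delta[f] (distance between the two adjacent cell centres) are assumed, gamma is a constant diffusivity, and boundary contributions (physical and processor boundaries) are omitted:

    #include <vector>

    struct LduCoeffs
    {
        std::vector<double> diag, lower, upper;   // LDU coefficient arrays
    };

    LduCoeffs assembleLaplacian
    (
        int nCells, double gamma,
        const std::vector<int>& owner, const std::vector<int>& neighbour,
        const std::vector<double>& magSf, const std::vector<double>& delta
    )
    {
        LduCoeffs A;
        A.diag.assign(nCells, 0.0);
        A.lower.resize(owner.size());
        A.upper.resize(owner.size());

        for (std::size_t f = 0; f < owner.size(); ++f)
        {
            const double coeff = gamma*magSf[f]/delta[f];  // orthogonal flux coefficient
            A.upper[f] = coeff;          // off-diagonal entry in the owner row
            A.lower[f] = coeff;          // off-diagonal entry in the neighbour row
            // negSumDiag: the diagonal equals minus the sum of the off-diagonal row entries
            A.diag[owner[f]]     -= A.upper[f];
            A.diag[neighbour[f]] -= A.lower[f];
        }
        return A;
    }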
3/5/15: Non-Orthogonal Corrections
Describe how the above is modified to take non-orthogonal corrections into account.
4/5/15: Visualizing Matrix and Right-Hand Side vector
After discretization, the coefficient matrix and right-hand side vector can be visualized. Various tools exist.
- sequential code only https://www.cfd-online.com/Forums/openfoam/167028-extracting-matrix-data-fvvectormatrix.html
- slide 5 of https://courses.physics.ucsd.edu/2009/Spring/physics142/Labs/Lab8/Tutorial2.pdf
6/15: Passing Matrix and Right-Hand Side Vector to Linear System Solver
Describe the hierarchy of four classes: fvScalarMatrix, lduMatrix, PCG and the preconditioner (which builds on lduMatrix).
7/15: Parallel Vector Inner Product and Norm using gSumProd and gSumMag
In this section we discuss the parallel implementation of operations on subdomain decomposed vectors.
A vector in OpenFOAM is defined in terms of a field. Details on fields are given here: https://github.com/OpenFOAM/OpenFOAM-dev/blob/8ca408bf6c83999c71db180f5b2f306472f20b5c/src/OpenFOAM/fields/Fields/Field/Field.H
Assume that v and w are two global vectors (fields in OpenFOAM) decomposed into subvectors v_i and w_i for 1 <= i <= nsubs. The inner product of v and w and the norm of v can then be computed by the following two-step procedure:
- step 1: compute the inner product of v_i and w_i (or the norm of v_i) on each processor i locally;
- step 2: communicate the local results between the processors and sum up all contributions.
This second step is implemented as an allReduce operation, which typically maps onto MPI_Allreduce so that the summed result becomes available on all processors.
Because the decomposition of the vectors is non-overlapping (the sets of components are disjoint), the procedure above can be carried out without any special treatment of the interfaces. In the single-processor case the procedure reduces to the usual inner product and norm.
The implementation of gSumProd and gSumMag can be found here: https://github.com/OpenFOAM/OpenFOAM-dev/blob/8ca408bf6c83999c71db180f5b2f306472f20b5c/src/OpenFOAM/fields/Fields/Field/FieldFunctions.H
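A minimal, self-contained sketch of this two-step procedure in plain MPI (not the OpenFOAM code; gSumProd itself is generated by the macros in the file linked above and relies on the Pstream reduce) could read:

    #include <mpi.h>
    #include <vector>

    // step 1: local inner product of the owned subvectors;
    // step 2: global sum over all processors (the result is available everywhere).
    double gSumProdSketch(const std::vector<double>& v, const std::vector<double>& w)
    {
        double local = 0.0;
        for (std::size_t k = 0; k < v.size(); ++k)
        {
            local += v[k]*w[k];
        }
        double global = 0.0;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        return global;
    }

gSumMag follows the same pattern, with |v_k| summed in the local step.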
8/15: The Templated Class LduMatrix and Non-Templated Class lduMatrix
In this section we discuss how the sparse matrix is stored in OpenFOAM.
The OpenFOAM lduMatrix is a C++ object, i.e., a bag of data (a class) on which a set of operations (member functions) can be performed. The object encapsulates more than merely the matrix entries (more details on this will follow). This object is in turn built upon by the higher-level fvScalarMatrix object.
The lduMatrix stores the matrix A as A = D + L + U. The size of A is nCells-by-nCells, independent of how the boundaries are taken into account (either explicitly or implicitly). D is a vector of scalars of length nCells containing the diagonal entries of A. L (and U) contains the strictly lower (and upper) triangular part of A. L and U are each stored by means of one label vector and one vector of scalars of size nFaces. For L (and U) these vectors are called lPtr and lowerPtr (uPtr and upperPtr) in what follows. The vectors lPtr and uPtr encode the face-to-cell connectivity: for each face k, lPtr[k] (the owner) and uPtr[k] (the neighbour) contain the indices of the cells adjacent to either side of face k. By convention lPtr[k] < uPtr[k]. For each face k, lowerPtr[k] and upperPtr[k] contain an off-diagonal entry of the matrix A in its lower and upper triangular part, respectively. By construction, A is structurally symmetric. This construction limits the approximation of the flux across a face to the two neighbouring cells. The lduMatrix format has been documented in e.g. [Culpo, Moukalled e.a.]. A small worked example is given below.
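As a worked example (our own illustration, not taken from the OpenFOAM sources), consider a 1D mesh of 4 cells numbered 0 to 3, with internal face k lying between cells k and k+1. Then

    lPtr (owner)     = [0, 1, 2]
    uPtr (neighbour) = [1, 2, 3]

            [ d0  u0   0   0 ]
    A  =    [ l0  d1  u1   0 ]
            [  0  l1  d2  u2 ]
            [  0   0  l2  d3 ]

    D        = [d0, d1, d2, d3]
    lowerPtr = [l0, l1, l2]
    upperPtr = [u0, u1, u2]

so that, for example, face 1 couples cells 1 and 2 and stores the two off-diagonal entries u1 (row 1, column 2) and l1 (row 2, column 1).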
The OpenFOAM solve function is, depending on the class being used, a virtual function: its input is the discretized problem in the form of a matrix and a right-hand side vector; its output is the solver performance.
We question whether the OpenFOAM design choice is optimal from a (parallel) numerical linear algebra point of view. This question motivates this post.
LduMatrix templated class
The LduMatrix templated class is defined here: https://github.com/OpenFOAM/OpenFOAM-dev/blob/master/src/OpenFOAM/matrices/LduMatrix/LduMatrix/LduMatrix.H and implemented in the corresponding *.C file.
In lines 85 to 105 of the H-file, one reads that the LduMatrix class defines eight data members, namely:
- lduMesh: the underlying mesh. We assume that the per-processor part of the mesh is referred to here. It is not clear to us how this reference is useful (possibly in conjunction with parallel processing); we cannot yet provide examples of its use;
- diagPtr: the diagonal of the matrix as a pointer to a field of type Field<DType>;
- lowerPtr and upperPtr: the lower and upper off-diagonal parts of the matrix as pointers to fields of type Field<LUType>;
- sourcePtr: the right-hand side of the linear system as a pointer to a field of type Field<Type>;
- interfaces: the field interfaces, of type LduInterfaceFieldPtrsList<Type>. This is a list of LduInterfaceField objects (one interface for each processor boundary), defined in the separate header file https://github.com/OpenFOAM/OpenFOAM-dev/blob/8ca408bf6c83999c71db180f5b2f306472f20b5c/src/OpenFOAM/matrices/LduMatrix/LduMatrix/LduInterfaceField/LduInterfaceFieldPtrsList.H ; the LduInterfaceField class itself is defined in https://github.com/OpenFOAM/OpenFOAM-dev/blob/8ca408bf6c83999c71db180f5b2f306472f20b5c/src/OpenFOAM/matrices/LduMatrix/LduMatrix/LduInterfaceField/LduInterfaceField.H . These interfaces are used to communicate values of the field across processor boundaries;
- interfacesUpper and interfacesLower: the matrix off-diagonal coefficients for the interfaces (to be further documented and clarified).
The nCells-by-nCells linear system in OpenFOAM is thus written as (D+L+U) psi = sourcePtr.
lduMatrix class
The lduMatrix (non-templated) class is defined here: https://github.com/OpenFOAM/OpenFOAM-dev/blob/master/src/OpenFOAM/matrices/LduMatrix/LduMatrix/lduMatrices.H
In lines 31 to 35 of this H-file, one finds five instantiations of the broader templated class, i.e. five definitions of an lduMatrix-type class depending on whether the source and interface fields are of type scalar, vector, etc.
Further reading:
- lduMatrix Class Reference: OpenFOAM API Guide: https://www.openfoam.com/documentation/guides/latest/api/classFoam_1_1lduMatrix.html#a7a2e0a380ac70181d4d957ffb1d5caf1
9/15: Matrix-vector product using lduMatrix member function Amul
In this section we discuss the parallel matrix-vector multiplication. OpenFOAM follows a general procedure for the parallel matrix-vector multiplication that is described, e.g., at the top of page 2 of [Wolf]. OpenFOAM implements the matrix-vector multiplication in a member function of the LduMatrix class called Amul [Culpo].
The definition of Amul as a member function in LduMatrix.H is in line 601 of https://github.com/OpenFOAM/OpenFOAM-dev/blob/master/src/OpenFOAM/matrices/LduMatrix/LduMatrix/LduMatrix.H . Its implementation is given in lines 62 to 113 of https://github.com/OpenFOAM/OpenFOAM-dev/blob/8ca408bf6c83999c71db180f5b2f306472f20b5c/src/OpenFOAM/matrices/LduMatrix/LduMatrix/LduMatrixATmul.C . The implementation shows three steps, namely (1/3) the initialization of the processor interfaces, (2/3) the actual computation and (3/3) the update of the processor interfaces. In the following we discuss each of these steps in more detail.
1/9/15: Initialization of process interfaces
Why is the interface initialized with psi and Apsi?
This is step 1 in [Wolf]. The function initMatrixInterfaces is called with three arguments, namely interfacesUpper (what is this? why only upper?), psi (the input of the matrix-vector multiplication) and Apsi (the output of the matrix-vector multiplication). If called on the subdomain \Omega_i, the function updates all the interfaces \Gamma_{ij} for all neighbours j. This involves a local communication from \Omega_j (sender) to \Omega_i (receiver) for each of the \Omega_j's connected to \Omega_i. The function initMatrixInterfaces indeed performs a loop over all interfaces, as described in lines 45 to 58 of https://github.com/OpenFOAM/OpenFOAM-dev/blob/8ca408bf6c83999c71db180f5b2f306472f20b5c/src/OpenFOAM/matrices/LduMatrix/LduMatrix/LduMatrixUpdateMatrixInterfaces.C . The initialization of a single interface is performed using the member function initInterfaceMatrixUpdate of the class lduInterfaceField. What happens precisely remains ambiguous to some extent: who sends/receives how much data of which type? An implementation of an update of a single interface can however be found here: https://github.com/OpenFOAM/OpenFOAM-dev/blob/8ca408bf6c83999c71db180f5b2f306472f20b5c/src/finiteVolume/fields/fvPatchFields/constraint/processor/processorFvPatchScalarField.C
2/9/15: Actual computation
This is step 2 in [Wolf]. The rows of the matrix A stored on subdomain \Omega_i are multiplied with psi to obtain Apsi. The computations are performed in two loops. The first loop is a loop over cell centers and takes care of the diagonal entries of A stored on \Omega_i. The second loop is a loop over faces and takes care of the lower and upper off-diagonal entries of A stored on \Omega_i. The numbers of cells and faces in these loops are local to \Omega_i. This is what a serial (single-processor) code would look like; a sketch is given below.
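A serial sketch of these two loops (mirroring, but not copied from, the loops in LduMatrixATmul.C; the variable names follow the LDU description in section 8/15):

    #include <vector>

    void amulLocal
    (
        const std::vector<double>& diag,
        const std::vector<double>& lower,
        const std::vector<double>& upper,
        const std::vector<int>& lPtr,     // owner cell of each face
        const std::vector<int>& uPtr,     // neighbour cell of each face
        const std::vector<double>& psi,
        std::vector<double>& Apsi
    )
    {
        // loop 1: over cells, diagonal contributions
        for (std::size_t cell = 0; cell < diag.size(); ++cell)
        {
            Apsi[cell] = diag[cell]*psi[cell];
        }
        // loop 2: over faces, lower and upper off-diagonal contributions
        for (std::size_t face = 0; face < lPtr.size(); ++face)
        {
            Apsi[uPtr[face]] += lower[face]*psi[lPtr[face]];
            Apsi[lPtr[face]] += upper[face]*psi[uPtr[face]];
        }
        // in parallel, the interface coefficients are added afterwards in
        // updateMatrixInterfaces using the psi values received from the neighbours
    }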
3/9/15: Update of the process interfaces
Why is the interface updated with psi and Apsi?
This is the combination of steps 3 and 4 in [Wolf]. The function updateMatrixInterfaces is called with the same three arguments as initMatrixInterfaces. It is clear from the implementation in https://github.com/OpenFOAM/OpenFOAM-dev/blob/8ca408bf6c83999c71db180f5b2f306472f20b5c/src/finiteVolume/fields/fvPatchFields/constraint/processor/processorFvPatchScalarField.C that sums are being computed as described in [Wolf].
Observe that the Tmul member function also calls initMatrixInterfaces and updateMatrixInterfaces (it is not yet clear to us why).
10/15: Unpreconditioned CG algorithm
The above ingredients are used in the implementation of the unpreconditioned CG method. This method is the Krylov method of choice to solve the symmetric positive definite linear systems that arise in laplacianFOAM. The unpreconditioned variant requires two inner products per iteration in case no convergence test is used. In case a convergence test is used, the norm of the residual vector is computed as well; in that case, three global communication steps per iteration need to be performed. The source code for the CG method is here: https://github.com/OpenFOAM/OpenFOAM-dev/blob/8ca408bf6c83999c71db180f5b2f306472f20b5c/src/OpenFOAM/matrices/lduMatrix/solvers/PCG/PCG.H . We will discuss the incorporation of the preconditioner (i.e. solving the linear system M z = r at each CG iteration) at a later stage. A sketch marking the communication points is given below.
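The following sketch is our own illustration, not the PCG source: matVec is assumed to perform the parallel matrix-vector product of section 9/15, globalDot and globalSumMag the global reductions of section 7/15, and the residual normalization done by OpenFOAM is omitted. The comments mark where the three global communications per iteration occur.

    #include <vector>

    using Vec = std::vector<double>;

    template<class MatVec, class Dot, class SumMag>
    int conjugateGradient
    (
        MatVec matVec, Dot globalDot, SumMag globalSumMag,
        const Vec& b, Vec& x, double tol, int maxIter
    )
    {
        const std::size_t n = x.size();
        Vec r(n), p(n), Ap(n);
        matVec(x, Ap);                                   // local (interface) communication only
        for (std::size_t i = 0; i < n; ++i) r[i] = b[i] - Ap[i];

        double rr = 0.0;
        for (int iter = 0; iter < maxIter; ++iter)
        {
            const double res = globalSumMag(r);          // global communication 1: convergence test
            if (res <= tol) return iter;

            const double rrOld = rr;
            rr = globalDot(r, r);                        // global communication 2
            if (iter == 0)
            {
                p = r;
            }
            else
            {
                const double beta = rr/rrOld;
                for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta*p[i];
            }

            matVec(p, Ap);                               // local (interface) communication only
            const double alpha = rr/globalDot(p, Ap);    // global communication 3
            for (std::size_t i = 0; i < n; ++i)
            {
                x[i] += alpha*p[i];
                r[i] -= alpha*Ap[i];
            }
        }
        return maxIter;
    }

Dropping the convergence test (running a fixed number of iterations) removes the first reduction and leaves two global communications per iteration, as noted above.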
11/15: Unpreconditioned BiCGStab algorithm
The above ingredients are also used in the implementation of the BiCG and BiCGStab Krylov methods. These methods are used to solve the non-symmetric linear systems that arise in scalarTransportFOAM. They likewise require global communication in the vector inner products. The source code for the BiCGStab method is here: https://github.com/OpenFOAM/OpenFOAM-dev/blob/8ca408bf6c83999c71db180f5b2f306472f20b5c/src/OpenFOAM/matrices/lduMatrix/solvers/PBiCGStab/PBiCGStab.H .
12/15: Preconditioners
Preconditioners are constructed inside the solve function of the Krylov solvers (e.g. PCG and PBiCGStab, cf. the remark on potentialFOAM above) and are applied within the solver iteration loop.
1/12/15: Diagonal preconditioner
2/12/15: Smoother preconditioner: can the ordering of the smoother be specified? At least this paper mentions a symmetric lower-upper Gauss-Seidel algorithm: https://arc.aiaa.org/doi/abs/10.2514/6.2018-1955
3/12/15: GAMG Solver preconditioner
See [solution-algorithm-control].
Set-up phase: agglomeration-based, using MGridGenGamgAgglomeration (https://glaros.dtc.umn.edu/gkhome/mgridgen/overview). The speed of agglomeration (mergeLevels) and the number of cells on the coarsest level (nCellsInCoarsestLevel) can be controlled.
Solve phase: the details given on the smoother are cryptic. Details on the inter-grid transfer operators seem to be missing.
Smoothing is specified by the smoother entry as described in section 4.5.1.3 of the user guide. The number of sweeps used by the smoother at the different levels of mesh density is specified by the following optional entries (an illustrative fvSolution entry is given after this list).
- nPreSweeps: number of sweeps as the algorithm is coarsening (default 0).
- preSweepsLevelMultiplier: multiplier for the number of sweeps between each coarsening level (default 1).
- maxPreSweeps: maximum number of sweeps as the algorithm is coarsening (default 4).
- nPostSweeps: number of sweeps as the algorithm is refining (default 2).
- postSweepsLevelMultiplier: multiplier for the number of sweeps between each refinement level (default 1).
- maxPostSweeps: maximum number of sweeps as the algorithm is refining (default 4).
- nFinestSweeps: number of sweeps at finest level (default 2).
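For reference, an illustrative fvSolution entry using these keywords could look as follows (the field name p, the tolerances and the sweep counts are placeholders, not recommendations):

    solvers
    {
        p
        {
            solver                 GAMG;
            smoother               GaussSeidel;
            nPreSweeps             0;
            nPostSweeps            2;
            nFinestSweeps          2;
            mergeLevels            1;
            nCellsInCoarsestLevel  10;
            agglomerator           faceAreaPair;
            cacheAgglomeration     true;
            tolerance              1e-06;
            relTol                 0.01;
        }
    }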
4/12/15: GAMG Preconditioner: Describe how GAMG is used as a preconditioner instead of a solver.
13/15: Non-Linear Problems, Multiple Fields and Miscellaneous Issues
- information on how the time-stepping is taken into account;
- information on how fields not solved for are passed from one iteration to the next in a segregated solver
14/15: Parallel Write of Solution to file
How is the parallel solution written to file: as multiple files (one per processor) or as a single file?
15/15: References
- M. Culpo, “Current Bottlenecks in the Scalability of OpenFOAM on Massively Parallel Computers”, CINECA technical report, available at www.prace-ri.eu: this reference provides examples and listings that illustrate the above discussion;
- J.H. Ferziger and M. Peric, “Computational Methods for Fluid Dynamics”, Third Edition, Springer, 2002: especially Section 11.5, entitled “Parallel Computing in CFD”, is relevant for the above discussion;
- F. Moukalled, L. Mangani and M. Darwish, “The Finite Volume Method in Computational Fluid Dynamics”, Springer, 2015, https://www.springer.com/gp/book/9783319168739
- M. Wolf, E. Boman and B. Hendrickson, “Optimizing Parallel Sparse Matrix-Vector Multiplication by Corner Partitioning”, https://www.sandia.gov/~egboman/papers/PARA08.pdf
- https://www.openfoam.com/documentation/cpp-guide/html/index.html
- https://cfd.direct/openfoam/user-guide/v6-fvsolution/
- S. Bnà, I. Spisso, M. Olesen and G. Rossi, “PETSc4FOAM: A Library to plug-in PETSc into the OpenFOAM Framework”, https://prace-ri.eu/wp-content/uploads/WP294-PETSc4FOAM-A-Library-to-plug-in-PETSc-into-the-OpenFOAM-Framework.pdf