Abstract:
This work describes a method, based on high-performance parallel computing, for solving large, positive-definite, block-structured, dense linear systems with multiple right-hand sides. The solution is obtained through a generalized Levinson recursion that linearly combines the forward and backward solutions associated with lower-order subsystems. The new implementation, designed for parallel computing, is based on a partitioned-matrix algorithm split into two subroutines: the first computes the backward solutions and the error energy matrices of the lower orders, and the second computes the final solution recursively. The algorithm was implemented for three types of platform: shared-memory systems, distributed-memory systems, and GPU systems. In each case, the lowest-order systems were solved with the appropriate library: OpenBLAS or MKL for shared memory, ScaLAPACK for distributed memory, and, for GPU systems, an out-of-core implementation in which the lowest-order systems were solved with MAGMA. In all three cases, the final solution is compared with the solution of the complete system obtained with LAPACK, ScaLAPACK, and MAGMA, respectively. The first part of the algorithm proved more computationally expensive than the Cholesky decomposition; however, the second part, which computes the solution, proved more efficient than the successive solution of two triangular systems when the right-hand side is large, typically a few times N. The error in the estimated model does not differ significantly from that of the reference solution. Finally, we present an application of the algorithm to frequency-domain seismic wave modeling, which requires the solution of large, sparse linear systems.
These results reveal a disadvantage of the algorithm for sparse non-Toeplitz systems: it increases both the computational cost and the memory consumption.
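As background for the recursion the abstract refers to, a minimal sketch of the classical (scalar, symmetric-Toeplitz) Levinson recursion is given below. This is an illustration of the general idea only, not the authors' generalized block algorithm: it solves T x = y in O(N^2) by extending forward and backward vectors one order at a time, which is the mechanism the paper generalizes to block-structured systems.

```python
import numpy as np

def levinson_solve(t, y):
    """Solve T x = y for a symmetric positive-definite Toeplitz matrix T,
    given only its first column t, using the classical Levinson recursion.
    Illustrative sketch; not the paper's generalized block algorithm."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    f = np.array([1.0 / t[0]])      # forward vector: solves T_1 f = e_1
    x = np.array([y[0] / t[0]])     # current solution of T_1 x = y[:1]
    for n in range(1, len(t)):
        # prediction error of the zero-extended forward vector
        e = t[1:n + 1][::-1] @ f
        denom = 1.0 - e * e         # positive when T is SPD
        # combine extended forward and backward vectors into the new forward vector
        f = (np.append(f, 0.0) - e * np.append(0.0, f[::-1])) / denom
        b = f[::-1]                 # symmetric case: backward = reversed forward
        # extend the solution using the new backward vector
        e_x = t[1:n + 1][::-1] @ x
        x = np.append(x, 0.0) + (y[n] - e_x) * b
    return x
```

Each iteration reuses the lower-order forward and backward solutions instead of refactoring the matrix, which is what makes the recursion cheaper than a fresh triangular solve per order when many right-hand sides share the same system.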