Summary of improvements since LAPACK 3.0
========================================
1. Improvements in LAPACK 3.1.0
------------------------------
See link:lapack-3.1.0.changes[Release Notes for 3.1.0] for more details
1. *Hessenberg QR algorithm with the small bulge multi-shift QR algorithm
together with aggressive early deflation.* This is an implementation of the
2003 SIAM SIAG LA Prize winning algorithm of Braman, Byers and Mathias,
that significantly speeds up the nonsymmetric eigenproblem.
2. *Improvements of the Hessenberg reduction subroutines.* These accelerate the
first phase of the nonsymmetric eigenvalue problem.
3. *New MRRR eigenvalue algorithms that also support subset computations.*
These implementations of the 2006 SIAM SIAG LA Prize winning algorithm of
Dhillon and Parlett are also significantly more accurate than the version
in LAPACK 3.0.
4. *Mixed precision iterative refinement subroutines for exploiting fast single
precision hardware.* On platforms like the Cell processor that do single
precision much faster than double, linear systems can be solved many times
faster. Even on commodity processors there is a factor of 2 in speed
between single and double precision. These are prototype routines in the
sense that their interfaces might change based on user feedback.
5. *New partial column norm updating strategy for QR factorization with column
pivoting*. This fixes a subtle numerical bug dating back to LINPACK that can
give completely wrong results.
6. *Thread safety:* Removed all the SAVE and DATA statements (or provided
alternate routines without those statements), increasing reliability
on SMPs.
7. *Additional support for matrices with NaN/subnormal elements, optimization
of the balancing subroutines, improving reliability.*
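
The idea behind the mixed precision iterative refinement in item 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LAPACK routine: single precision is simulated by rounding through `struct`, the helper names `f32`, `lu_solve_f32` and `refine` are hypothetical, and the low-precision solve is redone on every pass where a real implementation would reuse the factorization.

```python
import struct


def f32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_solve_f32(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting,
    rounding every intermediate result to (simulated) single precision."""
    n = len(A)
    # Augmented working matrix [A | b], stored in single precision.
    M = [[f32(v) for v in row] + [f32(b[i])] for i, row in enumerate(A)]
    for k in range(n):
        # Partial pivoting: bring the largest entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = f32(M[i][k] / M[k][k])
            for j in range(k, n + 1):
                M[i][j] = f32(M[i][j] - f32(m * M[k][j]))
    # Back substitution, still in single precision.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n]
        for j in range(i + 1, n):
            s = f32(s - f32(M[i][j] * x[j]))
        x[i] = f32(s / M[i][i])
    return x


def refine(A, b, iters=5):
    """Cheap single-precision solves, corrected by double-precision residuals."""
    n = len(A)
    x = lu_solve_f32(A, b)
    for _ in range(iters):
        # Residual r = b - A x, computed in double precision.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Correction from another low-precision solve (simplification: the
        # elimination is repeated here instead of reusing stored factors).
        d = lu_solve_f32(A, r)
        x = [x[i] + d[i] for i in range(n)]
    return x
```

For a well-conditioned system, `refine(A, b)` should recover close to double-precision accuracy even though every solve runs in single precision; the fixed iteration count stands in for a proper convergence test.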
2. Improvements in LAPACK 3.1.1
--------------------------------
See link:lapack-3.1.1.changes[Release Notes for 3.1.1] for more details
1. *Add BLAS routines so that the BLAS provided is complete*
2. *Provide 5 flavours of SECOND and DSECND and add the TIMER variable in the make.inc to choose which routine to use*
2. Improvements in LAPACK 3.2.0
--------------------------------
See link:lapack-3.2.html[Release Notes for 3.2] for more details
1. *Extra Precise Iterative Refinement:* New linear solvers that
``guarantee'' fully accurate answers (or give a warning that the
answer cannot be trusted). The matrix types supported in this
release are: GE (general), SY (symmetric), PO (positive definite),
HE (Hermitian), and GB (general band) in all the relevant
precisions.
2. *XBLAS, or portable ``extra precise BLAS'':* Our new linear
solvers in (1) depend on these to perform iterative refinement.
The XBLAS will be released in a separate
package.
3. *Non-Negative Diagonals from Householder QR:* The QR
factorization routines now guarantee that the diagonal is both real
and non-negative. Factoring a uniformly random matrix now correctly
generates an orthogonal Q from the Haar distribution.
4. *High Performance QR and Householder Reflections on Low-Profile
Matrices:* The auxiliary routines to apply Householder reflections
(e.g. DLARFB) automatically reduce the cost of QR from O(n^3^) to
O(n^2^+nb^2^) for band matrices with bandwidth b stored in a dense
format, with no user interface changes. Other users of these routines
can see similar benefits, in particular on ``narrow profile'' matrices.
5. *New fast and accurate Jacobi SVD:* High accuracy SVD routine for
dense matrices, which can compute tiny singular values to many more
correct digits than xGESVD when the matrix has columns differing
widely in norm, and usually runs faster than xGESVD too.
6. *Routines for Rectangular Full Packed format:* The RFP format
(SF, HF, PF, TF) enables efficient routines with optimal storage for
symmetric, Hermitian or triangular matrices. Since these routines
utilize the Level 3 BLAS, they are generally much more efficient
than the existing packed storage routines (SP, HP, PP, TP).
7. *Pivoted Cholesky:* The Cholesky factorization with diagonal
pivoting for symmetric positive semi-definite matrices. Pivoting is
required for reliable rank detection.
8. *Mixed precision iterative refinement* routines for exploiting
fast single precision hardware: On platforms like the Cell processor
that do single precision much faster than double, linear systems can
be solved many times faster. Even on commodity processors there is a
factor of 2 in speed between single and double precision. The
matrix types supported in this release are: GE (general), PO
(positive definite).
9. *Some new variants added for the one-sided factorizations:* LU
gets right-looking, left-looking, Crout and recursive; QR gets
right-looking and left-looking; Cholesky gets left-looking,
right-looking and top-looking. Depending on the computer
architecture (or speed of the underlying BLAS), one of these
variants may be faster than the original LAPACK implementation.
10. *More robust DQDS algorithm:* Fixed some rare convergence
failures for the bidiagonal DQDS SVD routine.
11. *Better documentation for the multishift Hessenberg QR algorithm
with aggressive early deflation, and various improvements of the
code.*
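
To illustrate why item 7 says pivoting is needed for reliable rank detection, here is a minimal Python sketch of Cholesky with diagonal pivoting. It is not the LAPACK code: the function name `pivoted_cholesky` is hypothetical, and the absolute tolerance `tol` is a simplifying assumption (a production routine would typically scale the threshold by the matrix size and the largest diagonal entry).

```python
import math


def pivoted_cholesky(A, tol=1e-12):
    """Cholesky with diagonal pivoting for a symmetric positive semi-definite A.

    Returns (L, piv, rank) such that, for i, j < n,
        A[piv[i]][piv[j]] ~= sum_k L[i][k] * L[j][k],
    where only the first `rank` columns of L are nonzero.
    """
    n = len(A)
    S = [row[:] for row in A]          # working copy (Schur complement)
    piv = list(range(n))               # piv[i] = original index placed at position i
    L = [[0.0] * n for _ in range(n)]
    rank = 0
    for k in range(n):
        # Pick the largest remaining diagonal entry as the next pivot.
        p = max(range(k, n), key=lambda i: S[piv[i]][piv[i]])
        piv[k], piv[p] = piv[p], piv[k]
        L[k], L[p] = L[p], L[k]        # keep already-computed columns aligned
        d = S[piv[k]][piv[k]]
        if d <= tol:
            break                      # remaining block is numerically zero
        L[k][k] = math.sqrt(d)
        for i in range(k + 1, n):
            L[i][k] = S[piv[i]][piv[k]] / L[k][k]
        # Update the Schur complement of the trailing block.
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                S[piv[i]][piv[j]] -= L[i][k] * L[j][k]
        rank += 1
    return L, piv, rank
```

Because the largest remaining diagonal is always eliminated first, the loop stops as soon as the trailing block is negligible, and the number of completed steps serves as the rank estimate.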
3. Improvements in LAPACK 3.2.1
--------------------------------
See link:lapack-3.2.1.html[Release Notes for 3.2.1] for more details
1. *Change xLARRD (MRRR routines) to deal with some matrices from Godunov*
4. Improvements in LAPACK 3.3.0
--------------------------------
See link:lapack-3.3.0.html[Release Notes for 3.3.0] for more details
1. *Standard C language APIs for LAPACK:* The standard interfaces include
support for Fortran and C data formats (column-major and row-major) to ease
interoperability with, and migration of, Fortran code. Full LAPACK
functionality is now accessible in a C-friendly manner. These new interfaces
are a key feature of the new LAPACK 3.3 release and the just released Intel
MKL 10.3 product. PLASMA 2.3 relies on it.
2. *Computing the complete CS decomposition:* Compute the complete CS
decomposition of a partitioned unitary matrix. The algorithm is designed
for numerical stability. See: Brian D. Sutton. Computing the complete CS
decomposition. Numer. Algorithms. 50 (2009), no. 1, 33–65.
3. *Level-3 BLAS symmetric indefinite solve and symmetric indefinite inversion*:
This enables better performance for xSYTRF and xSYTRI.
4. *Thread safe xLAMCH*: SLAMCH and DLAMCH were the only two routines not thread-safe
in LAPACK-3.2. This is fixed: all routines in LAPACK-3.3 are now thread-safe.
4. Improvements in LAPACK 3.3.1
--------------------------------
See link:lapack-3.3.1.html[Release Notes for 3.3.1] for more details
*Improve the CMAKE build system.*
* Added a FindBLAS module to easily search for and link against various optimized BLAS implementations
* Added proper default compiler flags for various commercial Fortran compilers
* Check for and disable SIGFPE handlers (LAPACK code handles these internally) on known compilers
* Remove testing drivers and BLAS source from nightly code coverage to get a more accurate representation of the LAPACK code getting exercised
* Suppress harmless warning messages for various platforms and compilers
* Added nightly builds (http://my.cdash.org/index.php?project=LAPACK)
* Standardized binary output locations to play nicely when building Windows DLLs
* Cleaned up CMake packaging to allow find_package(LAPACK) in applications
* Added missing link dependencies for test libraries
5. Improvements in LAPACK 3.4.0
-------------------------------
See link:lapack-3.4.0.html[Release Notes for 3.4.0] for more details
1. *xGEQRT: QR factorization (improved interface).* Contribution by Rodney
James (UC Denver). xGEQRT is analogous to xGEQRF with a modified interface which
enables better performance when the blocked reflectors need to be reused. The
companion subroutines xGEMQRT apply the reflectors.
2. *xGEQRT3: recursive QR factorization.* Contribution by Rodney James (UC
Denver). The recursive QR factorization is cache-oblivious and enables high
performance on tall and skinny matrices.
3. *xTPQRT: Communication-Avoiding QR sequential kernels.* Contribution by
Rodney James (UC Denver). These subroutines are useful for updating a QR
factorization and are used in sequential and parallel Communication Avoiding
QR. These subroutines support the general case Triangle on top of Pentagon
which includes as special cases so-called Triangle on top of Triangle and
Triangle on top of Square. This is the right-looking version of the subroutines
and the subroutines are blocked. The T matrices and the block size are part of
the interface. The companion subroutines xTPMQRT apply the reflectors.
4. *CMAKE build system.* We are striving to help our users install our
libraries seamlessly on their machines. The CMAKE team contributed to our
effort to port LAPACK and ScaLAPACK under the CMAKE build system. Building
under Windows has never been easier. This also allows us to release DLLs for
Windows, so users no longer need a Fortran compiler to use LAPACK under
Windows.
5. *Doxygen documentation.* LAPACK routine documentation has never been more
accessible. See http://www.netlib.org/lapack/explore-html/.
6. *New website allowing for easier navigation.*
7. *LAPACKE - Standard C language APIs for LAPACK.* Since LAPACK 3.3.0, LAPACK
includes new C interfaces. With the LAPACK 3.4.0 release, LAPACKE is directly
integrated within the LAPACK library and has been enriched by the full set of
LAPACK subroutines. See link:lapacke.html[here].
6. Improvements in LAPACK 3.4.1
-------------------------------
See link:lapack-3.4.1.html[Release Notes for 3.4.1] for more details
1. *BLAS test codes now using F90 machine epsilon, optimization now works*
2. *Fix thread safety issue.* xLARFT routines modified so that V is now [in] only instead of [in/out], now thread safe
3. *LAPACK test suite modified so that all tests will pass with gfortran (and ifort with -O0, and hopefully others)*
7. Improvements in LAPACK 3.4.2
-------------------------------
See link:lapack-3.4.2.html[Release Notes for 3.4.2] for more details
* *Modified all of the matrix norm routines*. Now, if a matrix contains a NaN, a NaN will be returned for all choices of norm.
Previously, if a matrix contained a NaN, the norm routines would not necessarily return a NaN. (They would return regular floating-point numbers in some cases.)
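
A small Python sketch shows how a norm can silently lose a NaN, and one way to make it propagate. The function names are hypothetical and the code only illustrates the max-abs norm; it is not taken from the LAPACK sources.

```python
import math


def max_abs_naive(values):
    """Largest absolute value, written the obvious way.

    Any comparison with NaN is False, so a NaN entry is simply skipped.
    """
    m = 0.0
    for v in values:
        if abs(v) > m:
            m = abs(v)
    return m


def max_abs_nan_aware(values):
    """Largest absolute value, returning NaN if any entry is NaN."""
    m = 0.0
    for v in values:
        a = abs(v)
        # The extra isnan test lets a NaN overwrite the running maximum;
        # once m is NaN, no later comparison can replace it.
        if a > m or math.isnan(a):
            m = a
    return m


data = [1.0, float("nan"), 2.0]
naive_result = max_abs_naive(data)      # the NaN is ignored
aware_result = max_abs_nan_aware(data)  # the NaN is reported
```

The naive version returns an ordinary number for `data`, which is the behaviour described as the old one above; the NaN-aware version returns NaN.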
// vim: set syntax=asciidoc:
8. Improvements in LAPACK 3.5.0
-------------------------------
See link:lapack-3.5.0.html[Release Notes for 3.5.0] for more details
1. *xSYSV_ROOK: LDL^T^ with rook pivoting.* Contribution by Craig Lucas (University
of Manchester and NAG) and Sven Hammarling (NAG). These subroutines enable
better stability than the Bunch-Kaufman pivoting scheme currently used in
LAPACK xSYSV solvers; also, the elements of L are bounded, which is important
in some applications. The computational time of xSYSV_ROOK is slightly higher than
that of xSYSV.
2. *2-by-1 CSD to be used for tall and skinny matrix with orthonormal columns* (in LAPACK 3.4.0, we already integrated CSD of a full square orthogonal matrix)
More info here: http://faculty.rmc.edu/bsutton/simultaneousbidiagonalization.html
3. *New stopping criteria for balancing*.
4. *New complex division algorithm*.
Following the algorithm of Michael Baudin and Robert L. Smith (see arXiv:1210.4539).
5. *Other improvements:*
* slasyf.f, clasyf.f, clahef.f, dlasyf.f, zlahef.f, zlasyf.f :: various improvements by Igor Kozachenko
* dhgeqz.f, shgeqz.f :: improvements by Duncan Po (MathWorks), committed by Rodney James
* dlasd4.f, slasd4.f :: improvements by Sergey Kuznetsov (Intel), committed by Rodney James
* clarfb.f, dlarfb.f, slarfb.f, zlarfb.f :: improvements by Rodney James
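
As background for item 4, the Python sketch below contrasts the textbook complex division formula with the classic scaled variant usually attributed to Robert L. Smith (1962). This is not the Baudin and Smith algorithm adopted in LAPACK, which adds further safeguards; it only illustrates the kind of overflow problem such algorithms address. The function names are hypothetical, and neither version handles a zero divisor.

```python
import math


def naive_div(a, b, c, d):
    """(a + ib) / (c + id) via the textbook formula.

    c*c + d*d can overflow even when the true quotient is modest.
    """
    den = c * c + d * d
    return ((a * c + b * d) / den, (b * c - a * d) / den)


def smith_div(a, b, c, d):
    """(a + ib) / (c + id) with scaling by the larger divisor component."""
    if abs(c) >= abs(d):
        r = d / c                 # |r| <= 1, so products with r stay bounded
        den = c + d * r
        return ((a + b * r) / den, (b - a * r) / den)
    r = c / d
    den = c * r + d
    return ((a * r + b) / den, (b * r - a) / den)


big = 1e300
naive_re, naive_im = naive_div(big, big, big, big)   # intermediate overflow
smith_re, smith_im = smith_div(big, big, big, big)   # true quotient is 1 + 0i
```

With these inputs the textbook formula produces NaN because `c*c + d*d` overflows to infinity, while the scaled version returns the quotient without leaving the representable range.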
9. Improvements in LAPACK 3.6.0
-------------------------------
See link:lapack-3.6.0.html[Release Notes for 3.6.0] for more details
10. Improvements in LAPACK 3.7.0
-------------------------------
See link:lapack-3.7.0.html[Release Notes for 3.7.0] for more details
11. Improvements in LAPACK 3.8.0
-------------------------------
See link:lapack-3.8.0.html[Release Notes for 3.8.0] for more details