@@ -49,68 +49,49 @@ bibliography: paper.bib


# Summary
- fortran-src is an open source Haskell library and command-line application
+ fortran-src is an open-source Haskell library and command-line application
for the lexing, parsing,
and static analysis of Fortran source code. It provides an
interface to build other Fortran language tools, e.g., for
static analysis, automated refactoring, verification, and compilation.
- The library provides multiple parsers which support Fortran code conforming to the
- FORTRAN 66, FORTRAN 77, Fortran 90, Fortran 95 standards,
- some legacy extensions, and partial Fortran 2003 support.
- The parsers generate a shared Abstract
- Syntax Tree representation (AST), over which a variety
- of core static analyses are defined to facilitate development of
- analysis and language tools.
+ The library supports FORTRAN 66, FORTRAN 77, Fortran 90, Fortran 95,
+ some legacy extensions, and (partially) Fortran 2003, parsing all of them
+ into a shared Abstract Syntax Tree representation.
The library has been deployed in several
language tool projects in academia and industry.

# Statement of need

As one of the oldest surviving programming languages [@backus1978history], Fortran
- underpins a vast amount of software still in deployment. Fortran is not only a mainstay
- of legacy software, but is also used to write new software,
- particularly in the sciences. Given the importance of numerical
- models in science, verifying the correctness of such models is
- critical for scientific integrity and progress. However, doing so is
- difficult, even more so than for traditional software; for
- computational models, the expected program behaviour is often unknown,
- uncertainty is the rule, and approximations are pervasive. Despite
- decades of progress in program verification within computer science,
- few formal verification techniques are applied in scientific software. To
- facilitate a step-change in the effectiveness of verification for
- computational science, a subset of the authors of this paper
- developed a suite of verification and static
- analysis tools named CamFort to explore lightweight verification methods
- (requiring little or no programmer effort), targeted at
- scientific programming [@contrastin2016lightning]. We chose Fortran as it remains a popular
+ underpins a vast amount of software; Fortran is not only a mainstay
+ of legacy software, but is also used to write new software. Fortran remains a popular
86
68
language in the international scientific community; @vanderbauwhede2022making
87
69
reports data from 2016 on the UK's \`\` Archer'' supercomputer, showing the
88
70
vast majority of use being Fortran code. Fortran is
89
71
particularly notable for its prevalence in earth sciences, e.g., for
90
72
implementing global climate models that then inform international policy
91
73
decisions [ @mendez2014climate ] . In 2024, Fortran re-entered the Top 10 programming languages in
92
74
the [ TIOBE Index] ( https://www.tiobe.com/tiobe-index/ ) , showing its enduring popularity.
-
- The continued use of Fortran, particualarly in
+ The continued use of Fortran, particularly in
scientific contexts, was the catalyst for this software package.

A challenge in writing language tools for Fortran is its long
history. There have been several major language standards (FORTRAN
I-IV, FORTRAN 66 and 77, Fortran 90, 95, 2003, 2008, etc.) or
_restandardisations_. Newer standards often deprecate features
- which were known to be a ready source of errors, or were difficult to
+ which are known to be a ready source of errors, or were difficult to
specify or understand. However, compilers often support an amalgam of features across
- language standards, including deprecated features (@urmaetal2014).
+ standards (@urmaetal2014).
This enables developers to keep using deprecated features, or mix
- a variety of language standards.
+ language standards.
This complicates the task of developing new tools for manipulating Fortran
source code; one must tame the weight of decades of language evolution.

This package, fortran-src, provides an open-source unified core for
statically analysing Fortran code across language standards, with
a focus on legacy code over cutting-edge modern Fortran. It is both
a standalone tool and a library, providing
- a suite of standard static analyses and tools to be used as a basis for
+ a suite of standard static analyses as a basis for
further programming language tools and systems.

## Related software
@@ -126,24 +107,27 @@ but does not provide more general static analysis facilities.
More recent work has developed open source
tools for refactoring Fortran [@vanderbauwhede2022making]:
[RefactorF4Acc](https://github.com/wimvanderbauwhede/RefactorF4Acc)\footnote{\url{https://github.com/wimvanderbauwhede/RefactorF4Acc}} is an
- open source tool for upgrading FORTRAN 77 code to Fortran 95.
+ open-source tool for upgrading FORTRAN 77 code to Fortran 95.
+
+ No comprehensive lexing, parsing, and analysis library was available from which to
+ build new tools.

# Functionality
- fortran-src provides the following functionality:

- * lexing and parsing Fortran to an expressive abstract syntax tree;
- * perform various static analyses;
- * pretty printing;
- * "reprinting", or patching sections of source code without removing secondary
+ * Lexing and parsing of Fortran to an expressive Abstract Syntax Tree;
+ * Various static analyses, e.g., data flow analysis;
+ * Type checking;
+ * Pretty printing;
+ * "Reprinting", or patching sections of source code without removing secondary
notation such as comments;
- * exporting to JSON.
+ * Exporting to JSON.

fortran-src is primarily a Haskell library, but it also packages a command-line
- tool for running and inspecting analyses. By exporting parsed code to JSON, the
+ tool for analysis. By exporting parsed code to JSON, the
parsing and standard analyses that fortran-src provides may be utilized by
non-Haskell tools.

- The library's top-level module is `Language.Fortran`; all submodules are within that namespace.
+ The library's top-level module is `Language.Fortran`.

## Lexing and parsing

@@ -155,7 +139,7 @@ accepted by major compilers. fortran-src takes roughly the latter
approach, though it also has an extended Fortran 77 mode for supporting
legacy extensions influenced by vendor-specific compilers that have been popular in the past.

- Furthermore, the Fortran language has evolved through two broad syntactic forms:
+ The Fortran language has evolved through two broad syntactic forms:

* _fixed source form_, used by FORTRAN 66 and FORTRAN 77 standards, where each
line of source code follows a strict format (motivated by its original use
@@ -172,6 +156,7 @@ versions of the language: FORTRAN 66 and FORTRAN 77 (and additional
`Legacy` and `Extended` modes), and the free form lexer, for Fortran
90 onwards.

+ <!--
The fixed form lexer (`Language.Fortran.Parser.Fixed.Lexer`) handles
the expectation that the first 6 columns of a line are reserved for
code labels and continuation line markers, with code starting at
@@ -182,17 +167,21 @@ ignored).
The free form lexer (`Language.Fortran.Parser.Free.Lexer`) is less
constrained but still has to manage continuation-line markers which
break statements across multiple lines.
+ -->

- fortran-src then defines one parser per supported standard (with the exception of
- FORTRAN 77, for which we define extra parsers handling non-standard extended features).
+ fortran-src defines one parser per supported standard (grouped
+ under `Language.Fortran.Parser.Fixed` and `Language.Fortran.Parser.Free` depending
+ on the source form), plus extra FORTRAN 77 parsers
+ for handling non-standard extended features.
Each parser uses the source form that its standard specifies.
- Later Fortran standards such as Fortran 2003 are generally comparable to Fortran
- 90, but with additional syntactic constructs. The fortran-src parsers reflect
- this, gating certain features by the language standard being parsed. Parsers are grouped by
+ Later standards such as Fortran 2003 are generally comparable to Fortran
+ 90, but with additional syntactic constructs. The parsers gate certain features on the language standard being parsed.
+
+ <!-- Parsers are grouped by
fixed or free form, thus parsers for FORTRAN 66 and FORTRAN 77 are
within the `Language.Fortran.Parser.Fixed` namespace and the rest are within
`Language.Fortran.Parser.Free`. A top-level module (`Language.Fortran.Parser`)
- provides a unified point of access to the underlying parsers.
+ provides a unified point of access to the underlying parsers. -->

The lexers are auto-generated via the [`alex`](https://github.com/haskell/alex) tool.
The suite of parsers is automatically generated from
@@ -202,39 +191,38 @@ CPP (the C pre-processor) can be run prior to lexing or parsing.

## Unified Fortran AST

- The parsers all share a common abstract syntax tree (AST) representation
- (`Language.Fortran.AST`) via a group of mutually-recursive data
+ The parsers share a common abstract syntax tree (AST) representation (`Language.Fortran.AST`)
+ defined via mutually-recursive data
types. All such data types are _parametric data types_, parameterised by
the type of "annotations" that can be stored in the nodes of the
tree. For example, the top-level of the AST is the `ProgramFile a`
type, which comprises a list of `ProgramUnit a` values, parameterised
- by the annotation type `a` (i.e., that is the generic type parameter).
- The annotation facility is useful for,
- for example, collecting information about types within the nodes
+ by the annotation type `a` (i.e., the generic type parameter).
+ The annotation facility is useful for collecting information about types within the nodes
of the tree, or flagging whether the particular node of the tree has been
rewritten or refactored.

Some simple transformations are provided on ASTs:

* Grouping transformation, turning unstructured ASTs into structured ASTs
(`Language.Fortran.Transformation.Grouping`);
* Disambiguation of array indexing vs. function calls (as they share
- the same syntax in Fortran) (`Language.Fortran.Transformation.Disambiguation`)
- and intrinsic calls from regular function calls
- (`Language.Fortran.Transformation.Disambiguation.Intrinsic`), e.g.
+ the same syntax in Fortran) (`Language.Fortran.Transformation.Disambiguation`),
+ e.g.
`a(i)` is both the syntax for indexing array `a` at index `i` and
for calling a function named `a` with argument `i`;
+ * Disambiguation of intrinsic calls from regular function calls
+ (`Language.Fortran.Transformation.Disambiguation.Intrinsic`);
- * Fresh name transformation (obeying scoping)
- (`Language.Fortran.Analysis.Renaming`).
+ * Fresh name transformation (obeying scoping) (`Language.Fortran.Analysis.Renaming`).

- All of these transformations are applied to the ASTs following
+ These transformations are applied to the AST following
parsing (with some slight permutations on the grouping transformations
depending on whether the code is FORTRAN 66 or not).

## Static analyses

- The table below summarises the current static analysis techniques
+ The list below summarises the current static analysis techniques
available within fortran-src (grouped under `Language.Fortran.Analysis`).

* Control-flow analysis (building a super graph) (`Language.Fortran.Analysis.BBlocks`);
* General data flow analyses (`Language.Fortran.Analysis.DataFlow`), including:
@@ -253,8 +241,8 @@ is provided for evaluation of expressions and for semantic analysis
(`Language.Fortran.Repr.Eval.Value`) leverages this representation
and enables some symbolic manipulation too, essentially providing some partial evaluation.

- For a demonstration of using fortran-src for static analysis, there
- is a small demo tool which detects if an allocatable array is used
+ A demonstration of fortran-src for static analysis is provided
+ by a small demo tool which detects if an allocatable array is used
before it has been allocated.\footnote{\url{https://github.com/camfort/allocate-analysis-example}}

## Pretty printing, reprinting, and rewriting
@@ -266,44 +254,34 @@ code from the internal AST (`Language.Fortran.PrettyPrint`).
Furthermore, fortran-src provides a diff-like patching feature for
(unparsed) Fortran source code that accounts for the fixed form style,
handling the fixed form lexing of lines, and comments in its
- application of patches (`Language.Fortran.Rewriter`). This aids in the
- development of refactoring tools.
-
- The associated \texttt{CamFort} package\footnote{\url{https://github.com/camfort/camfort}} which builds heavily on fortran-src provides a related "reprinting" algorithm (@clarke2017scrap)
- that fuses a depth-first traversal of the AST with a textual diff algorithm
- on the original source code. The reprinter is parameterised by `reprintings`
- which hook into each node and allow nodes which have been refactored by CamFort
- to have the pretty printer applied to them. The resulting outputs from each
- node are stitched into the position from which they originated in the
- input source file. This further enables the
- development of refactoring tools that need to perform transformations on source code text.
+ application of patches (`Language.Fortran.Rewriter`). This aids in the development of refactoring tools.

# Work building on fortran-src

## CamFort

- As mentioned in the introduction, the origin of fortran-src was
- in the CamFort project and its suite of tools. The aim of the
- CamFort project\footnote{Funded from 2015-18
+ The fortran-src package originated in
+ the CamFort project\footnote{Funded from 2015-18
by the EPSRC under the project title \emph{CamFort: Automated evolution and
verification of computational science
models} \url{https://gow.epsrc.ukri.org/NGBOViewGrant.aspx?GrantRef=EP/M026124/1}}
- was to develop practical tools for scientists to
+ whose aim was to (1) develop practical tools for scientists to
help reduce the accidental complexity of models through
- evolving a code base, as well as tools for automatically verifying
+ evolving a code base, and (2) provide tools for automatically verifying
that any maintenance/evolution activity preserves the model's
- behaviour. The work resulted in the CamFort verification tool for
- Fortran\footnote{\url{https://github.com/camfort/camfort}} of which
- fortran-src was the core infrastructure developed for the tool.
+ behaviour. The work resulted in the CamFort tool,
+ for which fortran-src was the core infrastructure.

- CamFort provides some facilities for automatically refactoring
+ CamFort provides facilities for automatically refactoring
deprecated or dangerous programming patterns, with the goal of helping
to meet core quality requirements, such as maintainability
(@DBLP:conf/oopsla/OrchardR13). For example, it can rewrite
EQUIVALENCE and COMMON blocks (both of which were deprecated in the
- Fortran 90 standard) into more modern Fortran style. These
+ Fortran 90 standard) into more modern Fortran style.
+
+ <!-- These
refactorings also help expose any programming bugs arising from bad
- programming practices.
+ programming practices. -->

The bulk of the features are however focussed on code analysis and
lightweight verification (@contrastin2016lightning). Source-code
@@ -313,18 +291,35 @@ conforms to these specifications. CamFort can also suggest places to
insert specifications and, in some cases, infer the specifications
of existing code. Facilities include: units-of-measure typing
(@DBLP:journals/corr/abs-2011-06094,@DBLP:journals/jocs/OrchardRO15,@danish2024incremental),
- array access patterns (for capturing the shape of stencil computations
- that can be complex, involving intricate index manipulations)
+ array access patterns (for capturing the shape of stencil computations)
(@orchard2017verifying), deductive reasoning via pre- and
- post-conditions in Hoare logic style, and various code safety checks
+ post-conditions in Hoare logic style, and various code safety checks.
+
+ <!--
such as memory safety by ensuring every ALLOCATE has a DEALLOCATE,
robustness by analysing the use of conditionals on floating-point
numbers, and performance bug checks on arrays (e.g., that the order of
array indexing using induction variables matches the order of enclosing
- loops defining those induction variables). CamFort has been previously
- deployed at the Met Office, with its analysing tooling run on the Unified
+ loops defining those induction variables).
+ -->
+
+ CamFort also provides an advanced rewriting algorithm, called "reprinting" (@clarke2017scrap),
+ that fuses a depth-first traversal of the AST with a textual diff algorithm
+ on the original source code, so that refactored nodes are pretty-printed and
+ stitched back into the positions from which they originated in the source file.
+
+ CamFort has been previously
+ deployed at the Met Office, with its analysis tooling run on the Unified
Model (@walters2017met) to ensure internal code quality standards are met.

+ <!--
+ The reprinter is parameterised by `reprintings`
+ which hook into each node and allow nodes which have been refactored by CamFort
+ to have the pretty printer applied to them. The resulting outputs from each
+ node are stitched into the position from which they originated in the
+ input source file. This further enables the
+ development of refactoring tools that need to perform transformations on source code text.
+ -->
+
## fortran-vars memory model library

`fortran-vars` is a static analysis library built on top of `fortran-src`. Many
@@ -341,7 +336,7 @@ instead of variable names.

## Nonstandard INTEGER refactoring

- Outside of CamFort, fortran-src has been used to build other (closed
+ fortran-src has been used to build other (closed
source) refactoring tools to help migration and improve the quality
of large legacy codebases, building on top of the library's AST, analysis, and
reprinting features.
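The span-based patching behind such reprinting-based refactorings can be illustrated with a minimal Haskell sketch. This is a self-contained illustration under stated assumptions and is not fortran-src's `Language.Fortran.Rewriter` API. The `Patch` type, the `applyPatches` function, and the use of 0-based character offsets (rather than line/column source spans) are all hypothetical. Each patch replaces one half-open range of the original text. Everything outside the patched ranges, including comments and column layout, is copied through unchanged.

```haskell
import Data.List (sortOn)

-- | Hypothetical patch (not the fortran-src API): replace the characters in the
-- half-open range [patchStart, patchEnd) of the original source with patchText.
-- Offsets are 0-based character positions into the whole source string.
data Patch = Patch
  { patchStart :: Int
  , patchEnd   :: Int
  , patchText  :: String
  } deriving (Show)

-- | Apply non-overlapping patches to the original source text. Characters
-- outside every patched range (comments, spacing, column layout) are copied
-- verbatim. Returns 'Left' if a patch is out of range or overlaps another.
applyPatches :: String -> [Patch] -> Either String String
applyPatches src patches = go 0 src (sortOn patchStart patches)
  where
    -- pos is the offset in the original source of the first character of rest.
    go :: Int -> String -> [Patch] -> Either String String
    go _ rest [] = Right rest
    go pos rest (p : ps)
      | patchStart p < pos =
          Left ("patch at offset " ++ show (patchStart p) ++ " overlaps or is negative")
      | patchEnd p < patchStart p =
          Left ("patch at offset " ++ show (patchStart p) ++ " ends before it starts")
      | patchEnd p > pos + length rest =
          Left ("patch at offset " ++ show (patchStart p) ++ " runs past the end of the source")
      | otherwise =
          -- Copy the untouched prefix, emit the replacement, skip the old text.
          let (kept, fromStart) = splitAt (patchStart p - pos) rest
              afterPatch        = drop (patchEnd p - patchStart p) fromStart
          in ((kept ++ patchText p) ++) <$> go (patchEnd p) afterPatch ps
```

For example, `applyPatches "      INTEGER*4 N  ! loop bound" [Patch 6 15 "INTEGER(KIND=4)"]` yields `Right "      INTEGER(KIND=4) N  ! loop bound"`. The six leading columns and the trailing comment survive untouched. The `INTEGER*4` rewrite is only an illustration. Kind values are compiler-specific, so a real refactoring tool has to choose the replacement for its target compiler.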