Commit 98e7da2

committed
new shortened version
1 parent a4fb7fe commit 98e7da2

File tree

1 file changed

+84
-89
lines changed

joss/paper.md

Lines changed: 84 additions & 89 deletions
@@ -49,68 +49,49 @@ bibliography: paper.bib
4949

5050

5151
# Summary
52-
fortran-src is an open source Haskell library and command-line application
52+
fortran-src is an open-source Haskell library and command-line application
5353
for the lexing, parsing,
5454
and static analysis of Fortran source code. It provides an
5555
interface to build other Fortran language tools, e.g., for
5656
static analysis, automated refactoring, verification, and compilation.
57-
The library provides multiple parsers which support Fortran code conforming to the
58-
FORTRAN 66, FORTRAN 77, Fortran 90, Fortran 95 standards,
59-
some legacy extensions, and partial Fortran 2003 support.
60-
The parsers generate a shared Abstract
61-
Syntax Tree representation (AST), over which a variety
62-
of core static analyses are defined to facilitate development of
63-
analysis and language tools.
57+
The library supports FORTRAN 66, FORTRAN 77, Fortran 90, Fortran 95,
58+
some legacy extensions, and (partially) Fortran 2003, with
59+
a shared Abstract Syntax Tree representation.
6460
The library has been deployed in several
6561
language tool projects in academia and industry.
6662

6763
# Statement of need
6864

6965
As one of the oldest surviving programming languages [@backus1978history], Fortran
70-
underpins a vast amount of software still in deployment. Fortran is not only a mainstay
71-
of legacy software, but is also used to write new software,
72-
particularly in the sciences. Given the importance of numerical
73-
models in science, verifying the correctness of such models is
74-
critical for scientific integrity and progress. However, doing so is
75-
difficult, even more so than for traditional software; for
76-
computational models, the expected program behaviour is often unknown,
77-
uncertainty is the rule, and approximations are pervasive. Despite
78-
decades of progress in program verification within computer science,
79-
few formal verification techniques are applied in scientific software. To
80-
facilitate a step-change in the effectiveness of verification for
81-
computational science, a subset of the authors of this paper
82-
developed a suite of verification and static
83-
analysis tools named CamFort to explore lightweight verification methods
84-
(requiring little or no programmer effort), targeted at
85-
scientific programming [@contrastin2016lightning]. We chose Fortran as it remains a popular
66+
underpins a vast amount of software; Fortran is not only a mainstay
67+
of legacy software, but is also used to write new software. Fortran remains a popular
8668
language in the international scientific community; @vanderbauwhede2022making
8769
reports data from 2016 on the UK's \`\`Archer'' supercomputer, showing the
8870
vast majority of use being Fortran code. Fortran is
8971
particularly notable for its prevalence in earth sciences, e.g., for
9072
implementing global climate models that then inform international policy
9173
decisions [@mendez2014climate]. In 2024, Fortran re-entered the Top 10 programming languages in
9274
the [TIOBE Index](https://www.tiobe.com/tiobe-index/), showing its enduring popularity.
93-
94-
The continued use of Fortran, particualarly in
75+
The continued use of Fortran, particularly in
9576
scientific contexts, was the catalyst for this software package.
9677

9778
A challenge in writing language tools for Fortran is its long
9879
history. There have been several major language standards (FORTRAN
9980
I-IV, FORTRAN 66 and 77, Fortran 90, 95, 2003, 2008, etc.) or
10081
_restandardisations_. Newer standards often deprecate features
101-
which were known to be a ready source of errors, or were difficult to
82+
which are known to be a ready source of errors, or are difficult to
10283
specify or understand. However, compilers often support an amalgam of features across
103-
language standards, including deprecated features (@urmaetal2014).
84+
standards (@urmaetal2014).
10485
This enables developers to keep using deprecated features, or mix
105-
a variety of language standards.
86+
language standards.
10687
This complicates the task of developing new tools for manipulating Fortran
10788
source code; one must tame the weight of decades of language evolution.
10889

10990
This package, fortran-src, provides an open-source unified core for
11091
statically analysing Fortran code across language standards, with
11192
a focus on legacy code over cutting-edge modern Fortran. It is both
11293
a standalone tool and a library, providing
113-
a suite of standard static analyses and tools to be used as a basis for
94+
a suite of standard static analyses as a basis for
11495
further programming language tools and systems.
11596

11697
## Related software
@@ -126,24 +107,27 @@ but does not provide more general static analysis facilities.
126107
More recent work has developed open source
127108
tools for refactoring Fortran [@vanderbauwhede2022making]:
128109
[RefactorF4Acc](https://github.com/wimvanderbauwhede/RefactorF4Acc)\footnote{\url{https://github.com/wimvanderbauwhede/RefactorF4Acc}} is an
129-
open source tool for upgrading FORTRAN 77 code to Fortran 95.
110+
open-source tool for upgrading FORTRAN 77 code to Fortran 95.
111+
112+
No comprehensive lexing, parsing, and analysis library was available from which to
113+
build new tools.
130114

131115
# Functionality
132-
fortran-src provides the following functionality:
133116

134-
* lexing and parsing Fortran to an expressive abstract syntax tree;
135-
* perform various static analyses;
136-
* pretty printing;
137-
* "reprinting", or patching sections of source code without removing secondary
117+
* Lexing and parsing of Fortran to an expressive Abstract Syntax Tree;
118+
* Various static analyses, e.g., data flow analysis;
119+
* Type checking;
120+
* Pretty printing;
121+
* "Reprinting", or patching sections of source code without removing secondary
138122
notation such as comments;
139-
* exporting to JSON.
123+
* Exporting to JSON.
140124

141-
fortran-src is primarily a Haskell library, but it also packages a command-line
142-
tool for running and inspecting analyses. By exporting parsed code to JSON, the
125+
fortran-src is primarily a Haskell library but it also packages a command-line
126+
tool for analysis. By exporting parsed code to JSON, the
143127
parsing and standard analyses that fortran-src provides may be utilized by
144128
non-Haskell tools.
145129

146-
The library's top-level module is `Language.Fortran`; all submodules are within that namespace.
130+
The library's top-level module is `Language.Fortran`.
147131

148132
## Lexing and parsing
149133

@@ -155,7 +139,7 @@ accepted by major compilers. fortran-src takes roughly the latter
155139
approach, though it also has an extended Fortran 77 mode for supporting
156140
legacy extensions influenced by vendor-specific compilers that have been popular in the past.
157141

158-
Furthermore, the Fortran language has evolved through two broad syntactic forms:
142+
The Fortran language has evolved through two broad syntactic forms:
159143

160144
* _fixed source form_, used by FORTRAN 66 and FORTRAN 77 standards, where each
161145
line of source code follows a strict format (motivated by its original use
@@ -172,6 +156,7 @@ versions of the language: FORTRAN 66 and FORTRAN 77 (and additional
172156
`Legacy` and `Extended` modes), and the free form lexer, for Fortran
173157
90 onwards.
174158

159+
<!--
175160
The fixed form lexer (`Language.Fortran.Parser.Fixed.Lexer`) handles
176161
the expectation that the first 6 columns of a line are reserved for
177162
code labels and continuation line markers, with code starting at
@@ -182,17 +167,21 @@ ignored).
182167
The free form lexer (`Language.Fortran.Parser.Free.Lexer`) is less
183168
constrained but still has to manage continuation-line markers which
184169
break statements across multiple lines.
170+
-->
185171

186-
fortran-src then defines one parser per supported standard (with the exception of
187-
FORTRAN 77, for which we define extra parsers handling non-standard extended features).
172+
fortran-src defines one parser per supported standard (grouped
173+
under `Language.Fortran.Parser.Fixed` and `Language.Fortran.Parser.Free` depending
174+
on the lexing form), plus a parser
175+
for handling non-standard extended features.
188176
Each parser uses the source form that its standard specifies.
189-
Later Fortran standards such as Fortran 2003 are generally comparable to Fortran
190-
90, but with additional syntactic constructs. The fortran-src parsers reflect
191-
this, gating certain features by the language standard being parsed. Parsers are grouped by
177+
Later standards such as Fortran 2003 are generally comparable to Fortran
178+
90, but with additional syntactic constructs. The parsers "gate" certain features by the language standard being parsed.
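
To make the idea of gating concrete, the following is a minimal, self-contained sketch and not the fortran-src implementation. The `Standard` and `Feature` types and the `introducedIn` and `accepts` functions are hypothetical names chosen for illustration. The sketch only models features being added by later standards; it ignores features that later standards delete.

```haskell
module Main where

-- Hypothetical enumeration of standards. The constructor order is
-- chronological, so the derived Ord instance compares standards by age.
data Standard = Fortran66 | Fortran77 | Fortran90 | Fortran95 | Fortran2003
  deriving (Show, Eq, Ord, Enum, Bounded)

-- A few illustrative syntactic features (assumed names).
data Feature = DoLoop | ModuleUnit | ClassDeclaration
  deriving (Show, Eq)

-- The standard in which each illustrative feature first appears.
introducedIn :: Feature -> Standard
introducedIn DoLoop           = Fortran66
introducedIn ModuleUnit       = Fortran90
introducedIn ClassDeclaration = Fortran2003

-- A parser for a given standard accepts a feature only if that
-- standard is at least as recent as the one introducing the feature.
accepts :: Standard -> Feature -> Bool
accepts std feature = std >= introducedIn feature

main :: IO ()
main = do
  print (accepts Fortran77 ModuleUnit)
  print (accepts Fortran95 ModuleUnit)
```

A single grammar can then consult a predicate like `accepts` at the points where newer constructs would be recognised, rather than duplicating the grammar for each standard.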
179+
180+
<!-- Parsers are grouped by
192181
fixed or free form, thus parsers for FORTRAN 66 and FORTRAN 77 are
193182
within the `Language.Fortran.Parser.Fixed` namespace and the rest are within
194183
`Language.Fortran.Parser.Free`. A top-level module (`Language.Fortran.Parser`)
195-
provides a unified point of access to the underlying parsers.
184+
provides a unified point of access to the underlying parsers. -->
196185

197186
The lexers are auto-generated via the [`alex`](https://github.com/haskell/alex) tool.
198187
The suite of parsers is automatically generated from
@@ -202,39 +191,38 @@ CPP (the C pre-processor) can be run prior to lexing or parsing.
202191

203192
## Unified Fortran AST
204193

205-
The parsers all share a common abstract syntax tree (AST) representation
206-
(`Language.Fortran.AST`) via a group of mutually-recursive data
194+
The parsers share a common abstract syntax tree (AST) representation (`Language.Fortran.AST`)
195+
defined via mutually-recursive data
207196
types. All such data types are _parametric data types_, parameterised by
208197
the type of "annotations" that can be stored in the nodes of the
209198
tree. For example, the top-level of the AST is the `ProgramFile a`
210199
type, which comprises a list of `ProgramUnit a` values, parameterised
211200
by the annotation type `a` (i.e., `a` is the generic type parameter).
212-
The annotation facility is useful for,
213-
for example, collecting information about types within the nodes
201+
The annotation facility is useful for collecting information about types within the nodes
214202
of the tree, or flagging whether the particular node of the tree has been
215203
rewritten or refactored.
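
The annotation mechanism can be illustrated with a small, self-contained sketch. The `Expr`, `Ty`, `annotation`, `inferTypes`, and `stripAnnotations` names below are hypothetical stand-ins, not the definitions in `Language.Fortran.AST`; they only show how a type parameter lets the same tree shape carry different per-node information.

```haskell
{-# LANGUAGE DeriveFunctor #-}
module Main where

-- Toy expression tree, parameterised by the annotation type `a`.
-- Every node stores one annotation value.
data Expr a
  = Var a String
  | IntLit a Integer
  | Add a (Expr a) (Expr a)
  deriving (Show, Eq, Functor)

-- Hypothetical annotation recording a (very coarse) inferred type.
data Ty = TyInteger | TyUnknown
  deriving (Show, Eq)

-- Read the annotation stored at the root of an expression.
annotation :: Expr a -> a
annotation (Var a _)    = a
annotation (IntLit a _) = a
annotation (Add a _ _)  = a

-- Turn an unannotated tree into one annotated with naive types:
-- literals are integers, variables are unknown, and a sum is an
-- integer only when both operands are.
inferTypes :: Expr () -> Expr Ty
inferTypes (Var _ n)    = Var TyUnknown n
inferTypes (IntLit _ i) = IntLit TyInteger i
inferTypes (Add _ l r)  =
  let l' = inferTypes l
      r' = inferTypes r
      t | annotation l' == TyInteger && annotation r' == TyInteger = TyInteger
        | otherwise = TyUnknown
  in Add t l' r'

-- Discard annotations using the derived Functor instance.
stripAnnotations :: Expr a -> Expr ()
stripAnnotations = fmap (const ())

main :: IO ()
main = do
  let e = Add () (IntLit () 1) (IntLit () 2)
  print (annotation (inferTypes e))
  print (stripAnnotations (inferTypes e) == e)
```

Because the annotation is a type parameter, an analysis can change the annotation type (here from `()` to `Ty`) without changing the shape of the tree.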
216204

217205
Some simple transformations are provided on ASTs:
218206

219-
* Grouping transformation, turning unstructured ASTs into structured ASTs
207+
* Grouping transformation, turning unstructured ASTs into structured ASTs
220208
(`Language.Fortran.Transformation.Grouping`);
221209
* Disambiguation of array indexing vs. function calls (as they share
222-
the same syntax in Fortran) (`Language.Fortran.Transformation.Disambiguation`)
223-
and intrinsic calls from regular function calls
224-
(`Language.Fortran.Transformation.Disambiguation.Intrinsic`), e.g.
210+
the same syntax in Fortran) (`Language.Fortran.Transformation.Disambiguation`)
211+
and intrinsic calls from regular function calls
212+
(`Language.Fortran.Transformation.Disambiguation.Intrinsic`),
213+
e.g.
225214
`a(i)` is both the syntax for indexing array `a` at index `i` and
226215
for calling a function named `a` with argument `i`;
227-
* Fresh name transformation (obeying scoping)
228-
(`Language.Fortran.Analysis.Renaming`).
216+
* Fresh name transformation (obeying scoping) (`Language.Fortran.Analysis.Renaming`).
229217

230-
All of these transformations are applied to the ASTs following
218+
These transformations are applied to the AST following
231219
parsing (with some slight permutations on the grouping transformations
232220
depending on whether the code is FORTRAN 66 or not).
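
As a rough illustration of the disambiguation step, the sketch below resolves ambiguous `a(i)` nodes in a toy AST. It is not the code in `Language.Fortran.Transformation.Disambiguation`: the `Expr` type and the `disambiguate` function are hypothetical, and the list of array names is passed in by hand, whereas a real tool would derive it from the program's declarations.

```haskell
module Main where

-- Toy AST: straight after parsing, `a(i)` cannot be classified, so it
-- is stored as an Ambiguous node.
data Expr
  = Var String
  | IntLit Integer
  | Ambiguous String [Expr]     -- a(i): array index or function call
  | ArrayIndex String [Expr]
  | FunctionCall String [Expr]
  deriving (Show, Eq)

-- Resolve every Ambiguous node, given the names known to be arrays
-- (an assumed input for this sketch). Arguments are resolved
-- recursively, since they may themselves contain ambiguous nodes.
disambiguate :: [String] -> Expr -> Expr
disambiguate arrays = go
  where
    go (Ambiguous n args)
      | n `elem` arrays = ArrayIndex n (map go args)
      | otherwise       = FunctionCall n (map go args)
    go (ArrayIndex n args)   = ArrayIndex n (map go args)
    go (FunctionCall n args) = FunctionCall n (map go args)
    go e                     = e

main :: IO ()
main = do
  -- Models the source text f(a(i)) where only `a` is declared as an array.
  let e = Ambiguous "f" [Ambiguous "a" [Var "i"]]
  print (disambiguate ["a"] e)
```

Running the pass after parsing, rather than inside the parser, keeps the grammar independent of declaration information.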
233221

234222
## Static analyses
235223

236224
The list below summarises the current static analysis techniques
237-
available within fortran-src (grouped under `Language.Fortran.Analysis`).
225+
available within fortran-src (grouped under `Language.Fortran.Analysis`).
238226

239227
* Control-flow analysis (building a super graph) (`Language.Fortran.Analysis.BBlocks`);
240228
* General data flow analyses (`Language.Fortran.Analysis.DataFlow`), including:
@@ -253,8 +241,8 @@ is provided for evaluation of expressions and for semantic analysis
253241
(`Language.Fortran.Repr.Eval.Value`) leverages this representation
254242
and enables some symbolic manipulation too, essentially providing some partial evaluation.
255243

256-
For a demonstration of using fortran-src for static analysis, there
257-
is a small demo tool which detects if an allocatable array is used
244+
A demonstration of fortran-src for static analysis is provided
245+
by a small demo tool which detects if an allocatable array is used
258246
before it has been allocated.\footnote{\url{https://github.com/camfort/allocate-analysis-example}}
259247

260248
## Pretty printing, reprinting, and rewriting
@@ -266,44 +254,34 @@ code from the internal AST (`Language.Fortran.PrettyPrint`).
266254
Furthermore, fortran-src provides a diff-like patching feature for
267255
(unparsed) Fortran source code that accounts for the fixed form style,
268256
handling the fixed form lexing of lines, and comments in its
269-
application of patches (`Language.Fortran.Rewriter`). This aids in the
270-
development of refactoring tools.
271-
272-
The associated \texttt{CamFort} package\footnote{\url{https://github.com/camfort/camfort}} which builds heavily on fortran-src provides a related "reprinting" algorithm (@clarke2017scrap)
273-
that fuses a depth-first traversal of the AST with a textual diff algorithm
274-
on the original source code. The reprinter is parameterised by `reprintings`
275-
which hook into each node and allow nodes which have been refactored by CamFort
276-
to have the pretty printer applied to them. The resulting outputs from each
277-
node are stitched into the position from which they originated in the
278-
input source file. This further enables the
279-
development of refactoring tools that need to perform transformations on source code text.
257+
application of patches (`Language.Fortran.Rewriter`). This aids in the development of refactoring tools.
280258

281259
# Work building on fortran-src
282260

283261
## CamFort
284262

285-
As mentioned in the introduction, the origin of fortran-src was
286-
in the CamFort project and its suite of tools. The aim of the
287-
CamFort project\footnote{Funded from 2015-18
263+
The fortran-src package originated in
264+
the CamFort project\footnote{Funded from 2015-18
288265
by the EPSRC under the project title \emph{CamFort: Automated evolution and
289266
verification of computational science
290267
models} \url{https://gow.epsrc.ukri.org/NGBOViewGrant.aspx?GrantRef=EP/M026124/1}}
291-
was to develop practical tools for scientists to
268+
whose aim was to (1) develop practical tools for scientists to
292269
help reduce the accidental complexity of models through
293-
evolving a code base, as well as tools for automatically verifying
270+
evolving a code base, and (2) provide tools for automatically verifying
294271
that any maintenance/evolution activity preserves the model's
295-
behaviour. The work resulted in the CamFort verification tool for
296-
Fortran\footnote{\url{https://github.com/camfort/camfort}} of which
297-
fortran-src was the core infrastructure developed for the tool.
272+
behaviour. The work resulted in the CamFort tool
273+
of which fortran-src was the core infrastructure.
298274

299-
CamFort provides some facilities for automatically refactoring
275+
CamFort provides facilities for automatically refactoring
300276
deprecated or dangerous programming patterns, with the goal of helping
301277
to meet core quality requirements, such as maintainability
302278
(@DBLP:conf/oopsla/OrchardR13). For example, it can rewrite
303279
EQUIVALENCE and COMMON blocks (both of which were deprecated in the
304-
Fortran 90 standard) into more modern Fortran style. These
280+
Fortran 90 standard) into more modern Fortran style.
281+
282+
<!-- These
305283
refactorings also help expose any programming bugs arising from bad
306-
programming practices.
284+
programming practices. -->
307285

308286
The bulk of the features are however focussed on code analysis and
309287
lightweight verification (@contrastin2016lightning). Source-code
@@ -313,18 +291,35 @@ conforms to these specifications. CamFort can also suggest places to
313291
insert specifications and, in some cases, infer the specifications
314292
of existing code. Facilities include: units-of-measure typing
315293
(@DBLP:journals/corr/abs-2011-06094,@DBLP:journals/jocs/OrchardRO15,@danish2024incremental),
316-
array access patterns (for capturing the shape of stencil computations
317-
that can be complex, involving intricate index manipulations)
294+
array access patterns (for capturing the shape of stencil computations)
318295
(@orchard2017verifying), deductive reasoning via pre- and
319-
post-conditions in Hoare logic style, and various code safety checks
296+
post-conditions in Hoare logic style, and various code safety checks.
297+
298+
<!--
320299
such as memory safety by ensuring every ALLOCATE has a DEALLOCATE,
321300
robustness by analysing the use of conditionals on floating-point
322301
numbers, and performance bug checks on arrays (e.g., that the order of
323302
array indexing using induction variables matches the order of enclosing
324-
loops defining those induction variables). CamFort has been previously
325-
deployed at the Met Office, with its analysing tooling run on the Unified
303+
loops defining those induction variables).
304+
-->
305+
306+
CamFort also provides an advanced rewriting algorithm
307+
that fuses a depth-first traversal of the AST with a textual diff algorithm
308+
on the original source code, called "reprinting" (@clarke2017scrap).
309+
310+
CamFort has been previously
311+
deployed at the Met Office, with its analysis tooling run on the Unified
326312
Model (@walters2017met) to ensure internal code quality standards are met.
327313

314+
<!--
315+
The reprinter is parameterised by `reprintings`
316+
which hook into each node and allow nodes which have been refactored by CamFort
317+
to have the pretty printer applied to them. The resulting outputs from each
318+
node are stitched into the position from which they originated in the
319+
input source file. This further enables the
320+
development of refactoring tools that need to perform transformations on source code text.
321+
-->
322+
328323
## fortran-vars memory model library
329324

330325
`fortran-vars` is a static analysis library built on top of `fortran-src`. Many
@@ -341,7 +336,7 @@ instead of variable names.
341336

342337
## Nonstandard INTEGER refactoring
343338

344-
Outside of CamFort, fortran-src has been used to build other (closed
339+
fortran-src has been used to build other (closed
345340
source) refactoring tools to help migration and improve the quality
346341
of large legacy codebases, building on top of the library's AST, analysis, and
347342
reprinting features.
