P90.tex

% This is the aspauthor.tex LaTeX file
% Copyright 2010, Astronomical Society of the Pacific Conference Series

\documentclass[11pt,twoside]{article}
\usepackage{asp2010}

\resetcounters

\bibliographystyle{asp2010}

\markboth{Thomas et al.}{FITS and the needs of modern astronomical research}

\newcommand{\aspconf}{ASP Conf.\ Ser.}

\begin{document}

\title{Why FITS is failing to meet the needs of modern astronomical
  research}
\author{Brian~Thomas$^1$, Frossie~Economou$^1$, Perry~Greenfield$^2$,
Paul~Hirst$^3$, Tim~Jenness$^4$, David~S.~Berry$^5$, Erik~Bray$^2$,
Norman~Gray$^6$, Demitri~Muna$^7$, James~Turner$^8$,
Miguel~de~Val-Borro$^9$, Jaunde~Santander~Vela$^{10}$,
David~Shupe$^{11}$, John~Good$^{12}$, and G.~Bruce~Berriman$^{12}$
\affil{$^1$National Optical Astronomy Observatory, 950 N.\ Cherry Ave,
Tucson, AZ 85719, USA}
\affil{$^2$Space Telescope Science Institute, 3700 San Martin Drive,
  Baltimore, MD 21218, USA}
\affil{$^3$Gemini Observatory, 670 N.\ A`oh\=ok\=u Place, Hilo, HI
  96720, USA}
\affil{$^4$Department of Astronomy, Cornell University, Ithaca NY,
  14853, USA}
\affil{$^5$Joint Astronomy Centre, 660 N.\ A`oh\=ok\=u Place, Hilo, HI
96720, USA}
\affil{$^6$School of Physics \& Astronomy, University of Glasgow,
  Glasgow, G12~8QQ, United Kingdom}
\affil{$^7$Department of Astronomy, Ohio State University, Columbus,
  OH 43210, USA}
\affil{$^8$Gemini Observatory, Casilla 603, La Serena, Chile}
\affil{$^9$Department of Astrophysical Sciences, Princeton University,
Princeton, NY 08544, USA}
\affil{$^{10}$Instituto de Astrof\'{i}sica de Analuc\'{i}a, Glorieta
  de la Astronom\'{i}a s/n, E-18008, Granada, Spain}
\affil{$^{11}$Infrared Processing and Analysis Center, Caltech,
  Pasadena, CA 91125, USA}
}

\begin{abstract}
  The Flexible Image Transport System (FITS) standard has been a great
  boon to astronomy, allowing observatories, scientists and the public
  to exchange astronomical information easily. The FITS standard is,
  however, showing its age. Developed in the late 1970's the FITS
  authors made a number of implementation choices for the format that,
  while common at time, are now seen to limit its utility with modern
  data. The authors of the FITS standard could not appreciate the
  challenges which we would be facing today in astronomical
  computing. Difficulties we now face include, but are not limited to,
  having to address the need to handle an expanded range of
  specialized data product types (data models), being more conducive
  to the networked exchange and storage of data, handling very large
  datasets and the need to capture significantly more complex metadata
  and data relationships.

  There are members of the community today who find some (or all) of
  these limitations unworkable, and have decided to move ahead with
  storing data in other formats. This reaction should be taken as a
  wakeup call to the FITS community to make changes in the FITS
  standard, or to see its usage fall. In this paper we detail some
  selected important problems which exist within the FITS standard
  today.  It is not our intention to prescribe specific remedies to
  these issues; rather, we hope to call attention of the FITS and
  greater astronomical computing communities to these issues in the
  hopes that it will spur action to address them.
\end{abstract}

\section{Introduction}

The Flexible Image Transport System standard
\citep[FITS;][]{1981A&AS...44..363W,2001A&A...376..359H,2010A&A...524A..42P}
has been a fundamental part of astronomical computing for a
significant part of the past 4 decades. FITS has provided the central
means to store and exchange astronomical data and, because of hard
work of the FITS community, it has become a relatively easy exercise
for application writers, archivists and end user scientists to
interchange data and work productively on many computational astronomy
problems. The success of FITS is such that it has even spread to other
domains such as medical imaging and digitizing manuscripts in the
Vatican library
\citep[ascl:1206.013,][]{2006JRASC.100..242W,2012EWASSAlle}

%% EWASS 2012 had multiple papers on FITS in a special session on
%% digiral preservation. Chiappetti (from IAU) and Ammenti (from
%% the Vatican). No write ups available though.

Although there have been significant changes, the FITS standard has
evolved very slowly since its genesis in the late 1970s. FITS has
added new types of metadata conventions such as World Coordinate
System
\citep[WCS;][]{2002A&A...395.1061G,2002A&A...395.1077C,2006A&A...446..747G}
representation, and data serializations such as variable length binary
tables \citep{1995A&AS..113..159C}. Nevertheless, these changes have
not been sufficient to match the greater evolution in astronomical
research over the same period of time.

Astronomical research now goes beyond the paradigm of the original
scientific team consuming only the observational data for which they
proposed. The astronomy community has shifted towards utilizing
observations of others, accessing data from remote archives over the
Internet, and combining these data with the original observations (or
theoretical calculations) in order to obtain better and wider ranging
scientific results. Many research projects now involve many diverse
data sets from a range of sources. Instruments in astronomy now
produce many orders of magnitude larger datasets than were common at
the time FITS was born, in some cases
\citep[e.g.][]{2012ASPC..461..283A} requiring parallelized,
distributed storage systems to provide adequate data rates.

Astronomers have come to increasingly rely on others to write software
to help process and analyze their data. Common libraries, analysis
environments, pipeline processed data, applications and services
provided by third parties form a crucial foundation for many
astronomers' toolboxes. All of this requires that the interchange of
data between different tools needs to be as seamless and automated as
possible, and that complex data models and metadata used in processing
are maintained and understood through the interchange.

These changes in research practices pose new challenges for 21st
century. We must address the need to handle an expanded range of
specialized data product types and models, be more conducive to the
distributed exchange and storage of data, handle very large datasets
and provide a means to capture significantly more complex metadata and
data relationships.

These limitations have caused members of the community to seek more
capable storage formats, both in the past \citep[e.g.\ Starlink
HDS;][]{1982QJRAS..23..485D} and in the present and future
\citep[e.g.\ LOFAR][]{2011ASPC..442...53A} or in developing new
formats of their own. Use of FITS will inevitably fall should it not
adapt to these new challenges.

It is our intended goal in this paper to highlight some selected,
important, problems which exist in the FITS core standard today. It is
not our intention in this paper to prescribe specific remedies to
these issues; rather, we hope to call attention of the FITS and
greater astronomical computing community to these issues in the hopes
that it will spur action to resolve them.

\section{Problems -- Deficiencies of FITS for Modern Astronomical Computing}

There is not enough space in this paper to go into a detailed
description of the deficiencies that we see are present in the current
incarnation of FITS. Instead we will summarize the issues and present
a more detailed examination in a subsequent paper.

\subsection{Lack of versioning, semantics and encodings}

There is no standard way to specify the version of a given FITS file
or what extensions it supports. You must read the file and determine
dynamically which extensions are present and whether they are
understood. The ``once FITS, forever FITS'' maxim gives you some
confidence that you can always read a FITS file but you can't be sure
if there is something that you don't understand. Explicit versioning
of FITS files will help but there also needs to be some way to declare
that a particular data model is being used from variants such as
OIFITS \citep{2006SPIE.6268E.106T}, MBFITS
\citep{2006A&A...454L..25M}, and FITS-IDI \citep{2011AIPS114}, and to
validate the contents against a schema.

The allowed character set in FITS of 7-bit US-ASCII is overly
restrictive in a Unicode world. In the late 1970s this was the correct
choice but in the 21st Century it seems strange that we cannot use an
accent in someone's name or use a degree symbol in a comment.

\subsection{Missing critical data models}

FITS can support tables and multi-dimensional images but there is not
standardised way of grouping these constructs in a related
manner. Determining that a particular image extension contains the
variance or mask for another image relies on string parsing and shared
convention. Allowing related extensions to be grouped with a
dictionary of shared meaning would go a long way to help with this
issue.

The current FITS WCS data models are complex yet incomplete and
inflexible. There needs to be a way to stack mappings in an arbitrary
manner to allow for flexible model development \citep[see
e.g.][]{O35_adassxxii,2012ASPC..461..825B}.


\subsection{Inflexibility in organizing and representing metadata and
  data}

The 80 character fixed format header is extremely restrictive and
encourages awkward extensions (e.g. ESO HIERARCH) that can not
overcome the underlying constraints. Additionally, the lack of
namespaces and uncertainty over inheritance of headers to other images
leads to confusion.

\subsection{Inadequate support for large, distributed data}

Modern data sets can result in files of several terabytes that must be
distributed across multiple file systems. The FITS grouping convention
tries to provide this facility and should be made more robust and
transparent. Additionally, streaming indeterminately sized data sets
to files must be supported.


\section{Summary -- Significant Problems exist in the FITS standard}

The problems which we have described are real and significant. We
don’t wish to recommend a particular solution here. Action to correct
these issues should flow from constructive community discussion, and
offered solutions to these problems. Possible solutions may involve
moving existing FITS conventions into the core standard,
modification of the FITS standard to remove limitations or possibly
transferring the FITS data model over into a new serialisation, or
some selection of these actions.

These technical problems will be solved one way or another. If the
community is not willing to do the hard work of hammering out a
universal (or widely-adopted) approach, individual projects will
continue to make their own ad-hoc solutions. Data formats will become
increasingly fragmented and we will no longer enjoy the easy
interoperability that FITS has provided for many years.


\bibliography{P90}

\end{document}