design-decisions.tex

\section{Design Decisions}
\label{sec:decisions}

LAGraph is for users who want to use graph algorithms that run on top of the GraphBLAS.  
Our overarching design goal is ease of use with flexibility to handle advanced use-cases.
We do not wish to compromise performance, but when the tradeoff between convenience and performance
is unavoidable, we offer both and let the user choose.
LAGraph includes a set of data structures and utility functions 
that make it convenient for developers to write algorithms on top of GraphBLAS with 
an approachable API and consistent user experience.

\begin{listing}
\begin{cplus}
typedef struct LAGraph_Graph_struct
{
    GrB_Matrix   A;      // adjacency matrix of the graph
    LAGraph_Kind kind;   // kind of graph: directed, etc.
    
    // cached properties
    GrB_Matrix   AT;     // transpose of A
    GrB_Vector   row_degree;
    GrB_Vector   col_degree;
    LAGraph_BooleanProperty A_pattern_is_symmetric;
    int64_t      ndiag;  // -1 if unknown
} *LAGraph_Graph;

typedef struct LAGraph_Graph_struct *LAGraph_Graph; |$\label{line:LAGraph_Graph}$|

// creating a graph
GrB_Matrix M;
// ...construction of M omitted

LAGraph_Graph G;
LAGraph_New(&G, &M, LAGRAPH_DIRECTED_ADJACENCY); |$\label{line:CreateGraphObject}$|

// operating on properties
LAGraph_Property_AT(G, msg);  |$\label{line:ComputeTranspose}$|// compute/cache
\end{cplus}
\caption{{\tt LAGraph\_Graph} data structure and methods.}
\label{Lst.graph}
\end{listing}

\subsection{Core data structure}

The main data structure in LAGraph is the \verb'LAGraph_Graph' which consists of primary components
and cached properties. The data structure is not opaque, providing the user with full ability to access and
modify all internal components. This contrasts with the opaque objects in the GraphBLAS.  This data structure is 
shown at the top of Listing~\ref{Lst.graph} and defined ultimately on Line~\ref{line:LAGraph_Graph}.

The primary components of this struct are a GraphBLAS matrix named \verb'A' and an enumeration \verb'kind'.
The kind indicates how the matrix should be interpreted.
Currently, the only kinds defined are \verb'LAGRAPH_ADJACENCY_UNDIRECTED' and 
\verb'LAGRAPH_ADJACENCY_DIRECTED', but more options will be added in the future.  Creating the 
Graph object is performed on Line~\ref{line:CreateGraphObject} of Listing~\ref{Lst.graph}.
Following this call, \verb'M' will be \verb'NULL'. The matrix previously pointed to by \verb'M' now
lives at \verb'G->A'. This ``move'' constructor helps avoid memory-freeing errors.

Cached properties include the transpose of \verb'A', the row degrees, column degrees, etc.
They can be computed from the primary components, but doing so repeatedly for each algorithm
utilizing \verb'A' would be wasteful. Having them live inside the Graph object simplifies
algorithm call signatures. Utility functions exist to compute each cached property.  For example,
Line~\ref{line:ComputeTranspose} of Listing~\ref{Lst.graph} will compute the transpose of \verb'G->A' and store it as \verb'G->AT'.
Following this call, any algorithm which is given \verb'G' will have access to both \verb'A' and its transpose.

Because the Graph object is not opaque, any piece of code may set the transpose as well. For instance, if an algorithm
computes the transpose as part of its normal logic, it could directly set \verb'G->AT'.
The expectation is that the Graph object will always remain consistent.
If \verb'G->A' is modified, all cached properties must be either be set as unknown or modified to reflect the change.
Properties which are not known are set to \verb'NULL' or \verb'LAGRAPH_BOOLEAN_UNKNOWN' in the case of boolean properties.
This expectation is a convention that all LAGraph algorithm implementers are expected to follow.


\subsection{User modes}

Algorithms in LAGraph target two user modes: Basic and Advanced. The Basic user mode is for those who want
things to ``just work'', are less concerned about performance, and may be less experienced with graph libraries.
The Advanced user mode is for those whose primary concern is performance and are willing to conform to stricter
requirements to achieve that goal.

Algorithms targeting the Basic mode typically have limited options. Often, there will only be one function for
a given algorithm. Under the hood, that single algorithm might take different paths depending on the shape or
size of the input graph. The idea is that a basic user wants to compute PageRank or Betweenness Centrality,
but does not want to have to understand the five different ways to compute them. They simply want the correct answer.

Algorithms targeting the Advanced mode are often highly specialized implementations of an algorithm. The Advanced
mode user is expected to understand details such as push-pull~\cite{DBLP:conf/icpp/YangBO18} and batch mode and why different techniques are
better for each graph. Advanced mode algorithms are very strict in their input. If the input does not match the
expected kind, an error will be raised.

Advanced mode algorithms will also raise an error if a cached property is needed by an algorithm, but is not
currently available on the Graph object. While Basic mode algorithms are free to compute and cache properties
on the Graph object, Advanced mode algorithms never will. The idea is to never surprise the user with unexpected
additional computation. An Advanced mode user must opt-in to all computations.

Often, Basic mode algorithms will inspect the input, possibly compute properties or transform the data,
and finally call one of the Advanced mode algorithms to do the actual work on the graph. Having these two user
modes allows LAGraph to target a wider range of users who vary in their experience with graph algorithms.


\subsection{Algorithm calling conventions}

Algorithms in LAGraph follow a general calling convention.

\begin{cplus}
int algorithm
(
    // outputs
    TYPE *out1,
    TYPE *out2,
    ...
    // input/output
    TYPE inout,
    ...
    // inputs
    TYPE input1,
    TYPE input2,
    ...
    // error message holder
    char *msg
)
\end{cplus}

The return value is always an int with the following meaning:

\begin{itemize}
    \item \verb'=0 ->' success
    \item \verb'<0 ->' error
    \item \verb'>0 ->' warning
\end{itemize}

The meaning of a given error or warning value is algorithm-specific
and should be listed in the documentation for the algorithm.

We distinguish three types of arguments:

\begin{itemize}
    \item 
    Outputs appear first and are passed by reference. A pointer should be created by the caller, but
    memory will be allocated by the algorithm. If the output is not needed, a \verb'NULL' is passed and the
    algorithm will not return that output.

    \item Input/Output arguments are passed by value. The expectation is that the object will be modified.
    This supports features such as batch mode in which a frontier is updated and returned to the caller.
    It also supports Basic mode algorithms which may modify a Graph object by adding cached properties.

    \item Inputs are passed by value and should never be changed by the algorithm.
\end{itemize}

The final argument of any LAGraph algorithm holds the error message. This must be \verb'char[]' of
size \verb'LAGRAPH_MSG_LEN'. When the algorithm returns an error or a warning, a message may be placed in
this array as additional information. Because the caller creates this array, the caller must free
the memory or reuse it as appropriate. If the algorithm is successful, it should fill the message array
with an empty string to clear any previous message.


\subsection{Error handling}

Because every algorithm in LAGraph can return an error, the return value of every call should be
checked before proceeding. To make this less burdensome for a C-based library, LAGraph provides a
convenience macro which works similar to try/catch in other languages.

\begin{cplus}
#define LAGraph_TRY(LAGraph_method)
{
    int LAGraph_status = LAGraph_method;
    if (LAGraph_status < 0)
    {
        LAGraph_CATCH (LAGraph_status);
    }
}
\end{cplus}


\verb'LAGraph_CATCH' can be defined before an algorithm and will be called in the event of an error.
This allows for proper freeing of memory and other necessary tasks.

A similar macro, \verb'GrB_TRY', will call \verb'GrB_CATCH' when making GraphBLAS calls which return
a \verb'GrB_Info' value other than \verb'GrB_SUCCESS' or \verb'GrB_NO_VALUE'.

\verb'LAGraph_TRY' and \verb'GrB_TRY' provide an easy to use and easy to read method for dealing with
error checking while writing graph algorithms.


\subsection{Contributing algorithms}

The LAGraph project welcomes contributions from graph practitioners who understand the GraphBLAS vision
of using the language of linear algebra to express graph computations. However, as a matter of
practical concern, many users want a stable experience when using LAGraph for doing real work. To balance
these, the LAGraph repository will have both a stable and an experimental folder.

New algorithms or modifications of existing algorithms will first be added to the experimental folder.
The release schedule of experimental algorithms will generally be much faster than the stable release,
and there is no expectation of a bug-free experience.
The goal is to generate lots of ideas and allow uninhibited contributions to
push the boundary of what is possible with the GraphBLAS. The stable release will be fully tested and
will move much slower, targeting the needs of those who want to use LAGraph as a complete, production-grade
library rather than as a research project.