Skip to content

Rework Core ArcTable towards better memory efficiency and performance #537

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 30 commits into from
Jul 21, 2025

Conversation

HLWeil
Copy link
Member

@HLWeil HLWeil commented Jul 9, 2025

These changes were necessary for much better memory efficiency, allowing Swate to depict and edit much bigger tables.

In principle this was done in two parts:

  • The ArcTable now contains a "ValueMap", which maps from the cell Hash to the actual CompositeCell. Were previously the CompositeCell was stored we now store this Hash, reducing memory usage for Cells appearing repreadetly in the same table
  • Instead of a 2D canvas, we now store the Table as columns, where a column may also be represented by a constant value. For parameters, it is very often the case that we have a column with only one distinct value. In this case we just need to store it once.

Calling functions were adjusted accordingly

@HLWeil HLWeil requested a review from Freymaurer July 9, 2025 14:13
@@ -4,7 +4,7 @@ open ARCtrl.Helper
open Fable.Core

[<AttachMembers>]
type OntologyAnnotation(?name,?tsr,?tan, ?comments) =
type OntologyAnnotation(?name,?tsr,?tan, ?tanInfo, ?comments) =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uhh thats a new one! What does it mean?

@@ -19,17 +19,30 @@ type OntologyAnnotation(?name,?tsr,?tan, ?comments) =
| tan -> tan
let mutable _comments : ResizeArray<Comment> = defaultArg comments <| ResizeArray()

let mutable _tanInfo : {| IDSpace : string; LocalID : string |} option =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh i see what it is. But isn't this bad practise? Say we add a tsr and tan then tanInfo gets auto completed?

Why do we need this inside the constructor instead of a GET only member?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not my preferred solution either, but the TanInfo is relatively expensive to compute but quite important as a basis for the HashCode.
By allowing this info to be set from the outside we can omit multiple computations of this value in some cases.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then there must be clear indication what takes priority over which and how it behaves. say i use the following:

OntologyAnnotation(name = "test",tsr = "TEST",tan = "00001", tanInfo = {|IDSpace = "TST"; LocalID = "22222"|})

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By allowing this info to be set from the outside we can omit multiple computations of this value in some cases.

And can you give me an example for that, because currently i do not really see it

/// column * row index
with get() = _headers
and set(newHeaders) =
// SanityChecks.validate newHeaders values true |> ignore // TODO
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this meant to stay "TODO"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, fixing

@@ -372,42 +391,49 @@ type ArcTable(name: string, headers: ResizeArray<CompositeHeader>, values: Syste

// - Column API - //
// GetColumnAt?
member this.GetColumn(columnIndex:int) : CompositeColumn =
member this.GetColumn(columnIndex:int, ?fillDefault : bool) : CompositeColumn =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fillDefault behaves different to my expectations. Did you discuss this with someone else in the department and decided that we fail if a cell does not exist inside row/column count? I thought we only fail for trying to access outside these params?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, with the fillDefault flag every user can decide. I could add some comment to make it more clear. Or would you wish to reverse the default behavious, and have a failIfMissing flag instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would completly remove this flag. Just return the filled columns, then check if the array contains empty cells if you want.

In my opinion it makes using the function more obscure.

Do you have a valid usecase for this?

And one more thing. F# 9 does increase f# support for null. This could allow us to fill missing cells with null, making its behavior closer to js and python while also increase computation time

| Some c -> c
|]
let cells =
[|
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be optimized by first checking if columnIndex points to a Constant columnRef, then we can just fill with the same value

let index = defaultArg index this.RowCount
let cells =
let cells =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again this behaves different then my expecatation. Did we not decide on a sparse approach?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will wait for your answers before continuing the review.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True that, this should just increase the rowCount of the "Values"

@@ -19,17 +19,30 @@ type OntologyAnnotation(?name,?tsr,?tan, ?comments) =
| tan -> tan
let mutable _comments : ResizeArray<Comment> = defaultArg comments <| ResizeArray()

let mutable _tanInfo : {| IDSpace : string; LocalID : string |} option =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then there must be clear indication what takes priority over which and how it behaves. say i use the following:

OntologyAnnotation(name = "test",tsr = "TEST",tan = "00001", tanInfo = {|IDSpace = "TST"; LocalID = "22222"|})

@@ -19,17 +19,30 @@ type OntologyAnnotation(?name,?tsr,?tan, ?comments) =
| tan -> tan
let mutable _comments : ResizeArray<Comment> = defaultArg comments <| ResizeArray()

let mutable _tanInfo : {| IDSpace : string; LocalID : string |} option =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By allowing this info to be set from the outside we can omit multiple computations of this value in some cases.

And can you give me an example for that, because currently i do not really see it

/// column * row index
with get() = _headers
and set(newHeaders) =
SanityChecks.validateArcTableValues newHeaders _values true |> ignore // TODO
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the //TODO

member this.RowCount
with get() = _values.RowCount
and set (newRowCount) =
_values.RowCount <- newRowCount
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we set a rowCount we must delete all cells below that rowcount?

Say we have a cell at row 20 and we set row <- 10, then we must delete cell at row 20?

and set (newRowCount) =
_values.RowCount <- newRowCount

static member getRowCount (table:ArcTable) = table.RowCount
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

member this.ColumnCount and static member columnCount (table:ArcTable), then why do we have member this.RowCount and static member getRowCount (table:ArcTable)

| None -> CompositeCell.emptyTerm
| Some (CompositeCell.Unitized _) -> CompositeCell.emptyUnitized
| _ -> failwith "[extendBodyCells] This should never happen, IsTermColumn header must be paired with either term or unitized cell."
let removeRowCells_withIndexChange (rowIndex:int) (values:ArcTableValues) =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be good to have a function like this for multiple, this would increase performance for "RemoveRows" by quite a bit

/// Therefore we need to start with the last column to not overwrite any values we still need to shift
let increaseColumnIndices() =
/// Get last column index
let lastColumnIndex = System.Math.Max(startColCount - 1, 0) // If there are no columns. We get negative last column index. In this case just return 0.
// start with last column index and go down to `index`
for columnIndex = lastColumnIndex downto index do
for rowIndex in 0 .. startRowCount do
moveCellTo(columnIndex,rowIndex,columnIndex+numberOfNewColumns,rowIndex) values
moveColumnCellsTo (columnIndex) (columnIndex + numberOfNewColumns) values
/// Then we can set the new column at `index`
let setNewCells() =
// Not sure if this is intended? If we for example `forceReplace` a single column table with `Input`and 5 rows with a new column of `Input` ..
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this comment is outdated to the previous model

setCellAt (columnIndex,rowIndex,lastValue) values

let addRow (index:int) (newCells:CompositeCell []) (headers: ResizeArray<CompositeHeader>) (values:Dictionary<int*int,CompositeCell>) =
let fillMissingCells (headers: ResizeArray<CompositeHeader>) (values:ArcTableValues) =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be deleted? Where do we use this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still kept the function, it will fill in default cells like before. Might be interesting in cases where we expect to make many random access calls and want to evaluate default values beforehand. Not used by any of our own functions.

|> Seq.toArray
|> Array.collect ArcTable.SplitByProtocolREF
|> Array.map (fun t ->
|> ResizeArray
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rmv? ResizeArray.collect should work on a seq and returns an resizearray?



let dictionary (keyDecoder : Decoder<'key>) (valueDecoder : Decoder<'value>) : Decoder<Dictionary<'key,'value>> =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have no idea what happens here, trust!

@HLWeil HLWeil merged commit 3dcc574 into main Jul 21, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants