SeabertYuan

Overview

Adds the following to CooMatrix in the nalgebra-sparse crate:

  • filter
  • remove_row
  • remove_column
  • remove_row_column

Details

Since the regular DMatrix supports the following operations:

  • remove_row
  • remove_column

I tried to keep the API consistent and added these functions to CooMatrix in the nalgebra-sparse crate. Because deleting elements from the middle of a Vec is costly (every later element has to be shifted), this implementation instead allocates a new CooMatrix built from the old one. This has two trade-offs:

  1. T must implement Copy, since values are copied from the old matrix into the new one.
  2. Removing rows and columns is potentially costly in both memory and time.

To partly offset (2), I added a remove_row_column function for the case where both a row and a column need to be removed from a sparse matrix; it avoids allocating an intermediate matrix.
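
To make the allocate-and-copy approach concrete, here is a minimal Rust sketch. Since this PR was not merged, these methods are not part of nalgebra-sparse; the Coo struct is a hypothetical stand-in for CooMatrix (plain row, column and value vectors), and only the method names follow the PR description. Each removal makes a single pass over the triplets, skips entries in the removed row or column, and shifts later indices down by one, so remove_row_column does the whole job without an intermediate matrix. The T: Copy bound mirrors trade-off (1).

```rust
// Hypothetical stand-in for nalgebra-sparse's CooMatrix: parallel vectors of
// row indices, column indices and values (triplets), plus the dimensions.
#[derive(Debug, Clone, PartialEq)]
struct Coo<T> {
    nrows: usize,
    ncols: usize,
    rows: Vec<usize>,
    cols: Vec<usize>,
    vals: Vec<T>,
}

impl<T: Copy> Coo<T> {
    fn new(nrows: usize, ncols: usize) -> Self {
        Coo { nrows, ncols, rows: Vec::new(), cols: Vec::new(), vals: Vec::new() }
    }

    fn push(&mut self, i: usize, j: usize, v: T) {
        assert!(i < self.nrows && j < self.ncols, "index out of bounds");
        self.rows.push(i);
        self.cols.push(j);
        self.vals.push(v);
    }

    // Hypothetical shared helper: builds a new matrix in one pass, dropping
    // triplets in the removed row and/or column and shifting later indices down.
    fn without(&self, row: Option<usize>, col: Option<usize>) -> Self {
        if let Some(r) = row {
            assert!(r < self.nrows, "row out of bounds");
        }
        if let Some(c) = col {
            assert!(c < self.ncols, "column out of bounds");
        }
        let mut out = Coo::new(
            self.nrows - row.is_some() as usize,
            self.ncols - col.is_some() as usize,
        );
        for k in 0..self.vals.len() {
            let (i, j) = (self.rows[k], self.cols[k]);
            if Some(i) == row || Some(j) == col {
                continue; // entry lies in a removed row or column
            }
            let ni = match row { Some(r) if i > r => i - 1, _ => i };
            let nj = match col { Some(c) if j > c => j - 1, _ => j };
            out.push(ni, nj, self.vals[k]); // T: Copy lets us copy the value out
        }
        out
    }

    fn remove_row(&self, r: usize) -> Self {
        self.without(Some(r), None)
    }

    fn remove_column(&self, c: usize) -> Self {
        self.without(None, Some(c))
    }

    // Removes a row and a column in the same pass: no intermediate matrix.
    fn remove_row_column(&self, r: usize, c: usize) -> Self {
        self.without(Some(r), Some(c))
    }
}

fn main() {
    let mut m = Coo::new(3, 3);
    m.push(0, 0, 1.0);
    m.push(1, 1, 2.0);
    m.push(2, 2, 3.0);
    m.push(0, 2, 4.0);
    println!("{:?}", m.remove_row(0));
    println!("{:?}", m.remove_column(2));
    println!("{:?}", m.remove_row_column(1, 1));
}
```

In this example, removing row 1 and column 1 from the 3x3 matrix leaves a 2x2 matrix in which the old (2, 2) entry becomes (1, 1) and the old (0, 2) entry becomes (0, 1).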

Differences from filter

Additionally, since filter exists on CscMatrix and CsrMatrix, I saw no reason it shouldn't exist on CooMatrix as well. It can be useful for "zeroing out" parts of a sparse matrix: because CooMatrix sums duplicate entries by default, pushing a zero for an existing entry does not clear it, whereas filtering drops the entry entirely.

Like the existing filter method on CscMatrix, the new filter method on CooMatrix does not change the matrix's dimensions; it only drops entries. The removal functions are built on filter and, unlike it, do change the shape of the matrix.
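
The shape-preserving filter and the "zero-out" use case can be illustrated with the same kind of hypothetical stand-in (again, not nalgebra-sparse's actual CooMatrix; the filter signature here is an assumption). The value_at helper is also hypothetical: it sums duplicate triplets to mimic how duplicate entries are combined. Filtering with a predicate drops every triplet at (0, 0), whereas pushing 0.0 leaves the summed value unchanged.

```rust
// Hypothetical stand-in for nalgebra-sparse's CooMatrix (triplet storage).
#[derive(Debug, Clone, PartialEq)]
struct Coo<T> {
    nrows: usize,
    ncols: usize,
    rows: Vec<usize>,
    cols: Vec<usize>,
    vals: Vec<T>,
}

impl<T: Copy> Coo<T> {
    fn new(nrows: usize, ncols: usize) -> Self {
        Coo { nrows, ncols, rows: Vec::new(), cols: Vec::new(), vals: Vec::new() }
    }

    fn push(&mut self, i: usize, j: usize, v: T) {
        assert!(i < self.nrows && j < self.ncols, "index out of bounds");
        self.rows.push(i);
        self.cols.push(j);
        self.vals.push(v);
    }

    // Hypothetical `filter`: keeps the triplets for which `keep(row, col, &value)`
    // returns true. The dimensions are unchanged; only entries are dropped.
    fn filter<F: Fn(usize, usize, &T) -> bool>(&self, keep: F) -> Self {
        let mut out = Coo::new(self.nrows, self.ncols);
        for k in 0..self.vals.len() {
            let (i, j, v) = (self.rows[k], self.cols[k], self.vals[k]);
            if keep(i, j, &v) {
                out.push(i, j, v);
            }
        }
        out
    }
}

// Hypothetical helper: the effective value at (i, j) is the sum of all
// duplicate triplets stored there.
fn value_at(m: &Coo<f64>, i: usize, j: usize) -> f64 {
    (0..m.vals.len())
        .filter(|&k| m.rows[k] == i && m.cols[k] == j)
        .map(|k| m.vals[k])
        .sum()
}

fn main() {
    let mut m = Coo::new(2, 2);
    m.push(0, 0, 1.0);
    m.push(0, 0, 2.0); // duplicate triplet: summed with the one above
    m.push(1, 1, 5.0);
    m.push(0, 0, 0.0); // pushing zero does not clear (0, 0); the sum stays 3.0
    // Filtering drops every triplet at (0, 0), so its effective value becomes 0.
    let zeroed = m.filter(|i, j, _| !(i == 0 && j == 0));
    println!("before: {}, after: {}", value_at(&m, 0, 0), value_at(&zeroed, 0, 0));
    println!("shape kept: {}x{}", zeroed.nrows, zeroed.ncols);
}
```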

Notes on testing

I'm not sure of the exact testing standards, but I added some tests modeled on existing ones. I'm also using a fork of nalgebra-sparse in a project; that served as a "real-world" test, and I used it to refine the tests I've written.

@Andlon
Collaborator

Andlon commented Oct 10, 2025

Hi @SeabertYuan, I appreciate the effort on this PR, but I must also admit I'm a little skeptical if these additions carry their own weight.

Could you perhaps say something about possible use cases for these methods? Given that a COO matrix is primarily intended as a way to accumulate matrix entries before conversion to either CSR or CSC, I can't really think of a use case where you wouldn't just apply these filtering/removal operations directly while building the COO matrix in the first place.

@SeabertYuan
Author

Absolutely, it's mostly a matter of convenience. In my particular use case, I need to read a lot of data, and which rows/columns need to be removed only becomes known while reading it. It's convenient to store the data in memory as a CooMatrix while reading and then, once it's determined which rows/columns need to be removed, perform that removal afterwards.

Additionally, after the data is read, the user might choose to remove rows and columns multiple times. For example, in my case I am building a sparse matrix to run a simulation, and the user can say "I want to run the simulation again with x, y, and z removed." I could reconstruct the matrix every time, but these functions make supporting those use cases more convenient.

I can definitely see how I could just filter and remove before building the COO matrix, but by that reasoning, I would argue that even the existing remove_row and remove_column functions on DMatrix could be replaced by filtering/removal before the matrix is created. 😄

@Andlon
Collaborator

Andlon commented Oct 12, 2025

I see how these functions are useful to you. I'm just concerned that, well, they might not be useful to anyone else, as it seems to be a very niche use case to me.

My general opinion is that COO is primarily an "in-between" format, used to keep unstructured matrix data until it can be turned into a more suitable format (CSR/CSC at present). Therefore I also don't think we want to expand the API with functions to manipulate the data directly in COO form unless there is some substantial demand for this functionality.

I'm therefore inclined not to merge this PR at this time. If there's more demand for this kind of functionality in the future, I'd be happy to revisit this decision.

@SeabertYuan
Author

Sounds good, thanks for your time and input!
