@Smeds
Contributor

@Smeds Smeds commented Oct 23, 2025

This pull request will:

  1. Introduce a pyproject.toml, needed because of the deprecation of setup.py on 2025-10-31
  2. Keep a small setup.py file to handle compilation of the C libraries

Should solve issues: #14, #10

@Smeds
Contributor Author

Smeds commented Oct 23, 2025

@blex-max new pull request using pyproject.toml

@Smeds
Contributor Author

Smeds commented Oct 28, 2025

@blex-max what do you think about this new setup?

@blex-max
Contributor

blex-max commented Oct 28, 2025

Hello, apologies, we're really busy at the moment, but I appreciate that you want to get moving on this. Which systems have you tested this approach on? I can confirm this builds and compiles on my arm64 mac, but I haven't tested usage at all. Though in principle the changes look fairly trivial and safe

@blex-max
Contributor

N.B. If we merge this I think in principle this constitutes a major version bump as old install methods will no longer work

@Smeds
Contributor Author

Smeds commented Oct 28, 2025

> Hello, apologies, we're really busy at the moment, but I appreciate that you want to get moving on this. Which systems have you tested this approach on? I can confirm this builds and compiles on my arm64 mac, but I haven't tested usage at all. Though in principle the changes look fairly trivial and safe

I have only tested it on my system where it builds:

$ python3 --version
Python 3.12.11 | packaged by conda-forge | (main, Jun  4 2025, 14:45:31) [GCC 13.3.0] on linux
$ pip --version
pip 25.0.1
$ arch
x86_64
$ uname -a
Linux E1-056347.science.psu.edu 6.14.0-33-generic #33~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Sep 19 17:02:30 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
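
For reporting a test environment in a reproducible way, the same kind of details can also be gathered from inside Python. This is a minimal sketch using only the standard library; the function name and the choice of fields are assumptions for illustration.

```python
import platform
import sys


def describe_environment() -> dict:
    """Collect basic interpreter and platform details for a test report."""
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "release": platform.release(),
        "machine": platform.machine(),
        "executable": sys.executable,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```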

@blex-max
Contributor

Are you able to test that this build runs as you expect on data? I expect that it will, but since there is no CI or tests available in this repo it's likely worth checking

@Smeds
Contributor Author

Smeds commented Oct 30, 2025

I have tested parabam with data now, using the parabam subset command. That works for me

parabam subset --input ~/Downloads/test.bam --rule get_mapped_reads.py 

I also tried to run telomerecat, where parabam is used. That also works

telomerecat bam2length test.bam
telomerecat bam2telbam test.bam

What I haven't tested is parabam stat. It needs a get_blueprint function, and I don't really know what that function should do (I haven't been able to find any documentation about it).

@blex-max

@Smeds Smeds force-pushed the convert-setup-to-pyproject.toml branch from c3554d5 to 58562da Compare October 30, 2025 15:53
@Smeds
Contributor Author

Smeds commented Oct 30, 2025

@blex-max I added a GitHub Actions workflow to test parabam subset with a few different versions of Python. It works on my fork (https://github.com/Smeds/parabam/actions/runs/18946923615); I think you may need to approve GitHub Actions runs before this test is run in this repo.

I don't have a test for the parabam stat command since I don't know how to set up the get_blueprint function.
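
A smoke test of the kind described here usually comes down to running the command and checking its exit status. The sketch below shows that idea with the standard subprocess module; the helper name and the timeout value are assumptions, and a stand-in command is used so the example runs without parabam installed.

```python
import subprocess
import sys


def run_smoke_test(command: list[str], timeout: float = 600.0) -> bool:
    """Run a command and report whether it exited with status 0."""
    result = subprocess.run(
        command,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    if result.returncode != 0:
        # Surface stderr so a failing CI job shows why the command failed.
        print(result.stderr, file=sys.stderr)
    return result.returncode == 0


# Stand-in command so the sketch runs anywhere; in CI this would be the tool
# invocation under test (for example the `parabam subset` call quoted earlier).
ok = run_smoke_test([sys.executable, "-c", "print('hello')"])
print("smoke test passed" if ok else "smoke test failed")
```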

…ADERS]] [-v {0,1,2}]

            [--temp_dir DIR] --rule RULE --input INPUT [INPUT ...] [--debug]
            [--pair] [--coord] [-r [REGION]] [-d] [--output [OUTPUT]]

optional arguments:
  -h, --help            show this help message and exit
  -p [P]                The maximum amount of processes you wish
                        parabam stat to use. This should be less
                        than or equal to the amount of processor
                        cores in your machine [Default: 4].
  -s [S]                The amount of reads considered by each
                        distributed task. [Default: 250000]
  -f [READERS]          The amount of open connections to the file being read.
                        Conventional hard drives perform best with
                        the default of 1. [Default: 1]
  -v {0,1,2}            The amount of information output by the program:
                        	0: No output [Default]
                        	1: Total Reads Processed
                        	2: Detailed output
  --temp_dir DIR        Path for parabam stat to use for intermediate files [Default: /tmp].
  --rule RULE, -i RULE  The file containing the rule, written in python,
                        that we wish to apply to the input BAM.
  --input INPUT [INPUT ...], -b INPUT [INPUT ...]
                        The file(s) we wish to operate on.
                         Multiple entries should be separated by a single space
  --debug               Only the first 5million reads will be processed
  --pair                A pair processor is used instead of a conventional processor
  --coord               Engines recieve a list of reads which all map to the same starting position.
                         This mode is not compatible with the `--pair` option
  -r [REGION], --region [REGION]
                        The process will be run only on reads from
                        this region. Regions should be colon separated as
                        specified by samtools (eg 'chr1:1000,5000')
  -d                    parabam will not process reads marked duplicate.
  --output [OUTPUT], -o [OUTPUT]
                        Specify a name for the output CSV file. If this argument is
                        not supplied, the output will take the following form:
                        parabam_stat_[UNIX_TIME].csv

[Status] parabam is quitting gracefully
@Smeds
Contributor Author

Smeds commented Oct 31, 2025

@blex-max I have added tests for parabam stat that make sure that the command can be run and compare the result with a previous run.
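
Comparing a fresh result with a previous run can be sketched as a row-by-row comparison of two CSV files. This is only an illustration of the idea, not the test added in this pull request; the function names, file names, and rows are made up.

```python
import csv
import tempfile
from pathlib import Path


def read_rows(path: Path) -> list[list[str]]:
    """Read a CSV file into a list of rows."""
    with path.open(newline="") as handle:
        return list(csv.reader(handle))


def matches_reference(output: Path, reference: Path) -> bool:
    """Return True when both CSV files contain the same rows in the same order."""
    return read_rows(output) == read_rows(reference)


# Self-contained demo with made-up rows written to temporary files.
with tempfile.TemporaryDirectory() as tmp:
    reference = Path(tmp) / "reference.csv"
    output = Path(tmp) / "output.csv"
    reference.write_text("name,count\nexample,3\n")
    output.write_text("name,count\nexample,3\n")
    print(matches_reference(output, reference))
```

Because the help text above says the default output name contains a UNIX timestamp, a test like this would probably pass --output so the file name is predictable.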

@Smeds
Contributor Author

Smeds commented Nov 3, 2025

@blex-max Ran the parabam stat command successfully on my laptop!

@Smeds
Contributor Author

Smeds commented Nov 5, 2025

@blex-max should we bump up the version of the software in this pull-request or in a separate one?

@Smeds
Contributor Author

Smeds commented Nov 10, 2025

@blex-max would it be possible to take one more look at this pull request soon? It now has GitHub Actions workflows that test-run the tool.

@Smeds
Contributor Author

Smeds commented Nov 13, 2025

@blex-max bump

@Smeds
Contributor Author

Smeds commented Nov 17, 2025

It would be super nice if we could have this pull-request handled soon!

@Smeds
Contributor Author

Smeds commented Nov 18, 2025

@blex-max once the pull request is handled, it would be awesome to have a new release soon!

@blex-max
Contributor

Hello, I can see this is important to you but the reality is that parabam has been explicitly and intentionally unsupported for some time. Were these entirely minor changes perhaps a quick merge would be in order, but at this point they're quite extensive and we need to properly review them before we commit to a new release of software that we're responsible for (as we should). It is on our TODO list and I haven't forgotten but I and the rest of the team are extremely busy at the moment, and, since parabam is not supported, it is quite low on the priority list. This is outside of my control. I'll make sure to update you when I can, but right now you will have to be patient for a little while longer.

@Smeds
Contributor Author

Smeds commented Nov 19, 2025

Good to know! Does that mean that https://github.com/cancerit/telomerecat is also unsupported, since it depends on parabam? My plan was to incorporate telomerecat into the Galaxy Project; I got a request to do so from a user who currently uses telomerecat. My urgency is mostly because I will be ending my position here at the end of the year, and after that I will probably not be doing much more work on this issue.

@blex-max
Contributor

That is a good question that I can't immediately answer, but I will raise internally. I'm not aware that telomerecat is unsupported, certainly that was never indicated on the repo (unlike parabam), but it hasn't been looked at in a while. Parabam being unsupported doesn't necessarily imply that telomerecat is unsupported, but clearly we need to have a think about that. @AndyMenzies
