Problems on multipleFieldSelection.py #163

Open
Dongmeng-wang opened this issue May 30, 2023 · 0 comments

@Dongmeng-wang

Hi, thank you so much for sharing this helpful script for merging expression files of different samples. However, I encountered some problems when using it.

Is it possible to use more than one common field as the identifier? For instance, I have, for each sample, the counts of reads mapping to different junctions. The columns are chrom, start, end and counts. I'd like to merge the files of all samples, which requires using the first three columns together as the identifier. Is this possible with this script?
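One way to use several columns as the identifier is to key each row on a tuple of those columns. The sketch below is not part of multipleFieldSelection.py; `read_counts` is a hypothetical helper that assumes tab-separated input with a single header line and the count in the last column.

```python
import csv
from typing import Dict, Iterable, Tuple


def read_counts(
    lines: Iterable[str], n_key_cols: int = 3, skip_header: bool = True
) -> Dict[Tuple[str, ...], str]:
    """Map a composite key (the first n_key_cols columns) to the last column.

    `lines` can be an open file or any iterable of tab-separated text lines.
    """
    reader = csv.reader(lines, delimiter="\t")
    if skip_header:
        next(reader, None)  # assumed single header line, e.g. chrom/start/end/counts
    counts: Dict[Tuple[str, ...], str] = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        key = tuple(row[:n_key_cols])  # (chrom, start, end) by default
        counts[key] = row[-1]  # count kept as text; a repeated key keeps the last value
    return counts
```

For a file on disk, `with open("TWPID9206_20170313.txt", newline="") as fh: counts = read_counts(fh)` would build one such table per sample, assuming the attached files have a header line.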

Furthermore, I always get the error below when merging files with different identifiers. For example, different samples have different junctions; I would like to keep all the junctions and set the count to 0 for samples that lack a given junction.
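Keeping every junction and filling in 0 is an outer join on the composite key. A minimal sketch, assuming each sample has already been loaded into a dict that maps a `(chrom, start, end)` tuple to its count as a string (`outer_merge` is a hypothetical name, not a function of the script):

```python
from typing import Dict, List, Sequence, Tuple

Key = Tuple[str, ...]


def outer_merge(
    samples: Dict[str, Dict[Key, str]],
    key_names: Sequence[str] = ("chrom", "start", "end"),
    fill: str = "0",
) -> List[List[str]]:
    """Outer-join per-sample count tables on their composite key.

    Returns a header row followed by one row per key seen in any sample;
    samples that lack a key get `fill` instead of a count.
    """
    all_keys: Dict[Key, None] = {}
    for table in samples.values():
        all_keys.update(dict.fromkeys(table))  # union of keys, first-seen order
    names = list(samples)
    rows = [[*key_names, *names]]
    for key in all_keys:
        rows.append([*key, *(samples[name].get(key, fill) for name in names)])
    return rows
```

Because the fill value is the string `"0"` and the counts stay strings, every returned row can be passed directly to `"\t".join()`.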

INFO: Writing output to merge.1.txt
Traceback (most recent call last):
  File "/scratch/prj/dtr/MultiMuTHER/scripts/txrevise/multipleFieldSelection.py", line 125, in
    f.write("\t".join(line) + "\n")
TypeError: sequence item 2: expected str instance, int found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch/prj/dtr/MultiMuTHER/scripts/txrevise/multipleFieldSelection.py", line 130, in
    print("ERROR: %s" % err)
NameError: name 'err' is not defined
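Both errors can likely be read off the traceback without seeing the full script. `str.join()` only accepts strings, so item 2 of `line` being an int suggests a numeric value, plausibly a 0 written for a missing identifier, was put into the row without conversion. The second error presumably comes from an `except` clause that does not bind the exception (`except ... as err`). A sketch of both fixes, using hypothetical helper names:

```python
from typing import Iterable, Sequence, TextIO


def format_row(fields: Sequence[object]) -> str:
    """Join fields into one tab-separated line.

    str.join() raises TypeError for non-strings (e.g. an int 0 used as a fill
    value), so every field is converted with str() first.
    """
    return "\t".join(map(str, fields)) + "\n"


def write_rows(rows: Iterable[Sequence[object]], out: TextIO) -> None:
    """Write rows to `out`, reporting and re-raising any I/O error."""
    try:
        for row in rows:
            out.write(format_row(row))
    except OSError as err:  # "as err" binds the name the handler uses
        print("ERROR: %s" % err)
        raise
```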

I have attached some files here for testing.
TWPID9206_20170313.txt
TWPID9206_20110217.txt
TWPID9206_20140812.txt

By the way, I have tried the csvtk join command and merge() in R, but both take too long with ~1000 samples. I would really appreciate it if this script could handle this faster. Alternatively, would you recommend any other tools for this problem? Thank you so much.
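For ~1000 samples, one option that avoids holding the whole table in memory is a streaming k-way merge with Python's `heapq.merge`. This is a sketch, not the script's method, and it assumes every file has a header line, exactly the columns chrom, start, end and counts, and is already sorted by chrom (plain string order), then by start and end numerically; `kway_merge` is a hypothetical name.

```python
import csv
import heapq
from typing import Iterable, Iterator, List, Sequence, Tuple

Key = Tuple[str, int, int]  # (chrom, start, end)


def _records(
    lines: Iterable[str], idx: int, skip_header: bool
) -> Iterator[Tuple[Key, int, str]]:
    """Yield (key, sample index, count) for one sample, in file order."""
    reader = csv.reader(lines, delimiter="\t")
    if skip_header:
        next(reader, None)
    for row in reader:
        if row:
            chrom, start, end, count = row[:4]
            yield (chrom, int(start), int(end)), idx, count


def _format(key: Key, counts: List[str]) -> str:
    chrom, start, end = key
    return "\t".join([chrom, str(start), str(end), *counts]) + "\n"


def kway_merge(
    sources: Sequence[Iterable[str]],
    names: Sequence[str],
    skip_header: bool = True,
    fill: str = "0",
) -> Iterator[str]:
    """Stream an outer join of sorted per-sample tables as output lines.

    Assumes each source is sorted by chrom (plain string order), then by start
    and end numerically. Only one pending record per sample is held in the heap.
    """
    yield "\t".join(["chrom", "start", "end", *names]) + "\n"
    streams = [_records(src, i, skip_header) for i, src in enumerate(sources)]
    current = None
    counts: List[str] = []
    for key, idx, count in heapq.merge(*streams, key=lambda rec: rec[0]):
        if key != current:
            if current is not None:
                yield _format(current, counts)  # all samples for this key seen
            current = key
            counts = [fill] * len(names)  # default for samples lacking the key
        counts[idx] = count
    if current is not None:
        yield _format(current, counts)
```

To run it on files, the handles can be opened inside a `contextlib.ExitStack` and the lines written with `out.writelines(kway_merge(handles, names))`. Keeping ~1000 files open at once may exceed the default per-process open-file limit on some systems (`ulimit -n`), so that limit may need raising.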

All the best,
Meng
