ValueError: You are trying to merge on object and float64 columns. #22

Open
rpetit3 opened this issue Jun 30, 2022 · 3 comments

Comments

@rpetit3

rpetit3 commented Jun 30, 2022

I'm trying to run PlasmidID via the Bioconda release and am running into an issue with pandas. Might be user error, though!

CREATING SUMMARY REPORT (Thu Jun 30 01:24:20 UTC 2022)
 An html report with miniatures of the images will be generate with useful statistics to determine the correct plasmids in the sample.
Namespace(group=False, input_folder='/home/robert_petit/temp/test/plasmid/NO_GROUP/SRX4563634')
Creating summary
You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat
Traceback (most recent call last):
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/bin/summary_report_pid.py", line 465, in <module>
    main()
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/bin/summary_report_pid.py", line 457, in main
    summary_df = complete_report_df(complete_file, len_description_df, percentage_df)
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/bin/summary_report_pid.py", line 116, in complete_report_df
    df = len_description_df.merge(covered_df, on='id', how='left')
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/lib/python3.7/site-packages/pandas/core/frame.py", line 9203, in merge
    validate=validate,
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/lib/python3.7/site-packages/pandas/core/reshape/merge.py", line 119, in merge
    validate=validate,
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/lib/python3.7/site-packages/pandas/core/reshape/merge.py", line 703, in __init__
    self._maybe_coerce_merge_keys()
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/lib/python3.7/site-packages/pandas/core/reshape/merge.py", line 1256, in _maybe_coerce_merge_keys
    raise ValueError(msg)
ValueError: You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat

---------------------------------------

ERROR in Script plasmidID on or near line 1089; exiting with status 1
MESSAGE:

See /home/robert_petit/temp/test/plasmid/logs/plasmidID.log for more information.
command:
summary_report_pid.py -i /home/robert_petit/temp/test/plasmid/NO_GROUP/SRX4563634 -g

---------------------------------------

Command Used

plasmidID -d plasmidFinder_01_26_2018.fsa -s SRX4563634 -c SRX4563634.fna -T 4

Here are the files used (added .txt so GitHub would allow upload)
plasmidFinder_01_26_2018.fsa.txt
SRX4563634.fna.txt

Update 1

Doing some digging, covered_df might be the issue. It looks like this:

print(covered_df)
            id  len_covered
0  500039.4128         2363

print(covered_df.dtypes)
id             float64
len_covered      int64
dtype: object

Going to play around with this some more
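
A minimal sketch of one way the mismatch above can arise and be avoided, assuming covered_df is loaded from a delimited text file (the issue does not show how it is built). The two-column layout, the `length` value, and the variable names other than `covered_df` and `len_description_df` are made up for illustration. A numeric-looking ID such as 500039.4128 is inferred as float64 unless the column is read as text, and a left frame holding string IDs then cannot be merged with it.

```python
import io

import pandas as pd

# Hypothetical tab-separated input; the real file layout is an assumption.
raw = "500039.4128\t2363\n"
names = ["id", "len_covered"]

# Default type inference parses the numeric-looking ID as float64.
inferred = pd.read_csv(io.StringIO(raw), sep="\t", header=None, names=names)

# Forcing the ID column to str keeps it as text, exactly as written in the file.
covered_df = pd.read_csv(
    io.StringIO(raw), sep="\t", header=None, names=names, dtype={"id": str}
)

# Stand-in for len_description_df with string IDs (the length value is made up).
len_description_df = pd.DataFrame({"id": ["500039.4128"], "length": [10000]})

# With text IDs on both sides, the left merge from the traceback goes through.
merged = len_description_df.merge(covered_df, on="id", how="left")
print(merged)
```

Reading the ID as text at load time, rather than casting the float back with `astype(str)` afterwards, also avoids losing trailing zeros in IDs such as `1234.50`.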

Update 2

Converted the ID to a string, and now I get this:

Columns must be same length as key
Traceback (most recent call last):
  File "./summary_report_pid.py", line 470, in <module>
    main()
  File "./summary_report_pid.py", line 462, in main
    summary_df = complete_report_df(complete_file, len_description_df, percentage_df)
  File "./summary_report_pid.py", line 126, in complete_report_df
    df['contig_name'] = df.apply(lambda x: set_to_list(x), axis=1)
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/lib/python3.7/site-packages/pandas/core/frame.py", line 3602, in __setitem__
    self._set_item_frame_value(key, value)
  File "/home/robert_petit/miniconda3/envs/test-plasmidid/lib/python3.7/site-packages/pandas/core/frame.py", line 3729, in _set_item_frame_value
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

Update 3

Looks like the DataFrame is empty:

print(df)
Empty DataFrame
Columns: [id, length, species, description, fraction_covered, contig_name]
Index: []

    # ... excerpt from complete_report_df() ...
    del df['len_covered']
    df = df.merge(contigs_df, on='id', how='left')
    df = df.dropna()
    print(df)
    df['contig_name'] = df.apply(lambda x: set_to_list(x), axis=1)
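
The `_set_item_frame_value` frame in the Update 2 traceback suggests that, on this pandas version, `df.apply(..., axis=1)` handed back a DataFrame rather than a Series once `dropna()` had removed every row, so the result could not be assigned to the single `contig_name` column. The following is a minimal sketch of a guard for that case, not the project's actual fix. `join_contigs` is a hypothetical stand-in for `set_to_list()`, which is not shown in this issue, and the `contigs` column name is likewise an assumption.

```python
import pandas as pd


def join_contigs(row):
    # Hypothetical stand-in for set_to_list(); the real helper is not shown here.
    return ",".join(sorted(row["contigs"]))


def add_contig_name(df, row_func=join_contigs):
    """Add a 'contig_name' column, skipping apply() when there are no rows."""
    df = df.copy()
    if df.empty:
        # Nothing to compute: create an empty column so later code still finds it.
        df["contig_name"] = pd.Series(dtype=object)
        return df
    df["contig_name"] = df.apply(row_func, axis=1)
    return df


# Usage with an empty frame, as in the printout above (column names assumed).
empty = pd.DataFrame(columns=["id", "contigs"])
print(add_contig_name(empty).columns.tolist())
```

A guard like this only prevents the crash; an empty frame at this point still means no plasmid passed the earlier steps, so the underlying cause is upstream.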

Not sure if it matters, but the percentage_file (e.g. *.coverage_adapted_clustered_percentage) does not exist.

@saramonzon
Member

saramonzon commented Jul 5, 2022

Hi @rpetit3! Thanks for getting in touch!
I will take a look! It surely matters that the percentage file is empty!
I think the problem is the input files: you are using the PlasmidFinder database as input, which contains only rep and INC genes for detecting whether a plasmid is present.
PlasmidID needs a set of COMPLETE plasmids in FASTA format as its database, because it tries to reconstruct the whole plasmids in the sample, not only detect whether there is one.
To build the database, you can create a custom one with sequences of your choice, or you can follow these steps:
https://github.com/BU-ISCIII/plasmidID/wiki/Plasmid-Database
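
As a rough way to spot the mix-up described above before running the pipeline, here is a sketch that lists the record lengths in a FASTA database and flags suspiciously short entries. This is not part of PlasmidID; the function names and the 1000 bp cutoff are arbitrary assumptions chosen for illustration, and some genuine plasmids are small, so treat a flag as a hint rather than proof.

```python
MIN_PLASMID_LEN = 1000  # arbitrary illustrative cutoff, not a PlasmidID setting


def fasta_lengths(lines):
    """Return {record_id: sequence_length} for an iterable of FASTA lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The record ID is the first whitespace-delimited token of the header.
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


def short_records(lengths, min_len=MIN_PLASMID_LEN):
    """Return the sorted IDs of records shorter than min_len."""
    return sorted(rid for rid, n in lengths.items() if n < min_len)


# Tiny made-up example: one gene-sized record and one longer record.
example = [">gene_1 rep gene", "ACGT" * 50, ">plasmid_1", "ACGT" * 300]
print(short_records(fasta_lengths(example)))
```

For a real file, the same functions can be fed an open file handle, for example `with open(path) as fh: fasta_lengths(fh)`.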

@saramonzon
Member

In any case, I'm aware that the summary plasmid script may be a little bit buggy, so if you get any errors, please tell me and I can check what is going on :)

@rpetit3
Author

rpetit3 commented Jul 5, 2022

Awesome! Thank you for the update. I'll take a look at this and report back soon.
