FASTA input for split in two dimensions and how to save the results of split in 2D #11

Hi Roman, I hope you are doing well.
On the DataSAIL website you give an example of how to prepare the targets for splitting, as shown below:
Preparation of Targets:
You copy all PDB files into one folder. This is a requirement of FoldSeek, the internally used algorithm to cluster PDB data.
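```python
import os
import shutil

# df is the DataFrame from the website example; "Target" is used as the source path of each PDB file
os.makedirs("pdbs", exist_ok=True)
for name, filename in df[["ids", "Target"]].values.tolist():
    shutil.copyfile(filename, f"pdbs/{name}.pdb")
```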
I am trying to do the same when the targets are protein sequences.
Here is my data structure:
<img width="914" alt="Screenshot 2024-01-26 at 11 58 53" src="https://github.com/kalininalab/DataSAIL/assets/103953780/290e6960-dd37-4863-b47c-238494d644c8">

I use this code to save each sequence in its own FASTA file, with `ids` as the ID:

```python
import os

# CURRENT_DIR and final_df_UID_MID are defined elsewhere (not shown here)
sequences_dir = os.path.abspath(os.path.join(CURRENT_DIR, "..", "data", "enzyme_substrate_data", "sequences"))
os.makedirs(sequences_dir, exist_ok=True)

# Write one FASTA file per sequence, using the `ids` column as the record ID
for name, sequence in final_df_UID_MID[["ids", "Sequence"]].values.tolist():
    filename = os.path.join(sequences_dir, f"{name}.fasta")
    with open(filename, "w") as f:
        f.write(f">{name}\n")
        f.write(sequence + "\n")
```
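In case a single multi-FASTA file is easier to work with than one file per sequence, here is a minimal sketch of that variant. The helper names `write_multi_fasta` and `read_fasta`, the toy records, and the temporary output path are my own assumptions for illustration and are not part of DataSAIL. I have not confirmed which of the two layouts DataSAIL expects.

```python
import os
import tempfile


def write_multi_fasta(records, path, width=60):
    """Write (id, sequence) pairs to one FASTA file, wrapping sequence lines."""
    seen = set()
    with open(path, "w") as f:
        for name, sequence in records:
            name = str(name)
            # FASTA record IDs should be unique, otherwise entries get mixed up
            if name in seen:
                raise ValueError(f"duplicate id: {name}")
            seen.add(name)
            seq = str(sequence).strip()
            f.write(f">{name}\n")
            for i in range(0, len(seq), width):
                f.write(seq[i:i + width] + "\n")


def read_fasta(path):
    """Parse a FASTA file back into a dict {id: sequence}."""
    sequences = {}
    current = None
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                current = line[1:]
                sequences[current] = ""
            elif current is not None:
                sequences[current] += line
    return sequences


# Toy records standing in for
# final_df_UID_MID[["ids", "Sequence"]].values.tolist() (hypothetical data)
records = [["P1", "MKTAYIAKQR"], ["P2", "MSDNELLK" * 10]]
fasta_path = os.path.join(tempfile.mkdtemp(), "sequences.fasta")
write_multi_fasta(records, fasta_path)

# Reading the file back is a quick sanity check that nothing was lost
round_trip = read_fasta(fasta_path)
```

Reading the file back and comparing it with the original sequences is only meant as a cheap check that the IDs and sequences survive the write, before passing the file on to any splitting tool.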

Best,
Vahid


