Hi Roman, I hope you are doing well.
On the DataSAIL website you give an example of how to prepare the targets for splitting, as below:
Preparation of Targets:
You copy all PDB files into one folder. This is a requirement of FoldSeek, the internally used algorithm to cluster PDB data.
os.makedirs("pdbs", exist_ok=True)
for name, filename in df[["ids", "Target"]].values.tolist():
    shutil.copyfile(filename, f"pdbs/{name}.pdb")
I am trying to do the same when the targets are protein sequences.
Here is my data structure:
I use the following code to save each sequence in its own FASTA file, with the value from the ids column as the record ID:
import os

# Output folder for the per-sequence FASTA files
sequences_dir = os.path.abspath(os.path.join(CURRENT_DIR, "..", "data", "enzyme_substrate_data", "sequences"))
os.makedirs(sequences_dir, exist_ok=True)
for name, sequence in final_df_UID_MID[["ids", "Sequence"]].values.tolist():
    filename = os.path.join(sequences_dir, f"{name}.fasta")
    # One record per file: header line with the ID, then the sequence
    with open(filename, "w") as f:
        f.write(f">{name}\n")
        f.write(sequence + "\n")
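To make the step above easy to reproduce, here is a minimal self-contained sketch of the same logic. The helper name `write_fasta_files`, the temporary output folder, and the two example rows are hypothetical; the sketch only assumes that each record is an (id, sequence) pair, which is what `final_df_UID_MID[["ids", "Sequence"]].values.tolist()` yields.

```python
import os
import tempfile


def write_fasta_files(records, out_dir):
    """Write one single-record FASTA file per (name, sequence) pair.

    Returns the list of file paths that were written.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for name, sequence in records:
        path = os.path.join(out_dir, f"{name}.fasta")
        with open(path, "w") as f:
            f.write(f">{name}\n")      # header line carries the ID
            f.write(f"{sequence}\n")   # sequence on the following line
        paths.append(path)
    return paths


# Hypothetical example rows standing in for the real DataFrame
records = [["P1", "MKTAYIAKQR"], ["P2", "GSHMLEDPVA"]]
out_dir = os.path.join(tempfile.mkdtemp(), "sequences")
paths = write_fasta_files(records, out_dir)
```

Each resulting file contains exactly two lines, for example `>P1` followed by `MKTAYIAKQR`.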
Best,
Vahid