Description
Hi there,
I have a large social media dataset (~44 GB after preprocessing into a SQLite db). When I ran the package in my terminal, I constantly encountered the error message "sqlite3.OperationalError: database or disk is full". I assume this is because large temporary files are generated in the background, which take up all of my available RAM. Any thoughts on how to solve this issue? I'm using a VM with 128 GB of RAM and sufficient disk space.
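In case it's relevant, I also wondered whether SQLite might be writing its temporary files to a small /tmp partition rather than to RAM. Here's a rough sketch of how they could be redirected to a larger disk before running the toolkit (the target path is just a placeholder, and I'm assuming the toolkit uses the standard sqlite3 module):
"
# Rough sketch, not part of the toolkit: point SQLite's temporary files at a
# directory on a disk with plenty of free space. On Linux, SQLite falls back to
# SQLITE_TMPDIR, then TMPDIR, then /tmp when choosing where to write temp files.
import os
import sqlite3

os.environ["SQLITE_TMPDIR"] = "/data/sqlite_tmp"  # placeholder path with lots of free space

db = sqlite3.connect("weibocov2_20230603_file.db")

# Alternative: keep temporary tables and indexes in memory instead of on disk.
db.execute("PRAGMA temp_store = MEMORY")

print(db.execute("PRAGMA temp_store").fetchone())  # 2 == MEMORY
db.close()
"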
Here's the complete error message I had:
"
compute_networks weibocov2_20230603_file.db compute co_retweet --time_window 60
Calculating a co_retweet network on weibocov2_20230603_file.db with the following settings:
time_window: 60 seconds
min_edge_weight: 2 co-occurring messages
n_cpus: 32 processors
output_file: None
Ensure the indexes exist to drive the join.
Calculating the co-retweet network
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
File "/usr/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/home/ubuntu/.local/lib/python3.8/site-packages/coordination_network_toolkit/compute_networks.py", line 748, in _run_query
db.execute(
sqlite3.OperationalError: database or disk is full
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ubuntu/.local/bin/compute_networks", line 8, in
sys.exit(main())
File "/home/ubuntu/.local/lib/python3.8/site-packages/coordination_network_toolkit/main.py", line 281, in main
compute_co_retweet_parallel(
File "/home/ubuntu/.local/lib/python3.8/site-packages/coordination_network_toolkit/compute_networks.py", line 703, in compute_co_retweet_parallel
return parallise_query_by_user_id(
File "/home/ubuntu/.local/lib/python3.8/site-packages/coordination_network_toolkit/compute_networks.py", line 147, in parallise_query_by_user_id
d.result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
return self.__get_result()
File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
sqlite3.OperationalError: database or disk is full
"
Here's the RAM usage on the VM when the above error occurred:
"
free -h
total used free shared buff/cache available
Mem: 125Gi 1.7Gi 11Gi 0.0Ki 112Gi 122Gi
Swap: 92Mi 5.0Mi 87Mi
"
Thanks.
Dan (HDR from DMRC)