disk full error #57

@dandaii

Description
Hi there,

I have a large social media dataset (~44 GB after preprocessing into a SQLite db). When I run the package in my terminal, I constantly hit the error "sqlite3.OperationalError: database or disk is full". I assume this is because large temporary files are generated in the background, consuming all of my available RAM. Any thoughts on how to solve this? I'm using a VM with 128 GB of RAM and sufficient free space on disk.
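In case it helps with diagnosis: as far as I understand, SQLite writes its scratch files for large joins and sorts to a temporary directory on disk rather than to RAM, so that directory's partition may be what's filling up. Here is a minimal sketch of the workaround I'm considering, assuming the toolkit opens an ordinary sqlite3 connection; `/data/sqlite_tmp` is a hypothetical directory on the large volume:

```python
import os
import shutil
import sqlite3
import tempfile

# SQLITE_TMPDIR tells SQLite (on Unix) where to write its temporary
# files; it must be set before the first connection is opened.
# "/data/sqlite_tmp" is a hypothetical directory on the large disk.
os.makedirs("/data/sqlite_tmp", exist_ok=True)
os.environ["SQLITE_TMPDIR"] = "/data/sqlite_tmp"

# Compare free space in the default temp location and the new one.
for path in (tempfile.gettempdir(), "/data/sqlite_tmp"):
    free_gb = shutil.disk_usage(path).free / 1e9
    print(f"{path}: {free_gb:.1f} GB free")

# Alternative: keep temporary tables and indices in RAM instead of on
# disk; possibly viable here given the VM has 128 GB of memory.
db = sqlite3.connect("weibocov2_20230603_file.db")
db.execute("PRAGMA temp_store = MEMORY")
```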

Here's the complete error message I had:

```
compute_networks weibocov2_20230603_file.db compute co_retweet --time_window 60
Calculating a co_retweet network on weibocov2_20230603_file.db with the following settings:
time_window: 60 seconds
min_edge_weight: 2 co-occurring messages
n_cpus: 32 processors
output_file: None
Ensure the indexes exist to drive the join.
Calculating the co-retweet network
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/coordination_network_toolkit/compute_networks.py", line 748, in _run_query
    db.execute(
sqlite3.OperationalError: database or disk is full
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/compute_networks", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/.local/lib/python3.8/site-packages/coordination_network_toolkit/main.py", line 281, in main
    compute_co_retweet_parallel(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/coordination_network_toolkit/compute_networks.py", line 703, in compute_co_retweet_parallel
    return parallise_query_by_user_id(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/coordination_network_toolkit/compute_networks.py", line 147, in parallise_query_by_user_id
    d.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
sqlite3.OperationalError: database or disk is full
```

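For context on why this step can produce huge scratch files: computing a co-retweet network boils down to a self-join of the message table on the reposted id within the time window, followed by a GROUP BY to weight the edges, and SQLite materialises the intermediate b-trees for that in its temp directory. A toy illustration only, not the toolkit's actual SQL; the table and column names are made up:

```python
import sqlite3

db = sqlite3.connect("weibocov2_20230603_file.db")

# Toy illustration: pair up users who retweeted the same message within
# 60 seconds of each other, then count co-occurrences per user pair.
# The GROUP BY / HAVING forces SQLite to build temporary b-trees on disk
# once the intermediate result outgrows its page cache.
# "messages", "user_id", "repost_id", "timestamp" are hypothetical names.
query = """
SELECT a.user_id, b.user_id, count(*) AS weight
FROM messages AS a
JOIN messages AS b
  ON a.repost_id = b.repost_id
 AND a.user_id < b.user_id
 AND abs(a.timestamp - b.timestamp) <= 60
GROUP BY a.user_id, b.user_id
HAVING count(*) >= 2
"""
for edge in db.execute(query):
    print(edge)
```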
Here's the VM's memory usage at the time of the error:

```
$ free -h
              total        used        free      shared  buff/cache   available
Mem:          125Gi       1.7Gi        11Gi       0.0Ki       112Gi       122Gi
Swap:          92Mi       5.0Mi        87Mi
```

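To confirm whether it's the temporary files filling a partition rather than RAM, I can watch free disk space while the job runs. A small monitoring sketch; the watched paths are assumptions about the VM's layout:

```python
import shutil
import time

# Watch free space on the partitions SQLite is likely writing to: the
# system temp directory and the directory holding the database file.
# Both paths are assumptions; adjust them to the VM's actual layout.
paths = ["/tmp", "/home/ubuntu"]

while True:
    sample = "  ".join(
        f"{p}: {shutil.disk_usage(p).free / 1e9:6.1f} GB free" for p in paths
    )
    print(sample)
    time.sleep(10)  # sample every 10 seconds while compute_networks runs
```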
Thanks.
Dan (HDR from DMRC)
