Description
Running the ripe importer uses quite a bit of memory (~8GB as of today).
Can this be reduced?
Analysis
Downloading the ripe files for 2021-02-12 gives us the raw size we want to process:
2021-02-12> gzip -l *
gzip: delegated-ripencc-latest: not in gzip format
  compressed  uncompressed  ratio  uncompressed_name
     8206950      78779140  89.6%  ripe.db.aut-num
    25903102     507574226  94.9%  ripe.db.inet6num
   242928983    3628042984  93.3%  ripe.db.inetnum
     5843624      95550239  93.9%  ripe.db.organisation
     4413327      77870566  94.3%  ripe.db.role
   287295986    4387817155  93.5%  (totals)
So about 4.4 GB (4,387,817,155 bytes) of uncompressed data.
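As a cross-check of the total, the per-file sizes from the `gzip -l` listing can be summed in a few lines. This is only a sketch: the numbers are copied by hand from the listing, and the helper name `summarize` is made up for illustration.

```python
# Uncompressed sizes in bytes, copied from the `gzip -l` listing above.
UNCOMPRESSED_BYTES = {
    "ripe.db.aut-num": 78_779_140,
    "ripe.db.inet6num": 507_574_226,
    "ripe.db.inetnum": 3_628_042_984,
    "ripe.db.organisation": 95_550_239,
    "ripe.db.role": 77_870_566,
}


def summarize(sizes):
    """Return (total bytes, name of largest file, its share of the total)."""
    total = sum(sizes.values())
    largest = max(sizes, key=sizes.get)
    return total, largest, sizes[largest] / total


total, largest, share = summarize(UNCOMPRESSED_BYTES)
print(f"total: {total} bytes = {total / 1e9:.2f} GB = {total / 2**30:.2f} GiB")
print(f"largest file: {largest} ({share:.0%} of the total)")
```

The sum matches the `(totals)` row, and `ripe.db.inetnum` alone accounts for most of the input, so that file is the natural first place to look when hunting for memory savings.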
Using https://pypi.org/project/memory-profiler/
# Debian Buster
apt-get install python3-memory-profiler python3-matplotlib
Decorating a few functions where the memory consumption occurs:
--- a/intelmq_certbund_contact/ripe/ripe_data.py
+++ b/intelmq_certbund_contact/ripe/ripe_data.py
@@ -78,2 +78,3 @@ def add_common_args(parser):
+@profile
def load_ripe_files(options) -> tuple:
@@ -205,2 +206,3 @@ def read_asn_whitelist(filename, verbose=False):
+@profile
def parse_file(filename, fields, index_field=None, restriction=lambda x: True,
@@ -298,2 +300,3 @@ def parse_file(filename, fields, index_field=None, restriction=lambda x: True,
+@profile
def build_index(obj_list, index_attribute):
@@ -441,2 +444,3 @@ def split_for_known_orgs(obj_list, organisation_index):
+@profile
def sanitize_split_and_modify(obj_list, index, whitelist,
@@ -501,2 +505,3 @@ def sanitize_split_and_modify(obj_list, index, whitelist,
+@profile
def convert_inetnum_to_networks(inetnum_list):
@@ -510,2 +515,3 @@ def convert_inetnum_to_networks(inetnum_list):
+@profile
def convert_inet6num_to_networks(inet6num_list):
@@ -517,2 +523,3 @@ def convert_inet6num_to_networks(inet6num_list):
+@profile
def process_inetnum_contacts(key, inet_list, inet_list_u, restrict_country):
We can get a plot by running an import with a country restriction of NO:
env PYTHONPATH=/home/bern/dev/certbund-contact-git: python3-mprof run /home/bern/dev/certbund-contact-git/intelmq_certbund_contact/ripe/ripe_import.py -v --restrict-to-country NO --conninfo 'host=localhost port=5432 dbname=contactdb'
python3-mprof plot -t "ripe_importer memory profile 2021-12-02"
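One practical note on the `@profile` decorators in the diff: as far as I understand memory-profiler, the name `profile` is only injected into builtins when the script is started through the profiler, so the decorated module would raise a `NameError` when run normally. A minimal sketch of a fallback, assuming that injection behaviour; `build_index_demo` is a hypothetical stand-in, not the real `build_index` from `ripe_data.py`:

```python
import builtins

try:
    # Assumption: memory_profiler has put `profile` into builtins
    # because the script was started through the profiler.
    profile = builtins.profile
except AttributeError:
    # Normal run without the profiler: fall back to a no-op decorator.
    def profile(func):
        return func


@profile
def build_index_demo(obj_list, index_attribute):
    """Hypothetical stand-in: group objects by one of their attributes."""
    index = {}
    for obj in obj_list:
        index.setdefault(obj[index_attribute], []).append(obj)
    return index


demo_index = build_index_demo(
    [{"org": "ORG-A"}, {"org": "ORG-A"}, {"org": "ORG-B"}], "org"
)
print(sorted(demo_index))
```

With such a guard the decorators can stay in the code while experimenting, and the importer still runs unchanged outside of a profiling session.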
Here is the data file for interactive browsing (rename to remove the .txt suffix):
mprofile_20210212110015.dat.txt
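The data file can also be inspected without the plot. The sketch below assumes the usual mprof text layout, `MEM <MiB> <timestamp>` for samples and `FUNC <name> <mem_start> <t_start> <mem_end> <t_end>` for decorated functions; the sample lines are made up for illustration and are not taken from the attached file.

```python
def summarize_mprofile(lines):
    """Return (peak MiB, timestamp of peak, per-function memory deltas).

    Assumes the mprof line layout described in the text above; lines
    that do not match (e.g. CMDLINE) are ignored.
    """
    peak_mib = 0.0
    peak_time = None
    func_deltas = {}
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "MEM" and len(parts) >= 3:
            mem, timestamp = float(parts[1]), float(parts[2])
            if mem > peak_mib:
                peak_mib, peak_time = mem, timestamp
        elif parts[0] == "FUNC" and len(parts) >= 6:
            name = parts[1]
            mem_start, mem_end = float(parts[2]), float(parts[4])
            # Sum the growth over all recorded calls of this function.
            func_deltas[name] = func_deltas.get(name, 0.0) + (mem_end - mem_start)
    return peak_mib, peak_time, func_deltas


# Made-up sample in the assumed format, NOT real measurements.
SAMPLE = """\
CMDLINE python3 example.py
MEM 10.5 1000.0
FUNC __main__.parse_file 10.5 1000.0 250.0 1010.0
MEM 250.0 1010.0
MEM 120.0 1020.0
""".splitlines()

peak, when, deltas = summarize_mprofile(SAMPLE)
print(f"peak: {peak} MiB at t={when}")
for name, delta in sorted(deltas.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {delta:+.1f} MiB")
```

To use it on the attachment, rename the file as described and pass `open("mprofile_20210212110015.dat")` instead of `SAMPLE`.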
