small and straightforward representation of how a kademlia-based dht could be integrated into ethereum, particularly at das-research.
the current work on this simulation of a kademlia-based dht network offers:
-
DHTNewtork
object that can: spawndhtclients
, serve as main source to initialize therouting table
of thedhtclients
, resolveconnections
betweendhtclients
, handle and keep track of the interactions betweendhtclients
. the parameters to configure the adhtnetwork
are:networkid
: in case we want to simulate different network at the same timeerrorrate
: to define the number connections that will fall into an error (we can understand it as the likelines o f an error in %)delayrage
: range between the slowest possible delay and the biggest one. a random delay will be selected every time a connection is stablished between 2 nodes (if no error is raised)
the network offers the following functions:
parallel_clilist_initializer
init_with_random_peers
initializes a network using a "blazingly fast" method, which can be optimized even more if a number of threads/processes is definedadd_new_node
adds a new node to the localNetwork
connect_to_node
returns theConnection
obj between nodeA
andB
bootstrap_node
return the "best" nodes/dhtclis to compose the routing table for the given nodesummary
return the summary of the current status of the network (number of nodes, successful connections, failed ones, etc), will evolve over time
-
Connection
interface that limits how twodhtclients
interact with each other (like if it was a closed api/protocol). it offers the possibility to clienta
(client starting the connection) to ask clientb
(remote client), applying if specified the delay at the moment of stablishing the connection and per each interaction:-
get_closest_nodes_to(hash)
will return the k closest nodes to the givenhash
that clientb
has in it's routing table. note: it will also return the value if it's stored locally :) -
store_segment(segment)
will add thehash
and the segment as a key-value pair inb
's local storage -
retrieve_segment(hash)
will askb
to return the segment of the givenhash
if it has it
-
-
DHTClient
as representation of a node in the simulated dht network. thedhtclient
can be created using the following parameters:nodeid
: id of the node that hosts thedhtclient
network
: referece to the network obj that where thedhtclient
participates inkbucketsize
: k value, number of nodes per kbucketa
: number of concurrent node connections the client does while looking for a given keyb
: target of nodes (number of nodes) returned when asking for ahash
steptostop
: number of iterations without finding anyone closer to stop thelookup
operation
the client serves a list of endpoints such as:
bootstrap
uses the network reference to find the right peers for the routing tablelookup_for_hash
will try to look the value of thehash
in the network, and the closest nodes to itget_closest_nodes_to
will return the closest nodes to ahash
from the local routing tableprovide_block_segment
will lookup for the closest nodes in the network, and store the segment on themstore_segment
will store locally a segment value using itshash
as keyretrieve_segment
will return the value of ahash
if its locally, exception raised otherwise
-
RoutingTable
andKBucket
classes to store locally the local representation of the network for a given node -
Hash
andBitArray
classes to represent anodeid
/blocksegment
/generalobject
The source code runs mostly on plain Python libraries. However, to speed up the performance, the plain arrays
and dicts
were updated to classes from collections
. Thus, I recomend to have an specific virtual environment to use the module.
To install the dependencies, do:
# python -m venv venv
# or
# python -m virtualenv venv
# source venv/bin/activate
(venv)$ pip install -r requirements.txt
The repo has a list of tests to ensure that no functionality is broken whenever a new feature is added. All the tests are
triggered whenever GitHub records a Push
or a PullRequest
. However, there are locally runable using the ./launch_test.sh
script
# the script will try to activate any `venv` located at the root of the directory
py-dht$ bash launch_tests.sh
Running benchmarks is a bit trickier. First install the py-dht
module as editable
in the venv
py-dht$ pip install -e ./
after that, feel free to change the parameters in the benchmarks/launch_benchmarks.sh
and run it like if it was a test:
# the script will try to activate any `venv` located at the root of the directory
py-dht$ cd benchmarks
py-dht/benchmarks$ bash launch_benchmarks.sh
From the experience of running tests and benchmarks on the repo, I can say that the optimizations on #8 and #9 were more than necesary.
Recomendations:
- to simulate a network -> use the
network.init_with_random_peers()
function setting theprocesses
parameters the initialization of the network is by far the process that takes the longer, as it has to compute the best routing tables for each of the spawned nodes. So, please benefit from the concurrency to reduce the duration (check this test as example) - At the moment using 20 cores of a
ryzen 5900x
I'm able to initialize a network of10k dhtclients
withk=20
in28 secs
Latest numbers of the network becnhmark
# 03.08.2023
py-dht$ python benchmarks/network.py -t opt-conc -o ./csvs -i 1 -k 20 -n 10000 --threads 20
-- benchmark: opt-conc_network_n_initialization --
rounds : 1
failed rounds : 0
prep time (s) : 1.5974044799804688e-05
avg (s) : 0.09420228004455566
median (s) : 0.09420228004455566
p90_duration (s): 0.09420228004455566
-- benchmark: opt-conc_network_bootstrap_node --
rounds : 1
failed rounds : 0
prep time (s) : 0.11602234840393066
avg (s) : 0.6563236713409424
median (s) : 0.6563236713409424
p90_duration (s): 0.6563236713409424
-- benchmark: opt-conc_network_fast_bootstrap_network --
rounds : 1
failed rounds : 0
prep time (s) : 8.344650268554688e-06
avg (s) : 174.6814157962799
median (s) : 174.6814157962799
p90_duration (s): 174.6814157962799
-- benchmark: opt-conc_network_fast_threaded_bootstrap_network --
rounds : 1
failed rounds : 0
prep time (s) : 0.0008330345153808594
avg (s) : 28.239949226379395
median (s) : 28.239949226379395
p90_duration (s): 28.239949226379395
At the moment (after applying the code optimizations in #8 and the conncurrency #9) the numbers look like this:
Number of nodes | Default implementation | 1st Optimization (monothread) | 2nd Optimization (8 processes) |
---|---|---|---|
500 | 10 | 2.096 | 0.489 |
1000 | 39 | 5.39 | 1.02 |
1200 | 56 | 7.22 | 1.48 |
2000 | 157.36 | 14.49 | 2.83 |
5000 | 995.73 | 60.01 | 11.15 |
10000 | (beyond 2 hours) | 199.97 | 39.15 |
NOTE: all the measurements displayed in the table are expresed in seconds |
Mikel Cortes-Goicoechea - @cortze
feel free to dive in! change proposals, issues, and prs will be more than welcome.
- the work has been supported by codex
- feel free to support this project through buy me a coffee; help me getting the ship of caffeine that I need 😊.
mit, see license file