WIP: Storm db manager #481
Conversation
# Create a DeferredLock that should be used by callers to schedule their call.
self.db_lock = DeferredLock()

self._version = 1
In the current code, the initial database version is 0. Are you changing this for any particular reason?
In sqlitecachedb they set it to 1 in the _open_connection function if the query fails. That's why I went for 1 here.
I was looking at the dispersy db manager.
You will have to be careful to keep it consistent when refactoring. I don't know if 0/1 are actually checked and acted upon in Tribler and/or dispersy.
If the version is below 17 in Tribler, the database is apparently too old to be migrated, so 0 or 1 will make no difference. In dispersy the database_version method (property) is used and checked for 0; if it's 0, the code reads "# setup new database with current database_version".
That's what I meant: if you refactor Tribler/dispersy to use the same db manager, and one of them used to give special meaning to 0, 1 or whatnot, you will need to make extra sure the meaning gets changed everywhere to mean the same thing. Just a friendly reminder.
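To make the 0-versus-1 distinction concrete, here is a minimal sketch of the pattern being discussed, paraphrased from the comments above rather than taken from the actual dispersy or Tribler code; LATEST_VERSION, SCHEMA and the upgrade call are placeholders.

```python
# Minimal sketch (not the actual dispersy code): a freshly created database
# reports version 0, which is the cue to install the current schema; any other
# value below the latest version goes through the upgrade path instead.
LATEST_VERSION = 17   # placeholder value for illustration
SCHEMA = "CREATE TABLE IF NOT EXISTS example (id INTEGER PRIMARY KEY);"

def check_database(db):
    version = db.database_version        # e.g. backed by PRAGMA user_version
    if version == 0:
        # "setup new database with current database_version"
        db.executescript(SCHEMA)
        db.database_version = LATEST_VERSION
    elif version < LATEST_VERSION:
        db.upgrade(version, LATEST_VERSION)   # hypothetical migration hook
```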
98fed43 to f3fc4ec
On Monday I will set up an AllChannel experiment runner for dispersy too (it will use the latest Tribler devel with the dispersy from the PR being tested). This way you will be able to compare it with the upstream results. @qstokkink This will be very useful for your PRs too.
Awesome, it's good to have more assurance that changes do not break the current behaviour.
@whirm 🎉 Nice, this leaves me more time to do other things.
30ea219 to 5da399c
retest this please
1 similar comment
retest this please
3b4fcb3 to d4792c3
@whirm Why did my job on Jenkins get aborted? https://jenkins.tribler.org/job/dispersy/job/GH/job/PR/job/Unit_tests_linux/1009/
The Allchannel experiment failed, so it aborted the whole stage. I moved the experiment to a second phase until I have time to deploy the automatic cluster selection stuff I made.
Quite the timing, I was about to ask whether the allchannel experiment was timing out and therefore the build as well. Thanks for your reply :D
d91843e to 09ee2c2
@whirm Why does a push trigger two PRs? https://gyazo.com/771ba958cf9dfc517da4095d3807ce69 https://jenkins.tribler.org/job/dispersy/job/GH/job/PR/job/Unit_tests_linux/1073/
It turns out both the master multijob and the Linux tester job had the GHPRB trigger enabled, so it has been running twice since the beginning.
(fixed now)
Thanks man!
Time to start activating the DeferredLock on one of the functions!
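As a rough illustration of what "activating" the lock means here (a minimal sketch with hypothetical function names, not the PR's actual code), a direct database call gets wrapped so it is scheduled through the DeferredLock and hands its result back as a Deferred:

```python
from twisted.internet.defer import DeferredLock

db_lock = DeferredLock()

def write_record(value):
    # Placeholder for the original, direct database call.
    print("writing", value)

def write_record_locked(value):
    # The "activated" version: scheduled through the lock so concurrent
    # callers are serialized; returns a Deferred firing with the result.
    return db_lock.run(write_record, value)
```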
@@ -237,7 +199,7 @@ def requests(self, node_count, expected_responses, *pairs):
other.send_identity(node)

messages = [other.create_sequence_text("Sequence message #%d" % i, i + 10, i) for i in range(1, 11)]
yield other.store(messages)
It can still be a yield, I guess.
b3b1b9f to 5ac48c5
5ac48c5 to 21acaa8
Allchannel succeeded. For those interested: after putting the commit-once-per-minute structure back, I got the following results (20 nodes, 1k clients, i.e. the real allchannel scenario): https://jenkins.tribler.org/job/pers/job/allchannel_laurens_v2/115/artifact/output/ which is an (almost) perfect match with https://jenkins.tribler.org/job/Tribler/job/Experiments/job/AllChannel+Channelcommunity_devel/975/artifact/output/ Some interesting and notable differences: you see clear peaks in the reads and writes of the regular one, while in my case it is more of a curve, but with peaks. The download, upload and more are very similar however, for example the upload.
So in conclusion: batching your SQL statements and flushing once in a while really gives a LOT of performance. This will be an interesting problem to tackle once IO is moved to the threadpool, because there you may not be able to batch that easily; perhaps have a dedicated thread for the IO that can batch and block, so the main Twisted thread can continue. Next up: gumby refactoring and Tribler.
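A minimal sketch of the batching idea described above, assuming a plain DB-API style connection and a 60-second flush interval; the class name, method names and interval are illustrative, not the PR's actual code.

```python
from twisted.internet.task import LoopingCall


class BatchingWriter(object):
    """Illustrative only: queue statements in memory and flush them to disk
    in one transaction, instead of committing after every single statement."""

    def __init__(self, connection, flush_interval=60.0):
        self._connection = connection          # assumed DB-API style connection
        self._pending = []
        self._flusher = LoopingCall(self.flush)
        self._flusher.start(flush_interval, now=False)

    def queue(self, statement, args=()):
        # Nothing touches the disk here; the statement is only remembered.
        self._pending.append((statement, args))

    def flush(self):
        # One transaction (and one expensive sync) for the whole batch.
        if not self._pending:
            return
        cursor = self._connection.cursor()
        for statement, args in self._pending:
            cursor.execute(statement, args)
        self._connection.commit()
        self._pending = []
```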
Also, it looks like my stuff is able to do (slightly) more IO than the current code, while yielding the same results. I also noticed that at the start of the experiment there are quite a few dropped messages, yet no dropped messages after that (like the current allchannel, but more drops). @whirm thoughts?
Sooooo... you reduced the number of commits by batching stuff up? Dramatic changes in the graphs.
A lot more IO in your stuff. Do you commit after creating each message?
@synctext Yes, I am now batching them (or well, SQLite is), and once every minute, or when you send messages created by yourself (so your global time advances), the changes are flushed to the actual disk. In all my previous attempts I would commit after every database change, which generated a lot more IO and was killing the performance. Basically, the owner of the channel in the allchannel experiment would go flat.
@lfdversluis good job! Indeed, most graphs look very similar. I was expecting the spiked pattern for the wchar graph, but instead it is a line, so you have IO operations ongoing all the time? Regarding the dropped messages, they peak around 100 seconds. If you take a look at https://jenkins.tribler.org/job/pers/job/allchannel_laurens_v2/116/artifact/output//wchars.png, you see that the IO also peaks at around 100 seconds, so I think you run into the same problem again: too much IO, causing dropped messages. I think this peak occurs the moment clients start to discover each other and start to send messages. Can you somehow investigate/reduce this peak?
Try to disable all commits and see what happens.
Closing in favour of #530
I opened this PR so people can view the progress on this work and provide feedback and/or suggestions.
This will become the database manager used in Tribler/Dispersy to do IO on a separate thread to gain performance.
Small performance indicator: the current synchronous, blocking tests take 635-640 seconds to complete. Now, with 18 more tests, it takes 375 seconds.
Adds the Storm DB manager (feature).
Adds 18 tests
Fixes #488 Fixes #484
Refactoring progress:
- dispersy.py
- community.py
- multichain DB
- DispersyDatabase
- statistics.py
- node.py
- test_classification.py
- test_sync.py
- test_undo.py
- tracker/community.py
- database.py, with the following methods (a rough sketch of how these could look follows this list):
  - execute
  - execute_many
  - insert
  - fetchone
  - fetchall
  - executescript
  - insert_many
  - delete
  - count
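To give an idea of the API shape, here is a rough sketch of how such deferred-returning methods could look. It is an assumption based on the method names listed above and Twisted's thread helpers, not the actual Storm DB manager implementation.

```python
from twisted.internet.threads import deferToThread


class AsyncDBManagerSketch(object):
    """Illustrative only: run blocking database work on the reactor's thread
    pool and hand results back to callers as Deferreds."""

    def __init__(self, connection):
        # Assumed DB-API style connection, e.g. sqlite3. A real implementation
        # must ensure the connection may be used from the thread pool (for
        # sqlite3: a dedicated database thread or check_same_thread=False).
        self._connection = connection

    def execute(self, query, args=()):
        # Returns a Deferred that fires once the statement has been committed.
        return deferToThread(self._execute_blocking, query, args)

    def fetchall(self, query, args=()):
        # Returns a Deferred that fires with all rows of the query.
        return deferToThread(self._fetchall_blocking, query, args)

    def _execute_blocking(self, query, args):
        self._connection.execute(query, args)
        self._connection.commit()

    def _fetchall_blocking(self, query, args):
        return self._connection.execute(query, args).fetchall()
```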
Impact changes for Tribler et al.:
`dispersy_auto_load` was previously a setter and getter using `@property`. I've left the getter untouched, but it now returns a deferred. The setter, however, is now a function `set_dispersy_auto_load`, which returns a deferred too. This setter was only used in a test, but is probably used more in Tribler.
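For illustration, a minimal sketch of how calling code would change under this description; the exact signatures are an assumption based on the text above.

```python
from twisted.internet import defer

# Before: plain synchronous property access.
#     enabled = dispersy.dispersy_auto_load
#     dispersy.dispersy_auto_load = True

# After: the getter returns a deferred and the setter is an explicit method.
@defer.inlineCallbacks
def toggle_auto_load(dispersy):
    enabled = yield dispersy.dispersy_auto_load          # getter now yields a deferred
    yield dispersy.set_dispersy_auto_load(not enabled)   # setter returns a deferred too
```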