In both v9 and v10, when the outbox sweeper runs it checks the outbox for any outstanding messages past a certain age. For the DynamoDB implementation, it does this by performing a query operation on the Outstanding index, with a key condition expression targeting a particular shard for a given topic and the created time of the message (so that only messages past a certain age are retrieved). It then applies a filter expression to exclude messages which have a dispatch time (and have therefore already been dispatched).
The issue here is that a filter expression is applied server side after the items matching the key condition have been read, and only then is the filtered data sent over the wire to the client. Even though the data is filtered server side, the user is still charged for reading every item that matched the key condition. In this instance, this means that every time the sweeper runs, it's reading the vast majority of the messages in the table.
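To make the billing behaviour concrete, here is a minimal in-memory sketch of a query with a key condition followed by a filter. It is not the Brighter code and it does not call DynamoDB: `query_with_filter`, the `TopicShard` attribute name and the item layout are assumptions for illustration. Only the `ScannedCount` and `Count` names mirror the fields DynamoDB returns on a query response.

```python
from typing import Optional, TypedDict


class Message(TypedDict):
    TopicShard: str              # assumed partition key of the Outstanding index
    CreatedTime: int             # sort key of the current index
    DeliveryTime: Optional[int]  # populated once the message is dispatched


def query_with_filter(items, topic_shard, older_than):
    """Model a query: key condition first (this is what is read), filter second."""
    # Key condition: every item matched here is read, and paid for.
    scanned = [m for m in items
               if m["TopicShard"] == topic_shard and m["CreatedTime"] < older_than]
    # Filter expression: only trims what has already been read.
    returned = [m for m in scanned if m["DeliveryTime"] is None]
    return {"ScannedCount": len(scanned), "Count": len(returned), "Items": returned}


# 1,000 old messages, all but one already dispatched.
items = [Message(TopicShard="orders_0", CreatedTime=t, DeliveryTime=t + 1)
         for t in range(999)]
items.append(Message(TopicShard="orders_0", CreatedTime=999, DeliveryTime=None))

result = query_with_filter(items, "orders_0", older_than=5_000)
print(result["ScannedCount"], result["Count"])
```

In this model the query reads 1,000 items in order to return a single outstanding message, which is the gap between what is billed and what is useful.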
By way of example, consider an outbox where:
2000 messages are published per minute
The sweeper runs every 5 seconds
The archiver archives dispatched messages older than 1 hour
In this example, each sweeper run would read roughly 120k messages from the table (2000 messages per minute, retained for 60 minutes), and it would do so every 5 seconds, even if none of those messages were actually outstanding. This results in a level of cost that makes it impractical to use a sweeper with a DynamoDB outbox.
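The arithmetic behind that figure can be written out as below. This is a rough upper bound under the assumptions of the example (steady state, every retained message matching the key condition); it deliberately stops at item counts, since converting to read capacity units would need the item size, which is not given here.

```python
PUBLISH_RATE_PER_MINUTE = 2_000
SWEEP_INTERVAL_SECONDS = 5
RETENTION_MINUTES = 60  # dispatched messages are archived after 1 hour

# Messages sitting in the table (and in the Outstanding index) at steady state.
messages_in_table = PUBLISH_RATE_PER_MINUTE * RETENTION_MINUTES

# Sweeper runs per hour, and items read per hour if each run reads them all.
sweeps_per_hour = 3_600 // SWEEP_INTERVAL_SECONDS
items_read_per_hour = messages_in_table * sweeps_per_hour

print(messages_in_table)    # 120000
print(sweeps_per_hour)      # 720
print(items_read_per_hour)  # 86400000
```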
Possible fix
What we really want for the Outstanding index is a sparse index - one in which every message found in the index is one that is yet to be dispatched, removing the need for a filter expression at all. One way to do this would be to make the sort key on the index a simple flag indicating that the message is yet to be dispatched; however, this would remove the ability to query only for messages past a certain age. Instead, we should add a new numerical attribute called OutstandingCreatedTime to each message, populated only while the message is yet to be dispatched. This attribute will be the sort key on the Outstanding index, meaning messages that don't have an OutstandingCreatedTime attribute will not be part of the Outstanding index.
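As a sketch of what the index definition might look like, the function below builds the index portion of a table definition in the shape that boto3's `create_table` accepts. The `TopicShard` partition key name and the `ALL` projection are assumptions, not taken from the Brighter source; only `OutstandingCreatedTime` and the `Outstanding` index name come from the proposal above. Nothing here talks to AWS.

```python
def outstanding_index_definition():
    """Build the sparse Outstanding index definition (assumed attribute names)."""
    return {
        "AttributeDefinitions": [
            {"AttributeName": "TopicShard", "AttributeType": "S"},  # assumed name
            {"AttributeName": "OutstandingCreatedTime", "AttributeType": "N"},
        ],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": "Outstanding",
                "KeySchema": [
                    {"AttributeName": "TopicShard", "KeyType": "HASH"},
                    # Items without this attribute never appear in the index,
                    # which is what makes the index sparse.
                    {"AttributeName": "OutstandingCreatedTime", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},  # assumed projection
            }
        ],
    }


definition = outstanding_index_definition()
sort_key = definition["GlobalSecondaryIndexes"][0]["KeySchema"][1]
print(sort_key["AttributeName"])
```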
The overall flow for a message would therefore be:
The message is added to the outbox, with both the CreatedTime and OutstandingCreatedTime attributes populated
The sweeper runs and queries the Outstanding index, only retrieving messages for which OutstandingCreatedTime is populated
The sweeper publishes the outstanding message, and then populates the DeliveryTime attribute and removes the OutstandingCreatedTime attribute from the record, removing it from the index
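The flow above can be sketched with a small in-memory model. `SparseOutbox` and its method names are hypothetical and exist only to show how removing the attribute takes the item out of the index; against a real table, the last step would presumably be a single update along the lines of `SET DeliveryTime = :t REMOVE OutstandingCreatedTime`.

```python
class SparseOutbox:
    """In-memory model of an outbox table with a sparse Outstanding index."""

    def __init__(self):
        self.table = {}

    def add(self, message_id, topic_shard, created_time):
        # Step 1: both CreatedTime and OutstandingCreatedTime are populated.
        self.table[message_id] = {
            "MessageId": message_id,
            "TopicShard": topic_shard,
            "CreatedTime": created_time,
            "OutstandingCreatedTime": created_time,
        }

    def outstanding_index(self):
        # Only items that carry the sort key attribute are part of the index.
        return [i for i in self.table.values() if "OutstandingCreatedTime" in i]

    def query_outstanding(self, topic_shard, older_than):
        # Step 2: key condition only, no filter expression needed.
        return [i for i in self.outstanding_index()
                if i["TopicShard"] == topic_shard
                and i["OutstandingCreatedTime"] < older_than]

    def mark_dispatched(self, message_id, delivery_time):
        # Step 3: set DeliveryTime and remove the index sort key attribute.
        item = self.table[message_id]
        item["DeliveryTime"] = delivery_time
        del item["OutstandingCreatedTime"]


outbox = SparseOutbox()
outbox.add("m1", "orders_0", created_time=100)
outbox.add("m2", "orders_0", created_time=200)
outbox.mark_dispatched("m1", delivery_time=150)

outstanding = outbox.query_outstanding("orders_0", older_than=1_000)
print([m["MessageId"] for m in outstanding])
```

After `m1` is dispatched it is still in the table, but the index holds only `m2`, so the sweeper reads one item rather than two.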
This is a breaking change to the outbox table structure, which we could make wholesale in v10. Users upgrading would therefore need to create a new table, as the key schema of a GSI cannot be changed after creation.
Given that the current implementation makes a DynamoDB outbox effectively unusable with a sweeper, the change also needs to be made to v9. My suggestion here would be to add a boolean property to DynamoDbConfiguration called SparseOutstandingIndex, which defaults to false. If the flag is false, the current implementation is used. If it is true, the approach above is used. This would essentially allow users of v9 to "opt in" to the cheaper, more performant table structure if they're doing a new implementation or are willing to perform a migration.
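A rough sketch of how the opt-in flag could select the query shape is below. It is written in Python purely for illustration and is not the library's API: the `DynamoDbConfiguration` dataclass, the `build_outstanding_query` helper, the `TopicShard` attribute name and the exact filter expression are all assumptions. The returned dictionary follows the parameter shape of a low-level DynamoDB query.

```python
from dataclasses import dataclass


@dataclass
class DynamoDbConfiguration:
    """Hypothetical stand-in for the real configuration class."""
    sparse_outstanding_index: bool = False  # opt-in, defaults to current behaviour


def build_outstanding_query(config, topic_shard, cutoff_ms):
    """Return query parameters for the Outstanding index (assumed names)."""
    sort_key = ("OutstandingCreatedTime" if config.sparse_outstanding_index
                else "CreatedTime")
    params = {
        "IndexName": "Outstanding",
        "KeyConditionExpression": f"TopicShard = :ts AND {sort_key} < :cutoff",
        "ExpressionAttributeValues": {
            ":ts": {"S": topic_shard},
            ":cutoff": {"N": str(cutoff_ms)},
        },
    }
    if not config.sparse_outstanding_index:
        # Existing table structure: dispatched messages are still in the index,
        # so they have to be filtered out after being read (assumed expression).
        params["FilterExpression"] = "attribute_not_exists(DeliveryTime)"
    return params


legacy = build_outstanding_query(DynamoDbConfiguration(), "orders_0", 5_000)
sparse = build_outstanding_query(
    DynamoDbConfiguration(sparse_outstanding_index=True), "orders_0", 5_000)
print("FilterExpression" in legacy, "FilterExpression" in sparse)
```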