Key size limit and dump.rdb question #1389
Replies: 5 comments 3 replies
-
@jaesonrosenfeld1 this is not a self-contained example, so it's hard to understand what the problem is.
-
@jaesonrosenfeld1 fyi, writing 350M blobs should be possible (currently it is not), but it's not advisable. It's better to write each row of the df separately.
-
@romange Thank you for your responses! Regarding your idea of writing/reading each row of the df separately, can you recommend how to do that given the write function I sent above and my read function? I'm unsure how to concatenate the rows back into one dataframe without massively impacting load speed. In addition, can you comment on reading a dump.rdb created using redis-server directly into dragonfly, as I mentioned? Here's my current read function: def loadDFFromRedis(alias, r):
-
I am not a python programmer but with some chatgpt help I succeeded to come up with this code to save a dataframe to redis:

#!/usr/bin/env python3
import pickle
import string
import asyncio

import numpy as np
import pandas as pd
from redis.asyncio import BlockingConnectionPool
from redis.asyncio.client import Redis

NP_LETTERS = np.array(list(string.ascii_letters))

def generate_random_string(length):
    # Pick `length` random letters and join them into one string.
    random_indices = np.random.randint(0, len(NP_LETTERS), size=length)
    return ''.join(NP_LETTERS[random_indices])

NUM_ROWS = 10000
data = {
    'col1': [generate_random_string(128) for _ in range(NUM_ROWS)],
    'col2': [generate_random_string(128) for _ in range(NUM_ROWS)],
}

async def save_row(redis, index, row):
    # Pickle each row and store it under its own key.
    serialized_row = pickle.dumps(row)
    redis_key = f"row_{index}"
    await redis.set(redis_key, serialized_row)

async def save_dataframe(df, redis: Redis):
    # Collect one SET task per row and issue them all concurrently.
    tasks = []
    for index, row in df.iterrows():
        tasks.append(save_row(redis, index, row))
    print(f"Saving {len(tasks)} rows")
    await asyncio.gather(*tasks)
    await redis.close()

async def main():
    # BlockingConnectionPool caps concurrent connections and waits when the pool is exhausted.
    redis = await Redis(connection_pool=BlockingConnectionPool(max_connections=100))
    df = pd.DataFrame(data)
    print('created dataframe')
    # Save the DataFrame in parallel using asyncio.
    await save_dataframe(df, redis)

asyncio.run(main())
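For completeness, a matching read path is sketched below; it is not from the thread. It assumes the row_<index> key scheme and row count from the save script above and uses a hypothetical load_dataframe helper; the rows come back as pickled pandas Series, which pd.DataFrame can reassemble directly.

#!/usr/bin/env python3
# Hypothetical read counterpart to the save script above (an illustration,
# not code from the thread): fetch the row_<index> keys concurrently and
# rebuild the dataframe from the unpickled rows.
import pickle
import asyncio

import pandas as pd
from redis.asyncio import BlockingConnectionPool
from redis.asyncio.client import Redis

NUM_ROWS = 10000  # assumed to match the number of rows that were saved

async def load_row(redis, index):
    raw = await redis.get(f"row_{index}")
    return pickle.loads(raw)  # each value is a pickled pandas Series

async def load_dataframe(redis: Redis, num_rows: int) -> pd.DataFrame:
    # Issue all GET commands concurrently; gather preserves the row order.
    rows = await asyncio.gather(*(load_row(redis, i) for i in range(num_rows)))
    return pd.DataFrame(rows)

async def main():
    redis = await Redis(connection_pool=BlockingConnectionPool(max_connections=100))
    df = await load_dataframe(redis, NUM_ROWS)
    print(df.shape)
    await redis.close()

asyncio.run(main())

Building the dataframe once from the full list of rows avoids the per-row concatenation that the question above was worried about.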
-
Thanks for providing that code! Will reading/writing in a loop, even using asyncio, be as fast as or faster than writing the df to a single key as bytes? I find that surprising. I can give it a try!
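One way to settle the speed question is to time both paths against the same server. The sketch below is not from the thread: it assumes a local instance on port 6380, reuses the feather-blob idea from the original post, and uses a synchronous redis pipeline for the row-per-key path instead of the asyncio approach above, just to keep the timing code short; actual numbers will depend on hardware and payload.

import io
import time

import pandas as pd
import redis

# Hypothetical timing harness (assumes a server on localhost:6380).
r = redis.Redis(host="localhost", port=6380, db=0)
df = pd.DataFrame({"col1": range(100_000), "col2": range(100_000)})

# Path 1: the whole dataframe as one feather blob under a single key.
start = time.perf_counter()
buf = io.BytesIO()
df.reset_index(drop=True).to_feather(buf, compression="zstd")
r.set("df_blob", buf.getvalue())
print("single key:", time.perf_counter() - start)

# Path 2: one key per row, batched through a pipeline to avoid
# a network round trip per SET.
start = time.perf_counter()
pipe = r.pipeline(transaction=False)
for index, row in df.iterrows():
    pipe.set(f"row_{index}", row.to_json())
pipe.execute()
print("row per key:", time.perf_counter() - start)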
-
Trying out dragonflydb as a drop-in replacement for redis and excited to see what it can do.
I have an existing application that writes to a dump.rdb file for persistence and reloads about 25GB of data into memory when redis-server is restarted.
I'm noticing that when I point dragonfly at the dump.rdb file written by redis to the host, it doesn't load those keys into the db (dbsize remains 0). Is this because I need to switch the snapshot format back to redis by setting --df_snapshot_format=False, after which it can also read an existing dump.rdb in redis format? I tried this and still couldn't get the dump.rdb loaded into memory when launching dragonfly.
Secondly, when I then try to write new keys from python using the redis python package, a few keys write fine, but once it reaches a slightly larger key (350 MB in pandas) I get the message "Error 32 writing to socket. Broken Pipe". Is there a key size limitation I should know about that could be modified? I know the limit in Redis is 512MB. Here is the command for launching the dragonflydb container as well as the code for writing the keys from python:
docker run --log-driver awslogs --log-opt awslogs-region=us-east-2 --log-opt awslogs-group=WebServerLogsRFG --log-opt awslogs-stream=DockerLogsRedis --name myredis -p 6380:6380 --network my-network -v /home/ubuntu/redis/data:/data --ulimit memlock=-1 docker.dragonflydb.io/dragonflydb/dragonfly dragonfly --port 6380
import io
import redis

def openRedisCon():
    # REDIS_HOST and REDIS_PORT come from the application's config.
    pool = redis.ConnectionPool(
        host=REDIS_HOST,
        port=REDIS_PORT,
        db=0,
    )
    r = redis.Redis(connection_pool=pool)
    return r

r = openRedisCon()

def storeDFInRedis(alias, r, df):
    # Serialize the whole dataframe to a feather blob and store it under one key.
    buffer = io.BytesIO()
    df.reset_index(drop=True).to_feather(buffer, compression="zstd")
    buffer.seek(0)  # re-set the pointer to the beginning after writing
    res = r.set(alias, buffer.read())
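The read function mentioned earlier in the thread was cut off; a minimal sketch of what the inverse of storeDFInRedis could look like is shown below. This is a hypothetical reconstruction, not the author's actual loadDFFromRedis: it fetches the stored bytes and lets pandas parse the feather payload from an in-memory buffer.

import io

import pandas as pd

def loadDFFromRedis(alias, r):
    # Hypothetical counterpart to storeDFInRedis (not the author's code):
    # read the feather blob stored under `alias` back into a dataframe.
    raw = r.get(alias)
    if raw is None:
        return None  # key not found
    buffer = io.BytesIO(raw)
    return pd.read_feather(buffer)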
Thanks!