Key size limit and dump.rdb question #1389
Replies: 5 comments 3 replies
-
@jaesonrosenfeld1 this is not a self-contained example, so it's hard to understand what the problem is.
-
@jaesonrosenfeld1 fyi, writing 350M blobs should be possible (currently it is not), but it's not advisable. It's better to write each row of the df separately.
-
@romange Thank you for your responses! Regarding your idea of writing/reading each row of the df separately, can you recommend how to do that given the write function I sent above and my read function? I'm unsure how to concatenate the rows back into one dataframe without massively impacting load speed. In addition, can you comment on reading a dump.rdb created using redis-server directly into dragonfly, as I mentioned? Here's my current read function: def loadDFFromRedis(alias, r):
-
I am not a python programmer but with some chatgpt help I succeeded to come up with this code to save a dataframe to redis:

#!/usr/bin/env python3
import pickle
import string
import asyncio

import numpy as np
import pandas as pd
from redis.asyncio import BlockingConnectionPool
from redis.asyncio.client import Redis

NP_LETTERS = np.array(list(string.ascii_letters))

def generate_random_string(length):
    # Pick `length` random letters and join them into one string.
    random_indices = np.random.randint(0, len(NP_LETTERS), size=length)
    return ''.join(NP_LETTERS[random_indices])

NUM_ROWS = 10000
data = {
    'col1': [generate_random_string(128) for _ in range(NUM_ROWS)],
    'col2': [generate_random_string(128) for _ in range(NUM_ROWS)],
}

async def save_row(redis, index, row):
    # Pickle each row and store it under its own key.
    serialized_row = pickle.dumps(row)
    redis_key = f"row_{index}"
    await redis.set(redis_key, serialized_row)

async def save_dataframe(df, redis: Redis):
    # Collect one SET task per row and issue them all concurrently.
    tasks = []
    for index, row in df.iterrows():
        tasks.append(save_row(redis, index, row))
    print(f"Saving {len(tasks)} rows")
    await asyncio.gather(*tasks)
    await redis.close()

async def main():
    # BlockingConnectionPool caps concurrent connections and waits when the pool is exhausted.
    redis = await Redis(connection_pool=BlockingConnectionPool(max_connections=100))
    df = pd.DataFrame(data)
    print('created dataframe')
    # Save the DataFrame in parallel using asyncio.
    await save_dataframe(df, redis)

asyncio.run(main())
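For completeness, a matching read path is sketched below; it is not from the thread. It assumes the row_<index> key scheme and row count from the save script above and uses a hypothetical load_dataframe helper; the rows come back as pickled pandas Series, which pd.DataFrame can reassemble directly.

#!/usr/bin/env python3
# Hypothetical read counterpart to the save script above (an illustration,
# not code from the thread): fetch the row_<index> keys concurrently and
# rebuild the dataframe from the unpickled rows.
import pickle
import asyncio

import pandas as pd
from redis.asyncio import BlockingConnectionPool
from redis.asyncio.client import Redis

NUM_ROWS = 10000  # assumed to match the number of rows that were saved

async def load_row(redis, index):
    raw = await redis.get(f"row_{index}")
    return pickle.loads(raw)  # each value is a pickled pandas Series

async def load_dataframe(redis: Redis, num_rows: int) -> pd.DataFrame:
    # Issue all GET commands concurrently; gather preserves the row order.
    rows = await asyncio.gather(*(load_row(redis, i) for i in range(num_rows)))
    return pd.DataFrame(rows)

async def main():
    redis = await Redis(connection_pool=BlockingConnectionPool(max_connections=100))
    df = await load_dataframe(redis, NUM_ROWS)
    print(df.shape)
    await redis.close()

asyncio.run(main())

Building the dataframe once from the full list of rows avoids the per-row concatenation that the question above was worried about.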
-
Thanks for providing that code! Will reading/writing in a loop, even using asyncio, be as fast as or faster than writing the df to a single key as bytes? I find that surprising. I can give it a try!
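One way to settle the speed question is to time both paths against the same server. The sketch below is not from the thread: it assumes a local instance on port 6380, reuses the feather-blob idea from the original post, and uses a synchronous redis pipeline for the row-per-key path instead of the asyncio approach above, just to keep the timing code short; actual numbers will depend on hardware and payload.

import io
import time

import pandas as pd
import redis

# Hypothetical timing harness (assumes a server on localhost:6380).
r = redis.Redis(host="localhost", port=6380, db=0)
df = pd.DataFrame({"col1": range(100_000), "col2": range(100_000)})

# Path 1: the whole dataframe as one feather blob under a single key.
start = time.perf_counter()
buf = io.BytesIO()
df.reset_index(drop=True).to_feather(buf, compression="zstd")
r.set("df_blob", buf.getvalue())
print("single key:", time.perf_counter() - start)

# Path 2: one key per row, batched through a pipeline to avoid
# a network round trip per SET.
start = time.perf_counter()
pipe = r.pipeline(transaction=False)
for index, row in df.iterrows():
    pipe.set(f"row_{index}", row.to_json())
pipe.execute()
print("row per key:", time.perf_counter() - start)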
-
Trying out dragonflydb as a drop-in replacement for redis and excited to see what it can do.
I have an existing application that writes to a dump.rdb file for persistence and reloads about 25GB of data into memory when redis-server is restarted.
I'm noticing that when I point dragonfly at the dump.rdb file written by redis to the host, it doesn't load those keys into the db (dbsize remains 0). Is this because I need to switch the snapshot format back to redis by setting --df_snapshot_format=False, after which it can also read an existing dump.rdb in redis format? I tried this and still couldn't get the dump.rdb loaded into memory when launching dragonfly.
Secondly, when I then try to write new keys from python using the redis python package, a few keys write fine, but once it reaches a slightly larger key (350 MB in pandas) I get the message "Error 32 writing to socket. Broken Pipe". Is there a key size limitation I should know about that could be modified? I know the limit in Redis is 512MB. Here is the command for launching the dragonflydb container as well as the code for writing the keys from python:
docker run --log-driver awslogs --log-opt awslogs-region=us-east-2 --log-opt awslogs-group=WebServerLogsRFG --log-opt awslogs-stream=DockerLogsRedis --name myredis -p 6380:6380 --network my-network -v /home/ubuntu/redis/data:/data --ulimit memlock=-1 docker.dragonflydb.io/dragonflydb/dragonfly dragonfly --port 6380
import io
import redis

def openRedisCon():
    # REDIS_HOST and REDIS_PORT come from the application's config.
    pool = redis.ConnectionPool(
        host=REDIS_HOST,
        port=REDIS_PORT,
        db=0,
    )
    r = redis.Redis(connection_pool=pool)
    return r

r = openRedisCon()

def storeDFInRedis(alias, r, df):
    # Serialize the whole dataframe to a feather blob and store it under one key.
    buffer = io.BytesIO()
    df.reset_index(drop=True).to_feather(buffer, compression="zstd")
    buffer.seek(0)  # re-set the pointer to the beginning after writing
    res = r.set(alias, buffer.read())
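The read function mentioned earlier in the thread was cut off; a minimal sketch of what the inverse of storeDFInRedis could look like is shown below. This is a hypothetical reconstruction, not the author's actual loadDFFromRedis: it fetches the stored bytes and lets pandas parse the feather payload from an in-memory buffer.

import io

import pandas as pd

def loadDFFromRedis(alias, r):
    # Hypothetical counterpart to storeDFInRedis (not the author's code):
    # read the feather blob stored under `alias` back into a dataframe.
    raw = r.get(alias)
    if raw is None:
        return None  # key not found
    buffer = io.BytesIO(raw)
    return pd.read_feather(buffer)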
Thanks!