entry->buffer modified after retrieval leads to invalid access and segmentation fault in rumget.c:scanPage. #144

Open
Wenbo94 opened this issue Feb 24, 2025 · 2 comments

Wenbo94 commented Feb 24, 2025

conclusion:
The entry->buffer pointer is retrieved, but the underlying buffer can be modified by a concurrent backend before the pointer is used, leading to invalid memory access and a segmentation fault.

env:
PostgreSQL REL_12_STABLE branch and RUM master branch.

reproduce:
test.sql:

set enable_seqscan to off;
set max_parallel_workers_per_gather = 0;
set force_parallel_mode = off;


insert into test_float4 values (1),(-1),(2);

explain analyze select * from test_float4 where i = 1::float4;
explain analyze select * from test_float4 where i = -1::float4;
explain analyze select * from test_float4 where i = 2::float4;
explain analyze select * from test_float4 where i = 1::float4;
explain analyze select * from test_float4 where i = -1::float4;
explain analyze select * from test_float4 where i = 2::float4;
explain analyze select * from test_float4 where i = 1::float4;
explain analyze select * from test_float4 where i = -1::float4;

test.py:

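import threading
import psycopg2

def execute_sql():
    # Reconnect and replay test.sql in a loop until an error occurs
    # (for example, when the backend crashes with SIGSEGV).
    while True: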
        try:
            conn = psycopg2.connect(
                dbname="postgres",
                user="username",
                password="password",
                host="localhost",
                port="5432"
            )
            cur = conn.cursor()

            with open('test.sql', 'r') as file:
                sql = file.read()
                cur.execute(sql)
                conn.commit()

            cur.close()
            conn.close()
        except Exception as e:
            print(f"Error: {e}")
            break

# Run 16 concurrent clients, each on its own connection.
threads = []
for i in range(16):
    thread = threading.Thread(target=execute_sql)
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

Before running the scripts, create a simple table with a RUM index on it:

CREATE TABLE test_float4(i float4);
CREATE INDEX idx_t ON test_float4 USING rum(i);

Running test.py for a few minutes produces a core dump like this:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f04f33d5f1a in rumDataPageLeafRead (ptr=0x7f04ea534eaa "\b", attnum=1, item=0x7ffde964e578, copyAddInfo=true, rumstate=0x55a871e8bfb8)
    at src/rum.h:987
987			if (attr->attbyval)
(gdb) bt
#0  0x00007f04f33d5f1a in rumDataPageLeafRead (ptr=0x7f04ea534eaa "\b", attnum=1, item=0x7ffde964e578, copyAddInfo=true, rumstate=0x55a871e8bfb8)
    at src/rum.h:987
#1  0x00007f04f33d7b3e in scanPage (rumstate=0x55a871e8bfb8, entry=0x55a871e9d058, item=0x55a871e9d080, equalOk=false) at src/rumget.c:1673
#2  0x00007f04f33d73a2 in entryGetNextItem (rumstate=0x55a871e8bfb8, entry=0x55a871e9d058, snapshot=0x55a871e3ca30) at src/rumget.c:896
#3  0x00007f04f33d553a in entryGetItem (rumstate=0x55a871e8bfb8, entry=0x55a871e9d058, nextEntryList=0x0, snapshot=0x55a871e3ca30) at src/rumget.c:1310
#4  0x00007f04f33d86f4 in scanGetItemRegular (scan=0x55a871e82380, advancePast=0x7ffde964e7d0, item=0x7ffde964e7d0, recheck=0x7ffde964e7e7)
    at src/rumget.c:1480
#5  0x00007f04f33d3c29 in scanGetItem (scan=0x55a871e82380, advancePast=0x7ffde964e7d0, item=0x7ffde964e7d0, recheck=0x7ffde964e7e7) at src/rumget.c:2129
#6  0x00007f04f33d36f9 in rumgetbitmap (scan=0x55a871e82380, tbm=0x55a871e83590) at src/rumget.c:2167
#7  0x000055a870b88811 in index_getbitmap (scan=0x55a871e82380, bitmap=0x55a871e83590) at indexam.c:670
#8  0x000055a870d9296c in MultiExecBitmapIndexScan (node=0x55a871e82090) at nodeBitmapIndexscan.c:105
#9  0x000055a870d7baea in MultiExecProcNode (node=0x55a871e82090) at execProcnode.c:506
#10 0x000055a870d91860 in BitmapHeapNext (node=0x55a871e81da0) at nodeBitmapHeapscan.c:114
#11 0x000055a870d7dbe3 in ExecScanFetch (node=0x55a871e81da0, accessMtd=0x55a870d91780 <BitmapHeapNext>, recheckMtd=0x55a870d91e30 <BitmapHeapRecheck>)
    at execScan.c:133
#12 0x000055a870d7d832 in ExecScan (node=0x55a871e81da0, accessMtd=0x55a870d91780 <BitmapHeapNext>, recheckMtd=0x55a870d91e30 <BitmapHeapRecheck>)
    at execScan.c:183

Wenbo94 commented Feb 24, 2025

I added some logging to the root data page split and found that after the split, stack->buffer should be left untouched, but entryGetNextItem still locks stack->buffer and runs scanPage on it.
In more detail: after the split, the page flags of stack->buffer become 1, which is RUM_DATA without RUM_LEAF, yet scanPage still uses rumDataPageLeafRead to read this page.
I have submitted a fix in PR #145.
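
To make the sequence above easier to follow, here is a minimal toy model in Python. It is not RUM code and not the fix from the pull request. The `Page` class, `split_root`, and both `scan_page_*` helpers are hypothetical names invented for illustration, and `RUM_LEAF = 2` is an assumption (the issue only states that flag value 1 is RUM_DATA). The sketch shows a reader that remembers a root page while it is a leaf, a split that rewrites that page in place as an internal page, and the difference between reading it blindly and checking the leaf flag first.

```python
from dataclasses import dataclass, field

# Flag bits. RUM_DATA == 1 is stated in the issue; RUM_LEAF == 2 is an
# assumption made for this sketch.
RUM_DATA = 1 << 0
RUM_LEAF = 1 << 1


@dataclass
class Page:
    """Toy stand-in for a posting-tree page held in a shared buffer."""
    flags: int
    items: list = field(default_factory=list)

    def is_leaf(self) -> bool:
        return bool(self.flags & RUM_LEAF)


def split_root(root: Page) -> tuple:
    """Hypothetical in-place root split: the contents move to two new
    leaves and the root itself becomes an internal page (RUM_DATA only)."""
    mid = len(root.items) // 2
    left = Page(RUM_DATA | RUM_LEAF, root.items[:mid])
    right = Page(RUM_DATA | RUM_LEAF, root.items[mid:])
    root.flags = RUM_DATA  # the leaf bit is gone
    root.items = ["downlink-left", "downlink-right"]
    return left, right


def scan_page_unchecked(page: Page) -> list:
    """Reads the page as if it were a leaf, without looking at the flags."""
    return list(page.items)


def scan_page_checked(page: Page) -> list:
    """Refuses to read a page as a leaf once it has stopped being one."""
    if not page.is_leaf():
        raise LookupError("page is no longer a leaf; re-descend from the root")
    return list(page.items)


# A scanner remembers the root while it is still a leaf ...
root = Page(RUM_DATA | RUM_LEAF, [1, 2, 3, 4])
remembered = root

# ... and a concurrent insert splits the root before the scanner comes back.
split_root(root)

print(remembered.flags)                 # 1, i.e. RUM_DATA without RUM_LEAF
print(scan_page_unchecked(remembered))  # downlinks misread as leaf items
```

In this toy model the unchecked read merely returns the wrong kind of entries. In the real extension, interpreting an internal page with the leaf reader is what the backtrace shows crashing inside rumDataPageLeafRead. Checking the leaf flag after re-locking the buffer is one possible guard; whether the pull request takes this approach is not stated here.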

@sokolcati
Contributor

Thank you for the issue, and especially for the reproduction scripts!

I saw your commit and will try to get back to you with feedback soon. I suggest discussing further edits on the pull request page.
