Open
Labels
type: Bug (Something isn't working)
Description
Apache Cloudberry version
main branch
What happened
create external web table t3(a int, b text)
LOCATION ('http://<ip>:<port>/bad_gb.txt')
FORMAT 'TEXT' (DELIMITER ',' NULL '' ) ENCODING 'GB18030'
LOG ERRORS SEGMENT REJECT LIMIT 2;
select * from t3;
output:
gpadmin=# select * from t3;
ERROR: segment reject limit reached, aborting operation (seg0 slice1 127.0.1.1:7002 pid=2316762)
DETAIL: Last error was: invalid byte sequence for encoding "GB18030": 0xa3 0x0a
CONTEXT: External table t3, line 3 of file http://.../bad_gb.txt
bad_gb.txt: encoding GB18030
gpadmin@hashdata:/tmp/www$ hexdump -C bad_gb.txt
00000000 31 2c ca c0 bd e7 0a 32 2c c4 e3 ba c3 c2 f0 a3 |1,.....2,.......|
00000010 0a 33 2c 6e 69 68 61 6f 0a |.3,nihao.|
00000019
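To double-check which lines are actually invalid, the bytes from the hexdump above can be decoded line by line outside the database. This is only a sketch using Python's built-in `gb18030` codec, which is not the conversion code Cloudberry uses; the names `DATA` and `check_lines` are made up for illustration. Splitting on 0x0a first is safe here because 0x0a is never a valid trail byte in GB18030.

```python
# Bytes copied verbatim from the hexdump of bad_gb.txt in this report.
DATA = bytes.fromhex(
    "31 2c ca c0 bd e7 0a"           # line 1
    "32 2c c4 e3 ba c3 c2 f0 a3 0a"  # line 2 (ends with a lone lead byte 0xa3)
    "33 2c 6e 69 68 61 6f 0a"        # line 3
)


def check_lines(data, encoding="gb18030"):
    """Decode each newline-terminated line separately.

    Returns a list of (line_number, ok, text_or_reason) tuples.
    """
    results = []
    for lineno, raw in enumerate(data.splitlines(), start=1):
        try:
            results.append((lineno, True, raw.decode(encoding)))
        except UnicodeDecodeError as exc:
            # Keep the codec's reason string instead of guessing the text.
            results.append((lineno, False, exc.reason))
    return results


if __name__ == "__main__":
    for lineno, ok, detail in check_lines(DATA):
        print(lineno, "ok" if ok else "INVALID", detail)
```

With this check only line 2 fails to decode, which matches the 0xa3 0x0a pair reported in the error DETAIL.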
What you think should happen instead
Only the second line is bad. The first and third lines should be returned according to the table definition, and the single bad line should stay below the SEGMENT REJECT LIMIT of 2.
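To make the expected behavior concrete, here is a rough Python model of per-row error handling with a reject limit. It is not Cloudberry code: `load_with_reject_limit` and `RejectLimitReached` are hypothetical names, segments are ignored, and it assumes the load aborts once the number of rejected rows reaches the limit, as the "segment reject limit reached" message suggests.

```python
# Same bytes as the hexdump of bad_gb.txt in this report.
DATA = bytes.fromhex(
    "31 2c ca c0 bd e7 0a"
    "32 2c c4 e3 ba c3 c2 f0 a3 0a"
    "33 2c 6e 69 68 61 6f 0a"
)


class RejectLimitReached(Exception):
    """Raised when the number of rejected rows reaches the limit."""


def load_with_reject_limit(data, encoding="gb18030", reject_limit=2):
    """Parse 'a,b' rows, rejecting rows that fail encoding conversion.

    Assumption: a bad row counts once, and the load aborts only when
    the count of rejected rows reaches reject_limit.
    """
    good, rejected = [], []
    for lineno, raw in enumerate(data.splitlines(), start=1):
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError as exc:
            rejected.append((lineno, exc.reason))
            if len(rejected) >= reject_limit:
                raise RejectLimitReached(f"limit reached at line {lineno}")
            continue  # skip the bad row and keep reading
        a, b = text.split(",", 1)
        good.append((int(a), b))
    return good, rejected


if __name__ == "__main__":
    rows, errors = load_with_reject_limit(DATA)
    print("rows:", rows)
    print("rejected lines:", [lineno for lineno, _ in errors])
```

Under these assumptions the file yields two good rows (lines 1 and 3) and one rejected row (line 2), so the limit of 2 is never reached.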
How to reproduce
Repro (replace the <ip> and <port> placeholders with the address of your own web server):
create external web table t3(a int, b text)
LOCATION ('http://<ip>:<port>/bad_gb.txt')
FORMAT 'TEXT' (DELIMITER ',' NULL '' ) ENCODING 'GB18030'
LOG ERRORS SEGMENT REJECT LIMIT 2;
select * from t3;
-- or
create temp table t0(a int, b text);
-- copy the file bad_gb.txt to /tmp
copy t0 from '/tmp/www/bad_gb.txt' with(encoding 'gb18030') log errors segment reject limit 2;
output:
gpadmin=# select * from t3;
ERROR: segment reject limit reached, aborting operation (seg0 slice1 127.0.1.1:7002 pid=2316762)
DETAIL: Last error was: invalid byte sequence for encoding "GB18030": 0xa3 0x0a
CONTEXT: External table t3, line 3 of file http://.../bad_gb.txt
-- or
gpadmin=# copy t0 from '/tmp/www/bad_gb.txt' with(encoding 'gb18030') log errors segment reject limit 2;
ERROR: segment reject limit reached, aborting operation
DETAIL: Last error was: invalid byte sequence for encoding "GB18030": 0xa3 0x0a, column a
CONTEXT: COPY t0, line 2, column a: "1,世界"
bad_gb.txt: encoding GB18030
gpadmin@hashdata:/tmp/www$ hexdump -C bad_gb.txt
00000000 31 2c ca c0 bd e7 0a 32 2c c4 e3 ba c3 c2 f0 a3 |1,.....2,.......|
00000010 0a 33 2c 6e 69 68 61 6f 0a |.3,nihao.|
00000019
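For anyone who wants to reproduce this without the original file, the 25 bytes in the hexdump can be written back out with a few lines of Python. This is a convenience sketch; `write_bad_gb` is a name invented here, and the target directory is whatever you pass in.

```python
from pathlib import Path

# Hex bytes copied from the hexdump of bad_gb.txt in this report.
BAD_GB_HEX = (
    "31 2c ca c0 bd e7 0a"
    "32 2c c4 e3 ba c3 c2 f0 a3 0a"
    "33 2c 6e 69 68 61 6f 0a"
)


def write_bad_gb(directory):
    """Create bad_gb.txt in `directory` and return its path."""
    path = Path(directory) / "bad_gb.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(bytes.fromhex(BAD_GB_HEX))
    return path
```

Calling `write_bad_gb("/tmp/www")` recreates the file at the path used in the COPY statement. For the external web table case, one option is to serve that directory with Python's built-in `python3 -m http.server` on a port of your choice and use that address in the LOCATION clause.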
Operating System
ubuntu 22.04
Anything else
No response
Are you willing to submit PR?
- Yes, I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct.