perf(python): optimize pystr deserialize perf #2007
Force-pushed from 8ba4b1b to 6f0a64b
This code is very efficient, very nice!
Maybe we can factor out the repetitive code.
// Handle remaining elements
for (; i < length; i++) {
    if (arr[i] > max_sse) {
        max_sse = arr[i];
    }
}
It's just the way it's written. It's nothing serious.
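One way to read the suggestion above is to move the scalar tail into a single helper that every vectorized variant calls. The real code is C, so the following is only a Python model of that structure, not the PR's implementation; `LANES`, `scalar_max`, and `blocked_max` are hypothetical names, and the block loop merely stands in for the SIMD body.

```python
from typing import Sequence

LANES = 16  # assumed block width, e.g. 16 bytes per 128-bit register


def scalar_max(arr: Sequence[int], start: int, current: int) -> int:
    """Shared tail: scan arr[start:] one element at a time."""
    for value in arr[start:]:
        if value > current:
            current = value
    return current


def blocked_max(arr: Sequence[int], lanes: int = LANES) -> int:
    """Max of non-negative values: whole blocks first, then the shared tail."""
    current = 0
    i = 0
    # The block loop stands in for the SIMD body; max() models one vector step.
    while i + lanes <= len(arr):
        current = max(current, max(arr[i:i + lanes]))
        i += lanes
    # Handle remaining elements through the single shared helper.
    return scalar_max(arr, i, current)
```

With this shape, each variant that differs only in its block loop would reuse `scalar_max` instead of repeating the tail loop.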
cdef const char * buf = <const char *>(self.c_buffer.get().data() + self.reader_index)
self.reader_index += size
cdef uint32_t encoding = header & <uint32_t>0b11
if encoding == 0:
    # PyUnicode_FromASCII
    return PyUnicode_DecodeLatin1(buf, size, "strict")
    return <unicode>Fury_PyUnicode_FromUCS1(buf, size)
    # return PyUnicode_DecodeLatin1(buf, size, "strict")
If I use PyUnicode_DecodeLatin1 directly here, it's faster on macOS, which is unexpected since my implementation uses SIMD. If I invoke PyUnicode_DecodeLatin1 directly inside PyUnicode_FromUCS1, it's slower too. @penguin-wwy do you have any ideas?
Could you describe the testing method? The tests I wrote myself do not have this issue.
# integration_tests/cpython_benchmark/fury_benchmark.py
STRING = "sjuveaibngurbzsivbrubiasb3r93284r92r1209130r0fa;2''j93r2nfln''[]\=-_+/,./!@$#%^&*()i9124u0hpq[jnzj0r9h034-2iu1058]"

def micro_benchmark():
    runner.bench_func(
        "fury_string", fury_object, language, not args.no_ref, STRING
    )
    runner.bench_func(
        "fury_large_string", fury_object, language, not args.no_ref, STRING * 10000
    )
Using PyUnicode_FromUCS1:
fury_string: Mean +- std dev: 54.7 us +- 2.5 us
fury_large_string: Mean +- std dev: 255 us +- 24 us
Using Fury_PyUnicode_FromUCS1:
fury_string: Mean +- std dev: 53.8 us +- 2.0 us
fury_large_string: Mean +- std dev: 236 us +- 6 us
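To put the two runs side by side, the relative change can be computed from the reported means. This is a small sketch rather than part of the benchmark script; `relative_change` is a hypothetical helper, and the numbers are copied from the output above.

```python
def relative_change(baseline_us: float, candidate_us: float) -> float:
    """Fraction by which the candidate is faster (negative if slower)."""
    return (baseline_us - candidate_us) / baseline_us


# Means in microseconds: (PyUnicode_FromUCS1, Fury_PyUnicode_FromUCS1).
results = {
    "fury_string": (54.7, 53.8),
    "fury_large_string": (255.0, 236.0),
}

for name, (baseline, candidate) in results.items():
    print(f"{name}: {relative_change(baseline, candidate):.1%} faster")
```

This works out to roughly 1.6% for the short string and 7.5% for the large one. Both gaps are smaller than the standard deviation reported for the baseline run (2.5 us and 24 us), so they are best read as indicative rather than conclusive.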
What does this PR do?
This PR implements an optimized version of PyUnicode_FromUCS1/Fury_PyUnicode_FromUCS2 for faster performance by:
Related issues
Does this PR introduce any user-facing change?
Benchmark