plan to support Oracle direct-path api for faster bulk inserts? #369

Open
gitpickle opened this issue Nov 14, 2019 · 20 comments

Comments

@gitpickle

Oracle provides a well-known mechanism, commonly referred to as the "Direct Path API", which appends rows to the end of a table for faster bulk insertion. Additionally, Oracle offers (for free) the ability to compress blocks written in this manner based on repeating values. The current cx_Oracle executemany code does not seem to use the Direct Path API or offer a way to use it. The API is commonly accessed through the /*+ append */ hint. Are there plans for cx_Oracle to natively support direct path insertion? Thanks for your help! Mike

@cjbj
Member

cjbj commented Nov 14, 2019

Data loading is something that SQL*Loader already does well.

We have no immediate plans to investigate using the Oracle Call Interface Direct Path Load API for cx_Oracle, but I know it would be nice to have.

@anthony-tuininga
Member

anthony-tuininga commented Nov 14, 2019

Just to be clear: using the /*+ append */ hint should work with cx_Oracle just as well as it does in SQL*Plus or any other client. The other method that Chris mentioned is a different API altogether, which bypasses some of the other overhead associated with binding data.
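
A minimal sketch of the hint-based route, not taken from the thread. It assumes a cx_Oracle-style connection exposing `cursor()`, `executemany()`, and `commit()`; the helper names `build_append_insert` and `load_rows` are hypothetical. It also assumes, per the Oracle SQL reference as best understood here, that `APPEND_VALUES` is the hint variant honored for `INSERT ... VALUES` with array binds (Oracle Database 11.2 and later), while `APPEND` targets `INSERT ... SELECT`. Whether the database actually performs a direct-path insert depends on table and session conditions, so the execution plan is worth checking.

```python
def build_append_insert(table, columns, hint="append_values"):
    """Build an INSERT ... VALUES statement carrying a direct-path hint.

    `table` and `columns` are interpolated as-is, so they must come from
    trusted code, never from user input.
    """
    col_list = ", ".join(columns)
    binds = ", ".join(f":{i + 1}" for i in range(len(columns)))
    return f"insert /*+ {hint} */ into {table} ({col_list}) values ({binds})"


def load_rows(connection, table, columns, rows, batch_size=10000):
    """Insert `rows` (a list of tuples) in batches and return the row count.

    Hypothetical helper: `connection` is assumed to behave like a
    cx_Oracle connection.
    """
    sql = build_append_insert(table, columns)
    cursor = connection.cursor()
    total = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        cursor.executemany(sql, batch)
        # After a direct-path insert, the session has to commit before it
        # can read or modify the same table again, so commit per batch.
        connection.commit()
        total += len(batch)
    return total
```

With a real connection this would be called as `load_rows(conn, "my_table", ["id", "name"], rows)`, where `my_table` and its columns are placeholders.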

@gitpickle
Author

gitpickle commented Nov 14, 2019 via email

@cjbj
Member

cjbj commented Nov 15, 2019

50% faster than executemany()? Do you have Python benchmark data?

The Oracle Call Interface direct path load API is not small, so learning, implementing, and testing it would be a substantial effort. It would need to be added to ODPI-C first, which @anthony-tuininga would have to do (we don't accept PRs for ODPI-C for various reasons). Then this would be wrapped in cx_Oracle calls. Typically Anthony would do the cx_Oracle API while doing the ODPI-C interface, since it makes testing easier.

Can you post a brainstormed cx_Oracle API and show how you would use it?

@gitpickle
Author

gitpickle commented Nov 15, 2019 via email

@cjbj
Member

cjbj commented Nov 15, 2019

Can you do a quick benchmark with executemany() in Python and compare that with your ETL direct path? I am very curious.
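
One way such a comparison could be timed, as a rough sketch rather than a rigorous benchmark. The function name `time_bulk_load` and its parameters are hypothetical; `load_fn` stands for whatever callable performs one complete load (for example a wrapper around `cursor.executemany()` plus `commit()`), and it is assumed to reset the target table itself between runs.

```python
import time


def time_bulk_load(load_fn, rows, repeats=3):
    """Run `load_fn(rows)` several times and report the best wall-clock time.

    Hypothetical helper: `load_fn` is assumed to perform one complete load,
    including the commit, and to leave the table ready for the next run.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        load_fn(rows)
        timings.append(time.perf_counter() - start)
    best = min(timings)
    # Guard against a zero-length interval on very fast fake loads.
    rate = len(rows) / best if best > 0 else float("inf")
    return {"rows": len(rows), "best_seconds": best, "rows_per_second": rate}
```

Running the same `rows` through an executemany-based `load_fn` and through the direct-path tool, on the same table and network, would give the two figures being asked for.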

@gitpickle
Author

gitpickle commented Nov 15, 2019 via email

@cjbj changed the title from "plan to support Oracle direct-path api (/*+ append */) for faster bulk inserts?" to "plan to support Oracle direct-path api for faster bulk inserts?" on Dec 4, 2019
@cjbj
Member

cjbj commented Dec 4, 2019

@gitpickle how did your benchmark go?

@gitpickle
Author

gitpickle commented Dec 11, 2019 via email

@cjbj
Member

cjbj commented Dec 11, 2019

The decision factors are the standard ones: (i) implementation complexity and maintenance costs; (ii) performance benefits; (iii) usability benefits; (iv) whether leaving the feature to the existing Oracle tools that already do it is a wiser allocation of responsibilities.

@gitpickle
Author

gitpickle commented Dec 11, 2019 via email

@cjbj
Member

cjbj commented Dec 11, 2019

@gitpickle since you have data, you could help with (ii).

@gitpickle
Author

gitpickle commented Dec 12, 2019 via email

@qianxuanyon

I would also like direct path loading to be supported, so that I don't have to give up cx_Oracle.

@gitpickle
Author

gitpickle commented May 8, 2021 via email

@cjbj
Member

cjbj commented May 8, 2021

I think so too. It's on my long wish-list, but it's a big project. API suggestions, test cases, etc. are welcome.

@qianxuanyon

Will cx_Oracle be able to compress data in transit when reading from or writing to a remote server, to increase the transfer speed?

@cjbj
Member

cjbj commented Jul 1, 2021

@qianxuanyon the feature hasn't been investigated yet.

@qianxuanyon

I currently have a scenario where a csv.gz file needs to be written to a remote database.
I have tried two approaches:

  1. Read the file locally and insert it into the remote database with cx_Oracle executemany
  2. Upload the file to a server near the database and then use cx_Oracle from there

The second method, even including the upload time, is much faster than the first.

So is it possible for cx_Oracle to compress data in transit to increase the transfer speed?
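
For reference, the first approach could look roughly like the sketch below. It only illustrates streaming a csv.gz file into `executemany()` in batches and does not compress anything on the wire. The helper names `iter_csv_gz_batches` and `load_csv_gz` are hypothetical, the file is assumed to be UTF-8 with a header row, and `connection` is assumed to behave like a cx_Oracle connection.

```python
import csv
import gzip
from itertools import islice


def iter_csv_gz_batches(path, batch_size=10000, skip_header=True):
    """Yield lists of row tuples from a gzip-compressed CSV file.

    The file is decompressed as a stream, so it never has to fit in memory.
    """
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        if skip_header:
            next(reader, None)
        while True:
            batch = [tuple(row) for row in islice(reader, batch_size)]
            if not batch:
                break
            yield batch


def load_csv_gz(connection, sql, path, batch_size=10000):
    """Insert every row of the file with one executemany() call per batch.

    Hypothetical helper: `sql` is an INSERT statement whose bind
    placeholders match the CSV columns. All values arrive as strings.
    """
    cursor = connection.cursor()
    total = 0
    for batch in iter_csv_gz_batches(path, batch_size):
        cursor.executemany(sql, batch)
        total += len(batch)
    connection.commit()
    return total
```

Each batch costs at least one network round trip, so a larger `batch_size` generally reduces the impact of latency between the client and a distant database, at the cost of more client memory.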

@cjbj
Member

cjbj commented Jul 2, 2021

@qianxuanyon that is a different topic to this enhancement request. Can you start a new issue?


4 participants