Implemented Oracle support and a minor speed improvement #1
Open
wschroeder wants to merge 37 commits into eisuke:master from wschroeder:master
Conversation
This removes the parentheses so that the type name can be treated separately from its size.
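For illustration only (the regex and variable names are mine, not the patch's), the kind of split this enables:

    # Separate a declared type like 'varchar2(255)' into a bare type name
    # and an optional size, so each piece can be handled on its own.
    my $declared = 'varchar2(255)';
    my ($type, $size) = $declared =~ /^(\w+)(?:\((\d+)\))?$/;
    # $type is 'varchar2'; $size is 255 (undef when no size is declared)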
Different databases support different sets of types and sometimes need to be treated differently. For example, Oracle has specific requirements on how blob data is passed into an INSERT, whereas other databases can accept it as a hex-encoded value.
…s to prepare(). This helps with security concerns and sets the stage for Oracle blob support. I ensured Postgres's blobs are bound correctly.
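As a sketch of the binding difference rather than the driver's exact code (the table, columns, and variables here are assumed), DBI lets each driver take a type hint on bind_param:

    use DBD::Pg     qw(:pg_types);    # PG_BYTEA
    use DBD::Oracle qw(:ora_types);   # ORA_BLOB

    # Assumes $dbh, $id, and $blob already exist.
    my $sth = $dbh->prepare('INSERT INTO attachments (id, body) VALUES (?, ?)');
    $sth->bind_param(1, $id);
    # PostgreSQL binds the blob as BYTEA ...
    $sth->bind_param(2, $blob, { pg_type => PG_BYTEA });
    # ... while Oracle needs an ORA_BLOB type hint on the same parameter:
    # $sth->bind_param(2, $blob, { ora_type => ORA_BLOB });
    $sth->execute;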
This requires changes to how the "where" SQL is generated. We still support the limit interface in the abstract API, but in Oracle's case the raw SQL generation translates it into the correct clauses. Oracle tests are coming in the next commit.
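For readers unfamiliar with the problem: Oracle (before 12c) has no LIMIT keyword, so a limited query must be rewrapped. A generic sketch of that translation, not the driver's actual SQL generation:

    # Emulate LIMIT/OFFSET with ROWNUM by double-wrapping the original query.
    sub limit_for_oracle {
        my ($sql, $limit, $offset) = @_;
        $offset ||= 0;
        return "SELECT * FROM ("
             . " SELECT inner_.*, ROWNUM rnum_ FROM ( $sql ) inner_"
             . " WHERE ROWNUM <= " . ($offset + $limit)
             . " ) WHERE rnum_ > $offset";
    }

    # LIMIT 10 OFFSET 20:
    my $wrapped = limit_for_oracle('SELECT id, name FROM users ORDER BY id', 10, 20);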
I copied tests from Pg and reduced them as necessary. I am not testing all possible types at this time; the remaining types should eventually be added, both to the tests and to the type/storage system.
Example for Oracle: MYSCHEMA.USER@DB2
Format: ${db_schema}.${table_name}@${connect_identifier}
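Composing such a name is a plain sprintf; this tiny helper is hypothetical but mirrors the format above:

    my $qualified = sprintf '%s.%s@%s', $db_schema, $table_name, $connect_identifier;
    # e.g. 'MYSCHEMA.USER@DB2'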
This required a few fundamental changes to LIMIT usage and parameter binding.
…op speed. In my environment, this dropped us from around 4 minutes for all() on 140K rows in session queries to closer to 3 minutes. The same query took about 12 seconds when constructed directly against the lower-level metadata API. I see room for more improvements like this in the critical loop (the map { callback } @$results in the iterator). On the other hand, anyone fetching 140K rows as mapped objects may be approaching the task incorrectly. Another thought is to extend the API a little:

    $query->execute->all_arrayref   # make all() thin by calling this and then map
    $query->execute->all_hashref    # derive column names, make hashes, no bless

Experiments revealed that this kind of to-the-metal query takes only about 3 seconds.
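A rough sketch of what those two proposed methods could look like, leaning on DBI's bulk fetchers (the package name and the sth field are assumptions; neither method exists yet):

    package OM::Query::Result;    # assumed package name, for illustration only
    use strict;
    use warnings;

    sub all_arrayref {
        my ($self) = @_;
        # Hand back DBI's rows as-is: no per-row object construction.
        return $self->{sth}->fetchall_arrayref;
    }

    sub all_hashref {
        my ($self) = @_;
        # Derive the column names once and build plain hashes: no bless.
        return $self->{sth}->fetchall_arrayref({});
    }

    1;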
Example load code:

    $mapper->metadata->table($table_name => {
        driver      => $mapper->engine->driver,
        column_info => [
            { 'name' => 'FOO_ID', 'default' => undef, 'type' => 'number', 'is_nullable' => 0, 'size' => '15' },
            { 'name' => 'BAR_ID', 'default' => undef, 'type' => 'number', 'is_nullable' => 0, 'size' => '15' },
            { 'name' => 'BAZ_ID', 'default' => undef, 'type' => 'number', 'is_nullable' => 0, 'size' => '10' },
        ],
        primary_key => ['BAR_ID', 'BAZ_ID', 'FOO_ID'],
        foreign_key => [
            { 'keys' => ['BAR_ID'], 'refs' => ['BAR_ID'], 'table' => 'BAR' },
            { 'keys' => ['FOO_ID'], 'refs' => ['FOO_ID'], 'table' => 'FOO' },
            { 'keys' => ['BAZ_ID'], 'refs' => ['BAZ_ID'], 'table' => 'BAZ' },
        ],
        unique_key => [['BAZR_I' => ['BAR_ID', 'BAZ_ID', 'FOO_ID']]],
    });
The bug was passing @_ to convert_table instead of $t.
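In outline, the fix was (surrounding code hypothetical):

    # Before: every argument the caller received leaked through.
    # $self->convert_table(@_);
    # After: pass only the table currently being processed.
    $self->convert_table($t);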
Although I may alias my tables, I still want to know the original table of each column. This is useful for relation-based typing, where I need to get the header ($query->builder->column) to interpret the results further. Side note: it is inconsistent that a different storage strategy is used for column aliases than for table aliases.
A code pattern emerges.
By default, OM supports mutable query building. clone() lets us extend a query without affecting the original. See 09_query/000_basic.t for a concrete example of how this works.
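A minimal sketch of the clone-then-extend pattern; the join arguments are lifted from the example in the next commit message and are illustrative:

    # One shared base query, branched without mutating the original.
    my $joined = $query->clone->add_join( [ $his_table => [ $id == 1 ] ] );
    # $query still has no join clause; only $joined does.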
The two new functions serve a few purposes.

First, they follow the principle of least surprise. The join() and add_join() functions default to "LEFT OUTER" joins, as opposed to SQL's default of "INNER". Since the new names are explicit, a user knows exactly what will be added to the join clauses.

Second, since the join type is in the function name, the parameters can be less awkward. Compare:

    $query->add_join(
        [$his_table => [$id == 1], 'INNER'],
        [$her_table => [$id == 2], 'INNER'],
    )

to:

    $query->add_inner_join(
        $his_table => [$id == 1],
        $her_table => [$id == 2],
    )

Third, there are really only two semantically distinct kinds of joins in SQL: inner and left. There is also "right", but it is rarely used and is a simple transform of the parameters. The two functions therefore narrow down the options; a user can still fall back to add_join() when other join types matter. Also, "left" always implies "outer", so there is no need to include that keyword.

Fourth, inner_join() and left_join() are deliberately absent. I noticed that when people play with the query-building API, they expect repeated method calls to ADD to the clause slots, not replace them. So, while respecting the original API's behavior, I have intentionally left out the slot-replacement variants, especially since the add_ forms work on empty slots. My general recommendation to my clients will be to always use "add_"; I believe adding should be the default behavior, with "replace_" as the less-used but still available alternative.
Example: $column->like("Foo\\_Thing")->escape("\\")
We found this necessary for migrating from our in-house system to OM, since we needed to share database handles and transactions.
…grade to DBD::Oracle 1.68
This package was not what the use cases actually called for. We needed some specialized, object-aware merging that preserves the original blessed references. I fixed the Column manipulations in Metadata::Table to use the more specific merge strategy and removed a package dependency (yay!).
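In outline, that sort of merge looks something like this standalone sketch (not the actual Metadata::Table code):

    # Merge override values into an existing blessed, hash-based object
    # in place, so every reference already pointing at it stays valid.
    sub merge_into {
        my ($target, $overrides) = @_;    # $target keeps its original bless
        $target->{$_} = $overrides->{$_} for keys %$overrides;
        return $target;                   # same reference, updated contents
    }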
Using Module::Find makes the system run slower because of the additional I/O, and it is simply magic. Considering that OM::Metadata::Sugar dynamically creates functions based on the types, we run the risk of a type sharing a name with a Perl built-in. This whole dynamic method-building plus module-finding combination tends to be a design smell, and this change is the first step towards a safer system. I initially ran into problems here because I had a global version of OM installed in addition to the link to my local repo lib, so Sugar found two classes for each type and spat out warnings about function redefinition. My first inclination was to use List::MoreUtils::uniq, but that would have fixed the symptom, not the problem.
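The first step, in spirit, is to state the type classes explicitly instead of discovering them; the class names below are assumptions, not the real module list:

    # Before: runtime discovery, extra I/O, and surprise loads.
    # use Module::Find;
    # my @types = useall 'OM::Metadata::Type';

    # After: explicit, boring, predictable.
    use OM::Metadata::Type::Number;    # assumed class name
    use OM::Metadata::Type::Text;      # assumed class name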
Looks like this entire .t file never passed. It should now.
…ns broke this test.
We were missing the method to map or the column we were mapping.
… not limit." This reverts commit 77adef1.
You may find two-nested-SELECT approaches online; ignore them. We have to support arbitrarily constructed aggregate queries, after all, and it is better to preserve such a query than to attempt to modify it with our additional WHERE conditions.
Oracle's ALL_CONSTRAINTS and ALL_CONS_COLUMNS views are not indexed or optimized for querying. DBD::Oracle's foreign_key_info can take anywhere from 4 to 18 seconds to gather the information for a single table, which is dreadfully unacceptable when the database has almost 1000 tables. The new algorithm sacrifices some client memory in exchange for an 11x speedup.
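The general idea, sketched outside the driver (the SQL and cache layout are illustrative): pull the whole owner's foreign-key catalog in one pass and answer per-table lookups from a hash.

    # One bulk scan of the data dictionary instead of a
    # foreign_key_info() round trip for every table.
    my %fk_by_table;
    my $sth = $dbh->prepare(q{
        SELECT c.table_name, cc.column_name, c.r_constraint_name
          FROM all_constraints c
          JOIN all_cons_columns cc
            ON cc.owner = c.owner AND cc.constraint_name = c.constraint_name
         WHERE c.owner = ? AND c.constraint_type = 'R'
    });
    $sth->execute($schema);
    while ( my ($table, $column, $ref_constraint) = $sth->fetchrow_array ) {
        push @{ $fk_by_table{$table} }, { column => $column, ref => $ref_constraint };
    }
    # Later lookups are hash reads, not dictionary queries.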
Hello!
Please accept my implementation of an Oracle driver. All the tests pass, and I have started to use this project successfully with our Oracle database.
Members of my group are excited about your project. The code is well-designed and clean with decent separation of concerns.
I also ran a performance test in which I downloaded 140K rows of data. The last commit description goes into some detail on what I saw and some ideas for future direction. One thing I forgot to mention there: the add_before_method_modifier call in Mapper.pm ultimately adds about 13 µs per $getter call in get_val(), which adds up in the tight loop during the 140K-row pull. I wonder whether there is a way to restructure the code to gather enough column information up front, before creating an object for each row. For comparison, DBIx::Class can map 140K rows to objects in about 8 seconds.
Another future thought is dumping autoloaded tables to files as a helpful schema-dump extension.
Please let me know your thoughts on these things as well as your vision of the future. I look forward to working with you to improve this project.