
Implemented Oracle support and a minor speed improvement #1

Open
wants to merge 37 commits into
base: master

Conversation

wschroeder

Hello!

Please accept my implementation of an Oracle driver. All the tests pass, and I have started to use this project successfully with our Oracle database.

Members of my group are excited about your project. The code is well-designed and clean with decent separation of concerns.

I also ran a performance test where I downloaded 140K rows of data. The last commit description goes into some detail about what I saw and some ideas for future direction. One thing I forgot to mention there: the add_before_method_modifier in Mapper.pm ultimately costs an additional 13 µs per $getter call in get_val(), which adds up in the tight loop during the 140K-row pull. I wonder whether the code could be restructured to gather enough column information before creating an object for each row. For comparison, DBIx::Class can map 140K rows to objects in about 8 seconds.
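To illustrate the kind of restructuring I have in mind, here is a minimal sketch (the Demo::RowInflator package and its method names are invented for this example, not part of OM): resolve the column layout once per result set, so the tight loop does a single hash slice per row instead of per-column $getter calls.

```perl
use strict;
use warnings;

package Demo::RowInflator;

# Resolve the column layout once per result set, outside the loop.
sub new {
    my ($class, @columns) = @_;
    return bless { columns => \@columns }, $class;
}

# Inflate one row with a single hash slice; no per-column method
# dispatch (and no method-modifier cost) inside the tight loop.
sub inflate_row {
    my ($self, $row) = @_;
    my %obj;
    @obj{ @{ $self->{columns} } } = @$row;
    return \%obj;
}

package main;

my $inflator = Demo::RowInflator->new(qw(FOO_ID BAR_ID BAZ_ID));
my @objects  = map { $inflator->inflate_row($_) } ([1, 2, 3], [4, 5, 6]);
```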

Another future thought is dumping autoloaded tables to files as a helpful schema-dump extension.

Please let me know your thoughts on these things as well as your vision of the future. I look forward to working with you to improve this project.

William Schroeder and others added 30 commits May 18, 2011 16:18
This removes the ()s so that we can treat the type name separately from its
size.
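For example, a declared type such as NUMBER(15) might be split like this (parse_type is a hypothetical helper written for this note, not the commit's actual code):

```perl
use strict;
use warnings;

# Split a declared type like "NUMBER(15)" into a bare type name and an
# optional size, so the name can be handled separately from the size.
sub parse_type {
    my ($declared) = @_;
    my ($name, $size) =
        $declared =~ /^(\w+)\s*(?:\(\s*(\d+)(?:\s*,\s*\d+)?\s*\))?$/;
    return (lc $name, $size);
}

my ($name, $size) = parse_type('NUMBER(15)');   # ('number', 15)
my ($bare)        = parse_type('DATE');         # ('date', undef size)
```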
Different databases support different sets of types and sometimes need to be
treated differently.  For example, Oracle has some requirements on how to pass
blob data into an insert, whereas other databases can accept it as a hexed
value.
…s to prepare().

This helps with security concerns and sets the stage for Oracle blob support.
I ensured Postgres's blobs are bound correctly.
This requires changes to how "where" SQL is generated.  We still support the
limit interface in the abstract API, but in Oracle's case, the raw SQL
generation will translate to the correct sections.

Oracle tests are coming in the next commit.
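Since Oracle (before 12c) has no LIMIT keyword, the usual translation wraps the query in a ROWNUM subselect. A sketch of that translation, with limit_sql as a hypothetical helper rather than OM's actual generator:

```perl
use strict;
use warnings;

# Wrap an arbitrary SELECT so only rows ($offset+1 .. $offset+$limit)
# come back; this is the classic pre-12c Oracle pagination pattern.
sub limit_sql {
    my ($sql, $limit, $offset) = @_;
    $offset ||= 0;
    my $upper = $offset + $limit;
    return "SELECT * FROM ("
         . "SELECT inner_q.*, ROWNUM AS om_rnum FROM ($sql) inner_q "
         . "WHERE ROWNUM <= $upper"
         . ") WHERE om_rnum > $offset";
}

my $paged = limit_sql('SELECT id FROM sessions', 10, 20);  # rows 21..30
```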
I copied tests from Pg and reduced them as necessary.  I am not testing all
possible types at this time, and they should eventually be added, both to the
tests and to the type/storage system.
Example for Oracle: MYSCHEMA.USER@DB2
Format: ${db_schema}.${table_name}@${connect_identifier}
This required a few fundamental changes to LIMIT usage and parameter binding.
…op speed.

In my environment, this dropped us from around 4 minutes for all() 140K rows
in session queries to closer to 3 minutes.  The same query took about 12
seconds when constructed directly from the lower level metadata API.

I saw room for more improvements like this among the functions performed
in the critical loop (the map { callback } @$results in the iterator).  On
the other hand, if someone is fetching 140K rows as mapped objects, they
might be approaching the task incorrectly.

Another thought is to extend the API a little:

  $query->execute->all_arrayref  # make all() thin by calling this and then map
  $query->execute->all_hashref   # derive column names, make hashes, no bless

Experiments revealed that this kind of to-the-metal query only takes 3 seconds.
Example load code:

  $mapper->metadata->table($table_name => {
       driver => $mapper->engine->driver,
       column_info => [
           {
               'name'        => 'FOO_ID',
               'default'     => undef,
               'type'        => 'number',
               'is_nullable' => 0,
               'size'        => '15'
           },
           {
               'name'        => 'BAR_ID',
               'default'     => undef,
               'type'        => 'number',
               'is_nullable' => 0,
               'size'        => '15'
           },
           {
               'name'        => 'BAZ_ID',
               'default'     => undef,
               'type'        => 'number',
               'is_nullable' => 0,
               'size'        => '10'
           }
       ],
       primary_key => ['BAR_ID', 'BAZ_ID', 'FOO_ID'],
       foreign_key => [
           {
               'keys'  => ['BAR_ID'],
               'refs'  => ['BAR_ID'],
               'table' => 'BAR'
           },
           {
               'keys'  => ['FOO_ID'],
               'refs'  => ['FOO_ID'],
               'table' => 'FOO'
           },
           {
               'keys'  => ['BAZ_ID'],
               'refs'  => ['BAZ_ID'],
               'table' => 'BAZ'
           }
       ],
       unique_key => [['BAZR_I' => ['BAR_ID', 'BAZ_ID', 'FOO_ID']]],
  });
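A sketch of how thin those proposed calls could be: DBI's fetchall_arrayref({}) already returns plain, unblessed hashrefs keyed by column name. Demo::Result and the stand-in statement handle below are invented so the sketch runs without a database; in OM the handle would come from the executed query.

```perl
use strict;
use warnings;

package Demo::Result;

# all_hashref: column-named hashes, no blessing, no mapper work per row.
sub all_hashref  { return $_[0]->{sth}->fetchall_arrayref({}) }

# all_arrayref: the cheapest possible shape, raw arrayrefs.
sub all_arrayref { return $_[0]->{sth}->fetchall_arrayref }

# Stand-in for an executed DBI statement handle, for illustration only.
package Demo::FakeSth;

sub fetchall_arrayref {
    my ($self, $slice) = @_;
    my @rows = ([1, 'alice'], [2, 'bob']);
    return ref $slice eq 'HASH'
        ? [ map { { ID => $_->[0], NAME => $_->[1] } } @rows ]
        : [@rows];
}

package main;

my $result = bless { sth => bless({}, 'Demo::FakeSth') }, 'Demo::Result';
my $hashes = $result->all_hashref;
my $arrays = $result->all_arrayref;
```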
The bug was passing @_ to convert_table instead of $t.
Although I may alias my tables, I still want to know the original table of
each of the columns.  This is useful for relation-based typing, wherein I need
to get the header ($query->builder->column) to further interpret the results.

Side note: It is inconsistent that a different storage strategy is used for
column aliases vs table aliases.
By default, OM supports mutable query building.  clone() allows us to extend
queries without affecting the original queries.  See 09_query/000_basic.t for
a specific example of how this works.
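The pattern in use looks roughly like this (Demo::Query and its internals are invented for the example; clone() and add_where() stand in for OM's real methods):

```perl
use strict;
use warnings;

package Demo::Query;

sub new { return bless { where => [ @{ $_[1] || [] } ] }, $_[0] }

# clone() copies the clause slots so extensions leave the original alone.
sub clone { return ref($_[0])->new($_[0]->{where}) }

sub add_where { push @{ $_[0]{where} }, $_[1]; return $_[0] }

package main;

my $base     = Demo::Query->new(['active = 1']);
my $extended = $base->clone->add_where('created_at > SYSDATE - 7');
# $base keeps one condition; $extended carries two.
```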
The two new functions serve a few purposes.

First, they follow the principle of least surprise.  The join() and add_join()
functions default to "LEFT OUTER" joins, as opposed to SQL's default of "INNER".
Since the new names are explicit, a user knows exactly what will be added to the
join clauses.

Second, since the join type is in the function name, the parameters may be
less awkward.  Compare:

  $query->add_join(
      [$his_table => [$id == 1], 'INNER'],
      [$her_table => [$id == 2], 'INNER'],
  )

to:

  $query->add_inner_join(
      $his_table => [$id == 1],
      $her_table => [$id == 2],
  )

Third, there are really only two semantically distinct kinds of joins in SQL:
inner and left.  There is also "right", but it is rarely used and a simple
transform of the parameters.  Thus, the two functions serve the purpose of
narrowing down the options.  A user may fall back to using "add_join" instead
if other joins are important.  Also, "left" always implies "outer", so there
is no need to include the keyword.

Fourth, there is a distinct lack of inner_join() and left_join() functions.  I
noticed that when people play with the query-building API, they expect fresh
method calls to ADD to the clause slots, not replace them.
So, while respecting the original API's behavior, I have intentionally
neglected the slot-replacement, especially since adds work on empty slots.  My
general recommendation to my clients will be to always use "add_", which I
believe should be a default behavior with "replace_" being the less-used but
still available behavior.
Example:

  $column->like("Foo\\_Thing")->escape("\\")
We found this necessary for migrating from our in-house system to OM, since we
needed to share handles and transactions.
This package was not what the use cases actually called for.  We needed some
specialized object-specific merging that preserved the original blessed
references.  Fixed the Column manipulations in the Metadata::Table to use the
more specific merge strategy and removed a package dependency (yay!).
Using Module::Find makes the system run slower with additional I/O, and it is
simply magic.  Considering that OM::Metadata::Sugar dynamically creates
functions based on the types, we are running the risk of a type with the same
name as a Perl built-in.  This entire dynamic method-building plus
module-finding tends to be a design smell.  This change is the first step
towards a safer system.

I initially ran into problems with this because I had a global version of OM
installed in addition to linking to my local repo lib, so Sugar actually found
two classes for each type and spit out warnings about function redefinition.
My first inclination was to use List::MoreUtils::uniq, but that fixed the
symptom, not the problem.
Looks like this entire .t file never passed.  It should now.
William Schroeder added 7 commits June 26, 2014 09:04
We were missing the method to map or the column we were mapping.
You may find two-nested-select workarounds online; ignore them.  We have to
support arbitrarily constructed aggregate queries, after all.  Better to
preserve such a query than to attempt to modify it with our additional WHERE
conditions.
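In other words, the safe transform is to wrap the aggregate query and apply the extra conditions outside it; a minimal sketch (wrap_with_where is a hypothetical helper):

```perl
use strict;
use warnings;

# Apply extra WHERE conditions by wrapping the original query in an
# outer SELECT, leaving the (possibly aggregate) inner SQL untouched.
sub wrap_with_where {
    my ($sql, $condition) = @_;
    return "SELECT * FROM ($sql) wrapped_q WHERE $condition";
}

my $wrapped = wrap_with_where(
    'SELECT dept, COUNT(*) AS n FROM emp GROUP BY dept',
    'n > 10',
);
```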
Oracle's ALL_CONSTRAINTS and ALL_CONS_COLUMNS views are not indexed or
optimized for querying.  DBD::Oracle's foreign_key_info can take anywhere from
4 to 18 seconds to get information for a single table, which is dreadfully
unacceptable when one has a database with almost 1000 tables.  The new
algorithm sacrifices some client memory in exchange for an 11x speedup.
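The bulk approach amounts to one dictionary query per schema plus an in-memory index. The SQL below reads Oracle's real ALL_CONSTRAINTS/ALL_CONS_COLUMNS views, but index_constraints() and the cache layout are illustrative, not the commit's exact code:

```perl
use strict;
use warnings;

# One query pulls every foreign-key ('R') constraint for a schema;
# per-table lookups afterwards are hash fetches, not dictionary scans.
my $bulk_sql = q{
    SELECT c.table_name, cc.column_name, c.r_constraint_name
      FROM all_constraints c
      JOIN all_cons_columns cc
        ON cc.owner = c.owner
       AND cc.constraint_name = c.constraint_name
     WHERE c.owner = ?
       AND c.constraint_type = 'R'
};

# Group the single result set by table name.
sub index_constraints {
    my ($rows) = @_;    # arrayref of [table, column, referenced constraint]
    my %by_table;
    push @{ $by_table{ $_->[0] } }, { column => $_->[1], refs => $_->[2] }
        for @$rows;
    return \%by_table;
}

my $cache = index_constraints([
    ['FOO', 'BAR_ID', 'BAR_PK'],
    ['FOO', 'BAZ_ID', 'BAZ_PK'],
]);
```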