
Implemented Oracle support and a minor speed improvement #1

Open
wants to merge 37 commits into
base: master

Conversation

wschroeder

Hello!

Please accept my implementation of an Oracle driver. All the tests pass, and I have started to use this project successfully with our Oracle database.

Members of my group are excited about your project. The code is well-designed and clean with decent separation of concerns.

I also ran a performance test where I downloaded 140K rows of data. The last commit description goes into some detail about what I saw and some ideas for future direction. One thing I forgot to mention there: the add_before_method_modifier in Mapper.pm ultimately costs an additional 13 µs per $getter call in get_val(), which adds up in the tight loop during the 140K-row pull. I wonder whether the code could be restructured to gather enough column information before creating an object for each row. For comparison, DBIx::Class can map 140K rows to objects in about 8 seconds.
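To illustrate the kind of restructuring I have in mind, here is a minimal sketch (the Demo::RowInflator package and its method names are invented for this example, not part of OM): resolve the column layout once per result set, so the tight loop does a single hash slice per row instead of per-column $getter calls.

```perl
use strict;
use warnings;

package Demo::RowInflator;

# Resolve the column layout once per result set, outside the loop.
sub new {
    my ($class, @columns) = @_;
    return bless { columns => \@columns }, $class;
}

# Inflate one row with a single hash slice; no per-column method
# dispatch (and no method-modifier cost) inside the tight loop.
sub inflate_row {
    my ($self, $row) = @_;
    my %obj;
    @obj{ @{ $self->{columns} } } = @$row;
    return \%obj;
}

package main;

my $inflator = Demo::RowInflator->new(qw(FOO_ID BAR_ID BAZ_ID));
my @objects  = map { $inflator->inflate_row($_) } ([1, 2, 3], [4, 5, 6]);
```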

Another future thought is dumping autoloaded tables to files as a helpful schema-dump extension.

Please let me know your thoughts on these things as well as your vision of the future. I look forward to working with you to improve this project.

William Schroeder and others added 30 commits May 18, 2011 16:18
This removes the ()s so that we can treat the type name separately from its
size.
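For example, a declared type such as NUMBER(15) might be split like this (parse_type is a hypothetical helper written for this note, not the commit's actual code):

```perl
use strict;
use warnings;

# Split a declared type like "NUMBER(15)" into a bare type name and an
# optional size, so the name can be handled separately from the size.
sub parse_type {
    my ($declared) = @_;
    my ($name, $size) =
        $declared =~ /^(\w+)\s*(?:\(\s*(\d+)(?:\s*,\s*\d+)?\s*\))?$/;
    return (lc $name, $size);
}

my ($name, $size) = parse_type('NUMBER(15)');   # ('number', 15)
my ($bare)        = parse_type('DATE');         # ('date', undef size)
```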
Different databases support different sets of types and sometimes need to be
treated differently.  For example, Oracle has some requirements on how to pass
blob data into an insert, whereas other databases can accept it as a hexed
value.
…s to prepare().

This helps with security concerns and sets the stage for Oracle blob support.
I ensured Postgres's blobs are bound correctly.
This requires changes to how "where" SQL is generated.  We still support the
limit interface in the abstract API, but in Oracle's case, the raw SQL
generation will translate to the correct sections.

Oracle tests are coming in the next commit.
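Since Oracle (before 12c) has no LIMIT keyword, the usual translation wraps the query in a ROWNUM subselect. A sketch of that translation, with limit_sql as a hypothetical helper rather than OM's actual generator:

```perl
use strict;
use warnings;

# Wrap an arbitrary SELECT so only rows ($offset+1 .. $offset+$limit)
# come back; this is the classic pre-12c Oracle pagination pattern.
sub limit_sql {
    my ($sql, $limit, $offset) = @_;
    $offset ||= 0;
    my $upper = $offset + $limit;
    return "SELECT * FROM ("
         . "SELECT inner_q.*, ROWNUM AS om_rnum FROM ($sql) inner_q "
         . "WHERE ROWNUM <= $upper"
         . ") WHERE om_rnum > $offset";
}

my $paged = limit_sql('SELECT id FROM sessions', 10, 20);  # rows 21..30
```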
I copied tests from Pg and reduced them as necessary.  I am not testing all
possible types at this time, and they should eventually be added, both to the
tests and to the type/storage system.
Example for Oracle: MYSCHEMA.USER@DB2
Format: ${db_schema}.${table_name}@${connect_identifier}
This required a few fundamental changes to LIMIT usage and parameter binding.
…op speed.

In my environment, this dropped us from around 4 minutes for all() 140K rows
in session queries to closer to 3 minutes.  The same query took about 12
seconds when constructed directly from the lower level metadata API.

I saw room for more improvements like this among the functions performed
in the critical loop (the map { callback } @$results in the iterator).  On
the other hand, if someone is fetching 140K rows as mapped objects, they
might be approaching the task incorrectly.

Another thought is to extend the API a little:

  $query->execute->all_arrayref  # make all() thin by calling this and then map
  $query->execute->all_hashref   # derive column names, make hashes, no bless

Experiments revealed that this kind of to-the-metal query only takes 3 seconds.
Example load code:

  $mapper->metadata->table($table_name => {
       driver => $mapper->engine->driver,
       column_info => [
           {
               'name'        => 'FOO_ID',
               'default'     => undef,
               'type'        => 'number',
               'is_nullable' => 0,
               'size'        => '15'
           },
           {
               'name'        => 'BAR_ID',
               'default'     => undef,
               'type'        => 'number',
               'is_nullable' => 0,
               'size'        => '15'
           },
           {
               'name'        => 'BAZ_ID',
               'default'     => undef,
               'type'        => 'number',
               'is_nullable' => 0,
               'size'        => '10'
           }
       ],
       primary_key => ['BAR_ID', 'BAZ_ID', 'FOO_ID'],
       foreign_key => [
           {
               'keys'  => ['BAR_ID'],
               'refs'  => ['BAR_ID'],
               'table' => 'BAR'
           },
           {
               'keys'  => ['FOO_ID'],
               'refs'  => ['FOO_ID'],
               'table' => 'FOO'
           },
           {
               'keys'  => ['BAZ_ID'],
               'refs'  => ['BAZ_ID'],
               'table' => 'BAZ'
           }
       ],
       unique_key => [['BAZR_I' => ['BAR_ID', 'BAZ_ID', 'FOO_ID']]],
  });
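A sketch of how thin those proposed calls could be: DBI's fetchall_arrayref({}) already returns plain, unblessed hashrefs keyed by column name. Demo::Result and the stand-in statement handle below are invented so the sketch runs without a database; in OM the handle would come from the executed query.

```perl
use strict;
use warnings;

package Demo::Result;

# all_hashref: column-named hashes, no blessing, no mapper work per row.
sub all_hashref  { return $_[0]->{sth}->fetchall_arrayref({}) }

# all_arrayref: the cheapest possible shape, raw arrayrefs.
sub all_arrayref { return $_[0]->{sth}->fetchall_arrayref }

# Stand-in for an executed DBI statement handle, for illustration only.
package Demo::FakeSth;

sub fetchall_arrayref {
    my ($self, $slice) = @_;
    my @rows = ([1, 'alice'], [2, 'bob']);
    return ref $slice eq 'HASH'
        ? [ map { { ID => $_->[0], NAME => $_->[1] } } @rows ]
        : [@rows];
}

package main;

my $result = bless { sth => bless({}, 'Demo::FakeSth') }, 'Demo::Result';
my $hashes = $result->all_hashref;
my $arrays = $result->all_arrayref;
```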
The bug was passing @_ to convert_table instead of $t.
Although I may alias my tables, I still want to know the original table of
each of the columns.  This is useful for relation-based typing, wherein I need
to get the header ($query->builder->column) to further interpret the results.

Side note: It is inconsistent that a different storage strategy is used for
column aliases vs table aliases.
By default, OM supports mutable query building.  clone() allows us to extend
queries without affecting the original queries.  See 09_query/000_basic.t for
a specific example of how this works.
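The pattern in use looks roughly like this (Demo::Query and its internals are invented for the example; clone() and add_where() stand in for OM's real methods):

```perl
use strict;
use warnings;

package Demo::Query;

sub new { return bless { where => [ @{ $_[1] || [] } ] }, $_[0] }

# clone() copies the clause slots so extensions leave the original alone.
sub clone { return ref($_[0])->new($_[0]->{where}) }

sub add_where { push @{ $_[0]{where} }, $_[1]; return $_[0] }

package main;

my $base     = Demo::Query->new(['active = 1']);
my $extended = $base->clone->add_where('created_at > SYSDATE - 7');
# $base keeps one condition; $extended carries two.
```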
The two new functions serve a few purposes.

First, they follow the principle of least surprise.  The join() and add_join()
functions default to "LEFT OUTER" joins, as opposed to SQL's default of "INNER".
Since the new names are explicit, a user knows exactly what will be added to the
join clauses.

Second, since the join type is in the function name, the parameters may be
less awkward.  Compare:

  $query->add_join(
      [$his_table => [$id == 1], 'INNER'],
      [$her_table => [$id == 2], 'INNER'],
  )

to:

  $query->add_inner_join(
      $his_table => [$id == 1],
      $her_table => [$id == 2],
  )

Third, there are really only two semantically distinct kinds of joins in SQL:
inner and left.  There is also "right", but it is rarely used and a simple
transform of the parameters.  Thus, the two functions serve the purpose of
narrowing down the options.  A user may fall back to using "add_join" instead
if other joins are important.  Also, "left" always implies "outer", so there
is no need to include the keyword.

Fourth, there is a distinct lack of inner_join() and left_join() functions.  I
noticed that when people play with the query-building API, they expect fresh
method calls to ADD to the clause slots, not replace them.
So, while respecting the original API's behavior, I have intentionally
neglected the slot-replacement, especially since adds work on empty slots.  My
general recommendation to my clients will be to always use "add_", which I
believe should be a default behavior with "replace_" being the less-used but
still available behavior.
Example:

  $column->like("Foo\\_Thing")->escape("\\")
We found this necessary for migrating from our in-house system to OM, since we
needed to share handles and transactions.
This package was not what the use cases actually called for.  We needed some
specialized object-specific merging that preserved the original blessed
references.  Fixed the Column manipulations in the Metadata::Table to use the
more specific merge strategy and removed a package dependency (yay!).
Using Module::Find makes the system run slower with additional I/O, and it is
simply magic.  Considering that OM::Metadata::Sugar dynamically creates
functions based on the types, we are running the risk of a type with the same
name as a Perl built-in.  This entire dynamic method-building plus
module-finding tends to be a design smell.  This change is the first step
towards a safer system.

I initially ran into problems with this because I had a global version of OM
installed in addition to linking to my local repo lib, so Sugar actually found
two classes for each type and spit out warnings about function redefinition.
My first inclination was to use List::MoreUtils::uniq, but that fixed the
symptom, not the problem.
Looks like this entire .t file never passed.  It should now.
William Schroeder added 7 commits June 26, 2014 09:04
We were missing the method to map or the column we were mapping.
You may find two-nested-select workarounds online; ignore them.  We have to
support arbitrarily constructed aggregate queries, after all.  Better to
preserve such a query than to attempt to modify it with our additional WHERE
conditions.
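In other words, the safe transform is to wrap the aggregate query and apply the extra conditions outside it; a minimal sketch (wrap_with_where is a hypothetical helper):

```perl
use strict;
use warnings;

# Apply extra WHERE conditions by wrapping the original query in an
# outer SELECT, leaving the (possibly aggregate) inner SQL untouched.
sub wrap_with_where {
    my ($sql, $condition) = @_;
    return "SELECT * FROM ($sql) wrapped_q WHERE $condition";
}

my $wrapped = wrap_with_where(
    'SELECT dept, COUNT(*) AS n FROM emp GROUP BY dept',
    'n > 10',
);
```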
Oracle's ALL_CONSTRAINTS and ALL_CONS_COLUMNS views are not indexed or
optimized for querying.  DBD::Oracle's foreign_key_info can take anywhere from
4 to 18 seconds to get information for a single table, which is dreadfully
unacceptable when one has a database with almost 1000 tables.  The new
algorithm sacrifices some client memory in exchange for an 11x speedup.
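The bulk approach amounts to one dictionary query per schema plus an in-memory index. The SQL below reads Oracle's real ALL_CONSTRAINTS/ALL_CONS_COLUMNS views, but index_constraints() and the cache layout are illustrative, not the commit's exact code:

```perl
use strict;
use warnings;

# One query pulls every foreign-key ('R') constraint for a schema;
# per-table lookups afterwards are hash fetches, not dictionary scans.
my $bulk_sql = q{
    SELECT c.table_name, cc.column_name, c.r_constraint_name
      FROM all_constraints c
      JOIN all_cons_columns cc
        ON cc.owner = c.owner
       AND cc.constraint_name = c.constraint_name
     WHERE c.owner = ?
       AND c.constraint_type = 'R'
};

# Group the single result set by table name.
sub index_constraints {
    my ($rows) = @_;    # arrayref of [table, column, referenced constraint]
    my %by_table;
    push @{ $by_table{ $_->[0] } }, { column => $_->[1], refs => $_->[2] }
        for @$rows;
    return \%by_table;
}

my $cache = index_constraints([
    ['FOO', 'BAR_ID', 'BAR_PK'],
    ['FOO', 'BAZ_ID', 'BAZ_PK'],
]);
```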