-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MDEV-34870 - Join order optimizer hints #3878
base: bb-11.8-MDEV-35504-opt-hints-v2
Are you sure you want to change the base?
MDEV-34870 - Join order optimizer hints #3878
Conversation
…AYED Post-fix: remove unnecessary warning messages when wrep is not used.
This backports mysql/mysql-server@e2f685972985, which works around perl issue Perl/perl5#17570 MySQL fix ensures that FILE_APPEND_DATA is used in redirected stdout/stderr open flags, rather than FILE_GENERIC_WRITE. Only this gives lossless append behavior, if multiple processes use the same file.
…ffset in my_casedn_utf8mb3 The functions MY_CHARSET_HANDLER::caseup() and MY_CHARSET_HANDLER::casedn() in their virtual imlementations do "const char *end= src + srclen" in the very beginning. Therefore src cannot be NULL to avoid "UBSAN: SUMMARY: UndefinedBehaviorSanitizer: nullptr-with-offset". Adding DBUG_ASSERT(src != NULL) into all virtual implementations, to catch this problem in regular Debug builds (without UBSAN). Fixing Master_info_index::get_master_info() to check connection_name->str. If it is NULL then passing empty_clex_str into IdentBufferCasedn instead of *connection_name.
…* trx_t::check_bulk_buffer(dict_table_t*), Assertion `table->skip_alter_undo || !check_unique_secondary' failed in trx_t::check_bulk_buffer unlike CREATE TABLE, CREATE SEQUENCE always causes an implicit commit, even for temporary sequences.
…rch further does not work Since VECTOR data type is based on VARCHAR, ALTER TABLE expanding VECTOR column was allowed to go with ALGORITHM=INPLACE. However InnoDB sees such columns as VARCHAR and thus it doesn't pad updated columns data to new length. In contrast to ALGORITHM=COPY, which goes with field copy routines. With this patch ALGORITHM=INPLACE is not allowed for VECTOR columns.
Also factor out the json parsing steps into a function.
Since bulk load is fixed, in some newer versions of the server, reenable it, after checking whether MDEV-34703 was really fixed in server.
prepare for MDEV-35450 VEC_DISTANCE auto-detection
…t_arg The problem was caused by this scenario: The query had both SELECT DISTINCT and ORDER BY. DISTINCT was converted into GROUP BY. Then, vector index was used to resolve the GROUP BY. When join_read_first() initialized vector index scan, it used the ORDER BY clause instead of GROUP BY, which caused a crash. Fixed by making test_if_skip_sort_order() remember which ordering the scan produces in JOIN_TAB::full_index_scan_order, and join_read_first() using that.
…l mode - MDEV-34392(commit cc810e6) adds the check for nullability of foreign key column when foreign key relation is of UPDATE_CASCADE or UPDATE SET NULL. This check makes DDL fail when it violates foreign key nullability. This patch basically does the nullability check for foreign key column only for strict sql mode
Two-Phase ALTER added a sa_seq_no field, but `Gtid_log_event::write()`'s size calculation doesn't have an addend in its name. This patch resizes the buffer to match `write()`'s code.
partion_engine_name was not reset when looping over tables in mysql_rm_table_no_locks. This could cause maria_backup to think that at normal droped table was partitioned. This issue was discovered in 11.8 as part of atomic created and replace and only the fix was backported.
It is not clear that the three Compare_key enums form a hierarchy like alter algorithms do. So on the safe side (in case MDEV-22168 gets implemented without looking at this code) we should return NotEqual if partition engines do not return the same enum value. There's a static merge function that merges these enums, but it is used to merge the Compare_keys of different key parts (horizontally), rather than different partitions (vertically).
The conn_kind, which stands for "connection kind", is no longer useful because the HandlerSocket support is deleted and Spider now has only one connection kind, SPIDER_CONN_KIND_MYSQL. Remove conn_kind and related code. Signed-off-by: Yuchen Pei <yuchen.pei@mariadb.com> Reviewed-by: Nayuta Yanagisawa <nayuta.yanagisawa@mariadb.com>
Each spider connection is identified with a connection key, which is an encoding of the backend parameters. The first byte of the key is by default 0, and in rare circumstances it is changed to a different value: when semi_table_lock is set to 1; and when using casual read. When this happens, often a new connection is created with the new key. Neither case is useful: the description of semi_table_lock has nothing to do with creation of new connections and the parameter itself was deprecated for 10.7+ (MDEV-28829) and marked for deletion (MDEV-28830); while new threads created by non-zero spider_casual_read causes only threads to be idle, thus not achieving any gain, see MDEV-26151, and the param has also been deprecated in 11.5+ (MDEV-31789). The relevant code adds unnecessary complexity to the spider code. This change does not reduce parallelism, because already when bgs mode is on a background thread is created per partition, and there is no evidence spider creates multiple threads for one partition. If the needs of such cases arise it will be a separate issue.
Factoring out parts and documentation. Does not change the logic of the code.
Update conn->queue_connect_share in spider_check_trx_and_get_conn to avoid use-after-free. There are two branches in spider_check_trx_and_get_conn, often called at the beginning of a spider DML, depending on whether an update of various spider fields is needed. If it is determined to be needed, the updating may NULL the connections associated with the spider handler, which subsequently causes a call to spider_get_conn() which updates conn->queued_connect_share with the SPIDER_SHARE associated with the spider handler. We make it so that conn->queued_connect_share is updated regardless of the branch it enters, so that it will not be a stale and potentially already freed one.
… on XA RECOVER With UBSAN builds the function my_string_repertoire_8bit() failed on "runtime error: applying zero offset to null pointer" when NULL wad passed as the str parameter. Fix: test str for NULL, and return MY_REPERTOIRE_ASCII if str is NULL. MTR: This problem made MTR tests - main.xa_sync - innodb.xa_debug - main.xa fail with the nullptr-with-offset UNSAN error. After this commit these tests do not fail anymore. This commit does not need any new MTR tests.
…haracter set "utf8mb4" Map Unix utf8 locales to utf8mb4 instead of utf8mb3.
…CHECKSUM TABLE CHECKSUM TABLE causes variety of crashes when killed. This bug it not specific to PERFORMANCE_SCHEMA. Removed duplicate handler::ha_rnd_end() call.
row_ins_cascade_calc_update_vec(): Skip any virtual columns in the update vector of the parent table. Based on mysql/mysql-server@0ac1764 Reviewed by: Debarun Banerjee
…d(*this)' failed in Item_func_or_sum::do_build_clone Added missing do_get_copy of Item_func_current_user.
Workaround build bugs with preprocessor flags du-jour 1. OPENSSL_ALL does not work anymore (error in ssl.h) nor WOLFSSL_MYSQL_COMPATIBLE, would work, when building library. 2. OPENSSL_EXTRA has to be used instead of OPENSSL_ALL now. WOLFSSL_MYSQL_COMPATIBLE needs to be used to workaround their conflicting definition of protocol_version, which is used in server code. 3. -D_CRT_USE_CONFORMING_ANNEX_K_TIME to force C11-correct definition of gmtime_s on Windows, set some other flags WOLFSSL_MYSQL_COMPATIBLE was previously defining 4. Use HAVE_EMPTY_AGGREGATES=0 to workaround build error on clang (error: struct has size 0 in C, size 1 in C++ [-Werror,-Wextern-c-compat] WOLF_AGG_DUMMY_MEMBER;)
innodb_convert_name(): Convert a schema or table name to my_charset_filename compatible format. dict_table_lookup(): Replaces dict_get_referenced_table(). Make the callers responsible for invoking innodb_convert_name(). innobase_casedn_str(): Remove. Let us invoke my_casedn_str() directly. dict_table_rename_in_cache(): Do not duplicate a call to dict_mem_foreign_table_name_lookup_set(). innobase_convert_to_filename_charset(): Defined static in the only compilation unit that needs it. dict_scan_id(): Remove the constant parameters table_id=FALSE, accept_also_dot=TRUE. Invoke strconvert() directly. innobase_convert_from_id(): Remove; only called from dict_scan_id(). innobase_convert_from_table_id(): Remove (dead code). table_name_t::dblen(), table_name_t::basename(): In non-debug builds, tolerate names that may miss a '/' separator. Reviewed by: Debarun Banerjee
enum rename_fk: Replaces the "bool use_fk" parameter of row_rename_table_for_mysql() and innobase_rename_table(): RENAME_IGNORE_FK: Replaces use_fk=false when the operation cannot involve any FOREIGN KEY constraints, that is, it is a partitioned table or an internal table for FULLTEXT INDEX. RENAME_REBUILD: Replaces use_fk=false when the table may contain FOREIGN KEY constraints, which must not be modified in the data dictionary tables SYS_FOREIGN and SYS_FOREIGN_COLS. RENAME_ALTER_COPY: Replaces use_fk=true. This is only specified in ha_innobase::rename_table(), which may be invoked as part of ALTER TABLE…ALGORITHM=COPY, but also during RENAME TABLE. An alternative value RENAME_FK could be useful to specify in ha_innobase::rename_table() when it is executed as part of CREATE OR REPLACE TABLE, which currently is not an atomic operation. Reviewed by: Debarun Banerjee
…ffset to null pointer in check_rules and in init_weight_level Rewriting loops in check_rules() and init_weight_level() in the way to avoid UBSAN nullptr-with-offset error. No MTR test are needed - the reported failures disappeared from "mtr" output when running an UBSAN compiled build.
…action holding S-lock gets X-lock first" fix from MySQL to MariaDB This commit implements mysql/mysql-server@7037a0b functionality. If some transaction 't' requests not-gap X-lock 'Xt' on record 'r', and locks list of the record 'r' contains not-gap granted S-lock 'St' of transaction 't', followed by not-gap waiting locks WB={Wb1, Wb2, ..., Wbn} conflicting with 'Xt', and 'Xt' does not conflict with any other lock, located in the list after 'St', then grant 'Xt'. Note that insert-intention locks are also gap locks. If some transaction 't' holds not-gap lock 'Lt' on record 'r', and some other transactions have not-gap continuous waiting locks sequence L(B)={L(b1), L(b2), ..., L(bn)} following L(t) in the list of locks for the record 'r', and transaction 't' requests not-gap, what means also not insert intention, as ii-locks are also gap locks, X-lock conflicting with any lock in L(B), then grant the. MySQL's commit contains the following explanation of why insert-intention locks must not overtake a waiting ordinary or gap locks: "It is important that this decission rule doesn't allow INSERT_INTENTION locks to overtake WAITING locks on gaps (`S`, `S|GAP`, `X`, `X|GAP`), as inserting a record into a gap would split such WAITING lock, violating the invariant that each transaction can have at most single WAITING lock at any time." I would add to the explanation the following. Suppose we have trx 1 which holds ordinary X-lock on some record. And trx 2 executes "DELETE FROM t" or "SELECT * FOR UPDATE" in RR(see lock_delete_updated.test and MDEV-27992), i.e. it creates waiting ordinary X-lock on the same record. And then trx 1 wants to insert some record just before the locked record. It requests insert-intention lock, and if the lock overtakes trx 2 lock, there will be phantom records for trx 2 in RR. lock_delete_updated.test shows how "DELETE" allows to insert some records in already scanned gap and misses some records to delete. The current implementation differs from MySQL implementation. There are two key differences: 1. Lock queue ordering. In MySQL all waiting locks precede all granted locks. A new waiting lock is added to the head of the queue, a new granted lock is added to the end of the queue, if some waiting lock is granted, it's moved to the end of the queue. In MariaDB any new lock is added to the end of the queue and waiting lock does not change its position in the queue where the lock is granted. The rule is that blocking lock must be located before blocked lock in lock queue. We maintain the rule with inserting bypassing lock just before bypassed one. 2. MySQL implementation uses some object(locksys::Trx_locks_cache) which can be passed to consecutive calls to rec_lock_has_to_wait() for the same trx and heap_no to cache the result of checking if trx has a granted lock which is blocking the waiting lock(see locksys::Trx_locks_cache::has_granted_blocker()). The current implementation does not use such object, because it looks for such granted lock on the level of lock_rec_other_has_conflicting() and lock_rec_has_to_wait_in_queue(). I.e. there is no need in additional lock queue iteration in locksys::Trx_locks_cache::has_granted_blocker(), as we already iterate it in lock_rec_other_has_conflicting() and lock_rec_has_to_wait_in_queue(). During the testing the following case was found. Suppose we have delete-marked record and going to do inplace insert into that delete-marked record. Usually we don't create explicit lock if there are no conlicting with not gap X-lock locks(see lock_clust_rec_modify_check_and_lock(), btr_cur_update_in_place()). The implicit lock will be converted to explicit one by demand. That can happen during INSERT, the not-gap S-lock can be acquired on searching for duplicates(see row_ins_duplicate_error_in_clust()), and, if delete-marked record is found, inplace insert(see btr_cur_upd_rec_in_place()) modifies the record, what is treated as implicit lock. But there can be a case when some transaction trx1 holds not-gap S-lock, another transaction trx2 creates waiting X-lock, and then trx2 tries to do inplace insert. Before the fix the waiting X-lock of trx2 would be conflicting lock, and trx1 would try to create explicit X-lock, what would cause deadlock, and one of the transactions whould be rolled back. But after the fix, trx2 waiting X-lock is not treated as conflicting with trx1 X-lock anymore, as trx1 already holds S-lock. If we don't create explicit lock, then some other transaction trx3 can create it during implicit to explicit lock conversion and place it at the end of the queue. So there can be the following locks order in the queue: S1(granted) X2(waiting) X1(granted) The above queue is not valid, because all granted trx1 locks must be placed before waiting trx2 lock. Besides, lock_rec_release_try() can remove S(granted, trx1) lock and grant X lock to trx 2, and there can be two granted X-locks on the same record: X2(granted) X1(granted) Taking into account that lock_rec_release_try() can release cell and lock_sys latches leaving some locks unreleased, the queue validation function can fail in any unexpected place. It can be fixed with two ways: 1) Place explicit X(granted, trx1) lock before X(waiting, trx2) lock during implicit to explicit lock conversion. This option is implemented in MySQL, as granted lock is always placed at the top of locks queue, and waiting locks are placed at the bottom of the queue. MariaDB does not do this, and implementing this variant would require conflicting locks search before converting implicit to explicit lock, what, in turns, would require cell and/or lock_sys latch acquiring. 2) Create and place X(granted, trx1) lock before X(waiting, trx2) during inplace INSERT, i.e. when lock_rec_lock() is invoked from lock_clust_rec_modify_check_and_lock() or lock_sec_rec_modify_check_and_lock(), if X(waiting, trx2) is bypassed. Such a way we don't need in additional conflicting locks search, as they are searched anyway in lock_rec_low(). This fix implements the second variant(see the changes around c_lock_info.insert_after in lock_rec_lock). I.e. if some record was delete-marked and we do inplace insert in such a record, and some lock for bypass was found, create explicit lock to avoid conflicting lock search on each implicit to explicit lock conversion. We can remove it if MDEV-35624 is implemented. lock_rec_other_has_conflicting(), lock_rec_has_to_wait_in_queue(): search locks to bypass along with conflicting locks searching in the same loop. The result is returned in conflicting_lock_info object. There can be several locks to bypass, only the first one is returned to limit lock_rec_find_similar_on_page() with the first bypassed lock to preserve "blocking before blocked" invariant. conflicting_lock_info also contains a pointer to the lock, after which we can insert bypassing lock. This lock precedes bypassed one. Bypassing lock can be next-key lock, and the following cases are possible: 1. S1(not-gap, granted) II2(granted) X3(waiting for S1), When new X1(ordinary) lock is acquired, there will be the following locks queue: S1(not-gap, granted) II2(granted) X1(ordinary, granted) X3(waiting for S1) If we had inserted new X1 lock just after S1, and S1 had been released on transaction commit or rollback, we would have the following sequence in the locks queue: X1(ordinary, granted) II2(granted) X3(waiting for X1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This is not a real issue as II lock once granted can be ignored but it could possibly hit some assert(taking into account that lock_release_try() can release lock_sys latch, and other threads can acquire the latch and validate lock queue) as it breaks our design constraint that any granted lock in the queue should not conflict with locks ahead in the queue. But lock_rec_queue_validate() does not check the above constraint. We place new bypassing lock just before bypassed one, but there still can be the case when lock bitmap is used instead of creating new lock object(see lock_rec_add_to_queue() and lock_rec_find_similar_on_page()), and the lock, which owns the bitmap, can precede II2(granted). We can either disable lock_rec_find_similar_on_page() space optimization for bypassing locks or treat "X1(ordinary, granted) II2(granted)" sequence as valid. As we don't currently have the function which would fail on the above sequence, let treat it as valid for the case, when lock_release() execution is in process. 2. S1(ordinary, granted) II2(waiting for S1) X3(waiting for S1) When new X1(ordinary) lock is acquired, there will be the following locks queue: S1(ordinary, granted) II2(waiting for S1) X1(ordinary, granted) X3(waiting for S1). After S1 releasing there will be: II2(granted) X1(ordinary, granted) X3(waiting for X1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The above queue is valid because ordinary lock does not conflict with II-lock(see lock_rec_has_to_wait()). lock_rec_create_low(): insert new lock to the position which lock_rec_other_has_conflicting(), lock_rec_has_to_wait_in_queue() returned if the lock is bypassing. lock_rec_find_similar_on_page(): add ability to limit similiar lock search with the certain lock to preserve "blocking before blocked" invariant for all bypassed locks. lock_rec_add_to_queue(): don't treat bypassed locks as waiting ones to let lock bitmap reusing for bypassing locks. lock_rec_lock(): fix inplace insert case, explained above. lock_rec_dequeue_from_page(), lock_rec_rebuild_waiting_queue(): move bypassing lock to the correct place to preserve "blocking before blocked" invariant. Reviewed by: Debarun Banerjee, Marko Mäkelä.
Done |
Fixed name resolution for such scenarios |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, it's hard to review opt_hints.cc as diff is a total mess.
Please check this out: adds comments, adds questions in the comments: https://gist.github.com/spetrunia/13e9b3b7cd1d887e526a3a7a0289c1d1
OK, applied the patch |
Please apply this: |
09d6e3a
to
0c9bf1f
Compare
Implementing a recursive descent parser for optimizer hints.
This commit introduces: - the infrastructure for optimizer hints; - hints for join buffering: BNL(), NO_BNL(), BKA(), NO_BKA(); - NO_ICP() hint for disabling index condition pushdown; - MRR(), MO_MRR() hint for multi-range reads control; - NO_RANGE_OPTIMIZATION() for disabling range optimization; - QB_NAME() for assigning names for query blocks.
- Using Lex_ident_sys to scan identifiers, like the SQL parser does. This fixes handling of double-quote-delimited and backtick-delimited identifiers, as well as handling of non-ASCII identifiers. Unescaping and converting from the client character set to the system character set is now done using Lex_ident_cli_st and Lex_ident_sys, like it's done in the SQL tokenizer/parser. Adding helper methods to_ident_cli() and to_ident_sys() in Optimizer_hint_parser::Token. - Fixing the hint parser to report a syntax error when an empty identifiers: SELECT /*+ BKA(``) */ * FROM t1; - Moving a part of the code from opt_hints_parser.h to opt_hints_parser.cc Moving these method definitions: - Optimizer_hint_tokenizer::find_keyword() - Optimizer_hint_tokenizer::get_token() to avoid huge pieces of the code in the header file. - A Lex_ident_cli_st cleanup Fixing a few Lex_ident_cli_st methods to return Lex_ident_cli_st & instead of void, to use them easier in the caller code. - Fixing the hint parser to display the correct line number Adding a new data type Lex_comment_st (a combination of LEX_CSTRING and a line number) Using it in sql_yacc.yy - Getting rid of redundant dependencies on sql_hints_parser.h Moving void LEX::resolve_optimizer_hints() from sql_lex.h to sql_lex.cc Adding a class Optimizer_hint_parser_output, deriving from Optimizer_hint_parser::Hint_list. Fixing the hint parser to return a pointer to an allocated instance of Optimizer_hint_parser_output rather than an instance of Optimizer_hint_parser::Hint_list. This allows to use a forward declaration of Optimizer_hint_parser_output in sql_lex.h and thus avoid dependencies on sql_hints_parser.h.
Forbid adding optimizer hints to view definitions. In the case when optimizer hints are added to the view definition at a `CREATE (OR REPLACE) VIEW`/`ALTER VIEW` statement, a warning is generated and the hints are ignored. This commit also disables ps-protocol for test cases where `Unresolved table/index name` warnings are generated. The reason for this is such warnings are generated during both PREPARE and EXECUTE stages. Since opt_hints.test has `--enable_prepare_warnings`, running it with `--ps-protocol` causes duplication of warning messages
join_cache_level=0 disables join cache buffers, but the hint BNL() now allows to employ BNL(H) buffers for particular tables or query blocks. This commit also adds a number of test cases including OUTER JOINs to make sure hints do not break the rules of join buffers application
BNL() hint effectively increases join_cache_level up to 4 if it is set to value less than 4. This commit also makes the BKA() hint override not only `join_cache_bka` optimizer switch but `join_cache_level` as well. I.e., BKA() hint enables BKA and BKAH join buffers both flat and incremental despite `join_cache_level` and `join_cache_bka` setting.
It places a limit N (a timeout value in milliseconds) on how long a statement is permitted to execute before the server terminates it. Syntax: SELECT /*+ MAX_EXECUTION_TIME(milliseconds) */ ... Only top-level SELECT statements support the hint.
Since BNL() hint does not specify which whether hashed or non-hashed join cache should be employed, allow usage of hashed ones where possible
… max_statement_time
- add a comment that opt_hints_global->check_unresolved() is never called - improve comments - rename everything with "resolved_children" to "fully_resolved_children" - Opt_hints_table::adjust_key_hints() now returns value - less "reach-back-to-parent" logic - rename Hint "adjustment" and "resolution" (yes, both terms were used) to "fixing". "Resolution" is already used for parse-tree objects
1. Disable opt_hint_timeout.test for embedded server. Make sure the test does not crash even when started for embedded server. Disable view-protocol since hints are not supported inside views. 2. Hints are designed to behave like regular /* ... */ comments: `SELECT /*+ " */ 1;` -- a valid SQL query However, the mysql client program waits for the closing doublequote character. Another problem is observed when there is no space character between closing `*/` of the hint and the following `*`: `SELECT /*+ some_hints(...) */* FROM t1;` In this case the client treats `/*` as a comment section opening and waits for the closing `*/` sequence. This commit fixes all of these issues
- remove get_args_printer() from hints printing - add append_hint_arguments(THD *thd, opt_hints_enum hint, String *str) - add more comments - rename st_opt_hint_info::hint_name to hint_type - add pptimizer trace support for hints - add dbug_print_hints() - make print_warn() not be a template - introduce Printable_parser_rule interface, make grammar rules that emit warnings implement it and print_warn invokes its function) - remove Parser::Hint::append_args() as it is not used anywhere (it used to be necessary call print_warn(... (Parser::Hint*)NULL);
0c9bf1f
to
4bb6895
Compare
This commit implements optimizer hints allowing to affect the order of joining tables: - JOIN_FIXED_ORDER similar to existing STRAIGHT_JOIN hint; - JOIN_ORDER to apply the specified table order; - JOIN_PREFIX to hint what tables should be first in the join; - JOIN_SUFFIX to hint what tables should be last in the join.
4bb6895
to
127f28a
Compare
Description
This commit implements optimizer hints allowing to affect the order of joining tables:
- JOIN_FIXED_ORDER similar to existing STRAIGHT_JOIN hint;
- JOIN_ORDER to apply the specified table order;
- JOIN_PREFIX to hint what tables should be first in the join;
- JOIN_SUFFIX to hint what tables should be last in the join
Release Notes
TODO
How can this PR be tested?
opt_hints_join_order.test is added
If the changes are not amenable to automated testing, please explain why not and carefully describe how to test manually.
Basing the PR against the correct MariaDB version
main
branch.PR quality check