Thursday, November 15, 2012

Changes in Twitter MySQL 5.5.28.t8

Earlier this week we pushed to Github the eighth iteration of Twitter MySQL. Here are some of the highlights. But, before that, a quick plug. If you are looking for a new opportunity and enjoy working on database internals, we should talk!

Bugs Fixed

  • Earlier in MySQL 5.5 development cycle, the SQL_LOG_BIN variable was made to be both global and session-scoped, instead of only session as it was in previous releases. We believe that usage of SQL_LOG_BIN at the global scope can be quite dangerous and offers little to no benefit, hence we made SQL_LOG_BIN once again a session-only variable, generating an error if it is used with SET GLOBAL.
  • If a page being read into the buffer pool is made "young" (moved to the head of the LRU), its time of first access was not being properly set. When the page eventually gets evicted, it would be incorrectly counted as a read-ahead page that was evicted without having been accessed. This lack of access time could also affect linear read-ahead, potentially causing InnoDB to calculate multiple times whether to trigger read-ahead whenever the page was accessed.

Functionality Added or Changed

  • In some scenarios, it might be necessary to purge the binary logs more frequently than once a day. In these cases, expire_logs_days cannot be used for automatic binary log file removal because of its whole-day resolution. In order to allow for more fine-grained control over the automatic removal period, the type of the expire_logs_days variable was changed to decimal where the fractional part represents the fraction of one day. For example, 12 hours can be expressed as 0.5 day.
  • When binary logging is enable, the thread state is set to "query end" most recently before writing to the binary log, so any time taken in that (due to contention or I/O) shows up as "query end", which makes it difficult to determine what is the current state of the thread.

    This change introduces new and more appropriate thread states that are set in any binary log related activities that may take time. For example, now whenever an event is being writing to the binary log, the thread state is set to "Writing an event to the binary log".
  • Although the INNODB_BUFFER_PAGE table shows the number of records within a page, it is not clear whether these records are row data or internal b-tree pointers. For this reason, a new LEVEL column is introduced which shows the level of a page in the B-tree.

    Additionally, the direction of insertion encoded in InnoDB data pages is now available in the DIRECTION column, and the number of consecutive inserts in that direction is available in the N_DIRECTION column. These columns are useful to determine whether records are being inserted in an ascending or descending sequence.
  • When extending a clustered index to the left or right, InnoDB attempts to fill leaf pages only up to a certain percentage (known as the fill factor), at which point the page is split. The remaining space is reserved for updating rows belonging to the page, attempting to make it likely that an updated row will be stored on the same page. This change introduces a new variable name innodb_index_fill_factor that can be used to control the fill factor for clustered indexes. Its value specifies the percentage of space on a leaf page to be used, reserving the remainder as free space.
  • While an alter table is executing, writes to the table whose definition is being changed are blocked until the operation completes or a lock wait timeout occurs. In a sharded system, it would be better if conflicting operations were aborted rather than blocked, thereby eliminating delays due to lock waits and allowing them to be redirect to another shard.

    In order to allow for this behavior, this change introduces a new alter table option named NO_WAIT. When set, conflicting locks requests that would normally block are aborted instead.
For a more complete look at what's new in this version, please see the change history and documentation. Feedback, bug reports, etc, can be submitted directly to the issue tracker.

Thursday, July 26, 2012

Changes in Twitter MySQL 5.5.24.t7

Earlier this week we pushed to Github the seventh iteration of Twitter MySQL. Here are some of the highlights.

Bugs Fixed

Functionality Added or Changed

  • The execution of row-based events in a replication slave is not reflected in the SQL Thread state, making it rather difficult to identify long-running events. For example, the execution of a large row-by-row delete event is not immediately identifiable as the replication SQL thread state for most of its duration stays as "Reading event from the relay log".

    This change adds two new states to the replication slave SQL thread that are used to highlight the event that the SQL thread is executing and, for row-based events, to indicate how many rows have been applied. These will generally look like:
    Executing Delete_rows event at position 100
    Handling record 50 of 100 for a Delete_rows event
    
  • Although Table_map events include some column metadata information, such as type and length, it lacks information such as column names, whether an integer type is signed or unsigned, or the character set of string types. This makes it rather difficult for external programs to extract meaningful data from row-based events.

    In order to enable external programs to fully deduce the table/column definitions, a new Table_metadata event was introduced. The purpose of this event is to describe the structure and other properties of the table and its columns, such as column name, SQL type name, character set, etc.
  • The currently available fields that compromise the header of error log messages, which are the date and time, plus the severity of the message, do not provide the enough information to identify discontinuity in reporting or which process generated a message.

    To make such identification possible, this change includes the process ID in the header so that any changes in its value will identify the source (process) of the message and also indicate any discontinuity in reporting. Also, the format of the date and time is changed to the ISO format (YYYY-MM-DD hh:mm:ss).
  • Even though the creation of non-deterministic functions can be allowed by enabling log_bin_trust_function_creators, enabling the option also implies that executing such functions is allowed even if the binary log format is set to statement. If the binary log format is set to row-based, creating and executing non-deterministic functions shouldn't pose any problems. Hence, this change allows non-deterministic functions to be created when the binary log format is set to row-based. If said functions are invoked at runtime when the binary log format is statement-based, they will fail with an error.
  • Change the install name of the mysqlclient shared library to use an absolute pathname instead of just the library filename. This allows Apple's Mac OS X dynamic linker to locate the library even if the path where it is installed is not included in the default search path.
For a more complete look at what's new in this version, please see the change history and documentation. Feedback, bug reports, etc, can be submitted directly to the issue tracker.

Wednesday, June 13, 2012

Changes in Twitter MySQL 5.5.23.t6

Earlier this week we pushed to Github the sixth iteration of Twitter MySQL. Here's some highlights:

Bugs Fixed

  • If a permanently corrupted page was read, InnoDB would forever keep trying to read the page.
  • Reading from a partitioned table using a impossible (e.g. where, between, etc) condition could lead to a crash if all partitions are pruned and the optimizer attempts to obtain the number of rows in the table. The problem occurs because even though no partitions are used, the optimizer might still attempt to perform (e.g. range) analysis where an estimate of the the number of rows is needed, but the partition storage engine wasn't prepared to provide it.

Functionality Added or Changed

  • Add partitioning functions for YMD and YMDH or equivalent
    Added a UTC_EXTRACT(unit FROM date) function that extracts parts from a date and returns them as an integer in a specified format. The function is modeled after EXTRACT(), but differs in that it uses UTC for all of its calculations. The supported unit specifiers are YEAR_MONTH_DAY and YEAR_MONTH_DAY_HOUR, which returns a representation of the date argument as a value in the YYYYMMDD or YYYYMMDDHH format.
    This function is intended to facilitate partitioning by days, specially when applied to TIMESTAMP values. It allows partitions to be named in a more human-readable format like YYYYMMDD.
  • Export the last know good binlog position as a status variable from InnoDB
    Introduced two new status variables that export the master binary log name and position of a slave as stored by InnoDB. Whenever the SQL thread commits a transaction, InnoDB also commits the master binary log name and position to the system tablespace. Now this information can be retrieved through the Innodb_mysql_master_log_file and Innodb_mysql_master_log_pos status variables.

    It is worth noting that the position stored within InnoDB might not match the actual position of the SQL thread. The position is only updated when the SQL thread applies an event that uses an InnoDB table and causes a read-write transaction to be committed. For example, an UPDATE statement which does not change any rows is considered a read-only transaction. 
  • Report counters for InnoDB corruption events so that they are alertable
    Introduced status variables that expose a counter of page corruption and table corruption events so that they can be alerted on without scanning the error log files.
  • Improve obscure "Got error -1 when reading table" messages
    The problem was that handler originated errors printed to the error log sometimes do not indicate the precise nature of the problem. This is especially an issue with InnoDB because it cannot always convert an internal InnoDB error code to a MySQL error code.
    InnoDB now provides an additional (and more detailed) error message if it cannot convert the error code. Additionally, errors related to a statement being interrupted are no longer printed to the error log.
    Also, the error message format was changed from "Got error when reading table" to "Error when reading table: error message (error number)" in order to distinguish the offending error message.
  • Google Breakpad crash reporting system
    Google's Breakpad is now used to generate minimal crash dump information when the server process (mysqld) crashes. The minidump file generated by breakpad contains a list of the executable and shared libraries loaded in the process, the state of the processor register and a stack trace for each thread, and miscellaneous information about the system and the reason for the dump. Minidumps are significantly smaller than core files, making them more practical for collection and processing.
For a more complete look at what's new in this version, please see the change history and documentation. Feedback, bug reports, etc, can be submitted directly to the issue tracker.