Showing posts with label MySQL. Show all posts
Showing posts with label MySQL. Show all posts

Saturday, January 5, 2013

Changes in Twitter MySQL 5.5.28.t9

Earlier this week we pushed to Github the ninth iteration of Twitter MySQL. Here are some of the highlights.

Bugs Fixed

  • InnoDB's B+ tree page split algorithm that attempts to optimize for sequential inserts might end up causing poor space utilization depending on the distribution pattern of index key values throughout the index. For example, if an insert that causes a page to be split is inserting a key value that is an immediate successor or predecessor to the last inserted key value in the same page, the insertion point is used as the split point irrespective of the actual distribution of values in the page.

    The solution is to use the standard B+ tree split algorithm while still preserving some form of optimization for sequential inserts. When a page needs to be split, the median key value in a page is used as the split point so that the data is distributed in a more symmetric fashion. A new variable named innodb_index_page_split_mode is introduced to provide a way to control the page split behavior.
  • Once the segments (indexes) of a tablespace are bigger than 32 pages, fragment pages are no longer allocated for use, yet they are still reserved whenever a new fragment extent is allocated (usually every 16384 pages). This is a limitation due to the fact that a segment can only allocate up to 32 fragment pages since the array used to track fragment pages belonging to a segment is limited to 32 entries.

    The solution is to allow for fragment extents to be leased to segments whenever there are free fragment extents available. A fragment extent is considered available if the only used pages in the extent are the extent descriptor and ibuf bitmap pages. A new extent state is used to tag leased extents and to ensure that they are returned to the space free fragment list once no longer being used by a segment.

    See Page management in InnoDB space files for an in-depth description of extents, extent descriptors and fragments.
  • When performing operations that are expected to expand a table (for example, allocate new pages due to a page being split), InnoDB currently preallocates and reserves up to 1% of the total size of the tablespace as a measure to ensure that enough free extents (that is, disk space) are available for the operation and to ensure that if running out of disk space, these operations are preemptively failed as to reserve any remaining free space to operations that end up freeing space (that is, delete data).

    The percentage is reasonable for tables smaller than a few gigabytes, but not for tables sized at tens of gigabytes or more, at which point the percentage won't correctly estimate the free space needed to perform operations and may cause an excessive amount of free extents to be preallocated.

    This change introduces two new system variables to enable/disable free extents reservation and to control the amount of free extents that is reserved for such operations. The variable innodb_reserve_free_extents can be used to enable or disable free extents reservation and innodb_free_extents_reservation_factor can be used to control what percentage of a space size is reserved for operations that may cause more space to be used.

Functionality Added or Changed

  • Currently Innodb_page_merges counts only merge attempts but there is no metric for successful merges. This change introduces a new status variable named Innodb_page_merges_succeeded which indicates the number of successful page merge operations (that is, the number of pages successfully merged into another page).

    Additionally, this change also introduces a new status variable named Innodb_page_discards which represents the number of pages that have become empty and are thus discarded.
  • Augment the server plugin interface to allow plugins to define and expose floating-point system variables of type double. The convenience macros MYSQL_SYSVAR_DOUBLE and MYSQL_THDVAR_DOUBLE are introduced and can be used by plugins to declare system variables of type double.
  • Since the command-line option parsing interface (my_getopt) uses fields of type unsigned long long to store these values, the double values were being stored in a lossy way that discards the fractional part.

    This change allows the default, minimum and maximum values of system variables of type double to have a meaningful fractional part by to storing the raw representation of a double value in the raw bits of an unsigned long long field in a way that the binary representation remains the same. Hence, the actual value can be passed back and forth between the types.
For a more complete look at what's new in this version, please see the change history and documentation. Feedback, bug reports, etc, can be submitted directly to the issue tracker.

Thursday, November 15, 2012

Changes in Twitter MySQL 5.5.28.t8

Earlier this week we pushed to Github the eighth iteration of Twitter MySQL. Here are some of the highlights. But, before that, a quick plug. If you are looking for a new opportunity and enjoy working on database internals, we should talk!

Bugs Fixed

  • Earlier in MySQL 5.5 development cycle, the SQL_LOG_BIN variable was made to be both global and session-scoped, instead of only session as it was in previous releases. We believe that usage of SQL_LOG_BIN at the global scope can be quite dangerous and offers little to no benefit, hence we made SQL_LOG_BIN once again a session-only variable, generating an error if it is used with SET GLOBAL.
  • If a page being read into the buffer pool is made "young" (moved to the head of the LRU), its time of first access was not being properly set. When the page eventually gets evicted, it would be incorrectly counted as a read-ahead page that was evicted without having been accessed. This lack of access time could also affect linear read-ahead, potentially causing InnoDB to calculate multiple times whether to trigger read-ahead whenever the page was accessed.

Functionality Added or Changed

  • In some scenarios, it might be necessary to purge the binary logs more frequently than once a day. In these cases, expire_logs_days cannot be used for automatic binary log file removal because of its whole-day resolution. In order to allow for more fine-grained control over the automatic removal period, the type of the expire_logs_days variable was changed to decimal where the fractional part represents the fraction of one day. For example, 12 hours can be expressed as 0.5 day.
  • When binary logging is enable, the thread state is set to "query end" most recently before writing to the binary log, so any time taken in that (due to contention or I/O) shows up as "query end", which makes it difficult to determine what is the current state of the thread.

    This change introduces new and more appropriate thread states that are set in any binary log related activities that may take time. For example, now whenever an event is being writing to the binary log, the thread state is set to "Writing an event to the binary log".
  • Although the INNODB_BUFFER_PAGE table shows the number of records within a page, it is not clear whether these records are row data or internal b-tree pointers. For this reason, a new LEVEL column is introduced which shows the level of a page in the B-tree.

    Additionally, the direction of insertion encoded in InnoDB data pages is now available in the DIRECTION column, and the number of consecutive inserts in that direction is available in the N_DIRECTION column. These columns are useful to determine whether records are being inserted in an ascending or descending sequence.
  • When extending a clustered index to the left or right, InnoDB attempts to fill leaf pages only up to a certain percentage (known as the fill factor), at which point the page is split. The remaining space is reserved for updating rows belonging to the page, attempting to make it likely that an updated row will be stored on the same page. This change introduces a new variable name innodb_index_fill_factor that can be used to control the fill factor for clustered indexes. Its value specifies the percentage of space on a leaf page to be used, reserving the remainder as free space.
  • While an alter table is executing, writes to the table whose definition is being changed are blocked until the operation completes or a lock wait timeout occurs. In a sharded system, it would be better if conflicting operations were aborted rather than blocked, thereby eliminating delays due to lock waits and allowing them to be redirect to another shard.

    In order to allow for this behavior, this change introduces a new alter table option named NO_WAIT. When set, conflicting locks requests that would normally block are aborted instead.
For a more complete look at what's new in this version, please see the change history and documentation. Feedback, bug reports, etc, can be submitted directly to the issue tracker.

Thursday, July 26, 2012

Changes in Twitter MySQL 5.5.24.t7

Earlier this week we pushed to Github the seventh iteration of Twitter MySQL. Here are some of the highlights.

Bugs Fixed

Functionality Added or Changed

  • The execution of row-based events in a replication slave is not reflected in the SQL Thread state, making it rather difficult to identify long-running events. For example, the execution of a large row-by-row delete event is not immediately identifiable as the replication SQL thread state for most of its duration stays as "Reading event from the relay log".

    This change adds two new states to the replication slave SQL thread that are used to highlight the event that the SQL thread is executing and, for row-based events, to indicate how many rows have been applied. These will generally look like:
    Executing Delete_rows event at position 100
    Handling record 50 of 100 for a Delete_rows event
    
  • Although Table_map events include some column metadata information, such as type and length, it lacks information such as column names, whether an integer type is signed or unsigned, or the character set of string types. This makes it rather difficult for external programs to extract meaningful data from row-based events.

    In order to enable external programs to fully deduce the table/column definitions, a new Table_metadata event was introduced. The purpose of this event is to describe the structure and other properties of the table and its columns, such as column name, SQL type name, character set, etc.
  • The currently available fields that compromise the header of error log messages, which are the date and time, plus the severity of the message, do not provide the enough information to identify discontinuity in reporting or which process generated a message.

    To make such identification possible, this change includes the process ID in the header so that any changes in its value will identify the source (process) of the message and also indicate any discontinuity in reporting. Also, the format of the date and time is changed to the ISO format (YYYY-MM-DD hh:mm:ss).
  • Even though the creation of non-deterministic functions can be allowed by enabling log_bin_trust_function_creators, enabling the option also implies that executing such functions is allowed even if the binary log format is set to statement. If the binary log format is set to row-based, creating and executing non-deterministic functions shouldn't pose any problems. Hence, this change allows non-deterministic functions to be created when the binary log format is set to row-based. If said functions are invoked at runtime when the binary log format is statement-based, they will fail with an error.
  • Change the install name of the mysqlclient shared library to use an absolute pathname instead of just the library filename. This allows Apple's Mac OS X dynamic linker to locate the library even if the path where it is installed is not included in the default search path.
For a more complete look at what's new in this version, please see the change history and documentation. Feedback, bug reports, etc, can be submitted directly to the issue tracker.

Wednesday, June 13, 2012

Changes in Twitter MySQL 5.5.23.t6

Earlier this week we pushed to Github the sixth iteration of Twitter MySQL. Here's some highlights:

Bugs Fixed

  • If a permanently corrupted page was read, InnoDB would forever keep trying to read the page.
  • Reading from a partitioned table using a impossible (e.g. where, between, etc) condition could lead to a crash if all partitions are pruned and the optimizer attempts to obtain the number of rows in the table. The problem occurs because even though no partitions are used, the optimizer might still attempt to perform (e.g. range) analysis where an estimate of the the number of rows is needed, but the partition storage engine wasn't prepared to provide it.

Functionality Added or Changed

  • Add partitioning functions for YMD and YMDH or equivalent
    Added a UTC_EXTRACT(unit FROM date) function that extracts parts from a date and returns them as an integer in a specified format. The function is modeled after EXTRACT(), but differs in that it uses UTC for all of its calculations. The supported unit specifiers are YEAR_MONTH_DAY and YEAR_MONTH_DAY_HOUR, which returns a representation of the date argument as a value in the YYYYMMDD or YYYYMMDDHH format.
    This function is intended to facilitate partitioning by days, specially when applied to TIMESTAMP values. It allows partitions to be named in a more human-readable format like YYYYMMDD.
  • Export the last know good binlog position as a status variable from InnoDB
    Introduced two new status variables that export the master binary log name and position of a slave as stored by InnoDB. Whenever the SQL thread commits a transaction, InnoDB also commits the master binary log name and position to the system tablespace. Now this information can be retrieved through the Innodb_mysql_master_log_file and Innodb_mysql_master_log_pos status variables.

    It is worth noting that the position stored within InnoDB might not match the actual position of the SQL thread. The position is only updated when the SQL thread applies an event that uses an InnoDB table and causes a read-write transaction to be committed. For example, an UPDATE statement which does not change any rows is considered a read-only transaction. 
  • Report counters for InnoDB corruption events so that they are alertable
    Introduced status variables that expose a counter of page corruption and table corruption events so that they can be alerted on without scanning the error log files.
  • Improve obscure "Got error -1 when reading table" messages
    The problem was that handler originated errors printed to the error log sometimes do not indicate the precise nature of the problem. This is especially an issue with InnoDB because it cannot always convert an internal InnoDB error code to a MySQL error code.
    InnoDB now provides an additional (and more detailed) error message if it cannot convert the error code. Additionally, errors related to a statement being interrupted are no longer printed to the error log.
    Also, the error message format was changed from "Got error when reading table" to "Error when reading table: error message (error number)" in order to distinguish the offending error message.
  • Google Breakpad crash reporting system
    Google's Breakpad is now used to generate minimal crash dump information when the server process (mysqld) crashes. The minidump file generated by breakpad contains a list of the executable and shared libraries loaded in the process, the state of the processor register and a stack trace for each thread, and miscellaneous information about the system and the reason for the dump. Minidumps are significantly smaller than core files, making them more practical for collection and processing.
For a more complete look at what's new in this version, please see the change history and documentation. Feedback, bug reports, etc, can be submitted directly to the issue tracker.