Saturday, January 5, 2013

Changes in Twitter MySQL 5.5.28.t9

Earlier this week we pushed the ninth iteration of Twitter MySQL to GitHub. Here are some of the highlights.

Bugs Fixed

  • InnoDB's B+ tree page split algorithm that attempts to optimize for sequential inserts might end up causing poor space utilization depending on the distribution pattern of index key values throughout the index. For example, if an insert that causes a page to be split is inserting a key value that is an immediate successor or predecessor to the last inserted key value in the same page, the insertion point is used as the split point irrespective of the actual distribution of values in the page.

    The solution is to use the standard B+ tree split algorithm while still preserving some form of optimization for sequential inserts. When a page needs to be split, the median key value in a page is used as the split point so that the data is distributed in a more symmetric fashion. A new variable named innodb_index_page_split_mode is introduced to provide a way to control the page split behavior.
  • Once the segments (indexes) of a tablespace are bigger than 32 pages, fragment pages are no longer allocated for use, yet they are still reserved whenever a new fragment extent is allocated (usually every 16384 pages). This is a limitation due to the fact that a segment can only allocate up to 32 fragment pages since the array used to track fragment pages belonging to a segment is limited to 32 entries.

    The solution is to allow fragment extents to be leased to segments whenever free fragment extents are available. A fragment extent is considered available if the only used pages in the extent are the extent descriptor and ibuf bitmap pages. A new extent state is used to tag leased extents and to ensure that they are returned to the space free fragment list once they are no longer used by a segment.

    See Page management in InnoDB space files for an in-depth description of extents, extent descriptors and fragments.
  • When performing operations that are expected to expand a table (for example, allocating new pages because a page is being split), InnoDB currently preallocates and reserves up to 1% of the total size of the tablespace to ensure that enough free extents (that is, disk space) are available for the operation. The reservation also ensures that, when disk space is running out, these operations fail preemptively so that any remaining free space is left for operations that end up freeing space (that is, deleting data).

    The percentage is reasonable for tables smaller than a few gigabytes, but not for tables of tens of gigabytes or more, at which point the percentage no longer correctly estimates the free space needed to perform operations and may cause an excessive number of free extents to be preallocated.

    This change introduces two new system variables. The variable innodb_reserve_free_extents can be used to enable or disable free extents reservation, and innodb_free_extents_reservation_factor can be used to control what percentage of a space size is reserved for operations that may cause more space to be used.

Functionality Added or Changed

  • Currently Innodb_page_merges counts only merge attempts but there is no metric for successful merges. This change introduces a new status variable named Innodb_page_merges_succeeded which indicates the number of successful page merge operations (that is, the number of pages successfully merged into another page).

    Additionally, this change also introduces a new status variable named Innodb_page_discards which represents the number of pages that have become empty and are thus discarded.
  • Augment the server plugin interface to allow plugins to define and expose floating-point system variables of type double. The convenience macros MYSQL_SYSVAR_DOUBLE and MYSQL_THDVAR_DOUBLE are introduced and can be used by plugins to declare system variables of type double.
  • Since the command-line option parsing interface (my_getopt) uses fields of type unsigned long long to store the default, minimum and maximum values of system variables, double values were being stored in a lossy way that discards the fractional part.

    This change allows the default, minimum and maximum values of system variables of type double to have a meaningful fractional part by storing the raw bit pattern of the double value in the unsigned long long field, so that the binary representation remains the same. Hence, the actual value can be passed back and forth between the two types without loss.

For a more complete look at what's new in this version, please see the change history and documentation. Feedback, bug reports, etc., can be submitted directly to the issue tracker.