Apple CloudKit uses FoundationDB Record Layer

Together, the Record Layer and FoundationDB form the backbone of Apple’s CloudKit. We wrote a paper describing how we built the Record Layer to run at massive scale and how CloudKit uses it. Today, you can read the preprint to learn more.

From an anonymous FoundationDB blog post introducing relational database capabilities built atop FoundationDB’s key-value store. The paper about CloudKit (PDF) is also worth a read. CloudKit is Apple’s back-end as a service for all iOS and macOS apps, free at any legitimate scale.

Detect MySQL’s “too many connections” error

WordPress appears to continue with execution even when MySQL refuses connections/queries after init. Here’s a comment in the MySQL docs suggesting how to detect the condition in raw PHP:

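$link = mysql_connect("localhost", "mysql_user", "mysql_password");
if (mysql_errno() == 1203) {
  // 1203 == ER_TOO_MANY_USER_CONNECTIONS (mysqld_error.h)
  // The server-wide "Too many connections" error is 1040 (ER_CON_COUNT_ERROR).
  // Handle the refused connection here instead of continuing.
}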

Just a note to myself, but I wonder if there’s opportunity here.

Speeding up MySQL joins on tables with TEXT columns, maybe

The thing about WordPress’ DB schema is that TEXT and VARCHAR content is mixed in the posts table (to say nothing of the frustrations of DATETIME columns). That’s not such a problem for a blog with a few hundred posts, but it’s a different matter when you have a few hundred thousand posts. And it wouldn’t even […] » about 500 words

What is the difference between utf8_unicode_ci and utf8_general_ci?

From the MySQL manual:

For any Unicode character set, operations performed using the xxx_general_ci collation are faster than those for the xxx_unicode_ci collation. For example, comparisons for the utf8_general_ci collation are faster, but slightly less correct, than comparisons for utf8_unicode_ci.

They have an amusing “examples of the effect of collation” set on “sorting German umlauts,” but it unhelpfully uses latin1_* collations. And another table that helpfully explains:

A difference between the collations is that this is true for utf8_general_ci:

ß = s

Whereas this is true for utf8_unicode_ci, which supports the German DIN-1 ordering (also known as dictionary order):

ß = ss
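As a rough illustration of the two rules quoted above, here is a toy Python model. It is not how MySQL implements collations: the folding tables and function names are assumptions made up for this sketch, and they cover only the single ß case.

```python
# Toy model of the two comparison rules quoted above; not MySQL's implementation.
# The tables below are hypothetical and cover only the single character ß.
GENERAL_CI_FOLD = {"ß": "s"}    # one character maps to exactly one character
UNICODE_CI_FOLD = {"ß": "ss"}   # one character may expand to several


def fold(text, table):
    """Lowercase the text, then apply the per-character folding table."""
    return "".join(table.get(ch, ch) for ch in text.lower())


def equal_under(table, a, b):
    """Compare two strings after folding both with the same table."""
    return fold(a, table) == fold(b, table)


print(equal_under(GENERAL_CI_FOLD, "ß", "s"))    # True
print(equal_under(GENERAL_CI_FOLD, "ß", "ss"))   # False
print(equal_under(UNICODE_CI_FOLD, "ß", "ss"))   # True
print(equal_under(UNICODE_CI_FOLD, "ß", "s"))    # False
```

The only point of the sketch is the difference between a one-to-one mapping and a mapping that allows expansions.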

This forum post adds more info, but nowhere do they explain how a ☃ sorts against ☁ or ⛅.

How much faster is utf8_general_ci than utf8_unicode_ci, though? An August 2010 message in the MySQL forums seems to suggest the performance for specific operations could be 30% faster, but then dismisses the performance difference as unimportant compared to good indexing and writing efficient queries.

Cleaning Up Category Relationships In A WordPress Scriblio Site

A few lines of SQL I used to clean up a Scriblio site. It’s probably useless to anybody but me. I’m not suggesting anybody else use this code, as it will result in changed or deleted data.

Update the post author for catalog records (identified because they have a specific post meta entry):

UPDATE wp_8_postmeta
JOIN wp_8_posts ON wp_8_posts.ID = wp_8_postmeta.post_id
SET post_author = 15
WHERE meta_key = 'scrib_meditor_content'

Get the categories attached to every catalog record (except the “catalog” category):

SELECT tr.object_id , tr.term_taxonomy_id
FROM wp_8_term_relationships tr
JOIN wp_8_posts p ON p.ID = tr.object_id
WHERE tr.term_taxonomy_id IN (
	SELECT term_taxonomy_id
	FROM wp_8_term_taxonomy
	WHERE taxonomy = "category"
	AND term_id != 30
)
AND post_author = 15
ORDER BY tr.object_id , tr.term_taxonomy_id

Using the above list of object ids and term taxonomy ids, build a series of queries like the following to delete them:

DELETE FROM wp_8_term_relationships WHERE object_id = 12275 AND term_taxonomy_id = 271872 ;
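Building that series of queries by hand gets tedious, so here is a minimal Python sketch of one way to generate them. The function name is made up for illustration, and the second ID pair in the sample data is hypothetical. Like the SQL itself, the output deletes data, so review it before running anything.

```python
def build_delete_statements(pairs, table="wp_8_term_relationships"):
    """Turn (object_id, term_taxonomy_id) pairs into DELETE statements."""
    statements = []
    for object_id, term_taxonomy_id in pairs:
        # int() rejects anything that is not a plain number before it reaches SQL.
        statements.append(
            "DELETE FROM {} WHERE object_id = {:d} AND term_taxonomy_id = {:d} ;".format(
                table, int(object_id), int(term_taxonomy_id)
            )
        )
    return statements


# The first pair comes from the example above; the second is a made-up placeholder.
rows = [(12275, 271872), (12276, 271873)]
for statement in build_delete_statements(rows):
    print(statement)
```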

Insert a catalog category relationship for all catalog records:

INSERT INTO wp_8_term_relationships
SELECT p.ID , '271871' , '0'
FROM wp_8_posts p
LEFT JOIN wp_8_term_relationships tr ON p.ID = tr.object_id AND tr.term_taxonomy_id = 271871
WHERE post_author = 15
AND tr.term_taxonomy_id IS NULL

Is MySQL 5.1 Ready?

MySQL 5.1 hasn’t gotten a lot of love, but it does introduce support for pluggable storage engines. And that’s required to use SphinxSE. Sphinx is a fast full text search engine. It doesn’t need to run as a MySQL storage engine to work, but doing that allows joining against other MySQL tables. So while I’m watching the future of MySQL alternatives, I’m also watching 5.1 bug fixes and playing with the CoolStack-packaged 5.1 on a friend’s box.

The Difference Between MySQL’s utf8_unicode_ci and utf8_general_ci Collations

MySQL answer: utf8_unicode_ci vs. utf8_general_ci.

Collation controls sorting behavior. Unicode rationalizes the character set, but doesn’t, on its own, rationalize sorting behavior for all the various languages it supports. utf8_general_ci (ci = case insensitive) is apparently a bit faster, but sloppier, and only appropriate for English language data sets.

Oh Noes! My Table Is Gone!

# mysqlcheck -p -A --auto-repair --optimize
info     : Found block with too small length at 17732; Skipped
info     : Wrong block with wrong total length starting at 17776
info     : Found block with too small length at 28776; Skipped
warning  : Number of rows changed from 444 to 441
status   : OK

Cleaning up the mess after a hardware failure can suck. This mysqlcheck output is from the wp_options table for this blog. Unfortunately, if the options table is unreadable, all of WordPress panics and fails to load.

MySQL 5.1 Released, Community Takes Stock

MySQL 5.1 is out as a GA release, but with crashing bugs that should give likely users pause. Perhaps worse, the problems are blamed on essential breakdowns in the project management: “We have changed the release model so that instead of focusing on quality and features our release is now defined by timeliness and features. Quality is not regarded to be that important.”

Still, people are finding inspiration in OurDelta and Drizzle. Competition from those branches/forks and criticism from the community are sure to help re-align the MySQL core, or provide a reasonable alternative if Sun/MySQL can’t deliver. In the meantime, the High Availability MySQL blog is worth following.

Converting MySQL Character Sets

This Gentoo Wiki page suggests dumping the table, using iconv to convert the characters, then inserting the dump into a new table with the new charset.

Alex King solved a different problem: his apps were talking UTF8, but his tables were Latin1. His solution was to dump the tables, change the charset info in the dump file, then re-insert the contents.
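The two approaches differ in what gets changed: the first rewrites the bytes, the second rewrites only the label on them. Here is a small Python sketch of both. The function names are made up for illustration, and the `CHARSET=` text it looks for is an assumption about what the dump file contains.

```python
def reencode_dump(dump_bytes, source="latin-1", target="utf-8"):
    """Decode the dump from the source encoding and re-encode it, as iconv would."""
    return dump_bytes.decode(source).encode(target)


def relabel_charset(dump_text, old="latin1", new="utf8"):
    """Change only the declared charset in the dump, leaving the data untouched."""
    return dump_text.replace("CHARSET=" + old, "CHARSET=" + new)


# Approach 1: the data really is Latin1 and the bytes must change.
latin1_bytes = "café".encode("latin-1")
print(reencode_dump(latin1_bytes) == "café".encode("utf-8"))

# Approach 2: the data is already UTF-8 and only the table definition is wrong.
create_stmt = "CREATE TABLE t (c TEXT) DEFAULT CHARSET=latin1;"
print(relabel_charset(create_stmt))
```

Applying the first approach to data that is already UTF-8 would double-encode it, which is why it matters to know which problem you have.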

Copying MySQL Usernames and Database Privileges

Now that I’m the nominal MySQL DBA for PSU, it became my job to jimmy up the MySQL user privileges so that the new web server could connect. I’m not sure if this is the fastest, most efficient way to do it, but it worked quickly enough:

```
CREATE TABLE mysql.user_copy SELECT * FROM mysql.user;
DELETE FROM mysql.user_copy WHERE Host NOT LIKE 'OLD_HOST_NAME';
UPDATE mysql.user_copy SET Host = 'NEW_HOST_NAME';
INSERT INTO mysql.user SELECT * FROM mysql.user_copy;
DROP TABLE mysql.user_copy;

CREATE TABLE mysql.db_copy SELECT * FROM mysql.db;
DELETE FROM mysql.db_copy WHERE Host NOT LIKE 'OLD_HOST_NAME';
UPDATE mysql.db_copy SET Host = 'NEW_HOST_NAME';
INSERT INTO mysql.db SELECT * FROM mysql.db_copy;
DROP TABLE mysql.db_copy;

FLUSH PRIVILEGES;
```

Simply replace the OLD_HOST_NAME and NEW_HOST_NAME with the appropriate values. Most importantly, I didn’t have to know the passwords for each user to do this. This script simply copied the user info and gave them access from the new server.

Optimizing Inserts/Updates On MySQL Tables

When doing a bulk insert/update/change to a MySQL table you can temporarily disable index updates like this:

```
ALTER TABLE $tbl_name DISABLE KEYS
```

…do stuff…

```
ALTER TABLE $tbl_name ENABLE KEYS
```

From the docs:

ALTER TABLE ... DISABLE KEYS tells MySQL to stop updating non-unique indexes. ALTER TABLE ... ENABLE KEYS then should be used to re-create missing indexes. MySQL does this with a special algorithm that is much faster than inserting keys one by one, so disabling keys before performing bulk insert operations should give a considerable speedup. Using ALTER TABLE ... DISABLE KEYS requires the INDEX privilege in addition to the privileges mentioned earlier.

While the non-unique indexes are disabled, they are ignored for statements such as SELECT and EXPLAIN that otherwise would use them.
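One risk with the disable/enable pattern is forgetting to re-enable the keys when the bulk work fails partway. Below is a minimal Python sketch of a wrapper that always re-enables them. It assumes a DB-API style cursor with an `execute()` method and a trusted table name. `keys_disabled` and `RecordingCursor` are hypothetical names, and the recording cursor is only a stand-in so the example runs without a database.

```python
from contextlib import contextmanager


@contextmanager
def keys_disabled(cursor, table):
    """Disable non-unique index updates for the duration of a bulk load."""
    cursor.execute("ALTER TABLE {} DISABLE KEYS".format(table))
    try:
        yield cursor
    finally:
        # Runs even if the bulk load raised, so the indexes get rebuilt.
        cursor.execute("ALTER TABLE {} ENABLE KEYS".format(table))


class RecordingCursor:
    """Hypothetical stand-in for a real cursor; it only records statements."""

    def __init__(self):
        self.statements = []

    def execute(self, sql, params=None):
        self.statements.append(sql)


cursor = RecordingCursor()
with keys_disabled(cursor, "wp_posts") as cur:
    cur.execute("INSERT INTO wp_posts (post_title) VALUES (%s)", ("hello",))

for statement in cursor.statements:
    print(statement)
```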

MySQL Bug?

After an upgrade to MySQL 5.0.51b on RHEL 5 I started seeing curious results in a fairly common query. Here’s a simplified version:

```
SELECT ID, post_date_gmt
FROM wp_posts
GROUP BY ID
ORDER BY post_date_gmt DESC
LIMIT 5
```

What I expected was to get a handful of post ID numbers sorted in descending order by the post_date_gmt. Instead, I got a list of post IDs sorted in ascending order by the ID number. Something like this:

```
3 2007-05-21 00:00:00
4 2007-05-21 00:00:00
5 2007-05-21 00:00:00
6 2007-05-21 00:00:00
7 2007-05-21 00:00:00
```

After some fiddling I discovered that the GROUP BY clause was causing a problem. So this query works:

```
SELECT ID, post_date_gmt
FROM wp_posts
ORDER BY post_date_gmt DESC
LIMIT 5
```

…and outputs the results I expected:

```
337832 2008-06-20 15:20:03
335991 2008-06-17 13:00:42
337777 2008-06-02 12:15:46
337390 2008-05-28 00:00:00
337831 2008-05-28 00:00:00
```

The GROUP BY clause may be unnecessary, though it was originally written in to accommodate conditions where a JOIN (which is often added when the query is dynamically generated) causes MySQL to return multiple rows representing the same record ID.

Still, isn’t this behavior weird? It’s certainly different from previous versions.
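To show why the GROUP BY was there in the first place, here is a small Python sketch that uses an in-memory SQLite database as a stand-in. SQLite is not MySQL, so it says nothing about the 5.0.51b behavior itself. The joined table and the sample rows are assumptions for illustration, since the post doesn't say which JOIN gets added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_date_gmt TEXT);
CREATE TABLE wp_term_relationships (object_id INTEGER, term_taxonomy_id INTEGER);
INSERT INTO wp_posts VALUES (3, '2007-05-21 00:00:00');
INSERT INTO wp_posts VALUES (337832, '2008-06-20 15:20:03');
-- Post 337832 has two terms, so a JOIN returns it twice.
INSERT INTO wp_term_relationships VALUES (337832, 1);
INSERT INTO wp_term_relationships VALUES (337832, 2);
INSERT INTO wp_term_relationships VALUES (3, 1);
""")

base = """
SELECT p.ID, p.post_date_gmt
FROM wp_posts p
JOIN wp_term_relationships tr ON tr.object_id = p.ID
"""

# Without GROUP BY the JOIN repeats the post that has two terms.
joined = conn.execute(base + "ORDER BY p.post_date_gmt DESC").fetchall()

# With GROUP BY each post appears once, newest first.
grouped = conn.execute(
    base + "GROUP BY p.ID ORDER BY p.post_date_gmt DESC LIMIT 5"
).fetchall()

print(joined)
print(grouped)
```

The grouped result is the ordering the original query was expected to produce: one row per post, sorted by date descending.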

Find Stuff By Minimum Bounding Rectangle

MySQL offers ENVELOPE() to find the minimum bounding rectangle of a geometric object. The result is a polygon with four segments, defined by five points. It took me a while to make sense of it, partially because the only documentation that I’ve run across so far for POLYGON() syntax is in the ENVELOPE() function mentioned […] » about 200 words

Working With Spatial Data in MySQL

It’s MySQL spatial data week here, though I am spreading out the posts to, um, ease the pain (or boredom). Anyway, here are some commands/functions I don’t want to forget about later:

Start with an existing table called geometry, add a spatial column and index it:

```
ALTER TABLE geometry ADD coord POINT NOT NULL;
CREATE SPATIAL INDEX coord ON geometry (coord);
```

Insert some data; think in terms of POINT(X Y) or POINT(lat lon):

```
INSERT INTO geometry (coord) VALUES( GeomFromText( 'POINT(40 -100)' ));
INSERT INTO geometry (coord) VALUES( GeomFromText( 'POINT(1 1)' ));
```

Get those X,Y coordinates back from the table:

```
SELECT X(coord), Y(coord) FROM geometry
```

Get points within a bounding rectangle:

```
SELECT MBRContains(
	GeomFromText( 'POLYGON((0 0,0 3,3 3,3 0,0 0))' ),
	coord
) FROM geometry
```
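For a feel of what the bounding-rectangle test does, here is a plain Python sketch of the same idea using the two points inserted earlier. It is an approximation, not MySQL's implementation. The function names are made up, the vertex order of the rectangle is an assumption, and points exactly on the edge are treated as inside here, which may differ from what MySQL does.

```python
def envelope(points):
    """Return the minimum bounding rectangle as a closed ring of five points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    # Four corners plus a repeat of the first one to close the ring.
    return [
        (min_x, min_y),
        (max_x, min_y),
        (max_x, max_y),
        (min_x, max_y),
        (min_x, min_y),
    ]


def mbr_contains(ring, point):
    """True if the point falls inside the bounding rectangle of the ring."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    x, y = point
    return min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys)


# Same rectangle as POLYGON((0 0,0 3,3 3,3 0,0 0)) in the query above.
rectangle = [(0, 0), (0, 3), (3, 3), (3, 0), (0, 0)]

print(envelope(rectangle))
print(mbr_contains(rectangle, (40, -100)))   # False
print(mbr_contains(rectangle, (1, 1)))       # True
```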