From the MySQL manual:
For any Unicode character set, operations performed using the
xxx_general_ci collation are faster than those for the
xxx_unicode_ci collation. For example, comparisons for the
utf8_general_ci collation are faster, but slightly less correct, than comparisons for utf8_unicode_ci.
They have an amusing “examples of the effect of collation” set on “sorting German umlauts,” but it unhelpfully uses
latin1_* collations. And another table that helpfully explains:
A difference between the collations is that this is true for utf8_general_ci:
ß = s
Whereas this is true for utf8_unicode_ci, which supports the German DIN-1 ordering (also known as dictionary order):
ß = ss
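The two equalities above come down to whether a collation may expand one character into several. The MySQL manual describes utf8_general_ci as making only one-to-one comparisons between characters, while utf8_unicode_ci supports expansions. Here is a rough Python sketch of that idea. The mapping tables and the helper names (fold, equal_under) are made up for illustration and are not MySQL's real weight tables or code:

```python
# Toy model only: these tables are illustrative assumptions,
# not MySQL's actual collation weight tables.
GENERAL_CI = {"ß": "s"}    # one character maps to exactly one character
UNICODE_CI = {"ß": "ss"}   # one character may expand to several


def fold(text, table):
    """Lowercase the text, then replace each character using the table."""
    return "".join(table.get(ch, ch) for ch in text.lower())


def equal_under(table, a, b):
    """Compare two strings after folding both with the same table."""
    return fold(a, table) == fold(b, table)


print(equal_under(GENERAL_CI, "ß", "s"))             # True
print(equal_under(UNICODE_CI, "ß", "ss"))            # True
print(equal_under(GENERAL_CI, "Straße", "Strasse"))  # False
print(equal_under(UNICODE_CI, "Straße", "Strasse"))  # True
```

In this toy version, "Straße" and "Strasse" only match under the expanding table, which is roughly the "slightly less correct" trade-off the manual is talking about.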
This forum post adds more info, but nowhere do they explain how a ☃ sorts against ☁ or ⛅.
How much faster is utf8_general_ci, though? An August 2010 message in the MySQL forums suggests it could be 30% faster for specific operations, but then dismisses the performance difference as unimportant compared to good indexing and writing efficient queries.