Freaking MySQL Character Set Encodings

Derek Sivers‘ plan, with all it’s bin2hex and regexp and back and forth between MySQL and PHP almost looks good compared to what I’m about to do. Really, why is it so difficult to go from latin1 (tables created back in MySQL 3) to utf8? Not only do you have to set the charset on the table, but also the connection, in PHP, and flipping everywhere. And then you’ve gotta deal with all this old data that’s in the wrong character set.

I got all excited when I learned of CONVERT() and CAST(), and felt completely betrayed when they turned out to be basically useless to me.

CONVERT() with USING is used to convert data between different character sets. In MySQL, transcoding names are the same as the corresponding character set names. For example, this statement converts the string 'abc' in the default character set to the corresponding string in the utf8 character set:

SELECT CONVERT('abc' USING utf8);

Thing is, I can’t figure out where it’s inheriting “the default character set,” and I can’t seem to come up with a combination of settings that will do the job. This old doc looked promising at first, but quickly let me down.

Ack. It looks like I’m going to have to read the contents of a table using PHP and latin1 connection encoding, then insert the data into a duplicate table using utf8 connection encoding. Either that, or Derek Sivers’ 28 step process.

In the interest of documenting even the things that don’t work: I had high hopes for this, but it doesn’t do anything useful.

if($tables = $wpdb->get_col('SHOW TABLES;')) {
	// For every table in the database
	foreach($tables as $table) {
		$changes = array();
 
		// create the new table, set it to UTF8
		$changes[] = “CREATE TABLE `conv_$table` LIKE `$table`”;
		$changes[] = “ALTER TABLE `conv_$table` DEFAULT CHARACTER SET=utf8”;
 
		// copy the data into the new table, transcoding to UTF8
		$fields = $wpdb->get_results(“DESCRIBE {$table};”);			
		$allfields = $selectfields = array();
		foreach($fields as $field){
			$allfields[] = $field->Field;
			$selectfields[] = (stripos($field->Type, 'char') || stripos($field->Type, 'text')) ? “CONVERT(`$field->Field` USING utf8)” : “`$field->Field`”;
		}
		$changes[] = “INSERT INTO `conv_$table` (`”. implode($allfields, '`, `') .“`) SELECT ”. implode($selectfields, ', ') .“ FROM $table”;
 
		// move the old table aside and put the new one in place
		$changes[] = “RENAME TABLE `$table` TO `preutf8_$table`”;
		$changes[] = “RENAME TABLE `conv_$table` TO `$table`”;
 
		// echo out the mysql commands
		echo “<h2 id="11972_changes-to-table-tab_1" >Changes to table <code>$table</code>:</h2>”;
		echo implode($changes, “;<br />\n”);
	}
}

mysql, character set, encoding, utf8 conversion

2 Comments

  1. Hi!

    You should write
    (stripos($field->Type, ‘char’) !== false || stripos($field->Type, ‘text’) !== false)
    instead of
    (stripos($field->Type, ‘char’) || stripos($field->Type, ‘text’))

    because stripos($field->Type, ‘text’) returns 0 with mysql field type text

    But nice code anyway!

  2. [...] trials, I found that these commands are not full proof. What I ended up is adopting a script from maisonbisson to convert my database from latin1 to utf8. This scripts create empty tables by copying the [...]


Comments RSS TrackBack Identifier URI

Leave a comment

 

User contributed tags for this post:

mysql character set (141) - mysql charset (84) - mysql select encoding (69) - mysql set encoding (48) - mysql show database encoding (46) - mysql insert charset (40) - mysql show table character set (38) - mysql show table encoding (29) - mysql show encoding (28) - mysql insert character set (26) - Mysql convert character set (24) - mysql alter table encoding (21) - MYSQL Change encoding (21) - mysql show table charset (20) - F (20) - charset mysql (19) - character set mysql (19) - mysql change table encoding (19) - mysql set-charset (18) - mysql convert character encoding (16) - mysql insert encoding (16) - mysql determine charset (14) - mysql set names (14) - rss charset (13) - character (13) - mysql select utf-8 (13) - mysql describe character set (13) - mysql convert charset (12) - mysql utf8 select (12) - show encoding mysql (12) - mysql select utf8 (12) - mysql charset converter (11) - mysql insert utf-8 (11) - mysql create table encoding (11) - mysql SET CHARACTER SET (10) - mysql character sets (10) - mysql convert encoding (10) - change encoding mysql (10) - encoding mysql (10) - alter table encoding mysql (9) - mysql char() (9) - mysql character (9) - set encoding mysql (9) - mysql show database character set (9) - mysql describe table encoding (9) - change mysql charset (9) - mysql show character set table (9) - mysql set character (9) - mysql characterset (8) - mysql convert database charset (8) - mysql show character encoding (8) - mysql character-set (7) - utf8 mysql select (7) - mysql show charset (7) - php mysql insert utf8 (7) - mysql set (7) - mysql describe charset (7) - mysql display character set (7) - mysql table encoding (7) - mysql insert utf8 (7) - mysql insert encode (7) - mysql change character set (7) - php mysql connection charset (7) - mysql show character set of table (7) - mysql charset conversion (6) - MySQL create table character set (6) - mysql character set conversion (6) - table encoding mysql (6) - mysql characters (6) - mysql Charset Convert (6) - set MySQL charset (6) - show table character set mysql (6) - mysql convert to character set (6) - mysql table show character set (6) - mysql character set connection (6) - mysql change charset (6) - mysql change character set database (6) - mysql describe encoding (6) - character set in mysql (6) - mysql charset encoding (6) - mysql set charset (6) - php mysql connect charset (5) - mysql char set (5) - php mysql # character (5) - mysql convert database character set (5) - mysql ego (5) - php set mysql encoding (5) - insert charset mysql (5) - mysql show table character encoding (5) - mysql describe table charset (5) - wrong charset mysql (5) - show table charset (5) - mysql character set database (5) - MySQL UTF-8 select (5) - charset in mysql (5) - mysql character codes (5) - show table charset mysql (5) - mysql5 charset (5) - character sets mysql (5) - default character set mysql (5) - mysql utf8 conversion (4) - mysql create table set encoding (4) - mysql change database character set (4) - mysql field charset (4) - mysql character code (4) - mysql convert characters (4) - php mysql set encoding (4) - php mysql unicode (4) - php mysql set character set (4) - mysql add character set (4) - mysql character conversion (4) - MYSQL SELECT CONVERT UTF8 (4) - convert charset mysql (4) - show table encoding mysql (4) - mysql change database encoding (4) - mysql show encode (4) - php-mysql charset (4) - mysql set encoding select (4) - mysql table charset (4) - create mysql charset (4) - mysql insert character (4) - set charset mysql (4) - mysql convert table charset (4) - MySQL Select as UTF-8 (4) - how to change mysql charset (4) - mysql database character set (4) - mysql wrong charset (4) - change mysql database encoding (4) - change encoding in mysql (4) - show table encoding (4) - set encoding in select statement mysql (4) - mysql CONVERT() field (3) - convert mysql charset wordpress (3) - php mysql insert utf-8 (3) - mysql character set convert (3) - mysql set encoding utf8 (3) - php mysql set table encoding (3) - charset mysql connection (3) - utf-8 character set (3) - alter table default character set (3) - create database character set utf8 (3) - set mysql encoding (3) - character set connection (3) - insert charset (3) - mysql CONVERT latin1 (3) - mysql charset wrong (3) - mysql character convert (3) -