Freaking MySQL Character Set Encodings

Derek Sivers‘ plan, with all it’s bin2hex and regexp and back and forth between MySQL and PHP almost looks good compared to what I’m about to do. Really, why is it so difficult to go from latin1 (tables created back in MySQL 3) to utf8? Not only do you have to set the charset on the table, but also the connection, in PHP, and flipping everywhere. And then you’ve gotta deal with all this old data that’s in the wrong character set.

I got all excited when I learned of CONVERT() and CAST(), and felt completely betrayed when they turned out to be basically useless to me.

CONVERT() with USING is used to convert data between different character sets. In MySQL, transcoding names are the same as the corresponding character set names. For example, this statement converts the string 'abc' in the default character set to the corresponding string in the utf8 character set:

SELECT CONVERT('abc' USING utf8);

Thing is, I can’t figure out where it’s inheriting “the default character set,” and I can’t seem to come up with a combination of settings that will do the job. This old doc looked promising at first, but quickly let me down.

Ack. It looks like I’m going to have to read the contents of a table using PHP and latin1 connection encoding, then insert the data into a duplicate table using utf8 connection encoding. Either that, or Derek Sivers’ 28 step process.

In the interest of documenting even the things that don’t work: I had high hopes for this, but it doesn’t do anything useful.

if($tables = $wpdb->get_col('SHOW TABLES;')) {
	// For every table in the database
	foreach($tables as $table) {
		$changes = array();
 
		// create the new table, set it to UTF8
		$changes[] = “CREATE TABLE `conv_$table` LIKE `$table`”;
		$changes[] = “ALTER TABLE `conv_$table` DEFAULT CHARACTER SET=utf8”;
 
		// copy the data into the new table, transcoding to UTF8
		$fields = $wpdb->get_results(“DESCRIBE {$table};);			
		$allfields = $selectfields = array();
		foreach($fields as $field){
			$allfields[] = $field->Field;
			$selectfields[] = (stripos($field->Type, 'char') || stripos($field->Type, 'text')) ? “CONVERT(`$field->Field` USING utf8): “`$field->Field`”;
		}
		$changes[] = “INSERT INTO `conv_$table` (`”. implode($allfields, '`, `') .“`) SELECT ”. implode($selectfields, ', ') .“ FROM $table;
 
		// move the old table aside and put the new one in place
		$changes[] =RENAME TABLE `$table` TO `preutf8_$table`”;
		$changes[] =RENAME TABLE `conv_$table` TO `$table`”;
 
		// echo out the mysql commands
		echo<h2 id="11972_changes-to-table-tab_1" >Changes to table <code>$table</code>:</h2>;
		echo implode($changes,;<br />\n”);
	}
}

mysql, character set, encoding, utf8 conversion

Related:

2 Comments

  1. Comment by vahur on January 21, 2008 10:16 am

    Hi!

    You should write
    (stripos($field->Type, ‘char’) !== false || stripos($field->Type, ‘text’) !== false)
    instead of
    (stripos($field->Type, ‘char’) || stripos($field->Type, ‘text’))

    because stripos($field->Type, ‘text’) returns 0 with mysql field type text

    But nice code anyway!

  2. Pingback by Dummy’s guide for setting/converting character set for a web application « memento on February 15, 2008 1:16 am

    [...] trials, I found that these commands are not full proof. What I ended up is adopting a script from maisonbisson to convert my database from latin1 to utf8. This scripts create empty tables by copying the [...]

Comments RSS TrackBack Identifier URI

Leave a comment

 

User contributed tags for this post:

mysql character set (132) - mysql charset (84) - mysql set encoding (43) - mysql show table character set (29) - mysql show database encoding (29) - mysql insert charset (27) - mysql show table encoding (26) - mysql alter table encoding (21) - F (20) - character set mysql (19) - mysql change table encoding (19) - Mysql convert character set (18) - charset mysql (18) - mysql set-charset (18) - MYSQL Change encoding (18) - mysql show encoding (17) - mysql show table charset (16) - mysql insert character set (15) - time passing quote (14) - mysql set names (14) - mysql insert encoding (14) - mysql determine charset (13) - rss charset (12) - mysql charset converter (11) - mysql insert utf-8 (11) - mysql convert charset (11) - mysql create table encoding (11) - mysql select encoding (11) - mysql SET CHARACTER SET (10) - mysql character sets (10) - mysql select utf-8 (10) - show encoding mysql (10) - encoding mysql (10) - alter table encoding mysql (9) - mysql char() (9) - mysql character (9) - mysql convert encoding (9) - change mysql charset (9) - mysql select utf8 (9) - mysql set character (9) - set encoding mysql (8) - mysql convert database charset (8) - mysql character-set (7) - utf8 mysql select (7) - mysql show charset (7) - mysql characterset (7) - php mysql insert utf8 (7) - mysql set (7) - mysql utf8 select (7) - mysql describe table encoding (7) - change encoding mysql (7) - mysql table encoding (7) - mysql describe character set (7) - mysql show character set table (7) - mysql change character set (7) - mysql show character encoding (7) - character (6) - mysql charset conversion (6) - MySQL create table character set (6) - mysql character set conversion (6) - table encoding mysql (6) - mysql Charset Convert (6) - set MySQL charset (6) - mysql show database character set (6) - mysql convert to character set (6) - mysql character set connection (6) - mysql display character set (6) - mysql change charset (6) - mysql convert character encoding (6) - mysql change character set database (6) - mysql insert utf8 (6) - mysql insert encode (6) - php mysql connection charset (6) - mysql charset encoding (6) - mysql char set (5) - php mysql # character (5) - mysql convert database character set (5) - mysql characters (5) - mysql describe charset (5) - mysql ego (5) - php set mysql encoding (5) - insert charset mysql (5) - mysql show table character encoding (5) - mysql table show character set (5) - mysql describe table charset (5) - wrong charset mysql (5) - show table charset (5) - mysql character set database (5) - charset in mysql (5) - character set in mysql (5) - mysql character codes (5) - show table charset mysql (5) - mysql5 charset (5) - character sets mysql (5) - default character set mysql (5) - mysql utf8 conversion (4) - mysql field charset (4) - mysql character code (4) - mysql convert characters (4) - php mysql set encoding (4) - php mysql unicode (4) - php mysql set character set (4) - convert charset mysql (4) - show table encoding mysql (4) - mysql change database encoding (4) - php-mysql charset (4) - MySQL UTF-8 select (4) - mysql describe encoding (4) - mysql table charset (4) - create mysql charset (4) - set charset mysql (4) - mysql convert table charset (4) - how to change mysql charset (4) - mysql set charset (4) - mysql database character set (4) - change mysql database encoding (4) - mysql CONVERT() field (3) - convert mysql charset wordpress (3) - php mysql insert utf-8 (3) - mss (3) - mysql create table set encoding (3) - mysql character set convert (3) - mysql change database character set (3) - mysql set encoding utf8 (3) - php mysql set table encoding (3) - charset mysql connection (3) - utf-8 character set (3) - create database character set utf8 (3) - set mysql encoding (3) - insert charset (3) - mysql CONVERT latin1 (3) - mysql add character set (3) - mysql character convert (3) - mysql character conversion (3) - MYSQL SELECT CONVERT UTF8 (3) - freaking sex com (3) - show database character set (3) - encode mysql (3) - mysql show encode (3) - mysql set connection encoding (3) - mysql encoding convert (3) - mysql show database charset (3) - mysql select character set (3) - convert encoding mysql (3) - change table charset mysql (3) - change charset mysql (3) - mysql show charset table (3) - using php to select mysql character set (3) - php mysql charset conversion (3) - php SET CHARACTER SET mysql (3) -