Unicode Compliance for Text

Unicode Compliance for Database

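To convert an existing table, including the data in its existing columns, to the Unicode character set:

    ALTER TABLE tbl_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci, ENGINE=InnoDB;

To change only the table's default character set, which applies to columns added later and does not convert existing columns:

    ALTER TABLE tbl_name CHARSET=utf8, ENGINE=InnoDB;

To set the default character set of the Mifos database:

    ALTER DATABASE Mifos charset=utf8;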
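
After running the ALTER statements, it can be useful to confirm what actually changed. The following is a minimal sketch using the standard information_schema views; it assumes the schema is named Mifos exactly as in the ALTER DATABASE statement (schema names can be case-sensitive depending on the server's platform and settings), and tbl_name is the same placeholder used above.

```sql
-- Default character set and collation of the database.
SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME = 'Mifos';          -- assumed schema name

-- Storage engine and default collation of every table in the schema.
SELECT TABLE_NAME, ENGINE, TABLE_COLLATION
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'Mifos';

-- Character set and collation of each text column in one table.
-- Non-text columns have NULL here, so they are filtered out.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'Mifos'
  AND TABLE_NAME = 'tbl_name'         -- placeholder table name
  AND CHARACTER_SET_NAME IS NOT NULL;
```

Any column that still reports a non-utf8 character set after a CHARSET=utf8 change was not converted; such columns need the CONVERT TO form.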

utf8_unicode_ci supports expansions and ligatures. For example, the German letter ß (U+00DF LATIN SMALL LETTER SHARP S) is sorted near "ss", and the letter Œ (U+0152 LATIN CAPITAL LIGATURE OE) is sorted near "OE".

utf8_general_ci does not support expansions or ligatures. It sorts all of these letters as single characters, sometimes in the wrong order.
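
The difference in how the two collations treat ß can be checked directly with a comparison. This is a small sketch, assuming a client session whose connection character set can be set to utf8; the column aliases are illustrative only. The values in the comments are the behavior documented in the MySQL manual for these two collations.

```sql
-- Make sure string literals are interpreted as utf8.
SET NAMES utf8;

-- utf8_unicode_ci expands ß to "ss".
SELECT 'ß' = 'ss' COLLATE utf8_unicode_ci AS unicode_sharp_s_eq_ss,  -- documented: 1
       'ß' = 's'  COLLATE utf8_unicode_ci AS unicode_sharp_s_eq_s;   -- documented: 0

-- utf8_general_ci compares ß as the single character "s".
SELECT 'ß' = 'ss' COLLATE utf8_general_ci AS general_sharp_s_eq_ss,  -- documented: 0
       'ß' = 's'  COLLATE utf8_general_ci AS general_sharp_s_eq_s;   -- documented: 1
```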

utf8_unicode_ci is generally more accurate for all scripts.

For example, in the Cyrillic block, utf8_unicode_ci sorts correctly for Russian, Bulgarian, Belarusian, Macedonian, Serbian, and Ukrainian, while utf8_general_ci sorts correctly only for the Russian and Bulgarian subset of Cyrillic. The extra letters used in Belarusian, Macedonian, Serbian, and Ukrainian are not sorted well.
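
One way to see the Cyrillic difference on a given server is to sort the same rows under each collation and compare the results. The sketch below is illustrative only: the table name cyr_sort_demo and the sample letters (a mix of common Cyrillic letters and the Ukrainian letters Ґ, Є, І, Ї) are assumptions chosen for the demonstration, and no particular output is claimed here.

```sql
SET NAMES utf8;

-- Hypothetical scratch table; dropped at the end.
CREATE TEMPORARY TABLE cyr_sort_demo (
  word VARCHAR(20)
) DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;

INSERT INTO cyr_sort_demo (word) VALUES
  ('Ґ'), ('Г'), ('Д'), ('Є'), ('Е'), ('Ж'), ('І'), ('И'), ('Ї'), ('Й');

-- Same rows, two collations: compare where the Ukrainian letters land.
SELECT word FROM cyr_sort_demo ORDER BY word COLLATE utf8_unicode_ci;
SELECT word FROM cyr_sort_demo ORDER BY word COLLATE utf8_general_ci;

DROP TEMPORARY TABLE cyr_sort_demo;
```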

The disadvantage of utf8_unicode_ci is that it is slightly slower than utf8_general_ci.