Exporting UTF-8 database, data gets mangled?

zonetrooperx posted 8 years ago in Import/Export
I have a database that is UTF-8, InnoDB engine and .

When I export the data using HeidiSQL the data gets mangled from:

서버에서 응답

to:

서버에서 응답

Any ideas what's up?
ansgar posted 8 years ago
Should not be the case, unless you're using an older build, or the 4.0 release?

Second, you need to view the exported file in a Unicode-capable text editor (HeidiSQL's query editor, or just Notepad, is fine). Heidi writes UTF-8 files without a BOM, so any reader has to detect the file encoding itself. I have never seen an editor which cannot do that, but who knows which one you're using?
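
To rule out the file itself before blaming the editor, the raw bytes can be checked directly. The following is a minimal Python sketch, not anything HeidiSQL ships; the function name `inspect_export` is made up for illustration. It reports whether the data starts with a UTF-8 BOM and whether it decodes as strict UTF-8.

```python
import codecs
from typing import Tuple


def inspect_export(data: bytes) -> Tuple[bool, bool]:
    """Return (has_utf8_bom, is_valid_utf8) for the raw bytes of a file."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        data.decode("utf-8")
        is_valid = True
    except UnicodeDecodeError:
        is_valid = False
    return has_bom, is_valid


if __name__ == "__main__":
    # What a BOM-less UTF-8 export of the sample text should contain.
    sample = "서버에서 응답".encode("utf-8")
    print(inspect_export(sample))
    print(inspect_export(codecs.BOM_UTF8 + sample))
```

For a real export, the bytes would come from reading the file in binary mode, for example `open(path, "rb").read()` with `path` pointing at the exported .sql file. If the second value is true, the file is valid UTF-8 and any broken display is a matter of detection or rendering in the viewer.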
zonetrooperx posted 8 years ago
I've opened it in HeidiSQL (latest version), Notepad and another Unicode-capable editor, with the same result:

서버에서 응답

instead of:

서버에서 응답
ansgar posted 8 years ago
I hope "latest version" means you downloaded a build file.

In that case there must be some problem in the table's charset details, or your server is a pre-4.1 MySQL version, is it?

Could you post the CREATE TABLE statement for the table in question?
zonetrooperx posted 8 years ago
My MySQL version is: 5.1.41
My HeidiSQL version is: 5.0.0.3100

CREATE TABLE Statement:

# Dumping structure for table database.language_korean
CREATE TABLE IF NOT EXISTS `language_korean` (
`language_string_ID` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'The ID of the language string',
`language_string_name` text NOT NULL COMMENT 'The Name of the language string',
`language_string_text` text NOT NULL COMMENT 'The Text of the language string',
`language_string_type` text NOT NULL COMMENT 'The Type of language string',
`language_string_date_time_edited` datetime NOT NULL COMMENT 'The Date Time last edited of the language string',
PRIMARY KEY (`language_string_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=22 DEFAULT CHARSET=utf8;
ansgar posted 8 years ago
Ok... your sample text is also broken here after an export. But I can successfully put other critical text into it. It seems that certain Chinese or Korean chars get broken by UTF8ToString(), weird.
ansgar posted 8 years ago
While I was playing around with different Unicode writers I suddenly saw that Notepad is the only editor which displays the Korean characters as intended:

Apart from Notepad I checked the query editor in HeidiSQL itself and Notepad++ which both display broken chars, like here:

As these chars look fine in the data grids of HeidiSQL, my guess is now that only file loading has a bug here. Not only in Heidi, but also in Notepad++. Good old Notepad obviously has better-working encoding detection here.
2 attachment(s):
  • korean_sqlexport_notepad
  • korean_sqlexport_notepadplus
ansgar posted 8 years ago
That probably means Heidi should write a UTF16 BOM to the result file, where it writes no BOM at all currently.
ansgar posted 8 years ago
Ah... no, no, everything is alright here. HeidiSQL and Notepad++ display broken characters because my selected font (Courier New) doesn't include Korean chars. When I select a different one, e.g. "Segoe UI", the chars are displayed as intended:



I'll commit a change so that it is possible to select non-fixed fonts via Tools > Preferences > SQL.
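
A quick way to tell a font problem from an encoding problem is to look at the code points: if the text decodes to the expected Hangul code points, the data is intact and only the glyphs are missing from the font. Below is a minimal Python sketch using the sample string from this thread; the helper name `hangul_report` is hypothetical.

```python
def hangul_report(text: str) -> list:
    """List (character, code point, is_hangul_syllable) for each non-space character."""
    report = []
    for ch in text:
        if ch.isspace():
            continue
        cp = ord(ch)
        # Precomposed Hangul syllables occupy U+AC00..U+D7A3.
        report.append((ch, f"U+{cp:04X}", 0xAC00 <= cp <= 0xD7A3))
    return report


if __name__ == "__main__":
    for ch, code_point, is_hangul in hangul_report("서버에서 응답"):
        print(ch, code_point, is_hangul)
```

If every character is reported as a Hangul syllable but the editor still shows boxes or question marks, the selected font most likely lacks those glyphs.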
1 attachment(s):
  • korean_sqlexport_notepadplus_fixed
ansgar posted 8 years ago
An alternative fixed font which shows these chars well seems to be "SimSun ExtB". Please try.
ansgar posted 8 years ago
One last note here: If you execute these INSERT lines within HeidiSQL, the created rows look fine here, no matter which font is selected, and even if the text is displayed broken in the editor. That is to be expected, as this is a font issue, but it is worth noting.
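
To confirm this independently of any font, the stored bytes can be compared with the expected UTF-8 encoding. This is a minimal Python sketch with a made-up helper name; it assumes you compare its output by hand with what MySQL's `HEX()` function returns for the column, e.g. `SELECT HEX(language_string_text) FROM language_korean`.

```python
def expected_hex(text: str) -> str:
    """Uppercase hex of the UTF-8 bytes of text, in the style MySQL's HEX() prints."""
    return text.encode("utf-8").hex().upper()


if __name__ == "__main__":
    print(expected_hex("서버에서 응답"))
```

If the two hex strings match, the row holds correct UTF-8 data and only the display is affected.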
zonetrooperx posted 8 years ago
Thanks anse,

I've confirmed at my end that I now have it appearing correctly in my other Unicode editor (phpDesigner) using the font 'Consolas'.

As you suggested, being able to select non-fixed fonts via Tools > Preferences > SQL would be awesome for UTF-8 developers.

Thanks for the help
ansgar posted 8 years ago
Well, as I was testing non-fixed fonts I saw that SynEdit tries to display a non-fixed font as if it were a fixed one, so that each character gets the same width. That looks quite unusable.
