Export DB with utf8mb4 characters

[expired user #9273] posted 9 years ago in Import/Export
Using HeidiSQL 9.3.0.4984 (64 bit) on Windows 8
I export my database to an SQL file using HeidiSQL.
When I reload the SQL file, the UTF-8 characters (€ for example) come out garbled, like '😄'

If I load the file into Notepad++, the characters look good.
And if I cut and paste the file into HeidiSQL, they look good as well.

If I dump just one table, I can load it with HeidiSQL and it looks OK as well.
The dump is big enough that HeidiSQL gives me the option to run the file directly or load it first.
If I run it directly, the resulting database has the bad characters.


I just started using utf8, so hopefully I am just doing something stupid.
ansgar posted 9 years ago
You're not doing something stupid. That's one of the bugs on which I have already spent countless hours without finding a solution.

Can you attach a file which creates broken characters?
[expired user #9273] posted 9 years ago
This is the smallest file I could generate that reproduces the problem. I started with a 5 MB file, so I did pretty well. Deleting any further insert lines resulted in a file that HeidiSQL loaded correctly, keeping the utf8mb4 encoding.

If I load this file, the encoding seems to revert to ANSI. Search for the insert lines for table tt as a test.
They should look like this: (well better this, but at least you see fewer glyphs)
INSERT INTO `tt` (`id`, `t`) VALUES
(1, '
1 attachment(s):
ansgar posted 9 years ago
Encoding="Auto detect (may fail)" in the file open dialog fails to detect the encoding in the file you attached. If I instead select "UTF-8", it loads well.

But I thought the file does not import correctly when it's loaded via "Run directly, without loading into the editor"?
[expired user #9273] posted 9 years ago
Yes. It (the original 5 MB file) fails when I select "run directly". If I load it in Notepad++ (yes, auto detect fails here too), select encoding-->utf8 and then cut/paste into HeidiSQL, it loads correctly.

Is there a way for a user to specify utf8 encoding when or after loading a file like Notepad++ allows?
I am also wondering why auto detect (wherever that function is implemented?) is failing on this file. I scanned it using mb_check_encoding in PHP and every line passes.
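
For readers who want to repeat that validity check without PHP, here is a minimal Python sketch of a per-line UTF-8 check. The function name is made up for illustration. Note that passing this check only shows the bytes are valid UTF-8; it says nothing about whether an auto-detector that samples part of the file will actually see any multibyte characters.

```python
def invalid_utf8_lines(data: bytes) -> list:
    """Return the 1-based numbers of lines that are not valid UTF-8."""
    bad = []
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad
```

An empty result means every line decodes as UTF-8, which is roughly what a per-line mb_check_encoding pass reports.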

As you said, "Countless hours" and just to get a Euro sign :)
ansgar posted 9 years ago

No, there is no encoding selector after a file has been loaded. HeidiSQL always saves files in UTF-8, so the encoding only matters at the moment you load a file.

However, the auto-detection seems to fail quite often. It's a piece of code which should probably be refactored. It might also be sufficient to increase the size of the analyzed text file chunk from 100K to 1M.
Code modification/commit from ansgarbecker, 9 years ago, revision 9.3.0.4993
Increase size of analyzed text file chunk from 100K to 1M, in DetectEncoding(), so it fails less often to see encoding relevant characters. See http://www.heidisql.com/forum.php?t=19383
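
To illustrate why the chunk size matters, here is a rough Python sketch of chunk-based detection. It is not HeidiSQL's DetectEncoding(); the function name, the "ansi" fallback and the sample data are assumptions made for the example. The point is only that a chunk containing nothing but ASCII carries no evidence for UTF-8, so a detector limited to it has to fall back to a default.

```python
import codecs

def guess_encoding(data: bytes, chunk_size: int) -> str:
    """Guess the encoding from the first chunk_size bytes only."""
    chunk = data[:chunk_size]
    if chunk.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if all(byte < 0x80 for byte in chunk):
        # Pure ASCII: no evidence either way, so use the assumed default.
        return "ansi"
    try:
        # final=False tolerates a multibyte sequence cut off at the chunk end.
        codecs.getincrementaldecoder("utf-8")().decode(chunk, final=False)
    except UnicodeDecodeError:
        return "ansi"
    return "utf-8"

# Hypothetical dump: 140,000 bytes of plain ASCII, then a single euro sign.
dump = b"-- ascii only\n" * 10_000 + "INSERT INTO `tt` VALUES (1, '€');\n".encode("utf-8")
```

With this sample, a 100,000-byte chunk never reaches the euro sign, while a 1,000,000-byte chunk does. A larger chunk makes the miss less likely but cannot rule it out for bigger dumps.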
ansgar posted 9 years ago
I just did that in r4993. For the file you attached, HeidiSQL seems to correctly detect utf8 now here.
[expired user #9273] posted 9 years ago
Thanks for your efforts and fix. It occurs to me that the file that originally failed was a DB dump to SQL created by HeidiSQL. DB dumps can be quite lengthy and can include a lot of data early in the file from tables that have no UTF-8 multibyte characters in them, which could make the encoding hard to auto detect. Maybe HeidiSQL could put out a comment line with multibyte characters early in the dump to make it easy on the auto detector. Just a thought. Thanks again.

I think I am going to do this for the SQL files that I create.
ansgar posted 9 years ago
Funny idea... what could we write into the file header? Probably something which the user recognizes as a character test, something like this:
-- --------------------------------------------------------
-- Host:                         127.0.0.1
-- Server version:               5.7.7-rc - MySQL Community Server (GPL)
-- Server OS:                    Win64
-- HeidiSQL Version:             8.3.0.4750
-- Character set:                utf8 - äöü
-- --------------------------------------------------------
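
For anyone generating their own SQL files in the meantime, a small Python sketch of prepending such a line could look like this. The function name, the default charset and the choice of marker characters are assumptions for illustration, not something HeidiSQL writes today.

```python
MARKER = "äöü€"  # hypothetical test characters, all multibyte in UTF-8

def with_charset_header(sql: str, charset: str = "utf8mb4") -> bytes:
    """Prepend a comment line with multibyte characters and encode as UTF-8."""
    header = f"-- Character set:                {charset} - {MARKER}\n"
    return (header + sql).encode("utf-8")
```

Because the marker sits in the very first line, a detector that only samples the start of the file sees multibyte UTF-8 sequences right away.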
jfalch posted 9 years ago
Adding a character set "header" is most probably a good idea. The import routines could check for it and skip autodetection when it is found.
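
A minimal Python sketch of that import-side check, assuming the header line has the form "-- Character set: <name>" near the top of the file. The function name, the probe size and the charset-to-codec table are hypothetical.

```python
import re
from typing import Optional

HEADER_RE = re.compile(rb"^-- Character set:\s+(\w+)", re.MULTILINE)

# Assumed mapping from MySQL charset names to Python codec names.
CODECS = {b"utf8": "utf-8", b"utf8mb4": "utf-8", b"latin1": "cp1252"}

def encoding_from_header(data: bytes, probe: int = 4096) -> Optional[str]:
    """Return the codec named in the dump header, or None if there is none."""
    match = HEADER_RE.search(data[:probe])
    if match is None:
        return None  # no header found: the caller falls back to autodetection
    return CODECS.get(match.group(1).lower())
```

The caller would only run its autodetection when this returns None, so a declared character set always wins over a guess.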
