How to get the character set right

[expired user #6195] posted 12 years ago in Import/Export
I'm exporting and importing with HeidiSQL. Both databases are UTF-8, but the tables are Latin - Swedish for some reason. On import, I've tried various combinations for the encoding. (This is a WordPress database.)

The problem is that non-ASCII characters (Cyrillic, quotation marks, ellipses and dashes) are replaced with odd 3-character sequences. How do I do this right so the character sets line up?
BubikolRamios posted 12 years ago
1. Did you try DROP/CREATE TABLE on export?
2. Are you viewing the exported data with the same client?
3. Is the exported data on the same OS?
[expired user #6195] posted 12 years ago
1. The tables are being created (did not exist before).
2. Yes, I am viewing it with WordPress set to UTF-8 on both ends.
3. It's Linux on both ends.
ansgar posted 12 years ago
Where do you see those "wrong" 3-character sequences? In WordPress or HeidiSQL?

Which HeidiSQL version is it?

You could open the exported file with a text editor and check whether that file is already broken. If it is, I guess the data in the existing database is already broken.
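For a large dump, a rough way to do that check without reading every line is to scan for character sequences that typically appear when UTF-8 text has been misread as Latin-1/Windows-1252. A minimal Python sketch; the marker list and the function name `find_suspect_lines` are assumptions made for illustration, not anything HeidiSQL provides, and the check is a heuristic that can flag legitimate text (e.g. Spanish "Ñ").

```python
# Sequences that commonly show up when UTF-8 text has been decoded as
# Windows-1252 / Latin-1. Assumed, heuristic and non-exhaustive list:
# legitimate text can also contain some of these characters.
MOJIBAKE_MARKERS = ("Ã", "Â", "â€", "Ð", "Ñ")


def find_suspect_lines(lines, markers=MOJIBAKE_MARKERS):
    """Return (line_number, line) pairs containing a typical mojibake marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in markers):
            hits.append((number, line.rstrip("\n")))
    return hits


# Usage with a dump file (the file name is only an example):
# with open("export.sql", encoding="utf-8", errors="replace") as dump:
#     for number, line in find_suspect_lines(dump):
#         print(number, line[:120])
```

If the suspect lines are already present in the dump, the damage happened before or during export rather than on import.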
kalvaro posted 12 years ago
If "both databases are UTF-8, but the tables are Latin - Swedish for some reason", then your data is not using UTF-8: it's using Latin 1. The database encoding is just a default, used e.g. when you create a table without specifying a charset. The Latin 1 charset has no Cyrillic characters, so it cannot store them.
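The character-level point is easy to confirm outside MySQL: Latin-1 simply has no code points for Cyrillic letters. A small Python sketch using only the standard codecs; `fits_in_charset` is a hypothetical helper name and the sample strings are arbitrary. Note that MySQL's `latin1` is in practice Windows-1252 (which adds curly quotes, dashes and the ellipsis), so the sketch checks both codecs.

```python
def fits_in_charset(text, encoding):
    """True if every character of `text` can be represented in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Python's "latin-1" is ISO-8859-1; "cp1252" is Windows-1252.
print(fits_in_charset("Привет", "latin-1"))  # False
print(fits_in_charset("Привет", "cp1252"))   # False
print(fits_in_charset("Привет", "utf-8"))    # True
print(fits_in_charset("\u2026", "cp1252"))   # True: the ellipsis exists in cp1252
```

So a latin1 column can hold the quotation marks and ellipses from the question as characters, but not the Cyrillic text.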
[expired user #6195] posted 12 years ago
I'm using the latest nightly build of HeidiSQL.

When I view the source data in WordPress, I see left quotation marks, ellipses and Cyrillic.

Looking at the HeidiSQL export SQL file I see: CREATE TABLE ... DEFAULT CHARSET=latin1

When I look at the HeidiSQL export SQL file with Notepad++, I see stuff like:

Ã¢â‚¬Â¦

If you can't see that, it's a capital A with a tilde over it, a cent sign, a lowercase a with a grave accent, a comma, a logical-not sign, another tilde A and a vertical bar.

That is the same thing I see in the target WordPress blog.

Both WordPress installations were set up with UTF-8 as the character set.

Unfortunately, I cannot get at the original data any more.
jfalch posted 12 years ago
This is probably the character '…', encoded as UTF-8 twice. If you run the string above through a UTF-8 decoder (reading it as, e.g., win1252), you get 'â€¦', and decoding that again yields '…'.
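The double-encoding diagnosis can be reproduced: take the ellipsis, encode it as UTF-8, misread the bytes as Windows-1252, and repeat. Two rounds yield the seven characters described above. A minimal Python sketch; it assumes the misreading codec was Windows-1252, which is a guess about what happened in the original WordPress setup, and `misread_as_cp1252` is a hypothetical helper name.

```python
def misread_as_cp1252(text):
    """Simulate one round of mojibake: UTF-8 bytes interpreted as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


original = "\u2026"                  # HORIZONTAL ELLIPSIS
once = misread_as_cp1252(original)   # 3 characters: a-circumflex, euro sign, broken bar
twice = misread_as_cp1252(once)      # 7 characters, as seen in the export file
print(once, twice)
```

Python's cp1252 codec rejects the five bytes Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so this simple simulation raises an error for some Cyrillic letters (e.g. "А" is D0 90 in UTF-8); it is only meant to illustrate the ellipsis case.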
jfalch posted 12 years ago
via utf8-decoder
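Undoing the damage means running the conversion backwards once per round of mis-encoding: encode the text as Windows-1252 to get the original bytes back, then decode those bytes as UTF-8. A sketch of such a decoder in Python; the function name, the Windows-1252 assumption and the stop-when-it-fails loop are illustrative choices, not the tool used above.

```python
def undo_mojibake(text, max_rounds=3):
    """Repeatedly reverse a UTF-8 -> Windows-1252 misreading.

    Stops when the text no longer converts cleanly (it is then assumed
    to be repaired) or after `max_rounds` attempts.
    """
    for _ in range(max_rounds):
        try:
            repaired = text.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # not (or no longer) mis-encoded in this way
        if repaired == text:
            break  # plain ASCII: nothing left to undo
        text = repaired
    return text


print(undo_mojibake("\u00c3\u00a2\u00e2\u201a\u00ac\u00c2\u00a6"))  # the ellipsis again
print(undo_mojibake("café"))                                        # unchanged
```

This is a heuristic: text that is legitimately Latin-1 but happens to form valid UTF-8 would be "repaired" wrongly, so it is safer to run it on a copy of the data and spot-check the results.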
[expired user #6195] posted 12 years ago
This turned out to be a WordPress installation error and nothing to do with the import/export. Sorry to have troubled you.
ansgar posted 12 years ago
Thank you for the update!
