How to get the character set right

kwdavids1 posted 3 years ago in Import/Export
I'm exporting and importing with HeidiSQL. Both databases are UTF-8, but the tables are Latin - Swedish for some reason. On import, I've tried various combinations for the encoding. (This is a WordPress database.)

The problem is that non-Latin characters (Cyrillic, quotation marks, ellipses and dashes) are replaced with odd 3-character sequences. How do I do it right so the character sets line up?
BubikolRamios posted 3 years ago
1. Did you try drop/create table on export?
2. Are you viewing the exported data with the same client?
3. Is the exported data on the same OS?
kwdavids1 posted 3 years ago
1. The tables are being created (did not exist before).
2. Yes, I am viewing with WordPress set to UTF-8 on both ends.
3. It's Linux on both ends.
ansgar posted 3 years ago
Where do you see those "wrong" 3-character sequences? In WordPress or HeidiSQL?

Which HeidiSQL version is it?

You could open the exported file with a text editor and check whether that file is already broken. If it is, I guess the data in the existing database is already broken.
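
The same check can be sketched in Python instead of by eye. This is only a rough heuristic, assuming the damage is UTF-8 text that was misread as Windows-1252; the function name `suspicious_runs` and the sample string are made up for illustration.

```python
import re

# Mojibake can only hide in runs of non-ASCII characters.
NON_ASCII_RUN = re.compile(r"[^\x00-\x7f]+")


def suspicious_runs(text):
    """Return (garbled, repaired) pairs for runs that look like
    UTF-8 bytes misread as Windows-1252 (an assumed failure mode)."""
    found = []
    for run in NON_ASCII_RUN.findall(text):
        try:
            # Turn the characters back into the bytes they came from,
            # then try to read those bytes as UTF-8.
            repaired = run.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # Does not round-trip: probably genuine text.
        found.append((run, repaired))
    return found


# Hypothetical sample line, not taken from a real dump.
sample = "Read more \u00e2\u20ac\u00a6 or \u041f\u0440\u0438\u0432\u0435\u0442"
for garbled, repaired in suspicious_runs(sample):
    print(repr(garbled), "->", repr(repaired))
```

To apply it to an export, read the file with `open(path, encoding="utf-8")` and pass its contents to `suspicious_runs`. An empty result does not prove the file is clean; it only means this particular pattern was not found.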
kalvaro posted 3 years ago
If "both databases are UTF-8, but the tables are Latin - Swedish for some reason", then your data is not using UTF-8: it's using Latin 1. The database encoding is just a default that is used, e.g., when you create a table and don't specify a charset. In the Latin 1 charset it's impossible to store Cyrillic characters.
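
A small Python sketch of that last point, assuming Python's `latin-1` codec (ISO-8859-1) is a fair stand-in for the table charset; the helper name `fits_in` is invented for this example.

```python
def fits_in(text, encoding):
    """Return True if every character of text can be encoded in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(fits_in("caf\u00e9", "latin-1"))   # Western European text fits
print(fits_in("Привет", "latin-1"))      # Cyrillic does not
print(fits_in("Привет", "utf-8"))        # UTF-8 covers both
```

If Cyrillic nevertheless displays correctly in the application, one common explanation is that UTF-8 bytes were written over a Latin 1 connection, so each byte was stored as a separate Latin 1 character.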

kwdavids1 posted 3 years ago
I'm using the latest nightly build of HeidiSQL.

When I view the source data in WordPress, I see left quotation marks, ellipses and Cyrillic.

Looking at the HeidiSQL export SQL file I see: CREATE TABLE ... DEFAULT CHARSET=latin1

When I look at the HeidiSQL export SQL file with Notepad++, I see stuff like:

…

If you can't see that, it's a capital A with a tilde over it, a cent sign, a lowercase a with a grave accent, a comma, a logical not sign, another A with a tilde, and a vertical bar.

This is the same thing I see in the target WordPress blog.

Both WordPress installations were set up with UTF-8 as the character set.

Unfortunately, I cannot get at the original data any more.
jfalch posted 3 years ago
This is probably the character '_', encoded twice as UTF-8; i.e., if you decode the above string from UTF-8 to (e.g.) win1252, you get …, and decoding that again yields _ .
jfalch posted 3 years ago
via utf8-decoder
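
The double-encoding round trip can be reproduced in a few lines of Python. This sketch assumes the affected character is a horizontal ellipsis (U+2026) and that the wrong charset is Windows-1252; both are guesses that happen to match the seven-character description earlier in the thread. The helper names `misread` and `repair` are invented for the example.

```python
def misread(text, wrong="cp1252"):
    """Simulate UTF-8 bytes being interpreted as a single-byte charset."""
    return text.encode("utf-8").decode(wrong)


def repair(text, wrong="cp1252"):
    """Undo one round of that misreading."""
    return text.encode(wrong).decode("utf-8")


original = "\u2026"        # horizontal ellipsis (assumed original character)
once = misread(original)   # one round of misreading
twice = misread(once)      # two rounds, as in a double-encoded export

print(len(once), len(twice))   # 3 7
print([hex(ord(c)) for c in twice])
print(repair(repair(twice)) == original)   # True
```

One misreading gives three characters and two give seven, which would explain seeing both forms. Repairing text this way is only safe when every non-ASCII character in it went through the same number of rounds.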
kwdavids1 posted 3 years ago
This turned out to be a WordPress installation error and nothing to do with the import/export. Sorry to have troubled you.
ansgar posted 3 years ago
Thank you for the update!
