Changing connection character set variables

[expired user #1520]'s profile image [expired user #1520] posted 16 years ago in Feature discussion Permalink
Well i have finally found out what all the problem i was having when viewing and explorting from my server mysql 4.1. Basically HeidiSQL is connecting with character_set_client, character_set_connection, and character_set_results as cp1256 and my database table is latin and all fields that have binary characters over ascii show up as question marks in the grid and all exports, so how do i change HeidiSQL's connection character sets.
ansgar's profile image ansgar posted 16 years ago Permalink
You can simply use the query tab to execute something like
SET NAMES latin1

But I guess that won't really help out as the characterset problem sticks on various components in HeidiSQL. One of them is the underlying database layer, Zeos, then the grid which is used up to ver 3.2 also doesn't support unicode and so on. But that's discussed in another thread.
[expired user #1520]'s profile image [expired user #1520] posted 16 years ago Permalink
So what is causing heidisql to set the variables as cp1256? Is there something in my system i can change to fix this as i doubt everyone is connecting with the same character set.
[expired user #1125]'s profile image [expired user #1125] posted 16 years ago Permalink
HeidiSQL chooses cp1256 because it matches the ANSI codepage your Windows installation uses. It's a theoretical (but unlikely) possibility that this auto-detection failed.

Hint: if you think that the SET NAMES that HeidiSQL does is wrong, then usually what's happening is that you're storing stuff wrong in your MySQL database. By doing a SET NAMES manually, you can get good characters by doing this wrong translation in reverse.

It's basically a sign that you have an application that is giving MySQL Server data and telling the server "this is data in character set X", but in reality it is data encoded in character set Y.

If you are storing raw binary data in a column you should set the column character set to "BINARY".

Existing data can be "relabeled" as being in another character set by first doing a no-op conversion to BINARY, then doing a no-op conversion to the actual character set used. For details see:
http://www.mysqlperformanceblog.com/2007/12/18/fixing-column-encoding-mess-in-mysql/
[expired user #1520]'s profile image [expired user #1520] posted 16 years ago Permalink
As HeidiSQL chooses cp1256 because of the ansi codepage, please give me the ability to decide the characterset i wish to use when connecting to my database. Luckily, i'm not using unicode so i'm not affect by the grid no supporting it. I tried the 'SET NAMES latin1' and it worked perfectly but it doesnt work perminently, so its not a solution i can use.

Hint: if you think that the SET NAMES that HeidiSQL does is wrong, then usually what's happening is that you're storing stuff wrong in your MySQL database. By doing a SET NAMES manually, you can get good characters by doing this wrong translation in reverse.



Well if i was entering data directly into my database from HeidiSQL, then likely this would be true, but the data comes from my PHP script and i use HeidiSQL just for maintenance and viewing data. :)

It's basically a sign that you have an application that is giving MySQL Server data and telling the server "this is data in character set X", but in reality it is data encoded in character set Y.



Well my server has character_set_server and character_set_database set as latin1, but heidisql is connecting with cp1256, so that would be the main problem behind it. Ideally, if you click on a table with a particular characterset, heidisql should change to that characterset.

Existing data can be "relabeled" as being in another character set by first doing a no-op conversion to BINARY, then doing a no-op conversion to the actual character set used. For details see:
http://www.mysqlperformanceblog.com/2007/12/18/fixing-column-encoding-mess-in-mysql/[/quote:fbe3fa0222]

Yes when my server mysql jumped from 3.23 to 4.12, i saw the confusion start. I found this link on mysql's website which gives the info http://dev.mysql.com/doc/refman/5.0/en/charset-conversion.html.

[expired user #1520]'s profile image [expired user #1520] posted 16 years ago Permalink

Existing data can be "relabeled" as being in another character set by first doing a no-op conversion to BINARY, then doing a no-op conversion to the actual character set used.



I did a heavy bit of testing and setting a VARCHAR field as BINARY or VARBINARY works well with the conversion over (WARNING: clicking the BINARY checkbox in HeidiSQL causes the field to be set with binary version of the table's character set). Unfortunately, HeidiSQL doenst have the ability to change the field character sets, so i did it in SQLyog and changed the field back to VARCHAR and their correct character sets, and it worked great.

I read through http://www.mysqlperformanceblog.com/2007/12/18/fixing-column-encoding-mess-in-mysql/, but unfortunately my field has a different character set then the table, so it doesnt help because i need to change the character set of the field in the process.
[expired user #1125]'s profile image [expired user #1125] posted 16 years ago Permalink

please give me the ability to decide the characterset i wish to use when connecting to my database.



That would encourage more people to store data with a wrong character set specification just like you (seem to) do.

Hint: if you think that the SET NAMES that HeidiSQL does is wrong, then usually what's happening is that you're storing stuff wrong in your MySQL database. By doing a SET NAMES manually, you can get good characters by doing this wrong translation in reverse.



Well if i was entering data directly into my database from HeidiSQL, then likely this would be true, but the data comes from my PHP script and i use HeidiSQL just for maintenance and viewing data. :)



Then the flaw is within the PHP scripts (or within the configuration used to set up the connection for the PHP scripts).

It's basically a sign that you have an application that is giving MySQL Server data and telling the server "this is data in character set X", but in reality it is data encoded in character set Y.



Well my server has character_set_server and character_set_database set as latin1, but heidisql is connecting with cp1256, so that would be the main problem behind it.



That is a misunderstanding.

Ideally, if you click on a table with a particular characterset, heidisql should change to that characterset.



That too.

If you're seeing wrong characters, you're definitely storing data wrong.

(OTOH, if you're seeing only question marks, no wrong characters, then you may just have hit characters that does not display in the local ANSI code page.)

Since your application is in PHP, you are probably processing html form data. Browsers will send form data either with an unspecified character set, but actually encoded in the character set that the server used for the web page the form was posted from, or with a character set explicitly specified in a Content-Type header. The latter is useful, for example, if you are posting to a server from a html file on your local machine, or if you are serving web pages in character set A, but telling the browser via the "Accept-Charset" HTTP header that the server only accepts incoming data in character set B.

I don't know how PHP interprets html form data when there is no character set given. It would be optimal if PHP would remember which character set was used in the web page sent to the client, and then decode incoming form data using the same character set when none was specified, converting the data to some given standard (eg. utf-8[u:55beefbfe3][/u:55beefbfe3]). Whether it actually does that I cannot attest to. I suspect not.

Either way, assuming that PHP does absolutely no character set processing of html form data, you probably just need to change the character set of the mysql connection used within your PHP scripts to reflect the output encoding of the web pages served to clients. That way, MySQL Server will know what the data you're sending it is.

AFAICT, that should be the correct solution to your problem.

(By the way, I think MySQL Server will store whatever junk you throw at it; you won't have a problem until you retrieve the data, and even then it will likely throw you the same junk if you just specify the same character set for your connection as the column has, since it can then presumably skip any conversion step.)

Yes when my server mysql jumped from 3.23 to 4.12, i saw the confusion start.



Possibly because the connection to the mysql server is now subject to some (default) character set (probably utf-8 or latin1) which happens not to match the output encoding of HTML pages from your web server, in turn leading to FORM input data being fed to the mysql server under the pretense of being in another encoding than is actually case. Try aligning the connection's character set to the output encoding of your HTML pages, or alternatively use accept-charset on the FORM html element (see link further below) to ensure that the encoding of incoming data match what you tell the database it is.

You may also want to check your server's log files for messages like "[Warning] <...> had no or invalid character set, and default character set is multi-byte, so character column sizes may have changed" and other character set related warnings.

NB.
For additional information regarding character sets used in data posted via HTML FORMs, refer to:
http://www.w3.org/TR/html4/interact/forms.html#adef-accept-charset

Please login to leave a reply, or register at first.