Binary characters at start of export file

johngosling posted 8 years ago in General
<Apologies - submitting this a second time as I forgot to add a title to the first post>

I have just switched over to using HeidiSQL (4.0 Revision 2405) after having used MySQL 2.4 for many years. I created an SQL export file using Export to upload some tables to a remote server. However, the remote server failed to parse the SQL, complaining that there were binary characters in the upload file. On inspecting file, it turned out their were indeed binary characters at the start which appeared as  in a hex dump. In order to upload the file successfully, I had to create a new file in Notepad and copy the content of the original into that.

The Export settings are Output=File and Target Compatability="HeidiSQL s/ MySQLServer 4.0", although various selections for Target Compatability produce the same result. My OS is Windows 7 64-bit.

Maybe I'm missing something obvious here but I would be grateful if anyone has any ideas.

ansgar posted 8 years ago
HeidiSQL 4.0 and newer builds export UTF-8 files. Most editors, including notepad.exe, are able to display such files. Not so mysqldump.exe, btw.

the remote server failed to parse the SQL, complaining that there were binary characters in the upload file

I'm sure it was not the server which complained, but the client which was not able to handle UTF-8 files. Which client did you use to import the file?
johngosling posted 8 years ago
Thanks for the reply Anse. The client on the server is phpMyAdmin 2.6.4-pl3 and I would be surprised if that is having trouble with UTF-8 files. After looking more into this, it turns out that the three problematic bytes at the start of the file are the UTF-8 byte-order mark (BOM) which, according to the source I read, are added by many Windows programs (including Notepad) at the start of any document saved as UTF-8. Though in my case, the file had not been saved by Notepad or any other program. The BOM bytes were present immediately upon creation of the file by HeidiSQL. Is this what you would expect?

As for phpMyAdmin, it's not clear whether the error is coming from the client or the server. For your information, I have pasted below the error messages which result from importing a Heidi export file which was modified to contain just a single select statement. As with the original, this file contains the BOM at the beginning of the file.


phpMyAdmin output

There is a chance that you may have found a bug in the SQL parser. Please examine your query closely, and check that the quotes are correct and not mis-matched. Other possible failure causes may be that you are uploading a file with binary outside of a quoted text area. You can also try your query on the MySQL command line interface. The MySQL server error output below, if there is any, may also help you in diagnosing the problem. If you still have problems or if the parser fails where the command line interface succeeds, please reduce your SQL query input to the single query that causes problems, and submit a bug report with the data chunk in the CUT section below:
----BEGIN CUT----
----END CUT----
----BEGIN RAW----

ERROR: C1 C2 LEN: 1 2 27
STR: »

CVS: $Id: sqlparser.lib.php,v 2.36 2005/08/08 20:22:11 lem9 Exp $
MySQL: 4.0.27-max-log
PMA: 2.6.4-pl3
PHP VER,OS: 4.3.10-200.schlund.1 Linux
LANG: en-iso-8859-1
SQL: select id from languages----END RAW----

SQL query:

select id from languages
MySQL said:

#1064 - You have an error in your SQL syntax. Check the manual that corresponds to your MySQL server version for the right syntax to use near 'select id from languages' at line 1
ansgar posted 8 years ago

The BOM bytes were present immediately upon creation of the file by HeidiSQL. Is this what you would expect?

Yes, exactly, looks correct to me. The UTF-8 BOM is needed to correctly preserve international characters in the export file.

The client in this case is PHP which reads that file and expects it to be Latin1 encoded, which is not true. I'm not that deep into pma, but I guess you can change this handling of files by tweaking some charset setting in PMA. At least with PHP that is possible by skipping the BOM manually, like described in their manual.

Probably a PMA update can fix that problem for you - 2.6.4 is from 2005.
johngosling posted 8 years ago
Okay. Thanks for the info on this Anse. I guess I'm stuck because it's an IPS's server (1&1) and not much chance of getting them to update phpMyAdmin. But, no worry. I'll find a work around.
johngosling posted 8 years ago
. . . "ISP's" . . .
ansgar posted 8 years ago
Ok, in that case you can always edit your .sql file using any Unicode editor and save it in Latin1 encoding. Of course you will break international characters in your file, but as you're using MySQL 4.0 which is not Unicode enabled I think that's no problem.

Or, if the file is small enough you can load it into HeidiSQL to import (requires a direct MySQL connection which is probably blocked by 1&1, not sure).
djn posted 8 years ago
Anse, are you sure the BOM is actually needed there? The Unicode consortium says that in UTF-8 usage of a BOM is permitted, but not reccomended - exactly because of such mismatches.
ansgar posted 8 years ago
Oh good point, some time ago since I read that. I'll try if removing that BOM breaks HeidiSQL's own detection of Unicode files while reading into the query editor.
johngosling posted 8 years ago
Further to djn's point, I did come across a Wikipedia article (I know it's not necessarily the most authoritative source) which lists a number of interoperability problems which arise as a consequence of BOM usage, amongst which was just the problem I was experiencing. See http://en.wikipedia.org/wiki/UTF-8.
johngosling posted 8 years ago
Funnily enough, the Wikipedia link I added to the previous post, although the URL to the page itself, only leads to a Wipedia search page. So click on "Search for UTF-8" on that page and select the first article listed to get to the intended page.
ansgar posted 8 years ago
See r2840 - no BOM's any more in written files, and mysqldump.exe reads these files without complaint. Hopefully that doesn't break other tools, but as I understand right it's more common that applications do not add BOM - so does Heidi now.
johngosling posted 8 years ago
Great - thanks for looking at this Anse
djn posted 8 years ago
Anse, your speed in resolving this king of issues amazes me. Impressive.

John, it's not Wikipedia - it's the period at the end of the statement that got caught in the link... ;-)

Please login to leave a reply, or register at first.