100% crash on SQL file load.

Icos posted 11 years ago in Import/Export
I have an SQL file created by HeidiSQL 8.0.0.4466, and I'm trying to import this dump back into HeidiSQL 8.0.0.4466. The file is bigger than 5 MB, so HeidiSQL asks whether I want to load it in the editor or just run it without loading.

If I load it, the dump imports just fine. If I try to run it without loading, HeidiSQL crashes every time with a syntax error.

The file is UTF-8 encoded and contains Russian characters.

Here is the link to the dump and bug report:
http://rghost.ru/47655433
ansgar posted 11 years ago
Your dump has an SQL error in it. The call stack stops at that error:

exception message : SQL Error (1064): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''STR_BOW_N' at line 4688.


I should somehow handle SQL errors with a popup dialog. However, you now know what's happening.
Icos posted 11 years ago
Well, then HeidiSQL 8.0.0.4466 produces those bad dumps. I didn't touch the file myself. I just made a dump and then tried to load it into another table.

Also, if there is an error, HeidiSQL 8.0.0.4466 should point it out when I load the dump in the editor and then run it, but it doesn't.
ansgar posted 11 years ago
Can you please find the whole query around that ''STR_BOW_N', so I can find out what exactly is causing the error?
Icos posted 11 years ago
Well, there are a lot of such strings in the dump. For example:
(700859, 'STR_BOW_N_C_10A', 'Лук стражника'),

However, there are no STR_BOW_N strings at line 4688 at all:
4687: (704670, 'STR_ITEM_R_CO_Q5514', 'Рецепт: Филе моллита'),
4688: (704671, 'STR_ITEM_R_CO_Q5515', 'Рецепт: Толченая капуста в кляре'),
4689: (704672, 'STR_ITEM_R_CO_Q5516', 'Рецепт: Пряная ножка руфента'),
ansgar posted 11 years ago
Do not look at line 4688 in the *file*. 4688 is the line number within *one* query, which is at least 4688 lines long.
Icos posted 11 years ago
OK, here are those lines:

4687: (749284, 'STR_BOW_N_L1_A_53A', 'Лук падения'),
4688: (749285, 'STR_BOW_N_L1_A_53B', 'Лук падения'),
4689: (749286, 'STR_BOW_N_L1_A_53C', 'Лук падения'),
4690: (749287, 'STR_BOW_N_C2_P_55A', 'Гардениевый лук'),
4691: (749288, 'STR_BOW_N_R1_P_55A', 'Сияющий гардениевый лук'),
4692: (749289, 'STR_BOW_N_R2_P_55A', 'Гардениевый лук подмастерья'),

Nothing unusual.
Icos posted 11 years ago
One more test: I deleted all preceding INSERT statements. That cut the SQL file's size down to 9.9 MB (from 14.4 MB), and the dump was imported correctly.

I think the problem lies in the dump's size.
ansgar posted 11 years ago
Did you select "UTF-8" encoding in the "open file" dialog? Perhaps you left it at "Auto detect" and it detected the wrong encoding?
Icos posted 11 years ago
Tried both auto-detect and UTF-8. Same result: crashes.

Why don't you try importing my dump into some test DB? You'll see for yourself.
ansgar posted 11 years ago
Yes, I can reproduce that error, and I am trying to find the cause of it.
ansgar posted 11 years ago
HeidiSQL cuts the query at the wrong position:

4687: (749284, 'STR_BOW_N_L1_A_53A', 'Лук падения'),
4688: (749285, 'STR_BOW_N

It should be cut at the semicolon.
ansgar posted 11 years ago
I located the problem in my file reader method. Only the very first chunk of 5M is read correctly. Later reads always return an empty string.
Icos posted 11 years ago
Glad to hear that you've found the source of the problem.
Waiting for a fixed version. :)
ansgar posted 11 years ago
Bad news. My file reader method works on four other UTF-8 files I just tested. Only your file with Russian characters seems to break it.

The actual issue is somewhere in the Delphi method TEncoding.Convert(), to which I pass the read bytes and the detected UTF-8 encoding type. This call to Convert() returns an empty buffer for the second 5M chunk of this file. The third is filled again, so it's not only the first one that succeeds.
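For illustration, here is a minimal Delphi sketch of the kind of chunked read-and-convert loop described above. The file name, chunk size and overall structure are placeholder assumptions, not HeidiSQL's actual code (that lives in helpers.ReadTextfileChunk), but it shows where an empty result from TEncoding.Convert() would surface:

const
  ChunkSize = 5 * 1024 * 1024; // 5M, as in the affected builds
var
  Stream: TFileStream;
  Raw, Converted: TBytes;
  BytesRead: Integer;
begin
  Stream := TFileStream.Create('dump.sql', fmOpenRead or fmShareDenyWrite);
  try
    repeat
      SetLength(Raw, ChunkSize);
      BytesRead := Stream.Read(Raw[0], ChunkSize);
      SetLength(Raw, BytesRead);
      // Convert the raw bytes from the detected encoding to Delphi's native UTF-16
      Converted := TEncoding.Convert(TEncoding.UTF8, TEncoding.Unicode, Raw);
      // The symptom: a chunk boundary inside a multibyte character
      // can make Convert() return an empty array for the whole chunk
      if (BytesRead > 0) and (Length(Converted) = 0) then
        Break;
    until BytesRead < ChunkSize;
  finally
    Stream.Free;
  end;
end;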
Code modification/commit from ansgar.becker, 11 years ago, revision 4469
Turn unhandled exception in case of SQL errors into a popup dialog when running SQL files. See http://www.heidisql.com/forum.php?t=13044
ansgar posted 11 years ago
After updating to r4469, SQL errors are handled by a normal popup dialog instead of raising an exception window.

As a workaround, you can open that dump file in some editor, say Notepad++, and save it in Unicode format instead of UTF-8. This way, HeidiSQL's call to TEncoding.Convert should work.
Icos posted 11 years ago
So unless the Delphi makers fix this bug, you can't do anything on your end?
Code modification/commit from ansgar.becker, 11 years ago, revision 4470
* Ensure ReadTextfileChunk reads a multiplier of the encoding's maximum byte count per char. See http://www.heidisql.com/forum.php?t=13044
* Log error message when TEncoding.Convert returns an empty TByte array.
* Documentation
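The r4470 idea, as a hedged sketch with hypothetical variable names: round the chunk size down to a whole multiple of the encoding's maximum bytes per character, so that a fixed-width encoding can never be split mid-character. For UTF-8, where characters are 1 to 4 bytes wide, no fixed multiple gives that guarantee, which is presumably why the problem persisted:

// MaxBytes: the most bytes a single character can occupy in this encoding
MaxBytes := Encoding.GetMaxByteCount(1);
// Trim the chunk size to a whole multiple of that width
ChunkSize := (ChunkSize div MaxBytes) * MaxBytes;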
ansgar posted 11 years ago
Now, with r4470, you should see more exactly what's happening in such cases. Watch out for an error message in the SQL log panel when running a dump file.
ansgar posted 11 years ago
By the way, I'm still thinking we can fix the issue somehow, without relying on a fix from the Delphi makers.
Icos posted 11 years ago
I really hope that this will happen.
Code modification/commit from ansgar.becker, 11 years ago, revision 4472
Increase size of chunk to read from SQL dump from 5M to 20M, so we catch at least the file problems mentioned on http://www.heidisql.com/forum.php?t=13044 .
ansgar posted 11 years ago
r4472 now reads SQL files in 20M chunks instead of 5M. That magically fixes *your* particular problem while leaving the underlying issue untouched. But I can't find a real solution to that TEncoding.Convert() problem, and reading 20M into memory is perfectly fine for most users.
ansgar posted 11 years ago
No more feedback?
Icos posted 11 years ago
Sorry, I was away on a two-week vacation.
I'll test your new build tomorrow.
Icos posted 11 years ago
My file imports without problems. I tried it by itself and as part of an entire single-file database dump (141 MB).

Thank you very much for fixing this problem.
Icos posted 11 years ago
Well, apparently this bug is still alive. Today I tried to import a 250 MB dump, and r4482 gave me an error on one table. This time it was about a wrong column/value match, but I think the source of the problem is still the same.

When I dumped that table separately, it imported without any problems.
ansgar posted 11 years ago
A wrong column/value match does not sound like the error above: to get that error, the syntax of the INSERT must be correct except for the number of columns/values. Can you track that error down to an SQL line in the big file?
Icos posted 11 years ago
Well, this dump imports just fine with the command-line mysql client, so it's without a doubt a correct dump.

Also, this bug cuts the query at a wrong position, so it could easily cut off some values and produce this error.
ansgar posted 11 years ago
A cut-at-wrong-position bug is more likely than the error you described previously, for which I committed a workaround.

Still, to fix anything further here, I will need the exact SQL code which is causing the error.
Icos posted 11 years ago
The error: "Column count doesn't match value count at row 4584"
Here is the exact code:
INSERT INTO `client_strings_item_fr` (`id`, `name`, `body`) VALUES
...
4584: (704592, 'STR_ITEM_R_HA_Q5318', 'Plan : Echelle en bois'),
4585: (704593, 'STR_ITEM_R_HA_Q5319', 'Plan : Tambour'),
4586: (704594, 'STR_ITEM_R_HA_Q5320', 'Plan : Arc-bâton en Koa'),

As you can see, the code is fairly simple and correct.
ansgar posted 11 years ago
Row 4584 is the row in one INSERT query, not row 4584 in the file.
Icos posted 11 years ago
Yeah, you told me about that already. I cut the beginning of the file off up to this INSERT and counted rows starting from there.
ansgar posted 11 years ago
Ah, OK, thanks. The SQL code looks fine, so it might really be some cut-at-wrong-position bug in Heidi. I will again need the whole file to analyze it.
Icos posted 11 years ago
Here is the full dump:
http://rghost.ru/48376145
ansgar posted 11 years ago
Thanks! I'm going to check that.
Icos posted 11 years ago
Thank you.
ansgar posted 11 years ago
Confirmed. Although I see different lines when looking up line 4584 of the query which throws the error:

...
4583:	(778347, 'STR_REC_D_TA_TA_PART_MASS_D_DRA_LT_552C', 'Design balique : Paquet d\'éblouissants fragments d\'écaille de Balaur coriace'),
4584:	(778348, 'STR_REC_D_TA_TA_PART_D_DRA_RB_401B', 'Design balique : Noble tache de sang chaud de Balaur inerte.'),
4585:	(778349, 'STR_REC_D_TA_TA_PART_MASS_D_DRA_RB_401B', 'Design balique : Paquet de nobles taches de sang chaud de Balaur inerte.'),


The point is that I'm getting these errors in the SQL log:
Error when converting chunk from encoding 65001 (UTF-8) to 1200  (Unicode) in full-dump.sql at position 160,0 MiB
Error when converting chunk from encoding 65001 (UTF-8) to 1200  (Unicode) in full-dump.sql at position 180,0 MiB


You did not see these?

So we're back at the TEncoding.Convert() problem, as described above. As I said, r4472 just reads bigger chunks to minimize the probability of hitting characters which break TEncoding.Convert().

There must be a fix for that. We already saw no problem when reading a file in one go, so the actual problem is that I'm splitting the SQL code somewhere within a multibyte character.
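As a concrete illustration (the bytes are chosen for the example, not taken from the dump): the Cyrillic letter 'Л' is encoded in UTF-8 as the two bytes D0 9B. If a byte-based chunk boundary falls right after D0, one chunk ends with an incomplete sequence and the next begins with a stray continuation byte 9B, so neither chunk is valid UTF-8 on its own.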
ansgar posted 11 years ago
Now, with r4494, I fixed a bug in the calculation of the chunk size which should have already worked in r4470. With that fix, I now get the same error at a later file position, with some Japanese characters. Those characters are somehow broken in that file; I don't know why, as we have UTF-8 here. I think HeidiSQL still splits within some Japanese multibyte character. Perhaps I should not use the max bytes per char for the given encoding but a bigger multiple of 2, let's say 40. Let me test that this evening. Leaving for work now.
Icos posted 11 years ago
Nope, didn't see those errors.
Glad to see, though, that you can reproduce it. Looks like the error's type depends on where Heidi splits the line.
Icos posted 11 years ago
So, any progress in fixing this bug?
ansgar posted 11 years ago
I am stuck, as every divisor resulted in a split mark somewhere within a multibyte character. I tried 4, 16, 128, and even bigger ones: 256 and even 1k. All failed.

What I have not yet tested is selecting Unicode encoding when loading the file via the open-file dialog. Could you please do that and report whether it helps?
Icos posted 11 years ago
UTF-8 gives me the same error as auto-detect. Unicode gives a 'MySQL server has gone away' error. =)

I mentioned this already when we talked about the first dump's bug, but I'd like to remind you of it once more.

That first dump crashed on 'run without loading', but it worked just fine after loading it into the query editor. I can't repeat this with the full dump (I get an out-of-memory error), but maybe your code which loads a file into the query editor does the splitting correctly?

I have plenty of RAM on my machine, so if you can remove the restriction/error on loading big dumps into the editor, I can try to check this.
ansgar posted 11 years ago
You can always load a file into the editor; HeidiSQL asks you what to do if the file is bigger than 5M.

And yes, of course loading into the editor is fine, because it loads the file in *one go*, while running it directly reads it in chunks of 20M to minimize RAM usage. The file reader method reads from a file stream and stops at 20M (or 40M, 60M and so on). The stream is byte-based, not character-based, which means HeidiSQL must take care of multibyte characters. This is where the bug happens: the 20M (or 40M, etc.) boundary falls somewhere within a multibyte character. Later attempts to read the chunk with respect to the given encoding break here and return an empty string most of the time.
Icos posted 11 years ago
I tried to load the 226 MB dump into the editor and got an "out of memory" error, although I have lots of free RAM.
kalvaro posted 11 years ago
I haven't been following this issue, so please bear with me if I'm talking nonsense, but... It appears that HeidiSQL needs to split the source *.sql file, and that might happen in the middle of a multibyte character. Can't you just scan backwards from the last offset in the chunk and remove the last bytes if they belong to an incomplete character? Those bytes can be prepended to the next chunk.

(I suppose HeidiSQL already does something similar to avoid incomplete statements.)
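A minimal Delphi sketch of kalvaro's suggestion, with a hypothetical function name (this is not HeidiSQL's actual code): scan backwards from the end of the chunk, and if the trailing bytes form an incomplete UTF-8 sequence, move them into a carry buffer that gets prepended to the next chunk before conversion:

function TrimIncompleteUtf8Tail(var Chunk: TBytes; out Carry: TBytes): Boolean;
var
  i, SeqLen, Expected: Integer;
begin
  Result := False;
  SetLength(Carry, 0);
  i := High(Chunk);
  // Step back over trailing continuation bytes (10xxxxxx), at most 3 of them
  while (i >= 0) and (High(Chunk) - i < 3) and ((Chunk[i] and $C0) = $80) do
    Dec(i);
  if i < 0 then
    Exit;
  // How many bytes does the lead byte at position i announce?
  if (Chunk[i] and $80) = $00 then Expected := 1        // 0xxxxxxx: ASCII
  else if (Chunk[i] and $E0) = $C0 then Expected := 2   // 110xxxxx
  else if (Chunk[i] and $F0) = $E0 then Expected := 3   // 1110xxxx
  else if (Chunk[i] and $F8) = $F0 then Expected := 4   // 11110xxx
  else Exit;                                            // invalid lead byte: leave chunk as-is
  SeqLen := High(Chunk) - i + 1;
  if SeqLen < Expected then
  begin
    // The sequence is cut off: carry its bytes over and shorten the chunk
    Carry := Copy(Chunk, i, SeqLen);
    SetLength(Chunk, i);
    Result := True;
  end;
end;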
ansgar posted 11 years ago
Oh yes, that's at least a new idea. Thanks.

To avoid incomplete statements, HeidiSQL postpones firing the last detected block until the next loop, where the file reader gets the next chunk of 20M and appends it to that block. If still no semicolon is found in the block, the whole block is again preserved for the next loop, and so on.
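As a sketch, that loop could look like this; ReadNextChunk, LastDelimiterPos and RunStatements are hypothetical names for illustration, not HeidiSQL's actual routines:

Pending := '';
while not EndOfFile do
begin
  // Append the next 20M chunk to whatever was left over last time
  Pending := Pending + ReadNextChunk(20 * 1024 * 1024);
  // Find the last ';' that terminates a complete statement
  LastDelim := LastDelimiterPos(Pending);
  if LastDelim > 0 then
  begin
    RunStatements(Copy(Pending, 1, LastDelim));      // fire the complete statements
    Pending := Copy(Pending, LastDelim + 1, MaxInt); // keep the unterminated tail
  end;
  // If no semicolon was found, the whole block stays in Pending for the next loop
end;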
Code modification/commit from ansgar.becker, 11 years ago, revision 4503
* Try a new approach in helpers.OpenTextFile(), helpers.ReadTextfile() and helpers.ReadTextfileChunk(): Based on TStreamReader instead of TFileStream now, so we can finally rely on Delphi internals for detecting a file's encoding. Also, this should fix read errors in some UTF-8 files, e.g. mentioned on http://www.heidisql.com/forum.php?t=13044
* Remove helpers.DetectEncoding(). Use a separate TStreamReader in the only caller to detect the encoding of a selected file
* Remove helpers.ScanNulChars()
ansgar posted 11 years ago
Please try r4503. I have already loaded both of your dump files successfully, but I'd like to get some feedback on that, as this file-related stuff is quite error-prone due to encoding detection.
Icos posted 11 years ago
Tried to import two different big dumps. Both imported correctly.

I think this time you've fixed it at last. Thank you very much.
ansgar posted 11 years ago
I really hope so; r4503 rewrites the whole file-reading approach. Not sure what I would have done if that had not worked this time.
Code modification/commit from ansgar.becker, 11 years ago, revision 4510
Revert r4503, except for the removed helpers.ScanNulChars() and helpers.RemoveNulChars(). Seem Delphi's encoding detection is totally broken. Also, TStreamReader has bugs and hangs on the second call to ReadTextfileChunk(). Most files should be read fine again, including those mentioned in issue #3331. Also fixes issue #3328.
TODO: fix reading specific UTF8 files, mentioned on http://www.heidisql.com/forum.php?t=13044
ansgar posted 11 years ago
r4503 was broken for many files. I got two new bug reports; see issue #3331 and issue #3328. So I'm back again at the above-mentioned multibyte-reading bug.
ansgar posted 11 years ago
Btw, r4510 reverts what I did in r4503.
Icos posted 11 years ago
So is it fixed finally in r4510?
ansgar posted 11 years ago
No, it's broken again in r4510. That's what I meant by "I'm back again at this multibyte-reading bug".
Icos posted 11 years ago
I see, I'll stick with r4503 then. At least I can import my dumps correctly with it.
Code modification/commit from ansgar.becker, 11 years ago, revision 4512
New attempt to fix the file reader for cases where the chunk size end is within a multibyte character. Increase chunk size per loop and retry reading the chunk. See http://www.heidisql.com/forum.php?t=13044
ansgar posted 11 years ago
I think I got it, in r4512. Your 226M dump file imports correctly here.

I now increase the 20M chunk size in a loop and retry reading the whole chunk whenever I detect a 0-byte result from TEncoding.Convert().
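A hedged sketch of that retry logic, with hypothetical variable names (ChunkStart marks where the current chunk begins in the stream): on a 0-byte result, rewind and reread the whole chunk with a larger size, so the boundary lands somewhere else:

ChunkSize := 20 * 1024 * 1024;
repeat
  Stream.Position := ChunkStart;     // rewind to the start of this chunk
  SetLength(Raw, ChunkSize);
  BytesRead := Stream.Read(Raw[0], ChunkSize);
  SetLength(Raw, BytesRead);
  Converted := TEncoding.Convert(TEncoding.UTF8, TEncoding.Unicode, Raw);
  if (Length(Converted) = 0) and (BytesRead > 0) then
    Inc(ChunkSize, 20 * 1024 * 1024) // boundary hit a multibyte char: grow and retry
  else
    Break;                           // conversion succeeded (or end of file)
until False;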
Icos posted 11 years ago
That's great! Thank you.
[expired user #6612] posted 10 years ago
I had much the same problem (running the latest 8.2), which I solved by loading the SQL file into Notepad and then saving it as ANSI.
Icos posted 10 years ago
You can't save UTF-8 as ANSI, pal ;)
[expired user #11039] posted 6 years ago

For me, this error occurs when loading files that contain binary data in a BLOB column, data which is illegal in UTF-8.

I have attached a Notepad++ screenshot of an example of such data.

1 attachment(s):
  • binarydataexample
ansgar posted 6 years ago

Blobs are normally exported as hex strings, aren't they?

[expired user #11039] posted 6 years ago

That is true, and I am trying to get the program that generates this binary data changed. I just wanted to add another situation that causes file import problems to the discussion. I don't know if and how HeidiSQL could fix this, but I think HeidiSQL should at least detect the situation and not crash.

steinhaug posted 6 years ago

I have the same problem as well; it is a phpMyAdmin export file in UTF-8 format. The error clearly states that the loaded chunk is cut in the middle of a character. You are checking for this, since you emit the error, but when retrying the chunk you make the same mistake, resulting in an endless loop. Wouldn't it be better to at least add a maximum number of tries?

Maybe add a maximum number of tries and quit with a message like "UTF-8 import is broken; open the file in another editor, paste the SQL manually and run it with F9". There are no problems that way.

I have no idea what is going on, but could it be like in PHP, where you have all the mb_-prefixed functions to ensure correct handling of character sets? I have had similar "problems" with text files for years in TextPad on my Windows machine, where UTF-8 files were sometimes troublesome because certain characters were detected as conflicting with something. That is, some characters would not work as UTF-8 even though they were UTF-8 in the first place. I could not find a file that reproduces the problem while writing this. Could it be some Windows-related encoding or character set of the system running the program? I would think UTF-8 is UTF-8, but for all I know it's like a font file that is constantly updated.
