100% crash on SQL file load.
If I load the dump into the query editor first, it imports just fine. If I try to run it without loading, HeidiSQL crashes every time with a syntax error.
The file is in utf-8 encoding with Russian symbols.
Here is the link to dump and bugreport:
http://rghost.ru/47655433
exception message : SQL Error (1064): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ''STR_BOW_N' at line 4688.
I should somehow handle SQL errors with a popup dialog. However, you now know what's happening.
Also, if there is an error, then HeidiSQL 8.0.0.4466 should point to it when I load the dump in the editor and then try to run it, but it doesn't.
(700859, 'STR_BOW_N_C_10A', 'Лук стражника'),
However there are no STR_BOW_N strings at line 4688 at all:
4687: (704670, 'STR_ITEM_R_CO_Q5514', 'Рецепт: Филе моллита'),
4688: (704671, 'STR_ITEM_R_CO_Q5515', 'Рецепт: Толченая капуста в кляре'),
4689: (704672, 'STR_ITEM_R_CO_Q5516', 'Рецепт: Пряная ножка руфента'),
4687: (749284, 'STR_BOW_N_L1_A_53A', 'Лук падения'),
4688: (749285, 'STR_BOW_N_L1_A_53B', 'Лук падения'),
4689: (749286, 'STR_BOW_N_L1_A_53C', 'Лук падения'),
4690: (749287, 'STR_BOW_N_C2_P_55A', 'Гардениевый лук'),
4691: (749288, 'STR_BOW_N_R1_P_55A', 'Сияющий гардениевый лук'),
4692: (749289, 'STR_BOW_N_R2_P_55A', 'Гардениевый лук подмастерья'),
Nothing unusual.
The actual issue is somewhere in the Delphi method TEncoding.Convert(), to which I pass the read bytes and the detected UTF8 encoding type. This call to Convert() returns an empty buffer in the second chunk of 5M of this file. The 3rd is again filled, so it's not only the first one which succeeds.
Turn unhandled exception in case of SQL errors into a popup dialog when running SQL files. See http://www.heidisql.com/forum.php?t=13044
As a workaround, you can open that dump file in some editor, let's say Notepad++, and save it in Unicode format instead of UTF8. This way, HeidiSQL's call to TEncoding.Convert should work.
* Ensure ReadTextfileChunk reads a multiple of the encoding's maximum byte count per char. See http://www.heidisql.com/forum.php?t=13044
* Log error message when TEncoding.Convert returns an empty TBytes array.
* Documentation
Increase size of chunk to read from SQL dump from 5M to 20M, so we catch at least the file problems mentioned on http://www.heidisql.com/forum.php?t=13044 .
When I dumped it separately, it was imported without any problems.
Here is the exact code:
INSERT INTO `client_strings_item_fr` (`id`, `name`, `body`) VALUES
...
4584: (704592, 'STR_ITEM_R_HA_Q5318', 'Plan : Echelle en bois'),
4585: (704593, 'STR_ITEM_R_HA_Q5319', 'Plan : Tambour'),
4586: (704594, 'STR_ITEM_R_HA_Q5320', 'Plan : Arc-bâton en Koa'),
As you can see, the code is fairly simple and correct.
http://rghost.ru/48376145
...
4583: (778347, 'STR_REC_D_TA_TA_PART_MASS_D_DRA_LT_552C', 'Design balique : Paquet d\'éblouissants fragments d\'écaille de Balaur coriace'),
4584: (778348, 'STR_REC_D_TA_TA_PART_D_DRA_RB_401B', 'Design balique : Noble tache de sang chaud de Balaur inerte.'),
4585: (778349, 'STR_REC_D_TA_TA_PART_MASS_D_DRA_RB_401B', 'Design balique : Paquet de nobles taches de sang chaud de Balaur inerte.'),
The point is that I'm getting these errors in the SQL log:
Error when converting chunk from encoding 65001 (UTF-8) to 1200 (Unicode) in full-dump.sql at position 160,0 MiB
Error when converting chunk from encoding 65001 (UTF-8) to 1200 (Unicode) in full-dump.sql at position 180,0 MiB
You did not see these?
So we're again at the TEncoding.Convert() problem, as described above. I already said that r4472 just reads bigger chunks to minimize the probability of hitting characters which break TEncoding.Convert().
There must be a fix for that. We already saw no problem when reading a file in one go, so the actual problem is that I'm splitting the SQL code somewhere within a multibyte character.
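To illustrate the suspected cause outside of Delphi, here is a minimal Python sketch. It takes one row from the dump, cuts its UTF-8 bytes one byte into the first Cyrillic letter, and tries to decode each piece. The helper name `try_decode` is made up for this example, and Python's decoder is only a stand-in for TEncoding.Convert(), which reportedly returns an empty buffer instead of raising an error.

```python
from typing import Optional

text = "(700859, 'STR_BOW_N_C_10A', 'Лук стражника'),"
data = text.encode("utf-8")

# Byte offset of the first Cyrillic letter, plus one: the cut lands
# between the two bytes that encode that letter.
cut = data.index("Л".encode("utf-8")) + 1


def try_decode(chunk: bytes) -> Optional[str]:
    """Hypothetical helper: decode as UTF-8, return None on failure."""
    try:
        return chunk.decode("utf-8")
    except UnicodeDecodeError:
        return None


whole = try_decode(data)             # decodes fine in one go
first_half = try_decode(data[:cut])  # ends with a lone lead byte
second_half = try_decode(data[cut:]) # starts with a lone continuation byte
```

Read in one go, the row decodes; split at that byte, neither half does. This matches the pattern of "fine in the editor, broken when run in chunks".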
What I have not tested yet is selecting Unicode encoding when loading the file in the open-file dialog. Could you please do that and report whether it helped?
I've mentioned this already, when we talked about the first dump's bug, but I'd like to remind you about it once more.
The first dump crashed when run without loading, but it worked just fine after loading it in the query editor. I can't repeat this with the full dump (I get an out-of-memory error), but maybe your code which loads a file into the query editor does the splitting correctly?
I have plenty of RAM on my machine, so if you can remove the restriction/error on loading big dumps in the editor, I can try to check this situation.
And yes, of course loading into the editor is fine because it loads the file in *one go*, while running it directly reads it in chunks of 20m, to minimize RAM usage. The file reader method then reads from a filestream and stops at 20m (or 40m, 60m and so on). The stream is byte-based, not character-based, which means HeidiSQL must take care of multibyte characters. This is the place where the bug happens: the 20m (or 40m etc.) boundary falls somewhere within a multibyte character. Later attempts to read the chunk with respect to the given encoding break here, and return an empty string most of the time.
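One possible way to keep a byte-based reader from stopping inside a character is to back off from the end of each chunk to the last complete UTF-8 sequence and carry the leftover bytes into the next read. The Python sketch below shows that idea only; `utf8_safe_length` and `read_text_chunks` are hypothetical names, it assumes the file really is UTF-8, and it is not how HeidiSQL's Delphi reader is written.

```python
import io


def utf8_safe_length(buf: bytes) -> int:
    """Length of the longest prefix of buf that does not end inside
    a UTF-8 sequence (hypothetical helper)."""
    n = len(buf)
    i = n - 1
    back = 0
    # Step back over at most 3 continuation bytes (bit pattern 10xxxxxx).
    while i >= 0 and back < 3 and (buf[i] & 0xC0) == 0x80:
        i -= 1
        back += 1
    if i < 0:
        return n  # no lead byte found; leave it to the decoder
    lead = buf[i]
    if lead < 0x80:
        need = 1
    elif lead >> 5 == 0b110:
        need = 2
    elif lead >> 4 == 0b1110:
        need = 3
    elif lead >> 3 == 0b11110:
        need = 4
    else:
        return n  # not a valid lead byte; leave it to the decoder
    # If the last sequence is incomplete, cut right before its lead byte.
    return n if n - i >= need else i


def read_text_chunks(stream, chunk_size):
    """Yield decoded text chunks; bytes of a split character are
    carried over to the next read."""
    carry = b""
    while True:
        block = stream.read(chunk_size)
        if not block:
            if carry:
                # Raises if the file itself ends mid-character.
                yield carry.decode("utf-8")
            return
        buf = carry + block
        cut = utf8_safe_length(buf)
        carry = buf[cut:]
        if cut:
            yield buf[:cut].decode("utf-8")


sample = "'Лук стражника', 'Рецепт: Филе моллита'"
chunks = list(read_text_chunks(io.BytesIO(sample.encode("utf-8")), 5))
```

With a chunk size of 5 bytes, most reads end in the middle of a two-byte Cyrillic letter, yet joining `chunks` gives back the original string. In Python specifically, `codecs.getincrementaldecoder("utf-8")` does the same bookkeeping internally; the manual version is shown because it maps more directly onto a byte-stream reader.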
(I suppose HeidiSQL already does something similar to avoid incomplete statements.)
To avoid incomplete statements, HeidiSQL postpones firing the last detected block to the next loop, where the file reader gets the next chunk of 20m and appends it to the last block. If still no semicolon is found in that block, the whole block is again preserved for the next loop, and so on.
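The carry-over described above can be sketched roughly as follows in Python. This is a simplified illustration, not HeidiSQL's code: `iter_statements` is a hypothetical name, and the split is a naive one on `;` that ignores quoted strings, comments and DELIMITER changes, all of which a real SQL splitter has to respect.

```python
def iter_statements(chunks):
    """Yield complete statements from an iterable of text chunks.
    The trailing block without a semicolon is kept for the next loop."""
    pending = ""
    for chunk in chunks:
        pending += chunk
        # Everything before the last ';' is complete; the rest waits
        # until the next chunk has been appended.
        *complete, pending = pending.split(";")
        for stmt in complete:
            if stmt.strip():
                yield stmt.strip() + ";"
    # Whatever remains at end of file is fired as the last statement.
    if pending.strip():
        yield pending.strip()


parts = ["INSERT INTO t VALUES (1", "), (2); SELECT", " 1; SELECT 2"]
statements = list(iter_statements(parts))
```

The first chunk contains no semicolon, so nothing is fired until the second chunk completes the INSERT. The same kind of "keep the tail for the next loop" logic could be applied one level lower, to the bytes of a split character.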
* Try a new approach in helpers.OpenTextFile(), helpers.ReadTextfile() and helpers.ReadTextfileChunk(): Based on TStreamReader instead of TFileStream now, so we can finally rely on Delphi internals for detecting a file's encoding. Also, this should fix read errors in some UTF-8 files, e.g. mentioned on http://www.heidisql.com/forum.php?t=13044
* Remove helpers.DetectEncoding(). Use a separate TStreamReader in the only caller to detect the encoding of a selected file
* Remove helpers.ScanNulChar(
Revert r4503, except for the removed helpers.ScanNulChars() and helpers.RemoveNulChars(). It seems Delphi's encoding detection is totally broken. Also, TStreamReader has bugs and hangs on the second call to ReadTextfileChunk(). Most files should be read fine again, including those mentioned in issue #3331. Also fixes issue #3328.
TODO: fix reading specific UTF8 files, mentioned on http://www.heidisql.com/forum.php?t=13044
New attempt to fix the file reader for cases where the chunk size end is within a multibyte character. Increase chunk size per loop and retry reading the chunk. See http://www.heidisql.com/forum.php?t=13044
That is true and I am trying to have the program that generates this binary data changed. I just wanted to add to the discussion another situation that causes file import problems. I don't know if and how HeidiSQL could fix this. I think at least HeidiSQL should detect this situation and not crash.
I have the same problem as well; it is a phpMyAdmin export file in UTF8 format. The error clearly states that the loaded chunk is cut in the middle of a character. You are checking for this, since you report the error, but when retrying the chunk you make the same mistake, resulting in an endless loop. Wouldn't it be better to at least add a maximum number of tries?
Maybe add a maximum number of tries and quit with a message like "UTF8 import is broken, open the file in another editor, paste the SQL manually and run it with F9". There are no problems that way.
I have no idea what is going on, but could it be like in PHP, where all the mb_-prefixed functions ensure correct handling of character sets? I have had similar "problems" with text files for years in TextPad on my Windows machine, where UTF8 files were sometimes troublesome because certain characters were detected as conflicting with something. That is, some characters would not work as UTF8 even though they were UTF8 in the first place. I could not find a file that reproduces the problem while writing this. Could it be some Windows-related encoding or character set of the system the program runs on? I would think UTF8 is UTF8, but for all I know it's a font file which is constantly updated.
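A bounded version of the "extend the chunk and retry" idea could look like the Python sketch below. It assumes UTF-8, where a character is at most 4 bytes long, so at most 3 extra bytes can ever be needed to complete a character cut off at the chunk end; after that the reader gives up with an error instead of looping. `read_decodable_chunk` and `max_extra` are hypothetical names, not anything from HeidiSQL.

```python
import io


def read_decodable_chunk(stream, chunk_size, max_extra=3):
    """Read chunk_size bytes and decode them as UTF-8. On failure,
    append one more byte and retry, at most max_extra times."""
    buf = stream.read(chunk_size)
    extra_read = 0
    while True:
        try:
            return buf.decode("utf-8")
        except UnicodeDecodeError as exc:
            # Stop retrying once the limit is reached or the file ends.
            more = stream.read(1) if extra_read < max_extra else b""
            if not more:
                raise ValueError(
                    "chunk is not valid UTF-8 after %d extra byte(s), giving up"
                    % extra_read
                ) from exc
            buf += more
            extra_read += 1


stream = io.BytesIO("aЛb".encode("utf-8"))
first = read_decodable_chunk(stream, 2)   # 2 bytes end inside 'Л', one retry
second = read_decodable_chunk(stream, 2)  # remaining byte
```

Re-decoding a whole 20m buffer on every retry is wasteful, so this only shows the termination logic; backing off to a character boundary before decoding avoids the retries altogether.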