SQL Snippets in ANSI only (not UTF-8)

[expired user #2726] posted 17 years ago in Running SQL scripts

I'm in the process of migrating from http://www.navicat.com to HeidiSQL and have noticed a small problem.

The Load SQL from Textfile function requires the text file to be in ANSI encoding. All my saved SQL queries are in UTF-8.

Is there a way around this?
Thanks
Chillbo

ansgar posted 17 years ago

Not sure what actually requires ANSI here. However, this is the relevant code for that part:

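var
  tmpstr, filecontent      : String;
begin
  ...
  // Open the file as plain text.
  AssignFile( f, filename );
  Reset( f );
  while not eof( f ) do
  begin
    // Read the file line by line into a plain (non-wide) String.
    Readln( f, tmpstr );
    filecontent := filecontent + tmpstr + CRLF;
  end;
  ...
  // Insert the collected text into the query editor.
  SynMemoQuery.SelText := filecontent;
  ...
end;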

I guess the String variables and the ReadLn function don't like UTF-8. It could also be that SynEdit doesn't like it, or even both (the latter would be quite painful to fix).

ansgar posted 17 years ago

A first step for me could be to use WideString instead of String variables. Any suggestions, Delphi hackers?

[expired user #1125] posted 17 years ago

For an immediate fix, a workaround would be to use 'iconv' to convert the file from UTF-8 to, for example, Latin-1 (ANSI). Iconv is a Unix tool, but it can be installed on Windows via MKS Toolkit, Cygwin, SFU or similar, or run in a virtual machine running Kubuntu or whatnot. There's also an abundance of text editors that can convert files, although probably not in a batch fashion.
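If iconv is not at hand, the batch conversion can also be scripted. Here is a rough sketch in Python (not Delphi, just because it is short); the function names, the `*.sql` pattern and the choice of Latin-1 as the target are my own assumptions, so adjust them to your code page:

```python
from pathlib import Path


def convert_to_latin1(src, dst):
    """Re-encode one UTF-8 text file as Latin-1 (ISO-8859-1)."""
    # "utf-8-sig" also strips a leading UTF-8 BOM if one is present.
    # newline="" keeps the original line endings untouched.
    with open(src, "r", encoding="utf-8-sig", newline="") as f:
        text = f.read()
    # errors="replace" writes "?" for characters Latin-1 cannot represent.
    with open(dst, "w", encoding="latin-1", errors="replace", newline="") as f:
        f.write(text)


def convert_directory(src_dir, dst_dir, pattern="*.sql"):
    """Convert every matching file in src_dir, writing copies to dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for src in sorted(src_dir.glob(pattern)):
        dst = dst_dir / src.name
        convert_to_latin1(src, dst)
        converted.append(dst)
    return converted
```

The originals are left alone and the converted copies go to a separate folder, so nothing is lost if a query contains characters that Latin-1 cannot hold.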

I don't think using a WideString will magically cause the Delphi compiler to use a version of Readln() that is Unicode-aware (but I don't know).

There's a magic marker (a BOM) at the beginning of many Unicode files. The code that reads the file would have to identify it and then read the file according to the format indicated. See:
http://en.wikipedia.org/wiki/Byte_Order_Mark
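To make the idea concrete, here is a minimal sketch of BOM detection, written in Python rather than Delphi since the logic is the same in any language. The function names are made up for illustration, and the cp1252 fallback for files without a BOM is only an assumption about what "ANSI" means on the machine in question:

```python
import codecs

# Longest BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM bytes,
# so checking UTF-16 first would misidentify UTF-32 LE files.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data):
    """Return (encoding, bom_length), or (None, 0) if no BOM is found."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


def decode_with_bom(data, fallback="cp1252"):
    """Decode file bytes per their BOM; use the fallback if there is none."""
    encoding, skip = detect_bom(data)
    if encoding is None:
        # No BOM: assume the local ANSI code page (an assumption, see above).
        return data.decode(fallback, errors="replace")
    # Skip the BOM itself so it does not end up in the editor text.
    return data[skip:].decode(encoding)
```

For example, `decode_with_bom(open("query.sql", "rb").read())` would give the text of a hypothetical `query.sql`, whichever of the five BOMs it starts with.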

The most common Unicode formats are UTF-8, UTF-16 BE and UTF-16 LE. As far as the others go, HeidiSQL could get away with throwing an exception (ultimately ending in an error message). UTF-32 BE/LE would be nice to support too, just because it's the simplest universal encoding form available (encompassing all characters from all languages) and as such might be of use to developers. It's not in widespread use though, since it generally requires compression to achieve space efficiency.

If you want to do it yourself, Googling...:
http://groups.google.com/groups?q=group%3Aborland.*%20read%20unicode%20file

...yields results:
http://groups.google.com/group/borland.public.delphi.rtl.general/msg/143ec5a40db89bd5
http://groups.google.com/group/borland.public.delphi.winapi/msg/8fe74c87f73c5b91

But they are far from perfect; the two above do not support complex encodings such as UTF-8, for example.

There are various components that will do the job for you. Here's one with an unspecified license and unclear maintainership, as is the norm for these things:
http://www.yunqa.de/delphi/converters/

There are also open-source components. SynEdit comes in a Unicode version, which has a LoadFromFile function that will load Unicode files. It's in the official SynEdit VCS by now, see:
http://mh-nexus.de/components.htm

ansgar posted 16 years ago

Btw, loading and saving SQL files and snippets should work in the latest build for any charset, including UTF-8 and ANSI.

[expired user #1125] posted 16 years ago

There is BOM detection code both in the WideStrings (iirc) Delphi unit and the TNT code recently added to HeidiSQL, so using that to detect Unicode files and load with the correct decoding if a UTF-8 BOM is found should be pretty straightforward.

But HeidiSQL does no such thing yet. File load does not work correctly; it assumes ANSI as of r1388. File save is ASCII, at least for SQL export.

There is also an issue with non-BOMmed files. Text files without a BOM generated by other apps could be in a variety of code pages. The most flexible approach would be to let the user choose at least between ASCII, the ANSI code page and UTF-8 when loading, but that's incidentally also the most complex to implement. Perhaps an option in preferences to switch between ASCII, the local ANSI code page and UTF-8 for undetectable stuff would do the trick.
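Roughly what I mean by such a preference, sketched in Python rather than Delphi to keep it short. The option names, the function name and the use of cp1252 as "the local ANSI code page" are all assumptions for the sake of the example:

```python
# Hypothetical preference values mapped to codec names.
FALLBACK_ENCODINGS = {
    "ascii": "ascii",
    "ansi": "cp1252",  # assumption: Western European Windows code page
    "utf-8": "utf-8",
}


def decode_undetected(data, preference="ansi"):
    """Decode BOM-less file bytes using the encoding chosen in preferences."""
    encoding = FALLBACK_ENCODINGS[preference]
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        # Report the problem instead of silently loading garbled text.
        raise ValueError(
            f"file is not valid {preference}: bad byte at offset {exc.start}"
        ) from exc
```

Failing loudly on bytes that do not fit the chosen encoding is a design choice here: the user gets an error message and can switch the preference, rather than ending up with mangled characters in the editor.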