SQL Snippets in ANSI only (not UTF-8)

[expired user #2726] posted 17 years ago in Running SQL scripts
I'm in the process of migrating from http://www.navicat.com to HeidiSQL and have noticed a small problem.

The "Load SQL from Textfile" function requires the text file to be in ANSI encoding. All my saved SQL queries are in UTF-8.

Is there a way around this?
Thanks
Chillbo
ansgar posted 17 years ago
Not sure what actually requires ANSI here. However, this is the relevant code for that part:

var
  tmpstr, filecontent: String;
begin
  ...
  AssignFile( f, filename );
  Reset( f );
  while not eof( f ) do
  begin
    Readln( f, tmpstr );
    filecontent := filecontent + tmpstr + CRLF;
  end;
  ...
  SynMemoQuery.SelText := filecontent;
  ...
end;


I guess the String variables and the ReadLn function don't like UTF-8. It could also be that SynEdit doesn't like it, or even both (fixing SynEdit would be quite painful).
ansgar posted 17 years ago
A first step for me could be to use WideString instead of String variables. Any suggestions, Delphi hackers?
[expired user #1125] posted 17 years ago
For an immediate fix, a workaround would be to use 'iconv' to convert the file from UTF-8 to an ANSI code page such as Latin-1. Iconv is a Unix tool, but it can be installed on Windows via MKS Toolkit, Cygwin, SFU or similar, or run in a virtual machine running Kubuntu or whatnot. There's also an abundance of text editors that can convert files, although probably not in a batch fashion.

I don't think using a WideString will magically cause the Delphi compiler to use a version of Readln() that is Unicode-aware (but I don't know).

There's a magic marker (a BOM) at the beginning of proper Unicode files. The code that reads the file would have to identify it and read the file according to the format it indicates. See:
http://en.wikipedia.org/wiki/Byte_Order_Mark

The most common Unicode formats are UTF-8, UTF-16 BE and UTF-16 LE. For any other format, HeidiSQL could get away with throwing an exception (which would ultimately surface as an error message). UTF-32 BE/LE would be nice to support too, just because it's the simplest universal encoding form available (encompassing all characters from all languages) and as such might be of use to developers. It's not in widespread use though, since it generally requires compression to achieve space efficiency.
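
As a rough illustration of the BOM check described above, here is a minimal sketch that only assumes the standard SysUtils and Classes units. TBomKind and DetectBomKind are names made up for this example; they are not part of HeidiSQL or SynEdit.

```pascal
program DetectBom;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical enum for this sketch; not a HeidiSQL type.
  TBomKind = (bkNone, bkUtf8, bkUtf16LE, bkUtf16BE, bkUtf32LE, bkUtf32BE);

// Reads up to four bytes from the start of the file and matches them
// against the known byte order marks.
function DetectBomKind(const FileName: string): TBomKind;
var
  Stream: TFileStream;
  Buf: array[0..3] of Byte;
  Count: Integer;
begin
  Result := bkNone;
  FillChar(Buf, SizeOf(Buf), 0);
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Count := Stream.Read(Buf, SizeOf(Buf));
  finally
    Stream.Free;
  end;

  // UTF-32 LE is tested before UTF-16 LE because both start with FF FE.
  if (Count >= 4) and (Buf[0] = $FF) and (Buf[1] = $FE)
    and (Buf[2] = 0) and (Buf[3] = 0) then
    Result := bkUtf32LE
  else if (Count >= 4) and (Buf[0] = 0) and (Buf[1] = 0)
    and (Buf[2] = $FE) and (Buf[3] = $FF) then
    Result := bkUtf32BE
  else if (Count >= 3) and (Buf[0] = $EF) and (Buf[1] = $BB)
    and (Buf[2] = $BF) then
    Result := bkUtf8
  else if (Count >= 2) and (Buf[0] = $FF) and (Buf[1] = $FE) then
    Result := bkUtf16LE
  else if (Count >= 2) and (Buf[0] = $FE) and (Buf[1] = $FF) then
    Result := bkUtf16BE;
end;

begin
  if ParamCount < 1 then
    WriteLn('usage: DetectBom <file>')
  else
    // Prints the ordinal of the detected kind (0 = no BOM found).
    WriteLn(Ord(DetectBomKind(ParamStr(1))));
end.
```

One known ambiguity: a UTF-16 LE file whose first character after the BOM is U+0000 looks identical to a UTF-32 LE BOM. For SQL files that should be rare enough to ignore.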

If you want to do it yourself, Googling...:
http://groups.google.com/groups?q=group%3Aborland.*%20read%20unicode%20file

...yields results:
http://groups.google.com/group/borland.public.delphi.rtl.general/msg/143ec5a40db89bd5
http://groups.google.com/group/borland.public.delphi.winapi/msg/8fe74c87f73c5b91

But they are far from perfect; the two above do not support complex encodings such as UTF-8, for example.

There are various components that will do the job for you. Here's one with an unspecified license and unclear maintainership, as is the norm for these things:
http://www.yunqa.de/delphi/converters/

There are also open source components. SynEdit comes in a Unicode version, which has a LoadFromFile function that will load Unicode files. It's in the official SynEdit VCS by now, see:
http://mh-nexus.de/components.htm
ansgar posted 16 years ago
Btw, loading and saving SQL files and snippets should work in the latest build for any charset, including UTF-8 and ANSI.
[expired user #1125] posted 16 years ago
There is BOM detection code both in the WideStrings (iirc) Delphi unit and the TNT code recently added to HeidiSQL, so using that to detect Unicode files and load with the correct decoding if a UTF-8 BOM is found should be pretty straightforward.

But HeidiSQL does no such thing yet. File load does not work correctly: as of r1388 it assumes ANSI. File save is ASCII, at least for SQL export.
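
To make the idea concrete, here is a hedged sketch of BOM-aware loading. It does not use the WideStrings unit or the TNT code mentioned above, only TFileStream and the RTL's UTF8Decode. LoadTextFile is a hypothetical helper, not an existing HeidiSQL function, and UTF-16/UTF-32 files are deliberately not handled.

```pascal
program LoadSqlFile;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Hypothetical helper, not HeidiSQL code: reads the whole file as raw
// bytes, then decodes it as UTF-8 if it starts with EF BB BF and as
// ANSI (system code page) otherwise.
function LoadTextFile(const FileName: string): WideString;
var
  Stream: TFileStream;
  Raw: AnsiString;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Raw, Integer(Stream.Size));
    if Length(Raw) > 0 then
      Stream.ReadBuffer(Raw[1], Length(Raw));
  finally
    Stream.Free;
  end;

  if (Length(Raw) >= 3) and (Ord(Raw[1]) = $EF)
    and (Ord(Raw[2]) = $BB) and (Ord(Raw[3]) = $BF) then
    // Skip the three BOM bytes before decoding.
    Result := UTF8Decode(Copy(Raw, 4, MaxInt))
  else
    // No UTF-8 BOM: fall back to the system ANSI code page.
    Result := WideString(Raw);
end;

begin
  if ParamCount < 1 then
    WriteLn('usage: LoadSqlFile <file>')
  else
    WriteLn('characters read: ', Length(LoadTextFile(ParamStr(1))));
end.
```

Reading the bytes in one go, instead of line by line with Readln, also preserves the original line endings. Newer Delphi versions may flag UTF8Decode as deprecated; treat the call as an assumption about the RTL in use.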

There is also an issue with non-BOMmed files. Text files without a BOM generated by other apps could be in a variety of code pages. The most flexible approach would be to let the user choose at least between ASCII, an ANSI code page and UTF-8 at load time, but that's incidentally also the most complex to implement. Perhaps an option in preferences to switch between ASCII, the local ANSI code page and UTF-8 for undetectable files would do the trick.
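
A minimal sketch of what such a preference could drive, assuming the file's raw bytes have already been read into an AnsiString. TFallbackEncoding and DecodeWithoutBom are invented for illustration and do not correspond to any existing HeidiSQL setting or function.

```pascal
program FallbackDecode;

{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical preference value; not an existing HeidiSQL setting.
  TFallbackEncoding = (feAscii, feAnsi, feUtf8);

// Decodes the raw bytes of a file that carried no BOM, using the
// encoding the user picked in preferences.
function DecodeWithoutBom(const Raw: AnsiString;
  Fallback: TFallbackEncoding): WideString;
var
  Tmp: AnsiString;
  i: Integer;
begin
  case Fallback of
    feUtf8:
      Result := UTF8Decode(Raw);
    feAnsi:
      // Conversion goes through the system ANSI code page.
      Result := WideString(Raw);
  else
    // ASCII: bytes above 127 have no defined meaning, so replace them.
    Tmp := Raw;
    for i := 1 to Length(Tmp) do
      if Ord(Tmp[i]) > 127 then
        Tmp[i] := '?';
    Result := WideString(Tmp);
  end;
end;

var
  Sample: AnsiString;
begin
  // Two bytes, C3 A9: the UTF-8 encoding of U+00E9.
  SetLength(Sample, 2);
  Sample[1] := AnsiChar($C3);
  Sample[2] := AnsiChar($A9);
  WriteLn('UTF-8: ', Length(DecodeWithoutBom(Sample, feUtf8)), ' character(s)');
  WriteLn('ANSI:  ', Length(DecodeWithoutBom(Sample, feAnsi)), ' character(s)');
  WriteLn('ASCII: ', Length(DecodeWithoutBom(Sample, feAscii)), ' character(s)');
end.
```

The same two bytes decode to a single character as UTF-8 and to two question marks as ASCII, while the ANSI result depends on the system code page. That difference is exactly why the choice cannot be detected reliably without a BOM.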
