Comparing lower(foo) foo producing incorrect results

hydrophoid's profile image hydrophoid posted 3 years ago in General Permalink

Environment:

MariaDB v 10.6.2 HeidiSQL v 11.3.0.6321 32 bit Windows 10 Charset: utf8mb4 Collation: utf8mb4_bin

Issue

I am working on a function that accepts user input, validates the input, updates a table if input is valid, and builds a return message.

The issue I am running into is that one of the input values can only be lower case letters. To test for an error condition where the user has input capitol letters,

I have the following if statement:

if lower(cast(in_lk_token as char collate utf8mb4_bin))  
    <> cast(in_lk_token as char collate utf8mb4_bin)
    then
        set v_error_ct = v_error_ct + 1;
        set v_msg_out = concat(v_msg_out, '\t', cast(v_error_ct as char), '. Input Parameter: ', in_lk_token, ' not equal to, ', lower(in_lk_token), ' must contain lower case letters only.\r\n');     

    end if;

I also tried the same if statement but wrapped the output of lower() around cast:

if cast(lower(in_lk_token) as char collate utf8mb4_bin)  
    <> cast(in_lk_token as char collate utf8mb4_bin)
    then
        set v_error_ct = v_error_ct + 1;
        set v_msg_out = concat(v_msg_out, '\t', cast(v_error_ct as char), '. Input Parameter: ', in_lk_token, ' not equal to, ', lower(in_lk_token), ' must contain lower case letters only.\r\n');     

    end if;

If I call the procedure from the command line, the procedure behaves as expected, it is only when I call the procedure from within HeidiSQL that the evaluation behaves unexpectedly.

I have tested this procedure with varchar() and currently I am using mediumtext, and in both cases this is evaluating to TRUE for all test cases.

The input value must also follow the format '{lk_xy}' where x and y are the only characters that can be changed. I am saying this because there may be something strange going on with how HS treats string values before it is passed to to MariaDB.

While trying to trouble shoot, I also tried the following SQL from a query tab:

select 'True' `with_cast_should_be_true`
where cast(lower('{lk_aZ') as char collate utf8mb4_bin) 
<> cast('{lk_aZ}' as char collate utf8mb4_bin);

select 'True' `should_be_true`
where lower('{lk_aZ}')
<> '{lk_aZ}';

Since the second select statement is not true even though the two sides are not equal, I thought that the issue was possibly caused by a collation problem, though if this is the cause of the problem, the direct cast in the if statement should have fixed the issue.

Does anyone have any thoughts or ideas that could help me figure out what is causing this issue?

ansgar's profile image ansgar posted 3 years ago Permalink

Try this one:

SELECT
  (cast('{lk_aZ}' as CHAR)) COLLATE UTF8MB4_BIN <>
  lower('{lk_aZ}') COLLATE UTF8MB4_BIN

I got SQL errors with the collate clause inside the cast() function, so I moved it out of it.

I suppose the issue is that you are not really casting to that binary collation, the string keeps being non-binary, so lowercase letters equal uppercase letters.

hydrophoid's profile image hydrophoid posted 3 years ago Permalink

Thank you for your help Ansgar,

I figured out the issue, and I was just logging in to post an update in case someone else finds this issue:

I was calling the function from my SQL history, and the collation must have been set to a case insensitive system when I created the query. If string comparisons are not behaving as expected, retype the query instead of pulling it up from the query history.

For a more detailed explanation, I am not sure if the database was initially set to default to utf8mb4_bin, but as part of the trouble shooting I dropped the database and recreated it using a default utf8mb4_bin.

I then went to my query history and pulled the data so that I didn't have to retype the call to the function, which seems to be the cause of the issue.

When I was testing, I also copy/pasted the strings from the resurrected SQL query when writing the test queries I posted here, so I am guessing that the copy feature in HeidiSQL is smart enough to track the collation of a string in addition to tracking collation from the SQL history.

My program will be more robust because of this, as I ended building a branch of logic that moves the error to the warning messages if a lower() vs upper() check ends up being true, that way the user can override the message.

Thanks for building such a great database management tool! And thanks for taking the time to offer support.

Please login to leave a reply, or register at first.