Concern about export reliability

kalvaro's profile image kalvaro posted 8 years ago in Import/Export Permalink

I've just noticed that when I export a large table with HeidiSQL the queries in log pane look like this:

SELECT * FROM `database_name`.`very_large_table` LIMIT 759838;
SELECT * FROM `database_name`.`very_large_table` LIMIT 759838, 759838;
SELECT * FROM `database_name`.`very_large_table` LIMIT 1519676, 759838;

In other words, a series of select queries with LIMIT clause but no ORDER BY. Unless I'm missing something, that means that HeidiSQL can be happily omitting some rows and exporting other rows several times. Instead, I'd expect to order by a deterministic column set, e.g.:

  • Primary key (if exists):
SELECT *
FROM `database_name`.`very_large_table`
ORDER BY server_id, foo_id
LIMIT 1519676, 759838;
  • All table rows (otherwise):
SELECT *
FROM `database_name`.`very_large_table`
ORDER BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
LIMIT 1519676, 759838;

It might be rare but I've found actual situations where a simple paginator would fail for this same reason.

kalvaro's profile image kalvaro posted 8 years ago Permalink

I hope it isn't a major change... <:-)

ansgar's profile image ansgar posted 8 years ago Permalink

Confirmed. I can check that.

Even with an ORDER BY clause in place, you could miss some rows which were inserted between two of the above SELECT's by another user, and get other ones duplicated. mysqldump wraps the data dumping in a table lock (which can also be disabled). HeidiSQL does not even support doing such locks.

kalvaro's profile image kalvaro posted 8 years ago Permalink

What a mess... There's a good reason why they say that backups never fail, only restores do :)

lemon_juice's profile image lemon_juice posted 8 years ago Permalink

I have the same concerns about the LIMITs in exports. Without an ORDER BY the selects are not guaranteed to be sequential, it is probably rare but can happen sometimes and the consequences can be hard to discover and potentially dangerous.

Is there any reason why Heidi uses multiple selects for a single table? As far as I know a single select without a limit should work even for very large tables and the mysql driver should be able to stream the data over the connection. In fact, this is what mysqldump seems to be doing - by watching the process list I can see mysqldump uses the following query to fetch data from a table:

SELECT /*!40001 SQL_NO_CACHE */ * FROM `mytable`

Simple and safe. Could Heidi do the same?

Please login to leave a reply, or register at first.