Importing into a new table from TSV crashes HeidiSQL

sacrophyte's profile image sacrophyte posted 2 years ago in Import/Export Permalink

I am doing an import from TSV (Tools -> Import CSV file), then in the file open dialog, selecting All files (.) with either ANSI or UTF-8 encoding (both generate the same error). When I select the Destination and a New Table, HeidiSQL scans all the rows from the file, but then abends at the end while analyzing the data.

Any ideas what I can to get around this problem? I tried several different encodings, including Unicode (does not detect any columns) and Auto Detect (same error as UTF-8).

I have successfully imported a different TSV already (title.ratings.tsv, 1.2 million rows, but only 3 columns). I used ANSI encoding just fine. I would speculate that the source character set is causing an issue. I downloaded the public dataset from IMDB, and the one the causing the issue is title.principals.

1 attachment(s):
sacrophyte's profile image sacrophyte posted 2 years ago Permalink

Update: title.principals has 46 million lines. Is this a buffer overflow issue?

ansgar's profile image ansgar posted 2 years ago Permalink

No, looking at my code, it could be the case the header line has less columns than one of the data lines has. Or the column delimiter is not detected correctly.

I would recommend to create the table by hand before you import the file, and not to use the auto-detection.

sacrophyte's profile image sacrophyte posted 2 years ago Permalink

Actually, I think I found the problem. The tsv provided by IMDB is poorly formatted; the last column/field in title.principals contains a JSON list of characters. If I remove the option to enclose fields with quotes, the import is fine (albeit slow).

Please login to leave a reply, or register at first.