Import CSV with field names in first row

[expired user #3972] posted 15 years ago in Import/Export
I have an export from a survey SPSS file in CSV format. It's unusual because it has about 2,500 fields. I don't want to manipulate it to be smaller, and Access and SQL Server clearly have limitations (256 and 1024 fields respectively). I'll have very simple queries to run, a maximum of 20k rows, and no time constraints.

In Access and SQL Server (if the columns weren't a problem), I can import the CSV file and the table fields are created for me by reference to the first row.

Is there an equivalent function in HeidiSQL that will read the first row and create the table for me?

(Yes, I know that using a database like this is a crime to the puritans, but I just need to run simple SQL statements like this.)
ansgar posted 15 years ago
There is no automatic table creation based on the first line of a CSV file. But you can create the table manually and then import the file.

I just tested creating a table with 2,500 columns and failed: "Too many columns". The maximum number of columns in MySQL seems to be about 2,350. Additionally, the maximum size of one row is limited, so I also failed when creating the columns as VARCHAR with a length of more than 8. In the end I have this crazy table and would be able to import your CSV file using "Import" > "Import CSV file":

CREATE TABLE manycolumns (
col0001 VARCHAR(8),
col0002 VARCHAR(8),
col0003 VARCHAR(8),
col0004 VARCHAR(8),
col0005 VARCHAR(8),
col0006 VARCHAR(8),
col0007 VARCHAR(8),
col0008 VARCHAR(8),
col0009 VARCHAR(8),
col0010 VARCHAR(8),
...
)
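
If you don't want to type all 2,500 column definitions by hand, here is a minimal sketch (just an illustration, not a HeidiSQL feature; it assumes Python is available and that the field names are in the first line of the file) which generates such a CREATE statement from the CSV header:

import csv

# Read the header row and emit a CREATE TABLE statement with one
# VARCHAR(8) column per field, matching the table above.
with open("C:/SPSS.csv", newline="") as f:
    header = next(csv.reader(f))

columns = ",\n".join(f"  `{name.strip()}` VARCHAR(8)" for name in header)
print(f"CREATE TABLE manycolumns (\n{columns}\n);")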
ansgar posted 15 years ago
Sorry, I wasn't able to post the complete CREATE statement as it is too large for phpBB's database column. If you need it, contact me via www.anse.de so I can mail it to you.
[expired user #1125] posted 15 years ago
I like the general idea. But Microsoft's de-facto standard of optionally putting header names in the first line of CSV files is pretty broken. There's no good way to automatically detect if it's there, is there?

Suggestions (and patches!) welcome.

There's also a W3C standard for doing the same. The standard basically says to put "#Fields:" in front of the field names, so that the receiving system can see that these are field names. Example:

#Version: 1.0
#Date: 12-Jan-1996 00:00:00
#Fields: time cs-method cs-uri
00:34:23 GET /foo/bar.html
12:21:16 GET /foo/bar.html
12:45:52 GET /foo/bar.html
12:57:34 GET /foo/bar.html

Defined here: http://www.w3.org/TR/WD-logfile.html

If HeidiSQL supported that, then you could take web log files and shove them at HeidiSQL, and HeidiSQL would figure out how many columns are needed and what their names should be. It could even know that particular column names mean particular data types, e.g. "cs-uri" is a varchar.

Users who regularly import CSV files would need to figure out a way to get the #Fields: header into their files before handing them to HeidiSQL.
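
To illustrate how such detection could work (only a sketch of the idea, not existing HeidiSQL behaviour; HeidiSQL itself is written in Delphi, but I'll use Python here for brevity):

# Sketch: pull column names from a W3C extended-log "#Fields:" directive.
def parse_w3c_fields(path):
    with open(path) as f:
        for line in f:
            if line.startswith("#Fields:"):
                return line[len("#Fields:"):].split()
            if not line.startswith("#"):
                break  # directives only appear before the data
    return None  # no directive found, so ask the user instead

# Purely illustrative type mapping; a real implementation would follow the spec.
def guess_type(field):
    return "TIME" if field == "time" else "VARCHAR(255)"

fields = parse_w3c_fields("access.log")
if fields:
    print(", ".join(f"{name} {guess_type(name)}" for name in fields))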

With regards to the maximum number of columns: I think it's not so much a question of the number of columns, but rather of their combined width. If at some point mysqld stuffs result data into a row buffer in memory, and it's a pretty dumb buffer, I imagine each row takes up a static amount of memory regardless of how much data is made up by the row's field values. I think there are constraints on how mysqld will shape this buffer, and one of them is that a row can take up at most 64 KiB.

BLOBs would have to be stored outside such a buffer, so a workaround could be to use TEXT columns. With pointers being 32-bit on most current systems, that would leave you with a maximum of 16,384 TEXT columns.
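
A quick back-of-the-envelope check of those numbers (the 64 KiB buffer and the 4-byte pointer are the assumptions from above; the utf8 factor is only my guess at why VARCHAR(9) already failed for ansgar):

ROW_LIMIT = 64 * 1024   # assumed row buffer size: 65,536 bytes
POINTER = 4             # bytes per TEXT pointer on a 32-bit system
print(ROW_LIMIT // POINTER)       # 16384 possible TEXT columns

# Guess: with a 3-byte-per-character charset (utf8), a VARCHAR(n)
# costs roughly 3*n + 1 bytes of row space.
for n in (8, 9):
    print(n, 2500 * (3 * n + 1))  # 62500 fits, 70000 exceeds the limit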
[expired user #3981] posted 15 years ago
I often use biterscripting for processing CSV files. It has no limits on the number of columns/fields or the number of rows.

It seems to be pretty simple to learn - I am not a programmer, but I was able to pick it up after seeing a few sample scripts on the internet.

I will provide free sample code for you here.

You said:

I have an export from a survey SPSS file in CSV format. It's unusual because it has about 2,500 fields. I don't want to manipulate it to be smaller, and Access and SQL Server clearly have limitations (256 and 1024 fields respectively). I'll have very simple queries to run, a maximum of 20k rows, and no time constraints.

I will assume:

- The CSV (Comma Separated Values) file is stored at C:/SPSS.csv.
- You want the total of column 2437, which is a real number. (This, of course, is arbitrary - I just wanted to show how things can be done.)



Here is the script.
# Set field separator to comma
set $wsep = ","
# Store total here
var real total
# Read data in
var str data ; cat "C:/SPSS.csv" > $data
# Process rows one at a time
while ($data <> "")
do
# Get next row
var str row ; lex -e "1" $data > $row
# Extract column 2437, convert to real.
var str column2437 ; wex "2437" $row > $column2437
var real value ; set $value = makereal(str($column2437))
# Add to total
set $total = $total + $value
done
# Show total
echo $total
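
For anyone without biterscripting, here is what I believe is an equivalent sketch in Python (assuming the same file path and that column 2437 is counted from 1; like the script above, it reads every row):

import csv

# Sum column 2437 of the CSV, mirroring the script above.
total = 0.0
with open("C:/SPSS.csv", newline="") as f:
    for row in csv.reader(f):
        total += float(row[2436])  # column 2437, counted from 1
print(total)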


Download biterscripting at http://www.biterscripting.com . I think it is free. They have some other sample scripts related to CSV and data files.

Jenni

(I probably posted this reply earlier in another, wrong place. So I am posting it again here, in response to the right question.)
