Extreme Import

Rob WeaverNov 9, 2014

Once in a while, I run across a situation where I need to import data from a spreadsheet into a database.

If the data is to update a single table, this is pretty straight forward. On the other hand, if the data is modeled as a set of tables it gets a bit more messy.

In this case, the data was a customer list with information like address, contacts and phone numbers. The database models this as a series of tables that are associated with a company.

What all this means, is that I need to import data into multiple tables, associating each row to be related appropriately.

So I need to create a row in the customer table (which holds the customer name), an address for the customer, and a phone number, as well as an association with the company.

The original spreadsheet is a simple list with name, address and phone for all of the customers for this company.

So the first step is to create a spreadsheet and use the original data to model the tables I’m going to have to update on new tabs:I create an empty tab for the customers table, that contains columns for each of the fields in the customers table:

On the customers tab, I update the cells that correspond to information that is on the “Data” tab that will need to flow into the customers table (in this case abbreviation, name and account_number). References to the other tab are created in the spreadsheet by entering the name of the tab and the corresponding cell as shown below:

For the ID column, I could just start by looking at the table, and adding incremental values, but to be safe, it’s a good idea to use a query in order to always add a new value. That way the inserts can be run even while the application is running.

To do that, we need to write a simple SQL query to get the last ID value and increment it by one:

SELECT 
    max(c2.id) + 1
FROM
    customers c2;

Since this query doesn’t require any calculations, it is simply added as the value for the first column of the customers tab (a similar query will be needed on the other tabs):

The created_by and modified_by columns are supposed to indicate who made this change, so this can be hard-coded with the ID of the user we want to associate with the change. The corresponding dates can be created with the current date and time (for MySQL this is the “Now()” function.

Same process is completed for the other tables. Once I have a tab with all of the tables and the basic data populated, I need to add the relationships. While not a perfect solution, I use a query based approach to relate these items based on the data.

Starting with the customers table, I need to populate the customer_address_id with the id from the addresses table. Since I have populated the “area_name” from the customer name, I can write a query that uses the customer name to get the id.

This involves creating a concatenated string that references the name to find that id:

=CONCATENATE("(select a.id from addresses a where a.area_name = '",H2,"' LIMIT 1)")

This results in a SQL statement like:

select 
    a.id
from
    addresses a
where
    a.area_name = 'AccuWeaver LLC'
LIMIT 1

The same approach can be used to figure out the value for the company_phone_id, but in this case the value comes from the Phones tab:

=CONCATENATE("(select p4.id from phones p4 where p4.phone = '",Phone!B2,"' LIMIT 1)")

The query ends up being a select that looks for the phone number:

select 
    p4.id
from
    phones p4
where
    p4.phone = '(925) 456-4762'
LIMIT 1

The “LIMIT 1” in both of these queries ensures that this will work even if there are multiple matches (we’ll only link to the first one found).

Finally, I go back through all of the sheets and make sure all of the columns are populated with values, and have the value NULL where I don’t want it to end up with a value (but only on fields that allow NULLs).

Once this is done, I export all of the rows to CSV files (I end up with one file or each table). In Google Spreadsheet, this is as simple as using File/Download as/CSV:

Once I have all of the CSV files, the next step is to build the SQL for the inserts, which is fairly simple to do using MySQL Workbench. Using a model that I’ve reverse engineered from the database, I go to each table in turn, and import the CSV for that table.

In the Model, double click the table, and then go to the inserts tab:

Click on the little icon that looks like a camera in front of a spreadsheet to start the import:

Choose the file, and click “Open” and I see the values from my spreadsheet:

I delete the header row, since it’s not actual data. To delete a row, use the little icon that looks like a spreadsheet with a minus symbol:

And click the accept changes icon to save the data in your model:

Then I export the data to a SQL file by choosing the “Forward Engineer as SQL Script”:

I make sure the “Omit Schema Qualifier” and “Generate INSERT statements” are both checked on the first step of the wizard, and then click “Continue”:

On the next step of the wizard, I make sure only the tables I care about are selected. To do so, I click on the “Show Filter”:

In this case I just exclude the companies table, since I’m not actually inserting into that one:

Clicking continue, the generated SQL script shows up, I scroll down to make sure the insert statements are included (table generation SQL will be deleted later):

The next step is to make some edits that are essential to the SQL actually doing what I need it to. I open the SQL script in NetBeans (my editor of choice) and delete the table creation steps:

The first edit I make is to fix some of the values to be what I wanted them to be instead of what Workbench generated. The NULL and Now() values get placed inside quotes, so I just replace them using a global find and replace. I’ve also had to do some other fixes such as escaping a single quote (for names like “O’Brien”):

Once the edits are done, I open a connection to the database in MySQL Workbench and open the SQL script to run it (against my test database). This will show you any errors in the SQL that you might have to go back and fix. Generally I try to make the fixes in the source spreadsheet, but sometimes I have to do the edits in NetBeans as I did above.

Executing the script inserts the data, and builds the relationships. No errors indicates that the data was installed to the database as expected:

And now the customers are in the database ready for my application to reference.

Simple 🙂

Hi, I’m Rob Weaver

All My Articles

A Scary Expiration Episode with AWS and Plesk Free

Recently, I encountered a disconcerting message on the “free” Plesk...

Rob WeaverFeb 10, 2025

Delighted To Resolve Unexpected Consequences In Go JSON Marshal

In this post recall a recent problem caused by an incomplete understanding of the Go language, and how I fixed it with some help from Github CoPilot. I wa delighted to resolve unexpected consequences in Go.

Rob WeaverFeb 10, 2024

Happy Scavenger Hunt, Or How To Attend A Concert delayed

In this post I describe the happy scavenger hunt for tickets to see a Covid-19 delayed concert with James Taylor and Jackson Browne.
Note: This post was actually written in 2021, I just found it unpublished in my site, so added a few Firefly images and pushed Publish ...

Rob WeaverFeb 10, 2024

Hi, I’m Rob Weaver

Related Posts

A Scary Expiration Episode with AWS and Plesk Free

Delighted To Resolve Unexpected Consequences In Go JSON Marshal

Happy Scavenger Hunt, Or How To Attend A Concert delayed