Once in a while, I run across a situation where I need to import data from a spreadsheet into a database.
If the data is to update a single table, this is pretty straight forward. On the other hand, if the data is modeled as a set of tables it gets a bit more messy.
In this case, the data was a customer list with information like address, contacts and phone numbers. The database models this as a series of tables that are associated with a company.
What all this means, is that I need to import data into multiple tables, associating each row to be related appropriately.
So I need to create a row in the customer table (which holds the customer name), an address for the customer, and a phone number, as well as an association with the company.
The original spreadsheet is a simple list with name, address and phone for all of the customers for this company.
So the first step is to create a spreadsheet and use the original data to model the tables I’m going to have to update on new tabs:I create an empty tab for the customers table, that contains columns for each of the fields in the customers table:
On the customers tab, I update the cells that correspond to information that is on the “Data” tab that will need to flow into the customers table (in this case abbreviation, name and account_number). References to the other tab are created in the spreadsheet by entering the name of the tab and the corresponding cell as shown below:
For the ID column, I could just start by looking at the table, and adding incremental values, but to be safe, it’s a good idea to use a query in order to always add a new value. That way the inserts can be run even while the application is running.
To do that, we need to write a simple SQL query to get the last ID value and increment it by one:
SELECT max(c2.id) + 1 FROM customers c2;
Since this query doesn’t require any calculations, it is simply added as the value for the first column of the customers tab (a similar query will be needed on the other tabs):
The created_by and modified_by columns are supposed to indicate who made this change, so this can be hard-coded with the ID of the user we want to associate with the change. The corresponding dates can be created with the current date and time (for MySQL this is the “Now()” function.
Same process is completed for the other tables. Once I have a tab with all of the tables and the basic data populated, I need to add the relationships. While not a perfect solution, I use a query based approach to relate these items based on the data.
Starting with the customers table, I need to populate the customer_address_id with the id from the addresses table. Since I have populated the “area_name” from the customer name, I can write a query that uses the customer name to get the id.
This involves creating a concatenated string that references the name to find that id:
=CONCATENATE("(select a.id from addresses a where a.area_name = '",H2,"' LIMIT 1)")
This results in a SQL statement like:
select a.id from addresses a where a.area_name = 'AccuWeaver LLC' LIMIT 1
The same approach can be used to figure out the value for the company_phone_id, but in this case the value comes from the Phones tab:
=CONCATENATE("(select p4.id from phones p4 where p4.phone = '",Phone!B2,"' LIMIT 1)")
The query ends up being a select that looks for the phone number:
select p4.id from phones p4 where p4.phone = '(925) 456-4762' LIMIT 1
The “LIMIT 1” in both of these queries ensures that this will work even if there are multiple matches (we’ll only link to the first one found).
Finally, I go back through all of the sheets and make sure all of the columns are populated with values, and have the value NULL where I don’t want it to end up with a value (but only on fields that allow NULLs).
Once I have all of the CSV files, the next step is to build the SQL for the inserts, which is fairly simple to do using MySQL Workbench. Using a model that I’ve reverse engineered from the database, I go to each table in turn, and import the CSV for that table.
In the Model, double click the table, and then go to the inserts tab:
Choose the file, and click “Open” and I see the values from my spreadsheet:
And click the accept changes icon to save the data in your model:
Then I export the data to a SQL file by choosing the “Forward Engineer as SQL Script”:
Clicking continue, the generated SQL script shows up, I scroll down to make sure the insert statements are included (table generation SQL will be deleted later):
The first edit I make is to fix some of the values to be what I wanted them to be instead of what Workbench generated. The NULL and Now() values get placed inside quotes, so I just replace them using a global find and replace. I’ve also had to do some other fixes such as escaping a single quote (for names like “O’Brien”):
Once the edits are done, I open a connection to the database in MySQL Workbench and open the SQL script to run it (against my test database). This will show you any errors in the SQL that you might have to go back and fix. Generally I try to make the fixes in the source spreadsheet, but sometimes I have to do the edits in NetBeans as I did above.
Executing the script inserts the data, and builds the relationships. No errors indicates that the data was installed to the database as expected:
And now the customers are in the database ready for my application to reference.