Data scientists and other professionals in the same space work with data from many sources. It does not take long before you face the challenge of importing data from a CSV file into an SQL database. I am a great fan of the terminal, so I want to show you how to do it from the terminal.
I am going to use a Python package for this. I suggest that you work in a virtual environment; Real Python's article on virtual environments is a good resource if you need one. I assume you are using Python 3.
Install the needed packages
```bash
pip install PyMySQL psycopg2 csvkit
```
PyMySQL: This is a pure-Python MySQL driver which lets us connect to a MySQL database from Python.
psycopg2: This is the most popular PostgreSQL database adapter for Python.
csvkit: This is a suite of command-line tools for converting to and working with CSV. It provides the csvsql tool we are going to use to import the data from our CSV file.
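Before going further, it can help to confirm the packages actually import, since the pip name and the import name can differ (PyMySQL installs as pymysql). A minimal sketch of such a check, using only the standard library:

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """Return True if the module can be imported in the current environment."""
    return importlib.util.find_spec(module_name) is not None

# Note the import names: PyMySQL -> pymysql, psycopg2 -> psycopg2,
# csvkit -> csvkit
for name in ("pymysql", "psycopg2", "csvkit"):
    print(name, "OK" if is_installed(name) else "missing")
```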
Download the Customer Info Data
For the sake of this tutorial, I have uploaded a CSV file containing some customer information you can use to test this tool I am introducing. Click here to download
Create a new MySQL or PostgreSQL database
I named mine test. The data will be imported into a table I have named customer_info.
Importing Your Data
The Quick Approach
This approach automatically generates the schema from the CSV data, creates the table, and inserts the data into it.
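The create-then-insert flow can be sketched with only Python's standard library and an in-memory SQLite database, so it runs without a MySQL or PostgreSQL server. The column names here are made up for illustration, and unlike csvsql this sketch does not infer column types:

```python
import csv
import io
import sqlite3

# Stand-in for customer_info.csv; the columns are hypothetical.
csv_data = io.StringIO(
    "contractId,AccountNumber,Amount\n"
    "1,1001,250\n"
    "2,1002,300\n"
)

reader = csv.reader(csv_data)
header = next(reader)

conn = sqlite3.connect(":memory:")
# Naive schema: every column is TEXT. csvsql inspects the values
# and picks richer types (DECIMAL, TIMESTAMP, ...) for us.
cols = ", ".join(f'"{name}" TEXT' for name in header)
conn.execute(f"CREATE TABLE customer_info ({cols})")

placeholders = ", ".join("?" for _ in header)
conn.executemany(f"INSERT INTO customer_info VALUES ({placeholders})", reader)

print(conn.execute("SELECT COUNT(*) FROM customer_info").fetchone()[0])  # 2
```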
db_user: This is the database user with access to the test database you created.
password: This is the password of the user.
For MySQL:
```bash
csvsql --db 'mysql+pymysql://db_user:password@localhost/test' --tables customer_info --insert customer_info.csv
```
For PostgreSQL:
```bash
csvsql --db 'postgresql:///test' --tables customer_info --insert customer_info.csv
```
The Lengthier (and Safer) Approach
Generate the query for creating the customer_info table from your terminal.
For MySQL:
```bash
csvsql -i mysql customer_info.csv
```
```sql
CREATE TABLE customer_info (
	`contractId` DECIMAL(38, 0) NOT NULL,
	`AccountNumber` DECIMAL(38, 0) NOT NULL,
	`CreatedAt` TIMESTAMP NULL,
	`Amount` DECIMAL(38, 0) NOT NULL,
	`Count` DECIMAL(38, 0) NOT NULL,
	`Duration` DECIMAL(38, 0) NOT NULL
);
```
The result above is the query we are going to use to create the customer_info table in the test database. You can edit the generated schema to suit your needs. I will leave it as it is in this tutorial.
For PostgreSQL:
```bash
csvsql -i postgresql customer_info.csv
```
```sql
CREATE TABLE customer_info (
	"contractId" DECIMAL NOT NULL,
	"AccountNumber" DECIMAL NOT NULL,
	"CreatedAt" TIMESTAMP WITHOUT TIME ZONE,
	"Amount" DECIMAL NOT NULL,
	"Count" DECIMAL NOT NULL,
	"Duration" DECIMAL NOT NULL
);
```
Note the difference in how each database expects the query to be written: MySQL wraps identifiers in backticks, while PostgreSQL wraps them in double quotes.
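To make the quoting rule concrete, here is a tiny hypothetical helper (not part of csvkit) that quotes an identifier for each dialect, doubling any embedded quote character as both databases expect:

```python
def quote_identifier(name: str, dialect: str) -> str:
    """Quote a table or column name for the given SQL dialect."""
    if dialect == "mysql":
        # MySQL uses backticks; a literal backtick is escaped by doubling it.
        return "`" + name.replace("`", "``") + "`"
    if dialect == "postgresql":
        # PostgreSQL uses double quotes; a literal quote is doubled.
        return '"' + name.replace('"', '""') + '"'
    raise ValueError(f"unknown dialect: {dialect}")

print(quote_identifier("AccountNumber", "mysql"))       # `AccountNumber`
print(quote_identifier("AccountNumber", "postgresql"))  # "AccountNumber"
```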
Import your data into the table
For MySQL:
```bash
csvsql --db 'mysql+pymysql://db_user:password@host/test' --no-create --insert customer_info.csv
```
For PostgreSQL:
```bash
csvsql --db 'postgresql:///test' --no-create --insert customer_info.csv
```
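After the import, it is worth a quick sanity check that every row made it in. A minimal sketch: count the CSV's data rows and compare with the table. The pymysql part is commented out because it needs a live server; db_user, password and test are the placeholders from above:

```python
import csv

def csv_row_count(path: str) -> int:
    """Number of data rows in the CSV, excluding the header."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f)) - 1

# Assuming the MySQL case; uncomment and fill in your own credentials:
# import pymysql
# conn = pymysql.connect(user="db_user", password="password", database="test")
# with conn.cursor() as cur:
#     cur.execute("SELECT COUNT(*) FROM customer_info")
#     assert cur.fetchone()[0] == csv_row_count("customer_info.csv")
```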
We just learnt how to import data from a CSV file into tables in our MySQL and PostgreSQL databases with the csvkit package. You can take the direct approach, which automatically creates your table and inserts the data. The longer and, in my opinion, safer approach is to generate the query that creates the table, inspect and edit the generated schema, and then import your data into the table.
csvkit is a powerful tool every data scientist should have in their toolbox.