Converting csv data to sqlite in golang

Converting csv data to sqlite in golang

Table of contents

No heading

No headings in the article.

I have been learning Go and SQL for a while now, so I decided to test what I have learnt this past month by getting data from a CSV( Comma Separated Values) file and using that to create an SQLite database.

The data I would be using contains Spotify's top 2018 tracks, let's get coding.

Looking through the CSV we see it has 16 columns, namely :

idnameartistsdanceabiltyenergykeyloudnessmodespeechinessacousticinstrumentalnesslivenessvalencetempoduration_mstime_signature
3ee8Jmje8o58CHK66QrVCSAD!XXXTENTACION0.740.6138.0-4.881.00.1450.2580.003720.1230.47375.023166606.04.0

The goal:

  • Open the csv file
  • Read its content
  • Write the content to the database

To begin we create a folder named spotifyTop2018, run go mod init spotifyTop2018. Create a main.go file, download the csv data, and place the CSV in a folder named data, your folder tree should look like this:

 +-- data
  |   +-- top2018.csv
  +-- go.mod
  +-- main.go

In the main.go file we start with this

package main

import (
    "fmt"
    "os"
    "encoding/csv"
)

func main() {


}

First, we start by reading the contents of the file, the go os and encoding/csv packages have the functions we need to accomplish that, so add those to the imports.

func parseCsv(filepath string) ([][]string, []string) {

    file, err := os.Open(filepath)

    if err != nil {
        fmt.Println(err)
    }

    defer file.Close()

    reader := csv.NewReader(file)

    var records [][]string
    for {

        record, err := reader.Read()
        // if we've reached the end of the file, break
        if err == io.EOF {
            break
        }

        if err != nil {
            fmt.Println(err)
        }

        records = append(records, record)
    }

    return records[1:], records[0]

}

The parseCsv function takes in the file path and returns a 2D slice(array/list) of the data in the CSV (excluding the head ) and the header, and the header shows us the column names of the data. You get a 2D slice since CSVs are made up of rows and columns so the first row and first column would be data[0][0].

package main

import (
    "fmt"
    "os"
    "encoding/csv"
)

func parseCsv(filepath string) ([][]string, []string) {
//..
}

func main() {
    fmt.Println(parseCsv("data/top2018.csv"))
}
// outputs the data contained in the CSV

run the code

go run main.go

you should get the data as a 2D slice, great! we need to store the data we get into a struct.

type Song struct {
    id               string
    name             string
    artists          string
    danceability     float64
    energy           float64
    key              int
    loudness         float64
    mode             int
    speechiness      float64
    acousticness     float64
    instrumentalness float64
    liveness         float64
    valence          float64
    tempo            float64
    duration_ms      int
    time_signature   int
}

The struct fields are the columns of the data, this is to help make the data easy to work with in code. A struct would be equivalent to a row in the data.

Recall the data we get from the CSV are strings, so we would need some helper functions to convert them to int and float64.

func stringToFloat(s string) float64 {
    f, err := strconv.ParseFloat(s, 64)
    if err != nil {
        panic(err)
    }
    return f
}

func stringToInt(s string) int {
    s = strings.TrimSuffix(s, ".0")
    i, err := strconv.Atoi(s)
    if err != nil {
        panic(err)
    }
    return i
}

The string to float function is pretty straightforward, it takes a string and converts it to floats, while the string to int takes the string, trims off the decimal point at the end of the string and converts it to int.

Now it's time to take our data and transform it into a slice of songs.


func parseSong(records [][]string) []Song {
    var songs []Song
    for _, record := range records {
        song := Song{
            id:               record[0],
            name:             record[1],
            artists:          record[2],
            danceability:     stringToFloat(record[3]),
            energy:           stringToFloat(record[4]),
            key:              stringToInt(record[5]),
            loudness:         stringToFloat(record[6]),
            mode:             stringToInt(record[7]),
            speechiness:      stringToFloat(record[8]),
            acousticness:     stringToFloat(record[9]),
            instrumentalness: stringToFloat(record[10]),
            liveness:         stringToFloat(record[11]),
            valence:          stringToFloat(record[12]),
            tempo:            stringToFloat(record[13]),
            duration_ms:      stringToInt(record[14]),
            time_signature:   stringToInt(record[15]),
        }
        songs = append(songs, song)
    }
    return songs
}

You take each row, get the data in each column i.e. from index 0 - 15, use it to construct the song struct append that to a slice of songs, and return the slice.

We are pretty much done with reading the data, now it is time to write it into the database. For that we need the database/sql package and a sqlite3 driver, I am using the github.com/mattn/go-sqlite3 driver.

Your import should look like this

import (
    "database/sql"
    "encoding/csv"
    "fmt"
    "io"
    "os"
    "strconv"
    "strings"
    _ "github.com/mattn/go-sqlite3"
)

Run this command to get the sqlite3 driver

go get github.com/mattn/go-sqlite3

You need mingw64 to build the app with the driver

You can use chocolatey to install it.

install chocolatey

install mingw

After installing the driver, your file tree should like this

 +-- data
  |   +-- top2018.csv
  +-- go.mod
  +-- go.sum
  +-- main.go

With that done, we need to open the SQLite database

func openDatabase() *sql.DB {
    if _, err := os.Stat("songs.db"); err == nil {
        // database exists
        db, err := sql.Open("sqlite3", "./songs.db")
        if err != nil {
            fmt.Println(err)
        }
        return db

     } else {
        // database does not exist so create a table and return the database
        db, err := sql.Open("sqlite3", "./songs.db")
        if err != nil {
            fmt.Println(err)
        }
        createTable(db)
        return db
     }
}

First, we check if the database exists if it doesn't, the sql.Open function would create the database and create a table in the database, if it exists it just opens the database. Let's define the createTable function.

func createTable(db *sql.DB) {

    sqlStmt := `
  CREATE TABLE IF NOT EXISTS songs (
    id TEXT NOT NULL PRIMARY KEY,
    name TEXT,
    artists TEXT,
    danceability FLOAT,
    energy FLOAT,
    key INT,
    loudness FLOAT,
    mode INT,
    speechiness FLOAT,
    acousticness FLOAT,
    instrumentalness FLOAT,
    liveness FLOAT,
    valence FLOAT,
    tempo FLOAT,
    duration_ms INT,
    time_signature INT
  );
  `
    _, err := db.Exec(sqlStmt)
    if err != nil {
        fmt.Println(err)
    }
}

We use the CREATE TABLE statement with the appropriate columns with their datatypes then execute it.

With the table created let's view the data in it.

// takes in the database and returns a song slice
func findAll(db *sql.DB) []Song {
    rows, err := db.Query("SELECT * FROM songs")
    if err != nil {
        fmt.Println(err)
    }
    defer rows.Close()
    var songs []Song
    for rows.Next() {
        var song Song
    // assign the song struct the data from each column in the row
        err := rows.Scan(&song.id, &song.name, &song.artists, &song.danceability, &song.energy, &song.key, &song.loudness, &song.mode, &song.speechiness, &song.acousticness, &song.instrumentalness, &song.liveness, &song.valence, &song.tempo, &song.duration_ms, &song.time_signature)
        if err != nil {
            fmt.Println(err)
        }
        songs = append(songs, song)
    }
    err = rows.Err()
    if err != nil {
        fmt.Println(err)
    }
    return songs
}

The SQL query SELECT * FROM songs gives us all the data stored in the songs table as rows, we then loop through it appending each row into a Song slice, when the loop is done we return the Song slice.

Using the openDatabase and findAll function in the main function

func main() {

    db := openDatabase()
    defer db.Close()

    // query data
    songs := findAll(db)

    for _, song := range songs {
        fmt.Println(song)
    }

}

After running it you should get the list of songs in your terminal, but it is currently empty so nothing shows up. You should also notice the songs.db file that has been created in the folder, Your file tree should now be.

 +-- data
  |   +-- top2018.csv
  +-- go.mod
  +-- go.sum
  +-- main.go
  +-- songs.db

Great! now we need to insert the data into our empty database.

func insertData(db *sql.DB, songs []Song) {
      for _, song := range songs {
    // the ? allows us to send the values during execution
        sqlStmt := `
    INSERT INTO songs(id, name, artists, danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo, duration_ms, time_signature) VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    `

        _, err := db.Exec(sqlStmt, song.id, song.name, song.artists, song.danceability, song.energy, song.key, song.loudness, song.mode, song.speechiness, song.acousticness, song.instrumentalness, song.liveness, song.valence, song.tempo, song.duration_ms, song.time_signature)
        if err != nil {
            fmt.Println(err)
        }
    }
}

To insert the data we pass the database and the data into the insertData function , we loop through each song in the slice and use the INSERT INTO SQL query to insert each song into the database.

func main() {

    db := openDatabase()
    defer db.Close()

    data, _ := parseCsv("data/top2018.csv")
    insertData(db, parseSong(data))

    // query data
    songs := findAll(db)

    for _, song := range songs {
        fmt.Println(song)
    }

}

After running we get all the songs in the database🥳🎉 We are pretty much done.

Let's extend the code by implementing a function that searches for songs made by the artist we query for.

func findArtist(db *sql.DB, artist string) []Song {

    sqlStmt := `
    SELECT * FROM songs WHERE artists LIKE ?
    `
    rows, err := db.Query(sqlStmt, "%"+artist+"%")
    if err != nil {
        fmt.Println(err)
    }
    defer rows.Close()
    var songs []Song
    for rows.Next() {
        var song Song
        err := rows.Scan(&song.id, &song.name, &song.artists, &song.danceability, &song.energy, &song.key, &song.loudness, &song.mode, &song.speechiness, &song.acousticness, &song.instrumentalness, &song.liveness, &song.valence, &song.tempo, &song.duration_ms, &song.time_signature)
        if err != nil {
            fmt.Println(err)
        }
        songs = append(songs, song)
    }
    err = rows.Err()
    if err != nil {
        fmt.Println(err)
    }
    return songs
}

Running it

func main() {

    db := openDatabase()
    defer db.Close()
    // query data
    songs := findArtist(db, "Drake")

    for _, song := range songs {
        fmt.Println(song)
    }
}

You would get all songs released by drake in 2018 that are in the database, He had four.

To get your feet wet you can write functions that analyzes the data, e.g. the correlation between the duration, energy and so on.

Thanks for reading.

source code