Table of contents
No headings in the article.
I have been learning Go and SQL for a while now, so I decided to test what I have learnt this past month by getting data from a CSV( Comma Separated Values) file and using that to create an SQLite database.
The data I would be using contains Spotify's top 2018 tracks, let's get coding.
Looking through the CSV we see it has 16 columns, namely :
id | name | artists | danceabilty | energy | key | loudness | mode | speechiness | acoustic | instrumentalness | liveness | valence | tempo | duration_ms | time_signature |
3ee8Jmje8o58CHK66QrVC | SAD! | XXXTENTACION | 0.74 | 0.613 | 8.0 | -4.88 | 1.0 | 0.145 | 0.258 | 0.00372 | 0.123 | 0.473 | 75.023 | 166606.0 | 4.0 |
The goal:
- Open the csv file
- Read its content
- Write the content to the database
To begin we create a folder named spotifyTop2018, run go mod init spotifyTop2018
. Create a main.go
file, download the csv data, and place the CSV in a folder named data, your folder tree should look like this:
+-- data
| +-- top2018.csv
+-- go.mod
+-- main.go
In the main.go
file we start with this
package main
import (
"fmt"
"os"
"encoding/csv"
)
func main() {
}
First, we start by reading the contents of the file, the go os
and encoding/csv
packages have the functions we need to accomplish that, so add those to the imports.
func parseCsv(filepath string) ([][]string, []string) {
file, err := os.Open(filepath)
if err != nil {
fmt.Println(err)
}
defer file.Close()
reader := csv.NewReader(file)
var records [][]string
for {
record, err := reader.Read()
// if we've reached the end of the file, break
if err == io.EOF {
break
}
if err != nil {
fmt.Println(err)
}
records = append(records, record)
}
return records[1:], records[0]
}
The parseCsv function takes in the file path and returns a 2D slice(array/list) of the data in the CSV (excluding the head ) and the header, and the header shows us the column names of the data. You get a 2D slice since CSVs are made up of rows and columns so the first row and first column would be data[0][0]
.
package main
import (
"fmt"
"os"
"encoding/csv"
)
func parseCsv(filepath string) ([][]string, []string) {
//..
}
func main() {
fmt.Println(parseCsv("data/top2018.csv"))
}
// outputs the data contained in the CSV
run the code
go run main.go
you should get the data as a 2D slice, great! we need to store the data we get into a struct.
type Song struct {
id string
name string
artists string
danceability float64
energy float64
key int
loudness float64
mode int
speechiness float64
acousticness float64
instrumentalness float64
liveness float64
valence float64
tempo float64
duration_ms int
time_signature int
}
The struct fields are the columns of the data, this is to help make the data easy to work with in code. A struct would be equivalent to a row in the data.
Recall the data we get from the CSV are strings, so we would need some helper functions to convert them to int
and float64
.
func stringToFloat(s string) float64 {
f, err := strconv.ParseFloat(s, 64)
if err != nil {
panic(err)
}
return f
}
func stringToInt(s string) int {
s = strings.TrimSuffix(s, ".0")
i, err := strconv.Atoi(s)
if err != nil {
panic(err)
}
return i
}
The string to float function is pretty straightforward, it takes a string and converts it to floats, while the string to int takes the string, trims off the decimal point at the end of the string and converts it to int.
Now it's time to take our data and transform it into a slice of songs.
func parseSong(records [][]string) []Song {
var songs []Song
for _, record := range records {
song := Song{
id: record[0],
name: record[1],
artists: record[2],
danceability: stringToFloat(record[3]),
energy: stringToFloat(record[4]),
key: stringToInt(record[5]),
loudness: stringToFloat(record[6]),
mode: stringToInt(record[7]),
speechiness: stringToFloat(record[8]),
acousticness: stringToFloat(record[9]),
instrumentalness: stringToFloat(record[10]),
liveness: stringToFloat(record[11]),
valence: stringToFloat(record[12]),
tempo: stringToFloat(record[13]),
duration_ms: stringToInt(record[14]),
time_signature: stringToInt(record[15]),
}
songs = append(songs, song)
}
return songs
}
You take each row, get the data in each column i.e. from index 0 - 15, use it to construct the song struct append that to a slice of songs, and return the slice.
We are pretty much done with reading the data, now it is time to write it into the database.
For that we need the database/sql
package and a sqlite3 driver, I am using the github.com/mattn/go-sqlite3
driver.
Your import should look like this
import (
"database/sql"
"encoding/csv"
"fmt"
"io"
"os"
"strconv"
"strings"
_ "github.com/mattn/go-sqlite3"
)
Run this command to get the sqlite3 driver
go get github.com/mattn/go-sqlite3
You need mingw64 to build the app with the driver
You can use chocolatey to install it.
After installing the driver, your file tree should like this
+-- data
| +-- top2018.csv
+-- go.mod
+-- go.sum
+-- main.go
With that done, we need to open the SQLite database
func openDatabase() *sql.DB {
if _, err := os.Stat("songs.db"); err == nil {
// database exists
db, err := sql.Open("sqlite3", "./songs.db")
if err != nil {
fmt.Println(err)
}
return db
} else {
// database does not exist so create a table and return the database
db, err := sql.Open("sqlite3", "./songs.db")
if err != nil {
fmt.Println(err)
}
createTable(db)
return db
}
}
First, we check if the database exists if it doesn't, the sql.Open
function would create the database and create a table in the database, if it exists it just opens the database. Let's define the createTable
function.
func createTable(db *sql.DB) {
sqlStmt := `
CREATE TABLE IF NOT EXISTS songs (
id TEXT NOT NULL PRIMARY KEY,
name TEXT,
artists TEXT,
danceability FLOAT,
energy FLOAT,
key INT,
loudness FLOAT,
mode INT,
speechiness FLOAT,
acousticness FLOAT,
instrumentalness FLOAT,
liveness FLOAT,
valence FLOAT,
tempo FLOAT,
duration_ms INT,
time_signature INT
);
`
_, err := db.Exec(sqlStmt)
if err != nil {
fmt.Println(err)
}
}
We use the CREATE TABLE
statement with the appropriate columns with their datatypes then execute it.
With the table created let's view the data in it.
// takes in the database and returns a song slice
func findAll(db *sql.DB) []Song {
rows, err := db.Query("SELECT * FROM songs")
if err != nil {
fmt.Println(err)
}
defer rows.Close()
var songs []Song
for rows.Next() {
var song Song
// assign the song struct the data from each column in the row
err := rows.Scan(&song.id, &song.name, &song.artists, &song.danceability, &song.energy, &song.key, &song.loudness, &song.mode, &song.speechiness, &song.acousticness, &song.instrumentalness, &song.liveness, &song.valence, &song.tempo, &song.duration_ms, &song.time_signature)
if err != nil {
fmt.Println(err)
}
songs = append(songs, song)
}
err = rows.Err()
if err != nil {
fmt.Println(err)
}
return songs
}
The SQL query SELECT * FROM songs
gives us all the data stored in the songs table as rows, we then loop through it appending each row into a Song slice, when the loop is done we return the Song slice.
Using the openDatabase
and findAll
function in the main function
func main() {
db := openDatabase()
defer db.Close()
// query data
songs := findAll(db)
for _, song := range songs {
fmt.Println(song)
}
}
After running it you should get the list of songs in your terminal, but it is currently empty so nothing shows up. You should also notice the songs.db
file that has been created in the folder, Your file tree should now be.
+-- data
| +-- top2018.csv
+-- go.mod
+-- go.sum
+-- main.go
+-- songs.db
Great! now we need to insert the data into our empty database.
func insertData(db *sql.DB, songs []Song) {
for _, song := range songs {
// the ? allows us to send the values during execution
sqlStmt := `
INSERT INTO songs(id, name, artists, danceability, energy, key, loudness, mode, speechiness, acousticness, instrumentalness, liveness, valence, tempo, duration_ms, time_signature) VALUES(?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
`
_, err := db.Exec(sqlStmt, song.id, song.name, song.artists, song.danceability, song.energy, song.key, song.loudness, song.mode, song.speechiness, song.acousticness, song.instrumentalness, song.liveness, song.valence, song.tempo, song.duration_ms, song.time_signature)
if err != nil {
fmt.Println(err)
}
}
}
To insert the data we pass the database and the data into the insertData
function , we loop through each song in the slice and use the INSERT INTO
SQL query to insert each song into the database.
func main() {
db := openDatabase()
defer db.Close()
data, _ := parseCsv("data/top2018.csv")
insertData(db, parseSong(data))
// query data
songs := findAll(db)
for _, song := range songs {
fmt.Println(song)
}
}
After running we get all the songs in the database🥳🎉 We are pretty much done.
Let's extend the code by implementing a function that searches for songs made by the artist we query for.
func findArtist(db *sql.DB, artist string) []Song {
sqlStmt := `
SELECT * FROM songs WHERE artists LIKE ?
`
rows, err := db.Query(sqlStmt, "%"+artist+"%")
if err != nil {
fmt.Println(err)
}
defer rows.Close()
var songs []Song
for rows.Next() {
var song Song
err := rows.Scan(&song.id, &song.name, &song.artists, &song.danceability, &song.energy, &song.key, &song.loudness, &song.mode, &song.speechiness, &song.acousticness, &song.instrumentalness, &song.liveness, &song.valence, &song.tempo, &song.duration_ms, &song.time_signature)
if err != nil {
fmt.Println(err)
}
songs = append(songs, song)
}
err = rows.Err()
if err != nil {
fmt.Println(err)
}
return songs
}
Running it
func main() {
db := openDatabase()
defer db.Close()
// query data
songs := findArtist(db, "Drake")
for _, song := range songs {
fmt.Println(song)
}
}
You would get all songs released by drake in 2018 that are in the database, He had four.
To get your feet wet you can write functions that analyzes the data, e.g. the correlation between the duration, energy and so on.
Thanks for reading.