In today’s tutorial I give you a quick introduction to data.table and show you how to filter rows, select columns, and compute basic statistics by group.

You can find the data set here: https://www.kaggle.com/russellyates88/suicide-rates-overview-1985-to-2016

Since I used a data set about suicide, it is important to point out that you should talk to someone about suicidal thoughts if you have them. You can find a list of suicide hotlines for many countries here: http://ibpf.org/resource/list-international-suicide-hotlines
If your country isn’t listed there, please search online for “suicide hotline” together with the name of your country.

The video:

The code:

#set the working directory to the folder that contains the downloaded data
setwd("PATHTOYOURWORKINGDIRECTORY")

library(data.table)

#read the csv file; check.names = TRUE converts the column names to syntactically valid R names
data <- data.table::fread("suicide-rates-overview-1985-to-2016/master.csv", check.names = TRUE)

str(data)

#filter rows
data[country == "Austria"]

#select columns
data[country == "Austria", .(Number_of_Suicides = suicides_no, Pop = population)]

#aggregate columns (sum over all rows for Austria)
data[country == "Austria", .(sum_of_suicides = sum(suicides_no), Pop = sum(population))]

#group by (one sum per combination of year and sex)
data[country == "Austria", .(sum_of_suicides = sum(suicides_no)), by = c("year", "sex")]
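
If you want to try the same steps without downloading the data set first, here is a minimal sketch on a small made-up table. The object name toy and all of its numbers are hypothetical and only serve as an illustration; the column names are the ones used in the code above. The general pattern is data[i, j, by]: i filters rows, j selects or computes columns, and by defines the groups.

```r
library(data.table)

# Hypothetical toy data with the same column names as in the tutorial;
# the numbers are made up for illustration only.
toy <- data.table(
  country     = c("Austria", "Austria", "Austria", "Austria", "Belgium", "Belgium"),
  year        = c(2000L, 2000L, 2001L, 2001L, 2000L, 2000L),
  sex         = c("female", "male", "female", "male", "female", "male"),
  suicides_no = c(10L, 30L, 12L, 28L, 20L, 40L),
  population  = c(1000L, 1100L, 1010L, 1090L, 2000L, 2100L)
)

# i: filter rows
austria <- toy[country == "Austria"]

# j: select and rename columns
selected <- toy[country == "Austria", .(Number_of_Suicides = suicides_no, Pop = population)]

# j: aggregate the filtered rows into a single row
totals <- toy[country == "Austria", .(sum_of_suicides = sum(suicides_no))]

# by: one aggregated row per group
by_sex <- toy[country == "Austria", .(sum_of_suicides = sum(suicides_no)), by = c("sex")]

print(by_sex)
```

With this toy table the filter keeps the four Austrian rows, and the grouped result contains one row for each value of sex.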