Time series analysis of Facebook message volume
Introduction to the data set
The data is the number of Facebook messages a user (me) sent and received between 2011 and 2016. The data was exported from Facebook, converted from HTML to JSON, and then read into R. For more details on how this was done, please consult here.
The original data source contains 413,101 timestamped messages sent between 2008 and 2016. Message volume is very low between 2008 and 2011 and highly atypical of the rest of the series, so that period was excluded from the sample. An example row is shown below:
thread | sender | date | message |
---|---|---|---|
Boaz Sobrado-Lolita Honich | Boaz Sobrado | 2012-08-13 03:15:00 | kik nem hagynak? :D |
To create a suitable format for time series analysis, the padr package was used to aggregate all timestamps into monthly counts and to add rows for months with no observations, whose counts were then set to zero. Message volume is defined as the number of messages sent and received by the user.
library(dplyr)
library(padr)
library(tibbletime)

rawTs <- df %>%
  mutate(date = lubridate::with_tz(date, "CET")) %>% # change the time zone
  thicken(interval = "month") %>% # creates a column named date_month
  group_by(date_month) %>% # groups the observations by month
  summarise(amount = n()) %>% # counts the messages sent or received in each month
  pad() %>% # adds rows for months where observations are missing
  fill_by_value(amount) %>% # fills the missing counts with 0
  as_tbl_time(date_month) %>% # creates a tbl_time object to ease time filtering
  time_filter(2011 ~ 2016) # filter to 2011-2016
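To make the aggregation and padding steps concrete, here is a minimal base R sketch that does the same kind of work on a handful of made-up timestamps. It does not use padr, and the toy data and the variable names (`stamps`, `month_start`, `monthly`) are assumptions for illustration only.

```r
# Made-up timestamps for illustration: note that March 2012 has no messages
stamps <- as.POSIXct(
  c("2012-01-05 10:00:00", "2012-01-20 18:30:00",
    "2012-02-11 09:15:00", "2012-04-02 23:59:00"),
  tz = "CET"
)

# Map every timestamp to the first day of its month
# (roughly the role of the date_month column in the pipeline above)
month_start <- as.Date(format(stamps, "%Y-%m-01"))

# Every month between the first and last observation, including empty ones
all_months <- seq(min(month_start), max(month_start), by = "month")

# Count messages per month; months without messages get a count of 0
amount <- as.integer(table(factor(as.character(month_start),
                                  levels = as.character(all_months))))

monthly <- data.frame(date_month = all_months, amount = amount)
print(monthly)
```

The empty month appears as an explicit row with a count of zero instead of silently disappearing. That matters for the next step, because a `ts` object assumes equally spaced observations.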
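Now the data is in a suitable format to create time series objects. A training set was created from the start of the series through February 2015, with a test set covering the remaining thirteen months (March 2015 to March 2016). Both were adjusted for calendar effects by dividing each monthly total by the number of days in that month.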
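library(forecast) # provides monthdays()

totalTs <- ts(data = rawTs$amount, # monthly message counts
              start = c(2011, 1), # start in January 2011
              end = c(2016, 3), # end with the last month omitted
              frequency = 12) # months in a year

# calendar adjustment: average messages per day in each month
totalTs.adj <- totalTs / monthdays(totalTs)

train.ts <- window(totalTs.adj, end = c(2015, 2))
test.ts <- window(totalTs.adj, start = c(2015, 3), end = c(2016, 3))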
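As a rough illustration of what the calendar adjustment and the split do, the sketch below repeats both steps in base R on a made-up six-month series. It computes the days per month by hand instead of calling `monthdays()`, and the numbers and variable names (`x`, `days`, `x_adj`) are assumptions chosen so the result is easy to check.

```r
# Made-up monthly totals starting in January 2012 (a leap year)
x <- ts(c(310, 290, 310, 300, 310, 300), start = c(2012, 1), frequency = 12)

# First day of each observed month, plus one extra month to close the last interval
first <- as.Date(sprintf("%d-%02d-01",
                         as.integer(start(x)[1]), as.integer(start(x)[2])))
starts <- seq(first, by = "month", length.out = length(x) + 1)

# Days per month = gap between consecutive month starts
days <- as.numeric(diff(starts))

# Calendar adjustment: average messages per day in each month
x_adj <- x / days

# Split by date, the same way window() is used above
train <- window(x_adj, end = c(2012, 4))
test <- window(x_adj, start = c(2012, 5))

print(days)
print(x_adj)
```

In this toy series the raw totals move up and down only because months differ in length; after the adjustment every month works out to 10 messages per day, so the apparent variation disappears.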