---
title: "Bayesian modeling with brms"
output: html_document
date: "2023-03-26"
---
The purpose of this exercise is to replicate the earlier frequentist linear regression analysis with a Bayesian approach. Our goal is to see whether "competence" (dependent variable) is predicted by two independent variables ("autonomy" and the "score" of a math exam).
Update 31.03.2023: Code developed plus "Exercises" added.
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# set custom functions for displaying 0-3 decimals in inline text
dec0 <- function(x) {
  paste(sprintf("%1.0f", x), collapse = ", ")
}
dec1 <- function(x) {
  paste(sprintf("%1.1f", x), collapse = ", ")
}
dec2 <- function(x) {
  paste(sprintf("%1.2f", x), collapse = ", ")
}
dec3 <- function(x) {
  paste(sprintf("%1.3f", x), collapse = ", ")
}
```
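The helpers above just wrap `sprintf()` with a fixed number of decimals, for use in inline text like `` `r dec2(x)` ``. A quick illustrative check (dec2 is redefined locally so the snippet is self-contained):

```{r dec helpers example}
# illustrative sketch: the dec* helpers wrap sprintf() rounding
dec2 <- function(x) paste(sprintf("%1.2f", x), collapse = ", ")
dec2(3.14159)     # "3.14"
dec2(c(1/3, 2/3)) # "0.33, 0.67"
```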
Read in the "dfttu_wide_cross_complete_final2.txt" data and center predictor variables.
```{r prepare data, message = F, warning = F}
# ----------------
# read in the data
dfttu_wide <- read.table("dfttu_wide_cross_complete_final2.txt", header=TRUE, sep= "\t")
# ---------------------
# center and scale IV's
library(magrittr) # for the %>% operator
library(dplyr) # sym
varlist <- c("autonomy","score")
for (varname in varlist) {
  cgm <- paste0(varname, ".cgm")     # center grand mean
  cgmsd <- paste0(varname, ".cgmsd") # cgm scaled (M = 0, SD = 1)
  dfttu_wide <- dfttu_wide %>% # pipe needs library(magrittr)
    # Centering Grand Mean (CGM)
    dplyr::mutate(
      !!sym(cgm) := !!sym(varname) - mean(!!sym(varname), na.rm = T),
      !!sym(cgmsd) := (!!sym(varname) - mean(!!sym(varname), na.rm = T)) / sd(!!sym(varname), na.rm = T)
      # !!sym(cgmsd) := scale(!!sym(varname), center = T, scale = T) # optional way
    )
}
dput(names(dfttu_wide))
# c("id", "group", "gender", "autonomy", "competence", "deep",
# "organised", "score", "score3class_chr", "score3class", "autonomy_chr",
# "gender_chr", "group_chr", "index", "autonomy.cgm", "autonomy.cgmsd",
# "score.cgm", "score.cgmsd")
# now we have centered (.cgm) and centered+scaled (.cgmsd) versions of autonomy and score
# scaling is useful when the value ranges of the predictors are very different,
# as in this case:
range(dfttu_wide$autonomy)
# [1] 1.25 4.00
range(dfttu_wide$score)
# [1] 0 100
```
# Bayesian regression analysis
## RQ: Do autonomy and mathematics achievement predict higher education students' competence development?
The research question was addressed with a predictive model in which the dependent variable ("y" or "DV": competence) was predicted by two independent variables ("x" or "IV": autonomy, score).
The following analyses were conducted using default priors with the "Bayesian Regression Models using ’Stan’" (*brms*; Bürkner, 2017) package in *R* (R Core Team, 2022). *brms* is based on *Stan* (Stan Development Team, 2022), which draws representative samples of parameter values from the posterior distribution of complex hierarchical models using Hamiltonian Monte Carlo (HMC). Stan is named after Stanislaw Ulam (1909–1984), a pioneer of Monte Carlo methods.
```{r set up brms, message = F, warning = F}
library(brms)
library(bayesplot)
library(rstan)
# analyze data with NA's (excluded from analysis)
rstan_options(auto_write = TRUE)
# detect number of cores in YOUR computer
n_cores <- parallel::detectCores()
```
We'll start by defining the distribution family and link function for the dependent variable (DV) "competence".
- Family: defines the distribution of the DV (competence)
- Link function: connects the linear predictor (built from the IVs) to the expected value of the DV
Here is a short list of different regression analyses and related *brms* distribution families and link functions (depending on whether DV is continuous or categorical):
### DV as continuous
#### 1. linear regression
family = gaussian() (parameters: location, scale)
*Note*: if the linear regression assumptions do not hold we could also use
family = student() - robust linear regression that is less influenced by outliers
family = skew_normal() - skewed responses in linear regression
link = identity [XΒ = μ] (maps every element of a set to itself: the linear model directly predicts the outcome)
other choices are: log, inverse, softplus
*Note*: Distributions have at most four parameters: location, scale, shape, threshold
For example, normal distribution uses two of these: location and scale
#### 2. non-linear regression
see https://cran.r-project.org/web/packages/brms/vignettes/brms_nonlinear.html
### DV as categorical
#### 3. binary logistic regression (DV has two values)
family = bernoulli()
link = logit [XΒ = ln(μ / (1 – μ))], probit, probit_approx, cloglog, cauchit, identity, log
#### 4. binomial logistic regression (DV is the number of successes in a known number of trials)
family = binomial()
link = logit [XΒ = ln(μ / (1 – μ))], probit, probit_approx, cloglog, cauchit, identity, log
#### 5. ordinal logistic regression (DV has at least three groups with natural order)
family = cumulative()
link = logit, probit, probit_approx, cloglog, cauchit
#### 6. multinomial logistic regression (DV has at least three groups which do not have natural order)
family = categorical() or multinomial()
link = logit
#### 7. Poisson regression (DV is a count of items)
family = poisson()
link = log, identity, sqrt, softplus
*Note*: Poisson regression is not the best choice if
1) the variance is greater than the mean (overdispersion)
2) the data has too many zeros to follow the Poisson distribution (zero-inflation)
family = negbinomial() overcomes the limitation 1) above
link = log, identity, sqrt, softplus
family = zero_inflated_negbinomial() overcomes the limitation 2) above
link = log, identity, sqrt, softplus
#### 8. survival regression
family = gamma()
link = log, identity, inverse, softplus
For more details, see ?brmsfamily and vignette("brms_families")
https://statisticsbyjim.com/regression/choosing-regression-analysis/
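In *brms* the families above are constructor calls, with the link chosen inside the call; a minimal sketch (the constructors shown are the documented *brms*/base-R ones, assuming *brms* is installed):

```{r family link examples}
library(brms)
# a family object bundles the DV distribution with its link function
gaussian(link = "identity")  # 1. linear regression (default)
student(link = "identity")   # robust linear regression
bernoulli(link = "logit")    # 3. binary logistic regression
cumulative(link = "probit")  # 5. ordinal regression
poisson(link = "log")        # 7. count regression
# the family object is then passed to brm(), e.g.:
# brm(competence ~ autonomy, data = dfttu_wide, family = gaussian())
```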
First, we need to investigate what kind of distribution is appropriate for the DV (competence).
```{r investigate DV distribution, message = F, warning = F}
# let's see how the DV (competence) distribution looks
# the DV is an aggregated (average) variable of four single self-rating items (questions) from an online survey:
# 1. I feel confident that I can do things well on my studies.
# 2. In my studies, I feel capable at what I do.
# 3. When I am at the university, I feel competent to achieve my goals.
# 4. In my studies, I feel I can successfully complete difficult tasks.
# the five response options for each item were: 1 = totally disagree, ... , 5 = totally agree
# first, remove NA's from the DV (delete rows with missing values)
comp <- dfttu_wide$competence[!is.na(dfttu_wide$competence)]
# second, look at the value distribution
summary(comp)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 2.500 3.500 4.000 3.907 4.250 5.000
# third, investigate skewness and kurtosis to see if normal distribution could be used
# SKEWNESS is a measure of the asymmetry of a distribution. This value can be positive or negative
# - negative skew indicates that the tail is on the left side of the distribution, which extends towards more negative values
# - positive skew indicates that the tail is on the right side of the distribution, which extends towards more positive values
# - value of zero indicates that there is no skewness in the distribution at all, meaning the distribution is perfectly symmetrical
#
library(moments)
skewness(comp)
# [1] -0.2349759 # small negative skew = participants used the higher response options (e.g., 4, 5) more often, but close to zero (as desired)
# KURTOSIS is a measure of whether or not a distribution is heavy-tailed or light-tailed relative to a normal distribution
# kurtosis of a normal distribution is 3
# kurtosis < 3 is said to be platykurtic (fewer and less extreme outliers than the normal distribution)
# kurtosis > 3 is said to be leptokurtic (more outliers than the normal distribution)
kurtosis(comp)
# [1] 2.721777 # slightly platykurtic, but the value is quite close to 3 (as desired)
# it looks like the normal distribution might be good for the competence DV, but let's dig deeper ...
# plot values
plot(comp, pch = 20) # range from 2.5 - 5.0
title(main="competence")
# produce histogram
hist(comp)
```
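The moment-based formulas behind these statistics are simple: divide the third and fourth central moments by powers of the second. As a sketch (sample_skewness/sample_kurtosis are hypothetical helper names; this is the definition the *moments* package uses, without small-sample correction):

```{r manual skewness kurtosis}
# skewness = m3 / m2^(3/2), kurtosis = m4 / m2^2, with mk = k-th central moment
sample_skewness <- function(x) mean((x - mean(x))^3) / mean((x - mean(x))^2)^(3/2)
sample_kurtosis <- function(x) mean((x - mean(x))^4) / mean((x - mean(x))^2)^2

x <- c(1, 2, 3, 4, 5)  # perfectly symmetric toy data
sample_skewness(x)     # 0 (no skew)
sample_kurtosis(x)     # 1.7 (platykurtic: < 3)
```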
A special package *fitdistrplus* (Delignette-Muller & Dutang, 2015) can be used to further investigate the most appropriate distribution for our DV.
```{r use fitdistrplus to investigate DV, message = F, warning = F}
# https://stats.stackexchange.com/questions/132652/how-to-determine-which-distribution-fits-my-data-best
library(fitdistrplus)
# plot empirical density function from cumulative distribution function
fitdistrplus::plotdist(comp, histo = TRUE, demp = TRUE) # looks more or less like a normal distribution
# first, treat DV as continuous
fitdistrplus::descdist(comp,
discrete = FALSE, # treat DV as continuous
boot=300) # create 300 bootstraps (orange dots in the picture)
# closest hit is "normal" distribution
# second, as competence is measured on a Likert scale from 1-5, we could also treat it as discrete
fitdistrplus::descdist(comp,
discrete = TRUE, # treat DV as discrete
boot=300)
# closest hit is still "normal" distribution
# so, we proceed with normal distribution
# fit normal distribution
fit_competence_normal <- fitdistrplus::fitdist(comp, "norm")
plot(fit_competence_normal)
summary(fit_competence_normal)
# Fitting of the distribution ' norm ' by maximum likelihood
# Parameters :
# estimate Std. Error
# mean 3.9074519 0.04185205
# sd 0.6035988 0.02959350
# Loglikelihood: -190.1313 AIC: 384.2627 BIC: 390.9377
# Correlation matrix:
# mean sd
# mean 1.000000e+00 7.040333e-11
# sd 7.040333e-11 1.000000e+00
# look at goodness-of-fit
fitdistrplus::gofstat(fit_competence_normal, fitnames = "Normal")
# Normal
# Kolmogorov-Smirnov statistic 0.1422944
# Cramer-von Mises statistic 0.6234797
# Anderson-Darling statistic 3.2860570
# note: these are goodness-of-fit test statistics (smaller = better fit), not p-values,
# so comparing them against 0.05 is not meaningful
#
# Goodness-of-fit criteria
# Normal
# Akaike's Information Criterion 384.2627
# Bayesian Information Criterion 390.9377
# everything looks quite good so far; the normal distribution is appropriate for the DV (competence)
```
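The reported AIC and BIC can be reproduced by hand from the log-likelihood. A worked check using the values printed above (k = 2 estimated parameters, i.e., mean and sd; n = 208 non-missing competence values):

```{r verify aic bic}
loglik <- -190.1313   # from summary(fit_competence_normal)
k <- 2                # estimated parameters: mean and sd
n <- 208              # number of non-missing competence values
2 * k - 2 * loglik        # AIC: 384.2626
log(n) * k - 2 * loglik   # BIC: 390.9377
```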
Let's take a look at the default priors that *stan* (stan_glm) would use:
```{r default priors, message = F, warning = F}
# install.packages("rstanarm")
library(rstanarm)
# https://cran.r-project.org/web/packages/rstanarm/vignettes/priors.html
# -----------
# empty model
filename <- "default_stan_priors_null.Rds"
modelname <- "default_stan_priors_null"
if (file.exists(filename)) {
modelname <- readRDS(filename)
} else {
options(buildtools.check = function(action)TRUE)
modelname <- stan_glm(competence ~ 1,
data = dfttu_wide,
chains = 4,
warmup = 2000, # half of the 4000 iterations (the iter object is defined only later)
iter = 4000,
seed = 2023)
saveRDS(modelname, file = filename, compress = F)
}
default_stan_priors_null <- readRDS("default_stan_priors_null.Rds")
# prior_summary(default_stan_priors_null)
# # Intercept (after predictors centered)
# # Specified prior:
# # ~ normal(location = 3.9, scale = 2.5)
# # Adjusted prior:
# # ~ normal(location = 3.9, scale = 1.5) # intercept prior
# #
# # Auxiliary (sigma)
# # Specified prior:
# # ~ exponential(rate = 1)
# # Adjusted prior:
# # ~ exponential(rate = 1.7) # error variance prior
# summary(default_stan_priors_null)
# # Estimates:
# # mean sd 10% 50% 90%
# # (Intercept) 3.9 0.0 3.9 3.9 4.0
# # sigma 0.6 0.0 0.6 0.6 0.6
# ----------------------------
# add predictors in raw format
filename <- "default_stan_priors_raw.Rds"
modelname <- "default_stan_priors_raw"
if (file.exists(filename)) {
modelname <- readRDS(filename)
} else {
options(buildtools.check = function(action)TRUE)
modelname <- stan_glm(competence ~ autonomy + score,
data = dfttu_wide,
chains = 4,
warmup = 2000, # half of the 4000 iterations
iter = 4000,
seed = 2023)
saveRDS(modelname, file = filename, compress = F)
}
default_stan_priors_raw <- readRDS("default_stan_priors_raw.Rds")
# prior_summary(default_stan_priors_raw)
# # Intercept (after predictors centered) # stan_glm will center predictors
# # Specified prior:
# # ~ normal(location = 3.9, scale = 2.5)
# # Adjusted prior:
# # ~ normal(location = 3.9, scale = 1.5)
# #
# # Coefficients
# # Specified prior:
# # ~ normal(location = [0,0], scale = [2.5,2.5])
# # Adjusted prior:
# # ~ normal(location = [0,0], scale = [2.493,0.052]) # raw predictor values: scales of autonomy and score are totally different
# #
# # Auxiliary (sigma)
# # Specified prior:
# # ~ exponential(rate = 1)
# # Adjusted prior:
# # ~ exponential(rate = 1.7) # error SD "sigma" prior exponential(1.7)
# summary(default_stan_priors_raw)
# # Estimates:
# # mean sd 10% 50% 90%
# # (Intercept) 1.0 0.2 0.8 1.0 1.2 # competence is 1.0 when autonomy and score have value of zero
# # autonomy 0.8 0.0 0.7 0.8 0.8 # one unit increase in autonomy increases competence by 0.8
# # score 0.0 0.0 0.0 0.0 0.0 # one unit increase in score has no effect on competence
# # sigma 0.4 0.0 0.4 0.4 0.4 # error SD = 0.4 and variance 0.4^2 = 0.16 (smaller than in the null model)
# -----------------------
# add predictors centered
filename <- "default_stan_priors_centered.Rds"
modelname <- "default_stan_priors_centered"
if (file.exists(filename)) {
modelname <- readRDS(filename)
} else {
options(buildtools.check = function(action)TRUE)
modelname <- stan_glm(competence ~ autonomy.cgm + score.cgm,
data = dfttu_wide,
chains = 4,
warmup = 2000, # half of the 4000 iterations
iter = 4000,
seed = 2023)
saveRDS(modelname, file = filename, compress = F)
}
default_stan_priors_centered <- readRDS("default_stan_priors_centered.Rds")
# prior_summary(default_stan_priors_centered)
# # Intercept (after predictors centered)
# # Specified prior:
# # ~ normal(location = 3.9, scale = 2.5)
# # Adjusted prior:
# # ~ normal(location = 3.9, scale = 1.5)
# #
# # Coefficients
# # Specified prior:
# # ~ normal(location = [0,0], scale = [2.5,2.5])
# # Adjusted prior:
# # ~ normal(location = [0,0], scale = [2.493,0.052]) # everything stays the same
# #
# # Auxiliary (sigma)
# # Specified prior:
# # ~ exponential(rate = 1)
# # Adjusted prior:
# # ~ exponential(rate = 1.7)
# summary(default_stan_priors_centered)
# # Estimates:
# # mean sd 10% 50% 90%
# # (Intercept) 3.9 0.0 3.9 3.9 3.9 # competence is 3.9 when autonomy and score are at their average values
# # autonomy.cgm 0.8 0.0 0.7 0.8 0.8 # these stay the same
# # score.cgm 0.0 0.0 0.0 0.0 0.0
# # sigma 0.4 0.0 0.4 0.4 0.4
# ----------------------------------
# add predictors centered and scaled
filename <- "default_stan_priors_scaled.Rds"
modelname <- "default_stan_priors_scaled"
if (file.exists(filename)) {
modelname <- readRDS(filename)
} else {
options(buildtools.check = function(action)TRUE)
modelname <- stan_glm(competence ~ autonomy.cgmsd + score.cgmsd,
data = dfttu_wide,
chains = 4,
warmup = 2000, # half of the 4000 iterations
iter = 4000,
seed = 2023)
saveRDS(modelname, file = filename, compress = F)
}
default_stan_priors_scaled <- readRDS("default_stan_priors_scaled.Rds")
# prior_summary(default_stan_priors_scaled)
# # Intercept (after predictors centered)
# # Specified prior:
# # ~ normal(location = 3.9, scale = 2.5)
# # Adjusted prior:
# # ~ normal(location = 3.9, scale = 1.5)
# #
# # Coefficients
# # Specified prior:
# # ~ normal(location = [0,0], scale = [2.5,2.5])
# # Adjusted prior:
# # ~ normal(location = [0,0], scale = [1.51,1.51]) # predictors scaled: autonomy N(0,1.51) and score N(0,1.51) priors
# #
# # Auxiliary (sigma)
# # Specified prior:
# # ~ exponential(rate = 1)
# # Adjusted prior:
# # ~ exponential(rate = 1.7)
# summary(default_stan_priors_scaled)
# # Estimates:
# # mean sd 10% 50% 90%
# # (Intercept) 3.9 0.0 3.9 3.9 3.9 # competence is 3.9 when autonomy and score are at their average (scaled) values
# # autonomy.cgmsd 0.5 0.0 0.4 0.5 0.5 # one SD increase in autonomy increases competence by 0.5 raw units
# # score.cgmsd 0.0 0.0 0.0 0.0 0.1 # one SD increase in score has essentially no effect on competence
# # sigma 0.4 0.0 0.4 0.4 0.4
```
*stan* default priors are not "flat", but intended to be weakly informative (more details [here](https://cran.r-project.org/web/packages/rstanarm/vignettes/priors.html)). That is, they are designed to provide moderate regularization and help stabilize computation. In the above code, "Adjusted prior" is the prior location and scale that was actually used in the analysis (after automatic rescaling).
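The "Adjusted prior" scales follow *rstanarm*'s autoscaling rule described in that vignette: for a gaussian model, the default slope prior normal(0, 2.5) is rescaled to normal(0, 2.5 * sd(y)/sd(x)), the intercept scale to 2.5 * sd(y), and the sigma prior exponential(1) to rate 1/sd(y). A worked check with the SDs reported elsewhere in this document (sd(competence) ≈ 0.605, sd(autonomy) ≈ 0.607, sd(score) ≈ 29.118):

```{r rstanarm autoscaling check}
sd_y        <- 0.605055   # sd(dfttu_wide$competence)
sd_autonomy <- 0.6068433  # sd(dfttu_wide$autonomy)
sd_score    <- 29.11842   # sd(dfttu_wide$score)

2.5 * sd_y / sd_autonomy  # ~2.49, matching the adjusted autonomy prior scale 2.493
2.5 * sd_y / sd_score     # ~0.052, matching the adjusted score prior scale
2.5 * sd_y                # ~1.51, matching the adjusted intercept prior scale 1.5
1 / sd_y                  # ~1.65, roughly the reported adjusted exponential rate 1.7
```

This also explains why the raw-predictor and centered-predictor models got identical coefficient priors: centering changes the mean but not the SD of a predictor.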
Now we have enough information to fit Model 1 (competence ~ autonomy) with *brms* default priors.
```{r fit bayesian regression model 1, message = F, warning = F}
# define parameters that brms needs
iter <- 4000 # brms default = 2000
warmup <- iter/2 # brms default
cores <- n_cores
chains <- 4 # brms default
thin <- 1 # brms default
family <- gaussian() # default link = identity
data <- dfttu_wide
# fit the model 1 and store the result into object and file
filename <- "dfttu_competence_bayesian_regression1.Rds"
modelname <- "dfttu_competence_bayesian_regression1"
if (file.exists(filename)) {
modelname <- readRDS(filename)
} else {
options(buildtools.check = function(action)TRUE)
# system.time( # use this to take time of the process ...
modelname <- brm(competence ~ autonomy.cgmsd,
data = data,
warmup = warmup, # burnin iterations before actual iterations
iter = iter, # n of total iterations per chain (including burnin)
cores = cores, # could also use n_cores
chains = chains, # we'll use the default, min is 2
sample_prior = TRUE, # plot the difference between the prior and the posterior distribution
# save_pars = save_pars(all = TRUE), # need to activate if Pareto k values are problematic
thin = thin, # default = 1, use thin > 1 to save computational time
family = family, # gaussian() with identity link function is the default
# control = list(adapt_delta = 0.90), # need to use if divergent transitions
seed = 2023)
# ) # uncomment if you use system.time
saveRDS(modelname, file = filename, compress = F)
}
# ------------------------------------------------------------------------------
# an interesting question: how does the core setting affect computing time?
# cores = n_cores*2 # max n of logical cores
# user system elapsed
# 0.732 0.168 8.522
# cores = n_cores*2-2 # leave two out
# user system elapsed
# 0.717 0.149 8.218
# cores = n_cores # max n of physical cores
# user system elapsed
# 0.712 0.152 8.286
# cores = 1
# user system elapsed
# 8.112 0.392 8.719
# Comment: it seems that "n_cores" is an OK solution ...
# ------------------------------------------------------------------------------
# ------------------------------------------------------------------------------
# how many chains do we need?
# more than one chain is needed to assess the model performance
# we need to make sure that the posterior is being sampled adequately and appropriately
# set n of chains to this computer's maximum capacity
# cores = n_cores
# chains = n_cores*2 # 20 chains is SLOW!
# user system elapsed
# 1.330 0.132 1.569
# n of chains equals the n of physical cores
# cores = n_cores
# chains = n_cores # 10 chains is STILL QUITE SLOW!!
# user system elapsed
# 0.722 0.068 0.857
# use typical value for the chains
# cores = n_cores
# chains = 4 # BETTER!!
# user system elapsed
# 0.396 0.029 0.495
# Comment: all the above chain settings produced similar results ...
# ------------------------------------------------------------------------------
# read the saved fit into object for later LOO comparisons
dfttu_competence_bayesian_regression1 <- readRDS("dfttu_competence_bayesian_regression1.Rds")
# investigate priors used in the model
brms::prior_summary(modelname)
# prior class coef group resp dpar nlpar lb ub source
# (flat) b default
# (flat) b autonomy.cgmsd (vectorized)
# student_t(3, 4, 2.5) Intercept default
# student_t(3, 0, 2.5) sigma 0 default
# to see the stan code type:
# stancode(dfttu_competence_bayesian_regression1)
# ------------------------------------------------------------------------------
# what are these default priors?
# default brms priors are discussed in https://search.r-project.org/CRAN/refmans/brms/html/set_prior.html
#
# "flat" priors are improper uniform priors, often described as "weak and non-informative"
# however, in some cases "... they are actually quite informative and pull the posterior
# towards extreme values that can bias our inferences."
# https://mc-stan.org/users/documentation/case-studies/weakly_informative_shapes.html
# this is true when the n of observed data is small ...
#
# brms default Student-t prior definition is more explicit than previous "flat":
# student_t(3, # df = default degrees of freedom that lead to better convergence of the models (fatter tails than normal distribution)
# 4, # mu = round(median(dfttu_wide$competence),1)
# 2.5) # sigma = if mad(y) < 2.5, otherwise mad(y)
# here it is 2.5 as mad(dfttu_wide$competence) < 2.5 (0.37065)
#
#
# compare these to the earlier "stan_glm" priors:
# Intercept: normal(3.9, 1.5)
# autonomy: normal(0, 1.51)
# score: normal(0, 1.51)
# sigma: exponential(rate = 1.7)
#
# ------------------------------------------------------------------------------
summary(modelname)
# Family: gaussian
# Links: mu = identity; sigma = identity
# Formula: competence ~ autonomy.cgmsd
# Data: dfttu_wide (Number of observations: 208)
# Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
# total post-warmup draws = 8000
#
# Population-Level Effects:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept 3.91 0.03 3.85 3.96 1.00 8636 6036
# autonomy.cgmsd 0.46 0.03 0.40 0.51 1.00 8052 6143
#
# Family Specific Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.40 0.02 0.36 0.44 1.00 7336 6100
# compare to earlier stan_glm results (they are similar):
# Estimates:
# mean sd 10% 50% 90%
# (Intercept) 3.9 0.0 3.9 3.9 3.9
# autonomy.cgmsd 0.5 0.0 0.4 0.5 0.5
# score.cgmsd 0.0 0.0 0.0 0.0 0.1
# sigma 0.4 0.0 0.4 0.4 0.4
# ------------------------------------------------------------------------------
# well - what do these results tell us?
# our family choice was "gaussian()" as it is good for linear regression
# "student" can be used for robust linear regression that is less influenced by outliers
# "skew_normal" can handle skewed responses in linear regression
# "poisson", "negbinomial", and "geometric" can be used for regression of unbounded count data
# "bernoulli", "binomial", and "beta_binomial" can be used for binary regression (i.e., most commonly logistic regression)
#
# default gaussian() link is identity, but also log, inverse, and softplus links are available for this distribution
#
#
# Estimate = mean of the posterior distribution
#
# Est.Error = SD of the posterior distribution
#
# l-95% CI, u-95% CI = two-sided lower and upper 95% credible intervals based on quantiles
#
# Rhat = convergence of the algorithm (Gelman & Rubin, 1992): If Rhat is considerably greater than 1 (i.e., > 1.1), the chains have not yet converged and it is necessary to run more iterations and/or set stronger priors
#
# Bulk_ESS = The ess_bulk function produces an estimated Bulk Effective Sample Size (bulk-ESS) using rank-normalized draws. Bulk-ESS is a useful measure of sampling efficiency in the bulk of the distribution (related e.g. to the efficiency of mean and median estimates), and is well defined even if the chains do not have finite mean or variance (Vehtari et al., 2021)
#
# Tail_ESS = The ess_tail function produces an estimated Tail Effective Sample Size (tail-ESS) by computing the minimum of the effective sample sizes for the 5% and 95% quantiles. Tail-ESS is a useful measure of sampling efficiency in the tails of the distribution (related e.g. to the efficiency of variance and tail quantile estimates) (Vehtari et al., 2021)
#
# -------------------------
# Population-Level Effects:
# Intercept: mean value of the posterior samples of competence
# a one-SD change in a predictor results in an [Estimate] change in the DV when all other predictors are zero
# One SD of raw autonomy is 0.6068433 points on the five-point (1 - 5) scale:
sd(dfttu_wide$autonomy, na.rm = T) # 0.6068433
sd(dfttu_wide$competence, na.rm = T) # 0.605055
# so, a one-SD increase in autonomy (0.6068433 raw units) increases competence by 0.46 points
# note: the DV is not standardized, so the estimate is already in raw units of the 1 - 5 scale
# (in DV-SD terms the effect is 0.46/0.605055 = about 0.75 SDs)
# as the average of competence was
mean(dfttu_wide$competence, na.rm = T) # 3.907452
# a one-SD increase in autonomy results in (3.907452+0.46) a competence level of about 4.37
# a two-SD increase in autonomy results in (3.907452+(2*0.46)) a competence level of about 4.83
# we might say that autonomy is quite a strong positive predictor of competence!!
#
# we could do hypothesis test to confirm:
hypothesis(dfttu_competence_bayesian_regression1, "autonomy.cgmsd > 0")
# Hypothesis Tests for class b:
# Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
# 1 (autonomy.cgmsd) > 0 0.46 0.03 0.41 0.5 Inf 1 *
# ---
# 'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
# '*': For one-sided hypotheses, the posterior probability exceeds 95%;
# for two-sided hypotheses, the value tested against lies outside the 95%-CI.
#
#
# ---------------------------
# Family Specific Parameters:
# sigma is the residual SD, i.e., the variation in competence left unexplained by the model
# the Estimate is reported as an SD, so the residual variance is 0.40^2 = 0.16
# ------------------------------------------------------------------------------
# more specific population-level (fixed) effects:
brms::fixef(modelname)
# Estimate Est.Error Q2.5 Q97.5
# Intercept 3.9079508 0.02781036 3.8536483 3.9625497
# autonomy.cgmsd 0.4563687 0.02726202 0.4039599 0.5101277
# posterior predictive checks
# https://m-clark.github.io/easy-bayes/posterior-predictive-checks.html
# the resulting posterior predictive distribution is the distribution of the outcome variable implied by the model after using the observed data y (a vector of outcome values), and typically predictors X, to update our beliefs about the unknown parameters θ in the model
brms::pp_check(modelname) # posterior samples follow quite nicely the distribution of DV
# brms::pp_check(modelname, ndraws = 100,
# type = 'ecdf_overlay') # better for categorical
# # Winter & Bürkner, 2021: the argument type = 'ecdf_overlay' to return an empirical cumulative distribution function (ECDF). By default, pp_check() returns a smoothed output which may be inappropriate for discrete data, such as count data. The black line (the cumulative distribution function of the data) should reasonably fall within the simulated data.
brms::pp_check(modelname, type = "stat", stat = "mean") # mean of DV was well captured
brms::pp_check(modelname, type = "error_scatter_avg") # not too much predictive error
brms::mcmc_plot(modelname)
brms::mcmc_plot(modelname, type = "trace") # chains converge?
brms::mcmc_plot(modelname, type = "acf_bar") # autocorrelation?
brms::mcmc_plot(modelname, type = "areas", prob = 0.95) # zero included?
plot(conditional_effects(modelname), points = TRUE, ask = FALSE)
# check Pareto-smoothed importance sampling (PSIS-LOO)
# the point is to check whether there are observations for which the leave-one-out posteriors are different enough from the full posterior (so that importance sampling is not able to correct the difference)
loo_model1 <- LOO(dfttu_competence_bayesian_regression1
# moment_match = TRUE # Use if Pareto k diagnostic values are high
)
print(loo_model1)
# Estimate SE
# elpd_loo -105.6 10.4
# p_loo 2.8 0.3
# looic 211.2 20.8
# ------
# Monte Carlo SE of elpd_loo is 0.0.
# All Pareto k estimates are good (k < 0.5).
# https://search.r-project.org/CRAN/refmans/loo/html/loo-glossary.html
# elpd_loo = Bayesian LOO estimate of the expected log pointwise predictive density and is a sum of N individual pointwise log predictive densities. Probability densities can be smaller or larger than 1, and thus log predictive densities can be negative or positive
# elpd_loo SE = calculated by using the standard deviation of the N components and multiplying by sqrt(N). This standard error is a coarse description of our uncertainty about the predictive performance for unknown future data (smaller is better)
# p_loo = difference between elpd_loo and the non-cross-validated log posterior predictive density. It describes how much more difficult it is to predict future data than the observed data. Asymptotically under certain regularity conditions, p_loo can be interpreted as the effective number of parameters. In well behaving cases p_loo < N and p_loo < p, where p is the total number of parameters in the model. If p_loo > N or p_loo > p, the model has very weak predictive capability and may indicate a severe model misspecification
# looic = -2*elpd_loo
# The Pareto k estimate = Diagnostic for Pareto smoothed importance sampling (PSIS), which is used to compute components of elpd_loo. In importance-sampling LOO (the full posterior distribution is used as the proposal distribution). The Pareto k diagnostic estimates how far an individual leave-one-out distribution is from the full distribution. If leaving out an observation changes the posterior too much then importance sampling is not able to give reliable estimate.
# k < 0.5 = the corresponding component of elpd_loo is estimated with high accuracy
# 0.5 < k < 0.7 = the accuracy is lower, but still ok
# k > 0.7 = importance sampling is not able to provide useful estimate for that component/observation
# Very high k values often indicate model misspecification, outliers or mistakes in data processing
plot(loo_model1)
# everything seems to be ok!
```
To confirm that *brms* really is an interface to *Stan*, we may type
```{r confess, message = F, warning = F}
modelname$model
```
... to see the underlying Stan code.
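Alternatively, the documented *brms* helper `make_stancode()` generates the same Stan program without compiling or fitting anything (a sketch assuming *brms* is loaded and the data frame is in memory):

```{r show stan code without fitting}
library(brms)
# generate the Stan program brm() would compile, without fitting it
cat(brms::make_stancode(competence ~ autonomy.cgmsd,
                        data   = dfttu_wide,
                        family = gaussian()))
```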
Let's add a predictor (math achievement = "score") to the second model:
```{r fit bayesian regression model 2, message = F, warning = F}
# fit the model 2 and store the result into object and file
filename <- "dfttu_competence_bayesian_regression2.Rds"
modelname <- "dfttu_competence_bayesian_regression2"
if (file.exists(filename)) {
modelname <- readRDS(filename)
} else {
options(buildtools.check = function(action)TRUE)
modelname <- brm(competence ~ autonomy.cgmsd + score.cgmsd,
data = data,
warmup = warmup,
iter = iter,
cores = cores,
chains = chains,
# save_pars = save_pars(all = TRUE), # need to activate if Pareto k values are problematic
thin = thin,
family = family,
# control = list(adapt_delta = 0.90), # need to use if divergent transitions
seed = 2023)
saveRDS(modelname, file = filename, compress = F)
}
dfttu_competence_bayesian_regression2 <- readRDS("dfttu_competence_bayesian_regression2.Rds")
brms::prior_summary(modelname)
# prior class coef group resp dpar nlpar lb ub source
# (flat) b default
# (flat) b autonomy.cgmsd (vectorized)
# (flat) b score.cgmsd (vectorized)
# student_t(3, 4, 2.5) Intercept default
# student_t(3, 0, 2.5) sigma 0 default
summary(modelname)
# Family: gaussian
# Links: mu = identity; sigma = identity
# Formula: competence ~ autonomy.cgmsd + score.cgmsd
# Data: dfttu_wide (Number of observations: 208)
# Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
# total post-warmup draws = 8000
#
# Population-Level Effects:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept 3.91 0.03 3.85 3.96 1.00 9939 6027
# autonomy.cgmsd 0.46 0.03 0.40 0.51 1.00 8259 5754
# score.cgmsd 0.05 0.03 -0.01 0.10 1.00 9100 5890
#
# Family Specific Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.40 0.02 0.36 0.44 1.00 8308 6133
# One SD increase in autonomy.cgmsd (0.6068433 raw units) increases competence by 0.46 points (on the 1 - 5 scale), essentially unchanged from Model 1
sd(dfttu_wide$score, na.rm = T) # 29.11842
sd(dfttu_wide$competence, na.rm = T) # 0.605055
# One SD increase in score (29.11842 raw units) increases competence by only 0.05 points
# (again, the DV is not standardized, so the estimate is in raw units of the 1 - 5 scale)
# as the average of competence was
mean(dfttu_wide$competence, na.rm = T) # 3.907452
# a one-SD increase in score results in (3.907452+0.05) a competence level of about 3.96
# a two-SD increase in score results in (3.907452+(2*0.05)) a competence level of about 4.01
# we might say that score is quite a weak positive predictor of competence!!
# fixed effects
brms::fixef(modelname)
# Estimate Est.Error Q2.5 Q97.5
# Intercept 3.90760865 0.02791627 3.853406080 3.9625727
# autonomy.cgmsd 0.45734451 0.02745892 0.402625288 0.5105117
# score.cgmsd 0.04647312 0.02781814 -0.007490549 0.1012415
# posterior predictive checks
brms::pp_check(modelname)
brms::pp_check(modelname, ndraws = 100,
type = 'ecdf_overlay') # better for categorical
brms::mcmc_plot(modelname)
brms::mcmc_plot(modelname, type = "trace") # chains converge?
brms::mcmc_plot(modelname, type = "acf_bar") # autocorrelation?
brms::mcmc_plot(modelname, type = "areas", prob = 0.95) # zero included?
plot(conditional_effects(modelname), points = TRUE, ask = FALSE)
# LOO(dfttu_competence_bayesian_regression2
# # moment_match = TRUE # Use if Pareto k diagnostic values are high
# )
# # Estimate SE
# # elpd_loo -105.2 10.5
# # p_loo 3.7 0.4
# # looic 210.3 21.1
# # ------
# # Monte Carlo SE of elpd_loo is 0.0.
# # All Pareto k estimates are good (k < 0.5)
-105.2 + c(-2,2) * 10.5
# [1] -126.2 -84.2
# save values for the results section
aut.cwc_Est <- summary(dfttu_competence_bayesian_regression2)$fixed[2,1]
aut.cwc_LCI <- summary(dfttu_competence_bayesian_regression2)$fixed[2,3]
aut.cwc_UCI <- summary(dfttu_competence_bayesian_regression2)$fixed[2,4]
sco.cwc_Est <- summary(dfttu_competence_bayesian_regression2)$fixed[3,1]
sco.cwc_LCI <- summary(dfttu_competence_bayesian_regression2)$fixed[3,3]
sco.cwc_UCI <- summary(dfttu_competence_bayesian_regression2)$fixed[3,4]
```
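As a sanity check on the interpretation above, the standardized score slope can be converted back to raw score units (a quick sketch; the numbers are copied from the fixef() output above):

```r
# standardized slope and raw-score SD, copied from the output above
b_score_std <- 0.04647312   # change in competence per 1 SD of score
sd_score    <- 29.11842     # SD of the raw score variable
# change in competence per one raw score point
b_score_raw <- b_score_std / sd_score
round(b_score_raw, 4) # ~0.0016
```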
Compare Models 1 and 2 with *loo* to see which of them describes the data better:
```{r use LOO to compare models 1 and 2, message = F, warning = F}
# we compare (simpler) Model 1 with a more complex Model 2
loo_compare_m1m2 <- LOO(dfttu_competence_bayesian_regression1, dfttu_competence_bayesian_regression2)
print(loo_compare_m1m2)
# Model comparisons:
# elpd_diff se_diff
# dfttu_competence_bayesian_regression2 0.0 0.0
# dfttu_competence_bayesian_regression1 -0.5 1.6
# https://avehtari.github.io/modelselection/CV-FAQ.html
# elpd_diff < 4 = small difference between the two models
# elpd_diff >= 4 = compare difference to se_diff:
# https://discourse.mc-stan.org/t/if-elpd-diff-se-diff-2-is-this-noteworthy/20549/11
# save diff params
loo_elpd_diff_m1m2 <- loo_compare_m1m2$diffs[2,1]
loo_elpd_diff_m1m2 # -0.5089953
loo_se_diff_m1m2 <- loo_compare_m1m2$diffs[2,2]
loo_se_diff_m1m2 # 1.582903
loo_elpd_diff_interval_m1m2 <- loo_elpd_diff_m1m2 + c(-2,2) * loo_se_diff_m1m2
loo_elpd_diff_interval_m1m2 # [1] -3.674801 2.656810
loo_elpd_diff_low_int_m1m2 <- loo_elpd_diff_interval_m1m2[1]
loo_elpd_diff_upp_int_m1m2 <- loo_elpd_diff_interval_m1m2[2]
# as the elpd diff interval contains 0, the two models do not differ meaningfully in predictive performance
```
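The interval calculation above can be wrapped in a small helper (a sketch using the same "difference plus or minus two standard errors" convention; the function name is ours):

```r
# elpd difference +/- k standard errors (k = 2, as used above)
elpd_diff_interval <- function(elpd_diff, se_diff, k = 2) {
  elpd_diff + c(-k, k) * se_diff
}
# values from the Model 1 vs Model 2 comparison above
elpd_diff_interval(-0.5089953, 1.582903) # ~[-3.67, 2.66]; contains 0
```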
```{r create APA style table with sjPlot, message = F, warning = F}
# https://strengejacke.github.io/sjPlot/articles/tab_bayes.html
library(sjPlot)
sjPlot::tab_model(dfttu_competence_bayesian_regression1,
dfttu_competence_bayesian_regression2,
dv.labels = c("Model 1: comp ~ aut",
"Model 2: comp ~ aut + score"))
```
The two models did not differ, so we might prefer the simpler Model 1 (competence ~ autonomy) over Model 2 (competence ~ autonomy + score).
However, for the instructional purposes, we'll proceed with the Model 2.
Next, we are going to see if the prior distributions matter (so far we have used the default "flat" priors given by *brms*).
## Define priors
### Weakly informative priors
We will reuse the weakly informative priors from the earlier *rstanarm* (stan_glm) analysis here.
```{r define weakly informative priors, message = F, warning = F}
# a normal(0,1) prior for the autonomy.cgmsd and score.cgmsd coefficients implies that the coefficient is believed to lie between -2 and +2 with a probability of approx. 95%: about 95% of a normal distribution lies within 2 SDs (2*1 = 2) of its mean (zero)
# this is a regularizing, weakly informative prior (Gelman, Simpson, & Betancourt, 2017)
# earlier we saw that stan_glm approximated weakly informative priors for the two predictors in the Model 2 to be:
# Intercept (after predictors centered)
# Adjusted prior:
# ~ normal(location = 3.9, scale = 1.5)
#
# Coefficients
# Adjusted prior:
# ~ normal(location = [0,0], scale = [1.51,1.51])
#
# Auxiliary (sigma)
# Adjusted prior:
# ~ exponential(rate = 1.7)
# so, we'll use them
weakly_inf_priors <- c(set_prior("normal(3.9,1.5)", class="Intercept"),
set_prior("exponential(1.7)", class="sigma"),
set_prior("normal(0,1.5)", class = "b", coef= "autonomy.cgmsd"),
set_prior("normal(0,1.5)", class = "b", coef= "score.cgmsd"))
# further reading: Gabry et al. (2018). Visualization in Bayesian workflow https://arxiv.org/pdf/1709.01449.pdf
```
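The "between -2 and +2 with 95% probability" reasoning above can be checked quickly with base R (a small sketch, nothing model-specific):

```r
# central 95% interval of a normal(0, 1) prior: about -1.96 to 1.96
qnorm(c(0.025, 0.975), mean = 0, sd = 1)
# the adjusted normal(0, 1.5) coefficient prior spans about -2.94 to 2.94
qnorm(c(0.025, 0.975), mean = 0, sd = 1.5)
```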
```{r fit Model 2 with weakly informative priors, message = F, warning = F}
# fit the model 2 with weakly informative priors
filename <- "regression2weak.Rds"
modelname <- "regression2weak"
if (file.exists(filename)) {
modelname <- readRDS(filename)
} else {
options(buildtools.check = function(action)TRUE)
modelname <- brm(competence ~ autonomy.cgmsd + score.cgmsd,
data = data,
prior = weakly_inf_priors,
sample_prior = TRUE, # to plot the difference between the prior and the posterior distribution
warmup = warmup,
iter = iter,
cores = cores,
chains = chains,
# save_pars = save_pars(all = TRUE), # need to activate if Pareto k values are problematic
thin = thin,
family = family,
# control = list(adapt_delta = 0.90), # need to use if divergent transitions
seed = 2023)
saveRDS(modelname, file = filename, compress = F)
}
regression2weak <- readRDS("regression2weak.Rds")
prior_summary(modelname)
# prior class coef group resp dpar nlpar lb ub source
# (flat) b default
# normal(0,1.5) b autonomy.cgmsd user
# normal(0,1.5) b score.cgmsd user
# normal(3.9,1.5) Intercept user
# exponential(1.7) sigma 0 user
summary(modelname)
# Population-Level Effects:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept 3.91 0.03 3.85 3.96 1.00 9197 6698
# autonomy.cgmsd 0.46 0.03 0.40 0.51 1.00 8172 6150
# score.cgmsd 0.05 0.03 -0.01 0.10 1.00 9695 6074
#
# Family Specific Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.40 0.02 0.36 0.44 1.00 8120 6277
# we note that the weakly informative priors do not change the results
# test hypothesis that autonomy > 0 and score > 0
hypothesis(regression2weak, "autonomy.cgmsd > 0")
hypothesis(regression2weak, "score.cgmsd > 0")
# Hypothesis Tests for class b:
# Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
# 1 (autonomy.cgmsd) > 0 0.46 0.03 0.41 0.5 Inf 1 * # strong evidence that autonomy > 0
# Hypothesis Estimate Est.Error CI.Lower CI.Upper Evid.Ratio Post.Prob Star
# 1 (score.cgmsd) > 0 0.05 0.03 0 0.09 20.92 0.95 * # evidence for score > 0 is much weaker
```
```{r look at weakly informative priors}
plot(hypothesis(regression2weak, "autonomy.cgmsd > 0"))
plot(hypothesis(regression2weak, "score.cgmsd > 0"))
# both prior and posterior distribution locations are close
```
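For reference, the Evid.Ratio and Post.Prob columns of a one-sided hypothesis() test are related by p = ER / (1 + ER); a quick check on the score result above (the helper name is ours):

```r
# posterior probability implied by a one-sided evidence ratio
er_to_prob <- function(er) er / (1 + er)
er_to_prob(20.92) # ~0.95, matching Post.Prob for score.cgmsd
```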
### Reasonable priors
Next, we will define "reasonable" priors based on the observed ranges of the DV and the IVs.
```{r define reasonable priors, message = F, warning = F}
# http://svmiller.com/blog/2021/02/thinking-about-your-priors-bayesian-analysis/
################################################################################
# DV:
# we see that the DV has plausible values from 2.5 to 5
table(round(dfttu_wide$competence,2))
# 2.5 2.75 3 3.25 3.5 3.75 4 4.25 4.5 4.75 5
# 4 11 12 11 15 37 55 17 19 13 14
# min
DVmin <- min(round(dfttu_wide$competence,2))
# 2.5
# max
DVmax <- max(round(dfttu_wide$competence,2))
# 5
# mean
DVmean <- mean(dfttu_wide$competence, na.rm = T)
# 3.907452
# SD
DVsd <- sd(dfttu_wide$competence, na.rm = T)
# 0.605055
# priors of the DV might be set to normal(4,0.6)
################################################################################
# IV's
# IV 1:
# min
IV1cwc_min <- min(round(dfttu_wide$autonomy.cgmsd,2))
IV1cwc_min
# -2.53
# max
IV1cwc_max <- max(round(dfttu_wide$autonomy.cgmsd,2))
IV1cwc_max
# 2
# mean
IV1cwc_mean <- mean(dfttu_wide$autonomy.cgmsd, na.rm = T)
IV1cwc_mean
# 3.835073e-16 # close to zero as this one is centered
# SD
IV1cwc_sd <- sd(dfttu_wide$autonomy.cgmsd, na.rm = T)
IV1cwc_sd
# 1
# define autonomy prior
# by calculating hypothetical maximum effect of a unit-change in autonomy
IV1cwc_priorSD <- (DVmax-DVmin)/(IV1cwc_max-IV1cwc_min)/2.58 # 99% of the distribution is within about 2.58 SD from the mean
IV1cwc_priorSD
# 0.2139056 # standard deviation of a normal distribution with a mean of 0, i.e., normal(0,0.21)
# we might use normal(0,0.2)
# IV 2:
# min
IV2cwc_min <- min(round(dfttu_wide$score.cgmsd,2))
IV2cwc_min
# -1.64
# max
IV2cwc_max <- max(round(dfttu_wide$score.cgmsd,2))
IV2cwc_max
# 1.79
# mean
IV2cwc_mean <- mean(dfttu_wide$score.cgmsd, na.rm = T)
IV2cwc_mean
# -6.405133e-17 # close to zero, as this variable is centered
# SD
IV2cwc_sd <- sd(dfttu_wide$score.cgmsd, na.rm = T)
IV2cwc_sd
# 1
# define score prior SD
# by calculating hypothetical maximum effect of a unit-change in score
IV2cwc_priorSD <- (DVmax-DVmin)/(IV2cwc_max-IV2cwc_min)/2.58 # 99% of the distribution is within about 2.58 SD from the mean
IV2cwc_priorSD
# 0.282505 # standard deviation of a normal distribution with a mean of 0, i.e., normal(0,0.28)
# we might use normal(0,0.3)
# store reasonable priors
reasonable_priors <- c(set_prior("normal(0,0.2)", class = "b", coef= "autonomy.cgmsd"),
set_prior("normal(0,0.3)", class = "b", coef= "score.cgmsd"),
set_prior("normal(4,0.6)", class="Intercept"))
```
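The two prior-SD calculations above follow the same pattern, which can be collected into a small helper (a sketch; the function name is ours):

```r
# prior SD from the "hypothetical maximum effect" heuristic used above:
# the steepest plausible slope maps the full IV range onto the full DV range,
# and dividing by 2.58 places ~99% of the prior mass below that slope
max_effect_prior_sd <- function(dv_min, dv_max, iv_min, iv_max, z = 2.58) {
  (dv_max - dv_min) / (iv_max - iv_min) / z
}
max_effect_prior_sd(2.5, 5, -2.53, 2.00) # ~0.21 (autonomy)
max_effect_prior_sd(2.5, 5, -1.64, 1.79) # ~0.28 (score)
```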
```{r fit Model 2 with reasonable priors, message = F, warning = F}
# fit the model 2 with reasonable priors
filename <- "regression2reas.Rds"
modelname <- "regression2reas"
if (file.exists(filename)) {
modelname <- readRDS(filename)
} else {
options(buildtools.check = function(action)TRUE)
modelname <- brm(competence ~ autonomy.cgmsd + score.cgmsd,
data = data,
prior = reasonable_priors,
sample_prior = TRUE,
warmup = warmup,
iter = iter,
cores = cores,
chains = chains,
# save_pars = save_pars(all = TRUE), # need to activate if Pareto k values are problematic
thin = thin,
family = family,
# control = list(adapt_delta = 0.90), # need to use if divergent transitions
seed = 2023)
saveRDS(modelname, file = filename, compress = F)
}
regression2reas <- readRDS("regression2reas.Rds")
prior_summary(modelname)
# prior class coef group resp dpar nlpar lb ub source
# (flat) b default
# normal(0,0.2) b autonomy.cgmsd user
# normal(0,0.3) b score.cgmsd user
# normal(4,0.6) Intercept user
# student_t(3, 0, 2.5) sigma 0 default
summary(modelname)
# Population-Level Effects:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# Intercept 3.91 0.03 3.85 3.96 1.00 9214 6328
# autonomy.cgmsd 0.45 0.03 0.39 0.50 1.00 8138 5885
# score.cgmsd 0.05 0.03 -0.01 0.10 1.00 8922 5871
#
# Family Specific Parameters:
# Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
# sigma 0.40 0.02 0.36 0.44 1.00 8540 6462
```
```{r look at reasonable priors}
plot(hypothesis(regression2reas, "autonomy.cgmsd > 0")) # difference in prior and posterior locations!
plot(hypothesis(regression2reas, "score.cgmsd > 0")) # prior and posterior distribution locations are close
```
```{r compare flat/weak/reas priors with LOO, message = F, warning = F}
loo_compare_m2flat_weak_reas <- LOO(dfttu_competence_bayesian_regression2, # default brms priors
regression2weak, # weakly informative priors from stan_glm
regression2reas) # reasonable priors
print(loo_compare_m2flat_weak_reas)
# Model comparisons:
# elpd_diff se_diff
# regression2weak 0.0 0.0
# dfttu_competence_bayesian_regression2 0.0 0.0
# regression2reas -0.2 0.3 # differs from the two other models
# so let's check if that difference matters
loo_elpd_diff_m2weak_reas <- loo_compare_m2flat_weak_reas$diffs[3,1] # -0.1596373
loo_elpd_diff_m2weak_reas
loo_se_diff_m2weak_reas <- loo_compare_m2flat_weak_reas$diffs[3,2] # 0.297847
loo_se_diff_m2weak_reas
loo_elpd_diff_interval_m2weak_reas <- loo_elpd_diff_m2weak_reas + c(-2,2) * loo_se_diff_m2weak_reas
loo_elpd_diff_interval_m2weak_reas # [1] -0.7553314 0.4360568
# as the interval contains zero, we conclude that the three models do not differ
```
We conclude that with these data, the choice of priors does not change the conclusions.
```{r create APA style table with sjPlot to compare full/weak/reas priors on Model 2, message = F, warning = F}
sjPlot::tab_model(dfttu_competence_bayesian_regression2,
regression2weak,
regression2reas,
dv.labels = c("Model 2 FLAT",
"Model 2 WEAK",
"Model 2 REAS")
)
```
**Report the results**
The research question was addressed with Bayesian regression (*brms*, Bürkner, 2017; *Stan*, Stan Development Team, 2022; *R*, R Core Team, 2022) in which participants' self-assessed competence (DV, scale from 1 = totally disagree to 5 = totally agree) was regressed on their self-assessed autonomy (IV1, 1 = totally disagree to 5 = totally agree) and their score on a mathematics test (IV2, low = 0, high = 100). The predictors were centered and scaled. The distribution family for the DV was investigated with *fitdistrplus* (Delignette-Muller & Dutang, 2015); the Gaussian distribution (with identity link function) described the DV distribution best. The Bayesian regression analyses were conducted with the following settings: warmup = 2000, iterations = 4000, chains = 4, thinning = 1. This resulted in 8000 warmup iterations (4 chains * 2000 iterations) and 8000 post-warmup draws (4 chains * (4000 - 2000) iterations). Posterior sampling was investigated with *brms::pp_check* and *brms::mcmc_plot*. The posterior predictive distribution (combining the observed data and the prior information) was close to the distribution of the DV, and the four chains converged well.
Predictor selection was based on PSIS-LOO (*loo*, Vehtari et al., 2022). Model comparison with *loo* showed that Model 2 (competence ~ autonomy + score) was not better than the simpler Model 1 (competence ~ autonomy), as the interval of the Bayesian leave-one-out estimate of the expected log pointwise predictive density (elpd) difference between the two models contained zero: $\Delta$ elpd = `r dec2(loo_elpd_diff_m1m2)` (interval from `r dec2(loo_elpd_diff_low_int_m1m2)` to `r dec2(loo_elpd_diff_upp_int_m1m2)`). Both models were analysed with three types of prior distributions: 1) *brms* default flat priors, 2) weakly informative priors [intercept = `r prior_summary(regression2weak)$prior[1]`; autonomy = `r prior_summary(regression2weak)$prior[2]`; score = `r prior_summary(regression2weak)$prior[3]`] and 3) reasonable priors [intercept = `r prior_summary(regression2reas)$prior[1]`; autonomy = `r prior_summary(regression2reas)$prior[2]`; score = `r prior_summary(regression2reas)$prior[3]`]. However, the following results are reported with flat priors, as the priors had only a minor effect on the posterior estimates. Results in Table 1 showed that autonomy was a strong positive predictor of competence ($\beta$cgmsd = `r dec2(aut.cwc_Est)`, 95 % Credible Interval (CI) from `r dec2(aut.cwc_LCI)` to `r dec2(aut.cwc_UCI)`). Mathematical achievement was not related to competence ($\beta$cgmsd = `r dec2(sco.cwc_Est)`, 95% CI = `r dec2(sco.cwc_LCI)` to `r dec2(sco.cwc_UCI)`).
## Exercises
Set strong priors for Model 2: your belief (before seeing the data) is that there is a strong negative association between autonomy and competence and a strong positive association between score and competence. Compare the fit of this model to the previous best-fitting model ("regression2weak").
Use the student() distribution with identity link to fit Model 2 with *brms* default flat priors. Compare the resulting fit to the previous "dfttu_competence_bayesian_regression2" model (gaussian distribution with identity link and flat priors). Should we use the student() distribution for the DV (competence) instead of the gaussian() distribution?
**Course homepage:**
[https://homepages.tuni.fi/petri.nokelainen/mmx9120/](https://homepages.tuni.fi/petri.nokelainen/mmx9120/)
## References
Bürkner, P-C. (2017). brms: An R Package for Bayesian Multilevel Models Using Stan. *Journal of Statistical Software*, *80*(1).
Delignette-Muller, M. L., & Dutang, C. (2015). fitdistrplus: An R Package for Fitting Distributions. *Journal of Statistical Software*, *64*(4), 1-34.
R Core Team. (2022). *R: A language and environment for statistical computing*. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/
RStudio Team. (2016). *RStudio: Integrated Development for R*. URL http://www.rstudio.com/
Stan Development Team. (2022). *RStan: the R interface to Stan*. R package version 2.21.5. https://mc-stan.org/
Vehtari, A., Gabry, J., Magnusson, M., Yao, Y., Bürkner, P-C., Paananen, T., & Gelman, A. (2022). *loo: Efficient leave-one-out cross-validation and WAIC for Bayesian models.* R package version 2.5.1. https://mc-stan.org/loo/
Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P-C. (2021). Rank-normalization, folding, and localization: An improved Rhat for assessing convergence of MCMC. *Bayesian Analysis*, *16*(2), 667–718.