Take-home Exercise 3: Be Weatherwise or Otherwise

Published

February 14, 2024

Modified

February 18, 2024

1. Overview

The objective of this exercise is to create an analytics-driven data visualization to validate the claims that daily mean temperatures are projected to increase by 1.4 to 4.6. To achieve this, we will employ techniques of visual interactivity and uncertainty visualization.

The historical daily temperature datasets were downloaded from Meteorological Service Singapore website, consisting of daily mean temperatures recorded for January in the year 1983, 1993, 2003, 2013, and 2023 at the Changi weather station.

2. Getting Started

2.1 Installing and loading the required R packages

In this exercise, we use p_load() of pacman package to load required R packages. The packages that will be used are:

  • tidyverse a family of R packages for data science process,

  • ggstatsplot package to create visual graphics with rich statistical information,

  • ggiraph for making ‘ggplot’ graphics interactive,

  • patchwork for combining multiple ggplot2 graphs into one figure.

Code
pacman::p_load(tidyverse, ggiraph, ggstatsplot, patchwork)

2.2 Importing the data

The downloaded datasets consist of five separate CSV files. The code chunk below imports all the five files into R environment by using read_csv() function of readr package.

data_1983 <- read_csv("data/DAILYDATA_S24_198301.csv",locale=locale(encoding="latin1"))
data_1993 <- read_csv("data/DAILYDATA_S24_199301.csv",locale=locale(encoding="latin1"))
data_2003 <- read_csv("data/DAILYDATA_S24_200301.csv",locale=locale(encoding="latin1"))
data_2013 <- read_csv("data/DAILYDATA_S24_201301.csv",locale=locale(encoding="latin1"))
data_2023 <- read_csv("data/DAILYDATA_S24_202301.csv",locale=locale(encoding = "UTF-8"))

Next, we will merge the data and save the resulting object to an RDS file, which will then be loaded into the working environment.

temp_data <- bind_rows(data_1983, data_1993, data_2003, data_2013, data_2023)
write_rds(temp_data,"data/temp_data.rds")
temp_data <- read_rds("data/temp_data.rds")
Code
head(temp_data)
# A tibble: 6 × 16
  Station  Year Month   Day `Daily Rainfall Total (mm)` Highest 30 Min Rainfal…¹
  <chr>   <dbl> <dbl> <dbl>                       <dbl> <chr>                   
1 Changi   1983     1     1                         0.3 "\u0097"                
2 Changi   1983     1     2                         0.4 "\u0097"                
3 Changi   1983     1     3                         2.9 "\u0097"                
4 Changi   1983     1     4                         0   "\u0097"                
5 Changi   1983     1     5                         0   "\u0097"                
6 Changi   1983     1     6                         0   "\u0097"                
# ℹ abbreviated name: ¹​`Highest 30 Min Rainfall (mm)`
# ℹ 10 more variables: `Highest 60 Min Rainfall (mm)` <chr>,
#   `Highest 120 Min Rainfall (mm)` <chr>, `Mean Temperature (°C)` <dbl>,
#   `Maximum Temperature (°C)` <dbl>, `Minimum Temperature (°C)` <dbl>,
#   `Mean Wind Speed (km/h)` <dbl>, `Max Wind Speed (km/h)` <dbl>,
#   `Highest 30 min Rainfall (mm)` <dbl>, `Highest 60 min Rainfall (mm)` <dbl>,
#   `Highest 120 min Rainfall (mm)` <dbl>

2.3 Variables Selection

We will select our variables of interest from 13 variables and narrow them down to 6 variables. They are: Year, Month, Day, Mean Temperature (°C), Maximum Temperature (°C), Minimum Temperature (°C). Subsequently, we will simplify the variable names for convenience.

# select variables
subset <- temp_data %>%
  select('Year','Month','Day','Mean Temperature (°C)','Maximum Temperature (°C)',
         'Minimum Temperature (°C)')

# rename
subset <- subset %>%
  rename('Mean_temp'='Mean Temperature (°C)',
         'Max_temp'='Maximum Temperature (°C)',
         'Min_temp'='Minimum Temperature (°C)')
Code
head(subset)
# A tibble: 6 × 6
   Year Month   Day Mean_temp Max_temp Min_temp
  <dbl> <dbl> <dbl>     <dbl>    <dbl>    <dbl>
1  1983     1     1      26.5     28.7     25.1
2  1983     1     2      26.8     30.6     24.8
3  1983     1     3      27       31.3     24.5
4  1983     1     4      27.3     30.8     25  
5  1983     1     5      27.1     31.8     23.7
6  1983     1     6      27.2     32.1     23.7

3. Data Visualisation

3.1 Visualizing the Uncertainty of Historical Temperature

A boxplot will be generated to visualize the distribution of historical daily mean temperature data for January of each year. Additionally, confidence intervals of the mean temperature by year will be also be plotted. This will establish a baseline for comparison with projected increases.

Firstly, code chunk below will be used to derive the necessary summary statistics.

my_sum <- subset %>%
  group_by(Year) %>%
  summarise(
    n=n(),
    mean=mean(Mean_temp),
    sd=sd(Mean_temp)
    ) %>%
  mutate(se=sd/sqrt(n-1))

Next, the code chunk below will be used to display my_sum tibble data frame in an html table format.

knitr::kable(head(my_sum), format = 'html')
Year n mean sd se
1983 31 26.45161 0.6587215 0.1202655
1993 31 26.20645 0.8390214 0.1531837
2003 31 26.66129 0.8179965 0.1493450
2013 31 27.04516 0.9804212 0.1789996
2023 31 26.53548 1.2755517 0.2328828

Now we are ready to create the visualization.

Code
p1 <- ggplot(my_sum) + 
            geom_errorbar_interactive(aes(x=factor(Year),
                              ymin=mean-1.96*se, 
                              ymax=mean+1.96*se), 
                              data_id = my_sum$Year,
                              width=0.2, 
                              colour="black", 
                              alpha=0.9, 
                              size=0.5) +
                   geom_point_interactive(aes(x=factor(Year), 
                                  y=mean, 
                                  data_id = Year,
                                  tooltip = paste("Year:", `Year`, 
                                  "<br>Mean Temperature:", round(mean, digits = 2),
                                  "<br>95% CI:[", 
                                  round((mean-1.96*se), digits = 2), ",",
                                  round((mean+1.96*se), digits = 2),"]")),
                     stat="identity", 
                     color="red", 
                     size = 1.5, 
                     alpha=1) + 
             
                   ylab("Mean Temperature (°C)") + 
                  coord_cartesian(ylim = c(23.5, 29.5)) +
                   theme_minimal() + 
                   theme(axis.text.x = element_text(
                     angle = 45, vjust = 0.5, hjust=1),
                     axis.title.x = element_blank()) +
                   ggtitle("95% Confidence Interval of Mean\n Temperature by Year")

p2 <- ggplot(subset,
             aes(x = factor(Year), y = Mean_temp)) +
  geom_boxplot_interactive(
    aes(tooltip = paste("Year: ", Year,
                        "<br>Median Temperature:", round(median(Mean_temp), digits = 2)),
        data_id = Year),
        fill = "grey") +
  coord_cartesian(ylim = c(23.5, 29.5)) +
    labs(title = "Daily Mean Temperature Distribution\n for January", x = "Year") +
  theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 0.5, hjust=1),
  axis.title.x = element_blank(),
  axis.title.y = element_blank())


girafe(                                  
  code = print(p1 + p2),                             
  width_svg = 8,                         
  height_svg = 8*0.618,
  options = list(
         opts_hover(css = "stroke-width:1"),
         opts_hover_inv(css = "opacity:0.2;")
         ))     

3.2 Visualizing the Future Temperature Projection

First, a linear regression model will be used to analyze the trend and predict future temperatures.

# Fit linear regression model
model <- lm(mean ~ Year, my_sum)
Code
summary(model)

Call:
lm(formula = mean ~ Year, data = my_sum)

Residuals:
       1        2        3        4        5 
 0.07290 -0.27290  0.08129  0.36452 -0.24581 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.420774  19.340654   0.332    0.762
Year         0.010065   0.009656   1.042    0.374

Residual standard error: 0.3053 on 3 degrees of freedom
Multiple R-squared:  0.2659,    Adjusted R-squared:  0.02117 
F-statistic: 1.086 on 1 and 3 DF,  p-value: 0.3739

Next, we will predict the temperatures with 95% confidence interval for future 10 years, and combine the predicted data with the historical data.

Code
# Predict temperatures for future years
future_years <- c(2033, 2043, 2053, 2063, 2073, 2083, 2093, 2103, 2113, 2123)
predicted_temperatures <- predict(model, newdata = data.frame(Year = future_years),interval = "confidence")

# Combine historical and predicted data
hist_temperatures <- cbind(my_sum$mean, my_sum$mean-1.96*my_sum$se, my_sum$mean+1.96*my_sum$se )
all_years <- c(my_sum$Year, future_years)
all_temperatures <- rbind(hist_temperatures, predicted_temperatures)
all_data <- data.frame(Year = all_years, all_temperatures)

knitr::kable(all_data,row.names = FALSE, format = 'html')
Year fit lwr upr
1983 26.45161 26.21589 26.68733
1993 26.20645 25.90621 26.50669
2003 26.66129 26.36857 26.95401
2013 27.04516 26.69432 27.39600
2023 26.53548 26.07903 26.99193
2033 26.88194 25.86279 27.90108
2043 26.98258 25.67888 28.28628
2053 27.08323 25.48653 28.67992
2063 27.18387 25.28964 29.07810
2073 27.28452 25.09007 29.47897
2083 27.38516 24.88877 29.88155
2093 27.48581 24.68631 30.28530
2103 27.58645 24.48303 30.68987
2113 27.68710 24.27915 31.09505
2123 27.78774 24.07481 31.50067

Now we are ready to create the visualization.

Code
p<-ggplot(all_data, aes(x = Year, y = fit)) +
  geom_errorbar(
    aes(ymin=lwr, 
        ymax=upr), 
    width=0.2, 
    colour="black", 
    alpha=0.9, 
    size=0.5) +
  geom_point_interactive(aes(y=all_data$fit), 
           tooltip = paste("Year:", all_data$Year, 
                           "<br>Mean Temperature:", round(all_data$fit, digits = 2),
                           "<br>95% CI:[", 
                            round(all_data$lwr, digits = 2), ",",
                            round(all_data$upr, digits = 2),"]"),
           stat="identity", 
           color="red",
           size = 1.5,
           alpha=1) +
  geom_smooth(method = "lm", se=FALSE) + 
  labs(title = "Historical and Predicted Mean Temperatures",
       x = "Year",
       y = "Mean Temperature (°C)") +
  theme_minimal()

girafe(                                
  ggobj = p,                       
  width_svg = 8,                         
  height_svg = 8*0.618,
  options = list(
         opts_hover(css = "stroke-width:1"),
         opts_hover_inv(css = "opacity:0.2;")
         )) 

4. Conclusion

The data presented in the above plots clearly demonstrate a consistent increase in temperature over the past five decades. The upward trend of mean temperatures for January is evident, showing a rise from 26.45°C in 1983 to 27.05°C in 2013, indicating an average annual increase of approximately 0.15°C.

Drawing from this observed pattern, predictive models project that by 2123, the monthly mean temperature is anticipated to reach 27.79°C, with an upper boundary of 31.5°C. This suggests a projected temperature escalation of roughly 1.25°C to 4.51°C over the next decade, slightly below the previously claimed range of 1.4°C to 4.6°C.