Can we do this in R?

Answering questions about air quality, one question at a time

Meenakshi Kushwaha

ILK Labs

6th July, 2021

1 / 35

Who we are?

Multidisciplinary team

2 / 35

What this means is that we are also in different places in our R journey - from beginner to advanced.

What we do?

We measure air quality to answer questions like

  • Is one block dirtier than another?
  • Is one season different from another?
  • Do low cost instruments answer questions reliably?
  • Can we adapt cutting edge methods from different contexts for India?

Also...

  • National level surveys
  • Model building with satellite and ground data
3 / 35

2 types of Air Quality Measurements

Stationary monitoring

Stationary reference grade monitor

Mobile monitoring

Sensors in a car

4 / 35

Typically, we have two main types of air quality measurements.

Stationary Monitoring

Network of air quality sensors in Bangalore, India

5 / 35

This is what a stationary monitoring network looks like: multiple sensors recording data in real time at roughly 1-minute frequency. It gives information about trends at the city scale.

Mobile Monitoring

Individual instruments for each parameter

6 / 35

For neighborhood-level measurements, we use mobile monitoring. This schematic shows the many instruments that go into the mobile platform. You can imagine the complexity: different instruments have different download methods and, from the point of view of analysis, different data formats, different timestamps, etc.

This is what it looks like in practice

Internal set up in the mobile monitoring platform

7 / 35

We are looking at the inside of the car now. We have a laptop as a logger for some instruments, and all instruments are secured in the blue tray with inlets outside the vehicle. This is what it looks like in practice...

and, sometimes like this...

Unplanned and unanticipated road blocks

8 / 35

All that to say that many times, things are out of our control in the field. Next,

Working in R allows us to control all aspects of data analysis

9 / 35

I will talk about some of our favourite R packages and functions and what we use them for

Data Cleaning

10 / 35

We use the tidyverse packages for data cleaning: ggplot2 for highly flexible plotting, purrr and its map functions for more concise code in place of for loops, and forcats for working with factors. In readr, read_csv() automatically parses date-time columns, which is very helpful.
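
As a rough sketch of the readr and purrr workflow mentioned here, the example below reads several files in one go. The file names, column names, and values are invented for illustration and are not from our data.

```r
library(readr)
library(purrr)
library(dplyr)

# Hypothetical example: write two small instrument files to a temporary folder
tmp_dir <- tempfile("instruments_")
dir.create(tmp_dir)
write_csv(
  data.frame(date = c("2021-07-06 10:00:00", "2021-07-06 10:01:00"), bc = c(5.2, 6.1)),
  file.path(tmp_dir, "ride_1.csv")
)
write_csv(
  data.frame(date = c("2021-07-07 10:00:00", "2021-07-07 10:01:00"), bc = c(4.8, 7.3)),
  file.path(tmp_dir, "ride_2.csv")
)

files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# map() applies read_csv() to every file; bind_rows() stacks the results
# and records the originating file in the source_file column
all_rides <- files %>%
  set_names(basename(files)) %>%
  map(read_csv, show_col_types = FALSE) %>%
  bind_rows(.id = "source_file")
```

Here read_csv() should guess the date column as a date-time because it is written in ISO format; other formats would need an explicit column specification.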

Consistent timestamps with lubridate

Data from different instruments often have different...

  • Parsing date-times with ymd_hms(), dmy_hms()...
  • Converting to a common time zone with with_tz()
11 / 35

Step 1 is consistent timestamps so that we can join all data sets together. Some instruments log in UTC; others log in a different time zone, depending on the country of origin.
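
A minimal sketch of this step, assuming two made-up instruments: one logging in UTC with year-month-day order, the other in local time with day-month-year order. The timestamps are illustrative only.

```r
library(lubridate)

# Hypothetical timestamps from two instruments with different formats
inst_a <- c("2021-07-06 04:30:00", "2021-07-06 04:31:00")  # logged in UTC, year-month-day
inst_b <- c("06/07/2021 10:00:00", "06/07/2021 10:01:00")  # logged in local time, day-month-year

# Parse each with the function matching its field order
time_a <- ymd_hms(inst_a, tz = "UTC")
time_b <- dmy_hms(inst_b, tz = "Asia/Kolkata")

# with_tz() keeps the same instant and changes only the time zone it is shown in
time_a_local <- with_tz(time_a, tzone = "Asia/Kolkata")

# After conversion both instruments line up and can be joined on time
all(time_a_local == time_b)
```

If an instrument recorded local clock time but the file was read as UTC, force_tz() is the function to reach for instead, since it keeps the clock time and changes the underlying instant.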

Consistent column names with janitor

## [1] "Elevation" "Year"
## [3] "Month" "Day"
## [5] "Season" "Julian Day"
## [7] "PM2.5-11hrs" "PM2.5 -14hrs"
## [9] "PM2.5 -dailymean" "PM2.5-10-14 hrs mean"
## [11] "AOD-Terra" "AOD-Aqua"
## [13] "AOD-Terra-Aqua mean" "NDVI"
## [15] "CWV-Terra" "CWV-Aqua"
## [17] "CWV-Terra-Aqua mean" "2m Temperatue-11hrs"
## [19] "2m Temperature -14hrs" "2m Temperature-dailymean"
## [21] "2m Temperature-10-14 hrs mean"
library(janitor)
# clean_names() converts column names to consistent snake_case
my_data <- my_data %>%
  clean_names()
names(my_data)
## [1] "elevation" "year"
## [3] "month" "day"
## [5] "season" "julian_day"
## [7] "pm2_5_11hrs" "pm2_5_14hrs"
## [9] "pm2_5_dailymean" "pm2_5_10_14_hrs_mean"
## [11] "aod_terra" "aod_aqua"
## [13] "aod_terra_aqua_mean" "ndvi"
## [15] "cwv_terra" "cwv_aqua"
## [17] "cwv_terra_aqua_mean" "x2m_temperatue_11hrs"
## [19] "x2m_temperature_14hrs" "x2m_temperature_dailymean"
## [21] "x2m_temperature_10_14_hrs_mean"
12 / 35

There are several inconsistent features here: use of dash vs space, column names starting with numbers, and capital vs small letters.

Interactive plots with plotly

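library(ggplot2)
library(dplyr)   # provides the %>% pipe

# Time series of BC; mydf needs the columns date and BC
plot1 <- mydf %>%
  ggplot(aes(x = date, y = BC)) +
  geom_line() +
  theme_minimal() +
  xlab(" ")
plot1

# Convert the static ggplot into an interactive plotly widget
library(plotly)
ggplotly(plot1)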
13 / 35

Interactive maps with leaflet

Neighborhood level pollution map

14 / 35

Demo: we are looking at averages over several rides, so these are stable maps. You may recall that the purpose of mobile monitoring is to produce high-resolution pollution maps by combining location and pollution data.
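
A minimal leaflet sketch of this kind of map is shown below. The coordinates and BC values are made up for illustration, and the colour palette is an arbitrary choice, not the one used in our maps.

```r
library(leaflet)

# Made-up points for illustration: coordinates near Bangalore and a pollutant value
rides <- data.frame(
  lon = c(77.59, 77.60, 77.61),
  lat = c(12.97, 12.98, 12.99),
  bc  = c(4.5, 9.2, 15.8)
)

# Map pollutant values onto a colour scale
pal <- colorNumeric(palette = "YlOrRd", domain = rides$bc)

pollution_map <- leaflet(rides) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~lon, lat = ~lat, color = ~pal(bc), radius = 6,
                   stroke = FALSE, fillOpacity = 0.8) %>%
  addLegend(pal = pal, values = ~bc, title = "BC")

pollution_map
```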

Statistical Modelling

  • Base R

    • linear, multilinear lm()
    • logit glm()
    • pca prcomp()
  • lme4 package

    • Linear mixed effect models
  • gam package
    • Generalized additive models
15 / 35
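
The base R functions listed above can be sketched on simulated data as follows. The variable names and coefficients are invented for illustration and do not describe any real data set.

```r
# Simulated data for illustration only; variable names are hypothetical
set.seed(1)
n <- 200
aq <- data.frame(
  temperature = rnorm(n, mean = 25, sd = 4),
  humidity    = rnorm(n, mean = 60, sd = 10)
)
aq$pm25 <- 40 - 0.8 * aq$temperature + 0.3 * aq$humidity + rnorm(n, sd = 5)
aq$exceeds <- as.integer(aq$pm25 > 40)

# Multiple linear regression
fit_lm <- lm(pm25 ~ temperature + humidity, data = aq)

# Logistic regression: glm() with a binomial family
fit_logit <- glm(exceeds ~ temperature + humidity, data = aq, family = binomial)

# Principal component analysis on scaled predictors
fit_pca <- prcomp(aq[, c("temperature", "humidity")], scale. = TRUE)

summary(fit_lm)
```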

https://davidcarslaw.github.io/openair/

openair by Carslaw & Ropkins

Pollutant concentration and wind direction

16 / 35

cool plot options.
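
As a small sketch of two of these plot options, the example below uses `mydata`, the example data set that ships with openair, rather than our own measurements.

```r
library(openair)

# `mydata` is the example dataset shipped with openair
# (hourly data with wind speed ws, wind direction wd, nox, and more)

# Bivariate polar plot: concentration as a function of wind speed and direction
polarPlot(mydata, pollutant = "nox")

# Pollution rose: how often each concentration range occurs by wind direction
pollutionRose(mydata, pollutant = "nox")
```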

Using shiny apps

To simplify repeat data analysis tasks

17 / 35

There are plenty of repeat tasks. Mapping each neighborhood 20+ times and then mapping several neighborhoods
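
To give a feel for how a Shiny app wraps a repeat task, here is a minimal sketch of an upload-and-plot app. It is not the code of our apps; the input names and the `date` and `BC` columns are assumptions made for the example.

```r
library(shiny)

ui <- fluidPage(
  titlePanel("Quick time series check"),
  fileInput("file", "Upload a CSV file", accept = ".csv"),
  plotOutput("ts_plot")
)

server <- function(input, output, session) {
  uploaded <- reactive({
    req(input$file)                 # wait until a file is uploaded
    read.csv(input$file$datapath)
  })

  output$ts_plot <- renderPlot({
    df <- uploaded()
    # Assumes hypothetical columns `date` (ISO date-time text) and `BC`
    plot(as.POSIXct(df$date, tz = "UTC"), df$BC, type = "l",
         xlab = "", ylab = "BC")
  })
}

app <- shinyApp(ui, server)
# Run interactively with: runApp(app)
```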

https://github.com/meenakshi-kushwaha/mmaqshiny

Mobile monitoring

Data from multiple monitors

18 / 35

There is an option to upload each file, and the app automatically joins them. There are alarms and settings so we know if something is wrong, and time series plots for daily QA/QC.

https://github.com/adithirgis/pollucheck

Stationary data

Cleaning and analysing open source data

19 / 35

Another Shiny app that we have developed aims to simplify cleaning open-access data from stationary monitors and automatically generates statistical plots.

Elevator Pitch

6th July

6:30 to 8:00 pm IST

Pitch no - 26

20 / 35

Beyond data analysis

Organizing and sharing using R

21 / 35

R projects

+

version control

+

README

  • Create an RStudio project for each new project
  • Use GitHub for version control
  • Initiate with a README file that contains
    • Project description
    • File name descriptions
    • Other metadata
22 / 35

The project description could include collection dates, locations, etc.

Reporting with R Markdown

  • Code and outputs side by side

  • Facilitates sharing and reviewing

  • Reproduce analysis easily

  • Different output formats possible

a <- 2
I have `r a` cats and `r a+1` dogs.

I have 2 cats and 3 dogs.

---
title: "TOC"
output:
  html_document:
    toc: true
    toc_float: true
---

Example of Rmarkdown report with floating toc

Individual diagnostic reports using parameters

---
title: "Diagnostic Report"
output: html_document
params:
  year: 2021
  region: Site_1
  data: file.csv
---

Use parameterized rmarkdown reports for each sensor
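
One way to produce a report per sensor from a header like the one above is to call rmarkdown::render() in a loop. The sketch below assumes a template file named diagnostic_report.Rmd and one CSV per site; the helper function, file names, and site list are all hypothetical.

```r
library(rmarkdown)

# Hypothetical helper: render one report per site from a single template.
# Assumes "diagnostic_report.Rmd" declares `year`, `region` and `data` under `params:`.
render_site_report <- function(region, year = 2021,
                               template = "diagnostic_report.Rmd") {
  render(
    input       = template,
    output_file = paste0("diagnostic_", region, "_", year, ".html"),
    params      = list(year = year, region = region,
                       data = paste0(region, ".csv")),
    envir       = new.env()   # keep each report's objects separate
  )
}

sites <- c("Site_1", "Site_2", "Site_3")

# Render all reports only when the template is present
if (file.exists("diagnostic_report.Rmd")) {
  for (site in sites) render_site_report(site)
}
```

Inside the template, the values are available as params$year, params$region, and params$data.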

23 / 35

What we have learnt from the R community

We did not set out to learn these lessons

But we are so glad we did!

24 / 35

Lesson 1

Importance of Community

25 / 35

If someone had said, "join R because of its amazing community", I would never have believed it. I just needed a tool to learn to make cool plots.

Slack channels

Book clubs

Meet ups

26 / 35

Lesson 2

Using artwork for teaching

27 / 35

https://github.com/allisonhorst/stats-illustrations

Fuzzy R monsters

By Allison Horst

28 / 35

https://tinystats.github.io/teacups-giraffes-and-statistics/index.html

Teacup Giraffes

By Walum & Leon

"A delightful series of modules to learn statistics and R coding for students, scientists, and stats-enthusiasts"

29 / 35

Lesson 3

Open source and collaboration

30 / 35

Coding practices for easier collaboration

  • Using projects and version control for organizing

    • Descriptive README files
  • Following tidyverse style guide for code

    • Consistent naming for functions and objects
  • Commenting for "why" of code and new packages

  • Using relative file paths and avoiding hard-coded file locations

    • Use package here instead of setwd()
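
A short sketch of the last point, assuming a project with data and figures folders. The folder and file names are hypothetical.

```r
library(here)

# here() builds paths from the project root, so the same code works on any machine.
# The folder and file names below are hypothetical.
data_path <- here("data", "raw", "site_1.csv")
fig_path  <- here("figures", "bc_timeseries.png")

# Instead of: setwd("C:/Users/me/project"); read.csv("data/raw/site_1.csv")
# my_data <- read.csv(data_path)
```

Because the path is resolved relative to the project root, collaborators can clone the repository anywhere without editing file locations.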

31 / 35

What next?

  • Bookdown project

    • Organizing field protocols and code
  • Tidymodels

  • Accessibility

32 / 35

Now that we have the lu

Resources used

33 / 35

Research Partners

34 / 35

Thank You!

Twitter: envhealthspeak

Github: meenakshi-kushwaha

Email: meenakshi@ilklabs.com

35 / 35
