class: center, middle, title-slide

# Can we do this in R?
## Answering questions about air quality, one question at a time
### Meenakshi Kushwaha
### ILK Labs
### 6th July, 2021

---
# Who we are?

![Multidisciplinary team](img/team.png)

???
What this means is that we are also at different places in our R journey, from beginner to advanced.

---
# What we do?

### We measure air quality to answer questions like
- Is one block dirtier than another?
- Is one season different from another?
- Do low-cost instruments answer questions reliably?
- Can we adapt cutting-edge methods from different contexts for India?

### Also...
- National level surveys
- Model building with satellite and ground data

???

---
# 2 types of Air Quality Measurements

.left-column[
###Stationary monitoring
![Stationary reference grade monitor](img/bam.png)
]

--

.right-column[
###Mobile monitoring
![Sensors in a car](img/google-street-view.png)
]

???
Typically, we have two main types of air quality measurements.

---
# Stationary Monitoring

![Network of air quality sensors in Bangalore, India](img/pa.png)

???
This is what a stationary monitoring network looks like. Multiple sensors record data in real time, at roughly 1-minute frequency. This gives information about trends at the city scale.

---
# Mobile Monitoring

![Individual instruments for each parameter](img/setup.png)

???
For neighborhood-level measurements, we use mobile monitoring. This schematic shows the many instruments that go into the mobile platform. You can imagine the complexity: different instruments have different download methods and, from the point of view of analysis, different data formats, different time stamps, etc.

---
.left-column[
###This is what it looks like in practice
]
.right-column[
![Internal set up in the mobile monitoring platform](img/cng.jpg)
]

???
We are looking at the inside of the car now. We have a laptop as a logger for some instruments. All instruments are secured in the blue tray, with inlets outside the vehicle.
This is what it looks like in practice...

---
.left-column[
###and, sometimes like this...
]
.right-column[
![Unplanned and unanticipated road blocks](img/challenges.jpg)
]

???
All that to say that many times, things are out of our control in the field. Next,

---
class: top, right
background-image: url(img/image-from-rawpixel-id-3049987-jpeg.jpg)
background-size: cover

.pull-right[
## Working in R allows us to control all aspects of data analysis
]

???
I will talk about some of our favourite R packages and functions and what we use them for.

---
.left-column[
#Data Cleaning
![](img/tidyverse_logo.png)
]
.right-column[
![](img/tidyverse.png)
]

???
We use all tidyverse packages for data cleaning: ggplot2 for highly flexible plotting, purrr and its map functions for more efficient code instead of for loops, and forcats for factors. In readr, `read_csv()` automatically parses date-time objects, which is very helpful.

---
# Consistent timestamps with `lubridate`

.left-column[
![](img/lubridate.jpeg)
]
.right-column[
Data from different instruments often have different...

![](img/data_issues.png)

- Parsing date-times with `ymd_hms()`, `dmy_hms()`...
- Converting to a common time zone with `with_tz()`
]

???
Step 1 is consistent time stamps so that we can join all data sets together. Some instruments log in UTC; others, depending on the country of origin, log in a different time zone.
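
A minimal sketch of this step, assuming two hypothetical instrument timestamps (the values and object names are invented for illustration): one logged in UTC in year-month-day order, the other logged in Indian local time in day-month-year order. The `tz` argument declares the zone at parse time, and `with_tz()` then re-expresses the same instant in another zone.

```r
library(lubridate)

# Hypothetical raw timestamps from two instruments (invented for illustration)
gps_raw <- "2021-07-06 04:30:00"   # assumed to be logged in UTC, year-month-day
bc_raw  <- "06/07/2021 10:00:00"   # assumed to be logged in IST, day-month-year

# Parse each string with the function matching its field order,
# declaring the zone the instrument clock was set to
gps_time <- ymd_hms(gps_raw, tz = "UTC")
bc_time  <- dmy_hms(bc_raw, tz = "Asia/Kolkata")

# Re-express the UTC instant in local time; the instant itself does not change
gps_local <- with_tz(gps_time, tzone = "Asia/Kolkata")

# Both readings now refer to the same instant and can be used as a join key
gps_local == bc_time
```

If an instrument writes local clock times without any zone information, declaring the zone at parse time (as above) is usually safer than converting afterwards, because `with_tz()` only changes how an instant is displayed.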
---
#Consistent column names with `janitor`

.panelset[
.panel[.panel-name[Before]

```
##  [1] "Elevation"                     "Year"
##  [3] "Month"                         "Day"
##  [5] "Season"                        "Julian Day"
##  [7] "PM2.5-11hrs"                   "PM2.5 -14hrs"
##  [9] "PM2.5 -dailymean"              "PM2.5-10-14 hrs mean"
## [11] "AOD-Terra"                     "AOD-Aqua"
## [13] "AOD-Terra-Aqua mean"           "NDVI"
## [15] "CWV-Terra"                     "CWV-Aqua"
## [17] "CWV-Terra-Aqua mean"           "2m Temperatue-11hrs"
## [19] "2m Temperature -14hrs"         "2m Temperature-dailymean"
## [21] "2m Temperature-10-14 hrs mean"
```
]

.panel[.panel-name[After]

```r
library(janitor)
my_data <- my_data %>%
  clean_names()
names(my_data)
```

```
##  [1] "elevation"                      "year"
##  [3] "month"                          "day"
##  [5] "season"                         "julian_day"
##  [7] "pm2_5_11hrs"                    "pm2_5_14hrs"
##  [9] "pm2_5_dailymean"                "pm2_5_10_14_hrs_mean"
## [11] "aod_terra"                      "aod_aqua"
## [13] "aod_terra_aqua_mean"            "ndvi"
## [15] "cwv_terra"                      "cwv_aqua"
## [17] "cwv_terra_aqua_mean"            "x2m_temperatue_11hrs"
## [19] "x2m_temperature_14hrs"          "x2m_temperature_dailymean"
## [21] "x2m_temperature_10_14_hrs_mean"
```
]
]

???
There are several inconsistent features here: dashes vs spaces, column names starting with numbers, and capital vs small letters.

---
# Interactive plots with `plotly`

.panelset[
.panel[.panel-name[ggplot]

.pull-left[
```r
plot1 <- mydf %>%
  ggplot(aes(x = date, y = BC)) +
  geom_line() +
  theme_minimal() +
  xlab(" ")
```
]

.pull-right[
```r
plot1
```

![](index_files/figure-html/unnamed-chunk-6-1.png)<!-- -->
]
]

.panel[.panel-name[plotly]

.left-column[
```r
library(plotly)
ggplotly(plot1)
```
]
.right-column[
]
]
]

---
# Interactive maps with `leaflet`

![Neighborhood level pollution map](img/leaflet.png)

???
demo

We are looking at averages over several rides, so these are stable maps. You may recall that the purpose of mobile monitoring is high-resolution pollution maps combining location and pollution data.

---
# Statistical Modelling

- Base R
  - linear, multilinear `lm()`
  - logit `glm()`
  - pca `prcomp()`
- `lme4` package
  - Linear mixed effect models
- `gam` package
  - Generalized additive models

---
class: middle

.footnote[https://davidcarslaw.github.io/openair/]

.left-column[
###`openair` by Carslaw & Ropkins
]
.right-column[
![Pollutant concentration and wind direction](img/calendar_plot.png)
]

???
Cool plot options.

---
class: middle, center
background-image:url(img/image-from-rawpixel-id-2809525-jpeg.jpg)
background-size: contain

# Using shiny apps
## To simplify repeat data analysis tasks

???
There are plenty of repeat tasks: mapping each neighborhood 20+ times, and then mapping several neighborhoods.

---
.footnote[https://github.com/meenakshi-kushwaha/mmaqshiny]

.left-column[
##Mobile monitoring
###Data from multiple monitors
]
.right-column[
![](img/mmaqshiny.png)
]

???
There is an option to upload each file, and the app automatically joins them. There are alarms and settings so we know if something is wrong, and time series plots for daily QA/QC.

---
.footnote[https://github.com/adithirgis/pollucheck]

.left-column[
## Stationary data
###Cleaning and analysing open source data
]
.right-column[
![](img/pollucheck.png)
]

???
Another shiny app that we have developed aims to simplify cleaning open access data from stationary monitors and to automatically generate statistical plots.
---
.left-column[
# Elevator Pitch
### 6th July
####6:30 to 8:00 pm IST
Pitch no - 26
]
.right-column[
![](img/adithi_talk.png)
]

---
class: inverse, center, bottom
background-image: url(img/image-from-rawpixel-id-3286706-original.jpg)
background-size: cover

# Beyond data analysis
## Organizing and sharing using R

---
.left-column[
#R projects +
##version control +
###README
]
.right-column[
![](img/rstudio-project-1.png)

- Create an RStudio project for each new project
- Use GitHub for version control
- Initiate with a README file that contains
  - Project description
  - File name descriptions
  - Other metadata
]

???
The project description could include collection dates, locations, etc.

---
# Reporting with R Markdown

.panelset[
.panel[.panel-name[Why?]

- Code and outputs side by side
- Facilitates sharing and reviewing
- Reproduce analysis easily
- Different output formats possible
]

.panel[.panel-name[In line text]

```r
a <- 2
```

```r
I have `r a` cats and `r a+1` dogs.
```

I have 2 cats and 3 dogs.
]

.panel[.panel-name[Table of contents]

.pull-left[
```r
---
title: "TOC"
output:
  html_document:
    toc: true
    toc_float: true
---
```
]

.pull-right[
![Example of Rmarkdown report with floating toc](img/toc.png)
]
]

.panel[.panel-name[Parameterized reports]

.pull-left[
Individual diagnostic reports using parameters

```r
---
title: "Diagnostic Report"
output: html_document
params:
  year: 2021
  region: Site_1
  data: file.csv
---
```
]

.pull-right[
![Use parameterized rmarkdown reports for each sensor](img/pa.png)
]
]
]

---
class: middle, center, inverse
background-image: url(img/image-from-rawpixel-id-2294530-jpeg.jpg)
background-size:cover

# What we have learnt from the R community
### We did not set out to learn these lessons
#### But we are so glad we did!

---
class: middle, center

#Lesson 1
## Importance of Community

![](img/image-from-rawpixel-id-431441-jpeg.jpg)

???
If someone had said, "join R because of the amazing community", I would never have believed it.
I just needed to learn the technology to make cool plots.

---
.left-column[
## Slack channels
## Book clubs
## Meet ups
]

![](img/Rladies.jpeg)
![](img/MiR.jpeg)
![](img/tidytuesday.png)

---
class:middle, center
background-image: url(img/image-from-rawpixel-id-594536-png.png)
background-size:cover

#Lesson 2
##Using artwork for teaching

---
.footnote[https://github.com/allisonhorst/stats-illustrations]

.left-column[
# Fuzzy R monsters
By Allison Horst
]
.right-column[
![](img/dplyr_mutate.png)
]

---
.footnote[https://tinystats.github.io/teacups-giraffes-and-statistics/index.html]

.left-column[
# Teacup Giraffes
By Walum & Leon

"A delightful series of modules to learn statistics and R coding for students, scientists, and stats-enthusiasts"
]
.right-column[
![](img/teacup.png)
]

---
background-image: url(img/image-from-rawpixel-id-2466603-jpeg.jpg)
background-size:cover
class: middle, center

#Lesson 3
## Open source and collaboration

---
# Coding practices for easier collaboration

.pull-left[
- Using projects and version control for organizing
- Descriptive README files
- Following tidyverse style guide for code
- Consistent naming for functions and objects
- Commenting for "why" of code and new packages
- **Using relative file paths** and avoiding hard-coded file locations
- Using package `here` instead of `setwd()`
]
.pull-right[
![](img/here.png)
]

---
background-image: url(img/image-from-rawpixel-id-430321-jpeg.jpg)
background-size: cover

# What next?

.pull-left[
- Bookdown project
- Organizing field protocols and code
- Tidymodels
- Accessibility
]

???
Now that we have the lu

---
# Resources used

.pull-left[
- R packages
  - [xaringan](https://github.com/yihui/xaringan)
  - [xaringanExtra](https://github.com/gadenbuie/xaringanExtra)
- Images from [rawpixel](https://www.rawpixel.com/)
- Logos from [The Noun Project](https://thenounproject.com/)
]
.pull-right[
![](img/image-from-rawpixel-id-523356-jpeg.jpg)
]

---
# Research Partners

![](img/research partners.png)

---
.left-column[
#Thank You!

Twitter: [envhealthspeak](https://twitter.com/envhealthspeak)

GitHub: [meenakshi-kushwaha](https://github.com/meenakshi-kushwaha)

Email: meenakshi@ilklabs.com
]
.right-column[
![](img/code_hero.jpg)
]