R project using two data sets.
I have specific data sets I would prefer we use. More information is included below, but my overall vision is to use California data from Cal-Adapt (possibly via the caladaptr package, https://github.com/ucanr-igis/caladaptr/) to get historical baseline data and future projected data on extreme heat in California, then use data from EPA EJScreen to overlay the socioeconomic factors of age and low-income status.
I also uploaded a classmate's presentation with some examples of graphs that others are using. I will do the written portions of the assignment.
R Project outline
Overall goal: Use historical, current, and projected data on extreme heat in California, overlaid with data on socioeconomic status and age (over 65), to determine whether there is any correlation between age, socioeconomic status, and current/projected extreme heat.
Data 1: https://cal-adapt.org/data/download/ (the data sets below will be used to build the spatial map)
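One way to frame the correlation question is a table with one row per census tract, holding an extreme heat measure alongside the two demographic shares. The sketch below uses made-up numbers only; the column names `heat_days`, `pct_over65`, and `pct_low_income` are hypothetical placeholders, and Spearman's rank correlation is just one reasonable choice, not something the project outline specifies.

```r
# Hypothetical tract-level table; all values are synthetic, for illustration only.
set.seed(106)
n_tracts <- 50
tracts <- data.frame(
  geoid          = sprintf("06%09d", seq_len(n_tracts)),  # placeholder 11-digit tract IDs
  heat_days      = rpois(n_tracts, lambda = 12),          # extreme heat days per year
  pct_over65     = runif(n_tracts, min = 5, max = 30),    # % of residents over 65
  pct_low_income = runif(n_tracts, min = 10, max = 60),   # % low-income residents
  stringsAsFactors = FALSE
)

# Spearman rank correlation tolerates skewed counts and non-linear monotone trends.
vars <- c("heat_days", "pct_over65", "pct_low_income")
cor_matrix <- cor(tracts[, vars], method = "spearman")
print(round(cor_matrix, 2))

# Test one pair; exact = FALSE avoids a warning when ranks are tied.
heat_vs_income <- cor.test(tracts$heat_days, tracts$pct_low_income,
                           method = "spearman", exact = FALSE)
print(heat_vs_income$estimate)
print(heat_vs_income$p.value)
```

Because the synthetic columns are drawn independently, the printed correlations should be close to zero; with real data, the same calls would show whether hotter tracts tend to be older or lower-income.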
· Use this: https://ucanr-igis.github.io/caladaptr/
Gridded observed meteorological data derived products: Historical observed daily temperature data from approximately 20,000 NOAA Cooperative Observer (COOP) stations form the basis of this gridded dataset from 1950–2013 at a spatial resolution of 1/16° (approximately 6 km). Observation-based meteorological data sets offer insights into changes to the hydro-climatic system by diagnosing spatio-temporal characteristics and providing a historical baseline for future projections. Details are described in Livneh et al., 2015.
LOCA derived products: Datasets created from LOCA downscaled CMIP5 climate projections for Cal-Adapt tools. These currently include the modeled annual variability envelope (maximum and minimum from range of annual average values from all 32 GCMs); precalculated data tables of extreme heat counts for California counties and census tracts for 4 priority models and 2 scenarios.
LOCA downscaled CMIP5 climate projections: Daily climate projections for California at a resolution of 1/16° (about 6 km, or 3.7 miles) generated to support climate change impact studies for California’s Fourth Climate Change Assessment. The data, derived from 32 coarse-resolution (~100 km) global climate models from the CMIP5 archive, were bias corrected and downscaled using the Localized Constructed Analogues (LOCA) statistical method. The data cover 1950-2005 for the historical period and 2006-2100 (some models stop in 2099) for two future climate projections.
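To show how daily maximum temperatures could be turned into an annual extreme heat count, here is a minimal sketch on a synthetic series for a single grid cell. Nothing in it is real LOCA output: the seasonal curve, the warming trend, the 1961-1990 baseline window, and the 98th-percentile threshold are all assumptions chosen for illustration, and the real threshold definition should be taken from the Cal-Adapt documentation.

```r
# Synthetic daily maximum temperature (deg C) for one grid cell; not real LOCA output.
set.seed(2022)
dates <- seq(as.Date("1950-01-01"), as.Date("2099-12-31"), by = "day")
year  <- as.integer(format(dates, "%Y"))
doy   <- as.integer(format(dates, "%j"))

seasonal <- 22 + 12 * sin(2 * pi * (doy - 105) / 365)   # peaks in mid-summer
warming  <- 0.03 * pmax(year - 2005, 0)                 # assumed trend after 2005
tmax     <- seasonal + warming + rnorm(length(dates), sd = 3)
daily    <- data.frame(date = dates, year = year, tmax = tmax)

# Threshold: 98th percentile of daily tmax in an assumed 1961-1990 baseline window.
baseline  <- daily$year >= 1961 & daily$year <= 1990
threshold <- unname(quantile(daily$tmax[baseline], probs = 0.98))

# Count the days above the threshold in each year.
daily$is_hot <- daily$tmax > threshold
heat_days <- aggregate(is_hot ~ year, data = daily, FUN = sum)
names(heat_days)[2] <- "n_heat_days"

# Split at 2005/2006, matching the historical and projected periods described above.
heat_days$period <- ifelse(heat_days$year <= 2005, "historical", "projected")
period_means <- tapply(heat_days$n_heat_days, heat_days$period, mean)
print(round(period_means, 1))
```

With real data, the `daily` table would be filled from a Cal-Adapt download or a caladaptr query instead of the simulated values, and the counting steps would stay the same.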
Data 2: https://www.epa.gov/ejscreen/download-ejscreen-data
· EJScreen package: https://github.com/ejanalysis/ejscreen
· This will provide relevant demographic data, such as age and low-income status
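Because EJScreen is published at the block-group level, it may need to be rolled up to census tracts before it can be overlaid on tract-level heat data. The sketch below does this on a toy table. The column names `ID`, `ACSTOTPOP`, `LOWINCPCT`, and `OVER64PCT`, and the assumption that the shares are stored as fractions between 0 and 1, should be checked against the downloaded file; the `heat` table and all values are hypothetical.

```r
# Toy block-group table standing in for an EJScreen download (values are made up).
ejscreen <- data.frame(
  ID        = c("060670001001", "060670001002", "060670002001", "060670002002"),
  ACSTOTPOP = c(1200, 800, 1500, 500),
  LOWINCPCT = c(0.30, 0.50, 0.10, 0.20),
  OVER64PCT = c(0.12, 0.20, 0.25, 0.15),
  stringsAsFactors = FALSE
)

# A 12-digit block-group ID starts with the 11-digit tract ID (state + county + tract).
ejscreen$tract <- substr(ejscreen$ID, 1, 11)

# Population-weighted mean of each share within a tract.
weighted_share <- function(share, pop) sum(share * pop) / sum(pop)
tract_demo <- do.call(rbind, lapply(split(ejscreen, ejscreen$tract), function(bg) {
  data.frame(
    tract          = bg$tract[1],
    population     = sum(bg$ACSTOTPOP),
    pct_low_income = 100 * weighted_share(bg$LOWINCPCT, bg$ACSTOTPOP),
    pct_over64     = 100 * weighted_share(bg$OVER64PCT, bg$ACSTOTPOP),
    stringsAsFactors = FALSE
  )
}))
rownames(tract_demo) <- NULL

# Toy tract-level heat table (hypothetical); IDs stay character to keep the leading zero.
heat <- data.frame(tract = c("06067000100", "06067000200"),
                   heat_days = c(14, 9),
                   stringsAsFactors = FALSE)

combined <- merge(tract_demo, heat, by = "tract")
print(combined)
```

Keeping the IDs as character strings matters for California, since its state code `06` loses the leading zero if the column is read as a number.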
Wildfire Occurrence and Drought Severity in CA during 2020
EPM Graduate student
ESP 106 Final WQ 2022
ESP 106 Final Project Guidelines and Rubric
For the final project, you should identify one or two interesting data-sets relevant to a question or problem you are interested in. You will use your R coding skills to analyze the data and generate some insights to your problem. You should turn in both the write-up and the R code (either separately or in a combined Rmarkdown file). You can work in pairs or individually.
· Description of the background to the question / area you are interested in (3-4 paragraphs with references)
· Description of the dataset and exploratory data analysis (variable distribution, correlation, etc.) (~1-3 plots OR data tables plus 3-4 paragraphs).
· More detailed plots / statistical analysis (3-4 plots OR regressions OR tables + about 6-7 paragraphs of writing)
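For the regression option in the last item, a minimal sketch of what such an analysis could look like in R follows. The data are simulated with a deliberately built-in link between low-income share and heat days, so the output shows what a detectable effect looks like; the variable names and the choice of ordinary least squares are assumptions, not requirements from the guidelines.

```r
# Synthetic tract-level data with a built-in positive link to low-income share.
set.seed(65)
n <- 200
tracts <- data.frame(
  pct_over65     = runif(n, 5, 30),
  pct_low_income = runif(n, 10, 60)
)
tracts$heat_days <- 5 + 0.2 * tracts$pct_low_income + rnorm(n, sd = 3)

# Ordinary least squares with both demographic predictors.
fit <- lm(heat_days ~ pct_over65 + pct_low_income, data = tracts)

# Coefficient table: estimate, standard error, t value, p value.
coef_table <- summary(fit)$coefficients
print(round(coef_table, 3))

# 95% confidence intervals for the intercept and slopes.
print(round(confint(fit), 3))
```

The coefficient table produced this way can be reported directly as one of the tables the item asks for. Since heat days are counts, a Poisson model via `glm(..., family = poisson)` may be worth comparing on the real data.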
You will want to identify the data you want to work with EARLY. Don’t under-estimate how much time it takes to get data into a useable format that you can start working with.
A couple sources of interesting data-sets:
· Tidy Tuesday Project: https://www.tidytuesday.com/
· Our World in Data: https://ourworldindata.org/
· WorldClim (Climate Data): https://www.worldclim.org/data/worldclim21.html
· EJ Screen (Pollution Exposure and Socio-Demographic Data in the US): https://www.epa.gov/ejscreen
· R packages focused on accessing particular data types or datasets, such as geodata (spatial data), tidycensus (US census data), cdlTools (USDA Datasets) …, and many more
· Research data at https://dataverse.harvard.edu/ and similar repositories
Does Not Meet Expectations