EnSt 380: Applications in GIS Final Project – Costa Rica Palms – Page 1
APPLICATIONS IN GEOGRAPHIC INFORMATION SYSTEMS (GIS)
FINAL PROJECT: PREDICTING PLANT DISTRIBUTIONS USING CLIMATIC DATA - COSTA RICA
Plant voucher specimens exist in herbaria around the world and provide a reliable and verifiable record
of plant distribution. Often botanists use "locational" maps, created by displaying point locations of
voucher specimens, to produce distribution maps. Yet, these maps may overestimate or underestimate
a species’ range for several reasons. First, most of these maps are depicted as solid polygons
and thus assume that all habitats within the polygon are suitable. In reality, parts of the range
are suitable and parts are not, because environmental conditions are heterogeneous. Second,
ranges might instead be underestimated when a species has been under-collected and its true
range extends well beyond the mapped borders. GIS offers tools to improve predictions of species
ranges by combining locality data with climatic data. When such data are considered together with
historical data (e.g., physical or biotic barriers), the possibility of creating more accurate
depictions of species ranges is enhanced.
In this project, you will use data taken from the Missouri Botanical Garden databases (generously
provided by Dr. Robert Magill, Director of Research at the Missouri Botanical Garden) and environmental
data taken from WorldClim (www.worldclim.org/version2) to model the distributions of selected plant taxa
in Costa Rica. In addition, these modeled distributions will be combined to display species richness
patterns for these taxa and to help identify sites that may have been under-collected. To conduct this
project, we need to assume that plant distributions can be predicted from existing climatic data and
that these climatic maps are accurate. The use of climatic data to define vegetation communities is not
new. As pointed out in Skov and Borchsenius (1997), Leslie Holdridge (1967) used three environmental
characters (mean annual temperature, precipitation, and humidity) to define major life zones. Although
others have used such climatic data to model plant and animal distributions, you should discuss the
underlying assumptions of doing so in your paper.
Many data files are provided for the analysis; however, feel free to expand on them with other data
sets as needed to address your questions. The outlined questions are meant to be used as a guide
rather than as step-by-step instructions for what is required in each section. That is, you can expand
the questions, shift the approach, or focus on specific issues. The key point to keep in mind is that
this project is not just about making a map to communicate a pattern in the area; it must include
spatial analyses using ArcMap and numerical summaries of the data. We suggest completing at least 2-3
of the sections outlined below; however, this can shift depending on the modifications you make to
each section. Please make sure you discuss the details of your various analyses with the professor and
TA to make sure you are going far enough with both the spatial analyses and interpretations!
Modeling Plant Distributions in Costa Rica with Bioclimatic Data
To obtain the data:
1. Copy the files from the directory, open a new document in ArcMap, and add the layers.
2. Add the table Arecaceae.xls from your directory, display the points, and save them to a shapefile.
Then sort, select, and save out the individual species you are interested in working with. See below
for details.
3. Remember to set your Geoprocessing – Environments. Set the Working Directory to your folder,
set your Processing Extent and Geoprocessing Mask to match the Costa Rica boundary shapefile, and
set your cell size to match your bioclimatic data.
Data files located on WashU’s Box in the ‘finalproject/costarica’ folder:
• Utilities networks and transportation structure (DCW version 5 – www.mapability.com)
• Rivers (see www.naturalearthdata.com and www.diva-gis.org/data)
• Lakes (see www.naturalearthdata.com and www.diva-gis.org/data)
• Protected areas of Costa Rica (see www.protectedplanet.net for more details on the World Database
on Protected Areas)
• cr_alt – altitude: elevation (m) above sea level. Data are from SRTM and have a cell size of
0.0083333 dd x 0.0083333 dd (~1 km x 1 km) (see www.worldclim.org for details).
• cr_bio_1 through cr_bio_19 – bioclimatic variables: cell size is 0.0083333 dd x 0.0083333 dd
(~1 km x 1 km) (see www.worldclim.org/version2 for details).
• Arecaceae.xls – Excel file of palm locations from the MOBOT Tropicos database
The map units for the data frame should be set to decimal degrees.
You will need to manipulate the plant data to transfer it from a spreadsheet table to a layer in ArcMap.
To do this, add the Arecaceae.xls table to the new document. In the Table of Contents (TOC),
right-click on the table, select Display XY Data, and use Arecaceae.xls as the table with “Longitude”
and “Latitude” as the X-coordinate and Y-coordinate, respectively. Latitude and longitude are in
decimal degrees and will default to the GCS_WGS84 geographic coordinate system. Display the new
plant layer and "clean up" any voucher records that appear inaccurate. You may also wish to delete
other taxa from the plant layer that will not be a part of your project.
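One quick way to find candidates for cleanup is to flag records whose coordinates cannot lie in Costa Rica. The plain-Python sketch below is a hedged illustration rather than part of the ArcMap workflow. It assumes each row of Arecaceae.xls has been read into a dictionary with numeric “Latitude” and “Longitude” fields. The bounding box is also an assumption: a rough extent for mainland Costa Rica, not an official boundary. The example rows are made up.

```python
from typing import Dict, List, Optional, Tuple

# Rough, ASSUMED extent of mainland Costa Rica in decimal degrees (WGS84).
# Isla del Coco (~5.5 N, 87 W) falls outside this box and needs its own check.
LAT_RANGE: Tuple[float, float] = (8.0, 11.3)
LON_RANGE: Tuple[float, float] = (-86.0, -82.5)


def in_box(lat: Optional[float], lon: Optional[float]) -> bool:
    """True when both coordinates exist and fall inside the assumed box."""
    if lat is None or lon is None:
        return False
    return LAT_RANGE[0] <= lat <= LAT_RANGE[1] and LON_RANGE[0] <= lon <= LON_RANGE[1]


def split_by_box(records: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split records into (plausible, suspect) lists for manual review."""
    plausible, suspect = [], []
    for rec in records:
        if in_box(rec.get("Latitude"), rec.get("Longitude")):
            plausible.append(rec)
        else:
            suspect.append(rec)
    return plausible, suspect


if __name__ == "__main__":
    # Made-up example rows, not real Tropicos records.
    rows = [
        {"Taxon": "Chamaedorea tepejilote", "Latitude": 10.43, "Longitude": -84.01},
        {"Taxon": "Chamaedorea tepejilote", "Latitude": 10.43, "Longitude": 84.01},
        {"Taxon": "Geonoma", "Latitude": None, "Longitude": -84.0},
    ]
    ok, check = split_by_box(rows)
    print(len(ok), len(check))  # 1 2
```

Suspect records call for a judgment call, not automatic deletion. A positive longitude, as in the second made-up row, is a typical sign error that can simply be corrected.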
Note that this cleanup does not mean the rest of the data are “inaccurate”. These records are part of
a massive dataset that is constantly being updated and corrected. Whenever you download data from
the web, part of the process is double-checking that they make sense and using your knowledge to
make “judgment calls”.
Note: On occasion, only a genus name is listed in the taxon record, or even no name at all. This means
that the specimen is identified only to genus or to some higher taxonomic level. You do not want to
use these records unless you are modeling the distribution of a genus (but beware: such records
contain unknown sets of taxa and may not be representative of the entire genus!).
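The filtering described in this note can be sketched in a few lines. This is a hedged illustration. It assumes the taxon name sits in a text field called “Taxon”, which is a hypothetical column name, so check the actual header in Arecaceae.xls. It treats “sp.”/“spp.” placeholders as genus-level identifications.

```python
from typing import Dict, List, Optional

# Placeholder epithets meaning the specimen was not identified to species.
PLACEHOLDERS = {"sp", "sp.", "spp", "spp."}


def is_species_level(name: Optional[str]) -> bool:
    """True for binomials such as 'Chamaedorea tepejilote'."""
    if not name:
        return False  # empty or missing taxon name
    parts = name.split()
    if len(parts) < 2:
        return False  # genus (or higher) name only
    epithet = parts[1]
    return epithet.islower() and epithet not in PLACEHOLDERS


def species_records(records: List[Dict], field: str = "Taxon") -> List[Dict]:
    """Keep only records identified to species ('Taxon' is a hypothetical field)."""
    return [rec for rec in records if is_species_level(rec.get(field))]
```

If you do decide to model a genus, you would group records by the first word of the name instead.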
I. Modeling Plant Species Distribution using Environmental Data
The Costa Rica plant table (Arecaceae.xls) contains only records of the palm family Arecaceae from the
Missouri Botanical Garden database. For this section, select either (1) 6-8 relatively common species
within the genus Chamaedorea or (2) 6-8 common species (10 records each) from any genera with which
to model plant distributions. We have selected palms because of their importance in most tropical
ecosystems and their economic and ecological value. Your selection can also contrast species that are
narrowly distributed with those that are broadly distributed.
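Choosing 6-8 common species comes down to counting records per name and keeping the best-sampled ones. The following is a minimal sketch. It again assumes a hypothetical “Taxon” field and records that are already cleaned and identified to species. The optional genus argument covers option (1).

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def common_species(
    records: List[Dict],
    field: str = "Taxon",
    min_records: int = 10,
    genus: Optional[str] = None,
) -> List[Tuple[str, int]]:
    """Return (species, record count) pairs with at least min_records records,
    most-collected first; restrict to one genus when genus is given."""
    names = [rec.get(field) for rec in records if rec.get(field)]
    if genus is not None:
        names = [n for n in names if n.split()[0] == genus]
    counts = Counter(names)
    keep = [(name, n) for name, n in counts.items() if n >= min_records]
    # Sort by descending count, then alphabetically to break ties.
    return sorted(keep, key=lambda pair: (-pair[1], pair[0]))
```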
With the available data, answer the following questions:
What are the potential and probable distributions of your selected palm species?
The actual distribution range is often smaller than the potential range due to historical events, biotic
interactions, anthropogenic changes and habitat loss, or random extinction over parts of the species’
original range. Therefore, you will examine both potential and probable distributions following
methods outlined in Skov and Borchsenius (1997). You have some leeway in deciding which climatic
variables to include in your species distribution models; I would suggest using at least 3 to 5
variables (do not hesitate to discuss this with the Professor or TA).
After selecting environmental variables and plant species, you can model potential distributions by
overlaying plant collections on environmental variables to determine the range of conditions in which
the plant species is found. Potential distributions can then be defined by selecting all cells within the
specified ranges of the environmental variables. Label cells in the resulting map as either "0"
(outside potential distribution) or "1" (inside potential distribution).
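The potential-distribution step is an environmental envelope. Record the minimum and maximum of each chosen variable at the collection cells, then flag every cell that falls inside all of those ranges. The sketch below shows that logic in plain Python. Small nested lists stand in for rasters such as cr_bio_1, row/column indices stand in for collection points, and None marks NoData. In ArcMap you might do the same with tools such as Extract Values to Points and Raster Calculator; the code is not ArcPy.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Grid = List[List[Optional[float]]]  # None stands for NoData
Cell = Tuple[int, int]  # (row, column) of a collection locality


def climatic_envelope(layers: Dict[str, Grid], sites: Sequence[Cell]) -> Dict[str, Tuple[float, float]]:
    """Min and max of each variable over the cells that hold collections."""
    envelope = {}
    for name, grid in layers.items():
        values = [grid[r][c] for r, c in sites if grid[r][c] is not None]
        if not values:
            raise ValueError(f"no valid {name} values at collection sites")
        envelope[name] = (min(values), max(values))
    return envelope


def potential_distribution(layers: Dict[str, Grid], envelope: Dict[str, Tuple[float, float]]) -> List[List[int]]:
    """1 where every variable lies inside its envelope, otherwise 0."""
    first = next(iter(layers.values()))
    n_rows, n_cols = len(first), len(first[0])
    result = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            inside = all(
                layers[name][r][c] is not None and lo <= layers[name][r][c] <= hi
                for name, (lo, hi) in envelope.items()
            )
            result[r][c] = 1 if inside else 0
    return result
```

A strict min/max envelope is sensitive to a single mis-georeferenced record, which is one more reason for the cleanup step. Trimming the envelope to percentiles is a common variant, but it would be your own modification to the method.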
To determine probable distribution maps you will follow the methods of Skov and Borchsenius (1997).
In this method, you need to determine how far each cell in the potential distribution maps is from
known localities. Probable distributions, then, assume that the likelihood of encountering a species in
any given cell with suitable habitat increases the closer that cell is to a known locality (collection
site). In this case, using only cells inside the potential distribution layer (where cell value = 1), give
cells new values based on their distance from a known locality. The range of this new value should go
from 0 to 100, where 100 is in the same cell as a locality, 99 is 1 km away, 98 is 2 km away, …, 1 is
100 km away, and 0 is any cell more than 100 km away from a known collection. The 100 km limit is
based on the size of the country and is really quite arbitrary. Cells in the probable distribution layer,
then, have values from 0 to 100, with higher values located closer to the collection locality.
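The distance scoring can be sketched in the same way. This version is a hedged illustration with three assumptions. Cells are square with sides of cell_km kilometres. Distance is measured as a straight line between cell centres, roughly what the Euclidean Distance tool followed by a 1 km defined-interval reclassification would give. One point is lost per full kilometre, so cells 100 km or more away score 0. The exact band edges are a choice you should state in your Methods.

```python
import math
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]  # (row, column) of a collection locality


def probable_distribution(
    potential: List[List[int]], sites: Sequence[Cell], cell_km: float = 1.0
) -> List[List[int]]:
    """Score potential cells 0-100 by distance to the nearest collection.

    Assumes square cells of side cell_km and Euclidean distance between cell
    centres; one point is lost per full kilometre, so >= 100 km scores 0.
    """
    n_rows, n_cols = len(potential), len(potential[0])
    scores = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            if potential[r][c] != 1:
                continue  # cells outside the potential distribution stay 0
            nearest_km = min(math.hypot(r - sr, c - sc) for sr, sc in sites) * cell_km
            if nearest_km < 100:
                scores[r][c] = 100 - math.floor(nearest_km)
    return scores
```

Decimal-degree cells are not perfectly square. At Costa Rica's latitudes a degree of longitude is only slightly shorter than a degree of latitude, so the error from treating cells as square is small.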
You have some leeway in calculating probable distribution maps if you prefer. For example, you
might consider existing physical barriers in Costa Rica when modifying potential distributions. One
example is that the high Talamanca mountains in the southern part of the country have been
identified as a barrier to dispersal of lowland forest species found on the Pacific and Atlantic slopes.
If there are no known collections on one slope (e.g., Pacific), you might want to reclassify all cells on
that slope to "0" when creating probable distribution maps.
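If you adopt the Talamanca barrier idea, the reclassification is a simple mask. The sketch assumes a hypothetical slope grid that you would build yourself, for example from cr_alt or a digitized divide, with each cell labelled by slope. Slopes with no collection of the species are set to 0.

```python
from typing import List, Sequence, Tuple

Cell = Tuple[int, int]  # (row, column) of a collection locality


def mask_unoccupied_slopes(
    probable: List[List[int]], slope: List[List[str]], sites: Sequence[Cell]
) -> List[List[int]]:
    """Zero out every cell whose slope label has no known collection.

    slope is a hypothetical grid of labels such as "Pacific" or "Atlantic".
    """
    occupied = {slope[r][c] for r, c in sites}
    return [
        [value if slope[r][c] in occupied else 0 for c, value in enumerate(row)]
        for r, row in enumerate(probable)
    ]
```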
Note: It is important that you set the extent to the shapefile of your country. You can also set the
mask to the shapefile of your country so that you do not need to go back and “extract by mask” your
final raster grids. For both the probable and potential distributions, make sure your new grid
extends over the entire country so that there are no NoData conflicts when you start to combine
layers. For example, if the probable distribution only extends to 100 km, you will have NoData areas
beyond 100 km that cannot be combined with or compared to the potential distribution. In contrast,
if you have a grid that extends over the entire country, you can reclassify it to fit your criteria with
no NoData conflicts.
Note: You do not need to type in the individual values for 1 km or 10 km breaks; instead, explore the
“defined interval” option under the classification options.
Note: Remember this is your project, so if you wish to explore other avenues for this data processing,
you can. For example, you could explore using MaxEnt software, which is routinely used for species
habitat modeling (http://www.cs.princeton.edu/~schapire/maxent/). This software does not go
beyond the skills you have learned, and you can use ArcMap to view and analyze its output.
Another idea is to compare the two approaches: Skov and Borchsenius (1997) versus MaxEnt.
Reference: Skov, F. and F. Borchsenius. 1997. Predicting plant species distribution patterns using
simple climatic parameters: a case study of Ecuadorian palms. Ecography 20:347-355.
To help answer this question:
(a) Produce a summary table that shows the specified ranges of environmental variables in which
each species is known to occur.
(b) Produce layout(s) showing potential and probable distribution maps for each plant taxon.
(c) Produce a summary table of how potential distribution varies across the species in terms of total
area versus percentage relative to total country area. Are there patterns among the data for the
species in terms of narrow/broad distributions or few/many points?
(d) Optional: Produce a layout depicting how distribution models might be used to aid in
identification of unknown collections.
EnSt 380: Applications in GIS Final Project – Costa Rica Palms – Page 6
To do this: For a set (n = 10 records) of unknown species within the selected genera (i.e., records
having only a genus name and no specific epithet), use the modeled distributions to help narrow the
set of species that might be represented by each collection.
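Items (c) and (d) come down to counting and lookups once each species has a 0/1 potential grid and a 0-100 probable grid. The plain-Python sketch below is illustrative only and rests on several assumptions:
- All grids share the country mask's shape.
- Every cell has the same area; cell_area_km2 defaults to the handout's ~1 km x 1 km approximation.
- Each unknown collection has already been located in a grid cell.

The species names in the code are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]


def potential_area_summary(
    potentials: Dict[str, Grid], country_mask: Grid, cell_area_km2: float = 1.0
) -> List[Tuple[str, float, float]]:
    """(species, area in km2, % of country) for each potential distribution."""
    country_cells = sum(v for row in country_mask for v in row)
    summary = []
    for name, grid in potentials.items():
        cells = sum(
            1
            for r, row in enumerate(grid)
            for c, v in enumerate(row)
            if v == 1 and country_mask[r][c] == 1
        )
        summary.append((name, cells * cell_area_km2, 100.0 * cells / country_cells))
    return summary


def rank_candidates(probables: Dict[str, Grid], cell: Tuple[int, int]) -> List[Tuple[str, int]]:
    """Species whose probable score at the unknown's cell is above 0, best first."""
    r, c = cell
    scores = [(name, grid[r][c]) for name, grid in probables.items() if grid[r][c] > 0]
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))
```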
II. Estimating Collection Biases by Combining Potential and Probable Species Richness
In this section, answer the following questions:
What are the geographic patterns of species richness for palms modeled in this project?
Where are potentially poorly collected sites in Costa Rica?
Species diversity patterns of groups change across geographic and environmental gradients. To
estimate palm species patterns in Costa Rica using your subset of data, produce species richness maps
based on potential and probable distribution models. For potential species richness, simply combine
the distribution models. For probable species richness maps, combine all probable distributions and
divide resulting grid cells by the maximum (e.g., 100).
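Combining the models is cell-by-cell addition. The following is a minimal sketch. It assumes every species grid has the same shape and uses the 0/1 and 0-100 conventions above. Dividing the summed probable scores by 100 (the maximum score) puts probable richness on the same species scale as potential richness.

```python
from typing import List, Sequence


def potential_richness(potentials: Sequence[List[List[int]]]) -> List[List[int]]:
    """Number of species whose potential distribution includes each cell."""
    n_rows, n_cols = len(potentials[0]), len(potentials[0][0])
    return [[sum(g[r][c] for g in potentials) for c in range(n_cols)] for r in range(n_rows)]


def probable_richness(probables: Sequence[List[List[int]]], max_score: int = 100) -> List[List[float]]:
    """Summed probable scores divided by the maximum score (float result)."""
    n_rows, n_cols = len(probables[0]), len(probables[0][0])
    return [
        [sum(g[r][c] for g in probables) / max_score for c in range(n_cols)]
        for r in range(n_rows)
    ]
```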
Probable and potential species richness maps can then be used to identify areas that have apparently
been poorly collected (at least for these taxa). To do this, examine the ratio of potential to probable
species richness. Low ratio values mean that the potential species richness total is close to the
probable species richness total, suggesting that the region has been well collected. High ratios, on
the other hand, indicate a greater number of species based on potential distributions than based on
probable distributions, suggesting that the region may be poorly collected.
Note: Your probable and potential species richness maps are both integer grids; therefore, you must
convert them to a float when you do the collection bias calculation. To do this, you will just add the
“Float” command (available in the upper-right box in raster calculator) prior to each of your grids, so
that it is [Float (grid name)/Float (grid name)] for the equation. This will allow you to have decimals in
the final collection bias.
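The Float() step exists because integer division would truncate the ratio. The sketch below mirrors the Float(potential) / Float(probable) expression in plain Python, where / already returns a float. As an assumption of this sketch, cells with zero probable richness are left as NoData (None) instead of being divided by zero.

```python
from typing import List, Optional


def collection_bias(
    potential: List[List[int]], probable: List[List[float]]
) -> List[List[Optional[float]]]:
    """Potential / probable richness per cell; None where probable is 0."""
    ratio = []
    for pot_row, prob_row in zip(potential, probable):
        ratio.append([
            float(p) / float(q) if q > 0 else None  # avoid division by zero
            for p, q in zip(pot_row, prob_row)
        ])
    return ratio
```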
To help answer these questions:
(a) Produce a layout showing potential and probable species richness maps to highlight geographic
patterns of species richness. Summarize how these geographic patterns relate to % areas for the
different levels of species richness.
(b) Produce a layout depicting relative collecting "success". Show areas as "well collected",
"relatively well collected", "relatively poorly collected", and "poorly collected".
(c) Produce a table or chart that summarizes the area represented by the 4 collection groups.
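The handout leaves the four class breaks up to you. One defensible option, used here purely as an assumption, is to use quartiles of the valid ratio values. The sketch labels cells from “well collected” (lowest ratios) to “poorly collected” (highest ratios) and tallies the area of each group for item (c). statistics.quantiles requires Python 3.8 or later.

```python
import statistics
from collections import Counter
from typing import Dict, List, Optional, Sequence

GROUPS = (
    "well collected",
    "relatively well collected",
    "relatively poorly collected",
    "poorly collected",
)


def quartile_breaks(ratio: List[List[Optional[float]]]) -> List[float]:
    """Three cut points splitting the valid ratio values into quartiles."""
    values = [v for row in ratio for v in row if v is not None]
    return statistics.quantiles(values, n=4)


def classify(value: float, breaks: Sequence[float]) -> str:
    """Low ratios are well collected; values above the last break are poorly collected."""
    for label, upper in zip(GROUPS, breaks):
        if value <= upper:
            return label
    return GROUPS[-1]


def group_areas(
    ratio: List[List[Optional[float]]], breaks: Sequence[float], cell_area_km2: float = 1.0
) -> Dict[str, float]:
    """Area (km2, assuming equal-area cells) in each collection group."""
    counts = Counter(classify(v, breaks) for row in ratio for v in row if v is not None)
    return {group: counts.get(group, 0) * cell_area_km2 for group in GROUPS}
```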
III. Protected Areas and Palm Richness
In this section, answer the question:
How well do protected areas serve to "capture" palm species richness?
Costa Rica has been viewed as an international model for conservation because a relatively large
proportion of the country is under some form of protection. However, national parks are established
for many reasons (e.g., to protect national borders, watershed protection, etc.) and may not be located
in the most strategic places in terms of biodiversity protection. Although the data set is extremely
small, examine how well these national parks protect the palm species modeled in this project.
To help answer this question:
(a) Answer the following question: What is the average species richness in protected areas versus
nonprotected areas in Costa Rica? Compare this to the proportion (% of area) of each species
richness level in protected versus nonprotected areas.
(b) Produce a layout displaying protected areas of Costa Rica and identifying sites that might warrant
protection. What is the total of the new area that would be added and how does this capture
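Item (a) compares richness inside and outside the protected-area polygons. In ArcMap this would typically be a zonal statistics step once the polygons are rasterized. The plain-Python sketch below produces the same summary. It assumes a hypothetical 0/1 protected grid aligned with the richness grid, with None for NoData.

```python
from collections import Counter
from typing import Dict, List, Optional


def _cells(richness: List[List[Optional[int]]], protected: List[List[int]], flag: int) -> List[int]:
    """Richness values of cells whose protection flag equals flag."""
    return [
        v
        for r_row, p_row in zip(richness, protected)
        for v, p in zip(r_row, p_row)
        if v is not None and p == flag
    ]


def mean_richness(richness, protected, flag: int) -> Optional[float]:
    """Average richness in protected (flag=1) or unprotected (flag=0) cells."""
    values = _cells(richness, protected, flag)
    return sum(values) / len(values) if values else None


def percent_area_by_level(richness, protected, flag: int) -> Dict[int, float]:
    """% of protected (flag=1) or unprotected (flag=0) area at each richness level."""
    values = _cells(richness, protected, flag)
    counts = Counter(values)
    return {level: 100.0 * n / len(values) for level, n in sorted(counts.items())}
```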
IV. Examining the Effect of Road Accessibility on Collection Success
Is there a relationship between apparent collection success and accessibility of the area?
Collection success (as defined in Section II above) might be influenced by a variety of factors, including
accessibility, interest of the collector, habitat condition, distance from urban areas, and political
stability, among others. For example, natural areas closer to roads are hypothesized to be better
collected than areas farther from roads. Moreover, collections might be concentrated in protected areas
of Costa Rica, since large anthropogenic influences on the landscape have restricted most natural
habitats to these sites. Explore collection effort as a function of accessibility to roads using areas
defined by the ratio of potential to probable species richness.
To help answer this question:
(a) Provide an answer to the following question: Does the average distance from a road differ
among the collection effort groups defined above?
(b) Produce a layout showing the relationship between collection effort and accessibility from roads.
Define accessibility using 4 classes:
High: within 500 m of roads
Medium: between 500 m and 1 km of roads
Low: between 1 km and 5 km of roads
Very low: more than 5 km from roads
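The accessibility classes translate directly into distance thresholds. The sketch assumes distances in metres from the nearest road, for example from a Euclidean Distance raster built on the roads layer. It treats each upper bound as inclusive, a boundary choice the handout does not specify. It also averages road distance per collection-effort group for item (a).

```python
from collections import defaultdict
from typing import Dict, List, Optional


def accessibility_class(distance_m: float) -> str:
    """Map distance to the nearest road (m) onto the four handout classes."""
    if distance_m <= 500:
        return "High"
    if distance_m <= 1000:
        return "Medium"
    if distance_m <= 5000:
        return "Low"
    return "Very low"


def mean_distance_by_group(
    groups: List[List[Optional[str]]], distances: List[List[Optional[float]]]
) -> Dict[str, float]:
    """Average road distance (m) for each collection-effort group label."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for g_row, d_row in zip(groups, distances):
        for group, dist in zip(g_row, d_row):
            if group is None or dist is None:
                continue  # skip NoData cells
            totals[group] += dist
            counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}
```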
V. Planning for the Future
Discuss potential reasons for the patterns of species richness and species distributions that you
observed in Costa Rica. Discuss the efficacy and limitations of the distribution models you produced.
Based on your own opinions and analyses, what recommendations would you give regarding further
collection efforts in Costa Rica and strategies for protecting both individual palm species and palm
"hot-spots" (areas with high palm species richness)?
EnSt 380: Applications in GIS Final Project – General – Page 1
APPLICATIONS IN GEOGRAPHIC INFORMATION SYSTEMS (GIS):
Due (BOTH document and presentation materials): Monday, April 26th at the start of class.
This is true for EVERYONE, even if you are not presenting on the first day of class. No hardcopies
The project is worth 30 points (20 points for the presentation and 10 points for the paper) and
will be weighted to be worth 40% of your final grade.
Note: ALL presentations/documents are due on April 26th AND EVERYONE is expected to attend
all three presentation dates. So, please plan your holiday travels accordingly!
Please name all documents as LASTNAME-Presentation and LASTNAME-Paper. All maps, figures,
and charts should be included in the paper rather than submitted separately. Your final presentation
and written report must be copied to MyCanvas (https://MyCanvas.wustl.edu) by the start of class
(April 26th).
The scope of the analyses will depend on the data and questions that you work with. No two
projects will be the same, even if they are working with the same data and the same set of
questions. So, it is impossible to list a minimum number of tools/analyses that need to be used!
The goal of this project is to show that you are able to take your analyses beyond those
conducted with the earlier assignments/projects. That is not to say you cannot follow similar
methods; however, the level of interpretation MUST go beyond any of these previous
assignments. For example, if you use a similar technique you must take the result and conduct a
secondary analysis or do a completely separate analysis. All analyses must go beyond a visual
assessment to include numerical summaries of the results, comparisons of the results against
other data, and a conclusion/interpretation of your findings. Please use the in-class work
sessions (or their office hours) to check in with the professor/TA to make sure things are on-
track and “enough” is being done for the analyses you are doing.
There will be in-class work time provided for the Final Project. Just like regular class
periods, attendance during these work sessions is considered mandatory, and you are expected to
work on this project rather than on other assignments. This time is meant to help you achieve the
level of analyses that is expected in the Final Project.
Remember, it is 40% of your grade, so take advantage of this time and the presence of your
professor/TA. This is the time to make sure your analyses and interpretations are going to the
level that is expected!
Paper: You will conduct the necessary spatial analyses using ArcMap and then prepare an 8-10-page
report (double-spaced, 1-inch margins, 12-point font, and scientific format) that
summarizes your findings. Graphs, tables, and maps do not count toward the page limit. The Final
Project paper should be in scientific format, which follows the format of the papers we have
read during the semester. Make sure you cite sources where needed and use proper scientific
format (see journal articles for examples).
General guidelines for the paper:
▪ Introduction, where questions are raised and the conceptual model is presented. This should
include information that sets up the project you are working with, including the questions you are
posing and the general approach you are taking. You can also highlight the types of data you used,
but save the specifics of the data sources for the Methods.
▪ Methods, where data variables are described, including sources, and the specific analytical
plan is explained. This section should follow the scientific papers we have been reading in class
in terms of detail and information included. It should highlight the key tools you used and
how you worked with the data without providing details on every click you made. For
example, if applying the Euclidean Distance tool, what was your cell size, extent, and reclass
breaks versus range? It should NOT be about the colors you used on your map, labels for
the classes, or the number of data frames in your layout. When talking about your data, do
not use the names of your files (as these are assigned by the user and mean nothing to the
reader) but instead the source of the data and what type of data is covered in it (e.g.,
description, year). Using tables may help you organize your thoughts and ideas, especially
the reclassification and data types.
▪ Results/Discussion is where you need to explain what you found and how you interpret this
in the larger picture. Results are where you have a summary of what you found when you
applied your methods and are often in the format of tables/figures. Discussion is where you
explain these results and interpret them in the bigger picture and questions you are looking
to address. Subheadings can help you organize your results/discussion if there are multiple
questions you are addressing. These subheadings often follow subheadings you might have
used in the methods. Because no two projects are the same, it is impossible to provide a
clear outline of what to “write” but use the scientific papers we read in class as a guideline.
Also, use the in-class time to work with the professor/TA on your questions/ideas/outputs.
We will not “write” the paper for you but will try to provide you with guidance to
interpreting your results, which is the Discussion section of a paper.
When you work through your analyses, results, and discussion, make sure that when you talk about
the pros/cons of your method, you do not “throw out” your analyses/results when discussing the
cons. Remember to talk about the data you have, and you can even talk about data you want, but
put this in a framework that supports your results rather than discounting the quality of what you
did. You want to highlight the reality and limitations of things, but you do not want this to be the
“take-home” story, in that what you did was worthless and we should pay no attention to anything
you found. Does that make sense? For example, your data may be “older”, but that does not mean
they are bad. They might be all that is available (e.g., 2010 census data in 2019), and you can use
other years and general patterns to predict future trends from these data.
▪ Literature Cited should include a list of the literature you refer to in the paper and should
follow the scientific format used in the papers we read in class. For example, every
citation should have a year, and citations with multiple authors are shortened to “et al.”. We
are not using footnotes, and web citations should be kept to a minimum. In addition, direct
quotes should be kept to a minimum or, better yet, avoided completely, as in most cases you
can put the idea in your own words. If you use outside sources for information, though, they
must be cited to avoid plagiarism.
• You will also make a 10-minute presentation on April 26th, April 28th, or May 3rd to
communicate your findings to your colleagues. The Final Project presentation is worth more
(2x) than the paper because this is where you will not only talk about the project
(introduction, methods, results/discussion) but will ALSO show your mapping skills. You
should not focus on a “Final Map” but instead show your various analyses plus your results
in a series of maps. The skills you have allow you to show things visually and support it with
things you discuss. Remember that maps are a great way to do a quick summary of your
methods, results, and conclusions; so, take advantage of the skills you developed in the first
assignments. Presentations will be timed and stopped when 10 minutes is reached whether
or not you are done. A good guideline is 1 slide/minute with more or less depending on how
much you pack into each slide.
General guidelines for the presentation:
• Follow the scientific format that you used in the paper: Introduction, Methods, and Results/Discussion.
• Remember, you are an “expert” at making maps now, so use this to your advantage to
communicate a lot of information about your methods and results effectively! Showing only
a “final map product” with no overview of the data and methods that went into your
analysis typically results in a confusing summary of overall approach, methods, and analyses.
I would recommend NOT doing this approach but instead showing the various steps you
used including data, methods, and outputs.
• When including your maps in the presentation, it is best to save them as images (e.g.,
*.jpg or *.bmp) rather than *.pdf, because PDFs typically appear blurry in PowerPoint.
• You will be timed and STOPPED at 10 minutes. So, if you are only partially through your
presentation, that is ALL that will be included in your grade, NOT the slides that were not
covered. In addition, you will LOSE points if you are not able to complete your presentation
in the allotted 10 minutes, which ends up being a double mark against you because you will
not be able to finish explaining your project and you will also receive the time penalty! This
stringent rule is meant to keep things realistic, as time limits are enforced at conferences and
other presentations! I
would recommend using a guide of 1 minute per slide (so ~11 slides with 1 being the title
slide) and practicing your talk before class! This practicing is essential to make sure you take
advantage of the full 10 minutes versus cutting it too short and not covering enough.