Modeling Species Distributions in R

José Hidasi-Neto

Post-Doc em Ecologia e Evolu??o | Desenvolvedor de Jogos | Data Analyst | Unity, Unreal | Game Designer | Power BI | R, Python, C#, C++ | Brasileiro-Húngaro |

发布日期: 2024年9月23日

Hello there! My name is José Hidasi Neto and I'm here with a both theoretical and practical class in Ecological Modelling. Share this with friends, students, advisors, and let's go.

Summary:

First Steps
Modelling the Distributions of Species and Communities under Distinct Climate Scenarios
BIOENSEMBLES: The Usage of Ensembles in Ecological Modelling

1. First Steps

Today we'll learn a bit about species distribution modeling in R. In this section, we'll see how to overlay species occurrences into a map, how to import species occurrence data from the internet (from GBIF network), how to?overlay abiotic conditions (like lower or higher temperatures) into a map (i.e. how to work with raster files), and how to model (and predict) a species distribution according to BIOCLIM?method. First, let us understand what is to "model a species distribution". We must first understand that this distribution we are talking about is a species' potential distribution (where we can possibly find some individuals of a specific species). Here I have an image to exemplify the idea of this process of distribution modeling:

We have a region with different types of habitats (Mountain, Grassland and Woodland). On the left we can observe that researchers found a species' individuals mainly in mountains, but also in grasslands (in lower proportion). Based on this known occurrence, we can predict where we can find individuals of this species using a mathematical representation of its realized ecological niche. In other words, we can model the species' potential distribution. On the right we have this distribution, where regions in darker gray are the ones with higher possibility of finding individuals of the studied species. You can observe, for example, that individuals are not expected to occur in woodland. Indeed, no individuals were previously found in this type of habitat.

Everything seems great, right? And it indeed is. However, there are lots of methods to predict where a species can potentially occur. Here we'll learn first how to use BIOCLIM.

To know more about this subject, see: https://onlinelibrary.wiley.com/doi/10.1111/ddi.12144/pdf

For now on follow the next steps in R. I'll try to make things as easy as I can.

### After entering R, install and load the following packages:

install.packages("raster")
install.packages("rgdal")
install.packages("dismo")
install.packages("XML")
install.packages("maps")

library(raster)
library(rgdal)
library(dismo)
library(XML)
library(maps)

## MAPPING SPECIES OCCURRENCES # Download these two files (locations for two imaginary species: Great insectus and Greater insectus). Then, open them in Excel to see how they are structured (keep in mind that this is not the only possible structure). Optionally, you can add more lines into each file with new latitudinal and longitudinal values for each species.

Download:

https://www.mediafire.com/view/7imv28b0h6rq1gh/Great_insectus.csv

https://www.mediafire.com/view/xsbbbrllfkc9m4b/Greater_insectus.csv

# Load the 'Great_insectus' table
great<-read.csv(choose.files())

# Show a world map
map()

# Based on the loaded 'Great_insectus' table, locate where individuals were previously found. To do that, you must plot species occurrences according to their longitudinal ("lon") and latitudinal ("lat") values. Like this:

points(great$lon,great$lat,col="red",pch=19,xlab="Longitude",ylab="Latitude")

# Let us now zoom into South America (where most individuals are found). To do that, you must first plot species locations specifying longitudinal ("xlim") and latitudinal ("ylim") limits of South America.

plot(great$lon,great$lat,xlim=c(-90,-25),ylim=c(-60,20),col="red",pch=19,xlab="Longitude",ylab="Latitude")

# Then, you add the map image
map(add=T)

# Ok, now load the 'Greater_insectus' table
greater<-read.csv(choose.files())

# You can repeat what we did before but using the Greater insectus species' locations. You can even plot both species together by first plotting locations for Great insectus and then adding locations for Greater insectus:

# Great insectus
plot(great$lon,great$lat,xlim=c(-90,-25),ylim=c(-60,20),col="red",pch=19,xlab="Longitude",ylab="Latitude")

# Add Greater insectus
points(greater$lon,greater$lat,col="blue",pch=19,xlab="Longitude",ylab="Latitude")

# Plot map image
map(add=T)

# And you can even create a legend specifying what color corresponds to each species

legend("bottomright",legend=expression(italic("Great insectus"),italic("Greater insectus")),pch=16,col=c("red","blue"),box.col="grey",bg="grey95",text.col="black",title="Species")

## IMPORTING SPECIES OCCURRENCE DATA FROM THE INTERNET

# This is as easy as downloading an excel table from the internet and importing it in R. Actually, this is exactly what we are going to do: download and import a table with species locations (just as we did before). However, we are going to do that for any real species by using only one simple function: "gbif()".

# Let us use Eudocimus ruber as an example. If you want to use another species, just change "genus" and "epithet" names in the function's arguments.

ruber<-gbif("Eudocimus","ruber",download=T,geo=T)

# Doing that, we loaded a table from the internet (from GBIF network) with some locations for Eudocimus ruber. We can now plot its locations (just as we did before for Great insectus and Greater insectus).

# As you know, use the following code to plot its locations. Here we'll use South America, but you can specify other latitudinal and longitudinal limits if you want. You can, for example, see if this bird occurs in your country =].

plot(ruber$lon,ruber$lat,xlim=c(-90,-25),ylim=c(-60,20),col="red",pch=19,xlab="Longitude",ylab="Latitude")

map(add=T)

## MAPPING ABIOTIC CONDITIONS (WORKING WITH RASTERS/GRIDS)

Now we are going to understand how we can work with abiotic conditions. In that way, we can, for example, locate regions with higher or lower temperatures or altitudes. First, we must know that people usually use "raster files" or "grids" to map all sorts of things (not only abiotic conditions) when analyzing spatial patterns in any software (R, ArcGIS...). But what are these so-called "rasters"?

This is a raster file. In practice, it's a matrix you overlay into a map. A pixel (sort of a spatialized matrix cell) has a specific value and it's located within latitudinal and longitudinal ranges. If you choose any coordinate within the ranges of one specific pixel, these coordinates will all present the same values.

Ok, so this is what happens when you ovelay an empty raster or grid into the map of Brazil. Nothing great, just a spatialized matrix or whatever, right?

However, here we have a world map overlayed with a raster/grid showing how many species of a specific bird family are present in regions. You can observe that regions with higher richness have hotter colors (e.g red). Notice that in this example we are not mapping an abiotic condition. You can learn how to do this type of richness map by reading my other post:?

Before we enter R again, we must download some of these rasters so that we're able to map abiotic conditions and finally model a species distribution. We are going to download some rasters with current abiotic conditions from WorldClim website. Download "Bioclim" and "Altitude" under 10 arc-minutes under Generic grids.

WorldClim 1.4 — WorldClim 1 documentation

***Note that WordClim 2 is already available, so use the new version if you are going to use WorldClim in your study.***

Before you google it, Lower arc-minutes = smaller pixel = higher resolution. Also, to know what are Bioclim variables (please, don't confuse them with the BIOCLIM modeling method), see: https://www.worldclim.org/bioclim. Now extract each .zip file into different folder (like 'bioclim' folder and 'altitude' folder).

### Now that we downloaded some rasters, we can enter R again.

# Let us load the "altitude raster" in our "altitude folder".
altitude<-raster(choose.files())

# You can also change the color palette so that colors vary from blue to red. More specifically, you can make colors vary among 10 categories of colors, from a darker blue to a darker red.

palette<-colorRampPalette(c("blue","white","red"))
plot(altitude,col=palette(10))

# We can also zoom our raster to show only South America. To do that, we can crop our raster according to our desired latitudinal and longitudinal limits.

# We first define these limits.
ext<-extent(-90,-25,-60,20)

# Then we crop the raster according to them.
altmod<-crop(altitude,ext)

plot(altmod,col=palette(10))

We can now observe which are the lowest and highest regions of South America.

# Let's see where our?Eudocimus ruber?species occurs to see if it's usually found in lower or higher regions.points(ruber$lon,ruber$lat,col="black",pch=16)

# We are now going to combine lots of rasters into a single file. People call this type of file as a "stack of rasters". To do that, you must first change your working directory to be a folder with lots of rasters. We can use our 'bioclim' folder (with our Bioclim variables).

# Change your working directory. Choose your 'bioclim' folder.
setwd(choose.dir())

# Now we'll make a list of all files in that folder that have the '.bil' extension. This is a commonly used raster extension.
rasters<-list.files(pattern=".bil")

# Then, we combine all these rasters into a single file.
stackrasters<-raster(rasters[1])
for(x in rasters){
stackrasters<-stack(stackrasters,raster(x))
}

# We can even plot some of these rasters.
plot(stackrasters,main=NA)

# Let us now crop all rasters so that they all zoom into South America. First, we define our latitudinal and longitudinal limits.
ext<-extent(-90,-25,-60,20)

# And then we crop all rasters inside the 'stack'.
southamer<-crop(stackrasters,ext)

# We can see some of these cropped rasters
plot(southamer,main=NA)

## MODELING SPECIES DISTRIBUTIONS (BIOCLIM)

# We'll use the previously cropped stack of rasters to model the distribution of Eudocimus ruber outside our previously defined South American region.

# To create our BIOCLIM model, we must first define the coordinates of the species occurrences.
coord<-data.frame(ruber$lon,ruber$lat)

# Then, we use the stack of rasters and the coordinates to create the model for Eudocimus ruber. Doing that, we have a model that explains where we can usually find individuals of that species in the previously specified region (in our example, South America).
bio.ruber<-bioclim(southamer,coord)

# Based on our model, we can predict where individuals of our species are more likely to be found in South America.
map.ruber<-predict(bio.ruber, southamer)
plot(map.ruber)
map(add=T)

Colored regions are more suitable to the occurrence of?

# However, a better way to observe these colored regions is to reclassify their values so that colors are better defined or easier to interpret. For example, let us make all values between 0 and 0.05 to become 0.25; all values between 0.051 and 0.1 to become 0.5; and all values between 0.11 and 0.4 to become 1.

bio.rec<-reclassify(map.ruber,c(0,0.05,0.25, 0.051,0.1,0.5, 0.11,0.4,1))
plot(bio.rec)
map(add=T)

Now green areas are the ones more suitable to the occurrence of?

# Let us use our model (created using individuals and abiotic conditions from South America) to find out where it would be possible to find Eudocimus ruber in Africa based on its abiotic conditions.

# First we define the latitudinal and longitudinal limits of Africa.
ext.africa<-extent(-25,50,-40,40)

# Then we crop all our rasters in our stack file based on these limits.
africa.rasters<-crop(stackrasters,ext.africa)


# Finally, we predict where the most suitable areas for Eudocimus ruber in Africa are. To do that, we must use our cropped rasters for Africa (africa.rasters), but predict where to find the species according to our model created when using South America individuals and cropped rasters (bio.ruber).

map.ruber.africa<-predict(bio.ruber,africa.rasters)

# And now we can reclassify values to change colors on the map (just as we did in the last image).

bio.rec.africa<-reclassify(map.ruber.africa,c(0,0.05,0.25, 0.051,0.1,0.5, 0.11,0.4,1))
plot(bio.rec.africa)
map(add=T)

As you can see, according to our BIOCLIM model, Central Africa would have suitable habitats to?

We can use methods of distribution modeling to study all sorts of things (climate change, epidemiology, community assembly, etc...). Just think of a question and try to answer it by using what you learned here =]. Your next step is to search about "training" and "testing" data sets, and model accuracy/overfitting. In other words, we must be able to observe if our models are good at predicting species occurrences.

2. Modelling the Distributions of Species and Communities under Distinct Climate Scenarios

We are going to see how to use R and maxent together in order to model the effects of climate change on almost any species you want. We'll go from the basics to the more specific concepts. In general, we will download a georeferenced occurrence list of a species, then we will get the bioclimatic environmental variables (such as temperature) for a region of the world, then we will use both species and bioclimatic information to do MaxEnt model, to be able to predict the habitat suitability of our studied species in the present and in the future. At the end of the post, you will understand studies such as this one: https://www.gjesm.net/article_28776.html. Also, you will be able to do your own studies!

Modelling climate change effects using Maxent modelling technique.

The first thing you must do is to have an R version installed on your computer. So, enter the following website and download the Microsoft R 3.5.3. Note that we will install the 64 bits version, but you can install the x86 if your machine does not handle the 64 bits.

https://mran.microsoft.com/open

Now, download RStudio from the following website.

https://rstudio.com/products/rstudio/download/

Now, open RStudio on your computer, go to Tools and Global Options. Then, see if Microsoft R 3.5.3 is being used as your R version. If so, let's continue.

Now, go to File, then New Project, and create a new folder for our project. You can name it ClimateChange or something like that. We can now begin to code!

First, let us install and load the following packages:

install.packages("dismo")
install.packages("maps")
install.packages("raster")

library(dismo)
library(maps)
library(raster)

We will download the occurrences of a species. In our example it will be the Maned Wolf (Chrysocyon brachyurus).

guara<-gbif("Chrysocyon","brachyurus",download=T,geo=T)

# Let us see the occurrences:

plot(guara$lon,guara$lat,col="red",pch=19,xlab="Longitude",ylab="Latitude")
map(add=T)

Note that there are two very strange locations for the maned wolf. They are certainly artifacts. Let us define an extent to which the occurrences can really exist. The minimum longitude will be -90, the maximum longitude will be -25, the minimum latitude will be -45, and the maximum latitude will be 0.

guara<-guara[which(guara$lon > -90 & guara$lon < -25),] 

guara<-guara[which(guara$lat > -45 & guara$lat < 0),]

plot(guara$lon,guara$lat,xlim=c(-90,-25),ylim=c(-45,0),col="red",pch=19,xlab="Longitude",ylab="Latitude")

map(add=T)

Now, let us download the shapefile of Brazil, which we can use in the future.

my0 = getData('GADM', country="BRA", level=0)
plot(my0)

# Ok, now we will download the bioclimatic variables from worldclim (https://www.worldclim.org/data/bioclim.html). We can do that in R using getData():

climate=getData("worldclim", var="bio",res=2.5)

# We can plot the first bioclimatic variable, which is temperature, using plot():

plot(climate$bio1, main="Annual Mean Temperature")

MAXENT

We can now use the MaxEnt modelling. It is an algorithm to get a function that is able to predict habitat suitability for species depending on bioclimatic variables. First, let us prepare maxent in R. Try to install the following packages:

install.packages("Rtools")
install.packages("installr")

# You will probably get stuck trying to install Rtools. So, use installr to install Rtools:

library(installr)
install.Rtools()

Choose version 3.3 and let it download and install. A window will probably appear. Click to advance and install everything. If the window to choose version does not appear, enter the website the warning message tells you to access. You can download and install Rtools from there.

Now that we have Rtools, we don't need to load installr again. We can now download and install Java in our computer in https://www.java.com/en/download/. Pay attention, because you need to install the Java version corresponding to your R version (in the beginning of the post we said 64 bits or x86). After you install the correct version of Java, install and load rJava package in R.

install.packages("rJava")
library(rJava)

Finally, open the following website and download maxent: https://biodiversityinformatics.amnh.org/open_source/maxent/. Now get the .jar file (the bigger file) and place it in Documents/R/win-library/3.5/dismo/java.

Ok, we can continue our coding in R. We will organize the longitude and latitude values in a new file.

guarapoints=guara[,c('lon','lat')]

# Now let us divide this file into 4 parts. 75% of the data will be used to train the model (to create the mode), and 25% will be used to test if it is a good model.

group <- kfold(guarapoints,4)
pres_train <- guarapoints[group!=1,]
pres_test <- guarapoints[group==1,]

# Next, we will get the climatic data only to a part of South America (remember the minimum and maximum coordinates we talked before):

ext = extent(-90,-25,-45,0)
southamerworldclim=crop(climate,ext)

# Ok, we can now use the maxent modeling. Note that we will use our bioclimatic variables (cropped for our desired extent) and the 75% of the occurrence data which will train the model.

xm <- maxent(southamerworldclim,pres_train)
plot(xm)

In the graph above we can see the contribution of each bioclimatic variable in training the model. We shall now create pseudo-absences (places were we did not find the species; we call pseudo because no one really knows if someone recorded the absence in any area). We will use randomPoints() to create 1000 pseudo-absences. Then, we will also divide these data into 5 parts.

backg = randomPoints(southamerworldclim, n=1000, ext=ext, extf=1.25)
colnames(backg) <- c("lon","lat")
group=kfold(backg, 5)
backg_train <- backg[group!=1,]
backg_test <- backg[group==1,]

Ok, we can evaluate the model. In other words, we can see if the model is good or bad. To do that, we will use the occurrence data for testing (25% not used for training the model), the absence data for testing (also 25%), the maxent model, and the bioclimatic variables.

e = evaluate(pres_test, backg_test, xm, southamerworldclim)
plot(e,'ROC')

The greater the AUC value is from 0.5, the better is our model. Our AUC was 0.981, so nothing to worry about. Now, based on the Maned Wolf occurrences we have, we can see the habitat suitability for it in South America. We do that using the predict() function.

p <- predict(southamerworldclim, xm, ext=ext, progress='')
plot(p, main="Maxent, raw values - Present")

Green areas are the best (near to 1) for the occurrence of Maned Wolf. Now, let us see if the habitat suitability will change based on a relatively middle scenario (slowly decreasing carbon emissions) called RCP45. To get the data for the year 2070:

futbio<-getData('CMIP5', var='bio', res=2.5, rcp=45, model='MI', year=70)

Then, use the same names we used for creating our maxent model, the use predict function using the future bioclimatic variables. Finally, we have our new habitat suitability for the year 2070 (we can use our Brazil shapefile - my0 from the beginning of the post - to see how the habitat suitability of the Maned Wolf is compared to Brazil's region):

领英推荐

Deep Dive: Impact AI on Marine Ecology. Part 1.

Amazinum 1 年前

Introducing Carto: A New AI Tool for the Dutch…

Dutch Caribbean Nature Alliance 4 个月前

AI Ecology vs AI Marketplace

DEGA 5 个月前

names(futbio)<-names(climate)
pfut<-predict(futbio, xm, ext=ext, progress='')
plot(pfut, main="Maxent, raw values - Future")
plot(my0,add=T)

Comparing the habitat suitability for both present and future, we can see that best habitats will no longer be inside Brazil. The maned wolf will have to go South in order to follow climate changes. Will countries be ready for these shifts?

You now know the basics of how to use MaxEnt and how to model the effects of climate change using R!

Ok, let us continue our studies on modelling the effects of climate change using R. This time we will model the effects of climate change on whole communities.

First, we will make a function that automatically creates habitat suitability maps and, if the value of habitat suitability is greater than 0.5, the value of presence of that species becomes 1 in a grid cell. Here is the function. Just copy and paste into R:

changingworld<-function(gen, epi, climate, climatefocus, futclimatefocus){
listdata<-list()   
spdata<-gbif(as.character(gen),as.character(epi),download=T,geo=T)
sppoints<-spdata[,c('lon','lat')]   
group <- kfold(sppoints,5)   
pres_train <- sppoints[group!=1,]   
pres_test <- sppoints[group==1,]   
xm <- maxent(climate,pres_train)   
backg = randomPoints(climate, n=10000, ext=ext, extf=1.25)
colnames(backg) <- c("lon","lat")   
group=kfold(backg, 5)   
backg_train <- backg[group!=1,]   
backg_test <- backg[group==1,]   
e = evaluate(pres_test, backg_test, xm, climate)   
listdata[[1]]<-e@auc   
p <- predict(climatefocus, xm, progress='')   
presoccur<-ifelse(values(p)>0.5,1,0)
values(p)=presoccur
listdata[[2]]<-p   
pfut<-predict(futclimatefocus, xm, progress='')   
presoccur<-ifelse(values(pfut)>0.5,1,0)
values(pfut)=presoccur
listdata[[3]]<-pfut
return(listdata)}

Now, download and extract the border of Cerrado biome. We will use this later. It is from https://terrabrasilis.dpi.inpe.br/downloads/.

https://www.mediafire.com/file/p4exj781j3p6t5r/cerrado_border.zip/file

Also, download the cerrado species list (123 species, from https://www.perspectecolconserv.com/en-climate-change-will-drive-mammal-articulo-S2530064418301378, excluding species with no data):

https://www.mediafire.com/file/ywg7f2uzh01l9tn/MammalsCerrado.csv/file

Ok, you can now follow the codes:

library(rgdal)
library(dismo)
library(maps)
library(raster)
library(rJava)
shape <- readOGR(choose.files(), "limite_cerrado") # these are the borders of cerrado
sp<-read.csv(choose.files(),sep=";") # this is the species list
climate=getData("worldclim", var='bio',res=2.5)

ext = extent(-90,-30,-60,0) # extent of south america (you can remove these two lines if you want, but it will take more time to analyze)
climate=crop(climate,ext)

futbio<-getData('CMIP5', var='bio', res=2.5, rcp=85, model='MI', year=70)
crs(shape)=crs(climate)
names(futbio)<-names(climate)
climatefocus=crop(climate,shape)
futclimatefocus=crop(futbio,shape)
climatefocus<-mask(climatefocus, shape)
futclimatefocus<-mask(futclimatefocus, shape)

Now, we will use the function we made on each species, generating occurrences for each of them for both the present and the future:

listspecies<-list()
for(i in 1:length(sp[,1])){
listspecies[[i]]<-changingworld(sp[i,1], sp[i,2], climate, climatefocus, futclimatefocus)}

You can let R work for some time. Each species may take some minutes. You can even stop it after some hours if you want, so that you can continue the tutorial.

val<-values(listspecies[[1]][[2]])

val<-ifelse(val>0,0,0)
present<-val
future<-val
for(i in 1:length(sp[,1])){
present<-present+values(listspecies[[i]][[2]])
future<-future+values(listspecies[[i]][[3]])
}

We now can create maps for the present and the future with the occurrence of species.

presraster<-listspecies[[1]][[2]]
values(presraster)=present
plot(presraster)

Distribution of present-day Cerrado mammal species in the present:

futraster<-listspecies[[1]][[2]]
values(futraster)=future
plot(futraster)

Distribution of present-day Cerrado mammal species in the future (year 2070 in a scenario of high concentration of greenhouse gases):

You can see that climate change does serious changes in the occurrence of present-day species. Observe that invasive species (species that may come to cerrado in the future) are not considered in our study. We could also make some auxiliary analyzes to know if the more sparse distribution of species is due to small rodents, and if they follow the North of Cerrado because of the still relatively great concentration of native vegetation. For example, if we analyze the Giant Anteater, we can see that its occurrence will decrease in the Cerrado:

plot(listspecies[[69]][[2]])

Presence of Giant Anteater today:

plot(listspecies[[69]][[3]])

Presence of Giant Anteater in the future:

Lots of things to further analyze!

3. BIOENSEMBLES: The Usage of Ensembles in Ecological Modelling

Now I will show you how to use the "sdm" R package to analyze the distribution of species or even whole communities. The "sdm" package was made by two main authors: Babak Naimi and Miguel Bastos Araújo . We'll first see how to model species distributions, and how to create ensembles (we'll see what these are). Then we will see how to analyze whole communities in order to see how climate change will affect them.

Ok, now let's go to R!

# Well, first thing you have to do is install and load "sdm" package. Note you only have to install it one time. If you open R again later, you can only load it.

install.packages("sdm")
library(sdm)

# You can now use the installAll() function to install all associated packages. You only need to use this function once.

installAll()

# Now, as we did in the post mentioned at the beginning, we will use the Maned Wolf as an example. First, search for its occurrences.

install.packages("mapview")
library(mapview)
library(dismo)
library(dplyr)
library(tidyr)

guara<-gbif("Chrysocyon","brachyurus",download=T,geo=T)

# Now we can see the basis of record for the observations of the Maned Wolf occurrences. You can use this information to filter your data.

table(guara$basisOfRecord)

# For example, we can use use the "filter" function to filter occurrences that are from human observation.

guara<-filter(guara, basisOfRecord %in% c("HUMAN_OBSERVATION"))

# Now only maintain lon and lat coordinates:

guarac<-select(guara,lon,lat)

# And remove occurrences with no lon and lat:

guarac<-drop_na(guarac)

# Create a "species" variable. We will use this later. It is just a column full of 1.

guarac$species<-1

# Now transform these occurrences into a spatial object:

coordinates(guarac)<- c("lon","lat")

# Now download the climatic data from bioclim! The resolution is 2.5 degrees. Don't forget that you must have read the previous post!

library(raster)

climate=raster::getData("worldclim", var="bio",res=2.5)

# Let us plot our occurrences over the temperature climatic data.

plot(climate[[1]])

points(guarac,cex=0.5, pch=16)

# Now let us get a part of the map to work on. Draw a square to get the whole South America.

e<-drawExtent()

# Then, crop climate and occurrence data according to this focus of study:

climatefocus<-crop(climate,e)

guarac<-crop(guarac,e)

# Plot both:

plot(climatefocus[[1]])

points(guarac,cex=0.5,pch=16)

# Remove variables with multicolinearity from climate data:

library(usdm)
ex<-raster::extract(climatefocus,guarac)
v<-vifstep(ex)
climatefocus<-exclude(climatefocus,v)

# Now let us organize our data:

d<- sdmData(species~., guarac, predictors=climatefocus, bg = list(method="gRandom", n=10000))

# We can now fit the models we want. First let us get a list of methods we can use. There is a lot of possible methods, including MAXENT.

getmethodNames()

#Now fit models. We will fit glm and maxent models. It will take some time. Note that ncore represents the number of cores to be used, so it depends on you computer.

m<-sdm(species~., d, methods=c("glm",? "maxent"), replication=c("sub","boot"), test.p=30, n=3, parallelSettings=list(ncore=3,method="parallel"))

# Call the m object. You will have a summary of model performances, including AUC (we talked about it on the post mentioned in the beginning of this post). You can also use gui function to see a cool interface for model performance analysis.

m
gui(m) # if it does not stop running after you close the window, press ESC

# Now we can predict the habitat suitability values. It will take some time, as each model and its replications will generate predictions.

pred1<-predict(m,climatefocus)

# And we can generate an ensemble, which combines predictions into a single object, weighted by the TSS values, which are calculated using a cross-validation procedure (train vs test data).

en1<-ensemble(m, climatefocus, setting=list(method="weighted",stat="tss",opt=2))
plot(en1)

# Now we can say that values of habitat suitability>0.5 represent an occurrence of the species, so... Maned Wolf can more certainly occur, at the present-day, in this place in green.

presoccur<-ifelse(values(en1)>0.5,1,0)
values(en1)=presoccur
plot(en1)

#Ok, now let us get the post mentioned at the beginning of this one. We will make things automatic. We will generate community species richness for both the present and the future, but using the "ensemble" technique. This is what is made in a software called BIOENSEMBLES, but it is not released yet in a faster language than R (but a version of it is already in use in some labs from Spain, Portugal, Brazil, and maybe some others).

# Let us make first a changing world 2.0 function. Copy and paste into R.

changingworld2<-function(gen, epi, climate, climatefocus, futclimatefocus, methods=c("glm")){
listdata<-list() 
spdata<-gbif(as.character(gen),as.character(epi),download=T,geo=T)
spdata<-spdata[,c('lon','lat')]
spdata<-drop_na(spdata)
spdata$species<-1
coordinates(spdata)<- c("lon","lat")
spdata<-crop(spdata,climate)
ex<-raster::extract(climate,spdata)
v<-vifstep(ex)
climatefocus<-exclude(climate,v)
d<- sdmData(species~., spdata, predictors=climate, bg = list(method="gRandom", n=10000))
m<-sdm(species~., d, methods=methods, replication=c("sub","boot"), test.p=30, n=3, parallelSettings=list(ncore=3,method="parallel"))
listdata[[1]]<-m 
en1<-ensemble(m, climatefocus, setting=list(method="weighted",stat="tss",opt=2)) 
en2<-ensemble(m, futclimatefocus, setting=list(method="weighted",stat="tss",opt=2)) 
presoccur<-ifelse(values(en1)>0.5,1,0)
values(en1)=presoccur
listdata[[2]]<-en1
futoccur<-ifelse(values(en2)>0.5,1,0)
values(en2)=futoccur
listdata[[3]]<-en2
return(listdata)}

#Now let us start using this function. First load some packages and load the climate data you want. I will get a 2.5 degree resolution for bioclim variables.

library(sdm)
library(rgdal)
library(dismo)
library(maps)
library(raster)
library(rJava)

climate=raster::getData("worldclim", var='bio',res=2.5)

# Now let us create a shapefile. If you want, you can use the Cerrado shapefile from the post mentioned at the beginning of the post using?

shape <- readOGR(choose.files(), "limite_cerrado")

# But let us create a shapefile using "drawExtent()". Just make a square in Cerrado location :

plot(climate[[1]])
shape<-drawExtent()

# Now go to the mentioned post at the beginning of this post and download and load the csv file of species from the Cerrado region.

sp<-read.csv(choose.files(),sep=";")

# Then, get climate data for the future using the getData() function. Then crop climate data from the present and the future according to shapefile to make the climatefocus and futclimatefocus objects.

futbio<-raster::getData('CMIP5', var='bio', res=2.5, rcp=85, model='MI', year=70)

names(futbio)<-names(climate)

climatefocus=crop(climate,shape)

futclimatefocus=crop(futbio,shape)

climatefocus<-mask(climatefocus, climatefocus)

futclimatefocus<-mask(futclimatefocus, futclimatefocus)

# Now let us run the function to create a list with ensembles (for both the present and the future) for each species. We will use glm and maxent methods, but you can change this to include other methods.

listspecies<-list()
for(i in 1:length(sp[,1])){
listspecies[[i]]<-changingworld2(sp[i,1], sp[i,2], climate, climatefocus, futclimatefocus,methods=c("glm","maxent"))
}

# We can combine species occurrences from different objects from the lists.

val<-values(listspecies[[1]][[2]])
val<-ifelse(val>0,0,0)
present<-val
future<-val
for(i in 1:length(sp[,1])){
present<-present+values(listspecies[[i]][[2]])
future<-future+values(listspecies[[i]][[3]])
}

# We can now create maps for the present and the future with the occurrences of species from the ecological community of mammals from Cerrado.

presraster<-listspecies[[1]][[2]]

values(presraster)=present

plot(presraster)

# Distribution of present-day Cerrado mammal species in the present (I only used 10 species, because it takes too much time to complete all 123 species, but let it run and tell me what you found):

futraster<-listspecies[[1]][[2]]

values(futraster)=future

plot(futraster)

# Distribution of present-day Cerrado mammal species in the future (year 2070 in a scenario of high concentration of greenhouse gases) (again, only using 10 species):

# We have seen a lot for today. You can now play with BIOENSEMBLES. Congrats :) if you want to use more than one future climatic model, you can modify the function to include more futclimatefocus objects and to create more ensembles. Then, you can make a mean() raster object for future ensembles.

#######################

EXTRA:

Please, take a look at my book, "The Agony of Choice", available both in English and Portuguese:

What I present in this book is the attempt to show the basic biology needed to answer why, how, where, when to prioritize the conservation of some species. You may think, "Why not save all species equally?". In a nutshell, we have limited resources for environmental conservation. We can try to mitigate some regional or global effects on biodiversity, for example by decreasing the release of greenhouse gases, or by reconstitution of the ozone layer. However, the human population is far from stabilizing its population growth, and soon we will be 10 billion on the planet by the end of the 21st century. To deal with this, agriculture and urban sprawl continue to grow. Native vegetation of global ecosystems is extremely threatened. This is the case of the Cerrado, which, in 2002, had 60% of its native vegetation, concentrated basically in the North of the biome.

Human activities threaten the beings of this planet. Some of these organisms have such small and specific places of life that they have no salvation. They'll be extinct without us even knowing. Ecological studies may try to search for the species that are threatened and transfer them to zoos, taxidermize them and describe them. Knowing what we've lost in nature, we can mitigate the negative effects on other species that could have ecological relationships with those that go. We also note that some species are more threatened than others. One question is, "should we protect those that are still likely to be saved or those that are less likely?". Another: "should we save the one that has the greatest contribution to the functioning of the ecosystem, for example by allowing the pollination of several plant species?" There are several doubts that people who conserve nature try to solve.

Some conservation biology issues apply to the present. Some species or places where they live are disappearing. However, some issues are concerned about the future. Computer models today show that the future will be warmer and drier, with far fewer native areas. We even know which areas and species will probably be most affected in the future. This work aims to show the differences between species, to show that some are ecologically more vulnerable than others, showing ways to look for which still have greater contribution to the functioning of ecosystems. The last chapter has a title that includes the term "Noah's Ark", which conservation biologists should try to create (not literally, of course). At the end of this book, I hope you can answer: who will enter this ark, and why?

Also, feel free to access my blog, R Functions:

Also, if you did not understand most of the R coding we performed, take a look at my Introduction to R:

That's it for now,

Till' Next Time!

#######################

muhammad inam

4 个月

I do modelling with maxent.. I want to be a contributor.. with modelling articles

1 次回应

要查看或添加评论，请登录

José Hidasi-Neto的更多文章

Introdu??o à Bioestatística (curso completo)

2024年9月3日

Introdu??o à Bioestatística (curso completo)

Pessoal, tudo bem? Aqui está meu curso completo de bioestatística. S?o 14 vídeos que falam desde amostragem até testes…
Phylogenetic Comparative Methods (PCMs) in R

2024年8月24日

Phylogenetic Comparative Methods (PCMs) in R

Hi there, Today we are going to see a little about Phylogenetic Comparative Methods (PCMs). First, we'll see what PCMs…
astRaRium2: An R Game About Linking Stars

2024年8月19日

astRaRium2: An R Game About Linking Stars

Hi there, Today I have a game I created in R for you. I named it astRaRium, and it's Puzzle game about identifying and…
The Rise of Human Singularity

2024年8月9日

The Rise of Human Singularity

Written by: José Hidasi Neto. Biologist and PhD in Ecology and Evolution.
A Brief Introduction to Artificial Intelligence (AI) and Machine Learning

2024年8月7日

A Brief Introduction to Artificial Intelligence (AI) and Machine Learning

Hi there, I was studying and I stopped to think a little. It would be great if people understood how Elementary and…

2 条评论
Emissários do Céu: O Enigma das Aves Mensageiras

2024年8月6日

Emissários do Céu: O Enigma das Aves Mensageiras

1. Mensagens nas Penas Em uma pequena vila na ásia, os habitantes come?aram a notar algo peculiar nas aves que chegavam…

See all articles

Modeling Species Distributions in R

José Hidasi-Neto

Post-Doc em Ecologia e Evolu??o | Desenvolvedor de Jogos | Data Analyst | Unity, Unreal | Game Designer | Power BI | R, Python, C#, C++ | Brasileiro-Húngaro |

1. First Steps

2. Modelling the Distributions of Species and Communities under Distinct Climate Scenarios

领英推荐

3. BIOENSEMBLES: The Usage of Ensembles in Ecological Modelling

EXTRA:

José Hidasi-Neto的更多文章

社区洞察

其他会员也浏览了

??AI in Forestry: Bark Beetle Identification & Monitoring??

Snagging List

Murray-Darling Basin Vegetation Structure and Extent

Ember Hall | Igniting The Future

Discover Lepetodrilus marianae!

Ep. 028: People and Technology in Rewilding with Simon Roper

Established invasive tree species provide opportunities for climate adaptation

Understanding the main vegetation indices for agriculture

Newsletter September 2024

Mapping a Future for Rhinos: How GIS and Satellite Imaging Can Save Habitats

1. First Steps

2. Modelling the Distributions of Species and Communities under Distinct Climate Scenarios

领英推荐

3. BIOENSEMBLES: The Usage of Ensembles in Ecological Modelling

EXTRA:

José Hidasi-Neto的更多文章

Introdu??o à Bioestatística (curso completo)

Phylogenetic Comparative Methods (PCMs) in R

astRaRium2: An R Game About Linking Stars

The Rise of Human Singularity

A Brief Introduction to Artificial Intelligence (AI) and Machine Learning

Emissários do Céu: O Enigma das Aves Mensageiras

社区洞察

其他会员也浏览了

??AI in Forestry: Bark Beetle Identification & Monitoring??

Snagging List

Murray-Darling Basin Vegetation Structure and Extent

Ember Hall | Igniting The Future

Discover Lepetodrilus marianae!

Ep. 028: People and Technology in Rewilding with Simon Roper

Established invasive tree species provide opportunities for climate adaptation

Understanding the main vegetation indices for agriculture

Newsletter September 2024

Mapping a Future for Rhinos: How GIS and Satellite Imaging Can Save Habitats