vignettes/generating-zeros-using-the-species-list.Rmd
The aim of this document is to demonstrate how the Species List table
(SL) of the RDBES can be used to complement the Sample table (SA) with
zeros in cases where, e.g., a species was looked for but not found and
therefore does not appear in the SA. Adding these zeros to the SA is
made easy by the function generateZerosUsingSL, available in the
RDBEScore package.
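The idea behind zero generation can be illustrated with a minimal base R sketch. It does not reproduce the internals of generateZerosUsingSL; the data frames and the column names (sppCode, totalWt, sampWt) are hypothetical and chosen only for illustration.

```r
# Toy species list: the species the observers looked for (hypothetical data).
speciesList <- data.frame(sppCode = c(107254, 107276, 107649))

# Toy sample table: only one of the listed species was actually found.
samples <- data.frame(sppCode = 107254, totalWt = 276, sampWt = 276)

# Species that were looked for but have no row in the sample table.
missingSpp <- setdiff(speciesList$sppCode, samples$sppCode)

# Build one zero-weight row per missing species and append it to the samples.
zeroRows <- data.frame(sppCode = missingSpp,
                       totalWt = rep(0, length(missingSpp)),
                       sampWt  = rep(0, length(missingSpp)))
samplesWithZeros <- rbind(samples, zeroRows)
print(samplesWithZeros)
```

The real function works on a full RDBES data object and fills in many more fields; the sketch only shows the core step of comparing the list with the samples.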
# read an example dataset and simplify it to 1 trip and 1 haul [dev note: this section needs to be reworked when data and filterRDBESDataObject are updated]
data(Pckg_survey_apistrat_H1)
myH1DataObject1 <- Pckg_survey_apistrat_H1
myH1DataObject1$SL<-myH1DataObject1$SL[grepl("Pckg_survey_apistrat_H1", myH1DataObject1$SL$SLspeclistName),]
#myH1DataObject1<-filterAndTidyRDBESDataObject(myH1DataObject1, fieldsToFilter="FOid",valuesToFilter=70849, killOrphans = TRUE)
myH1DataObject1<-filterRDBESDataObject(myH1DataObject1, fieldsToFilter="SSid",valuesToFilter=227694, killOrphans = TRUE)
# check it is a valid RDBESobject
validateRDBESDataObject(myH1DataObject1, checkDataTypes = TRUE)
The example uses data from hierarchy 1. It contains a single trip with a single haul. For simplicity, we restrict our analysis to the tables SL, SS and SA, which are the ones handled by the functions whose behaviour we want to demonstrate.
Examining a print of the Species List (SL) one can conclude that the sampling targeted the landings of only 1 species. In this case the species was Nephrops norvegicus (aphiaId 107254).
myH1DataObject1[c("SL")]
#> $SL
#> SLid SLrecType SLcou SLinst SLspeclistName
#> 1: 47891 SL ZW 4484 WGRDBES-EST_TEST_1_Pckg_survey_apistrat_H1
#> SLyear SLcatchFrac SLcommTaxon SLsppCode
#> 1: 1965 Lan 107254 107254
Examining a print of the Species Selection table (SS), one can confirm that only one fishing operation is present in the data (FOid 70849) and that landings were indeed sampled from it (for simplicity only a subset of columns is printed). Note that the variable SSuseCalcZero is set to “N” (i.e., No). This will have to be changed later on if we want zeros calculated.
myH1DataObject1[[c("SS")]][,c(1:15,19)]
#> SSid LEid FOid TEid FTid SLid OSid SSrecType SSseqNum SSstratification
#> 1: 227694 NA 70849 NA NA 47891 NA SS 1 N
#> SSstratumName SSclustering SSclusterName SSobsActTyp SScatchFra
#> 1: U N U Sort Lan
#> SSuseCalcZero
#> 1: N
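The role of the flag can be sketched as a simple switch. This is an illustration under the assumption that “N” means no zeros are produced and “Y” means they are; the helper speciesNeedingZeros and its arguments are hypothetical and not part of RDBEScore.

```r
# Hypothetical helper: returns the species codes that would need a zero row.
# `useCalcZero` mimics the SSuseCalcZero flag ("Y" or "N").
speciesNeedingZeros <- function(listed, sampled, useCalcZero) {
  # With the flag off, no zeros are requested, whatever the species list says.
  if (useCalcZero != "Y") {
    return(numeric(0))
  }
  # With the flag on, every listed species without a sample needs a zero.
  setdiff(listed, sampled)
}

flagOff <- speciesNeedingZeros(c(107254, 107276), 107254, "N")
flagOn  <- speciesNeedingZeros(c(107254, 107276), 107254, "Y")
print(flagOff)
print(flagOn)
```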
Given the above, it is expected that if Nephrops norvegicus was sampled it will appear in the RDBES Sample table (SA). One can confirm this by printing that table (for simplicity only a subset of columns is printed).
Let's change the example by adding a couple of new species (Pandalus borealis and Cancer pagurus) to the Species List table (SL). We also change the variable SSuseCalcZero to “Y” so that zeros can be calculated.
# first we replicate the single SL row so that the table has three rows
myH1DataObject1$SL<-rbind(myH1DataObject1$SL, myH1DataObject1$SL, myH1DataObject1$SL)
# then we update a few fields and reset the SL key
myH1DataObject1$SL[2:3,c("SLid","SLcommTaxon","SLsppCode")]<-data.frame(c(47892, 47893),c(107276, 107649), c(107276, 107649))
setkeyv(myH1DataObject1$SL,"SLid")
# change SSuseCalcZero to "Y"
myH1DataObject1$SS$SSuseCalcZero<-"Y"
# finally we make sure the object we created is a valid RDBES data object. No message is a good sign.
validateRDBESDataObject(myH1DataObject1, checkDataTypes = TRUE)
# display new SL
myH1DataObject1[c("SL")]
#> $SL
#> SLid SLrecType SLcou SLinst SLspeclistName
#> 1: 47891 SL ZW 4484 WGRDBES-EST_TEST_1_Pckg_survey_apistrat_H1
#> 2: 47892 SL ZW 4484 WGRDBES-EST_TEST_1_Pckg_survey_apistrat_H1
#> 3: 47893 SL ZW 4484 WGRDBES-EST_TEST_1_Pckg_survey_apistrat_H1
#> SLyear SLcatchFrac SLcommTaxon SLsppCode
#> 1: 1965 Lan 107254 107254
#> 2: 1965 Lan 107276 107276
#> 3: 1965 Lan 107649 107649
After the update, the new dataset resembles a situation where observers looked for three species (Nephrops norvegicus, Cancer pagurus and Pandalus borealis), with only one of them (Nephrops norvegicus) having been found in the sample. By running the function generateZerosUsingSL, the zeros for those additional species can be quickly added.
myH1DataObject1updte<-generateZerosUsingSL(myH1DataObject1)
myH1DataObject1updte$SA[,c(1:9,48:49)]
#> SAid SSid LEid SArecType SAseqNum SAparSequNum SAstratification
#> 1: 572813 227694 NA SA 0.998 NA N
#> 2: 572813 227694 NA SA 0.999 NA N
#> 3: 572813 227694 NA SA 1.000 NA N
#> SAstratumName SAspeCode SAtotalWtMes SAsampWtMes
#> 1: U 107649 0 0
#> 2: U 107276 0 0
#> 3: U 107254 276 276
Note that the new rows have floating-point values for SAid and SAseqNum (we use sprintf to ensure the decimal places are displayed). This facilitates the ordering of the samples and prevents overlaps when different datasets are joined. Also, an SAunitName identical to the SAid was created for the new rows.
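The effect of sprintf and of the fractional sequence numbers can be checked with a short base R sketch. The vector seqNum is a hypothetical stand-in for the SAseqNum values shown above, not output taken from the package.

```r
# Hypothetical sequence numbers: one original row and two generated zero rows.
seqNum <- c(1, 0.998, 0.999)

# sprintf with a fixed precision keeps trailing zeros that default
# printing may drop.
formatted <- sprintf("%.3f", seqNum)
print(formatted)
# [1] "1.000" "0.998" "0.999"

# The fractional values sort just before the original value, so the
# generated rows stay next to the sample they belong to.
rowOrder <- order(seqNum)
print(rowOrder)
# [1] 2 3 1
```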