Skip to contents

Introduction

The aim of this document is to illustrate some of the ways of manipulating RDBESDataObjects.

Prerequisites

First we’ll load some example data from the RDBES and check it’s valid. It’s a good idea to check your RDBESDataObjects are valid after any manipulations you perform. See how to import your own data in the vignette Import RDBES data In this vignette package example data is used.

# load Hierarchy 1 demo data
myH1RawObject <- H1Example

validateRDBESDataObject(myH1RawObject, verbose = FALSE)

The print method gives list of non-null tables in the RDBESDataObject. The structure of the output for each table is:

  • Table name (DE, TE, LE, etc.)
  • Number of rows
  • Sampling method (if applicable, SWRWR, NPJS, CENSUS, etc.)
  • Range of number sampled (if applicable)
  • Range of number total (if applicable)

If there is a single hierarchy present the output is ordered by RDBES hierarchy structure.

print(myH1RawObject)
#> Hierarchy 1 RDBESdataObject:
#>  DE: 8
#>  SD: 8
#>  VS: 1214 (SRSWOR,CENSUS,SRSWR: 2-135/4-1382)
#>  FT: 1430 (CENSUS,SRSWR: 1-3/1-100)
#>  FO: 1916 (CENSUS,SRSWR: 1-3/1-20)
#>  SS: 1916 (CENSUS,SRSWR: 1/1-4)
#>  SA: 1916 (CENSUS,SRSWR: 1/1-2)
#>  FM: 7290
#>  BV: 14580 (SRSWR: 2/2)
#>  VD: 311
#>  SL: 6
#>  IS: 25

Underling the print function there is a summary function that retains only unique rows for some columns used in print.

h1summary <- summary(myH1RawObject)
#get the hierarchy
h1summary$hierarchy
#> [1] 1
#extract the number of rows in tables from the summary
sapply(h1summary$data, function(x){x$rows})
#>    DE    SD    VS    FT    FO    SS    SA    FM    BV    VD    SL    IS 
#>     8     8  1214  1430  1916  1916  1916  7290 14580   311     6    25

To get the number of rows in each non-null table you can simply call the object:

myH1RawObject #equivalent to print(summary(myH1RawObject))
#> Hierarchy 1 RDBESdataObject:
#>  DE: 8
#>  SD: 8
#>  VS: 1214 (SRSWOR,CENSUS,SRSWR: 2-135/4-1382)
#>  FT: 1430 (CENSUS,SRSWR: 1-3/1-100)
#>  FO: 1916 (CENSUS,SRSWR: 1-3/1-20)
#>  SS: 1916 (CENSUS,SRSWR: 1/1-4)
#>  SA: 1916 (CENSUS,SRSWR: 1/1-2)
#>  FM: 7290
#>  BV: 14580 (SRSWR: 2/2)
#>  VD: 311
#>  SL: 6
#>  IS: 25

To get an example Hierarchy 5 data:

# Hierarchy 5 demo data
myH5RawObject <- H5Example

validateRDBESDataObject(myH5RawObject, verbose = FALSE)

# Number of rows in each non-null table and hierarchy
print(myH5RawObject)
#> Hierarchy 5 RDBESdataObject:
#>  DE: 3
#>  SD: 3
#>  OS: 27 (SRSWR: 3/100)
#>  LE: 27 (SRSWR: 1/2)
#>  FT: 27 (SRSWR: 1/1)
#>  SS: 27 (SRSWR: 1/4)
#>  SA: 27 (SRSWR: 1/2)
#>  FM: 270
#>  BV: 540 (SRSWR: 2/2)
#>  VD: 311
#>  SL: 1
#>  IS: 20
#>  CL: 497
#>  CE: 99

Sorting RDBESDataObject

If the data has a single hierarchy in the DE table a correct sort order is defined for it. You can use the sort() method for it. Printing the object sorts it automatically if possible.

#before sorting
names(H8ExampleEE1)
#>  [1] "DE" "SD" "VS" "FT" "FO" "TE" "LO" "OS" "LE" "SS" "SA" "FM" "BV" "VD" "SL"
#> [16] "IS" "CL" "CE"
#after sorting
names(sort(H8ExampleEE1))
#>  [1] "DE" "SD" "TE" "VS" "FT" "LE" "SS" "SA" "FM" "BV" "FO" "LO" "OS" "VD" "SL"
#> [16] "IS" "CL" "CE"
#printing the summary
H8ExampleEE1
#> Hierarchy 8 RDBESdataObject:
#>  DE: 1
#>  SD: 1
#>  TE: 11 (SRSWOR: 2-3/4)
#>  VS: 15 (SRSWR: 1-2/7-12)
#>  LE: 15 (SRSWOR: 1/1)
#>  SS: 15 (CENSUS: 1/1)
#>  SA: 15 (SRSWOR: 1/794-14268)
#>  BV: 3995 (CENSUS: 16-100/16-100)
#>  VD: 7
#>  SL: 1
#>  IS: 2
#>  CL: 71
#>  CE: 132

Combining RDBESDataObjects

RDBESDataObjects can be combined using the combineRDBESDataObjects() function. This might be required when different sampling schemes are used to collect on-shore and at-sea samples - it will often be required to combine all the data together before further analysis.

myCombinedRawObject <- combineRDBESDataObjects(myH1RawObject,
                                               myH5RawObject)

# Number of rows in each non-null table and hierarchies
print(myCombinedRawObject)
#> Warning: No sort order for multiple hierarchies can be defined!
#> Warning: Mixed hierarchy RDBESDataObject!
#> Hierarchy 1 RDBESdataObject:
#>   Hierarchy 5 RDBESdataObject:
#>  DE: 11
#>  SD: 11
#>  VS: 1214 (SRSWOR,CENSUS,SRSWR: 2-135/4-1382)
#>  FT: 1457 (CENSUS,SRSWR: 1-3/1-100)
#>  FO: 1916 (CENSUS,SRSWR: 1-3/1-20)
#>  OS: 27 (SRSWR: 3/100)
#>  LE: 27 (SRSWR: 1/2)
#>  SS: 1943 (CENSUS,SRSWR: 1/1-4)
#>  SA: 1943 (CENSUS,SRSWR: 1/1-2)
#>  FM: 7560
#>  BV: 15120 (SRSWR: 2/2)
#>  VD: 311
#>  SL: 7
#>  IS: 25
#>  CL: 497
#>  CE: 99

validateRDBESDataObject(myCombinedRawObject, verbose = FALSE)

Filtering RDBESDataObjects

RDBESDataObjects can be filtered using the filterRDBESDataObject() function - this allows the RDBESDataObject to be filtered by any field. A typical use of filtering might be to extract all data collected in a particular ICES division.

myH1RawObject <- H1Example

# Number of rows in each non-null table
print(myH1RawObject)
#> Hierarchy 1 RDBESdataObject:
#>  DE: 8
#>  SD: 8
#>  VS: 1214 (SRSWOR,CENSUS,SRSWR: 2-135/4-1382)
#>  FT: 1430 (CENSUS,SRSWR: 1-3/1-100)
#>  FO: 1916 (CENSUS,SRSWR: 1-3/1-20)
#>  SS: 1916 (CENSUS,SRSWR: 1/1-4)
#>  SA: 1916 (CENSUS,SRSWR: 1/1-2)
#>  FM: 7290
#>  BV: 14580 (SRSWR: 2/2)
#>  VD: 311
#>  SL: 6
#>  IS: 25

myFields <- c("SDctry","VDctry","VDflgCtry","FTarvLoc")
myValues <- c("ZW","ZWBZH","ZWVFA" )

myFilteredObject <- filterRDBESDataObject(myH1RawObject,
                                         fieldsToFilter = myFields,
                                         valuesToFilter = myValues )

# Number of rows in each non-null table
print(myFilteredObject)
#> Hierarchy 1 RDBESdataObject:
#>  DE: 8
#>  SD: 8
#>  VS: 1214 (SRSWOR,CENSUS,SRSWR: 2-135/4-1382)
#>  FT: 36 (SRSWR: 3/100)
#>  FO: 1916 (CENSUS,SRSWR: 1-3/1-20)
#>  SS: 1916 (CENSUS,SRSWR: 1/1-4)
#>  SA: 1916 (CENSUS,SRSWR: 1/1-2)
#>  FM: 7290
#>  BV: 14580 (SRSWR: 2/2)
#>  VD: 310
#>  SL: 6
#>  IS: 25

validateRDBESDataObject(myFilteredObject, verbose = FALSE)

It is important to note that filtering is likely to result in “orphan” rows being produced so it is usual to also apply the findAndKillOrphans() function to the filtered data to remove these records.


myFilteredObjectNoOrphans <- 
  findAndKillOrphans(objectToCheck = myFilteredObject, verbose = FALSE)

validateRDBESDataObject(myFilteredObjectNoOrphans, verbose = FALSE)

You can also remove any records that are not linking to a row in the VesselDetails (VD) table using the removeBrokenVesselLinks() function.


myFields <- c("VDlenCat")
myValues <- c("18-<24" )
myFilteredObject <- filterRDBESDataObject(myFilteredObjectNoOrphans,
                                         fieldsToFilter = myFields,
                                         valuesToFilter = myValues )

myFilteredObjectValidVesselLinks <- removeBrokenVesselLinks(
                                  objectToCheck = myFilteredObject,
                                  verbose = FALSE)

validateRDBESDataObject(myFilteredObjectValidVesselLinks, verbose = FALSE)

Finally you can also remove any records that are not linking to an entry in the SpeciesListDetails (SL) table using the removeBrokenSpeciesListLinks() function.


myFields <- c("SLspeclistName")
myValues <- c("ZW_1965_SpeciesList" )
myFilteredObjectValidSpeciesLinks <- filterRDBESDataObject(myFilteredObjectValidVesselLinks,
                                         fieldsToFilter = myFields,
                                         valuesToFilter = myValues )

myFilteredObjectValidSpeciesLinks <- removeBrokenSpeciesListLinks(
                                  objectToCheck = myFilteredObjectValidSpeciesLinks,
                                  verbose = FALSE)

validateRDBESDataObject(myFilteredObjectValidSpeciesLinks, verbose = FALSE)

Getting Subsets of RDBESDataObject Tables

Sometimes it we want to see how a field or values in the RDBESDataObject are connected to otther tables. One use case would be e.g. to see when a specific Landing Event (LE) occured.For this we can use the getLinkedDataFromLevel() function.

#get the TE table corresponding to the first LEid in the H8ExampleEE1 object
getLinkedDataFromLevel("LEid", c(1), H8ExampleEE1, "TE", verbose = TRUE)
#> Traversing upwards in the table hierarchy from LE to TE 
#> [1] "LE: 1"
#> [1] "Skipping: FT"
#> [1] "VS: 1"
#> Key: <TEid>
#>     TEid  SDid   LOid   VSid TErecType TEseqNum TEstratification TEtimeUnit
#>    <num> <num> <lgcl> <lgcl>    <char>    <num>           <char>     <char>
#> 1:     1     1     NA     NA        TE        1                Y       Week
#>    TEstratumName TEclustering TEclusterName TEsampler TEnumTotal TEnumSamp
#>           <char>       <char>        <char>    <lgcl>      <num>     <int>
#> 1:      November            N             U        NA          4         2
#>    TEincProb TEselectMeth TEunitName TEselProb TEincProbCluster TEsamp
#>       <lgcl>       <char>     <char>    <lgcl>           <lgcl> <char>
#> 1:        NA       SRSWOR  1-week-48        NA               NA      Y
#>    TEselectMethCluster TEnumTotalClusters TEnumSampClusters TEselProbCluster
#>                 <lgcl>             <lgcl>            <lgcl>           <lgcl>
#> 1:                  NA                 NA                NA               NA
#>    TEnoSampReason TEnonRespCol TEauxVarTot TEauxVarValue TEauxVarName
#>            <lgcl>       <char>      <lgcl>        <lgcl>       <lgcl>
#> 1:             NA            N          NA            NA           NA
#>    TEauxVarUnit
#>          <lgcl>
#> 1:           NA

Similarly we can get the subset of the LE table corresponding to a specific value in the TE table. This does not have to be the id field, but can be any field in the table.

#get the TE table corresponding to the first LEid in the H8ExampleEE1 object
getLinkedDataFromLevel("TEstratumName", c("November"), H8ExampleEE1, "LE", verbose = TRUE)
#> Traversing downwards in the table hierarchy from TE to LE 
#> TEstratumName: November
#> VSid: 1, 2, 11
#> Key: <LEid>
#>     LEid  FOid   OSid   FTid  VSid  VDid  TEid   SSid   SAid LErecType
#>    <num> <num> <lgcl> <lgcl> <num> <num> <num> <lgcl> <lgcl>    <char>
#> 1:     1    NA     NA     NA     1     6     1     NA     NA        LE
#> 2:     2    NA     NA     NA     2     6     1     NA     NA        LE
#> 3:    11    NA     NA     NA    11  1255     8     NA     NA        LE
#>    LEstratification LEseqNum LEhaulNum LEstratumName LEclustering LEclusterName
#>              <char>    <num>    <lgcl>        <char>       <char>        <char>
#> 1:                N        1        NA             U            N             U
#> 2:                N        2        NA             U            N             U
#> 3:                N       11        NA             U            N             U
#>    LEsampler LEmixedTrip LEcatReg LElocode LElocName LElocType LEctry
#>       <lgcl>      <char>   <char>   <char>    <lgcl>    <char> <char>
#> 1:        NA           N      Lan    EEPRN        NA      Port     EE
#> 2:        NA           N      Lan    EEPRN        NA      Port     EE
#> 3:        NA           N      Lan    EERMS        NA      Port     EE
#>        LEdate LEtime LEeconZoneIndi      LEarea LEstatRect  LEgsaSubarea
#>        <char> <lgcl>         <lgcl>      <char>     <char>        <char>
#> 1: 2022-11-28     NA             NA 27.3.d.28.1       44H3 NotApplicable
#> 2: 2022-12-05     NA             NA 27.3.d.28.1       44H3 NotApplicable
#> 3: 2022-11-10     NA             NA 27.3.d.28.1       44H2 NotApplicable
#>    LEjurisdArea LEgeoDatBas LEgeoSou LEnatFishAct LEmetier5         LEmetier6
#>          <lgcl>      <char>   <char>       <lgcl>    <lgcl>            <char>
#> 1:           NA    Official     Logb           NA        NA OTM_SPF_16-31_0_0
#> 2:           NA    Official     Logb           NA        NA OTM_SPF_16-31_0_0
#> 3:           NA    Official     Logb           NA        NA OTM_SPF_16-31_0_0
#>    LEgear LEgeaDatBas LEgearSou LEmeshSize LEselDev LEselDevMeshSize LEtarget
#>    <char>      <char>    <char>      <int>   <lgcl>           <lgcl>   <lgcl>
#> 1:    OTM    Official    CombOD         18       NA               NA       NA
#> 2:    OTM    Official    CombOD         18       NA               NA       NA
#> 3:    OTM    Official    CombOD         20       NA               NA       NA
#>    LEnumTotal LEnumSamp LEselProb LEselectMeth LEselectMethCluster
#>         <int>     <int>    <lgcl>       <char>              <lgcl>
#> 1:          1         1        NA       SRSWOR                  NA
#> 2:          1         1        NA       SRSWOR                  NA
#> 3:          1         1        NA       SRSWOR                  NA
#>    LEnumTotalClusters LEnumSampClusters LEincProb LEsamp LEnoSampReason
#>                <lgcl>            <lgcl>    <lgcl> <char>         <lgcl>
#> 1:                 NA                NA        NA      Y             NA
#> 2:                 NA                NA        NA      Y             NA
#> 3:                 NA                NA        NA      Y             NA
#>    LEnonRespCol LEfullTripAva LEencrVessCode LEunitName LEmitiDev LEgearDim
#>          <char>        <char>         <char>     <char>    <char>    <lgcl>
#> 1:            N          <NA>              6  LAN-43256      None        NA
#> 2:            N          <NA>              6  LAN-43356      None        NA
#> 3:            N          <NA>           1283  LAN-43016      None        NA
#>    LEselProbCluster LEincProbCluster LEauxVarTot LEauxVarValue LEauxVarName
#>              <lgcl>           <lgcl>      <lgcl>        <lgcl>       <lgcl>
#> 1:               NA               NA          NA            NA           NA
#> 2:               NA               NA          NA            NA           NA
#> 3:               NA               NA          NA            NA           NA
#>    LEauxVarUnit LEfishManUnit
#>          <lgcl>        <lgcl>
#> 1:           NA            NA
#> 2:           NA            NA
#> 3:           NA            NA

Several values can be used to get a subset of the table.

#get the SA table corresponding to the first 2 TEids in the H8ExampleEE1 object
getLinkedDataFromLevel("TEid", c(1,2), H8ExampleEE1, "SA", verbose = TRUE)
#> Traversing downwards in the table hierarchy from TE to SA 
#> TEid: 1, 2
#> VSid: 1, 2, 3
#> LEid: 1, 2, 3
#> SSid: 1, 2, 3
#> Key: <SAid>
#>     SAid SAparSequNum  SSid SArecType SAseqNum SAstratification SAstratumName
#>    <num>       <lgcl> <num>    <char>    <num>           <char>        <char>
#> 1:     1           NA     1        SA        1                N             U
#> 2:     2           NA     2        SA        2                N             U
#> 3:     3           NA     3        SA        3                N             U
#>    SAspeCode SAspeCodeFAO SApres SAstateOfProc SAcatchCat SAlandCat
#>        <num>       <char> <char>        <char>     <char>    <char>
#> 1:    126425          SPR    WHL           FRE        Lan       HuC
#> 2:    126425          SPR    WHL           FRE        Lan       HuC
#> 3:    126425          SPR    WHL           FRE        Lan       HuC
#>    SAcommCatScl SAcommCat  SAsex SAunitName SAunitType SAtotalWtLive
#>          <lgcl>    <lgcl> <char>     <char>     <char>        <lgcl>
#> 1:           NA        NA      U     202201        Box            NA
#> 2:           NA        NA      U     202202        Box            NA
#> 3:           NA        NA      U     202203        Box            NA
#>    SAsampWtLive SAnumTotal SAnumSamp SAselProb SAselectMeth SAlowHierarchy
#>          <lgcl>      <num>     <int>    <lgcl>       <char>         <char>
#> 1:           NA       1279         1        NA       SRSWOR              C
#> 2:           NA       2034         1        NA       SRSWOR              C
#> 3:           NA       6666         1        NA       SRSWOR              C
#>       SAsampler SAnoSampReasonFM SAnoSampReasonBV SAtotalWtMes SAtotWtMeaDatBas
#>          <char>           <lgcl>           <lgcl>        <num>           <char>
#> 1: SelfSampling               NA               NA       750000         Official
#> 2: SelfSampling               NA               NA       700000         Official
#> 3: SelfSampling               NA               NA      1500000         Official
#>    SAsampWtMes            SAspecState  SAgsaSubarea SAarea SAstatRect
#>          <num>                 <char>        <char> <lgcl>     <lgcl>
#> 1:         586 DeadOrZeroProbSurvival NotApplicable     NA         NA
#> 2:         344 DeadOrZeroProbSurvival NotApplicable     NA         NA
#> 3:         225 DeadOrZeroProbSurvival NotApplicable     NA         NA
#>    SAeconZoneIndi SAjurisdArea SAgeoDatBas SAgeoSou SAnatFishAct SAmetier5
#>            <lgcl>       <lgcl>      <char>   <char>       <lgcl>    <lgcl>
#> 1:             NA           NA        <NA>     <NA>           NA        NA
#> 2:             NA           NA        <NA>     <NA>           NA        NA
#> 3:             NA           NA        <NA>     <NA>           NA        NA
#>    SAmetier6 SAgear SAgeaDatBas SAgearSou SAmeshSize SAselDev SAselDevMeshSize
#>       <lgcl> <lgcl>      <char>    <char>     <lgcl>   <lgcl>           <lgcl>
#> 1:        NA     NA        <NA>      <NA>         NA       NA               NA
#> 2:        NA     NA        <NA>      <NA>         NA       NA               NA
#> 3:        NA     NA        <NA>      <NA>         NA       NA               NA
#>    SAincProb SAsamp SAconFacMeasLive SAnoSampReason SAnonRespCol SAauxVarTot
#>       <lgcl> <char>           <lgcl>         <lgcl>       <char>      <lgcl>
#> 1:        NA      Y               NA             NA            N          NA
#> 2:        NA      Y               NA             NA            N          NA
#> 3:        NA      Y               NA             NA            N          NA
#>    SAauxVarValue SAauxVarName SAauxVarUnit SAfishManUnit  LEid
#>           <lgcl>       <lgcl>       <lgcl>        <lgcl> <num>
#> 1:            NA           NA           NA            NA    NA
#> 2:            NA           NA           NA            NA    NA
#> 3:            NA           NA           NA            NA    NA

Also lower hierarchy tables can be used to get the subset of the higher hierarchy tables.

#which vessel caught those fish?
getLinkedDataFromLevel("BVfishId", c("410472143", "410472144"), H8ExampleEE1, "VS", TRUE)
#> Traversing upwards in the table hierarchy from BV to VS 
#> [1] "BV: 410472143, 410472144"
#> [1] "Skipping: FM"
#> [1] "SA: 5, 5, 5, 5, 5, 5, 5, 5, 5, 5"
#> [1] "SS: 5"
#> [1] "LE: 5"
#> [1] "Skipping: FT"
#> Key: <VSid>
#>     VSid  SDid  VDid  TEid VSrecType VSstratification VSstratumName VSunitName
#>    <num> <num> <num> <num>    <char>           <char>        <char>     <lgcl>
#> 1:     5     1   500     4        VS                N             U         NA
#>    VSseqNum VSencrVessCode VSclustering VSclusterName VSsampler VSnumTotal
#>       <num>          <int>       <char>        <char>    <lgcl>      <int>
#> 1:        5           8161            N             U        NA         11
#>    VSnumSamp VSincProb VSselectMeth VSselectMethCluster VSnumTotalClusters
#>        <int>    <lgcl>       <char>              <lgcl>             <lgcl>
#> 1:         2        NA        SRSWR                  NA                 NA
#>    VSnumSampClusters VSselProb VSselProbCluster VSincProbCluster VSsamp
#>               <lgcl>    <lgcl>           <lgcl>           <lgcl> <char>
#> 1:                NA        NA               NA               NA      Y
#>    VSnoSampReason VSnonRespCol VSauxVarTot VSauxVarValue VSauxVarName
#>            <lgcl>       <char>      <lgcl>        <lgcl>       <lgcl>
#> 1:             NA            N          NA            NA           NA
#>    VSauxVarUnit
#>          <lgcl>
#> 1:           NA

If some table is missing it is skipped if possible. If it is not possible to skip it, the function will return an error.

See also other package vignettes:

#END