01b Manipulating RDBESDataObjects
2025-10-15
Source:vignettes/v01b-manipulating-rdbesdataobjects.Rmd
v01b-manipulating-rdbesdataobjects.Rmd
Introduction
The aim of this document is to illustrate some of the ways of manipulating RDBESDataObjects.
Prerequisites
First we’ll load some example data from the RDBES and check it’s valid. It’s a good idea to check your RDBESDataObjects are valid after any manipulations you perform. See how to import your own data in the vignette Import RDBES data In this vignette package example data is used.
# load Hierarchy 1 demo data
myH1RawObject <- H1Example
validateRDBESDataObject(myH1RawObject, verbose = FALSE)
Print, sort and summary of RDBESDataObject
The print method gives list of non-null tables in the RDBESDataObject. The structure of the output for each table is:
- Table name (DE, TE, LE, etc.)
- Number of rows
- Sampling method (if applicable, SWRWR, NPJS, CENSUS, etc.)
- Range of number sampled (if applicable)
- Range of number total (if applicable)
If there is a single hierarchy present the output is ordered by RDBES hierarchy structure.
print(myH1RawObject)
#> Hierarchy 1 RDBESdataObject:
#> DE: 8
#> SD: 8
#> VS: 1214 (SRSWOR,CENSUS,SRSWR: 2-135/4-1382)
#> FT: 1430 (CENSUS,SRSWR: 1-3/1-100)
#> FO: 1916 (CENSUS,SRSWR: 1-3/1-20)
#> SS: 1916 (CENSUS,SRSWR: 1/1-4)
#> SA: 1916 (CENSUS,SRSWR: 1/1-2)
#> FM: 7290
#> BV: 14580 (SRSWR: 2/2)
#> VD: 311
#> SL: 6
#> IS: 25
Underling the print function there is a summary function that retains only unique rows for some columns used in print.
h1summary <- summary(myH1RawObject)
#get the hierarchy
h1summary$hierarchy
#> [1] 1
#extract the number of rows in tables from the summary
sapply(h1summary$data, function(x){x$rows})
#> DE SD VS FT FO SS SA FM BV VD SL IS
#> 8 8 1214 1430 1916 1916 1916 7290 14580 311 6 25
To get the number of rows in each non-null table you can simply call the object:
myH1RawObject #equivalent to print(summary(myH1RawObject))
#> Hierarchy 1 RDBESdataObject:
#> DE: 8
#> SD: 8
#> VS: 1214 (SRSWOR,CENSUS,SRSWR: 2-135/4-1382)
#> FT: 1430 (CENSUS,SRSWR: 1-3/1-100)
#> FO: 1916 (CENSUS,SRSWR: 1-3/1-20)
#> SS: 1916 (CENSUS,SRSWR: 1/1-4)
#> SA: 1916 (CENSUS,SRSWR: 1/1-2)
#> FM: 7290
#> BV: 14580 (SRSWR: 2/2)
#> VD: 311
#> SL: 6
#> IS: 25
To get an example Hierarchy 5 data:
# Hierarchy 5 demo data
myH5RawObject <- H5Example
validateRDBESDataObject(myH5RawObject, verbose = FALSE)
# Number of rows in each non-null table and hierarchy
print(myH5RawObject)
#> Hierarchy 5 RDBESdataObject:
#> DE: 3
#> SD: 3
#> OS: 27 (SRSWR: 3/100)
#> LE: 27 (SRSWR: 1/2)
#> FT: 27 (SRSWR: 1/1)
#> SS: 27 (SRSWR: 1/4)
#> SA: 27 (SRSWR: 1/2)
#> FM: 270
#> BV: 540 (SRSWR: 2/2)
#> VD: 311
#> SL: 1
#> IS: 20
#> CL: 497
#> CE: 99
Sorting RDBESDataObject
If the data has a single hierarchy in the DE table a correct sort order is defined for it. You can use the sort() method for it. Printing the object sorts it automatically if possible.
#before sorting
names(H8ExampleEE1)
#> [1] "DE" "SD" "VS" "FT" "FO" "TE" "LO" "OS" "LE" "SS" "SA" "FM" "BV" "VD" "SL"
#> [16] "IS" "CL" "CE"
#after sorting
names(sort(H8ExampleEE1))
#> [1] "DE" "SD" "TE" "VS" "FT" "LE" "SS" "SA" "FM" "BV" "FO" "LO" "OS" "VD" "SL"
#> [16] "IS" "CL" "CE"
#printing the summary
H8ExampleEE1
#> Hierarchy 8 RDBESdataObject:
#> DE: 1
#> SD: 1
#> TE: 11 (SRSWOR: 2-3/4)
#> VS: 15 (SRSWR: 1-2/7-12)
#> LE: 15 (SRSWOR: 1/1)
#> SS: 15 (CENSUS: 1/1)
#> SA: 15 (SRSWOR: 1/794-14268)
#> BV: 3995 (CENSUS: 16-100/16-100)
#> VD: 7
#> SL: 1
#> IS: 2
#> CL: 71
#> CE: 132
Combining RDBESDataObjects
RDBESDataObjects can be combined using the combineRDBESDataObjects() function. This might be required when different sampling schemes are used to collect on-shore and at-sea samples - it will often be required to combine all the data together before further analysis.
myCombinedRawObject <- combineRDBESDataObjects(myH1RawObject,
myH5RawObject)
# Number of rows in each non-null table and hierarchies
print(myCombinedRawObject)
#> Warning: No sort order for multiple hierarchies can be defined!
#> Warning: Mixed hierarchy RDBESDataObject!
#> Hierarchy 1 RDBESdataObject:
#> Hierarchy 5 RDBESdataObject:
#> DE: 11
#> SD: 11
#> VS: 1214 (SRSWOR,CENSUS,SRSWR: 2-135/4-1382)
#> FT: 1457 (CENSUS,SRSWR: 1-3/1-100)
#> FO: 1916 (CENSUS,SRSWR: 1-3/1-20)
#> OS: 27 (SRSWR: 3/100)
#> LE: 27 (SRSWR: 1/2)
#> SS: 1943 (CENSUS,SRSWR: 1/1-4)
#> SA: 1943 (CENSUS,SRSWR: 1/1-2)
#> FM: 7560
#> BV: 15120 (SRSWR: 2/2)
#> VD: 311
#> SL: 7
#> IS: 25
#> CL: 497
#> CE: 99
validateRDBESDataObject(myCombinedRawObject, verbose = FALSE)
Filtering RDBESDataObjects
RDBESDataObjects can be filtered using the filterRDBESDataObject() function - this allows the RDBESDataObject to be filtered by any field. A typical use of filtering might be to extract all data collected in a particular ICES division.
myH1RawObject <- H1Example
# Number of rows in each non-null table
print(myH1RawObject)
#> Hierarchy 1 RDBESdataObject:
#> DE: 8
#> SD: 8
#> VS: 1214 (SRSWOR,CENSUS,SRSWR: 2-135/4-1382)
#> FT: 1430 (CENSUS,SRSWR: 1-3/1-100)
#> FO: 1916 (CENSUS,SRSWR: 1-3/1-20)
#> SS: 1916 (CENSUS,SRSWR: 1/1-4)
#> SA: 1916 (CENSUS,SRSWR: 1/1-2)
#> FM: 7290
#> BV: 14580 (SRSWR: 2/2)
#> VD: 311
#> SL: 6
#> IS: 25
myFields <- c("SDctry","VDctry","VDflgCtry","FTarvLoc")
myValues <- c("ZW","ZWBZH","ZWVFA" )
myFilteredObject <- filterRDBESDataObject(myH1RawObject,
fieldsToFilter = myFields,
valuesToFilter = myValues )
# Number of rows in each non-null table
print(myFilteredObject)
#> Hierarchy 1 RDBESdataObject:
#> DE: 8
#> SD: 8
#> VS: 1214 (SRSWOR,CENSUS,SRSWR: 2-135/4-1382)
#> FT: 36 (SRSWR: 3/100)
#> FO: 1916 (CENSUS,SRSWR: 1-3/1-20)
#> SS: 1916 (CENSUS,SRSWR: 1/1-4)
#> SA: 1916 (CENSUS,SRSWR: 1/1-2)
#> FM: 7290
#> BV: 14580 (SRSWR: 2/2)
#> VD: 310
#> SL: 6
#> IS: 25
validateRDBESDataObject(myFilteredObject, verbose = FALSE)
It is important to note that filtering is likely to result in “orphan” rows being produced so it is usual to also apply the findAndKillOrphans() function to the filtered data to remove these records.
myFilteredObjectNoOrphans <-
findAndKillOrphans(objectToCheck = myFilteredObject, verbose = FALSE)
validateRDBESDataObject(myFilteredObjectNoOrphans, verbose = FALSE)
You can also remove any records that are not linking to a row in the VesselDetails (VD) table using the removeBrokenVesselLinks() function.
myFields <- c("VDlenCat")
myValues <- c("18-<24" )
myFilteredObject <- filterRDBESDataObject(myFilteredObjectNoOrphans,
fieldsToFilter = myFields,
valuesToFilter = myValues )
myFilteredObjectValidVesselLinks <- removeBrokenVesselLinks(
objectToCheck = myFilteredObject,
verbose = FALSE)
validateRDBESDataObject(myFilteredObjectValidVesselLinks, verbose = FALSE)
Finally you can also remove any records that are not linking to an entry in the SpeciesListDetails (SL) table using the removeBrokenSpeciesListLinks() function.
myFields <- c("SLspeclistName")
myValues <- c("ZW_1965_SpeciesList" )
myFilteredObjectValidSpeciesLinks <- filterRDBESDataObject(myFilteredObjectValidVesselLinks,
fieldsToFilter = myFields,
valuesToFilter = myValues )
myFilteredObjectValidSpeciesLinks <- removeBrokenSpeciesListLinks(
objectToCheck = myFilteredObjectValidSpeciesLinks,
verbose = FALSE)
validateRDBESDataObject(myFilteredObjectValidSpeciesLinks, verbose = FALSE)
Getting Subsets of RDBESDataObject Tables
Sometimes it we want to see how a field or values in the RDBESDataObject are connected to otther tables. One use case would be e.g. to see when a specific Landing Event (LE) occured.For this we can use the getLinkedDataFromLevel() function.
#get the TE table corresponding to the first LEid in the H8ExampleEE1 object
getLinkedDataFromLevel("LEid", c(1), H8ExampleEE1, "TE", verbose = TRUE)
#> Traversing upwards in the table hierarchy from LE to TE
#> [1] "LE: 1"
#> [1] "Skipping: FT"
#> [1] "VS: 1"
#> Key: <TEid>
#> TEid SDid LOid VSid TErecType TEseqNum TEstratification TEtimeUnit
#> <num> <num> <lgcl> <lgcl> <char> <num> <char> <char>
#> 1: 1 1 NA NA TE 1 Y Week
#> TEstratumName TEclustering TEclusterName TEsampler TEnumTotal TEnumSamp
#> <char> <char> <char> <lgcl> <num> <int>
#> 1: November N U NA 4 2
#> TEincProb TEselectMeth TEunitName TEselProb TEincProbCluster TEsamp
#> <lgcl> <char> <char> <lgcl> <lgcl> <char>
#> 1: NA SRSWOR 1-week-48 NA NA Y
#> TEselectMethCluster TEnumTotalClusters TEnumSampClusters TEselProbCluster
#> <lgcl> <lgcl> <lgcl> <lgcl>
#> 1: NA NA NA NA
#> TEnoSampReason TEnonRespCol TEauxVarTot TEauxVarValue TEauxVarName
#> <lgcl> <char> <lgcl> <lgcl> <lgcl>
#> 1: NA N NA NA NA
#> TEauxVarUnit
#> <lgcl>
#> 1: NA
Similarly we can get the subset of the LE table corresponding to a specific value in the TE table. This does not have to be the id field, but can be any field in the table.
#get the TE table corresponding to the first LEid in the H8ExampleEE1 object
getLinkedDataFromLevel("TEstratumName", c("November"), H8ExampleEE1, "LE", verbose = TRUE)
#> Traversing downwards in the table hierarchy from TE to LE
#> TEstratumName: November
#> VSid: 1, 2, 11
#> Key: <LEid>
#> LEid FOid OSid FTid VSid VDid TEid SSid SAid LErecType
#> <num> <num> <lgcl> <lgcl> <num> <num> <num> <lgcl> <lgcl> <char>
#> 1: 1 NA NA NA 1 6 1 NA NA LE
#> 2: 2 NA NA NA 2 6 1 NA NA LE
#> 3: 11 NA NA NA 11 1255 8 NA NA LE
#> LEstratification LEseqNum LEhaulNum LEstratumName LEclustering LEclusterName
#> <char> <num> <lgcl> <char> <char> <char>
#> 1: N 1 NA U N U
#> 2: N 2 NA U N U
#> 3: N 11 NA U N U
#> LEsampler LEmixedTrip LEcatReg LElocode LElocName LElocType LEctry
#> <lgcl> <char> <char> <char> <lgcl> <char> <char>
#> 1: NA N Lan EEPRN NA Port EE
#> 2: NA N Lan EEPRN NA Port EE
#> 3: NA N Lan EERMS NA Port EE
#> LEdate LEtime LEeconZoneIndi LEarea LEstatRect LEgsaSubarea
#> <char> <lgcl> <lgcl> <char> <char> <char>
#> 1: 2022-11-28 NA NA 27.3.d.28.1 44H3 NotApplicable
#> 2: 2022-12-05 NA NA 27.3.d.28.1 44H3 NotApplicable
#> 3: 2022-11-10 NA NA 27.3.d.28.1 44H2 NotApplicable
#> LEjurisdArea LEgeoDatBas LEgeoSou LEnatFishAct LEmetier5 LEmetier6
#> <lgcl> <char> <char> <lgcl> <lgcl> <char>
#> 1: NA Official Logb NA NA OTM_SPF_16-31_0_0
#> 2: NA Official Logb NA NA OTM_SPF_16-31_0_0
#> 3: NA Official Logb NA NA OTM_SPF_16-31_0_0
#> LEgear LEgeaDatBas LEgearSou LEmeshSize LEselDev LEselDevMeshSize LEtarget
#> <char> <char> <char> <int> <lgcl> <lgcl> <lgcl>
#> 1: OTM Official CombOD 18 NA NA NA
#> 2: OTM Official CombOD 18 NA NA NA
#> 3: OTM Official CombOD 20 NA NA NA
#> LEnumTotal LEnumSamp LEselProb LEselectMeth LEselectMethCluster
#> <int> <int> <lgcl> <char> <lgcl>
#> 1: 1 1 NA SRSWOR NA
#> 2: 1 1 NA SRSWOR NA
#> 3: 1 1 NA SRSWOR NA
#> LEnumTotalClusters LEnumSampClusters LEincProb LEsamp LEnoSampReason
#> <lgcl> <lgcl> <lgcl> <char> <lgcl>
#> 1: NA NA NA Y NA
#> 2: NA NA NA Y NA
#> 3: NA NA NA Y NA
#> LEnonRespCol LEfullTripAva LEencrVessCode LEunitName LEmitiDev LEgearDim
#> <char> <char> <char> <char> <char> <lgcl>
#> 1: N <NA> 6 LAN-43256 None NA
#> 2: N <NA> 6 LAN-43356 None NA
#> 3: N <NA> 1283 LAN-43016 None NA
#> LEselProbCluster LEincProbCluster LEauxVarTot LEauxVarValue LEauxVarName
#> <lgcl> <lgcl> <lgcl> <lgcl> <lgcl>
#> 1: NA NA NA NA NA
#> 2: NA NA NA NA NA
#> 3: NA NA NA NA NA
#> LEauxVarUnit LEfishManUnit
#> <lgcl> <lgcl>
#> 1: NA NA
#> 2: NA NA
#> 3: NA NA
Several values can be used to get a subset of the table.
#get the SA table corresponding to the first 2 TEids in the H8ExampleEE1 object
getLinkedDataFromLevel("TEid", c(1,2), H8ExampleEE1, "SA", verbose = TRUE)
#> Traversing downwards in the table hierarchy from TE to SA
#> TEid: 1, 2
#> VSid: 1, 2, 3
#> LEid: 1, 2, 3
#> SSid: 1, 2, 3
#> Key: <SAid>
#> SAid SAparSequNum SSid SArecType SAseqNum SAstratification SAstratumName
#> <num> <lgcl> <num> <char> <num> <char> <char>
#> 1: 1 NA 1 SA 1 N U
#> 2: 2 NA 2 SA 2 N U
#> 3: 3 NA 3 SA 3 N U
#> SAspeCode SAspeCodeFAO SApres SAstateOfProc SAcatchCat SAlandCat
#> <num> <char> <char> <char> <char> <char>
#> 1: 126425 SPR WHL FRE Lan HuC
#> 2: 126425 SPR WHL FRE Lan HuC
#> 3: 126425 SPR WHL FRE Lan HuC
#> SAcommCatScl SAcommCat SAsex SAunitName SAunitType SAtotalWtLive
#> <lgcl> <lgcl> <char> <char> <char> <lgcl>
#> 1: NA NA U 202201 Box NA
#> 2: NA NA U 202202 Box NA
#> 3: NA NA U 202203 Box NA
#> SAsampWtLive SAnumTotal SAnumSamp SAselProb SAselectMeth SAlowHierarchy
#> <lgcl> <num> <int> <lgcl> <char> <char>
#> 1: NA 1279 1 NA SRSWOR C
#> 2: NA 2034 1 NA SRSWOR C
#> 3: NA 6666 1 NA SRSWOR C
#> SAsampler SAnoSampReasonFM SAnoSampReasonBV SAtotalWtMes SAtotWtMeaDatBas
#> <char> <lgcl> <lgcl> <num> <char>
#> 1: SelfSampling NA NA 750000 Official
#> 2: SelfSampling NA NA 700000 Official
#> 3: SelfSampling NA NA 1500000 Official
#> SAsampWtMes SAspecState SAgsaSubarea SAarea SAstatRect
#> <num> <char> <char> <lgcl> <lgcl>
#> 1: 586 DeadOrZeroProbSurvival NotApplicable NA NA
#> 2: 344 DeadOrZeroProbSurvival NotApplicable NA NA
#> 3: 225 DeadOrZeroProbSurvival NotApplicable NA NA
#> SAeconZoneIndi SAjurisdArea SAgeoDatBas SAgeoSou SAnatFishAct SAmetier5
#> <lgcl> <lgcl> <char> <char> <lgcl> <lgcl>
#> 1: NA NA <NA> <NA> NA NA
#> 2: NA NA <NA> <NA> NA NA
#> 3: NA NA <NA> <NA> NA NA
#> SAmetier6 SAgear SAgeaDatBas SAgearSou SAmeshSize SAselDev SAselDevMeshSize
#> <lgcl> <lgcl> <char> <char> <lgcl> <lgcl> <lgcl>
#> 1: NA NA <NA> <NA> NA NA NA
#> 2: NA NA <NA> <NA> NA NA NA
#> 3: NA NA <NA> <NA> NA NA NA
#> SAincProb SAsamp SAconFacMeasLive SAnoSampReason SAnonRespCol SAauxVarTot
#> <lgcl> <char> <lgcl> <lgcl> <char> <lgcl>
#> 1: NA Y NA NA N NA
#> 2: NA Y NA NA N NA
#> 3: NA Y NA NA N NA
#> SAauxVarValue SAauxVarName SAauxVarUnit SAfishManUnit LEid
#> <lgcl> <lgcl> <lgcl> <lgcl> <num>
#> 1: NA NA NA NA NA
#> 2: NA NA NA NA NA
#> 3: NA NA NA NA NA
Also lower hierarchy tables can be used to get the subset of the higher hierarchy tables.
#which vessel caught those fish?
getLinkedDataFromLevel("BVfishId", c("410472143", "410472144"), H8ExampleEE1, "VS", TRUE)
#> Traversing upwards in the table hierarchy from BV to VS
#> [1] "BV: 410472143, 410472144"
#> [1] "Skipping: FM"
#> [1] "SA: 5, 5, 5, 5, 5, 5, 5, 5, 5, 5"
#> [1] "SS: 5"
#> [1] "LE: 5"
#> [1] "Skipping: FT"
#> Key: <VSid>
#> VSid SDid VDid TEid VSrecType VSstratification VSstratumName VSunitName
#> <num> <num> <num> <num> <char> <char> <char> <lgcl>
#> 1: 5 1 500 4 VS N U NA
#> VSseqNum VSencrVessCode VSclustering VSclusterName VSsampler VSnumTotal
#> <num> <int> <char> <char> <lgcl> <int>
#> 1: 5 8161 N U NA 11
#> VSnumSamp VSincProb VSselectMeth VSselectMethCluster VSnumTotalClusters
#> <int> <lgcl> <char> <lgcl> <lgcl>
#> 1: 2 NA SRSWR NA NA
#> VSnumSampClusters VSselProb VSselProbCluster VSincProbCluster VSsamp
#> <lgcl> <lgcl> <lgcl> <lgcl> <char>
#> 1: NA NA NA NA Y
#> VSnoSampReason VSnonRespCol VSauxVarTot VSauxVarValue VSauxVarName
#> <lgcl> <char> <lgcl> <lgcl> <lgcl>
#> 1: NA N NA NA NA
#> VSauxVarUnit
#> <lgcl>
#> 1: NA
If some table is missing it is skipped if possible. If it is not possible to skip it, the function will return an error.
See also other package vignettes:
- v01a data import
- v01b manipulating rdbesdataobjects
- v02a generating probabilities
- v02b Generating zeros for species not observed
- v02c Generating NAs for species not targeted by sampling
- v02d update SA with taxon from SL
- v03a Estimating population parameters unbiased estimator
- v03b Estimating population parameters ratio estimator
- v04 Create IC format from estimation results
#END