vignettes/manipulating-rdbesdataobjects.Rmd
The aim of this document is to illustrate some of the ways of manipulating RDBESDataObjects.
First we’ll load some example data from the RDBES and check that it is valid. It is a good idea to check that your RDBESDataObjects are still valid after any manipulation you perform. To import your own data, see the vignette Import RDBES data. This vignette uses the example data shipped with the package.
# load Hierarchy 1 demo data
myH1RawObject <- H1Example
validateRDBESDataObject(myH1RawObject, verbose = FALSE)
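The print method lists the non-null tables in the RDBESDataObject. The structure of the output for each table is: table name, number of rows and, in parentheses where sampling design information is available, the selection methods followed by the range of the number sampled and the range of the number total, separated by a slash.
If a single hierarchy is present, the output is ordered according to the RDBES hierarchy structure.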
print(myH1RawObject)
#> Hierarchy 1 RDBESdataObject:
#> DE: 8
#> SD: 8
#> VS: 1214 (SRSWOR,CENSUS,SRSWR: 2-135/4-1382)
#> FT: 1430 (CENSUS,SRSWR: 1-3/1-100)
#> FO: 1916 (CENSUS,SRSWR: 1-3/1-20)
#> SS: 1916 (CENSUS,SRSWR: 1/1-4)
#> SA: 1916 (CENSUS,SRSWR: 1/1-2)
#> FM: 7290
#> BV: 14580 (SRSWR: 2/2)
#> VD: 311
#> SL: 170
Underlying the print method is a summary method, which retains only the unique rows of the columns that print uses.
h1summary <- summary(myH1RawObject)
#get the hierarchy
h1summary$hierarchy
#> [1] 1
#extract the number of rows in tables from the summary
sapply(h1summary$data, function(x){x$rows})
#>    DE    SD    VS    FT    FO    SS    SA    FM    BV    VD    SL
#>     8     8  1214  1430  1916  1916  1916  7290 14580   311   170
To get the number of rows in each non-null table, you can simply type the object's name, which prints it:
myH1RawObject #equivalent to print(summary(myH1RawObject))
#> Hierarchy 1 RDBESdataObject:
#> DE: 8
#> SD: 8
#> VS: 1214 (SRSWOR,CENSUS,SRSWR: 2-135/4-1382)
#> FT: 1430 (CENSUS,SRSWR: 1-3/1-100)
#> FO: 1916 (CENSUS,SRSWR: 1-3/1-20)
#> SS: 1916 (CENSUS,SRSWR: 1/1-4)
#> SA: 1916 (CENSUS,SRSWR: 1/1-2)
#> FM: 7290
#> BV: 14580 (SRSWR: 2/2)
#> VD: 311
#> SL: 170
Example data for Hierarchy 5 is loaded in the same way:
# Hierarchy 5 demo data
myH5RawObject <- H5Example
validateRDBESDataObject(myH5RawObject, verbose = FALSE)
# Number of rows in each non-null table and hierarchy
print(myH5RawObject)
#> Hierarchy 5 RDBESdataObject:
#> DE: 3
#> SD: 3
#> OS: 27 (SRSWR: 3/100)
#> LE: 27 (SRSWR: 1/2)
#> FT: 27 (SRSWR: 1/1)
#> SS: 27 (SRSWR: 1/4)
#> SA: 27 (SRSWR: 1/2)
#> FM: 270
#> BV: 540 (SRSWR: 2/2)
#> VD: 311
#> SL: 170
If the DE table of the object contains a single hierarchy, a correct order is defined for its tables, and you can apply it with the sort() method. Printing the object sorts the tables automatically whenever this is possible.
#before sorting
names(H8ExampleEE1)
#> [1] "DE" "SD" "VS" "FT" "FO" "TE" "LO" "OS" "LE" "SS" "SA" "FM" "BV" "VD" "SL"
#> [16] "CL" "CE"
#after sorting
names(sort(H8ExampleEE1))
#> [1] "DE" "SD" "TE" "VS" "FT" "LE" "SS" "SA" "FM" "BV" "FO" "LO" "OS" "VD" "SL"
#> [16] "CL" "CE"
#printing the summary
H8ExampleEE1
#> Hierarchy 8 RDBESdataObject:
#> DE: 1
#> SD: 1
#> TE: 11 (SRSWOR: 2-3/4)
#> VS: 14 (SRSWR: 1-2/7-12)
#> LE: 15 (SRSWOR: 1-2/1-2)
#> SS: 15 (CENSUS: 2/2)
#> SA: 15 (SRSWOR: 1/794-14268)
#> BV: 3995 (CENSUS: 16-100/16-100)
#> VD: 7
#> SL: 2
#> CL: 71
#> CE: 132
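The reordering shown above can be imitated in base R without the package. The following is a minimal sketch, not the package's implementation: `toyObject` is a hypothetical stand-in (a plain named list), and the two name vectors are copied from the outputs above.

```r
# Table names in their stored order (copied from the "before sorting" output)
tableNames <- c("DE", "SD", "VS", "FT", "FO", "TE", "LO", "OS", "LE",
                "SS", "SA", "FM", "BV", "VD", "SL", "CL", "CE")

# Hypothetical stand-in for an RDBESDataObject: one list entry per table
toyObject <- setNames(as.list(seq_along(tableNames)), tableNames)

# Target order for Hierarchy 8 (copied from the "after sorting" output)
h8Order <- c("DE", "SD", "TE", "VS", "FT", "LE", "SS", "SA", "FM",
             "BV", "FO", "LO", "OS", "VD", "SL", "CL", "CE")

# match() gives each table's position in the target order;
# order() turns those positions into a permutation of the list
sortedToy <- toyObject[order(match(names(toyObject), h8Order))]

identical(names(sortedToy), h8Order)  # TRUE
```

Only the order of the list entries changes; each entry keeps its content.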
RDBESDataObjects can be combined using the combineRDBESDataObjects() function. This might be required when different sampling schemes are used to collect on-shore and at-sea samples; in that case all the data will often need to be combined before further analysis.
myCombinedRawObject <- combineRDBESDataObjects(myH1RawObject,
                                               myH5RawObject)
# Number of rows in each non-null table and hierarchies
print(myCombinedRawObject)
#> Warning: No sort order for multiple hierarchies can be defined!
#> Warning: Mixed hierarchy RDBESDataObject!
#> Hierarchy 1 RDBESdataObject:
#> Hierarchy 5 RDBESdataObject:
#> DE: 11
#> SD: 11
#> VS: 1214 (SRSWOR,CENSUS,SRSWR: 2-135/4-1382)
#> FT: 1457 (CENSUS,SRSWR: 1-3/1-100)
#> FO: 1916 (CENSUS,SRSWR: 1-3/1-20)
#> OS: 27 (SRSWR: 3/100)
#> LE: 27 (SRSWR: 1/2)
#> SS: 1943 (CENSUS,SRSWR: 1/1-4)
#> SA: 1943 (CENSUS,SRSWR: 1/1-2)
#> FM: 7560
#> BV: 15120 (SRSWR: 2/2)
#> VD: 311
#> SL: 170
validateRDBESDataObject(myCombinedRawObject, verbose = FALSE)
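To illustrate the idea of combining, here is a minimal base-R sketch on toy data. It is not the package's implementation: `combineToy()` and all column names are hypothetical. It stacks tables of the same name and keeps identical rows only once. That second step is an assumption, chosen because the unchanged VD and SL counts in the output above are consistent with shared rows being kept once.

```r
# Two toy objects: named lists of data frames (hypothetical columns)
objA <- list(
  DE = data.frame(DEid = 1:2),
  VS = data.frame(VSid = 1:3),
  VD = data.frame(VDid = 1:2)
)
objB <- list(
  DE = data.frame(DEid = 3L),
  OS = data.frame(OSid = 1:2),
  VD = data.frame(VDid = 1:2)  # same lookup rows as in objA
)

# Hypothetical helper: stack same-named tables, drop duplicated rows
combineToy <- function(a, b) {
  allTables <- union(names(a), names(b))
  combined <- lapply(allTables, function(tbl) {
    # a[[tbl]] is NULL when the table is absent; rbind() ignores NULL
    unique(rbind(a[[tbl]], b[[tbl]]))
  })
  setNames(combined, allTables)
}

toyCombined <- combineToy(objA, objB)
sapply(toyCombined, nrow)  # DE = 3, VS = 3, VD = 2, OS = 2
```

Tables present in only one object (VS, OS) pass through unchanged, while the shared VD rows are not duplicated.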
RDBESDataObjects can be filtered by any field using the filterRDBESDataObject() function. A typical use of filtering is to extract all the data collected in a particular ICES division.
myH1RawObject <- H1Example
# Number of rows in each non-null table
print(myH1RawObject)
#> Hierarchy 1 RDBESdataObject:
#> DE: 8
#> SD: 8
#> VS: 1214 (SRSWOR,CENSUS,SRSWR: 2-135/4-1382)
#> FT: 1430 (CENSUS,SRSWR: 1-3/1-100)
#> FO: 1916 (CENSUS,SRSWR: 1-3/1-20)
#> SS: 1916 (CENSUS,SRSWR: 1/1-4)
#> SA: 1916 (CENSUS,SRSWR: 1/1-2)
#> FM: 7290
#> BV: 14580 (SRSWR: 2/2)
#> VD: 311
#> SL: 170
myFields <- c("SDctry", "VDctry", "VDflgCtry", "FTarvLoc")
myValues <- c("ZW", "ZWBZH", "ZWVFA")
myFilteredObject <- filterRDBESDataObject(myH1RawObject,
                                          fieldsToFilter = myFields,
                                          valuesToFilter = myValues)
# Number of rows in each non-null table
# (the row counts are stored per table under summary(...)$data)
sapply(summary(myFilteredObject)$data, function(x){x$rows})
validateRDBESDataObject(myFilteredObject, verbose = FALSE)
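The effect of filtering on several fields at once can be sketched in base R on toy data. This is an illustration under stated assumptions, not the package's implementation: `filterToy()` and the toy tables are hypothetical, and the sketch assumes that a table is filtered only on the listed fields it actually contains.

```r
# Toy object: named list of data frames (hypothetical content)
toyObject <- list(
  SD = data.frame(SDid = 1:3, SDctry = c("ZW", "ZW", "XX")),
  VS = data.frame(VSid = 1:4, SDid = c(1L, 2L, 3L, 3L)),
  VD = data.frame(VDid = 1:3, VDctry = c("ZW", "XX", "ZW"))
)

# Hypothetical helper: keep rows whose value is in `values`,
# for every listed field that exists in the table
filterToy <- function(obj, fields, values) {
  lapply(obj, function(tbl) {
    for (f in intersect(fields, names(tbl))) {
      tbl <- tbl[tbl[[f]] %in% values, , drop = FALSE]
    }
    tbl
  })
}

toyFiltered <- filterToy(toyObject, c("SDctry", "VDctry"), c("ZW"))
sapply(toyFiltered, nrow)  # SD = 2, VS = 4, VD = 2
```

Note that VS is untouched because it contains none of the filtered fields, so two of its rows now refer to an SD record that no longer exists.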
Note that filtering is likely to produce “orphan” rows, that is, rows whose parent record has been removed. It is therefore usual to apply the findAndKillOrphans() function to the filtered data afterwards to remove these records.
myFilteredObjectNoOrphans <-
  findAndKillOrphans(objectToCheck = myFilteredObject, verbose = FALSE)
validateRDBESDataObject(myFilteredObjectNoOrphans, verbose = FALSE)
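What removing orphans involves can be shown with a small base-R sketch on toy tables. It is not how findAndKillOrphans() is implemented: the tables, the key columns and the `links` list are hypothetical. The point it illustrates is that the check has to proceed from the top of the hierarchy downwards, because dropping a row can orphan rows in the table below it.

```r
# Toy hierarchy SD -> VS -> FT (hypothetical keys)
tables <- list(
  SD = data.frame(SDid = c(1L, 2L)),  # record 3 was filtered out earlier
  VS = data.frame(VSid = 1:4, SDid = c(1L, 2L, 3L, 3L)),
  FT = data.frame(FTid = 1:5, VSid = c(1L, 2L, 3L, 4L, 4L))
)

# (child, parent, key) links, listed from the top of the hierarchy downwards
links <- list(c("VS", "SD", "SDid"), c("FT", "VS", "VSid"))

for (lk in links) {
  child  <- tables[[lk[1]]]
  parent <- tables[[lk[2]]]
  # A child row survives only if its parent key still exists
  keep <- child[[lk[3]]] %in% parent[[lk[3]]]
  tables[[lk[1]]] <- child[keep, , drop = FALSE]
}

sapply(tables, nrow)  # SD = 2, VS = 2, FT = 2
```

The three FT rows are removed only because their VS parents were removed in the previous step, which is why the order of the links matters.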
You can also remove any records that do not link to a row in the VesselDetails (VD) table using the removeBrokenVesselLinks() function.
myFields <- c("VDlenCat")
myValues <- c("18-<24")
myFilteredObject <- filterRDBESDataObject(myFilteredObjectNoOrphans,
                                          fieldsToFilter = myFields,
                                          valuesToFilter = myValues)
myFilteredObjectValidVesselLinks <- removeBrokenVesselLinks(
  objectToCheck = myFilteredObject,
  verbose = FALSE)
validateRDBESDataObject(myFilteredObjectValidVesselLinks, verbose = FALSE)
Finally, you can remove any records that do not link to an entry in the SpeciesListDetails (SL) table using the removeBrokenSpeciesListLinks() function.
myFields <- c("SLspeclistName")
myValues <- c("ZW_1965_SpeciesList")
myFilteredObjectValidSpeciesLinks <- filterRDBESDataObject(
  myFilteredObjectValidVesselLinks,
  fieldsToFilter = myFields,
  valuesToFilter = myValues)
myFilteredObjectValidSpeciesLinks <- removeBrokenSpeciesListLinks(
  objectToCheck = myFilteredObjectValidSpeciesLinks,
  verbose = FALSE)
validateRDBESDataObject(myFilteredObjectValidSpeciesLinks, verbose = FALSE)
See also other package vignettes: