Model shapes

Introduction

The datastore is an online store of scattering model shapes, metadata, and raw data. Access is via a simple RESTful web API; we have developed a prototype for testing, but the intention is for ICES to eventually host and manage it.

Input data to the datastore are TOML- or JSON-formatted text files containing the shapes and metadata, and miscellaneous other files (e.g., raw images). The datastore schema:

  • Specifies the data format of the TOML or JSON files (required attributes, data, units, valid values, etc.)
  • Is the authoritative source of those data files
  • Is used to validate data files before the datastore will accept them
  • Is implemented as a JSON Schema

The data format has been designed to be easy to work with (supported by multiple programming languages, text-based, readable by people) and general enough to store all types of shapes used in fish and plankton scattering models.

The structure and layout of all other files (raw images, scripts, etc) are not controlled.

Data format

A specimen contains one or more shapes, representing parts of the specimen (e.g., body, swimbladder, backbone), and metadata about the specimen. A dataset can be used to group specimens.

There are different shape types:

  • outline - dorsal and lateral outlines
  • surface - 3D triangulated surface mesh
  • voxels - 3D grid of density and sound speed values
  • categorised voxels - 3D grid of categorised material properties
  • geometric - combinations of simple shapes (cylinders, spheroids) - not yet fully implemented

In the data format, each shape type has data attributes for storing the shape (documentation here). Unit and coordinate systems are the same as in echoSMs.

Creating input files

Input files can be created by hand, but this quickly becomes tedious, especially for the shape data itself. Using a script to create the TOML files is better.

A notebook demonstrating how to create and validate an input data file is notebooks/creating a datastore input file.ipynb. Further details are in the echoSMs documentation.

The R jsonvalidate and rjsoncons packages can also be used for validation (no notebook is provided for this).

Currently, there is no way for you to upload an input file to the datastore (it's a manual process that Gavin does).

Using a local version of the datastore

Your own computer can run a local version of the web API using local data files.

  1. Run the creating a datastore input file.ipynb notebook. This creates a TOML file for a made-up specimen and stores it in the ~/datasets directory
  2. Copy other example files to the ~/datasets directory:
    1. cp -r ~/echoSMs-2026-FAST-workshop/shapes ~/datasets
  3. Process the TOML files in the ~/datasets directory into the form needed by the web API:
    1. cd ~/echoSMs-2026-FAST-workshop/src
    2. python process_for_datastore.py
    3. Fix any datafile errors and repeat until there are none
  4. Start the local web API:
    1. pip install jmespath (this only needs to be done once)
    2. fastapi dev
    3. Run the getting_shapes.ipynb notebook, uncommenting the # api_URL = 'http://127.0.0.1:8000' line

Getting shapes

Shapes are retrieved with the RESTful web API, a simple interface that will fit many (but not all) uses.

Documentation on the API is here.

There are API calls to:

  • retrieve specimen metadata (excluding the shape data) with filtering on some metadata attributes
  • retrieve data for a specific specimen (shape data and metadata)
  • retrieve an image of the shape data for a specific specimen

API calls can be tested directly in a web browser.
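As a sketch of how such a call might be composed (the endpoint path and filter parameter name below are hypothetical; the real ones are in the API documentation):

```python
from urllib.parse import urlencode

# The local API started in the previous section; the hosted datastore
# would have its own base URL.
api_url = "http://127.0.0.1:8000"

# Hypothetical endpoint and metadata filter parameter.
params = {"species": "Gadus morhua"}
url = f"{api_url}/specimens?{urlencode(params)}"
print(url)
```

A URL like this can be pasted into the browser's address bar, or fetched from code with urllib.request.urlopen or the requests package.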

The notebooks/getting_shapes.ipynb notebook shows more detail on using the web API.