Biodiversity information handling and data quality

Posted by AaronClausen

NatureMapr's mission is to empower anybody to report plant or animal information anywhere in Australia and ensure the information gets to the people that need to know about it.

Part of that mission is to ensure Aussie biodiversity information is easily and readily accessible so it can be used to help achieve real world outcomes for the environment.

Australia's biodiversity data is a unique and special asset which deserves to be handled accordingly.

NatureMapr is responsible for handling and safeguarding this information appropriately - and it's a responsibility we've always taken extremely seriously.

Storage of information

NatureMapr stores both sensitive and non-sensitive information in Sydney, Australia in a secure data center which meets Australian government information handling guidelines.

NatureMapr does not store or send sensitive Australian biodiversity information overseas.

Safeguarding of sensitive information

NatureMapr is responsible for the safeguarding of sensitive information including:

  • Precise location of sightings which are yet to be identified
  • Precise location of highly sensitive species
  • Precise location of highly sensitive specimens including associated places like nesting or breeding sites
  • Sensitive descriptions and/or meta data of highly sensitive species and/or specimens
  • Personal or private property location

NatureMapr makes this type of sensitive information available in accordance with the National Framework for the Sharing of Restricted Access Species Data in Australia (RASD).

NatureMapr does not feed this type of sensitive information to data partners that are unable to guarantee the identical safeguarding of this information downstream.

NatureMapr's sensitive species safeguarding framework is enforced universally by the NatureMapr API and respected throughout all parts of the platform.

If a sighting is re-identified as a different species over time, it will automatically inherit the sensitivity level, and therefore safeguarding behaviour, of the newly identified species.


Expert identification and curation of records

NatureMapr's goal is for all uploaded records to be curated and formally identified by an expert human moderator, resulting in complete confidence in and assurance of the identification of the record.

This means the record can be fully trusted, which maximises the environmental and scientific outcomes it can contribute to downstream.

NatureMapr achieves this through a comprehensive workflow which invites suitably qualified and/or experienced expert human moderators to participate in the identification and curation of a particular record.

Inconclusive records

If a record cannot be formally identified during curation, it is marked as inconclusive. Inconclusive records are not eligible for downstream data transfer and are kept for informational purposes only.

Expert human moderators

NatureMapr is powered by an incredible community of expert human moderators who give their time and expertise to formally identify uploaded records.

There are two types of moderators:

  • Category moderators - experts within a particular category of species; AND
  • Location moderators - experts with intimate, first-hand knowledge of their own local reserve who are alert to the typical species and activity that occur within that reserve

NatureMapr's citizen science workflow alerts and invites relevant expert moderators to participate in the identification of a record as it arrives in the platform.

While general community members can freely make ID suggestions and participate in discussion, only relevant expert moderators with appropriate qualifications, experience and trust may formally confirm the identification of a record.

NatureMapr's first expert human moderator, Dr Michael Mulvaney, was a formally trained ecologist with decades of experience identifying and differentiating thousands of plant and animal species in the field as an ACT Government Senior Conservation Officer. Dr Mulvaney's extensive trusted professional network largely formed the basis of the original group of NatureMapr expert human moderators.

Eligibility of expert, human moderators

As NatureMapr developed and its workflow became robust and refined, a set of criteria governing the appointment of new expert human moderators was established.

A NatureMapr contributor may be granted the expert, human moderator privilege if they:

  1. Hold relevant formal qualifications and/or training within a particular category of expertise
    (E.g. ecologist, biologist, entomologist etc); OR

  2. Have obtained professional experience relevant to a particular category of expertise
    (E.g. CSIRO Scientist, Park Ranger, Conservation Officer etc); OR

  3. Have demonstrated an extensive track record of uploading and/or identifying records within a particular category or location
    (E.g. someone who has learnt to identify native orchids through genuine interest and repetition over time)

And are:

  1. Entrusted and approved by an existing expert, human moderator within a parent category of their particular category of expertise
    (E.g. Expert bird moderator trusts a colleague who specialises in birds of prey)
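
The grouping of the criteria above (any one of the three expertise criteria, plus approval by an existing moderator) can be written as a short Python sketch. The function and parameter names are hypothetical and chosen for illustration only; they are not part of the NatureMapr platform.

```python
def may_be_granted_moderator(
    has_qualifications: bool,
    has_professional_experience: bool,
    has_track_record: bool,
    approved_by_parent_moderator: bool,
) -> bool:
    """Hypothetical check mirroring the (1 OR 2 OR 3) AND approval rule."""
    # Any one of the three expertise criteria is sufficient...
    meets_expertise = (
        has_qualifications or has_professional_experience or has_track_record
    )
    # ...but approval by an existing moderator in a parent category is always required.
    return meets_expertise and approved_by_parent_moderator
```

Under this reading, a self-taught orchid enthusiast with a strong track record qualifies only once an existing moderator in the parent category has approved them.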

Artificial intelligence non-human moderator

NatureMapr's non-human AI based moderator, CarbonAI, is trained and closely supervised by our community of suitably qualified and experienced expert human moderators.

CarbonAI's primary role is to support NatureMapr's expert human moderators and to protect their extremely precious and limited time.

For this reason, CarbonAI primarily makes automated ID suggestions, in order to fast-track the full workflow so there are fewer steps for busy expert human moderators to complete.

CarbonAI may however make automated ID confirmations in the following ideal circumstances:

  1. CarbonAI predicts a probability of at least 99% on the record; AND
  2. Record has been identified by a human to at least category level; AND
  3. Species has been adequately trained with a minimum of 100 images; AND
  4. At least 50% of species within the same category have been trained
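
As a rough illustration only, the four conditions above could be combined as in the following Python sketch. The `Prediction` class, its field names and `may_auto_confirm` are hypothetical and are not part of the NatureMapr API; only the thresholds come from the list above.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """Hypothetical summary of one CarbonAI prediction for one record."""
    probability: float                 # predicted probability, 0.0 to 1.0
    human_identified_to_category: bool # a human has identified the record to category level
    species_training_images: int       # images used to train the predicted species
    category_trained_fraction: float   # share of species in the category that are trained


def may_auto_confirm(p: Prediction) -> bool:
    """All four conditions must hold; otherwise only a suggestion is made."""
    return (
        p.probability >= 0.99
        and p.human_identified_to_category
        and p.species_training_images >= 100
        and p.category_trained_fraction >= 0.5
    )
```

A prediction at 98% probability, for example, would fall back to an ID suggestion even if every other condition were met.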

NatureMapr's expert human moderator community retains the ability to override any of CarbonAI's suggestions or confirmations at all times.

In exceptional cases where a CarbonAI confirmation has been overridden by an expert human moderator, this action further trains and improves CarbonAI's machine learning model for next time.


Exchange of information

Eligibility of data for transfer

NatureMapr enforces high levels of validation and data quality checks on all records.

These data quality standards are applied from the point of initial data entry and through the entire lifecycle of the record.

Only records which meet the following additional criteria become eligible for transfer to NatureMapr's trusted data partners:

  • Record has been formally identified by a trusted and appropriately qualified expert human moderator
  • Record has been formally identified by non-human AI based moderator which has been trained and reviewed by trusted and appropriately qualified expert human moderator(s)
  • Record contains one or more rich multimedia files which act as a form of evidence of the record
  • Record contains precise location information (coordinates kept to 6 decimal places, within 1m of the original device reading) as well as reviewed GPS evidence of the record's location
  • Record shows meta data completeness (E.g. nesting site = true, animal health = deceased, plant height = 75mm)
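
The criteria above can be sketched as a simple check in Python. This is an illustration under stated assumptions, not NatureMapr's implementation: the `Record` fields and the function name are hypothetical, and the sketch assumes that the two identification criteria are alternatives (expert human moderator or supervised AI moderator) while the remaining criteria must all be met.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical flags summarising one record; not a real NatureMapr schema."""
    confirmed_by_human_expert: bool
    confirmed_by_supervised_ai: bool
    media_file_count: int
    has_precise_location: bool
    has_reviewed_gps_evidence: bool
    metadata_complete: bool


def unmet_transfer_criteria(r: Record) -> list[str]:
    """Return the criteria the record fails; an empty list means eligible."""
    unmet = []
    # Assumption: either form of formal identification is sufficient.
    if not (r.confirmed_by_human_expert or r.confirmed_by_supervised_ai):
        unmet.append("formal identification")
    if r.media_file_count < 1:
        unmet.append("multimedia evidence")
    if not (r.has_precise_location and r.has_reviewed_gps_evidence):
        unmet.append("precise location with GPS evidence")
    if not r.metadata_complete:
        unmet.append("meta data completeness")
    return unmet
```

Returning the list of unmet criteria, rather than a single true/false value, makes it easier to tell a contributor what a record still needs.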

Information that does not meet these minimum standards has reduced scientific and real world use once it leaves the original source system (NatureMapr).

Data provenance

NatureMapr maintains an internal historical audit trail of:

All audit trail entries are decorated with additional meta data about the responsible contributor, their credentials and the date/time that the action occurred.

Audit trail and provenance meta data may be exposed to NatureMapr's trusted data partners by arrangement or under the Australian Biodiversity Information Standard (ABIS).

Some types of retrospective updates to records may negatively impact a record's eligibility for data transfer to NatureMapr's trusted data partners. Such records may still be extremely important from an informational perspective, but are best accessed in situ in their original source platform (NatureMapr).

E.g. If a record's location is adjusted more than 500m from its original GPS provided location, the record becomes ineligible for partner data transfer as confidence has been reduced.
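
One way such a 500m rule could be checked is with a great-circle (haversine) distance between the original and adjusted coordinates. The Python sketch below is an assumption about how the distance might be measured; the source does not say which method NatureMapr uses, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres
MAX_ADJUSTMENT_M = 500      # threshold taken from the example above


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def adjustment_keeps_eligibility(original: tuple, adjusted: tuple) -> bool:
    """True if the adjusted (lat, lon) is within 500m of the original (lat, lon)."""
    return haversine_m(*original, *adjusted) <= MAX_ADJUSTMENT_M
```

For instance, moving a point by 0.001 degrees of latitude is a shift of roughly 111m and would stay within the threshold, while a shift of 0.01 degrees of latitude (over 1km) would not.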

Trusted data partners

NatureMapr currently feeds, or intends to feed, data to the following data partners:

  • Biodiversity Data Repository - Department of Climate Change, Energy, the Environment and Water
  • CSIRO Atlas of Living Australia
  • ACT Wildlife Atlas
  • NSW BioNet
  • QLD WildNet
  • Victorian Biodiversity Atlas
  • SA NatureMaps
  • WA Nature Map
  • TAS Natural Values Atlas

Exchange of information using ABIS

NatureMapr exposes Australian biodiversity information to trusted data partners via secure web API as a JSON payload.

NatureMapr also has the ability to exchange information in conformance with the Australian Biodiversity Information Standard (ABIS) and is closely monitoring the development of this standard.

ABIS is a data standard that specifies how information about biodiversity is to be represented for exchange and use in Australia.

Exchanging Australian biodiversity information using the ABIS standard ensures that NatureMapr can:

  • Maximise the quality and completeness of information we send to data partners, including the Biodiversity Data Repository
  • Prevent and minimise the loss of detailed information and meta data captured by our citizen scientists as part of their original submission

Use of globally unique persistent identifiers (PIDs)

To prevent the duplication of biodiversity records within downstream partner databases, NatureMapr uses globally unique persistent identifiers which uniquely identify all NatureMapr records regardless of the repository they have been discovered through.

NatureMapr PIDs take the following form:

  • https://naturemapr.org/sightings/XXXXXXX, where X is the unique ID (integer) of the NatureMapr source record
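
A downstream consumer could build, parse and de-duplicate identifiers of this form along the following lines. This Python sketch is illustrative only: the helper names are hypothetical, and it assumes the ID is a positive integer of any length, which the source does not state explicitly.

```python
import re

PID_PREFIX = "https://naturemapr.org/sightings/"
# Assumption: the ID is one or more decimal digits with nothing after it.
_PID_PATTERN = re.compile(re.escape(PID_PREFIX) + r"(\d+)")


def make_pid(record_id: int) -> str:
    """Build the persistent identifier for a source record ID."""
    if record_id <= 0:
        raise ValueError("record ID must be a positive integer")
    return f"{PID_PREFIX}{record_id}"


def parse_pid(pid: str) -> int:
    """Extract the source record ID, or raise ValueError if the form is wrong."""
    match = _PID_PATTERN.fullmatch(pid)
    if match is None:
        raise ValueError(f"not a NatureMapr sighting PID: {pid!r}")
    return int(match.group(1))


def dedupe_by_pid(pids: list) -> list:
    """Drop repeated PIDs while keeping the first-seen order."""
    return list(dict.fromkeys(pids))
```

Because the identifier is the same regardless of which repository a record was found in, two copies of a record received via different partners collapse to one entry under `dedupe_by_pid`.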

Appendix 1 - standard data quality checks

NatureMapr enforces the following standard data quality checks on all records:

  • Must contain a geospatial coordinate pair as location
  • Must be identified to at least category level (E.g. Plant)
  • Must contain at least Approximate Abundance
  • Location coordinates must reside within Australian bounding box including Macquarie Island
  • Location coordinates must be preserved to 6 decimal places (within 1m of the supplied value)
  • Record year must be on or after 1970
  • Record date/time must not be in the future
  • Record must not be a duplicate of an existing record (same species, same location coordinates, same date/time)
  • Record location reviewed by the community as realistic (E.g. Quoll in the Pacific Ocean would be promptly rejected and/or corrected)
  • Relevant meta data attributes (E.g. Animal health, Plant height) are presented to the contributor for completion based on the selected species
  • Negative numbers are not accepted for abundance or relevant meta data attributes
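
Several of the automated checks above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the record field names are hypothetical, the bounding box values are illustrative placeholders (the source does not give the actual limits), and the community review of location realism is a human step that is not modelled here.

```python
from datetime import datetime, timezone

# Placeholder bounding box only; the real limits are not given in the source.
LAT_MIN, LAT_MAX = -55.0, -9.0
LON_MIN, LON_MAX = 112.0, 160.0


def check_record(rec: dict, seen_keys: set, now=None) -> list:
    """Return a list of failed checks for one record (empty list means it passes)."""
    now = now or datetime.now(timezone.utc)
    failures = []
    lat, lon = rec.get("lat"), rec.get("lon")
    if lat is None or lon is None:
        failures.append("missing coordinate pair")
    elif not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
        failures.append("coordinates outside bounding box")
    if not rec.get("category"):
        failures.append("not identified to category level")
    abundance = rec.get("abundance")
    if abundance is None:
        failures.append("missing approximate abundance")
    elif abundance < 0:
        failures.append("negative abundance")
    observed = rec.get("observed_at")  # assumed to be a timezone-aware datetime
    if observed is not None:
        if observed.year < 1970:
            failures.append("record year before 1970")
        if observed > now:
            failures.append("record date/time in the future")
    if lat is not None and lon is not None:
        # Duplicate key: same species, same coordinates (6 decimal places), same date/time.
        key = (rec.get("species"), round(lat, 6), round(lon, 6), observed)
        if key in seen_keys:
            failures.append("duplicate of an existing record")
        seen_keys.add(key)  # remember this record for later duplicate checks
    return failures
```

Passing `now` explicitly keeps the future-date check reproducible when the function is tested.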

Incentivisation of high data quality

NatureMapr incentivises contributors to provide full and complete records with as much detail and additional meta data as physically possible.

Records are scored against NatureMapr's data quality standards and are penalised for failing to meet each of the following:

  • Images or audio provided
  • Multiple images or audio files provided
  • Confirmed by an expert moderator
  • Nearby sighting(s) of same species
  • GPS evidence of location provided
  • Description provided
  • Relevant meta data attributes provided

Each of these criteria increases record data quality, the likelihood of successful identification by an expert moderator and the usefulness of the record downstream to trusted data partners.
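
The source does not describe how individual criteria are weighted, so the following Python sketch simply assumes equal weights and reports the fraction of criteria a record meets. The criterion keys and function names are hypothetical and for illustration only.

```python
# Hypothetical keys, one per criterion in the list above.
CRITERIA = (
    "has_media",
    "has_multiple_media",
    "expert_confirmed",
    "nearby_same_species",
    "gps_evidence",
    "has_description",
    "has_metadata",
)


def quality_score(record: dict) -> float:
    """Fraction of criteria met, from 0.0 to 1.0 (equal weights are an assumption)."""
    met = sum(1 for name in CRITERIA if record.get(name))
    return met / len(CRITERIA)


def missing_criteria(record: dict) -> list:
    """Criteria the record is penalised for, in list order."""
    return [name for name in CRITERIA if not record.get(name)]
```

Listing the missing criteria alongside the score gives a contributor a concrete prompt for what to add next.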


Available for download

The following files are available for download:

CCA 3.0
We acknowledge the Traditional Owners of this land and acknowledge their continuing connection to their culture. We pay our respects to their Elders past and present.