Fossils are the most important indices for stratigraphy, but the confidence and consistency of fossil identification have been debated for decades (e.g. Gradstein et al., 1985; Polly and Head, 2004). With the rapid development of database technology over the last decade, millions of fossil data entries of varying quality have accumulated in formal and informal databases (e.g. The Paleobiology Database, OneStratigraphy, World Register of Marine Species). After evaluation, these data have been used for high-resolution stratigraphic correlation (Fan et al., 2020; Deng et al., 2021). However, when tens of thousands of fossil data entries are used in a single quantitative analysis, how accurate, confident, and consistent can their identifications be? Of the millions of fossil data entries in the databases, how many could be re-examined and evaluated, and in what way?
Here, a machine learning approach to fossil image identification is proposed to test the influence of training set consistency. Three training sets, one with the original labels from various identifiers and two with labels revised by two different experts, are each used to train the same deep learning model independently; the three trained models then identify the same test set. The consistency among the training sets, as well as among the machine identification results, could provide arguments on the confidence of taxonomic identification and on the questions raised above.
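One way the consistency among the three label sets could be quantified is sketched below. This is an illustration only: the abstract does not state which consistency measure is used, so raw percent agreement and Cohen's kappa are assumptions, and the label-set names and taxon labels (`taxon_a`, etc.) are hypothetical placeholders rather than data from the study.

```python
from collections import Counter
from itertools import combinations


def percent_agreement(a, b):
    """Fraction of specimens that receive the same label in both lists."""
    if len(a) != len(b) or not a:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Chance-corrected agreement between two labelings of the same specimens."""
    n = len(a)
    observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected if both labelers assigned labels independently
    # at their own observed label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelers used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


def pairwise_consistency(label_sets):
    """Return {(name1, name2): (percent_agreement, kappa)} for every pair of label sets."""
    return {
        (n1, n2): (percent_agreement(l1, l2), cohen_kappa(l1, l2))
        for (n1, l1), (n2, l2) in combinations(label_sets.items(), 2)
    }


# Hypothetical labels for six specimens (placeholders, not real data).
label_sets = {
    "original": ["taxon_a", "taxon_a", "taxon_b", "taxon_b", "taxon_c", "taxon_c"],
    "expert_1": ["taxon_a", "taxon_a", "taxon_b", "taxon_c", "taxon_c", "taxon_c"],
    "expert_2": ["taxon_a", "taxon_b", "taxon_b", "taxon_c", "taxon_c", "taxon_c"],
}

for pair, (agreement, kappa) in pairwise_consistency(label_sets).items():
    print(pair, f"agreement={agreement:.2f}", f"kappa={kappa:.2f}")
```

The same functions could be applied to the predictions that the three trained models make on the shared test set, so that label consistency and prediction consistency are compared on the same scale.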
The work is supported by the Natural Science Foundation of China (Grant 42293280).
References