[Seminar Blog] Mass Digitization of the Insect Collection at the National Museum of Natural History

Written by: Daniel Davis

As a Research Entomologist at the National Museum of Natural History (NMNH), Dr. Torsten Dikow has worked on collecting and describing species of assassin flies (Asilidae) and mydas flies (Mydidae) in the Namib Desert along the Atlantic coast of Namibia in southwestern Africa. He hunts for these large flies along rippling sand dunes, where 45 minutes or more can pass between sightings. While his research interests are highly specific, Dr. Dikow also serves as Co-Lead of the NMNH Department of Entomology and Curator of Diptera & Aquatic Insects. In that role, he is working with the rest of the department to make the museum's natural history data available to everyone.

Natural history collections hold specimens that represent decades of collecting work and contain irreplaceable biological data. Making these data accessible to scientists, however, is a monumental task. Nevertheless, Dr. Dikow and the NMNH Department of Entomology have undertaken an effort to digitize a large part of the insect collection. The sheer diversity of insects compounds the difficulty (Fig. 1): around 1.1 million insect species have been described, and the NMNH alone holds about 35 million insect specimens representing over 400,000 species.

Fig. 1: This speciescape shows the number of species described in major taxonomic groups, with the size of the representative organism corresponding to the number of described species (Wheeler 1990).

The digitization effort captures several types of data. For each specimen, staff generate a high-quality photograph using image-stacking technology, along with images of the specimen's labels, which record where and when it was collected, what species it is, and a unique collection identifier. All insect photographs produced by the NMNH digitization effort are in the public domain and freely available on the Smithsonian Institution website. The collection locality and date for each specimen can be uploaded to occurrence databases such as GBIF, which scientists use in many types of research, including studies of regional species diversity, niche modeling for individual species, and the effects of climate change on species distributions over time.

Digitization can be tedious and very time-consuming: workers must lay out each specimen and its labels, take an image, save the image with the correct metadata, and return the specimen to its proper place in the collection. However, new forms of automation can make this process more efficient. Dr. Dikow described how the NMNH has implemented a "pinned insect digitization conveyor" to increase the speed and accuracy of the process (Fig. 2). Workers place pinned insect specimens on the conveyor with their labels spread out for imaging. The conveyor carries the specimens through a chamber with a high-quality camera and then brings them back to the worker. A system of cameras tracks where each specimen came from in its box, and a laser pointer guides the worker to the spot where the specimen should be returned. Together, these features reduce the time and effort needed per specimen.

Fig. 2: Pinned insect digitization conveyor at National Museum of Natural History. Photograph by NMNH Photo Services.

Dr. Dikow is also collaborating with researchers at UMD, including Dr. Evan Economo, Dr. Jim Purtilo, and students in Dr. Purtilo's software engineering capstone class, who are using machine learning to improve automated transcription of specimen label data. Digitization workflows often rely on workers manually transcribing label data into databases so that specimen records become searchable and useful. The hope is that automated transcription can make this step more efficient as well.

Digitizing museum collections both makes the existing collection more accessible and aggregates its data into a database that supports many kinds of research and analysis that would be difficult to tackle using the physical collection alone. Putting the photographs and collection data online is an important way to democratize data, giving access to researchers who lack the time or funds to travel to the collection. Overall, the availability of these data will be important for taxonomy, ecology, and many other fields of biology.

About the writer: Daniel Davis is a PhD student in the Fritz lab studying the evolution of Cry-resistance in corn earworms.

Citations: Quentin D. Wheeler, Insect Diversity and Cladistic Constraints, Annals of the Entomological Society of America, Volume 83, Issue 6, 1 November 1990, Pages 1031–1047, https://doi.org/10.1093/aesa/83.6.1031