Training a Computer to Identify Moths – Guest post by Andre Poremski

Fieldguide’s approach to machine learning in LepSnap

LepSnap is an app created by Fieldguide that uses image recognition technology to identify Lepidoptera (moths & butterflies) to the scientific rank that is feasible with a photo, location and date. Read more about LepSnap here.

One of the most frequently asked questions we get is: How does your image recognition system work? Here’s my attempt at unpacking that question…

How does LepSnap recognize a species?

Most of the work to recognize a particular species actually happens long before the app tries to identify that species in a photo submitted by an end user. The pixels of all images in our database are converted into a set of visual descriptors. The visual descriptors capture many different attributes, from simple geometric shapes and colors to high-level features such as wing patterns and outline shapes. A classifier then tries (and learns) to work out which features are most discriminative and applies weightings to each. When LepSnap tries to identify a moth in a new photo submission, it first converts the image to the same set of visual descriptors and works out which group it belongs to by comparing the weightings. It is usually pretty confident in its prediction, but it will return multiple ID suggestions when the confidence is low.
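
To make the last step concrete, here is a minimal sketch of how a classifier's output could be turned into either a single ID or several suggestions. It is not Fieldguide's code: the per-category scores, the function names, the 0.8 confidence threshold, and the limit of three suggestions are all assumptions chosen for illustration.

```python
import math

def softmax(scores):
    """Convert raw per-category scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def suggest_ids(scores, confident_at=0.8, max_suggestions=3):
    """Return one (label, probability) pair when confident, otherwise several.

    `confident_at` and `max_suggestions` are illustrative values only.
    """
    probs = softmax(scores)
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    if ranked[0][1] >= confident_at:
        return ranked[:1]
    return ranked[:max_suggestions]

# A clear winner yields one suggestion; a close race yields several.
clear = suggest_ids({"Polyphemus Silkmoth": 6.0, "Cecropia Silkmoth": 1.0, "Luna Moth": 0.5})
close = suggest_ids({"A": 1.0, "B": 0.9, "C": 0.2, "D": 0.1})
```

In this sketch the first call returns a single suggestion and the second returns three, which mirrors the behavior described above.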

Does LepSnap learn in real time?

When a new moth photo is published on LepSnap, the app does not immediately analyze the image to improve its recognition capabilities. The underlying neural network that powers image recognition does not perform incremental learning in the same manner that we humans process new information. Instead, a neural network is a balanced system of feature weightings that must be compiled from a fixed dataset. Once per month, a Fieldguide team member pulls the entire LepSnap dataset and runs it through a series of steps that trains a neural network from scratch. The training process takes several days and consumes significant computational resources. Once complete, the existing classifier is swapped out with the new one.
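
The difference between publishing a photo and retraining can be pictured with a toy sketch. Everything here is hypothetical: the "training" step is a stand-in that simply remembers the most common label, and the class and method names are invented for illustration. The point is only that new photos change the dataset, while the model changes when a fresh one is trained on a snapshot and swapped in.

```python
from collections import Counter

def train_from_scratch(snapshot):
    """Stand-in for the multi-day training run: a 'model' that always
    predicts the most common label in the snapshot it was trained on."""
    counts = Counter(label for _, label in snapshot)
    majority = counts.most_common(1)[0][0]
    return lambda photo: majority

class Identifier:
    def __init__(self, dataset):
        self.model = train_from_scratch(list(dataset))

    def publish(self, dataset, photo, label):
        # The photo is stored, but the live model is left untouched.
        dataset.append((photo, label))

    def retrain(self, dataset):
        # Train on a frozen copy of the whole dataset, then swap the model.
        self.model = train_from_scratch(list(dataset))

dataset = [("a.jpg", "Luna Moth"), ("b.jpg", "Luna Moth"), ("c.jpg", "Io Moth")]
identifier = Identifier(dataset)
for name in ("d.jpg", "e.jpg", "f.jpg"):
    identifier.publish(dataset, name, "Io Moth")
before = identifier.model("new.jpg")  # still reflects the old snapshot
identifier.retrain(dataset)
after = identifier.model("new.jpg")   # reflects the newly published photos
```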

Takeaway: Publishing misidentified moths does not have an immediate adverse effect on the image recognition system. Most submissions have time to get verified by the community before they are included in the next training cycle.

What’s involved in neural net training?

The machine learning framework Fieldguide employs requires a hierarchical tree of categories with a minimum of 20 photos per category. The maximum is usually capped at 1,000 photos per category, but in general the more training photos per category, the better. Also, the more evenly distributed the photo quantities are across categories, the better.

For example, suppose LepSnap had only 5 photos of an Io Moth, 25 photos of a Cecropia Silkmoth, and 50 photos of a Polyphemus Silkmoth. Then (A) the Io Moth would not be considered a category in the neural net due to insufficient examples, and therefore would not be included in any ID suggestions, and (B) identifications would lean toward the category with the most photos. In other words, more moths would be incorrectly identified as Polyphemus Silkmoths than as either of the other two species (assuming all photos were of adult specimens).
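
The same example can be worked through in a few lines. This is only a sketch of the rules stated above (a minimum of 20 photos, a usual cap of 1,000); the function names and the imbalance measure are assumptions, not part of Fieldguide's framework.

```python
MIN_PHOTOS = 20    # minimum photos per category, from the text
MAX_PHOTOS = 1000  # usual cap per category, from the text

def plan_training_set(photo_counts):
    """Split categories into trainable ones (with capped counts) and skipped ones."""
    trainable = {}
    skipped = []
    for category, count in photo_counts.items():
        if count < MIN_PHOTOS:
            skipped.append(category)  # too few examples to form a category
        else:
            trainable[category] = min(count, MAX_PHOTOS)
    return trainable, skipped

def imbalance_ratio(trainable):
    """Largest category size divided by the smallest; 1.0 means perfectly even."""
    return max(trainable.values()) / min(trainable.values())

counts = {"Io Moth": 5, "Cecropia Silkmoth": 25, "Polyphemus Silkmoth": 50}
trainable, skipped = plan_training_set(counts)
# skipped   -> ["Io Moth"]
# trainable -> {"Cecropia Silkmoth": 25, "Polyphemus Silkmoth": 50}
ratio = imbalance_ratio(trainable)  # 2.0: Polyphemus has twice the examples
```

A ratio well above 1.0 is a hint that suggestions may lean toward the larger category, which is point (B) in the example.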

Takeaway: Publishing photos of a rare species with few examples is paramount, as it helps LepSnap recognize that species’ existence. Publishing photos of a well-represented species is still valuable due to the sheer number of training examples required.

How flexible is the pattern recognition within a single category?

Moths and butterflies look wildly different as they progress through their lifecycle. A caterpillar can look very different between early instar and pre-pupal phases. Some species are sexually dimorphic and/or highly polymorphic where variations of color pattern are influenced by genetics, diet, environmental conditions, season, and other factors. How does a neural net handle all that variation?

It is not problematic to include photos within the same category that look very different from one another. A category that contains multiple photos of caterpillars can still recognize adults, as long as there are also ample photos of adults. In other words, the recognition model is flexible enough to identify larvae and adults without requiring two separate categories. However, neural nets are more valuable when categories are homogeneous, because the classification can be more specific, e.g. “this is a living, adult, female melanic morph of a Tiger Swallowtail”. Therefore, we seek to split a single category into finer-grained child categories as soon as there are sufficient photos in a subgroup to meet the training requirements of category representation.
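
As a rough illustration of that splitting rule, the sketch below promotes a subgroup to a child category once it has enough photos. It assumes each photo carries a hypothetical subgroup tag such as "adult" or "larva" and reuses the 20-photo minimum mentioned earlier; the data layout and function name are invented for this example.

```python
from collections import defaultdict

MIN_PHOTOS = 20  # per-category minimum stated earlier in the post

def split_category(photos, min_photos=MIN_PHOTOS):
    """Promote large subgroups of one category to child categories.

    photos: list of (photo_id, subgroup) tuples, e.g. ("p1", "larva").
    Returns (children, remainder): subgroups with enough photos to train
    on their own, and photo IDs that stay in the parent category.
    """
    by_subgroup = defaultdict(list)
    for photo_id, subgroup in photos:
        by_subgroup[subgroup].append(photo_id)

    children = {}
    remainder = []
    for subgroup, ids in by_subgroup.items():
        if len(ids) >= min_photos:
            children[subgroup] = ids
        else:
            remainder.extend(ids)
    return children, remainder

# 30 adults and 22 larvae become child categories; 4 pupae stay in the parent.
photos = (
    [(f"adult-{i}", "adult") for i in range(30)]
    + [(f"larva-{i}", "larva") for i in range(22)]
    + [(f"pupa-{i}", "pupa") for i in range(4)]
)
children, remainder = split_category(photos)
```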

Takeaway: With enough photos spanning the full lifecycle, LepSnap can be an expert caterpillar classifier as much as it can help identify adults.

What happens when a moth remains misidentified?

Misidentified images are a data quality issue that is inevitable yet manageable. They can weaken the classification model, but only in proportion to how many there are relative to the correctly identified photos in a category. A large category with many verified photos is not so easily corrupted by a few errors, especially when the correct ID is also a strong category. Fieldguide employs an automatic script that constantly scans the data to bubble up probable misidentifications for expert review.
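
The post does not describe how that script works, so the following is only a guess at the general idea: flag a photo when its published ID sits in a strong category but the model gives that ID a low probability. The thresholds (200 photos for a "strong" category, 0.2 for "low confidence"), the data layout, and the function name are all assumptions.

```python
def flag_for_review(photos, category_sizes, strong_at=200, low_confidence=0.2):
    """Return (photo_id, published_id, model_best_guess) for suspicious photos.

    photos: list of dicts with 'id', 'published_id', and 'probs', where
            'probs' maps category name to the model's probability.
    category_sizes: number of training photos per category.
    """
    flagged = []
    for photo in photos:
        label = photo["published_id"]
        if category_sizes.get(label, 0) < strong_at:
            continue  # skip weak categories; the model is less reliable there
        confidence = photo["probs"].get(label, 0.0)
        if confidence < low_confidence:
            best_guess = max(photo["probs"], key=photo["probs"].get)
            flagged.append((photo["id"], label, best_guess))
    return flagged

photos = [
    {"id": "p1", "published_id": "Luna Moth", "probs": {"Luna Moth": 0.95, "Io Moth": 0.05}},
    {"id": "p2", "published_id": "Luna Moth", "probs": {"Luna Moth": 0.05, "Io Moth": 0.95}},
    {"id": "p3", "published_id": "Rare Moth", "probs": {"Rare Moth": 0.01, "Io Moth": 0.99}},
]
sizes = {"Luna Moth": 800, "Io Moth": 600, "Rare Moth": 25}
review_queue = flag_for_review(photos, sizes)
```

Here only "p2" is queued: "p1" agrees with the model, and "p3" belongs to a category too small for the model's opinion to carry much weight.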

Takeaway: Don’t be shy about publishing tentative IDs. LepSnap leverages machine learning to surface low-confidence photos in strong categories, while working with experts to heavily curate categories that are most susceptible to false bias.

How does the training correct for color?

The training accounts for color range as a probability index that is one of the many “layers” in the deep learning process. There is no “color correction” per se. If we feed the algorithms sample images that are black & white, shot at tungsten color temperature, or daylight-adjusted, each of these images will influence the color weightings and, as such, will improve the system’s ability to recognize future black & white, tungsten-hued, or daylight-adjusted images.

Takeaway: LepSnap places a low weight/importance on color unless the training suggests it’s a vital attribute.

How does the training correct for scale, if there is no obvious scale marker?

The scale of the object in the photo is roughly inferred probabilistically by pattern density as compared with other similar categories, but is otherwise not a factor in and of itself. For example, a human might look for telltale field marks that distinguish two otherwise near-identical species, but the machine takes into account everything at once (wing vein thickness and spacing, wing scales, abdomen-to-wing proportions, etc.), so it’s better at mathematically assessing “proportionality” than we are. The machine cannot, however, use pattern density to calculate a precise size of the specimen, due to attribute variability.

Takeaway: Including scale bars in photos is useful for reference but not necessary for training.

Some species cannot be identified by a photo alone. How does LepSnap handle that?

Adult Sycamore and Banded Tussock moths are two species of North American tiger moth that, within the overlapping range of their respective host plants, cannot be separated by viewing photos of the animals. It remains to be seen if a computer can be trained to differentiate these near-identical species by minute differences, perhaps too subtle for human perception. In the meantime, LepSnap offers “group categories” that describe two or more species that require dissection or other methods to ID the species.
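
One simple way to picture a group category is as a relabeling step before training: photos of species that cannot be told apart are filed under a shared label. The sketch below assumes that approach. The group name and the mapping are hypothetical, and LepSnap's actual category tree may be organized differently.

```python
from collections import Counter

# Hypothetical mapping from species to a shared group category.
GROUP_OF = {
    "Sycamore Tussock Moth": "Sycamore/Banded Tussock Moth (group)",
    "Banded Tussock Moth": "Sycamore/Banded Tussock Moth (group)",
}

def training_label(species):
    """Use the group category when one exists, otherwise the species itself."""
    return GROUP_OF.get(species, species)

def count_training_labels(published_ids):
    """Count photos per training label after applying group categories."""
    return Counter(training_label(species) for species in published_ids)

published = ["Sycamore Tussock Moth"] * 12 + ["Banded Tussock Moth"] * 15 + ["Luna Moth"] * 40
label_counts = count_training_labels(published)
```

In this example neither tussock species reaches the 20-photo minimum on its own (12 and 15 photos), but the combined group has 27 photos and would qualify as a category.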

Takeaway: If a group category doesn’t exist but needs to, let us know and we’ll add it pronto.

How many classes can a neural net handle?

Unfortunately, there are limitations to how many categories a single neural net can handle. Most do not exceed 10,000–20,000 categories due to an exponential increase in training time paired with a limit in computational capacity. Therefore, specialty networks such as LepSnap are equipped to provide more granular identifications than a single network trained on all flora and fauna.

It’s all about the data!

No matter how sophisticated our machine learning system is, it would be useless without continual curation by a large network of experts. I couldn’t have said it better than John Morgan, a naturalist and fellow lep-lover:

This all reminds me of the old axiom: garbage-in, garbage-out — which just means no matter how sophisticated an information processing system is, the quality (accuracy, completeness, relevance, timeliness, etc.) of the information coming out of it cannot be better than the quality of the information that went in.

Our mission is to engage as many people in the training process as possible. We’d love to hear from you if you have expertise to share with the Fieldguide community.

Give LepSnap a whirl!
iPhone app · Android app · Website

Questions? Ideas? Say hello@fieldguide.net

 
