How to use “Named Entity Recognition” to read unstructured emails and extract relevant data?
According to Wikipedia, “named entity recognition” (NER) is a sub-field of Data Science and Natural Language Processing (itself a branch of Artificial Intelligence) that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organizations and locations.
The huge amount of unstructured text now accessible from both conventional and new media outlets, including social media, offers a rich source of information. Named Entity Extraction is central to building knowledge from semi-structured and unstructured text sources. The value of information “units” such as names (for example personal names, organizations and place names) and numerical expressions (such as times, dates, monetary amounts and percentages) was recognized by some of the first researchers who worked on extracting information from unstructured text.
In short, NER gives a word or phrase a label, much like giving a person a name.
First, we need to classify certain words or numbers into categories (named entities).
An example is shown below:
You might receive an email with text such as “Please issue a cheque of $234 to Alex Tay”. Using NER, we can train the Gleematic cognitive robot to classify “cheque” as the payment mode, “$234” as the amount, and “Alex Tay” as the receiver.
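To make the labelling concrete, here is a minimal sketch in Python of the output we are after for that sentence. It is not how Gleematic works internally: a trained NER model learns these patterns from examples, whereas this sketch uses hand-written rules as a stand-in. The label names (PAYMENT_MODE, AMOUNT, RECEIVER), the list of payment modes and the function name extract_entities are all assumptions made for illustration.

```python
import re

# Assumed list of payment modes; a trained model would learn these from data.
PAYMENT_MODES = ("cheque", "bank transfer", "cash")

# A dollar amount such as $234 or $1,050.00.
AMOUNT_RE = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")

# Assumption: the receiver is a run of capitalised words right after "to".
RECEIVER_RE = re.compile(r"\bto\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)")


def extract_entities(text):
    """Return a list of (label, matched_text) pairs found in the text."""
    entities = []

    # Payment mode: first known mode that appears, matched case-insensitively.
    lowered = text.lower()
    for mode in PAYMENT_MODES:
        idx = lowered.find(mode)
        if idx != -1:
            entities.append(("PAYMENT_MODE", text[idx:idx + len(mode)]))
            break

    amount = AMOUNT_RE.search(text)
    if amount:
        entities.append(("AMOUNT", amount.group(0)))

    receiver = RECEIVER_RE.search(text)
    if receiver:
        entities.append(("RECEIVER", receiver.group(1)))

    return entities


entities = extract_entities("Please issue a cheque of $234 to Alex Tay")
print(entities)
```

Rules like these break as soon as the wording changes (for example “pay Alex Tay $234 by cheque”), which is exactly why a model trained on many phrasings is preferable to fixed patterns.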
When we provide enough training data covering the many ways in which people write, the robot can pick out the relevant data when reading incoming emails, classify it, and even pull some data fields into structured tables.
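As a rough illustration of the last step, the sketch below turns labelled entities into a structured table (CSV text) using only Python's standard library. The two input records are made-up sample data standing in for the output of an entity recogniser, and the column names and the to_csv helper are assumptions for this example, not part of any Gleematic product.

```python
import csv
import io

# Made-up sample data: one dict of labelled entities per email.
labelled_emails = [
    {"PAYMENT_MODE": "cheque", "AMOUNT": "$234", "RECEIVER": "Alex Tay"},
    {"PAYMENT_MODE": "bank transfer", "AMOUNT": "$1,050.00"},  # no receiver found
]

# Assumed column order for the structured table.
COLUMNS = ["PAYMENT_MODE", "AMOUNT", "RECEIVER"]


def to_csv(rows, columns):
    """Write one CSV row per email; missing entities become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


table = to_csv(labelled_emails, COLUMNS)
print(table)
```

Leaving a cell empty when an entity is missing, rather than guessing a value, keeps the table honest about what the recogniser actually found.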
The recent rise in computing power and the fall in data storage costs mean that data scientists and software developers now have even more options for building large information bases, with millions of data sets that can be fed into machines for NER classification. Such sources of expertise contribute to smart machine behavior. Not surprisingly, Named Entity Extraction sits at the heart of many common technologies. See our demo.
Written by: Benny Tan