Supervised machine learning algorithms typically need large amounts of labeled data before they give useful results. When labeled data is limited, they generalise poorly to data they have not seen before. They also demand considerable domain expertise. In the current experiment, we built a model that scores the relevance of a news article to a specific topic using minimal labeled data. A further challenge is that the topics are not fixed: the number of classes is unknown at training time, so a traditional ML classifier trained on a fixed set of classes cannot be used. We used Natural Language Understanding (NLU) to tackle these issues. NLU is a subset of natural language processing that uses semantic analysis of text to understand the meaning of sentences.
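To make the open-topic setting concrete, here is a minimal sketch of scoring an article against a topic that is only supplied at query time. It is not the model built in this experiment: it swaps in plain word-overlap (cosine similarity over word counts) for the NLU component, and the function names, the stopword list, and the sample article are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the experiment).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "for", "is", "are", "by", "with"}


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, keep alphabetic tokens, drop stopwords, count the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


def relevance(article: str, topic: str) -> float:
    """Cosine similarity between the word-count vectors of the article and the topic."""
    a, t = bag_of_words(article), bag_of_words(topic)
    dot = sum(a[w] * t[w] for w in t)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in t.values()))
    return dot / norm if norm else 0.0


# Hypothetical sample article.
article = "The central bank raised interest rates to curb inflation."

# Topics arrive at query time; no classifier was trained on them.
for topic in ["interest rates and inflation", "football transfer news"]:
    print(f"{topic!r}: {relevance(article, topic):.2f}")
```

Because the topic is just another piece of text, no fixed list of classes is needed. The weakness of this stand-in is that it only matches exact words: a topic phrased as "monetary policy" shares no words with the sample article and would score zero, which is the kind of gap that semantic analysis is meant to close.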

Business Use Cases and Applications

This experiment has several applications, including:

1. Topic Relevance Identification: Thousands of articles are available on the internet for any given topic. Using the approach in this experiment, we can discard the non-relevant articles and keep only the relevant news. We can also find the locations and organizations mentioned in an article using Named Entity Recognition (NER), an entity extraction technique.

2. Flagging a message/mail as spam: Today, telecommunication companies have little to no data on whether a message sent to a user is ham or spam (ham is a non-spam message or email). Using the approach in this experiment, messages can be classified as ham or spam, and the keywords most relevant to identifying spam can then be extracted.