How machine learning can help in patent monitoring and other IP tasks: First results

I am keen to report the first results I have obtained with AI-related techniques applied to intellectual property (IP) and, in particular, to patent monitoring, for which machine learning appears to work very well.

Before getting down to the heart of the matter, let me briefly recall the background. Machine learning is a key component of artificial intelligence (AI), which is itself one of the most disruptive ingredients of Industry 4.0. Computer scientists are developing machine learning techniques at a rapid pace, and applications to various technical fields are proliferating. More than hype, AI is now a reality in many areas.

As a patent attorney, I wish I had a versatile and efficient tool to assist me with some tedious IP tasks, especially where large numbers of IP rights are involved. But applications to IP are still in their infancy. Proprietary solutions exist and a few machine learning applications to IP have been advertised. However, such solutions are mostly integrated into large IP platforms and, as such, have a rather limited scope and/or lack flexibility. This has prompted me to develop my own tools, based on available text analysis and machine learning algorithms. The advantage of homemade tools is that their efficiency and accuracy can be clearly assessed. Moreover, they can be quickly reconfigured to match any use case of interest to IP users.

Let me illustrate this with an example showing how one can actively use machine learning to gain a clearer view of the patent landscape in a given field. Assume that you (an “IP user”) want to monitor adverse patents granted in your field, as most innovators do. It is easy to set up a monitoring process with a standard patent database. Having done so, you may for example receive monthly digests of newly granted patents, which you would then review to detect patents that are relevant to your activities. The additional workload is usually not an issue.

What is more difficult, however, is to go through the long list of patents already in force, typically a few thousand to a few tens of thousands in your specific technical field, meaning a few weeks of work for a trained reader. Now, not all companies have this kind of time budget or the required in-house expertise. Nor can they necessarily afford to outsource this task to a patent attorney, especially as the same problem recurs each time a new activity is started, for example when launching a new product or service.

This is where machine learning becomes useful. Indeed, a cognitive model can be trained based on your own ratings, i.e., scores you assign to the patents that you regularly review. For example, you may rate the relevance of newly received patents with a score varying between 0 (not relevant) and 1 (fully relevant). Once you have rated a sufficient number of patents (forming your “training set”), you can train a cognitive model on this training set. Upon completion of the training phase, the model can be run (“inference phase”) to automatically assess the potential relevance of thousands of other patents. Finally, the scored patents can be ranked in descending order of relevance, as illustrated below for a test pool of patents of potential relevance to a given company.
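The rate, train, infer, and rank steps described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the tool discussed in this article: the model type is not specified here, so the sketch swaps in a simple TF-IDF representation of the claim text with a similarity-weighted nearest-neighbour scorer. The function names, the stop-word list, the toy claims, and the patent identifiers (P1 to P3) are all hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list; a real tool would use a fuller one.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "with", "for", "in", "by"}


def tokenize(text):
    """Lowercase a claim, split it into word tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_idf(documents):
    """Compute smoothed inverse document frequencies over tokenized documents."""
    n_docs = len(documents)
    doc_freq = Counter(term for doc in documents for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Return an L2-normalised sparse TF-IDF vector; unseen terms are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def train(rated_claims):
    """Training phase: vectorise the rated claims and keep them with their scores."""
    tokenized = [tokenize(text) for text, _ in rated_claims]
    idf = build_idf(tokenized)
    scores = [score for _, score in rated_claims]
    examples = [(tfidf_vector(toks, idf), s) for toks, s in zip(tokenized, scores)]
    return idf, examples


def predict(model, claim_text, k=3):
    """Inference phase: similarity-weighted mean rating of the k nearest rated claims."""
    idf, examples = model
    vec = tfidf_vector(tokenize(claim_text), idf)
    neighbours = sorted(((cosine(vec, ex), s) for ex, s in examples), reverse=True)[:k]
    total = sum(sim for sim, _ in neighbours)
    return sum(sim * s for sim, s in neighbours) / total if total > 0 else 0.0


def rank(model, pool):
    """Score each unrated patent and sort by descending predicted relevance."""
    return sorted(((predict(model, text), pid) for pid, text in pool.items()), reverse=True)


# Hypothetical training set: (claim text, relevance score between 0 and 1).
training_set = [
    ("A microfluidic device comprising a microchannel and a capillary pump", 1.0),
    ("A microfluidic chip with a flow path and a liquid loading pad", 0.8),
    ("A method for brewing coffee using a pressurised water chamber", 0.0),
    ("A bicycle frame comprising carbon fibre tubes", 0.0),
]
# Hypothetical test pool of unrated patents.
test_pool = {
    "P1": "A device having a microchannel connected to a capillary pump",
    "P2": "A coffee machine with a water chamber and a heating element",
    "P3": "A microfluidic cartridge with a liquid reservoir",
}

model = train(training_set)
ranking = rank(model, test_pool)
for score, patent_id in ranking:
    print(f"{patent_id}: {score:.2f}")
```

A nearest-neighbour scorer is used here only because it fits in a short, dependency-free example; any regression or classification model that maps claim text to a score between 0 and 1 would slot into the same train and predict steps.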

At this point, it only remains to review the claims of the ranked patents, starting from the most relevant patents retrieved by the cognitive model. The review can be restricted to the top fraction of the ranked patents. In practice, the truly relevant patents mostly rank among the first 50 patents returned, so the review requires at most a few days of careful reading instead of weeks.

Qualitatively, the results obtained are convincing. Dependency statistics computed on validation sets (thanks to kind beta testers) show substantial correlations between the top-ranked patents and their prima facie relevance. I have tested this approach for several companies active in distinct technical fields (e.g., IT, materials science, microfluidics, and IC chips), using datasets of several hundred to several thousand patents each.
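One way to quantify such a correlation on a validation set is a rank correlation between the scores returned by the model and the ratings given by a human reviewer. The sketch below computes Spearman's rank correlation in plain Python. It is an assumption that a statistic of this kind is suitable; the exact dependency statistics used for the tests reported here are not specified. The function names and the six score/rating pairs are made up for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical validation data: model scores vs. reviewer ratings for six patents.
model_scores = [0.91, 0.15, 0.78, 0.40, 0.05, 0.66]
reviewer_ratings = [1.0, 0.0, 0.8, 0.6, 0.0, 0.4]

rho = spearman(model_scores, reviewer_ratings)
print(f"Spearman rho = {rho:.3f}")
```

A value close to 1 means the model orders the patents much as the reviewer would; a value near 0 means the ranking carries little information about the reviewer's judgment.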

In particular, tests have been performed in which some of the patents granted to a selected company were deliberately left out of the training sets and placed in the test pool instead, to check the relevance of the ranked patents. The trained models were nevertheless able to retrieve such patents (based on claim language alone; no metadata was fed as input to the models). That is, the patents rated by a selected IP user in the training set (which includes only a fraction of the patents owned by this IP user) make it possible to retrieve earlier patents (orange points) of that same IP user with the highest relevance scores, as seen in the figures above. That said, some of the third-party patents (blue points) turn out to be more relevant than the least relevant patents of the selected company. This, in my view, confirms that the present approach can indeed identify the patents that are most relevant to a given company.
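The hold-out test described above can be summarised with a single number: the fraction of the deliberately withheld patents that the model places among the first k ranked patents (often called recall at k). The following is a minimal sketch of that measure, not the evaluation code used for the tests reported here; the function name and the patent identifiers ("OWN-…" for withheld patents of the selected company, "TP-…" for third-party patents) are hypothetical.

```python
def recall_at_k(ranked_ids, held_out_ids, k):
    """Fraction of the held-out patents found among the first k ranked patents."""
    held_out = set(held_out_ids)
    if not held_out:
        raise ValueError("held_out_ids must contain at least one patent")
    return len(held_out & set(ranked_ids[:k])) / len(held_out)


# Hypothetical ranking returned by a trained model, most relevant first.
ranked_ids = ["OWN-1", "TP-7", "OWN-2", "TP-3", "OWN-3", "TP-9", "TP-1", "OWN-4"]
# Hypothetical patents of the selected company that were withheld from training.
held_out_ids = {"OWN-1", "OWN-2", "OWN-3", "OWN-4"}

for k in (1, 5, 8):
    print(f"recall@{k} = {recall_at_k(ranked_ids, held_out_ids, k):.2f}")
```

In this toy ranking, a third-party patent (TP-7) sits above some of the company's own patents, which mirrors the observation that certain third-party patents can be more relevant than the least relevant patents of the selected company.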

Interestingly, similar algorithms can be applied to the analysis of prior art documents (for deciding whether or not to file a new patent application) or to the detection of potential licensees or infringers. Beyond patents, machine learning can also be used to objectively measure similarities between trademarks, logos, and other IP signs. I have tested such algorithms for comparing trademark signs, and I will report on this soon. Here again, the results are promising.

To conclude, such investigations show that machine learning and other AI-related techniques can be profitably exploited to overcome the problems posed by large numbers of IP rights and associated documents. Ultimately, the question boils down to the resources you are willing to invest in discovering potentially relevant IP rights. Assuming that your resources are limited, the prospects offered by machine learning for reducing the time and cost of IP analyses are enticing enough to start seriously exploring the potential of AI-related algorithms for IP.

Sébastien Ragot

 

 

 

