
Machine Learning and Document Analysis: A Practical Approach

Giovanni Emanuele Nocco

SVP Tech & Enterprise Architect at UTU Singapore [Remote]

Published: March 28, 2018

AI (Artificial Intelligence) and Machine Learning are very popular these days. A lot of resources are available, ranging from traditional papers to the most advanced web platforms. But without a solid mathematical background, every approach will only scratch the surface.

Three books were my starting point for understanding this field, all of them very interesting and engaging.

Tom Michael Mitchell - Machine Learning
Christopher M. Bishop - Pattern Recognition And Machine Learning
Ian Goodfellow and Yoshua Bengio and Aaron Courville - Deep Learning

The first one is quite old, but it offers a lot of useful insight into the mathematical aspects.

The other two give a broad view of how the algorithms work, as well as which algorithm is best suited to each specific case.

Math

The following mathematical topics are strongly required (although the formal details can be skipped):

Linear algebra
Probability theory
Decision theory
Information theory

The Goal

To get hands-on and see what can be done, I tried to solve a well-defined problem.

The problem:

Can I use ML to implement an advanced NLP algorithm? In particular, can I create a model able to identify specific features inside a document?

The aim of the lab was to identify law references inside institutional documents using a well-defined ML algorithm.

The Algorithms

The first issue was to identify an algorithm able to scan large documents.
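One common way to make a large document tractable is to scan it in fixed-size, overlapping windows of tokens. The article does not say which scanning scheme was used, so the following is only a minimal sketch under that assumption. The function name `sliding_windows`, the window size, the stride, and the sample sentence are all invented for illustration.

```python
from typing import Iterator, List, Tuple


def sliding_windows(
    tokens: List[str], size: int, stride: int
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (start_index, window) pairs that together cover every token."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if len(tokens) <= size:
        # Short document: a single window is enough.
        yield 0, tokens
        return
    start = 0
    while start < len(tokens):
        yield start, tokens[start:start + size]
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
        start += stride


# Invented sample sentence, split on whitespace for simplicity.
text = "the provisions of article 12 of law 34 of 2001 apply to this act"
tokens = text.split()

# A stride smaller than the window size makes consecutive windows overlap,
# so a reference that straddles a boundary still appears whole in one window.
windows = list(sliding_windows(tokens, size=6, stride=3))
for start, window in windows:
    print(start, window)
```

Each window can then be scored independently by a model, which keeps memory use bounded regardless of the document length.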

The second was to capture word similarity: in other words, how to convert the sparse vector built for each single word into a much denser vector and, at the same time, how to make the distance between words reflect their meaning.
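The difference between sparse and dense word vectors can be illustrated with a small sketch. The vocabulary and the dense values below are hand-picked for illustration only. They are not trained embeddings and do not come from the original lab.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vocab = ["law", "decree", "banana"]

# Sparse representation: one dimension per vocabulary word (one-hot).
one_hot: Dict[str, List[float]] = {
    word: [1.0 if i == j else 0.0 for j in range(len(vocab))]
    for i, word in enumerate(vocab)
}

# Dense representation: a few shared dimensions per word.
# ASSUMPTION: toy values chosen by hand, standing in for learned embeddings.
dense: Dict[str, List[float]] = {
    "law": [0.9, 0.1],
    "decree": [0.8, 0.2],
    "banana": [0.1, 0.9],
}

# Any two distinct one-hot vectors are orthogonal, so meaning is invisible.
sparse_sim = cosine(one_hot["law"], one_hot["decree"])

# In the dense space, related words can sit close together.
dense_related = cosine(dense["law"], dense["decree"])
dense_unrelated = cosine(dense["law"], dense["banana"])

print(sparse_sim, dense_related, dense_unrelated)
```

With one-hot vectors every pair of different words is equally distant, whereas a dense space lets the distance carry information about meaning.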

To do that, a combination of a Convolutional Neural Network and the Word2Vec algorithm was chosen.
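The way a convolutional filter operates on a sequence of word vectors can be sketched in plain Python. This is not the author's model: the embeddings and the filter weights below are set by hand, whereas in a real setup the embeddings would come from Word2Vec and the filter weights would be learned during training. The function name `conv1d_max_pool` is hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def conv1d_max_pool(embedded: List[Vector], kernel: List[Vector],
                    bias: float = 0.0) -> float:
    """Slide one filter over a sequence of word vectors, apply ReLU,
    then keep the strongest response (max-over-time pooling)."""
    width = len(kernel)
    if len(embedded) < width:
        raise ValueError("sequence is shorter than the kernel width")
    activations = []
    for start in range(len(embedded) - width + 1):
        total = bias
        for offset in range(width):
            word_vec = embedded[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], word_vec))
        activations.append(max(0.0, total))  # ReLU
    return max(activations)


# ASSUMPTION: toy 2-dimensional embeddings, hand-picked for illustration.
# Dimension 0 ~ "legal term", dimension 1 ~ "number".
embeddings: Dict[str, Vector] = {
    "article": [1.0, 0.0],
    "12": [0.0, 1.0],
    "the": [0.0, 0.0],
    "meeting": [0.0, 0.0],
}

# ASSUMPTION: a width-2 filter set by hand to respond to
# "legal term followed by a number"; a trained CNN would learn such weights.
kernel = [[1.0, 0.0], [0.0, 1.0]]


def score(tokens: List[str]) -> float:
    return conv1d_max_pool([embeddings[t] for t in tokens], kernel)


with_reference = score(["the", "article", "12"])
without_reference = score(["the", "meeting", "the"])
print(with_reference, without_reference)
```

The filter responds strongly wherever its pattern appears in the sequence, and the pooling step makes the result independent of the position of the match.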

The environment

The magic world of Python, TensorFlow and CUDA.

The Anaconda environment

With Anaconda, a single host can contain many different, isolated environments.

TensorFlow is a framework designed by Google for developing Machine Learning algorithms. It is open source, simple to use, and has a huge community. It makes it possible to develop solutions quickly, including complex ones, using Python as the programming language.

Nvidia CUDA is a parallel computing platform and programming model developed by Nvidia that allows the GPU to be used for efficient floating-point computation. The high number of cores allows repetitive tasks to run simultaneously.

CPU vs GPU

The following pictures compare the same computational workload run on a CPU and on a GPU.

The same workload, shown in the Spyder IDE for Python.

The results

Using a Machine Learning approach, it is possible to identify specific patterns such as nouns, locations, chemical/biological compounds, law references, and many more terms inside a textual document.

Using well-known algorithms, frameworks, and programming languages, as well as a specific execution platform, it is possible to implement awesome products quickly and efficiently.
