Artificial Intelligence #8 - insights from simple algorithms like k-NN
Hello all
Welcome to edition #8
As expected, we crossed 10K members
Also, thanks for the insights and engagement on these posts
Before we begin, we are looking for an intern (my company feynlabs).
This will be a paid, online position working with me. It's a difficult topic, so you need to have an interest in maths and AI. This is an internship – we are not looking for an expert. You can be based anywhere in the world.
Last week, we had an interesting discussion in class about instance-based algorithms.
- K-nearest neighbours (k-NN) is an example of an instance-based algorithm (not to be confused with k-means, which is a clustering algorithm)
- k-NN is used for both classification and regression
- The input for k-NN consists of the k closest training examples in the dataset
- For classification, the output is a class membership (the majority class among the k neighbours). For regression, the output is the average of the k nearest neighbours' values
- You can also assign weights so that nearer neighbours contribute more than distant ones (a minimal code sketch follows this list)
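To make the above concrete, here is a minimal sketch of k-NN written from scratch in Python. This is an illustration, not the exact code from our class discussion: classification takes a majority vote of the k closest points, regression averages their target values, and the toy arrays X and y are made up for the example.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3, mode="classification"):
    # Distance from the query point x to every stored training example
    distances = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(distances)[:k]      # indices of the k closest points
    neighbours = y_train[nearest]
    if mode == "classification":
        # Class membership = majority vote among the k neighbours
        return Counter(neighbours).most_common(1)[0][0]
    # Regression = average of the k neighbours' target values
    return neighbours.mean()

# Toy data: two classes in 2-D (illustrative values only)
X = np.array([[1, 1], [1, 2], [5, 5], [6, 5]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([1.5, 1.5])))  # -> 0 (nearest points are class 0)
```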
A diagram of k-NN (along with other algorithm families, as below) – source
So far so good
k-NN is a simple algorithm: you look at the nearest neighbours of a query point and derive either a class (for classification) or a predicted value (for regression).
However, k-NN has some interesting characteristics:
- There is no explicit training step. The work of finding and weighing neighbours happens at prediction time, which is a 'training' of sorts
- For the same reason, k-NN is sensitive to the local structure of the data.
- Finally, k-NN is an example of a non-parametric algorithm. This needs some explanation.
- Parametric algorithms have a fixed number of parameters and make stronger assumptions about the underlying data. Linear regression is an example of a parametric algorithm (e.g. the gradient m and offset c in y = mx + c are its parameters)
- In the case of non-parametric algorithms, the number of parameters is not fixed – it grows with the data. Non-parametric algorithms make fewer assumptions about the data. k-NN is a non-parametric algorithm, and decision trees can also be thought of as non-parametric.
- As a side note, the term 'non-parametric' is a bit of a misnomer. It's not that we have no parameters, as the term implies; rather, the number of parameters is not fixed (see the small illustration after this list).
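Here is a small illustration of the distinction, assuming numpy (the data is synthetic and made up for the example): a fitted line compresses any amount of data into two numbers, whereas k-NN's 'model' is the stored training set itself, so it grows with the data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=100)
y = 2.0 * X + 1.0 + rng.normal(0, 0.5, size=100)  # noisy line: y = 2x + 1

# Parametric: linear regression keeps a fixed number of parameters,
# however large the dataset gets (here: gradient m and offset c)
m, c = np.polyfit(X, y, deg=1)
print(f"linear regression stores 2 numbers: m={m:.2f}, c={c:.2f}")

# Non-parametric: k-NN has no such fixed summary - it keeps every example,
# so its effective size grows with the data
print(f"k-NN stores all {X.size} training points; double the data, double the model")
```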
So, when would you use k-NN?
One of the interesting uses of k-NN is when you have small datasets which are homogeneous and noise-free. k-NN learns 'from memory', i.e. it predicts directly from previously seen examples.
This simple technique can be used for complex problems where these characteristics exist (small but clean, noise-free data), e.g. in bioinformatics: Prediction of protein subcellular locations using fuzzy k-NN method (a short scikit-learn sketch follows).
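As a sketch of this use case, here is k-NN on a small, clean dataset using scikit-learn. The iris data is my stand-in for the 'small but noise-free' situation, not the data from the paper above (fuzzy k-NN is a more elaborate variant of the same idea). Note that weights="distance" implements the neighbour-weighting mentioned earlier.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # 150 examples: small, clean, well separated
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# weights="distance" makes nearer neighbours contribute more than distant ones
clf = KNeighborsClassifier(n_neighbors=5, weights="distance")
clf.fit(X_train, y_train)  # "fitting" just stores the data - no explicit training
print(f"test accuracy: {clf.score(X_test, y_test):.2f}")
```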
PS: this is another good link I found while researching this: https://machinelearningmastery.com/types-of-classification-in-machine-learning/
What this shows is:
a) Studying even seemingly simple algorithms can help us gain important insights and find potential applications (e.g. in small-data situations where the data is clean)
b) Such algorithms can be used to explain more complex concepts, such as parametric vs non-parametric models
One final point: as in many real-life cases, we do not apply individual algorithms in isolation – rather, we apply algorithms in an ensemble. That's true here as well (a brief sketch below).
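For instance, here is a minimal sketch of one way to do this with scikit-learn. VotingClassifier is just one of several ensembling approaches, and the choice of partner models here is purely for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# k-NN votes alongside two other models instead of predicting on its own
ensemble = VotingClassifier([
    ("knn", KNeighborsClassifier(n_neighbors=5)),
    ("tree", DecisionTreeClassifier(max_depth=3)),
    ("logreg", LogisticRegression(max_iter=1000)),
])
print(f"ensemble accuracy: {cross_val_score(ensemble, X, y, cv=5).mean():.2f}")
```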
Some interesting jobs this week:
DevOps engineers at CARIAD (VW) – CARIAD is the automotive software company within the Volkswagen Group that bundles and further expands the Group's software competencies to transform automotive mobility https://www.dhirubhai.net/posts/jan-zawadzki_devopsmlops-engineer-activity-6810460697906966528-Iqd9 via Jan Zawadzki. We are happy to work with Jan and his team at VW at Oxford, so this comes very much recommended.
University of Plymouth – fully funded doctoral scholarship via Edward Meinert https://www.dhirubhai.net/posts/activity-6809362405932462080-dXVK
Via Thanos Mourikis - Senior Expert I, Oncology Data Science at Novartis Institutes for BioMedical Research (NIBR) https://www.dhirubhai.net/posts/activity-6809122372264710144-30Cv
Senior front-end/full-stack developers via Dr Fabio Ricciato. I worked with Fabio in his previous job and very much recommend his work. https://www.dhirubhai.net/posts/fabioricciato_hiring-digital-customerexperience-activity-6808491556656422912-CqeD
Finally, this week I reread one of my favourite books – Joseph Campbell's The Hero with a Thousand Faces – in an attempt to get Daliana Liu to see Star Wars https://www.dhirubhai.net/posts/ajitjaokar_after-working-in-tech-for-7-years-i-have-activity-6809825850603659264-lfYu
I very much recommend following Daliana for her motivation and insights, especially for people who want to be a part of the AI industry. She is a great mentor to all and is always ready to share her insights in her own special way.
In this edition: Daliana Liu, Fabio Ricciato, Jan Zawadzki, Edward Meinert, Thanos Mourikis