Data Science Milan #005

Data Science Milan

The Community of Data Scientists and Machine Learning Practitioners based in the Greater Milan area.

Published: December 5, 2023

Dear Data Science Milan Community,

Welcome back to our newsletter, bringing you another edition packed with the latest developments, inspiring projects, and invaluable insights from the world of data science!

For episode #5 we will talk about a well-known star in the field of NLP: a big shout-out to Bidirectional Encoder Representations from Transformers, aka BERT!

BERT was published in 2019 (after appearing as a preprint in late 2018) with an ambitious objective: to establish a single framework as the standard approach for a variety of NLP tasks. And yes, it succeeded by a wide margin!

Its framework relies on a 'simple' assumption that turns out to be a winning strategy:

leverage a large amount of unlabeled data, which is far easier to obtain
get the most out of a small amount of labeled data

Now, if you have a fancy GPU to work with, you can (pre)train your own BERT model from scratch on a large text corpus; that model will be your starting point for the downstream task. Otherwise, you can download a pretrained version of BERT from a public repository and move straight to the second step of the framework.
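
To make "pretraining on unlabeled data" a bit more concrete, here is a minimal sketch of how BERT's masked language modelling objective turns plain text into training examples: roughly 15% of the tokens are selected, and of those about 80% are replaced with [MASK], 10% with a random token and 10% are left unchanged. The function name `mask_tokens`, the whitespace tokenization and the tiny vocabulary are assumptions made for illustration; a real setup would use a WordPiece tokenizer and a much larger corpus.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (corrupted_tokens, labels) for masked language modelling.

    labels[i] holds the original token where the model must predict it,
    and None at every position that was not selected.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)                     # the model must recover this token
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK)             # ~80%: replace with [MASK]
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))  # ~10%: random token
            else:
                corrupted.append(tok)              # ~10%: keep the original
        else:
            corrupted.append(tok)                  # not selected: left untouched
            labels.append(None)
    return corrupted, labels

# Hypothetical toy input: whitespace tokens instead of WordPiece pieces.
sentence = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(sentence))
corrupted, labels = mask_tokens(sentence, vocab, seed=0)
```

No human labels are needed here: the text itself supplies the targets, which is why unlabeled data is enough for this first step.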

Once you have your BERT, you can use it to transform text into a numerical representation and feed that encoded text into a classification layer. In this phase, you can adopt different strategies to achieve the best results, for example keeping the BERT weights frozen and training only the classification layer, or fine-tuning the whole model.
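
The "encode, then classify" step can be sketched as follows. To keep the example runnable without downloading any model, the `encode` function below is a deliberately simple stand-in (a normalized bag-of-words vector), not BERT; in a real pipeline it would be replaced by the sentence vector produced by your pretrained BERT (768 dimensions for BERT-base). The training sentences, the function names and the hyperparameters are all assumptions for illustration. Only the classification layer is trained here, which corresponds to the strategy where the encoder is kept frozen.

```python
import math

# Hypothetical toy dataset: 1 = positive, 0 = negative.
train = [
    ("great talk really enjoyed it", 1),
    ("wonderful event great speakers", 1),
    ("boring talk waste of time", 0),
    ("terrible event poor speakers", 0),
]

# Vocabulary built from the training texts; one vector dimension per word.
VOCAB = {w: i for i, w in enumerate(sorted({w for t, _ in train for w in t.split()}))}
DIM = len(VOCAB)

def encode(text):
    """Stand-in for a frozen encoder: text -> fixed-length vector (NOT BERT)."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        if word in VOCAB:                 # unknown words are ignored
            vec[VOCAB[word]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(features, labels, epochs=200, lr=0.5):
    """Fit one logistic-regression layer on top of the frozen features."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y                     # gradient of the log loss w.r.t. the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(w, b, text):
    """Probability that `text` belongs to the positive class."""
    x = encode(text)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

features = [encode(t) for t, _ in train]
labels = [y for _, y in train]
w, b = train_head(features, labels)
```

Calling `predict(w, b, "great speakers")` returns a probability between 0 and 1. Swapping the stand-in `encode` for real BERT embeddings leaves the classification layer unchanged, apart from the input dimension.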

Last but not least, we propose an article featuring two Italian resources that we've tested: a BERT model trained on a large Italian corpus and a classification dataset in Italian!

Data Science Milan events

Upcoming: On December 13th, 2023, Greta Coraglia and Davide Posillippo will speak about a bias detection tool.

The aim of the collaboration between BRIO and Alkemy is to produce software applications for the analysis of bias, risk and opacity in AI technologies, which often rely on non-deterministic computations and are opaque in nature. The speakers will present a first tool developed within the BRIOxAlkemy collaboration for the detection and analysis of biased behaviours in AI systems, along with its theoretical background. The tool is aimed at developers and data scientists who wish to test algorithms that rely on probabilistic and learning mechanisms, in order to detect bias-related misbehaviours and collect data about them. They will show the tool in a live demo and explain their open-source, collaborative approach to its development.

Register for the event

Machine Learning pipelines @Facile.it: how to keep models always trained

On November 25th, 2023, at Google DevFest, Cesare Bassu presented the data science pipelines used at @Facile.it.

At @Facile.it, they apply a suite of MLOps principles to ensure continuous model training and prevent development errors. They combine MLOps pipelines with Continuous Integration practices to automate model development and enable automatic retraining.

Empowering the Bending Spoons' platform with data science

On November 7th, 2023, Andrea Maiorana spoke about how data science works at Bending Spoons.

Bending Spoons is a leading tech company based in Italy that specializes in software and app development. Andrea walked through the data science workflows at Bending Spoons and then dove into the measurement and prediction of app user metrics. There was also a poster session during the networking aperitivo.

Read the article

Watch the video

Knowledge section

Here are some selected NLP resources:

Be involved!

We also want to remind you that if you enjoy our events, you can get in touch with us at [email protected] to get involved in organizing great new online activities.

We would also be very happy to hear from you if you are interested in being a speaker or want to share your expertise or experience with the Data Science Milan community!

Wallboard

Would you like to become one of our sponsors and increase your visibility in the Data Science community? Write here

If, instead, you would like to post a message on the wallboard, please contact us and send us your announcement. We will publish it here.
