
Data Science Milan #007

Data Science Milan

The Community of Data Scientists and Machine Learning Practitioners based in the Greater Milan area.

Published: February 29, 2024

Dear Data Science Milan Community,

Welcome back to our newsletter, bringing you another edition packed with the latest developments, inspiring projects, and invaluable insights from the world of data science!

Continuing our coverage of NLP applications, in this newsletter we introduce text summarization, the task of generating concise summaries of lengthy texts so that users can quickly grasp the essential information.

This process comes in two primary forms: extractive summarization, which pieces together key sentences directly from the text, and abstractive summarization, which creates new sentences to convey the main points more fluidly.
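To make the extractive approach concrete, here is a minimal Python sketch of a classic frequency-based extractive summarizer. It scores each sentence by the average document-wide frequency of its content words and returns the top-scoring sentences in their original order. The function names, the tiny stop-word list and the naive regex sentence splitter are illustrative assumptions, not a method taken from the resources mentioned in this newsletter. Modern systems typically use learned models, and abstractive summarization requires a generative model, so it is not shown here.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use larger ones).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "was", "from", "for", "on"}

def content_words(text: str) -> list[str]:
    """Lowercased word tokens with stop words removed."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS]

def extractive_summary(text: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences of `text`, in their original order."""
    # Naive split after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Frequency of each content word across the whole document.
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        # Average rather than sum, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # restore document order

doc = (
    "Summarization condenses long documents. "
    "Extractive summarization selects sentences from the document. "
    "Abstractive summarization writes new sentences. "
    "The weather was nice."
)
print(extractive_summary(doc, k=2))
```

On this toy text the two sentences about extractive and abstractive summarization are selected, while the off-topic weather sentence is dropped, because its words are rare in the document.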

Evaluating these summaries is challenging because it requires both quantitative and qualitative measures. Metrics such as ROUGE and BLEU provide quantitative benchmarks by measuring the n-gram overlap between generated summaries and human-written reference summaries. However, these metrics cannot fully assess coherence or factual accuracy, which is why human evaluation remains important for judging readability and informativeness.
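The overlap idea behind both metrics fits in a few lines of Python. The sketch below computes ROUGE-N recall (the share of reference n-grams that appear in the generated summary) and BLEU's clipped n-gram precision (the share of summary n-grams that also appear in the reference) against a single reference. It is a simplified illustration, not the official implementations: full BLEU combines the precisions for n = 1 to 4 with a brevity penalty, and the ROUGE package adds further variants and options. The function names are our own.

```python
from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of n-grams, e.g. n=2 on ["a", "b", "c"] gives ("a", "b") and ("b", "c")."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def overlap(candidate: list[str], reference: list[str], n: int) -> int:
    """Clipped overlap: each n-gram counts at most as often as it occurs in both texts."""
    return sum((ngrams(candidate, n) & ngrams(reference, n)).values())

def rouge_n_recall(candidate: list[str], reference: list[str], n: int = 1) -> float:
    """Recall-oriented: fraction of reference n-grams recovered by the candidate."""
    total = sum(ngrams(reference, n).values())
    return overlap(candidate, reference, n) / total if total else 0.0

def clipped_precision(candidate: list[str], reference: list[str], n: int = 1) -> float:
    """BLEU-style modified precision: fraction of candidate n-grams found in the reference."""
    total = sum(ngrams(candidate, n).values())
    return overlap(candidate, reference, n) / total if total else 0.0

reference = "the cat sat on the mat".split()
short = "the cat sat".split()
print(rouge_n_recall(short, reference), clipped_precision(short, reference))  # 0.5 1.0
```

The short candidate reaches perfect precision but only half the recall. This is why recall-oriented ROUGE is a natural fit for summaries, and why BLEU pairs precision with a brevity penalty. Neither score says anything about coherence or factual accuracy.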

Despite these challenges, text summarization remains a crucial, evolving field within NLP, continually seeking to balance technological advancements with the nuanced understanding of human language.

We would like to highlight some pre-trained models that we have found on HuggingFace here and an Italian summarization dataset here.

Data Science Milan events

Alkemy’s GenAI ecosystem

On February 20th, 2024, Marcello Villa presented Alkemy's GenAI ecosystem and some of the use cases the company is working on. Shifting perspective from clients to developers, in the second part Davide Posillipo reflected on how the latest Generative AI applications are impacting our field, Data Science, and what we can expect for our profession in the future. As an example of new ways of working, in the final part Milica Cvjeticanin talked about an unconventional Transformer model. Modern LLM architectures based on Transformers are an extremely powerful tool for solving a variety of problems, but they are mostly discussed in the context of natural language processing. By combining meta-learning, Bayesian Neural Network (BNN) priors and the Transformer architecture, the application field of Transformer-based models can be extended to classification problems on tabular data. Milica showed an example of such a model, TabPFN, which could compete with the best-known Machine Learning algorithms on these classical ML tasks, and explained why it is worth keeping an eye on.

Watch the video

BRIOxAlkemy: A bias detecting tool

On December 13th, 2023, Greta Coraglia and Davide Posillipo spoke about a bias-detecting tool.

The aim of the collaboration between BRIO and Alkemy is to produce software applications for the analysis of bias, risk and opacity in AI technologies, which often rely on non-deterministic computations and are opaque by nature. The speakers presented the first tool developed within the BRIOxAlkemy collaboration for detecting and analysing biased behaviours in AI systems, together with its theoretical background. The tool is aimed at developers and data scientists who want to test algorithms based on probabilistic and learning mechanisms, in order to detect bias-related misbehaviours and collect data about them. They showed the tool in a live demo and explained the open-source, collaborative approach behind its development.
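The BRIO tool's own interface and metrics are not described in this summary, so the sketch below does not reproduce them. As a generic illustration of what detecting biased behaviour can look like in practice, it swaps in a simple, widely used check: comparing the rate of positive predictions across the groups of a sensitive attribute (a demographic-parity gap). The function names, the toy data and the 0.1 alert threshold are hypothetical.

```python
from collections import defaultdict

def positive_rates(predictions, groups):
    """Share of positive (== 1) predictions for each group value."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for pred, group in zip(predictions, groups):
        counts[group][0] += int(pred == 1)
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}

def parity_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    rates = positive_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())

# Hypothetical alert threshold; an acceptable gap depends on the application.
GAP_THRESHOLD = 0.1

preds = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(positive_rates(preds, groups))  # {'A': 0.75, 'B': 0.25}
gap = parity_gap(preds, groups)
print(gap, gap > GAP_THRESHOLD)  # 0.5 True
```

A large gap does not prove unfairness on its own, since base rates can legitimately differ between groups, but it flags behaviour worth investigating and produces data that can be logged for later analysis.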

Watch the video

Machine Learning pipelines @Facile.it: how to keep models always trained


On November 25th, 2023, at Google DevFest, Cesare Bassu presented the data science pipelines used at Facile.it.

At Facile.it, the team applies a set of MLOps principles to keep models continuously trained and to prevent development errors. They combine MLOps pipelines with Continuous Integration (CI) practices to automate model development and enable automatic retraining.
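The talk summary does not detail Facile.it's actual pipeline, so what follows is only a hypothetical sketch of the kind of gate a CI job can run to keep a model trained. It evaluates the deployed model, retrains only when its score falls below a minimum, and promotes the retrained candidate only if it beats the current model, a simple guard against shipping regressions. The function `ci_retraining_step`, its parameters and the toy scores are illustrative assumptions; in a real setup `train` and `evaluate` would wrap your training code and a held-out dataset.

```python
from typing import Callable, Tuple, TypeVar

Model = TypeVar("Model")

def ci_retraining_step(
    current: Model,
    train: Callable[[], Model],
    evaluate: Callable[[Model], float],
    min_score: float,
) -> Tuple[Model, bool]:
    """One CI run: return the model to deploy and whether it was replaced."""
    score = evaluate(current)
    if score >= min_score:
        return current, False  # still healthy: no retraining needed
    candidate = train()  # retrain, e.g. on the most recent data
    # Promote only if the new model actually beats the degraded one.
    if evaluate(candidate) > score:
        return candidate, True
    return current, False

# Toy example: models are just version labels with pre-computed scores.
scores = {"v1": 0.70, "v2": 0.85}
print(ci_retraining_step("v1", lambda: "v2", scores.__getitem__, min_score=0.80))  # ('v2', True)
print(ci_retraining_step("v1", lambda: "v2", scores.__getitem__, min_score=0.60))  # ('v1', False)
```

Keeping training and evaluation as injected callables makes the gate itself easy to unit-test in CI, independently of any particular ML framework.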

Knowledge section

Here are some selected NLP resources related to text summarization:

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.
Chin-Yew Lin. 2004. ROUGE: A Package for Automatic Evaluation of Summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.
Gabriele Sarti and Malvina Nissim. 2022. IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation.

Be involved!

We also want to remind you that, if you enjoy our events, you can get in touch with us by email to get involved in organizing new online activities.

We would also be very happy to hear from you if you are interested in being a speaker or want to share your expertise or experience with the Data Science Milan community!

Wallboard

Would you like to become one of our sponsors and increase your visibility in the Data Science community? Write to us here.

If instead you would like to promote a message on the wallboard, please contact us with your announcements and we will publish them here.
