Data Science Milan #008

Data Science Milan

The Community of Data Scientists and Machine Learning Practitioners based in the Greater Milan area.

Published: March 29, 2024

Dear Data Science Milan Community,

Welcome back to our newsletter, bringing you another edition packed with the latest developments, inspiring projects, and invaluable insights from the world of data science!

This time we came across an unusual and very interesting application of the well-known transformer architecture. Beyond NLP tasks, transformers can handle any sequential data, which makes them well suited to analyzing time series.

The key innovation of transformers, the attention mechanism, allows models to weigh the importance of different points in a series, making it particularly effective for time series analysis where the relevance of past events can vary significantly.

How Transformers Work with Time Series Data

Handling Sequential Data: Like text in language processing, time series data is sequential. Transformers can process an entire sequence at once, unlike recurrent models such as RNNs and LSTMs, which process data points one step at a time. This helps capture long-term dependencies and patterns over time.
Attention Mechanism: The self-attention mechanism in transformers can identify and focus on the most relevant parts of the data for making predictions. For time series forecasting, this means the model can learn which past time points are most indicative of future values.
Parallel Processing: Transformers can process all data points simultaneously, leading to significant improvements in training efficiency compared to models that process data points one at a time.
Adaptability: Transformers can be adapted for various time series tasks, including forecasting, anomaly detection, and trend analysis. By modifying the input sequences and training objectives, transformers can be tailored to the specific needs of time series analysis.
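
To make the attention idea concrete, here is a minimal sketch of causal self-attention on a tiny univariate series. It rests on strong simplifying assumptions: queries, keys and values are just the raw values (no learned projections), there is a single head, and the function name causal_self_attention is ours. Real implementations compute the same weights for all time steps at once with matrix operations; the loop below is only for readability.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(series):
    """Toy single-head self-attention on a univariate series.

    Assumption: query, key and value are the raw value itself
    (no learned projections), so only the weighting step is shown.
    """
    outputs, weights = [], []
    for t, query in enumerate(series):
        past = series[: t + 1]  # causal mask: step t sees steps 0..t only
        scores = [query * key for key in past]  # dot product in 1 dimension
        w = softmax(scores)  # attention weights over past steps
        weights.append(w)
        outputs.append(sum(wi * v for wi, v in zip(w, past)))
    return outputs, weights

series = [0.1, 0.5, -0.2, 0.9]
outputs, weights = causal_self_attention(series)
```

Each row of weights sums to 1 and tells you how much every past time point contributes to the representation of the current one.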

Guess what? You can already find such a transformer on Hugging Face: it is called Lag-Llama, the first foundation model (decoder-only) for univariate time series, and it uses lags of the target variable as input features.
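
If "lags of the target variable" sounds abstract, the sketch below shows the general idea of turning a univariate series into lagged inputs. This is not Lag-Llama's code: the helper make_lag_features and the lag set [1, 2, 4] are hypothetical and chosen only for illustration.

```python
def make_lag_features(series, lags):
    """Build (features, target) pairs from lagged values of a univariate series."""
    max_lag = max(lags)
    rows = []
    # Start at max_lag so that every requested lag exists for each target.
    for t in range(max_lag, len(series)):
        features = [series[t - lag] for lag in lags]
        rows.append((features, series[t]))
    return rows

series = [10, 12, 13, 15, 18, 21, 25]
rows = make_lag_features(series, lags=[1, 2, 4])
# First row: values 1, 2 and 4 steps before t=4, paired with the value at t=4
# rows[0] == ([15, 13, 10], 18)
```

Each target value is paired with the values observed 1, 2 and 4 steps earlier, which is the kind of input a lag-based forecaster conditions on.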

We gave it a shot and you can play with it too using this notebook.

Data Science Milan events

Data Science applications in Cybersecurity

Application of Graph Theory To Anomaly Detection in Cybersecurity: an Example - Alberto Mazzetto, Artificial Intelligence Modelling Engineer at Ferrari Racing

The scale and complexity of cyber-attacks have been increasing dramatically in recent years, making it necessary to complement rule-based detections with statistically principled anomaly detection. Alberto explained how graph theory applies to this problem and reviewed global and local modelling approaches. He showed one possible local approach based on a Bayesian conjugate model, the Dirichlet process, that allows for fast, scalable, explainable computations. He then explored a global-flavoured methodology, based on graph variational auto-encoders, aimed at reducing the number of false positives.
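
As a rough illustration of why a Dirichlet process model is fast and explainable, the sketch below scores how surprising it is for a source host to contact a given destination, using the standard Dirichlet process predictive rule. It is our own toy example and not necessarily the method presented in the talk: the function dp_surprise, the host names and the value alpha=1.0 are all assumptions.

```python
import math
from collections import Counter

def dp_surprise(history, destination, alpha=1.0):
    """Surprise (-log predictive probability) of contacting `destination`.

    Dirichlet process predictive rule: a destination seen n_d times out of
    n past connections has probability n_d / (alpha + n); a never-seen
    destination has probability alpha / (alpha + n).
    """
    counts = Counter(history)
    n = len(history)
    seen = counts.get(destination, 0)
    prob = seen / (alpha + n) if seen else alpha / (alpha + n)
    return -math.log(prob)

history = ["db01", "db01", "web02", "db01"]  # hypothetical past destinations
usual = dp_surprise(history, "db01")     # frequently contacted host
unusual = dp_surprise(history, "vpn99")  # never contacted before
```

The update only needs counts per source, and a high score can be explained directly: the source has rarely or never contacted that destination before.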

A Data-Driven Approach to Cybersecurity - Luigi De Luca, Data Scientist at Data Reply

In today’s data-driven world, Big Data and Data Science have become indispensable for handling large volumes of data and deriving actionable insights. As cyber threats continue to evolve, traditional cybersecurity methods have proven insufficient against modern attacks, so Data Analytics plays a crucial role in the field. Luigi explored the benefits that a data-driven approach brings to cybersecurity, focusing on three use cases that are subcases of anomaly detection: UEBA, malware detection and DGA detection. For each of these three use cases, he explained the improvements over traditional methods and how to implement the solution.

Watch the video

Alkemy’s GenAI ecosystem

On February 20th, 2024, Marcello Villa presented Alkemy’s GenAI ecosystem and some of the use cases they are working on. Shifting perspective from the clients to the developers, in the second part Davide Posillipo reflected on how the latest Generative AI applications are impacting our field, Data Science, and what we can expect to happen to our profession in the future. As an example of new ways of working, in the final part Milica Cvjeticanin talked about an unconventional Transformer model. Modern LLM architectures based on Transformers are an extremely powerful tool for solving a variety of problems, yet they are mostly cited in the context of natural language processing. By combining meta-learning, a Bayesian Neural Network (BNN) prior and the Transformer architecture, the application field of transformer-based models expands to classification problems on tabular data. Milica showed an example of such a model, TabPFN, which can compete with the best-known Machine Learning algorithms on these classical ML tasks, and explained why it is worth keeping an eye on.

Watch the video

BRIOxAlkemy: A bias detecting tool

On December 13th, 2023, Greta Coraglia and Davide Posillipo spoke about a bias-detecting tool.

The aim of the collaboration between BRIO and Alkemy is to produce software applications for the analysis of bias, risk and opacity in AI technologies, which often rely on non-deterministic computations and are opaque in nature. Greta and Davide presented the first tool developed within the BRIOxAlkemy collaboration for the detection and analysis of biased behaviours in AI systems, along with its theoretical background. The tool is aimed at developers and data scientists who wish to test algorithms that rely on probabilistic and learning mechanisms, in order to detect bias-related misbehaviours and collect data about them. They showed the tool in a live demo and explained their open source and collaborative approach to its development.

Watch the video

Knowledge section

Here are some selected resources for time series tasks using transformers:

Kashif Rasul et al., "Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting," 2024.
Hugging Face repository for Lag-Llama here
Lag-Llama official zero-shot Colab demo here

Be involved!

We also want to remind you that if you enjoy our events, you can get in touch with us at [email protected] to get involved in organizing great new online activities.

We would also be very happy to hear from you if you are interested in being a speaker or want to share your expertise or experience with the Data Science Milan community!

Wallboard

Would you like to become one of our sponsors and increase your popularity among the Data Science community? Write here

If you would instead like to post a message on the wallboard, please contact us and send us your announcement. We will publish it here.
