Data Science Milan #010
Data Science Milan
The Community of Data Scientists and Machine Learning Practitioners based in the Greater Milan area.
Dear Data Science Milan Community,
Welcome back to our newsletter, bringing you another edition packed with the latest developments, inspiring projects, and invaluable insights from the world of data science!
What Happens When We Set THAT Seed?
This time we asked ourselves a question that some take for granted and others do not, so it deserves some clarification: what exactly happens when we set a seed?
A seed is one of the hyperparameters in a data science project that ensures something very dear to us: the reproducibility of results.
But what does it specifically impact in our job?
OK, we know that, but what actually happens when we type numpy.random.seed(42)?
First, the seed defines the initial state of the pseudorandom number generator (PRNG). From that state, the generator's algorithm deterministically produces the next number every time one is requested, so the same seed always yields the same sequence.
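To make this concrete, here is a minimal sketch, assuming NumPy is installed. It uses numpy.random.seed (the legacy global seeding call) and numpy.random.default_rng (the newer generator API); the variable names are ours and purely illustrative.

```python
import numpy as np

# Legacy global API: seeding resets the state of NumPy's global generator.
np.random.seed(42)
first = np.random.rand(3)

np.random.seed(42)  # same seed again, so the state is reset to the same point
second = np.random.rand(3)

# Newer Generator API: each generator object carries its own state.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

same_legacy = np.array_equal(first, second)
same_generator = np.array_equal(rng_a.random(3), rng_b.random(3))
print(same_legacy, same_generator)
```

Both comparisons should hold: two generators started from the same seed walk through the same sequence of states.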
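To give an idea, here is an example of such an algorithm, the linear congruential generator (NumPy itself relies on more sophisticated generators, but the principle is the same):
X_{n+1} = (a * X_n + c) % m
Here X is the sequence, a, c, and m are constants, and X_0 (X with n = 0) is the initial seed.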
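The recurrence fits in a few lines of plain Python. This is only a sketch: the constants a, c, and m below are the ones popularized by Numerical Recipes and are chosen here purely for illustration, and the helper names lcg and take are ours.

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield X_1, X_2, ... from X_{n+1} = (a * X_n + c) % m, with X_0 = seed."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def take(seed, count):
    """Return the first `count` numbers produced from the given seed."""
    gen = lcg(seed)
    return [next(gen) for _ in range(count)]


run_1 = take(42, 5)
run_2 = take(42, 5)  # same seed: identical sequence
run_3 = take(43, 5)  # different seed: different sequence

print(run_1 == run_2, run_1 == run_3)
```

Nothing here is random in the strict sense: the whole sequence is fixed as soon as X_0 is chosen, which is exactly why a seed gives reproducibility.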
Fascinating, isn't it?
Seeds and randomization techniques are also crucial for more complex algorithms such as those in deep learning: the loss function is typically non-convex, with no guarantee of reaching a global minimum, so it matters where the optimization starts, and that starting point (the randomly initialized weights) depends on the seed.
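A toy illustration of the starting-point effect, using only the standard library. The one-dimensional loss f(x) = x**4 - 3*x**2 + x is a hypothetical stand-in for a real network's loss: it has two separate minima, and plain gradient descent lands in one or the other depending on the seeded starting point. All names and hyperparameters below are assumptions made for the sketch.

```python
import random


def loss_grad(x):
    # Gradient of the non-convex toy loss f(x) = x**4 - 3*x**2 + x
    return 4 * x**3 - 6 * x + 1


def descend(x0, lr=0.01, steps=2000):
    """Plain gradient descent starting from x0."""
    x = x0
    for _ in range(steps):
        x -= lr * loss_grad(x)
    return x


def train(seed):
    """Draw the starting point from a seeded generator, then optimize."""
    rng = random.Random(seed)
    x0 = rng.uniform(-2.0, 2.0)
    return descend(x0)


# Same seed, same starting point, same final result.
repeatable = train(7) == train(7)

# Different starting points can end up in different minima.
left_minimum = descend(-1.0)
right_minimum = descend(1.0)
print(repeatable, round(left_minimum, 3), round(right_minimum, 3))
```

Losing the seed means losing the starting point, and with it the ability to tell whether that great result came from the model or from a lucky initialization.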
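BONUS: Seeds matter at the hardware level too. GPUs run computations in parallel, and the order in which floating-point operations are executed can slightly change the result, so a fixed seed alone may not be enough for bit-for-bit reproducibility. You can't ignore it.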
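Why does the order matter? Floating-point addition is not associative, so adding the same numbers in a different order can give a different total. The sketch below runs on the CPU in plain Python and does not involve a GPU; it only illustrates the arithmetic effect that a parallel reduction with a varying order can expose. The values are chosen by us to make the effect obvious.

```python
# The same four numbers, summed in two different orders.
values = [1e17, 1.0, -1e17, 1.0]

# Order 1: the first 1.0 is absorbed by 1e17 and lost to rounding.
left_to_right = ((values[0] + values[1]) + values[2]) + values[3]

# Order 2: the large values cancel first, so both 1.0 terms survive.
reordered = ((values[0] + values[2]) + values[1]) + values[3]

print(left_to_right, reordered)  # 1.0 2.0
```

In real workloads the differences are far smaller, but they can accumulate over many training steps.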
And you, how many times have you lost the seed of that model with fantastic results?
Data Science Milan events
Generative AI in the Banking industry
AI-powered search in banking knowledge bases - Andrea Galliani, Lorenzo Severini
Retrieval-Augmented Generation (RAG) has emerged as a powerful approach to augment Large Language Models (LLMs) with external knowledge, including internal and private documents. In this context, the speakers introduced UniMate, an internal search engine based on a RAG architecture and built to empower bank employees. UniMate enables efficient and smart retrieval of information about products, processes, and internal procedures. During the talk, they delved into both the engineering and the data science aspects, providing an overview of the main architectural and model choices. They also addressed the main challenges of developing UniMate in a real-world banking context.
Can LLM help create simulators for reinforcement learning? - Davide Villaboni
Applying reinforcement learning in the banking sector presents numerous challenges, the primary obstacle being the lack of a secure environment in which to simulate and effectively test policies. To tackle this issue, the team took a different approach and reframed the problem as a forecasting challenge. The chosen model architecture incorporates a Large Language Model, and initial results suggest that this approach can effectively address Unicredit's problem.
Watch the video
Data Science applications in Cybersecurity
Application of Graph Theory To Anomaly Detection in Cybersecurity: an Example - Alberto Mazzetto, Artificial Intelligence Modelling Engineer at Ferrari Racing
The scale and complexity of cyber-attacks have been increasing dramatically in recent years, making it necessary to accompany rule-based detections with statistically principled anomaly detection. Alberto explained how graph theory applies to this problem and reviewed global and local modelling approaches. He demonstrated one possible local approach based on a Bayesian conjugate model, the Dirichlet process, that allows for fast, scalable, explainable computations. He then explored a global-flavoured methodology, based on graph variational auto-encoders, aimed at reducing the number of false positives.
A Data-Driven Approach to Cybersecurity - Luigi De Luca, Data Scientist at Data Reply
In today’s data-driven world, Big Data and Data Science have become indispensable tools for tackling complex problems, as they make it possible to handle large volumes of data and derive actionable insights. As cyber threats continue to evolve, traditional cybersecurity methods have proven insufficient to defend effectively against modern attacks, so data analytics now plays a crucial role in the field. Luigi explored the benefits that a data-driven approach brings to cybersecurity, focusing on three use cases that are subcases of anomaly detection: "UEBA", "malware detection" and "DGA detection". For each of the three, he explained the improvements over traditional methods and how to implement the solution.
Watch the video
Alkemy’s GenAI ecosystem
On February 20th, 2024, Marcello Villa presented Alkemy’s GenAI ecosystem and some of the use cases they are working on. Shifting perspective from clients to developers, in the second part Davide Posillipo reflected on how the latest Generative AI applications are impacting our field, Data Science, and on what we can expect to happen to our profession in the future. As an example of new ways of working, in the final part Milica Cvjeticanin talked about an unconventional Transformer model. Modern LLM architectures based on Transformers are an extremely powerful tool for solving a variety of problems, yet they are mostly cited in connection with natural language processing. By combining meta-learning, a Bayesian Neural Network (BNN) prior, and the Transformer architecture, the application field of Transformer-based models can be extended to classification problems on tabular data as well. Milica showed an example of such a model, TabPFN, which can be competitive with the best-known Machine Learning algorithms on these classical ML tasks, and pointed out why this model is worth keeping an eye on.
Watch the video
Knowledge section
Here are some selected resources:
Be involved!
We also want to remind you that if you enjoy our events, you can get in touch with us at [email protected] to get involved in organizing great new online activities.
We would also be very happy to hear from you if you are interested in being a speaker or want to share your expertise or experience with the Data Science Milan community!
Wallboard
Would you like to become one of our sponsors and increase your visibility in the Data Science community? Write here
If, instead, you would like to post a message on the wallboard, please contact us and send us your announcements. We will publish them here.