Brett Kennedy's posts

Data Scientist, Author of Outlier Detection in Python, ex-Director of Research at CaseWare International, Master of Science - MSc at University of Toronto

8 months ago

My next installment in the series of Medium articles on interpretable outlier detection is now up: https://lnkd.in/gpSDgYTM. This is another sample from Outlier Detection in Python (https://lnkd.in/gVq-ACgJ), in this case presenting another interpretable algorithm called Counts Outlier Detector, based on multi-dimensional histograms. #outlierdetection #anomalydetection #XAI #machinelearning #datascience #python
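As a rough illustration of the general idea behind a multi-dimensional histogram detector, the sketch below bins each feature, counts how many rows fall in each 2D cell, and scores a row by the number of sparse cells it lands in. This is not the Counts Outlier Detector's implementation or API; the function name `rare_cell_scores`, the bin count, the rarity threshold, and the data are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def rare_cell_scores(rows, n_bins=3, rare_count=1):
    """Hypothetical helper: count, per row, the 2D histogram cells it
    occupies that hold at most `rare_count` rows."""
    n_cols = len(rows[0])
    lows = [min(r[c] for r in rows) for c in range(n_cols)]
    highs = [max(r[c] for r in rows) for c in range(n_cols)]

    def bin_of(value, c):
        # Equal-width bins; the maximum value is folded into the last bin.
        width = (highs[c] - lows[c]) / n_bins or 1.0
        return min(int((value - lows[c]) / width), n_bins - 1)

    binned = [tuple(bin_of(r[c], c) for c in range(n_cols)) for r in rows]
    scores = [0] * len(rows)
    # One 2D histogram per pair of features.
    for a, b in combinations(range(n_cols), 2):
        counts = Counter((row[a], row[b]) for row in binned)
        for i, row in enumerate(binned):
            if counts[(row[a], row[b])] <= rare_count:
                scores[i] += 1
    return scores


# Twenty rows where both features move together, plus one row whose
# individual values are ordinary but whose combination is unusual.
rows = [(i, i) for i in range(10)] * 2 + [(0, 9)]
scores = rare_cell_scores(rows)
print(scores)
```

Because each score traces back to a specific pair of features and a specific cell, a flagged row can be explained by pointing at the cell it occupies, which is the sense in which histogram-based methods are interpretable.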

Counts Outlier Detector: Interpretable Outlier Detection

towardsdatascience.com


Most relevant posts

Astor Perkins

1,121 followers
3 months ago
Perform outlier detection more effectively using subsets of features

This article is part of a series on the challenges of identifying outliers in data and the techniques that may be used to address them, including articles on PCA, Distance Metric Learning, Shared Nearest Neighbors, Frequent Patterns Outlier Factor, Counts Outlier Detector (a multi-dimensional histogram-based method), and doping. This article also contains an excerpt from my book, Outlier Detection in Python.

We look here at techniques to create, instead of a single outlier detector examining all features within a dataset, a series of smaller outlier detectors, each working with a subset of the features (referred to as subspaces).

A number of technical challenges appear in outlier detection. Among these are the difficulties that occur where data has many features. As covered in previous articles on Counts Outlier Detector and Shared Nearest Neighbors, where we have many features, we often face an issue known as the curse of dimensionality. This has a number of implications for outlier detection, including that it makes distance metrics unreliable. Many outlier detection algorithms rely on calculating the distances between records in order to identify as outliers the records that are similar to unusually few other records and unusually different from most other records; that is, records that are close to few other records and far from most other records.

To address these issues, an important technique in outlier detection is using subspaces. The term subspaces simply refers to subsets of the features. In the example from the article, if we use the subspaces A-B, C-D, E-F, A-E, B-C, B-D-F, and A-B-E, then we have seven subspaces (five 2d subspaces and two 3d subspaces). Creating these, we would run one (or more) detectors on each subspace, so would run at least seven detectors on each record.

We've seen, then, a couple of motivations for working with subspaces: we can mitigate the curse of dimensionality, and we can reduce the cases where anomalies are missed because they are based on a small number of features that are lost among many others. As well as handling situations like this, there are a number of other advantages to using subspaces with outlier detection. These include... https://lnkd.in/exfdadZC
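The subspace idea can be sketched in a few lines of Python. The seven subspaces are the ones listed in the post; everything else is an assumption for illustration: the detector (distance to the nearest neighbor), the way scores are combined (a plain sum), the function names, and the toy records. The article itself may use different detectors and a different combination rule.

```python
import math

# Subspaces named in the post: five 2D and two 3D.
SUBSPACES = [
    ("A", "B"), ("C", "D"), ("E", "F"), ("A", "E"),
    ("B", "C"), ("B", "D", "F"), ("A", "B", "E"),
]


def nearest_neighbor_distances(points):
    """Stand-in detector: score each point by the distance to its nearest neighbor."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def score_by_subspace(records, subspaces=SUBSPACES):
    """Run the detector once per subspace; return {subspace: scores per record}."""
    per_subspace = {}
    for features in subspaces:
        projected = [tuple(rec[f] for f in features) for rec in records]
        per_subspace[features] = nearest_neighbor_distances(projected)
    return per_subspace


# Ten records whose six features move together, plus one record that is
# ordinary in A, B, E, F but has an unusual combination of C and D.
records = [dict.fromkeys("ABCDEF", i) for i in range(10)]
records.append({"A": 5, "B": 5, "C": 0, "D": 9, "E": 5, "F": 5})

per_subspace = score_by_subspace(records)
totals = [
    sum(scores[i] for scores in per_subspace.values())
    for i in range(len(records))
]
flagged = totals.index(max(totals))
top_subspace = max(per_subspace, key=lambda s: per_subspace[s][flagged])
print(flagged, top_subspace)
```

Keeping the per-subspace scores, rather than only the total, makes it possible to report which small set of features made a record look unusual.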

Perform Outlier Detection More Effectively Using Subsets of Features | Towards Data Science

https://towardsdatascience.com
Brett Kennedy

Data Scientist, Author of Outlier Detection in Python, ex-Director of Research at CaseWare International, Master of Science - MSc at University of Toronto
3 months ago
My most recent Medium article related to outlier detection is now live. This covers the use of subspaces: creating sets of small detectors, each covering a restricted number of features, which can often allow for more accurate, faster, and more interpretable tests for outliers. https://lnkd.in/g3YF3KSa This includes an excerpt from Outlier Detection in Python https://lnkd.in/gVq-ACgJ

Perform outlier detection more effectively using subsets of features

https://towardsdatascience.com

2 comments
Shahid Hussain

ML Engineer @byMind Solutions | NLP | LLMs | GenAI | Chatbots
10 months ago
Ever struggle with strange email addresses? This Python code snippet can help you check if an email format is valid! Here's the magic: It uses Regular Expressions (RegEx), a powerful tool for string manipulation. Think of it as a detective for text, searching for specific patterns. This code's RegEx pattern ensures the email address follows a format with: Lowercase letters (a-z). Digits (0-9). A single . or _ before the @ symbol. Exactly one @ symbol. A dot (.) in the second or third position from the end. How to use it? Save the code as a Python file (e.g., validate_email.py). Run the script and enter an email address. The code will tell you if the format is valid! Want to learn more about RegEx? There's a whole world of text wrangling at your fingertips in Python! #worklyrow #AI #artificialintelligence #machinelearning #machinelearningcourse
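The snippet the post refers to is not included here, so the following is only a sketch of a validator matching the rules the post lists. The pattern, the function name `is_valid_email`, and the sample addresses are assumptions; in particular, the "dot near the end" rule is interpreted as a two- or three-letter suffix after the final dot.

```python
import re

# Lowercase letters and digits, at most one "." or "_" inside the local part,
# exactly one "@", then a domain ending in a dot plus 2-3 lowercase letters.
EMAIL_PATTERN = re.compile(r"[a-z0-9]+[._]?[a-z0-9]+@[a-z0-9]+\.[a-z]{2,3}")


def is_valid_email(address):
    """Return True if the whole string matches the simplified pattern."""
    return EMAIL_PATTERN.fullmatch(address) is not None


samples = [
    "john.doe@example.com",
    "user_1@mail.org",
    "john..doe@example.com",
    "john@doe@example.com",
    "John@example.com",
]
results = {s: is_valid_email(s) for s in samples}
for address, ok in results.items():
    print(address, "valid" if ok else "invalid")
```

A pattern this strict rejects many legitimate addresses (uppercase letters, subdomains, longer top-level domains), so it is better suited to learning RegEx than to production validation.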
Yash Mourya

Follow for insights on Flutter,AI,ML, and cutting-edge tech trends
9 months ago (edited)
Did you know about this Python library? DeepFace is a powerful yet lightweight Python library for facial recognition and analysis. With DeepFace, you can easily detect faces and analyze attributes like age, gender, and emotions in photographs. It integrates leading models such as VGG-Face, FaceNet, OpenFace, DeepFace, DeepID, ArcFace, Dlib, SFace, and GhostFaceNet, providing robust and accurate results.
Get started by installing DeepFace:
pip install deepface
Here's how you can compare the similarity of two faces:
from deepface import DeepFace
result = DeepFace.verify(img1_path = "img1.jpg", img2_path = "img2.jpg")
print(result)
But that's not all! DeepFace also supports:
- Finding the most similar faces in a dataset
- Performing detailed facial attribute analysis
Perfect for enhancing security systems, creating personalized user experiences, and more! #FacialRecognition #Python #MachineLearning #DeepLearning #AI #TechInnovation #DataScience #DeepFace
1 comment
Dustin Williams

Product Owner / Engineer / Digital Operations & Systems Reliability / Technical & Incident Support / Project & Change Management
9 months ago
FYI: A critical vulnerability, dubbed Llama Drama (CVE-2024-34359), was discovered in the llama_cpp_python package used for integrating AI models with Python. It allows for arbitrary code execution due to inadequate security measures in the Jinja2 template rendering tool, which the package relies on. Checkmarx highlighted the risk, noting that over 6,000 AI models on platforms like Hugging Face are affected. The issue has been addressed in the latest release of llama_cpp_python (0.2.72). #security #vulnerability #python #ai
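One way to act on this is to check whether an installed copy is at least the release the post names as fixed. The sketch below uses only the standard library; the helper names and the simplified version parsing are assumptions, and the distribution name passed to `importlib.metadata` is assumed to match the package name used in the post.

```python
from importlib import metadata

PATCHED = (0, 2, 72)  # release the post names as containing the fix


def parse_version(text):
    """Turn '0.2.72' into (0, 2, 72); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_patched(version_text):
    """True if the version is at or above the patched release."""
    return parse_version(version_text) >= PATCHED


def installed_is_patched(package="llama_cpp_python"):
    """Return True/False for an installed package, or None if it is absent."""
    try:
        return is_patched(metadata.version(package))
    except metadata.PackageNotFoundError:
        return None


print(installed_is_patched())
```

This simplified parser ignores pre-release and local version markers, so a dedicated version-parsing library is the safer choice for anything beyond a quick check.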

Critical Flaw in AI Python Package Can Lead to System and Data Compromise

securityweek.com
Eralda Gjika

Data & Psychometrics| Statistician| Data Scientist| Forecaster
5 months ago
The Shared Nearest Neighbors (SNN) distance metric is clearly described, with a focus on its application to outlier detection, in the article below. Thanks to Brett Kennedy for sharing such detailed work on #SNN. The article also briefly covers its #application to #prediction and #clustering, but focuses on #outlier #detection, and specifically on SNN's application to the k #Nearest #Neighbors outlier detection #algorithm. If your work deals with the above challenges, it is worth spending some time on this article and the tests presented using #Python #libraries. https://lnkd.in/eWiNREkA
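For readers new to the idea, here is a minimal sketch of one common formulation of an SNN distance: two points are close when their k-nearest-neighbor sets overlap heavily. The function names, the choice of k, the exclusion of a point from its own neighbor set, and the toy data are assumptions for illustration, and the linked article may define the metric differently.

```python
import math


def k_nearest(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        neighbors.append(set(order[:k]))
    return neighbors


def snn_distance(neighbors, i, j, k):
    """1 - (shared neighbors / k): 1.0 means no neighbors in common."""
    return 1.0 - len(neighbors[i] & neighbors[j]) / k


# Two well-separated clusters of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
k = 3
neighbors = k_nearest(points, k)

same_cluster = snn_distance(neighbors, 0, 1, k)
cross_cluster = snn_distance(neighbors, 0, 4, k)
print(same_cluster, cross_cluster)
```

Because the metric depends only on neighbor rankings rather than raw distances, it is less sensitive to differences in scale and density between regions of the data; a k-nearest-neighbors outlier detector can use it in place of Euclidean distance.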

Shared Nearest Neighbors: A More Robust Distance Metric

towardsdatascience.com
MD ASHRAF ALI

Software Engineer | MCA Graduate | Full Stack Developer (MERN) | Problem-Solving and Coding
8 months ago
Software

Anshuman Jha

AI Consultant | AI Multi-Agents | GenAI | LLM | RAG | Open To Collaborations & Opportunities
8 months ago

Implementing a Recommendation System Using Collaborative Filtering from Scratch in Python

In today's data-driven world, recommendation systems play a crucial role in enhancing user experiences by predicting user preferences. This detailed post provides insights into creating a recommendation engine using collaborative filtering techniques and matrix factorization from scratch, using Python. The article explores fundamental concepts and steps involved, eschewing reliance on high-level libraries.

Key Concepts:
1. Collaborative Filtering: This technique leverages past behavior of users and similar decisions made by others to predict preferences.
2. Matrix Factorization: A collaborative filtering algorithm that decomposes the user-item interaction matrix into two lower-dimensional matrices.

Steps Involved:
1. Data Preparation: Creating a user-item interaction matrix.
2. Normalization: Adjusting for user biases by normalizing ratings.
3. Matrix Factorization: Using gradient descent to factorize the matrix into two lower-dimensional matrices.
4. Prediction: Using factorized matrices to predict missing entries.
5. Evaluation: Assessing the performance using Root Mean Squared Error (RMSE).

This post walks through each step with detailed explanations and sample code, guiding readers from data preparation to evaluation. It includes a flowchart for visualizing the process, enhancing understanding. This resource is invaluable for those looking to build a basic recommendation engine and enhance it based on specific requirements.

#Python #RecommendationSystem #CollaborativeFiltering #MatrixFactorization #DataScience #MachineLearning #AI #ArtificialIntelligence
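The five steps listed in the post can be sketched in plain Python as follows. This is not the code from the referenced article, which is not shown here; the ratings matrix, hyperparameters, and function names are assumptions, and stochastic gradient descent with L2 regularization is used as one common variant of the gradient-descent step.

```python
import math
import random


def factorize(ratings, n_factors=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Fit user and item factors with SGD on observed, mean-centered ratings."""
    rng = random.Random(seed)
    n_users, n_items = len(ratings), len(ratings[0])
    # Normalization: subtract each user's mean rating to absorb user bias.
    means = []
    for row in ratings:
        seen = [r for r in row if r is not None]
        means.append(sum(seen) / len(seen))
    P = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_items)]
    observed = [
        (u, i, ratings[u][i] - means[u])
        for u in range(n_users) for i in range(n_items)
        if ratings[u][i] is not None
    ]
    for _ in range(epochs):
        for u, i, target in observed:
            err = target - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q, means


def predict(P, Q, means, u, i):
    """Prediction: user mean plus the dot product of the two factor vectors."""
    return means[u] + sum(p * q for p, q in zip(P[u], Q[i]))


def rmse(ratings, P, Q, means):
    """Evaluation: root mean squared error over the observed ratings."""
    errors = [
        (ratings[u][i] - predict(P, Q, means, u, i)) ** 2
        for u in range(len(ratings)) for i in range(len(ratings[0]))
        if ratings[u][i] is not None
    ]
    return math.sqrt(sum(errors) / len(errors))


# Data preparation: a small user-item matrix; None marks a missing rating.
ratings = [
    [5, 3, None, 1],
    [4, None, None, 1],
    [1, 1, None, 5],
    [1, None, None, 4],
    [None, 1, 5, 4],
]
P, Q, means = factorize(ratings)
print("training RMSE:", rmse(ratings, P, Q, means))
print("predicted rating for user 0, item 2:", predict(P, Q, means, 0, 2))
```

The RMSE here is measured on the same ratings used for fitting, so it only shows that the factorization converged; a held-out split would be needed to judge predictive quality.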
Nirmalkumar Seshachalam

Digital Solution Architect - AI, Digital Experience, Cloud, Agile
9 months ago
Data cleanup is a prerequisite for AI. Looking at work done with AI and feedback from many leaders, what I realised is that there is no trust that the output AI produces is usable. One of the major spoilers is hallucination. A key reason for this is unpredictable input. One way to mitigate it is data cleanup. The key skills that I see helping here at scale are Python and Perl. Perl may not be as widely used as Python, but remember that Perl, once expanded as Practical Extraction and Reporting Language, is very powerful at string manipulation.
Meng Li

AI Engineer, Full-time open source engineer, Apache Linkis Committer, initiator of the SolidUI AI painting project.
1 month ago
Learn how to classify handwritten digits using KNN in Python's sklearn. Compare KNN, SVM, Naive Bayes, and Decision Tree for accuracy and performance. #AI #KNN #SVM #Bayes #Decision #Python #sklearn

KNN (Part 2): How to Recognize Handwritten Digits?(Practical Data Analysis 19)

pythonlibraries.substack.com

5,790 followers

82 posts
