Portfolio

Portfolio

Professional Bio

I am driven data analytics professional with a background in artificial intelligence, machine learning, and business analytics and master's degree in Artificial Intelligence. My experience spans data engineering, business analysis, and AI-driven insights, equipping me with a unique skill set to drive meaningful change in technology-focused environments. I am passionate about solving complex data problems and utilizing analytics to guide strategic decision-making. This portfolio showcases my journey in AI and ML, reflecting my commitment to excellence and my eagerness to contribute to the evolving landscape of technology.

Value Proposition and Audience

Personal Value Proposition: I bring a unique combination of analytical skills and practical experience in data engineering and machine learning to streamline data processes and enhance organizational decision-making. My background enables me to develop data solutions that optimize efficiency and accuracy, contributing to actionable insights and innovation. This artifact demonstrates my technical expertise in data migration and my ability to leverage advanced tools to deliver results in real-world settings.

Target Audience: This portfolio targets data-driven organizations, technology leaders, and teams within industries such as healthcare, pharmaceuticals, and IT who seek professionals skilled in both AI/ML and data engineering. These artifacts are particularly relevant to teams focused on data warehousing, migration, and process automation.

I am a passionate master's student specializing in Artificial Intelligence and Data Analytics. My academic journey has equipped me with a strong foundation in artificial intelligence and machine learning, complemented by practical experience in data analysis and interpretation. I aim to leverage my skills to drive innovation and efficiency in technology-driven environments.


Artifact 1:

Collaborative Project: AI and ML Historical Timelines

In this collaborative project, our team explored the pivotal moments that have shaped the fields of Artificial Intelligence (AI) and Machine Learning (ML). By tracing the development of these technologies from their inception to the present day, we aimed to create a comprehensive timeline that highlights significant milestones and reflects on their profound impact on society and technology. This project deepened my understanding of AI and ML's evolution while honing my research, teamwork, and presentation skills. Together, we crafted a narrative that illustrates the journey of AI and ML, making complex history accessible and engaging for diverse audiences.

AI and ML Historical Timeline

1956: Dartmouth Conference – Birth of AI as a Field

  • Description: The first conference on artificial intelligence, where the term "AI" was coined.
  • Impact: Marked the formal beginning of AI as a field of study, setting the stage for future research and development.

1970s: Development of Early Expert Systems

  • Description: Development of systems like MYCIN for medical diagnosis.
  • Impact: Demonstrated the potential of AI in specific domains, laying the groundwork for future systems that could emulate human expertise.

1997: IBM's Deep Blue Defeats Chess Champion Garry Kasparov

  • Description: IBM’s chess-playing computer defeated the world champion.
  • Impact: Showcased AI's capabilities in strategic thinking and problem-solving, raising public awareness of AI.

2012: AlexNet Wins the ImageNet Competition

  • Description: A deep learning model that won the ImageNet competition by a large margin.
  • Impact: Sparked the deep learning revolution, leading to significant advancements in computer vision technologies.

2016: AlphaGo by DeepMind Defeats Go Champion Lee Sedol

  • Description: DeepMind’s AI beats the world champion Go player.
  • Impact: Demonstrated AI's ability to handle complex, intuitive tasks, solidifying its role in advanced cognitive capabilities.


1950: Alan Turing proposed the Imitation Game, a test to determine if machines can think

In 1952, Arthur Samuel developed a checkers program that learned to play independently

2016: AlphaGo Defeats Lee Sedol

Details: DeepMind’s AI beats the world champion Go player.

Impact: Demonstrated AI's ability to handle complex, intuitive tasks

Major AI Innovations (Recent Years)

GPT-3 by OpenAI (2020)

Innovation: Advanced natural language processing model capable of generating human-like text, enabling applications in writing, coding, and conversation

Impact: Transformational for industries relying on content creation, customer service, and more.

AlphaFold by DeepMind (2021)

?Innovation: Deep learning model that predicts protein folding with remarkable accuracy.

?Impact: Revolutionized biology and drug discovery, with implications for understanding diseases and developing therapies.

DALL-E by OpenAI (2021)

Innovation: Generative model that creates images from textual descriptions, showcasing the capabilities of AI in creative fields.

Impact: Opened new avenues in art, design, and marketing by allowing creators to visualize concepts quickly.

Leading companies in AI

OpenAI

Focus: Natural language processing, AI ethics, and safety.

Products: GPT-3, DALL-E.

DeepMindFocus: Reinforcement learning, healthcare applications.

Notable Achievements: AlphaGo, AlphaFold.

Tesla

Focus: Autonomous driving and AI in automotive technology.

IBM

Focus: AI solutions for businesses, and healthcare analytics.

NVIDIA

Focus: AI hardware and software, deep learning frameworks.

Contribution: GPUs for training AI models.

Which industry has benefited the most from recent AI innovations?

The healthcare industry has seen significant benefits, particularly with tools like IBM Watson and AlphaFold. These technologies streamline diagnostics, enhance drug discovery, and personalize patient care, improving health outcomes and efficiencies.

How can these innovations change how businesses operate in the next five years?

AI can automate routine tasks, allowing employees to focus on higher-value activities. Predictive analytics will enable businesses to anticipate customer needs and market trends, improving decision-making. This shift will likely lead to more agile organizations that can adapt quickly to changes.

What ethical challenges arise from deploying AI in decision-making roles?

Challenges include bias in AI algorithms, which can lead to unfair treatment in hiring or law enforcement areas. The opacity of AI decision-making processes raises questions about accountability and transparency. There’s also concern about data privacy and the potential misuse of personal information.

How can companies ensure that AI systems are fair and unbiased?

Companies can implement diverse datasets to train AI models, regularly audit algorithms for bias, and involve interdisciplinary teams in the development process. Establishing ethical guidelines and maintaining transparency about how AI systems make decisions can also help mitigate these issues.


I am excited about the potential of AI and machine learning to transform industries and improve lives. With my strong academic background and hands-on experience, I am eager to contribute to innovative projects and drive data-driven decisions in a forward-thinking organization. The future of AI is not just about technology but also about how we, as a society, choose to integrate and govern it.


Artifact 2: Data Migration Project at Johnson & Johnson

  • Title: Data Migration from Linux to Snowflake
  • Description: As a Data Engineering intern at Johnson & Johnson, I was responsible for a data migration project, moving essential data from a Linux environment to Snowflake. I used StreamSets to orchestrate the data flow and Python for automation and scripting tasks. The project involved ensuring data accuracy, handling data transformation requirements, and optimizing the migration for efficiency and reliability. This work was critical in enabling seamless access to consolidated data in Snowflake, improving data accessibility and analytics capabilities within the organization.
  • Skills Demonstrated:

Data migration and integration

Pipeline Automation

Data Visualization

Python scripting for automation

Data accuracy validation

Natural Language Processing (NLP)


Artifact 3 Predictive Maintenance System for Heavy Machinery

Overview

The Predictive Maintenance System for Heavy Machinery projects is designed to tackle the costly issue of unexpected equipment failures in industries like construction and mining. Leveraging advanced machine learning algorithms, this system predicts potential equipment malfunctions, enabling proactive maintenance scheduling. This approach not only reduces unplanned downtime but also extends the lifespan of critical machinery, optimizing both operational efficiency and costs.

Project Objectives

The goal of this project is to build a machine-learning model that can accurately forecast machinery failures using real-time and historical data from telematics systems. Additionally, it includes a dashboard for real-time monitoring, providing users with insights into equipment health, recommended maintenance schedules, and alerts for imminent failures.

Approach and Techniques

  1. Data Collection and Preprocessing Data were gathered from telematics sensors installed on machinery, capturing parameters like engine temperature, hydraulic pressure, operational hours, and GPS location. After data cleaning, preprocessing, and normalization, key patterns, and trends were identified to serve as features for the model.
  2. Feature Engineering Features extracted included usage frequency, sensor anomalies, and environmental factors. This step involved identifying attributes that strongly correlated with machinery failure events, which allowed the model to focus on the most relevant data.
  3. Model Development A Random Forest classifier was selected for this project due to its robustness in handling large, complex datasets. Additional models like Support Vector Machines (SVM) and Neural Networks were evaluated, but the Random Forest model demonstrated the best balance between accuracy and interpretability. The model’s precision-recall metrics indicate a high success rate in predicting failures, allowing the system to trigger alerts with minimal false positives.
  4. Dashboard Visualization Using Tableau, a dashboard was created to visualize critical metrics, including machine health status, maintenance schedules, and real-time alerts. This interface empowers operators and maintenance teams to prioritize tasks and avoid potential malfunctions by providing actionable insights.

Key Technologies Used

  • Machine Learning: Random Forest Classifier for predictive analytics
  • Data Visualization: Tableau for real-time tracking and insights
  • Data Management: SQL for data extraction and transformation, Python for preprocessing and feature engineering

Real-World Impact

The Predictive Maintenance System enhances decision-making by providing early warnings and maintenance recommendations. It reduces costs associated with unexpected equipment failure, improves operational efficiency, and contributes to sustainable machinery usage. This system serves as a transformative solution in industries where equipment reliability is critical to productivity and safety.

Future Directions

Enhancements to the model could include the integration of weather data and additional sensor types to improve prediction accuracy. Expanding this project to include predictive capabilities for various equipment types could offer even greater benefits across multiple sectors.

Artifact 4: Fraud Detection Model Using Random Forest and XGBoost


Problem Statement:

The goal of this project was to develop a machine-learning model that could classify credit card transactions as either fraudulent or legitimate. Fraudulent transactions pose significant risks to financial institutions and their customers, so detecting them accurately is a critical task. The challenge lies in the imbalanced nature of the dataset, where fraudulent transactions account for a small proportion of total transactions. This imbalance makes it challenging to build models that can effectively distinguish between the two classes.

Data Collection and Preprocessing:

For this project, I used the publicly available Kaggle Credit Card Fraud Detection Dataset, which contains transactional data, including the amount of the transaction, time, and anonymized features resulting from PCA (Principal Component Analysis) transformations of the original features.

  1. Handling Missing Data: The dataset had no missing values, so no imputation was necessary.
  2. Outlier Detection: I performed a log transformation on the Amount feature to reduce the influence of extreme transaction values, which were common in fraud cases. This helped standardize the scale and allowed the model to better differentiate between fraudulent and legitimate transactions.
  3. Class Imbalance: To handle the imbalance between legitimate and fraudulent transactions, I used SMOTE (Synthetic Minority Oversampling Technique) to oversample the minority class (fraudulent transactions). This ensured the model was trained on a more balanced dataset.

Model Selection:

I selected Random Forest and XGBoost classifiers for the following reasons:

  • Random Forest: It is an ensemble learning algorithm that is robust to overfitting and performs well with both structured and unstructured data. Its ability to handle feature interactions made it a good choice for this project.
  • XGBoost: Known for its high performance in machine learning competitions, XGBoost is a gradient-boosting algorithm that works well on imbalanced datasets and provides efficient handling of missing values.

Both models were chosen for their strong performance in classification tasks and their ability to capture non-linear relationships in the data.

Model Training and Hyperparameter Tuning:

  1. Random Forest: I used grid search to tune hyperparameters such as n_estimators, max_depth, and min_samples_split. This ensured the model would generalize well to unseen data.
  2. XGBoost: For XGBoost, I tuned hyperparameters like learning_rate, n_estimators, and max_depth to improve its predictive performance. I also experimented with different values for subsample and colsample_bytree to prevent overfitting.

Model Evaluation:

The model’s performance was evaluated using several metrics, as fraud detection requires more than just accuracy to assess the model's effectiveness:

  • Accuracy: Although accuracy was an important metric, it wasn’t the sole focus because of the class imbalance.
  • Precision and Recall: These metrics are crucial in fraud detection, where the cost of false positives (legitimate transactions flagged as fraud) is lower than false negatives (fraudulent transactions not detected).
  • F1-Score: The F1-Score provided a balance between precision and recall and was crucial for evaluating the trade-offs between false positives and false negatives.
  • Precision-Recall AUC: Given the imbalanced dataset, I used the Precision-Recall AUC instead of the ROC AUC. This metric is particularly useful when evaluating models on imbalanced datasets.

Performance:

  • Random Forest:Precision: 95%Recall: 92%F1-Score: 93.5%Precision-Recall AUC: 0.97
  • XGBoost:Precision: 97%Recall: 94%F1-Score: 95.5%Precision-Recall AUC: 0.98

Both models performed well, with XGBoost slightly outperforming Random Forest due to its ability to handle the dataset's imbalance more effectively.

Challenges and Improvements:

  • Class Imbalance: Even though I used SMOTE to address class imbalance, there was still a risk of overfitting to the minority class. Cross-validation was used to mitigate this risk and ensure that the model generalizes well.
  • Outliers: High-value transactions were difficult to classify, as they could belong to either fraudulent or legitimate transactions. The log transformation of the Amount feature helped reduce their impact but did not completely eliminate the complexity of distinguishing them.
  • Feature Importance: I used SHAP values to understand the contribution of different features to the model's predictions. Features like Amount, Time, and V1 (an anonymized feature) were the most important predictors of fraud.

Conclusion:

This project demonstrated the effectiveness of Random Forest and XGBoost in detecting fraudulent credit card transactions. The model's strong performance in terms of precision, recall, and F1-score highlights its ability to detect fraud with a low risk of false positives. The use of SMOTE helped address class imbalance, while the log transformation of transaction amounts reduced the influence of extreme outliers.

In the future, more advanced techniques such as deep learning or ensemble methods combining multiple algorithms could be explored to further enhance performance. Additionally, incorporating domain-specific features, like transaction history or customer behavior patterns, could improve the model's ability to detect fraud.


Artifact 5: AI-Powered Chatbot for Customer Support with Sentiment Analysis


Here’s the implementation of an AI-powered chatbot for Customer Support with Sentiment Analysis. Below is a Python code sample that showcases the main functionalities:


Code: AI-Powered Chatbot

# Required Libraries
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from flask import Flask, request, jsonify
import random

# Download the NLTK Sentiment Analysis Tool
nltk.download('vader_lexicon')

# Initialize Sentiment Analyzer
sia = SentimentIntensityAnalyzer()

# Predefined Responses (Intent Recognition Simulation)
responses = {
    "greeting": ["Hello! How can I assist you today?", "Hi there! What can I do for you?"],
    "issue": ["I'm sorry to hear that you're facing an issue. Could you provide more details?", 
              "Let me help you resolve that issue. Can you elaborate?"],
    "thanks": ["You're welcome! Is there anything else I can help you with?", "Happy to help!"],
    "default": ["I'm here to assist. Could you clarify your request?", 
                "I'm not sure I understood that. Can you rephrase?"]
}

# Intent Detection Function
def detect_intent(user_message):
    if any(word in user_message.lower() for word in ["hello", "hi", "hey"]):
        return "greeting"
    elif any(word in user_message.lower() for word in ["issue", "problem", "help"]):
        return "issue"
    elif any(word in user_message.lower() for word in ["thanks", "thank you"]):
        return "thanks"
    else:
        return "default"

# Sentiment Analysis Function
def analyze_sentiment(user_message):
    score = sia.polarity_scores(user_message)["compound"]
    if score >= 0.05:
        return "positive"
    elif score <= -0.05:
        return "negative"
    else:
        return "neutral"

# Chatbot Response Function
def get_response(user_message):
    intent = detect_intent(user_message)
    sentiment = analyze_sentiment(user_message)
    
    # Modify response based on sentiment
    if sentiment == "negative":
        return random.choice(responses[intent]) + " I can sense this might be frustrating, but I'm here to help."
    elif sentiment == "positive":
        return random.choice(responses[intent]) + " It's great to see your positivity!"
    else:
        return random.choice(responses[intent])

# Flask App Setup
app = Flask(__name__)

@app.route('/chat', methods=['POST'])
def chat():
    user_message = request.json.get("message", "")
    if not user_message:
        return jsonify({"error": "Message field is required!"}), 400
    
    chatbot_response = get_response(user_message)
    return jsonify({"response": chatbot_response})

# Run the App
if __name__ == "__main__":
    app.run(debug=True)
        

How the Project Works

  1. User Interaction: The user sends a message to the chatbot via a POST request.
  2. Intent Detection: The chatbot identifies the user's intent using keywords.
  3. Sentiment Analysis: The chatbot analyzes the user's emotional tone (positive, negative, neutral).
  4. Dynamic Response: Based on the detected intent and sentiment, the chatbot tailors its response.
  5. Flask API: A lightweight web server for testing and deploying the chatbot.


How to Run the Project

  1. Save the code in a file named chatbot.py.
  2. Install the required libraries: pip install flask nltk
  3. Run the Flask app: python chatbot.py
  4. Test the chatbot using tools like Postman or cURL: Send a POST request to https://127.0.0.1:5000/chat with a JSON body: { "message": "Hi, I need help with my order." }


Sample Output

Input:

{"message": "Hi, I need help with a problem."}
        

Output:

{"response": "I'm sorry to hear that you're facing an issue. Could you provide more details? I can sense this might be frustrating, but I'm here to help."}
        

Conclusion

The AI-powered Chatbot for Customer Support with Sentiment Analysis project demonstrates how natural language processing (NLP) and machine learning can be integrated to build a chatbot that not only understands user queries but also tailors its responses based on the emotional tone of the conversation. By leveraging sentiment analysis, the chatbot is able to adjust its behavior to offer a more empathetic or encouraging response, enhancing the overall user experience.

This project highlights key skills in Python programming, Flask web development, NLP (via NLTK and Vader Sentiment), and intent classification. It provides a solid foundation for further development, such as integrating machine learning models for better intent recognition or implementing more advanced user feedback mechanisms.

Ultimately, the project illustrates the power of AI to improve customer service interactions, offering scalable solutions for real-time, personalized support. This chatbot could be expanded with more sophisticated features, such as automated issue resolution, multi-language support, and integration with existing customer service platforms.


Artifact 6: AI Security Risks and Data Privacy in Healthcare


Significance of the Assignment: This project focuses on AI security risks and data privacy in healthcare, specifically emphasizing HIPAA compliance. It involves researching security vulnerabilities such as prompt injection attacks, malicious training datasets, and data leakages while developing a workshop with preventive strategies and recommendations.

Skills Showcased:

  • Strategic Planning: Designed a structured workshop with a presentation, facilitation guide, and security checklist to educate stakeholders on AI security risks.
  • Ethical Decision-Making: Evaluated the ethical implications of AI-driven healthcare systems, ensuring compliance with regulatory standards.
  • Technical Competencies: Researched and analyzed generative AI security risks, providing actionable mitigation strategies aligned with HIPAA requirements.
  • Leadership & Communication: Facilitated discussions on AI risks and data privacy, engaging cross-functional teams in knowledge-sharing sessions.

Alignment with Leadership Role & Professional Aspirations: As a Generative AI Engineer in the manufacturing sector, this assignment strengthens my expertise in AI governance and responsible AI deployment—key factors for AI adoption in highly regulated industries. It reinforces my leadership capabilities by preparing me to guide teams in mitigating AI-related security risks and shaping strategic AI policies. Additionally, it aligns with my broader AI strategies by bridging multi-modal AI applications with compliance-driven innovation, ensuring AI solutions are both transformative and secure.


Winston K

BI Software Engineer (Qlik & Power BI) @ Penske | Data Analytics, Emerging Digital Team

4 个月

Interesting information thanks for sharing

要查看或添加评论,请登录

Rosy Veronica Kaki的更多文章

社区洞察

其他会员也浏览了