Lessons Learned from Troubleshooting CI/CD Pipelines

Introduction

As a DevOps enthusiast, I recently ran into an interesting problem while deploying a NestJS project to an Ubuntu server with Jenkins and Ansible. Here is what went wrong and the unexpected fix that saved the day.

The Challenge

In my Jenkins pipeline, the yarn step failed with an error saying the node executable could not be found on the server. However, I knew Node.js v18 was already installed and working fine for my NestJS project.

Initial Ansible playbook

---
- name: Update and Restart ehs_api on AWS QA
  # Target hosts in the "ehs_qa" inventory group
  hosts: ehs_qa

  # Enable privilege escalation (become) for tasks that require it
  become: true

  tasks:
    # Pull latest changes from Git
    - name: Pull latest changes from Git repository
      shell: git -C /home/ubuntu/ehs_api pull origin main

    # Install dependencies and build the application with yarn
    - name: Install dependencies with yarn
      become: true
      shell: |
        yarn                           # Run yarn
        yarn build
      args:
        chdir: /home/ubuntu/ehs_api
.....        

Error received after running the pipeline

TASK [Install dependencies with yarn] **********************************************************************************
fatal: [54.237.40.198]: FAILED! => {"changed": true, "cmd": "cd /home/ubuntu/ehs_api && yarn", "delta": "0:00:00.006116", "end": "2024-09-26 15:46:25.864622", "msg": "non-zero return code", "rc": 127, "start": "2024-09-26 15:46:25.858506", "stderr": "/usr/bin/env: ‘node’: No such file or directory", "stderr_lines": ["/usr/bin/env: ‘node’: No such file or directory"], "stdout": "", "stdout_lines": []}
        

Initial Investigation

I suspected the issue was due to Ansible running as the root user instead of the ubuntu user. After researching and trying various solutions, I still couldn't resolve the issue.

Changes I made to fix the issue:

1- Added remote user in the Ansible playbook file

---
- name: Update and Restart ehs_api on AWS QA
  # Target hosts in the "ehs_qa" inventory group
  hosts: ehs_qa

  # Connect as the ubuntu user and disable play-level privilege escalation
  remote_user: ubuntu
  become: false

  tasks:
    # Pull latest changes from Git
    - name: Pull latest changes from Git repository
      shell: git -C /home/ubuntu/ehs_api pull origin main

    # Install dependencies and build the application with yarn
    - name: Install dependencies with yarn
      become: true
      shell: |
        yarn                           # Run yarn
        yarn build
      args:
        chdir: /home/ubuntu/ehs_api
.....        

2- Added remote user in Ansible inventory file

/etc/ansible/hosts

[ehs_qa:vars]
ansible_user=ubuntu        

3- Passed the user when calling ansible-playbook

sudo ansible-playbook --user=ubuntu /var/ansible/playbooks/ehs_qa_playbook.yaml        

I made several other changes to the playbook as well, but none of them helped; every run ended with the same error shown above.

The Real Issue

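After digging deeper, I discovered the culprit: NVM (Node Version Manager). NVM installs Node.js under the user's home directory and adds it to PATH from the shell startup files, which an interactive SSH login reads but the non-interactive shell used by Ansible typically does not. As a result, the PATH environment variable I saw when connecting via SSH as the ubuntu user was different from the one Ansible ran with, and yarn could not find node.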
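
A difference like this can be made visible from Ansible itself. The playbook below is a minimal diagnostic sketch, not part of my original pipeline: it prints the user, the PATH, and the location of node exactly as a shell task sees them. The play name and the env_check variable are illustrative; the ehs_qa group and ubuntu user are taken from the playbook above.

```yaml
---
# Diagnostic sketch (hypothetical, not from the original pipeline)
- name: Show the environment Ansible actually runs in
  hosts: ehs_qa
  remote_user: ubuntu
  become: false
  gather_facts: false

  tasks:
    - name: Print user, PATH and node location as seen by Ansible
      ansible.builtin.shell: |
        echo "user: $(whoami)"
        echo "PATH: $PATH"
        command -v node || echo "node: not found on PATH"
      register: env_check
      changed_when: false   # read-only check, never reports a change

    - name: Display the result
      ansible.builtin.debug:
        var: env_check.stdout_lines
```

Comparing this output with echo $PATH from an interactive SSH session should show whether the NVM directory is missing on the Ansible side.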

The Urgent Fix

With time running out, I made the bold decision to delete NVM from the host server and install Node.js 18 globally. This drastic measure resolved the issue, and my pipeline completed successfully.
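
For completeness, a less drastic option I did not try at the time would be to load NVM explicitly inside the task. The sketch below assumes NVM sits in its default location ($HOME/.nvm) for the ubuntu user and that a Node.js 18 version is installed through it; both are assumptions, so adjust them to the actual server.

```yaml
---
# Untested alternative sketch: keep NVM and load it inside the task
- name: Build ehs_api with NVM loaded explicitly
  hosts: ehs_qa
  remote_user: ubuntu
  become: false            # run as ubuntu so $HOME points at the NVM install

  tasks:
    - name: Install dependencies and build with NVM on PATH
      ansible.builtin.shell: |
        . "$HOME/.nvm/nvm.sh"      # assumed default NVM install path
        nvm use 18 || exit 1       # stop if Node.js 18 is not available
        yarn && yarn build
      args:
        chdir: /home/ubuntu/ehs_api
        executable: /bin/bash      # run the script with bash rather than /bin/sh
```

Note that the task must not use become: true here, otherwise the shell runs as root and $HOME no longer points at the ubuntu user's NVM directory.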

Key Takeaways

Environment Consistency Matters: Tools like NVM can introduce discrepancies between user environments, particularly when deploying through automation tools like Ansible.

Troubleshooting is Iterative: It's important to keep an open mind and revisit assumptions. In this case, permissions weren't the issue; the environment was.

Sometimes a Quick Fix is Necessary: When time is of the essence, finding a solution that works in the short term is just as important as figuring out long-term optimizations.
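
One way to act on the environment-consistency takeaway is a pre-flight check that fails early, with a clear message, when the expected Node.js version is not visible to Ansible. This is a hedged sketch rather than something from my pipeline; the play name, the node_version variable and the message text are illustrative.

```yaml
---
# Pre-flight sketch (hypothetical): verify the toolchain before building
- name: Pre-flight check for the Node.js toolchain
  hosts: ehs_qa
  remote_user: ubuntu
  gather_facts: false

  tasks:
    - name: Read the Node.js version visible to Ansible
      ansible.builtin.command: node --version
      register: node_version
      changed_when: false   # read-only check

    - name: Fail early unless Node.js 18 is on PATH
      ansible.builtin.assert:
        that:
          - node_version.stdout.startswith('v18.')
        fail_msg: "Expected Node.js v18, got: {{ node_version.stdout }}"
```

If node is not on PATH at all, the first task already fails, which surfaces the problem before the yarn step runs.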

Looking back, this challenge emphasized the importance of environment management in CI/CD workflows and taught me the value of digging deeper when troubleshooting automation failures.

Feel free to share your thoughts or similar experiences!
