Great opportunity at a fintech startup

SRE/DevOps Senior Engineer

 

Job Summary:

Site Reliability Engineers are mission critical. The SRE team has a major impact on engineering by continuously optimizing web services and building tools that make the lives of other technology teams easier. The team consists of engineers who take ownership of managing large-scale infrastructure while improving reliability and automation. SREs actively manage the infrastructure, and we are looking for engineers who want to take part in developing, maintaining, and scaling infrastructure software and who are passionate about exploring new technologies.

 

About you:

You pay attention to detail and are results-driven. You will work in a fast-paced, challenging environment to enhance, troubleshoot, and build out our current infrastructure and DevOps processes. You will become an integral part of the team, treating every problem of the platform as your own and solving it accordingly.

 

We believe that communication is the bridge between confusion and clarity, so you will need to be a team player with excellent communication, computer, and project management skills. You should be focused on building better, more efficient applications and creating a better end-user experience. You must be knowledgeable, collaborative, and motivated.

 

Every day you will dig into new technologies running across thousands of servers and deliver centralized, standard infrastructure with improved availability and reduced operating costs. You will work with infrastructure at a massive scale producing simple, scalable engineered infrastructure that continuously improves. You will write scripts that remove routine operational work from our support teams and migrate components to standardized infrastructure. Automation is at the core of everything you'll do.

Key Experience & Skills:

·       A prior software engineering background, with the ability to understand code written by others and identify major issues.

·       Tech tools needed: AWS console, Jira, Slack, Excel, and Word

·       Proven Systems Administration experience within a mixed Windows, Linux physical and virtual environment

·       Knowledge of Clusters, Storage, Backups, Data Export/Import, Monitoring tools, and disaster recovery.

·       Experience with software management - installations, patching of OS/apps, etc.

·       Understand security architecture and assurance, threat modelling, log analysis, and the application of the same on public cloud platforms (Azure, AWS, GCP) and private cloud.

·       Understand microservice architecture and have experience working with container platforms such as Docker, Kubernetes, EKS, and ECS.

·       Ability to deploy and configure infrastructure for running virtual machines.

·       Ability to facilitate these practices in collaboration with the Development, Quality Assurance, and Technical Operations teams to drive business goals.

·       Experience using tools like Git, Jenkins, Sonar, Maven, Gradle, Selenium, Docker, Kubernetes, OpenShift …

·       Strong knowledge of infrastructure provisioning tools such as Terraform, Ansible and CloudFormation

·       Profound knowledge in various scripting languages, Linux system & server administration and mass system deployments.

·       Advanced scripting experience (Bash/Python) and experience as a DevOps engineer or systems administrator on Linux/Unix.

·       Hands-on experience with mobile money and e-payments.

·       Deployment automation experience using advanced scripting (Bash/Python).

·       Experience in developing and maintaining CI/CD processes for cloud and on-prem applications written in Java and React or AngularJS.

·       Experience with configuration management tools such as Rundeck, Ansible, Chef or Puppet.

·       Ability to use a wide variety of open source technologies and tools.

·       Experience with public cloud providers such as AWS, GCP, and Data-centre solutions.

·       Troubleshooting deployment- and environment-related issues in order to identify and escalate or solve hardware or software issues. Experience with Nagios or Zabbix monitoring systems.

·       Strong problem-solving skills with an investigative mentality, decision making ability and a capacity for strategic and associative thinking.

·       Ability to demonstrate a clear, energetic and excited interest in automating everything (build, test, release/deploy, monitoring, reporting)

·       Experience installing, configuring and troubleshooting Linux web-based solutions (apache2, wildfly, tomcat, java, ...)

·       Packaging applications for deployment on staging and production environments.

·       Supporting developers on their releases and deployments.

·       Availability to perform deployments outside business hours.

·       Ability to multitask and handle multiple urgent situations at once; must be extremely flexible.

·       Understand company needs to define system specifications.

·       Plan and design the structure of a technology solution.

·       Evaluate and select appropriate software or hardware and suggest integration methods.

·       Experience installing and administering web tools (NGINX, Git, MySQL, MongoDB, Postgres, ZFS, GlusterFS, HAProxy, Cloudflare, Docker, PHP applications, Python applications).

 

Responsibilities:

·       Establish, implement, and support CI/CD pipelines across multiple platforms and infrastructures.

·       Build automated deployments through the use of configuration management technology.

·       Maintain the company’s data center infrastructure.

·       Maintain the Git infrastructure and cloud repositories, and continuously educate other team members on their usage.

·       Establish and routinely test Backup and Disaster Recovery solutions.

·       Implement exhaustive monitoring solutions and respond to all alerts.

·       Champion best-practice security measures throughout the organization, and implement and maintain state-of-the-art security solutions on the company’s infrastructure.

·       Maintain various environments to support development and production activities.

·       Monitor the company’s servers and routinely patch and maintain them.

·       Participate in strategic sprint-planning meetings and provide guidance and expertise on system options, risk, impact, and costs vs. benefits.

·       Continuously support the development team by building solutions that remove any operational impediments they may face.

·       Provide recommendations for enhancing performance and cost via gap analysis, identifying the most practical alternative solutions and assisting with modifications.

·       Continuously research new technologies and adopt them into the company's infrastructure.

·       Despite having regular work hours, the candidate should be able to be on call 24 hours a day, 7 days a week.

·       Take responsibility for operating system updates, installations, patches, configuration changes, and maintenance procedures.

·       Troubleshoot network and performance problems reported by users.

·       Overall responsibility for system security. Protect the network and systems from all threats and maintain high performance of the apps and websites (e.g. load balancers, firewalls, etc.).

·       DNS, web farm, and mail server management and load balancing. Backup management.

·       Manage the contracts with Hosting and Hardware providers.

·       Manage the overall server and database access management.

·       Maintain PCI DSS compliance standards.

·       Set up policies and procedures for rollouts and access management together with the development team.
