June 22, 2022

June 22, 2022

What you need to know about site reliability engineering

What is site reliability engineering? The creator of the first site reliability engineering (SRE) program, Benjamin Treynor Sloss at Google, described it this way: Site reliability engineering is what happens when you ask a software engineer to design an operations team. What does that mean? Unlike traditional system administrators, site reliability engineers (SREs) apply solid software engineering principles to their day-to-day work. For laypeople, a clearer definition might be: Site reliability engineering is the discipline of building and supporting modern production systems at scale. SREs are responsible for maximizing reliability, performance availability, latency, efficiency, monitoring, emergency response, change management, release planning, and capacity planning for both infrastructure and software. ... SREs should be spending more time designing solutions than applying band-aids. A general guideline is for SREs to spend 50% of their time in engineering work, such as writing code and automating tasks. When an SRE is on-call, time should be split between about 25% of time managing incidents and 25% on operations duty.


Are blockchains decentralized?

Over the past year, Trail of Bits was engaged by the Defense Advanced Research Projects Agency (DARPA) to examine the fundamental properties of blockchains and the cybersecurity risks associated with them. DARPA wanted to understand those security assumptions and determine to what degree blockchains are actually decentralized. To answer DARPA’s question, Trail of Bits researchers performed analyses and meta-analyses of prior academic work and of real-world findings that had never before been aggregated, updating prior research with new data in some cases. They also did novel work, building new tools and pursuing original research. The resulting report is a 30-thousand-foot view of what’s currently known about blockchain technology. Whether these findings affect financial markets is out of the scope of the report: our work at Trail of Bits is entirely about understanding and mitigating security risk. The report also contains links to the substantial supporting and analytical materials. Our findings are reproducible, and our research is open-source and freely distributable. So you can dig in for yourself.


Why The Castle & Moat Approach To Security Is Obsolete

At first, the shift in security strategy went from protecting one, single castle to a “multiple castle” approach. In this scenario, you’d treat each salesperson’s laptop as a sort of satellite castle. SaaS vendors and cloud providers played into this idea, trying to convince potential customers not that they needed an entirely different way to think about security, but rather that, by using a SaaS product, they were renting a spot in the vendor’s castle. The problem is that once you have so many castles, the interconnections become increasingly more difficult to protect. And it’s harder to say exactly what is “inside” your network versus what is hostile wilderness. Zero trust assumes that the castle system has broken down completely, so that each individual asset is a fortress of one. Everything is always hostile wilderness, and you operate under the assumption that you can implicitly trust no one. It’s not an attractive vision for society, which is why we should probably retire the castle and moat metaphor.?Because it makes sense to eliminate the human concept of trust in our approach to cybersecurity and treat every user as potentially hostile.


Improving AI-based defenses to disrupt human-operated ransomware

Disrupting attacks in their early stages is critical for all sophisticated attacks but especially human-operated ransomware, where human threat actors seek to gain privileged access to an organization’s network, move laterally, and deploy the ransomware payload on as many devices in the network as possible. For example, with its enhanced AI-driven detection capabilities, Defender for Endpoint managed to detect and incriminate a ransomware attack early in its encryption stage, when the attackers had encrypted files on fewer than four percent (4%) of the organization’s devices, demonstrating improved ability to disrupt an attack and protect the remaining devices in the organization. This instance illustrates the importance of the rapid incrimination of suspicious entities and the prompt disruption of a human-operated ransomware attack. ... A human-operated ransomware attack generates a lot of noise in the system. During this phase, solutions like Defender for Endpoint raise many alerts upon detecting multiple malicious artifacts and behavior on many devices, resulting in an alert spike.


Reexamining the “5 Laws of Cybersecurity”

The first rule of cybersecurity is to treat everything as if it’s vulnerable because, of course, everything is vulnerable. Every risk management course, security certification exam, and audit mindset always emphasizes that there is no such thing as a 100% secure system. Arguably, the entire cybersecurity field is founded on this principle. ... The third law of cybersecurity, originally popularized as one of Brian Krebs’ 3 Rules for Online Safety, aims to minimize attack surfaces and maximize visibility. While Krebs was referring only to installed software, the ideology supporting this rule has expanded. For example, many businesses retain data, systems, and devices they don’t use or need anymore, especially as they scale, upgrade, or expand. This is like that old, beloved pair of worn out running shoes that sit in a closet. This excess can present unnecessary vulnerabilities, such as a decades-old exploit discovered in some open source software. ... The final law of cybersecurity states that organizations should prepare for the worst. This is perhaps truer than ever, given how rapidly cybercrime is evolving. The risks of a zero-day exploit are too high for businesses to assume they’ll never become the victims of a breach.


How to Adopt an SRE Practice (When You’re not Google)

At a very high level, Google defines the core of SRE principles and practices as an ability to ’embrace risk.’ Site reliability engineers balance the organizational need for constant innovation and delivery of new software with the reliability and performance of production environments. The practice of SRE grows as the adoption of DevOps grows because they both help balance the sometimes opposing needs of the development and operations teams. Site reliability engineers inject processes into the CI/CD and software delivery workflows to improve performance and reliability but they will know when to sacrifice stability for speed. By working closely with DevOps teams to understand critical components of their applications and infrastructure, SREs can also learn the non-critical components. Creating transparency across all teams about the health of their applications and systems can help site reliability engineers determine a level of risk they can feel comfortable with. The level of desired service availability and acceptable performance issues that you can reasonably allow will depend on the type of service you support as well.

Read more here ...

要查看或添加评论,请登录

Kannan Subbiah的更多文章

  • March 19, 2025

    March 19, 2025

    How AI is Becoming More Human-Like With Emotional Intelligence The concept of humanizing AI is designing systems that…

  • March 17, 2025

    March 17, 2025

    Inching towards AGI: How reasoning and deep research are expanding AI from statistical prediction to structured…

  • March 16, 2025

    March 16, 2025

    What Do You Get When You Hire a Ransomware Negotiator? Despite calls from law enforcement agencies and some lawmakers…

  • March 15, 2025

    March 15, 2025

    Guardians of AIoT: Protecting Smart Devices from Data Poisoning Machine learning algorithms rely on datasets to…

    1 条评论
  • March 14, 2025

    March 14, 2025

    The Maturing State of Infrastructure as Code in 2025 The progression from cloud-specific frameworks to declarative…

  • March 13, 2025

    March 13, 2025

    Becoming an AI-First Organization: What CIOs Must Get Right "The three pillars of an AI-first organization are data…

  • March 12, 2025

    March 12, 2025

    Rethinking Firewall and Proxy Management for Enterprise Agility Firewall and proxy management follows a simple rule:…

  • March 11, 2025

    March 11, 2025

    This new AI benchmark measures how much models lie Scheming, deception, and alignment faking, when an AI model…

  • March 10, 2025

    March 10, 2025

    The Reality of Platform Engineering vs. Common Misconceptions In theory, the definition of platform engineering is…

  • March 09, 2025

    March 09, 2025

    Software Development Teams Struggle as Security Debt Reaches Critical Levels Software development teams face mounting…

社区洞察

其他会员也浏览了