
Platform Firmware Resilience – (Part -2)

Anupam Datta

Published: February 5, 2025

Reproduced from - https://2wisebit.wordpress.com/2025/02/04/platform-firmware-resilience-part-2/

Recovery Mechanisms

This section focuses on recovery mechanisms designed to restore a device's firmware or critical data to a valid state in the event of corruption. Recovery is essential for maintaining the security and functionality of a platform. Below are the key guidelines for implementing effective recovery mechanisms:

Recovery of Mutable Code and Firmware

Recovery mechanisms for firmware ensure that devices can return to a secure and operational state after corruption or compromise.

Recovery Notifications: The recovery mechanism should be capable of sending notifications about recovery events and actions. This keeps administrators informed of any changes or restoration processes.
Automatic Recovery: The recovery mechanism may perform recovery actions automatically without requiring user or administrator intervention, enabling a rapid response to corruption.
User/System Administrator Approval: In some cases, the recovery mechanism may require approval from the user or system administrator before initiating recovery, ensuring control over the process.
Administrator-Driven Recovery: The platform administrator should have the ability to initiate recovery of mutable code. This may include a mechanism for the administrator to force recovery, with authorization verified through a trusted chain of devices. Devices may receive platform-level authorization to force recovery after verifying the authorization from trusted devices in the recovery chain.
Protection Against Unauthorized Rollback: The recovery process must prevent unauthorized rollback to earlier firmware versions with known security vulnerabilities. The recovery process may be multi-stage, ensuring that each step is safe and validated.
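The rollback guideline above can be illustrated with a small check. This is a minimal sketch, not a definitive implementation: `FirmwareImage`, `security_version`, and `may_recover_to` are hypothetical names, and it assumes the signature check has already been performed elsewhere and that the minimum allowed version is held in protected storage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirmwareImage:
    """Hypothetical image descriptor; the field names are illustrative."""
    name: str
    security_version: int   # assumed to increase with every security fix
    signature_valid: bool   # result of a signature check done elsewhere


def may_recover_to(candidate: FirmwareImage,
                   min_security_version: int,
                   admin_authorized_rollback: bool = False) -> bool:
    """Return True if recovery to `candidate` is allowed."""
    # Never recover to an image that failed authentication.
    if not candidate.signature_valid:
        return False
    # Images at or above the minimum security version are acceptable.
    if candidate.security_version >= min_security_version:
        return True
    # Older images are a rollback: allow only with explicit authorization.
    return admin_authorized_rollback
```

In this sketch an older image is rejected by default and accepted only when the caller passes an explicit authorization flag, which mirrors the distinction between unauthorized and authorized rollback.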

Recovery of Critical Data

This section details mechanisms to recover critical data in platform devices in the event of corruption or compromise. Critical data, often user-configurable, requires trusted backup mechanisms to restore its integrity. These backups must be safeguarded against potential attacks.

Resistance to Attacks: Recovery mechanisms must be designed to resist attacks that could corrupt active critical data or the Primary Firmware Image, or bypass their protection.
Backup of Critical Data:

Devices should provide a mechanism to restore critical data to a known good state after corruption or compromise. The major recommendations are:

Backup Techniques:

Devices shall offer an option for the backup of a known good copy of critical data to one or more secure locations.

The backup locations need protection at least as robust as that applied to the active critical data, so that the integrity and confidentiality of the backup are preserved.

Symbiont Devices:

Symbiont devices are those which rely on a host. Symbiont devices may not have the capacity to back up their critical data.

In this case, it is the responsibility of the host device to ensure that the critical data of the symbiont device is backed up.

If a symbiont device allows the host to back up the critical data it provides, it should also be able to consume that data when the host returns it during a restore.

Fuse Banks for Backup:

Fuse Banks: Fuse banks, especially one-time programmable (OTP) fuses, can be used to store critical data such as cryptographic keys, device configuration, or security policy.

Fuse banks offer tamper-resistant storage: once programmed, the data cannot be altered.

They are useful for storing critical data that never changes or is rarely modified.

Backup to Fuse Banks: Critical data may be backed up to fuse banks during manufacturing or initial configuration. This ensures that a safe, immutable copy of the data is available for recovery.

Limitations: Because fuse banks are normally OTP, they are not suitable for frequently updated data. In such cases, other secure storage mechanisms (e.g., encrypted flash memory) should be used.

Secure Recovery:

Recovery mechanisms should restore only backup data that has been validated and authenticated.

Access to backup data should be restricted to authorized processes and entities to prevent unauthorized modification or exfiltration.
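One way to restore only authenticated backup data is to store a message authentication code with each backup and refuse any blob whose tag does not verify. The following is a minimal sketch under stated assumptions: `make_backup` and `restore_backup` are hypothetical helpers, HMAC-SHA256 is one possible choice of tag, and the key is assumed to live in protected storage that an attacker cannot read. It covers integrity only, not confidentiality.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes


def make_backup(critical_data: bytes, key: bytes) -> bytes:
    """Return the data prefixed with an HMAC-SHA256 tag."""
    tag = hmac.new(key, critical_data, hashlib.sha256).digest()
    return tag + critical_data


def restore_backup(blob: bytes, key: bytes) -> Optional[bytes]:
    """Return the data only if its tag verifies; otherwise return None."""
    if len(blob) < TAG_LEN:
        return None
    tag, data = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if hmac.compare_digest(tag, expected):
        return data
    return None
```

A caller would treat a `None` result as a corrupted or forged backup and move on to another backup copy or to factory defaults.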

Verification of Known Good Critical Data: The device may consider its critical data "known good" if it successfully reboots with that data, ensuring its integrity before use.
Backup Triggering: Automatic backups should be made periodically, or backups can be initiated by the user or another trusted device.
Recovery to Factory Defaults: The RTRec (Root of Trust for Recovery) or CTRec (Chain of Trust for Recovery) must be capable of restoring critical data to factory defaults in case of corruption.
Recovery to Last Known Good State: The RTRec or CTRec should be capable of recovering critical data to the last known good state, ensuring the device can return to a secure and operational condition.
Policy and Recovery: Devices should not use critical data stored as policies to recover their own critical data, as this could reintroduce corrupted data during recovery. Symbiont devices can rely on policies provided by the host for recovery purposes.
Multiple Backups: If multiple backup copies are available, the RTRec or CTRec may allow the administrator or device to select which backup to restore.
Approval for Critical Data Replacement: If automatic detection of corrupted data is implemented, the RTRec or CTRec may seek approval from a host device or the user before replacing the corrupted critical data.
Administrator-Controlled Recovery: If recovery actions are not triggered by the RTD (Root of Trust for Detection) or CTD (Chain of Trust for Detection), the platform administrator should be able to manually initiate the recovery process for critical data. Devices should support recovery through trusted devices, enabling the administrator to force recovery via trusted device chains.

Example Recovery Scenarios

The following are examples of states of integrity to which a platform may recover:

Last Known Good State: Recovery to a previously valid state.
Factory Defaults: A complete reset to the original, secure configuration.
Newest Firmware Version: Updating to the most current firmware to fix vulnerabilities or restore functionality.
Partial Repair: Restoring specific components or configurations instead of a full recovery.
Enterprise-Defined Starting Point: Recovery to a state defined by the enterprise, potentially using centralized management tools.

Algorithmic Recovery Approach

To simplify the recovery process, an algorithmic approach can be used to determine the best recovery state:

Most Recently Used (MRU): Attempt to recover to the last known good state. If this fails, proceed to the next available option.

Fallback Mechanism: If the last known state is unavailable, attempt recovery from earlier saved states, remote enterprise storage, or reset to factory defaults.
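The ordering described above can be sketched as a simple loop over recovery sources. This is an illustrative sketch, not a prescribed algorithm: the source names, the `recover` function, and the `is_valid` callback are all hypothetical, and a real device would validate images with signature checks rather than the placeholder prefix test used here.

```python
from typing import Callable, List, Optional, Tuple

# A recovery source is a label plus a loader that returns an image or None.
RecoverySource = Tuple[str, Callable[[], Optional[bytes]]]


def recover(sources: List[RecoverySource],
            is_valid: Callable[[bytes], bool]) -> Tuple[str, bytes]:
    """Try each source in order and return the first one that validates."""
    for name, load in sources:
        try:
            image = load()
        except OSError:
            image = None  # treat an unreadable source as unavailable
        if image is not None and is_valid(image):
            return name, image
    raise RuntimeError("no valid recovery state available")


# Example ordering: most recently used first, factory defaults last.
sources = [
    ("last_known_good", lambda: None),             # simulated missing state
    ("earlier_saved_state", lambda: b"corrupt"),   # present, fails validation
    ("remote_enterprise_storage", lambda: None),   # simulated unreachable
    ("factory_defaults", lambda: b"OK:factory"),
]
name, image = recover(sources, lambda img: img.startswith(b"OK:"))
```

With the simulated sources above, the first three options fail and the loop settles on `factory_defaults`.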

Event Logging

Logging the events that surround firmware updates and recovery supports forensic analysis and provides an audit trail. These logs can be used to monitor changes, confirm the integrity of updates, and confirm that recovery activities were valid. The key aspects of event logging are:

Key Goals of Event Logging

Forensic Analysis:

Capture Attack Information: Logging of firmware and recovery events allows the platform administrator to capture information that could help reveal an attack or a compromise.
Identify Vulnerabilities: Logs can provide clues as to whether the platform contained unknown security vulnerabilities or whether the attack was part of a larger, more widespread threat. They help identify trends or patterns common to attacks, which makes it easier to detect threats in real time and to prevent future attacks.

Audit Trail:

Track Events: Logs serve as an audit trail, recording at what point in time specific events occurred, such as firmware updates or recovery actions.
Track Authorization: Logs include vital information regarding who authorized a change or recovery event and the date and time when the activity occurred. This provides accountability and can be applied to verify whether only authorized people performed the changes.
Maintain Integrity: A log of every action makes it possible to trace whether events such as updates or recovery processes were carried out correctly by manufacturers and administrators.

Considerations for Event Logging

Determine Necessary Logging:

Manufacturer's Responsibility: The platform and device manufacturers will decide the appropriate level of event logging based on their target customer base and the specific security needs of the platform.
Environmental Considerations: The environment in which the platform will operate (e.g., enterprise, consumer, or data center environments) will influence the scope and depth of logging. Different environments may have different regulatory requirements or security concerns.

Log Integrity:

Protect Log Data: The logged events must be stored in a way that ensures their integrity. This includes techniques such as digital signatures or hashing to prevent tampering or modification of the logs.
Secure Recovery and Transmission: Logs should be recoverable and transmitted securely, ensuring that the data is available when needed without exposing it to unauthorized parties during storage or transit.
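Hashing can make tampering with stored log entries detectable by chaining each entry to the one before it. The sketch below shows one such scheme under stated assumptions: `append_event` and `verify_log` are hypothetical helpers, the entry layout is illustrative, and the chain is only meaningful if the most recent hash is itself stored in protected storage or signed, which this sketch does not do.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, event: Dict[str, str]) -> str:
    """Hash the previous entry's hash together with this event."""
    payload = prev_hash + json.dumps(event, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(log: List[dict], event: Dict[str, str]) -> None:
    """Append an event, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash,
                "hash": _digest(prev_hash, event)})


def verify_log(log: List[dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Changing any field of an earlier event causes `verify_log` to return `False`, because the stored hash no longer matches the recomputed one.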

Access Control:

Controlled Access to Logs: Unauthorized access to event logs can be harmful because malicious actors could analyze the logs to exploit weaknesses in the system. Therefore, access to event logs must be tightly controlled.
Limit Log Access: Only authorized personnel should be able to view or manage event logs, and mechanisms should be in place to restrict access based on roles or levels of trust.

Security of Event Log Data:

Prevent Data Exfiltration: Logs often contain valuable information about the system's state, updates, and recovery actions. If compromised, they could broaden the attack surface or reveal sensitive system details.
Encryption and Authentication: Events should be logged using encryption techniques to protect sensitive data. Authentication mechanisms should ensure that only legitimate devices or users can append entries to the logs.

Implementation Recommendations

For Original Equipment Manufacturers (OEMs) and Component Suppliers

Design for Security: Put hardware-based security features in place, such as TPMs and HSMs. Develop firmware code using secure development practices.
Enable Secure Updates: Provide mechanisms for cryptographically verified firmware updates. Ensure that updates are delivered and applied securely.
Support Detection and Recovery: Implement integrity measurement and attestation. Provide tools and documentation for recovery operations.
Collaborate with Industry Standards: Follow standards such as NIST SP 800-193. Support UEFI Secure Boot and industry firmware security initiatives.

For System Administrators and Security Professionals

Acquire Secure Systems: Choose systems that include hardware-based security components. Verify that the vendor employs best practices in firmware security.
Verify Firmware Integrity: Regularly check firmware integrity using available tools. Analyze anomalies or unauthorized changes.
Install Updates Promptly: Install firmware updates when available. Verify the authenticity and integrity of updates prior to installation.
Plan for Recovery: Develop and test recovery procedures for firmware corruption scenarios. Keep secure copies of firmware and configuration data.
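As a small illustration of checking firmware integrity before installation, an administrator's tooling might compare an image's digest against a reference value published by the vendor. This is a minimal sketch with a hypothetical helper name, `firmware_matches`. It assumes the reference digest was obtained through a trusted, authenticated channel, and it does not replace vendor signature verification.

```python
import hashlib
import hmac


def firmware_matches(image: bytes, expected_sha256_hex: str) -> bool:
    """Return True if the image's SHA-256 digest equals the reference value."""
    actual = hashlib.sha256(image).hexdigest()
    # Normalize case so upper-case reference digests compare correctly.
    return hmac.compare_digest(actual, expected_sha256_hex.lower())
```

A mismatch would be treated as an anomaly to investigate rather than an image to install.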

Management

When designing resilient platforms, vendors must account for a variety of management and control needs based on their target customers. This ensures that the platform can be administered securely in a manner that best serves the user. There are two primary forms of management: local and remote.

Local Management: In some cases, management may need to occur locally, requiring physical presence to make certain administrative changes.
Remote Management: Customers might expect the ability to manage the platform remotely, which may include controlling policies and configuration settings, remotely extracting log data, or monitoring security status. For more sensitive environments, some customers may want to restrict log data extraction to authorized local mechanisms only.

The platform should support flexibility in meeting these expectations, depending on the deployment environment (e.g., enterprise, consumer).

Previously

https://www.dhirubhai.net/pulse/platform-firmware-resilience-part-1-anupam-datta-1l3bc
