Systems Engineering, Artificial Intelligence, and Federal Acquisition of Foundation Models
[Image: the systems engineering "V" diagram. Image credit: Wikipedia]


Last October's Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence charged federal agencies with adopting AI as part of their policies and programs. Much thought has gone into what that means and how to accomplish it, often treating AI as some sort of bespoke technology that sits outside existing frameworks. I'd like to argue that it's really not that complicated: it boils down to more pedestrian issues in systems engineering and federal acquisition.

In this article I'm focusing on AI acquisition, not topics like promoting commercial development of AI or regulating consequential applications of AI outside the federal government. For those topics, see MITRE's series of articles, including our original framework paper and our paper on AI risk to critical infrastructure.

Back to the topic at hand: federal acquisition of AI and its connection to systems engineering.

Systems engineering is an interdisciplinary form of engineering that focuses on the design, development, testing, and operation of complex systems. Increasingly those systems are digital, and increasingly those digital systems rely on intelligent algorithms. The latest wave of such algorithms falls under the umbrella of AI.

The first phase of systems engineering translates high-level objectives into requirements, architecture, and ultimately a system design. When it comes to AI, that means understanding the role AI plays in our system.

First, let's focus on enterprise use cases. In these scenarios we're generally applying commercially available AI tools to augment enterprise functions: using ChatGPT as a research tool, using Microsoft Copilot to bring automation to Office 365, or augmenting software engineering workflows with code autocomplete. Adoption of these tools in federal systems generally comes down to security and governance.

For security, the federal government needs AI services delivered from compliant infrastructure, such as a FedRAMP-approved cloud. Many commercial services are not yet available from approved environments, making their near-term use a non-starter for any task involving sensitive information such as controlled unclassified information (CUI), personally identifiable information (PII), or protected health information (PHI). Some vendors are working this issue, however; Microsoft Copilot, for example, is roadmapped for availability in the Government Community Cloud.

Governance is an important component as well, particularly how employees and contractors ensure that work products derived in part from AI have appropriate attribution, quality, and correctness. Existing quality-control and citation standards should be extended to account for AI-derived or AI-generated content.

The second broad use case is the role of AI in federal mission systems, with wide-ranging consequential applications, from intelligence analysis to financial fraud detection. These scenarios require the full application of systems engineering to integrate more bespoke AI models into trusted end-to-end systems. In many cases, these models are not commercially available and must be developed and trained to support the unique needs of the particular federal mission.

There are a few orders of magnitude in complexity when considering custom AI for a federal mission:

  1. Small Models - These models generally have millions of parameters and are designed for specific tasks, like identifying objects in an image or particular patterns in financial data. Typically we can define robust performance criteria for these models, with well-defined tests to verify them (see the sketch following this list). These models are often small enough to support rigorous explainability that predicts their behavior. Model cards, originally proposed by Google, are used as part of the systems engineering process to document how models were designed and trained. As system complexity increases, the interaction between AI models and the system into which they are integrated also matters, motivating a more holistic systems-level approach and tools such as assurance cases that examine the whole system.
  2. Foundation Models - As you move into models with billions of parameters, design, implementation, and test fundamentally change. The current state of the art is built on autoregressive large language models (LLMs) and uses embeddings to map non-text data (images, video, audio, etc.) into multimodal models. These models are typically too large for the test harnesses used with small models, and they lack robust approaches to explainability. Consequently the assurance process relies more on alignment, where the systems engineering process seeks to imbue values as part of fine-tuning. As we have seen over the past year, starting with Microsoft's "Sydney" chatbot persona, this problem is hard and in the general case remains unsolved. Assured federal adoption of foundation models remains an open challenge, and one that MITRE is actively working on as part of our independent R&D program.
  3. Frontier Models - Big tech companies are releasing one to two major new models per year; OpenAI just released GPT-4o, and Google released Gemini 1.5 Pro. These models remain inaccessible for federal mission use because their novelty means they lack the testing needed for assured use in consequential applications, and their proprietary nature makes them generally difficult to integrate into mission systems.
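
To make the small-model case concrete, here is a minimal sketch of codifying "well-defined test criteria" as an executable acceptance gate. The thresholds, the toy data, and the binary-classification framing are illustrative assumptions, not MITRE guidance:

```python
# Sketch: pre-specified acceptance thresholds for a small model.
# Thresholds and data are hypothetical examples.
from dataclasses import dataclass

@dataclass
class PerformanceCriteria:
    min_accuracy: float
    min_recall: float
    max_false_positive_rate: float

def evaluate(predictions: list[int], labels: list[int],
             criteria: PerformanceCriteria) -> dict:
    """Score a binary classifier against pre-specified acceptance thresholds."""
    tp = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(predictions, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(predictions, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(predictions, labels))
    accuracy = (tp + tn) / len(labels)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "false_positive_rate": fpr,
        "passes": (accuracy >= criteria.min_accuracy
                   and recall >= criteria.min_recall
                   and fpr <= criteria.max_false_positive_rate),
    }

# Example: an object detector must reach 95% accuracy and 90% recall,
# with a false-positive rate under 5%, before it can be fielded.
criteria = PerformanceCriteria(min_accuracy=0.95, min_recall=0.90,
                               max_false_positive_rate=0.05)
print(evaluate([1, 0, 1, 1, 0], [1, 0, 1, 0, 0], criteria))
```

The point is that for small models the acceptance gate can be written down, automated, and re-run on every model revision, which is exactly what becomes impractical at foundation-model scale.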


MITRE's AI Assurance and Discovery Lab is the culmination of five years of investment in the systems engineering needed for assured adoption of small models by the federal government. Through our new partnership with NVIDIA, we are expanding this lab to include our Federal AI Sandbox, built on an NVIDIA DGX SuperPOD of H100 GPUs, providing the computational power needed to systems-engineer foundation models.

Key lines of effort for systems engineering foundation models:

  • Concept of Operations (CONOPS) - There are a few different models for how a foundation model interacts with users and missions. Given their experience with ChatGPT, most people gravitate to a chatbot metaphor. However, inputs can also arrive via APIs, fed from other digital systems. Additionally, there is growing application of agentic AI, where an AI system can task itself and other AIs, write and execute code, and interact with other digital systems via APIs. The first step in systems engineering a foundation model is to determine the CONOPS: Is it buried in a system interacting via APIs, or is it exposed to a user as an interactive teammate? How much agency does the AI have to not just answer questions but also execute tasks? (A sketch of recording these decisions explicitly follows this list.)
  • Requirements and Architecture - Based on the CONOPS, the next step is to understand the structure of the needed model and whether existing models can be adapted or a new one is required. For example, Retrieval-Augmented Generation (RAG) is a common way to give an existing LLM access to sensitive, domain-specific, and/or proprietary text-based data to use authoritatively as part of its operation (a minimal RAG sketch follows this list). This is sufficient for many use cases and requires no fine-tuning or retraining. Other scenarios may require the LLM to operate over domain-specific structured or non-text data, which requires the architecture to support a new embedding. This stage must resolve whether an off-the-shelf model with RAG is sufficient, whether fine-tuning is required, or whether a new model must be trained. Training a new model requires trillions of tokens of data (terabytes) and may not be practical, depending on the federal agency's data holdings.
  • Design - With the CONOPS, requirements, and architecture specified, there is enough detail to design the system and how a foundation model will interact with it. The design should also include a prospective approach to alignment. For example, a RAG LLM that helps an intelligence analyst will need to be imbued with an understanding of intelligence authorities and analytic tradecraft, lest it improperly use data about US persons or reach unsupported conclusions through non-rigorous techniques. The design should also specify the expected AI performance along dimensions of robustness, reliability, security, bias, and ethics: is the AI expected to be as performant as a human, or better?
  • Test and Evaluation (T&E) - Testing foundation models is challenging and remains a developing area where continued research is necessary. The academic community has focused on administering combinations of standardized tests or confronting models with moral dilemmas, sometimes inducing the existential crises that "jailbreak" models. While such tests can reveal certain elements of performance and anecdotally surface failure modes, these approaches are far from comprehensive. The T&E field needs continued research investment, something MITRE is actively pursuing as part of its independent R&D program.
  • System Validation - Not every AI failure mode or non-performant output needs to be addressed within the bounds of the AI itself. AI is deployed within a larger system, and that larger system can be used to ensure appropriate guardrails are in place. Just as a procurement system vulnerable to social engineering and deepfakes relies on policies and procedures as system-level checks and balances to detect and stop fraud, AI-enabled systems need the same approach: the system should be designed to remain stable even when the AI fails, and AI failures should not cascade into larger system failures. No AI will be perfect, but the system can still be performant. (A guardrail sketch follows this list.)
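
On the CONOPS point, one practical step is to record those decisions as an explicit, reviewable artifact rather than leaving them implicit in the deployment. Below is a minimal sketch; the field names and option sets are illustrative assumptions, not a standard schema:

```python
# Sketch: CONOPS decisions captured as a reviewable configuration.
# Interface, Agency, and Conops are hypothetical names for illustration.
from dataclasses import dataclass
from enum import Enum

class Interface(Enum):
    CHAT = "exposed to a user as an interactive teammate"
    API = "buried in a system, invoked via APIs"

class Agency(Enum):
    ANSWER_ONLY = "answers questions; takes no actions"
    TOOL_USE = "may call an approved set of tools/APIs"
    AGENTIC = "may task itself or other AIs and execute code"

@dataclass(frozen=True)
class Conops:
    interface: Interface
    agency: Agency
    human_in_the_loop: bool  # must a person approve consequential actions?

# Example: an analyst assistant that can query tools but not act on its own.
analyst_assistant = Conops(Interface.CHAT, Agency.TOOL_USE,
                           human_in_the_loop=True)
print(analyst_assistant)
```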
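
On the architecture point, RAG is conceptually simple: retrieve the relevant authoritative documents, then condition the model's answer on them. The sketch below makes assumptions throughout: retrieval is naive term overlap rather than a vector index, and generate() is a stub standing in for an LLM hosted in a compliant environment.

```python
# Minimal RAG sketch. Corpus, scoring, and generate() are illustrative stubs.
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many query terms they share."""
    terms = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda doc: len(terms & set(doc.lower().split())),
                    reverse=True)
    return ranked[:k]

def generate(prompt: str) -> str:
    """Stub for the LLM call; a real system would invoke a hosted model."""
    return f"[model response grounded in a {len(prompt)}-character prompt]"

def rag_answer(query: str, corpus: list[str]) -> str:
    """Ground the model's answer in retrieved, authoritative documents."""
    context = "\n".join(retrieve(query, corpus))
    prompt = (f"Answer using only the context below.\n"
              f"Context:\n{context}\n\nQuestion: {query}")
    return generate(prompt)

corpus = [
    "FedRAMP authorizes cloud services for federal use.",
    "Model cards document how a model was designed and trained.",
    "RAG grounds an LLM in domain-specific documents at query time.",
]
print(rag_answer("How does RAG ground an LLM?", corpus))
```

The design appeal for federal use is that sensitive holdings stay in the retrieval store rather than being baked into model weights through training.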
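
And on system validation, the guardrails can live entirely outside the model. This sketch shows system-side validators that fail closed no matter what the AI component produces; model_answer() and both checks are hypothetical placeholders, not a specific MITRE design.

```python
# Sketch: system-level guardrails wrapping an untrusted AI component.
import re

def model_answer(query: str) -> str:
    """Stand-in for the AI component, which may fail in arbitrary ways."""
    return "Transfer approved for account 123-45-6789."

def no_ssn(text: str) -> tuple[bool, str]:
    """System-side check: block anything resembling a US SSN."""
    if re.search(r"\b\d{3}-\d{2}-\d{4}\b", text):
        return False, "possible SSN in output"
    return True, ""

def has_citation(text: str) -> tuple[bool, str]:
    """System-side check: require a supporting source before release."""
    return ("[source:" in text), "no supporting citation"

def guarded(query: str, validators) -> str:
    answer = model_answer(query)
    for check in validators:
        ok, reason = check(answer)
        if not ok:
            # Fail closed: route to human review rather than emit the output.
            return f"[withheld for human review: {reason}]"
    return answer

print(guarded("Approve the transfer?", [no_ssn, has_citation]))
```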


At MITRE, through our AI Assurance and Discovery Lab, Federal AI Sandbox, and independent R&D program, we are actively exploring these end-to-end systems engineering issues. We can prototype different architectures and designs, and we can train foundation models on the NVIDIA DGX SuperPOD to explore design tradeoffs.

While assurance for small models is now a reasonably well-developed field, adoption remains mixed and there is considerable work to do. Meanwhile, we are still extremely early in assurance for foundation models, with a wide range of open research questions. The AI community can benefit from the rigor, process, and frameworks of the systems engineering community, and over the next decade these communities will continue to grow closer.

Mudit Agarwal

Head of IT | Seasoned VP of Enterprise Business Technology | Outcome-Based Large-Scale Business Transformation (CRM, ERP, Data, Security) | KPI-Driven Technology Roadmap

5 months ago

Charles, Awesome!

Vasile Coman

Three decades of consulting (Apple, Intel, Boeing...) in enterprise software and management consulting. Developed the enterprise design field. Published papers on information theory. MS in Aeronautical Engineering.

5 months ago

Charles, I want to make three simple observations regarding your classification of "a few orders of magnitude in complexity." First, the complexity of a system has very little to do with the size of its model. The most important factor determining a system's complexity is its ability to process complex messages ("instructions"). Take two messages: "shut off the light in the room" and "take this hundred-dollar bill and buy/sell shares to double its value in one year." The systems capable of successfully processing them are orders of magnitude apart in ability. The second factor in a system's complexity is the human-factor contribution. By this principle, the most complex system ever built by humans is one that processes a message that is always half a bit in size. The system's only task is to process the message "switch to ON." This system lives in a secret bunker somewhere in Montana; when the message is processed, it launches a rocket with extremely unpleasant consequences for everyone. The model behind this system is relatively simple, but the logic and the human factor are what make this message an extreme case in complexity.


As systems engineering and development evolve with the integration of AI, it seems like an opportunity to move security development earlier in the process. What would be needed to make that a reality?

Woodley B. Preucil, CFA

Senior Managing Director

6 months ago

Charles Clancy Very well-written & thought-provoking.

Mark W.

For products and systems, security is a matter of engineering, not compliance

6 months ago

Such an awful picture to depict systems engineering - a misinterpretation of the V. I could easily come up with at least a dozen better ones. FWIW, some MITRE staff have written a paper for an upcoming conference where we speak to the misinterpretation of the V, such as that diagram depicts, and how to recast the original intent for digital-engineering-equipped SE.
