Part 2: SPEAR AI (Safe, Productive, Efficient, Accurate & Responsible)
Mark Montgomery
Founder & CEO of KYield. Pioneer in Artificial Intelligence, Data Physics and Knowledge Engineering.
I decided to break this paper into 3 parts for this Enterprise AI newsletter to make it more consumable (see part one here).
Working Paper Copyright © 2023/2024 by Mark Montgomery
Working papers are in draft form. This working paper is distributed for purposes of comment and discussion. It may not be reproduced without permission of the copyright holder.
5. Socioeconomic Costs
The same causal factors that result in safety failures in LLMs also cause widespread bias, hallucinations, and inaccuracies, and enable deepfakes and widespread misinformation. Each of these weaknesses either unintentionally causes socioeconomic costs or can be exploited to attack democratic institutions and erode the socioeconomic fabric of society [27], [28], [29], [30], [31].
For example, in a recently revealed query about the Covid-19 pandemic, ChatGPT returned a response citing a New York Times article that was completely fabricated, including its title and links [32]. Such extreme misinformation could have profound negative impacts, whether delivered through the chatbots directly or repeated on social networks, where it occurs at massive scale on a continuous basis.
This type of misinformation is unique to a class of self-generating algorithms like LLMs that were, until November 2022, limited to controlled research labs. The same technical vulnerabilities can lead to misinformation about politics, race, religion, companies, government, non-profits, and individuals, opening the door to very broad socioeconomic damage, some of which is already occurring [33].
A group of DeepMind researchers published a good review of social risks from LLMs just under a year before ChatGPT was launched (Weidinger, Laura, et al.). All six of the risk categories included in their paper have since been realized in LLM chatbots.
6. Impact on the Knowledge Economy
Conspicuously absent from AI research is investigation of the potential damage to the knowledge economy from LLMs. The paucity of research presumably stems in part from the specialty disciplines of the researchers and their lack of depth in economics, but it also reflects the dominant influence of a few big-tech companies, so we must look to economics and business consulting to better understand these impacts [34], [35].
Fritz Machlup was responsible for initially categorizing and measuring the knowledge economy (KE) in his 1962 report, The Production and Distribution of Knowledge in the United States, in which he estimated that the knowledge economy represented 29 percent of U.S. GNP in 1958. The U.S. GNP is expected to be a bit over $23 trillion for 2023, so if the KE were still 29 percent of U.S. GNP, it would be roughly $6.7 trillion. However, if the criteria Machlup used in 1962 were applied to today’s economy, the KE would be a much larger portion of the U.S. economy, clearly representing a majority—perhaps even a supermajority. So any serious threat to the KE, like that posed by unfettered LLM chatbots, is a serious threat to the U.S. and its economy.
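As a rough check on that figure, a back-of-envelope calculation using the working numbers above (a sketch, not an official statistic):

```python
# Back-of-envelope update of Machlup's 1958 estimate that the knowledge
# economy (KE) was 29% of U.S. GNP, applied to the working figure of
# "a bit over $23 trillion" for 2023 GNP used in the text.
gnp_2023 = 23.0       # trillions of dollars (working figure from the text)
machlup_share = 0.29  # Machlup's 1958 estimate of the KE share of GNP
print(f"KE at the 1958 share: ~${gnp_2023 * machlup_share:.1f} trillion")  # ~$6.7T
```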
Peter Drucker coined the term knowledge worker in his book, The Landmarks of Tomorrow, published in 1959, where he also introduced the concept of post-modernism, referring to the era as a transitional period between the exponential increase in labor productivity and a more productive and rewarding economy consisting mainly of knowledge workers. Forty years later, Drucker published an article on knowledge worker productivity subtitled “The Biggest Challenge”, which provides a historic review of the study of labor leading to the post-modern era [36], with a specific focus on the contributions of Frederick Winslow Taylor [37]. Taylor had a profound impact on how people worked during the first half of the 20th century, during which time manufacturing achieved a fifty-fold increase in productivity through automation.
It is in this work on knowledge worker productivity, near the end of Drucker’s life, that the dilemma facing employers and nations becomes obvious, in the form of contradictions, conflicts, and misalignment of interests at the confluence of the knowledge economy and AI, where we have been working for over three decades [6].
“Knowledge workers are rapidly becoming the largest single group in the work force of every developed country. They may already compose two-fifths of the U.S. work force—and a still smaller but rapidly growing proportion of the work force of all other developed countries. It is on their productivity, above all, that the future prosperity—and indeed the future survival—of the developed economies will increasingly depend.” – Peter Drucker (1999).
Drucker estimated that our collective understanding of knowledge worker productivity in the year 2000 was similar to our understanding of manual workers in the year 1900. He suggests that knowledge workers must have autonomy and learn continuously, that quality of output is at least as important as quantity, that they should be treated as an asset rather than a cost, and that employers must protect that asset. Further, he teaches that innovation must be the responsibility of knowledge workers and that they must manage themselves.
Although he was not referring to AI specifically, Drucker understood the challenges of adoption well before the technical ability to increase knowledge worker productivity was achieved: “Each of these requirements…is almost the exact opposite of what is needed to increase the productivity of the manual worker.”
A quarter of a century later, employers are faced with generative AI that is forecast to increase knowledge worker productivity by $6-8 trillion annually, but also to displace tens of millions of knowledge workers [38], possibly including the consultants making today’s forecasts and their clients. It is perhaps unsurprising, then, that investment in and adoption of generative AI has fallen well short of expectations [39].
7. Efforts to Improve LLMs
The decade preceding the commercialization of LLM chatbots saw exponential growth in AI research, as evidenced by the number of attendees at conferences, papers published, graduate degrees, recruitment ads, and startups claiming to be AI companies. At the NeurIPS conference, for example, the number of papers accepted expanded from 411 in 2014 to 3,584 in 2023 [7]. One encouraging sign is that nearly twice as many papers were accepted in 2023 with the keyword “efficiency” as with “generative” [7]. Focus areas for improving efficiency are many, including data compression techniques, hardware innovations, fine-tuning of parameters, data filtering and learning, improved model design, and new types of pre-training.
7.1 Pre-training
One approach, disclosed by a group of researchers (Lewis, Patrick, et al.) in a paper at the NeurIPS 2020 conference, has since become widely deployed [40]. Retrieval-Augmented Generation (RAG) combines pre-trained parametric memory (a seq2seq model) with non-parametric memory (a dense index of an external knowledge source). RAG improved accuracy over the state of the art while reducing hallucinations, and it is particularly beneficial when used in conjunction with an accurate, knowledge-intensive data source that changes frequently, such as the Wikipedia index tested by the researchers, a news service, or sensor monitoring.
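To make the pattern concrete, here is a minimal sketch of the retrieve-then-generate loop; the word-overlap retriever and string-formatting generator are toy stand-ins for the dense retriever and seq2seq generator used in the paper:

```python
# Minimal sketch of the retrieve-then-generate pattern behind RAG.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Toy non-parametric memory: rank documents by word overlap with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def generate(query: str, passages: list[str]) -> str:
    """Placeholder for the parametric memory: a real system conditions a
    seq2seq model on the retrieved passages instead of formatting a string."""
    return f"Answer to {query!r}, grounded in: {' | '.join(passages)}"

corpus = [
    "RAG combines parametric and non-parametric memory components.",
    "Wikipedia is a frequently updated, knowledge-intensive source.",
    "An unrelated passage about cooking pasta.",
]
query = "What memory components does RAG combine?"
print(generate(query, retrieve(query, corpus)))
```

Because the external index can be refreshed without retraining the model, the generator stays grounded in current knowledge, which is why RAG suits frequently changing sources.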
7.2 Fine-Tuning
Fine-tuning of parameters is a common technique to improve the efficiency of LLMs on specific data sets. LLM chatbots have general knowledge but are terribly inefficient and inaccurate. Fine-tuning for a specific domain or corpus has proven to improve results, but is compute- and energy-expensive, hence the focus on parameter-efficient methods. Fine-tuning for efficiency can reduce the entire workload, including computing, storage, and costly engineering time.
One example is Low-Rank Adaptation (LoRA), a fine-tuning method that reduces memory by freezing the pre-trained weights and training only a small set of added low-rank parameters. A team at Microsoft Research found they could reduce the number of trainable parameters by 10,000x and GPU memory requirements by 3x, with no additional inference latency [41].
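A minimal sketch of the idea, assuming PyTorch, with an ordinary linear layer standing in for a pretrained weight matrix (illustrative, not the reference implementation):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Sketch of a LoRA-adapted linear layer: the pretrained weight is frozen
    and only the low-rank factors A and B are trained (rank r << d)."""
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)  # stands in for a pretrained layer
        for p in self.base.parameters():
            p.requires_grad_(False)         # freeze the base model
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # trainable down-projection
        self.B = nn.Parameter(torch.zeros(d_out, r))        # zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the low-rank update (x A^T) B^T, scaled by alpha/r.
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(1024, 1024)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} of {total:,} parameters")  # 16,384 of ~1.07M
```

Because the update B·A can be merged into the base weight after training, the adapted layer adds no inference latency, which is the property the LoRA paper highlights.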
Another example is FLAN (Finetuned Language Net), proposed by a group of Google researchers. FLAN is an instruction-tuned version of a decoder-only language model: existing labeled tasks are rephrased as natural-language instructions, improving so-called zero-shot performance on unseen tasks without requiring newly annotated data. FLAN was tested and found to outperform zero-shot GPT-3 on most datasets, and even few-shot GPT-3 on a few [42].
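The data construction behind instruction tuning is straightforward to sketch: labeled examples are wrapped in instruction templates before fine-tuning. The templates below are illustrative stand-ins, not the exact ones from the paper:

```python
# Sketch of instruction-tuning data construction in the style of FLAN:
# existing labeled tasks are rewritten as natural-language instructions,
# so a tuned model can follow unseen instructions zero-shot.

TEMPLATES = {
    "sentiment": "Is the sentiment of the following review positive or negative?\n\n{text}",
    "nli": ("Premise: {premise}\nHypothesis: {hypothesis}\n"
            "Does the premise entail the hypothesis? yes, no, or maybe?"),
}

def to_instruction(task: str, example: dict) -> dict:
    """Convert one labeled example into an (instruction, target) pair."""
    return {"input": TEMPLATES[task].format(**example), "target": example["label"]}

pair = to_instruction("sentiment",
                      {"text": "The battery life is superb.", "label": "positive"})
print(pair["input"])
print("->", pair["target"])
```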
The downside to fine-tuning is that while it can improve efficiency, lower compute costs, and reduce the need for manually annotated data, instruction descriptions may be less precise than annotations in some cases.
7.3 Simplifying Transformers
Complexity is one of many perverse incentives in LLMs: the larger the models, and the more functions introduced to improve them, the more complexity is created. Although complexity can be positive for top-tier deep learning talent earning high six-figure compensation packages, for chip suppliers, and for pay-for-what-you-use cloud services, it is not necessarily beneficial for customers or society, particularly when less effective than other methods and/or forced on customers and markets. One focus area, by Bobby He and Thomas Hofmann at ETH Zurich, is a method to reduce complexity in transformer blocks [43]. In experiments on both autoregressive decoder-only and BERT encoder-only models, the pair were able to accelerate training throughput by 15% while using 15% fewer parameters. Although incremental in nature, given the high energy and financial costs of training, this translates to significant savings.
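For orientation, the sketch below shows a standard pre-LayerNorm transformer block in PyTorch, with comments marking the components the simplified designs remove or fix to identity; it is a baseline sketch, not a reproduction of the authors’ final block:

```python
import torch
import torch.nn as nn

class PreLNBlock(nn.Module):
    """Standard pre-LayerNorm transformer block, annotated with the pieces
    'Simplifying Transformer Blocks' shows can be removed or fixed to identity."""
    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)  # normalization the simplified variants can drop
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # ^ includes value and output projections, which the paper shows can be
        #   fixed to identity without losing training speed
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.ln1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]  # attention skip connection (removable)
        x = x + self.mlp(self.ln2(x))                      # MLP skip connection
        return x

block = PreLNBlock()
out = block(torch.randn(1, 10, 256))  # (batch, sequence, d_model)
print(out.shape, "-", sum(p.numel() for p in block.parameters()), "parameters")
```

Removing the identity-fixable projections is where the roughly 15% parameter reduction comes from in their reported experiments.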
7.4 Small Language Models
The scale needed for generalization in language models at this stage of technology evolution is what causes most of the risks and inaccuracies. Small language models (SLMs) combined with high-quality data and precision data management offer a good alternative for many applications. SLMs can perform many of the same functions as LLMs, including writing letters and reports, and can do so with higher levels of accuracy and security at 10% or less of the financial, environmental, social, and economic costs of LLMs [9]. Two significant SLMs released in 2023 are LLaMA by Meta AI and Phi by Microsoft Research.
LLaMA reduces memory usage and runtime with some methods and further improves training efficiency with others, applying a variety of research techniques in classically incremental innovation aimed at efficiency in an open-source model [44]. The result was that LLaMA-13B outperformed GPT-3 while being more than 10x smaller. LLaMA is available in several sizes (7B, 13B, 33B, and 65B parameters). However, Meta AI trained on publicly available datasets that may include copyrighted content, and some of those datasets are of poor quality, so although much more efficient and less costly than larger LMs, LLaMA remains problematic for applied AI. Nevertheless, Meta AI performed a valuable service by demonstrating what is possible when resources are focused on efficiency rather than only scale.
Phi-1 is a transformer-based model with only 1.3B parameters, trained on a combination of high-quality web data and high-quality synthetic data [45], yet it was able to surpass most open-source models despite being 10x smaller and trained on a dataset 100x smaller. Although the Phi research team used GPT-4 to generate code and synthetic data, high-quality data pre-exists in many organizations and can be produced by other methods, including data science tools and human expert curation. The primary contribution of Phi was to build on the previous work on “TinyStories” (Eldan & Li) [46] and demonstrate the benefit of combining high-quality data and refined LMs for organizations with smaller budgets, while dramatically reducing the computing, power, and water consumption required for generative AI functions.
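A back-of-envelope comparison of weight memory illustrates why these size differences matter in deployment; the calculation below assumes fp16 weights (2 bytes per parameter) and ignores serving overhead such as activations and KV cache:

```python
# Rough weight-memory comparison at fp16 (2 bytes per parameter).
# Parameter counts are from the cited papers; real deployments need
# additional headroom for activations and the KV cache.
BYTES_PER_PARAM = 2  # fp16

for name, params in {"Phi-1": 1.3e9, "LLaMA-13B": 13e9, "GPT-3": 175e9}.items():
    gb = params * BYTES_PER_PARAM / 1e9
    print(f"{name:>9}: ~{gb:,.1f} GB of weights")
# Phi-1 (~2.6 GB) fits on one consumer GPU; GPT-3-scale weights (~350 GB)
# require a multi-GPU cluster, with corresponding power and cost.
```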
7.5 Hardware Improvements
The arms race in AI is perhaps nowhere as aggressive as in hardware, complete with trade restrictions and large-scale industrial subsidies for semiconductors. Although hardware innovation moves more slowly than model innovation, once improvements mature in a product line the efficiencies become widespread.
Google, Microsoft, Meta, Amazon Web Services (AWS), Apple, and Tesla have custom chip programs underway, presumably to reduce costs, ensure supply, and reduce reliance on industry leader Nvidia, while also providing a proprietary advantage tailored to their tech stacks. AI chips are customized to the needs of each company and its products. All of the leading semiconductor companies are investing heavily in AI, and several startups have gained traction.
Nvidia is the clear market leader in AI chips, with a suite of products, and enjoys a strong position in cloud services with its GPUs. Nvidia took an early lead by seizing the opportunity offered when Stanford researchers discovered that its gaming GPUs were far more efficient for deep learning [47]. Its CUDA compute platform has become the preferred choice for a very large number of engineers and developers.
However, in response to the popularity and high cost of Nvidia’s chips, and to inventory shortages, competition is rapidly emerging, which bodes well for improving efficiency and impact. AMD launched its MI300X chip as a direct competitor to Nvidia’s H100 and has early commitments from several large customers. Intel has responded with Gaudi 2, a competitor to the H100 that outperforms it on some tasks [10] and is less expensive. Startup AI chip companies that have gained significant traction include SambaNova Systems, Cerebras Systems, and Groq. Many others have interesting research and IP.
One of the most important trends in hardware is so-called edge computing, on devices that are much more energy efficient and secure than LLMs hosted on cloud infrastructure to date. Qualcomm, for example, offers the Neural Processing SDK for developers to run neural network models on Snapdragon mobile platforms. Apple’s A12 Bionic was the first 7nm smartphone chip, containing 6.9bn transistors [48]. Apple has also been active in researching SLMs: its UNO-EELBERT model is just 1.2 MB in size, yet achieves a General Language Understanding Evaluation (GLUE) benchmark score within 4% of a model almost 15x its size [49].
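One common route to shrinking models for on-device inference is post-training quantization. The sketch below uses PyTorch’s dynamic quantization as a generic illustration; vendor toolchains such as Qualcomm’s SDK or Apple’s Core ML apply their own, more aggressive optimizations:

```python
import io
import torch
import torch.nn as nn

# Toy network standing in for an on-device model.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 128))

# Post-training dynamic quantization: weights are stored as int8 and
# activations are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

def serialized_mb(m: nn.Module) -> float:
    """Approximate on-disk size by serializing the state dict to a buffer."""
    buf = io.BytesIO()
    torch.save(m.state_dict(), buf)
    return buf.getbuffer().nbytes / 1e6

print(f"fp32: ~{serialized_mb(model):.2f} MB, int8: ~{serialized_mb(quantized):.2f} MB")
# int8 weights are roughly 4x smaller than fp32, which is the kind of
# reduction that makes edge deployment practical.
```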
7.6 Data Governance
Data Governance: “Organizations and their personnel defining, applying and monitoring the patterns of rules and authorities for directing the proper functioning of, and ensuring the accountability for, the entire life-cycle of data and algorithms within and across organizations.” (Janssen, Marijn, et al.) [51]
Beyond scale combined with the nature of deep learning, the underlying commonality causing the majority of risks and much of the cost in AI today is the lack of robust data governance, data provenance, and data security. This is due in part to the ultra-emphasis on superintelligence in AI research, driven by financial incentives but also by the personal motivations of scientists and engineers. Google researchers published a paper in 2021 (Sambasivan, Nithya, et al.) confirming what was already widely understood: that “everyone wants to do the model work, not the data work” [50]. Incentives, motivation, and elite cultures notwithstanding, LLMs have not fundamentally altered information theory or the laws of physics. The quality of the output still substantially depends on the quality of the input.
Consumer LLM chatbot firms offer open access to the public without the protocols, methods, and procedures developed over decades for data governance, provenance, security, and digital rights management. The majority of data sets LLMs train on lack the structure necessary to mitigate significant risks and costs to customers and society, including catastrophic risks. Poorly designed data management systems make robust data governance difficult, which “can have profound legal, financial and social implications on the organizations involved, citizens and businesses, and society at large” [51].
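As a concrete illustration of what such governance implies at the record level, here is a minimal sketch of provenance metadata attached to a training dataset; the field names are illustrative, not a standard schema:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ProvenanceRecord:
    """Minimal provenance metadata for a training dataset (illustrative
    fields, not a standard schema)."""
    dataset_id: str
    source: str               # where the data came from
    license: str              # usage rights / digital rights status
    collected_at: datetime    # when it entered the data life-cycle
    steward: str              # accountable party, per the governance definition
    transformations: tuple[str, ...] = ()  # audit trail of processing steps

record = ProvenanceRecord(
    dataset_id="support-tickets-2024-q1",
    source="internal CRM export",
    license="proprietary, internal use only",
    collected_at=datetime(2024, 1, 15, tzinfo=timezone.utc),
    steward="data-governance@example.com",
    transformations=("pii-redaction-v2", "near-duplicate-removal"),
)
print(record.dataset_id, "->", record.transformations)
```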
Footnotes
5. Paul Churnock, an engineer at Microsoft, recently estimated that Nvidia's popular H100 chips alone require as much electricity as Phoenix, Arizona, the fifth-largest city in the U.S. The retail price of an H100 as of January 2024 is reported to be $100,000.
6. I operated a consulting firm that converted to a knowledge systems lab in the mid-1990s. We created several ventures, including GWIN (Global Web Interactive Network), which was an experimental learning network for thought leaders. It was in 1997, during the operation of GWIN, that I conceived the KYield theorem (yield management of knowledge), which is manifest in the KOS.
7. An article by Jacob Marks provides a summary of analytics for NeurIPS 2023.
8. Given the volume, no attempt is made to provide a comprehensive review. Rather, a few brief examples are provided to illustrate the type of research recently published on improving LLMs.
9. Our Synthetic Genius Machine (SGM) is a new type of SLM with a hybrid neurosymbolic architecture designed to provide knowledge compression and security for efficient domain-specific acceleration of discovery. Since acceleration of discovery enables bad actors as well as good, we made the decision to restrict disclosures in our research. Although we lack the market power of big tech to incentivize sharing selective research and the resources to commercialize the SGM, it represents our second generation of R&D. Our intent is to introduce elements of the SGM to the KOS once it can be rigorously tested in a safe and secure manner.
10. Hugging Face tested Gaudi2 and the Nvidia A100, finding Gaudi2 latencies were 2.84x faster than the Nvidia A100 (0.925s versus 2.63s). Databricks also performed tests on Gaudi2, finding it had better training and inference performance-per-dollar than Nvidia chips, including the H100. Nvidia is expected to ship its H200 in Q2 2024.
References
[27] Schramowski, Patrick, et al. "Large pre-trained language models contain human-like biases of what is right and wrong to do." Nature Machine Intelligence 4.3 (2022): 258-268. https://arxiv.org/pdf/2103.11790.pdf
[28] Rawte, Vipula, Amit Sheth, and Amitava Das. "A survey of hallucination in large foundation models." arXiv preprint arXiv:2309.05922 (2023). https://arxiv.org/pdf/2309.05922.pdf
[29] Nguyen, Thanh Thi, et al. "Deep learning for deepfakes creation and detection: A survey." Computer Vision and Image Understanding 223 (2022): 103525. https://arxiv.org/pdf/1909.11573.pdf
[30] Melissa Heikkilä. "Three ways AI chatbots are a security disaster." MIT Technology Review. (2023). https://www.technologyreview.com/2023/04/03/1070893/three-ways-ai-chatbots-are-a-security-disaster/
[31] Bender, Emily M., et al. "On the dangers of stochastic parrots: Can language models be too big?" Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. (2021). https://dl.acm.org/doi/pdf/10.1145/3442188.3445922
[32] New York Times Company v. Microsoft, OpenAI, Inc., et al. (2023). https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec2023.pdf
[33] Blodgett, Su Lin, et al. "Stereotyping Norwegian salmon: An inventory of pitfalls in fairness benchmark datasets." Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). (2021). https://aclanthology.org/2021.acl-long.81.pdf
[34] John Burn-Murdoch. "Here's what we know about generative AI's impact on white collar work." The Financial Times. (2023). https://www.ft.com/content/b2928076-5c52-43e9-8872-08fda2aa2fcf
[35] Dell'Acqua, Fabrizio, et al. "Navigating the jagged technological frontier: Field experimental evidence of the effects of AI on knowledge worker productivity and quality." Harvard Business School Technology & Operations Mgt. Unit Working Paper 24-013 (2023). https://www.hbs.edu/ris/Publication%20Files/24-013_d9b45b68-9e74-42d6-a1c6-c72fb70c7282.pdf
[36] Drucker, Peter F. "Knowledge-worker productivity: The biggest challenge." California Management Review 41.2 (1999): 79-94. https://www.iriscrm.com/app/uploads/2021/05/knowledge_workers_the_biggest_challenge.pdf
[37] Taylor, Frederick Winslow. The Principles of Scientific Management. Harper & Brothers. (1919)
[38] Chui, Michael, et al. "The economic potential of generative AI." McKinsey & Company. (2023). https://www.mckinsey.com/capabilities/mckinsey-digital/our-insights/the-economic-potential-of-generative-ai-the-next-productivity-frontier
[39] "What happened to the artificial-intelligence investment boom?" The Economist. (2024). https://www.economist.com/finance-and-economics/2024/01/07/what-happened-to-the-artificial-intelligence-investment-boom
[40] Lewis, Patrick, et al. "Retrieval-augmented generation for knowledge-intensive NLP tasks." Advances in Neural Information Processing Systems 33 (2020): 9459-9474. https://proceedings.neurips.cc/paper/2020/file/6b493230205f780e1bc26945df7481e5-Paper.pdf
[41] Hu, Edward J., et al. "LoRA: Low-rank adaptation of large language models." arXiv preprint arXiv:2106.09685 (2021). https://arxiv.org/pdf/2106.09685.pdf
[42] Wei, Jason, et al. "Finetuned language models are zero-shot learners." arXiv preprint arXiv:2109.01652 (2021). https://arxiv.org/pdf/2109.01652.pdf
[43] He, Bobby, and Thomas Hofmann. "Simplifying transformer blocks." arXiv preprint arXiv:2311.01906 (2023). https://arxiv.org/pdf/2311.01906.pdf
[44] Touvron, Hugo, et al. "LLaMA: Open and efficient foundation language models." arXiv preprint arXiv:2302.13971 (2023). https://arxiv.org/pdf/2302.13971.pdf
[45] Gunasekar, Suriya, et al. "Textbooks are all you need." arXiv preprint arXiv:2306.11644 (2023). https://arxiv.org/pdf/2306.11644.pdf
[46] Eldan, Ronen, and Yuanzhi Li. "TinyStories: How small can language models be and still speak coherent English?" arXiv preprint arXiv:2305.07759 (2023). https://arxiv.org/pdf/2305.07759.pdf
[47] Raina, Rajat, Anand Madhavan, and Andrew Y. Ng. "Large-scale deep unsupervised learning using graphics processors." Proceedings of the 26th Annual International Conference on Machine Learning. (2009). https://robotics.stanford.edu/~ang/papers/icml09-LargeScaleUnsupervisedDeepLearningGPU.pdf
[48] Tanya Singh. "Top AI chip-making companies for smartphones." Mobile App Daily. (2024). https://www.mobileappdaily.com/knowledge-hub/ai-chip-making-companies-for-smartphones
[49] Cohn, Gabrielle, et al. "EELBERT: Tiny models through dynamic embeddings." arXiv preprint arXiv:2310.20144 (2023). https://arxiv.org/pdf/2310.20144.pdf
[50] Sambasivan, Nithya, et al. "'Everyone wants to do the model work, not the data work': Data cascades in high-stakes AI." Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems. (2021). https://research.google/pubs/everyone-wants-to-do-the-model-work-not-the-data-work-data-cascades-in-high-stakes-ai/
[51] Janssen, Marijn, et al. "Data governance: Organizing data for trustworthy Artificial Intelligence." Government Information Quarterly 37.3 (2020): 101493. https://repositorium.sdum.uminho.pt/bitstream/1822/69192/1/JBEBJ20.pdf