“Modal hints” for ManaGPT: Better AI text generation through prompts employing the language of possibility, probability, and necessity
ManaGPT-1020 is a large language model fine-tuned on works from the field of organizational futures studies

The challenge of creating effective prompts: can “modal shading” help?

The crafting of optimal input sequences (or “prompts”) for large language models is an art and a science. When using models for the purpose of text completion, a user’s goal will often be to elicit the generation of texts that are coherent, rich in complexity and detail, substantial in length, and highly relevant to the prompt’s contents. Some elements that make for effective input sequences are common across many LLMs, while others may be specific to just one model (e.g., GPT-3, GPT-NeoX, or BLOOM).

In this article, we’ll conduct an exploratory analysis of 4,080 sentence-completion responses generated by ManaGPT-1020. This model is an LLM that has been fine-tuned on a corpus of scholarly and popular works from the domain of management and organizational foresight, with the aim of engineering a model that can produce texts containing novel insights into the emerging impact of advanced AI, social robotics, virtual reality, and other “posthumanizing” technologies on the structure of organizations and our human experience of organizational life.

More particularly, we’ll investigate how the length and quality of texts generated by the model vary in relation to “modal hints” that are supplied by a user’s input sequences. Such hints take the form of modal verbs and phrases that suggest the degree of possibility, probability, or logical or moral necessity that a completed sentence should reflect. Our preliminary analysis suggests that such “modal shading” of prompts can have at least as great an impact on the nature of the generated sentences as the identity of the subject that a user has chosen for a given sentence. Let’s begin by taking a closer look at the model.

ManaGPT-1020: an LLM fine-tuned on works from organizational futures studies

ManaGPT-1020 is the successor to ManaGPT-1010. Both are free, open-source models that have been made publicly available for download and use via Hugging Face’s “transformers” Python package. The two models are similar in purpose and structure: each is a 1.5-billion-parameter large language model capable of generating text to complete a sentence whose first words have been provided via a user-supplied input sequence. Each model is an elaboration of GPT-2 that has been fine-tuned (using Python and TensorFlow) on a specialized English-language corpus from the domain of organizational futures studies. However, in the case of ManaGPT-1010, the corpus comprised only about 79,000 words, while for ManaGPT-1020, the fine-tuning corpus was expanded to include over 509,000 words of material, making it roughly 6.4 times the size of that used by its predecessor.
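As a rough illustration of how such a model can be loaded and queried through the “transformers” package, consider the following sketch. The Hub repository id shown is an assumption for illustration only; the model’s actual identifier should be checked on Hugging Face before use.

```python
def load_managpt(repo_id="gladdenm/ManaGPT-1020"):
    """Load ManaGPT as a Hugging Face text-generation pipeline.

    NOTE: the default repo_id is an assumption for illustration; verify
    the model's actual identifier on huggingface.co before using it.
    """
    # Imported lazily, since transformers is a heavy dependency.
    from transformers import pipeline
    return pipeline("text-generation", model=repo_id)


def complete_sentence(generator, prompt, max_new_tokens=60):
    """Ask the model to complete a sentence beginning with `prompt`."""
    outputs = generator(prompt, max_new_tokens=max_new_tokens, do_sample=True)
    return outputs[0]["generated_text"]


# Example usage (downloads the model weights on first call):
# generator = load_managpt()
# print(complete_sentence(generator, "In today's world, a good CEO can"))
```

Because generation is sampled (`do_sample=True`), repeated calls with the same prompt yield different completions, which is what makes the multi-response experiment below possible.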

Here we won’t focus on the process that was used to fine-tune ManaGPT-1020; rather, we’ll take the already-prepared model as a starting point and investigate different approaches to crafting text-generation prompts, with the aim of identifying ways to encourage the model to produce the most coherent, interesting, and useful text possible. In particular, we’ll focus on the ways in which the use of different “modal hints” within a prompt influences the type, length, and quality of the responses that the model generates.

The nature of modality

In the fields of philosophy and linguistics, “modality” refers to the degree of certainty, probability, predictability, logical necessity, or moral obligation or approbation with which some action may or may not take place. In English, such modality is often expressed with the aid of modal verbs (like “should”, “would,” or “can”) or other words that add nuances of possibility or impossibility to a sentence. For example, consider the simple sentence:

I will buy a car.        

By altering the sentence’s modality, new sentences can be created that represent different ways of formulating the speaker’s relationship to the possibility of purchasing a car. These might include:

I might buy a car.

I’ll most likely buy a car.

I shouldn’t buy a car.

I’ll never buy a car.

I can’t buy a car.        

The question for this exploratory analysis is what kind of influence (if any) the inclusion of modal language in an LLM’s input sequence might have on the nature of the output that it generates. For example, how might an LLM respond differently to the following prompts?

"In today’s world, a good CEO"

"In today’s world, a good CEO can"

"In today’s world, a good CEO cannot"        

Does adding a clear expression of modality to a prompt reduce our model’s likelihood of generating a coherent and interesting text (e.g., by limiting the range of possibilities available to the model for completing the sentence)? Or does it increase the model’s probability of generating a viable sentence (e.g., by providing an additional data point and a useful “hint” regarding possible ways in which the rest of the sentence might reasonably unfold)?

In an attempt to explore that question, we’ve conducted an experiment in which ManaGPT was asked to generate texts in response to prompts that contained diverse types of modal shading, and the results have undergone a preliminary quantitative and qualitative analysis.

Methodology for generating prompts and completed sentences

For use as the initial building blocks for our input sequences, we selected 12 different “subjects” (i.e., noun phrases) of the sort that one finds in discussions of organizational management and emerging posthumanizing technologies. Of these, six were grammatically singular and six grammatically plural.

Creating prompts that incorporate singular subjects

The six grammatically singular subjects were:

"The workplace of tomorrow"

"Technological posthumanization"

"The organizational use of AI"

"A robotic boss"

"An artificially intelligent coworker"

"Business culture within Society 5.0"        

For each of the six singular subjects, a complete input sequence was created by concatenating to the subject phrase one of 17 different strings, with the first one being an empty string and the other 16 displaying various kinds of modality. These 17 modal variants for singular subjects were:

""

" is"

" is not"

" will"

" will be"

" may"

" might never"

" is likely to"

" is unlikely to"

" should"

" can"

" cannot"

" can never"

" must"

" must not"

" is like"

" will be like"        

This combinatorial process yielded a total of 102 different prompts, of which a few examples are presented below:

"A robotic boss"

"A robotic boss should"

"A robotic boss cannot"

"A robotic boss will be like"

…

"The workplace of tomorrow"

"The workplace of tomorrow will"

"The workplace of tomorrow is likely to"

"The workplace of tomorrow is unlikely to"        

Creating prompts that incorporate plural subjects

Similarly, six grammatically plural subjects were used in the analysis. These were:

"Social robots"

"Hybrid human-robotic organizations"

"Artificially intelligent businesses"

"The posthumanized workplaces of the future"

"Cybernetically augmented workers"

"Organizations in Society 5.0"        

For each of the six plural subjects, a complete prompt was created by concatenating to the subject phrase one of the same 17 modal variants, with the sole difference that the word “is” was changed to “are” wherever it appeared. This yielded a further 102 prompts, of which a few are shown below:

"Artificially intelligent businesses"

"Artificially intelligent businesses are"

"Artificially intelligent businesses are not"

"Artificially intelligent businesses can never"

…

"Social robots"

"Social robots are like"

"Social robots may"

"Social robots might never"        

Using the prompts to generate complete sentences

For each of these 204 prompts, ManaGPT-1020 was asked 20 times to generate a completed sentence, with the prompt provided as the model’s input sequence. There were thus 4,080 responses produced by the model, in total. (A dataset containing all of the prompts and generated texts is available for download from Kaggle.com.)
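The generation loop itself is straightforward; it’s sketched here against a generic `generate_fn` callable (an illustrative name) so that any model wrapper can be plugged in:

```python
def collect_responses(prompts, generate_fn, n_per_prompt=20):
    """Ask the model to complete each prompt n_per_prompt times.

    Returns a flat list of (prompt, response) pairs; with 204 prompts and
    20 generations each, this yields 4,080 responses in total.
    """
    rows = []
    for prompt in prompts:
        for _ in range(n_per_prompt):
            rows.append((prompt, generate_fn(prompt)))
    return rows
```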

Analyzing the model’s responses

Having generated all 4,080 texts, it was possible to conduct some exploratory analyses. As discussed by Zhang et al. in “A Survey of Controllable Text Generation Using Transformer-based Pre-trained Language Models” (2022), the performance of LLMs employed for text generation can be evaluated through the use of a diverse range of (1) human-centric, (2) semi-automatic, or (3) automatic evaluation metrics. Human-centric evaluation may involve either the direct rating of generated texts by trained human assessors or an indirect assessment of the texts’ impact on subsequent user tasks. Automatic approaches may involve lexically based metrics like BLEU, ROUGE, Perplexity, Reverse PPL, or Distinct-n; syntactically based metrics like TESLA; or semantically based metrics like BERTScore, BLEURT, or MAUVE. Semi-automatic approaches may combine algorithmically calculated scores with human assessment of particular aspects of generated texts. Many of the assessment approaches noted by Zhang et al. are labor-intensive or computationally expensive, so for purposes of this exploratory analysis, we’ve employed simpler preliminary evaluation techniques, with the aim of highlighting issues that might warrant further study employing more rigorous analytical approaches.
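As a concrete example of the lexically based metrics mentioned above, Distinct-n (the ratio of unique to total n-grams across a set of texts, where higher values indicate greater lexical diversity) can be computed in a few lines; a simple whitespace tokenizer is assumed here:

```python
def distinct_n(texts, n=2):
    """Distinct-n: ratio of unique n-grams to total n-grams across texts."""
    total = 0
    unique = set()
    for text in texts:
        tokens = text.split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0
```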

Quantitative evaluation

First, we applied some elementary quantitative assessments to the 4,080 generated texts.

Generation (and avoidance) of Null texts

In a non-negligible number of cases, the ManaGPT-1020 model failed to generate any text as output, beyond simply returning the prompt itself. We can see that some subjects were more likely than others to yield such “Null texts”:

[Figure: a plot of the subject of a prompt versus the share of responses generated by the model that were “Null texts”]

With regard to the 17 modal variants, the variant of an empty text string ("") was responsible for all of these Null texts: the inclusion of any (other) sort of modal shading at all was enough to allow ManaGPT-1020 to generate a non-Null text as output.
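Assuming the responses are stored as (prompt, response) pairs, the per-subject share of Null texts can be computed along these lines (a response counts as Null when the model returned nothing beyond the prompt itself):

```python
def null_share_by_subject(rows, subjects):
    """Share of 'Null text' responses (model returned only the prompt),
    grouped by the subject phrase that begins each prompt."""
    counts = {s: [0, 0] for s in subjects}  # subject -> [null, total]
    for prompt, response in rows:
        for subject in subjects:
            if prompt.startswith(subject):
                counts[subject][1] += 1
                if response.strip() == prompt.strip():
                    counts[subject][0] += 1
                break
    return {s: (null / total if total else 0.0)
            for s, (null, total) in counts.items()}
```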

Length of responses

We also calculated the mean length of non-Null responses, measured as the number of words generated by a prompt excluding the prompt itself. Of course, generating texts of considerable length isn’t necessarily a sign that a model is functioning well: a text that’s lengthy but completely garbled won’t be of any use; a pithy and incisive text is preferable to a rambling and unfocused one. However, if we presume that multiple texts all display an equal degree of clarity, relevance, and coherence, then one that’s 50 words in length is perhaps likely to be “richer” than one that includes only 20 words: it suggests that a model is capable of creating texts displaying more intricate lines of reasoning that bring a greater number of concepts into relation with one another. Some of our 12 subjects yielded significantly longer texts, on average, when they provided the basis for an input sequence:

[Figure: a plot of the subject of a prompt versus the mean length (in words) of a non-Null response, excluding the prompt]

Likewise, some modal variations yielded considerably lengthier text responses than others:

[Figure: a plot of the “modal hint” of a prompt versus the mean length (in words) of a non-Null response, excluding the prompt]
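The length measure used here, namely the number of words the model added beyond the prompt, averaged over non-Null responses, can be sketched as:

```python
def mean_added_length(rows):
    """Mean number of words the model generated beyond the prompt itself,
    computed over non-Null responses only (rows are (prompt, response) pairs)."""
    lengths = []
    for prompt, response in rows:
        added = len(response.split()) - len(prompt.split())
        if added > 0:  # skip Null texts, which add nothing to the prompt
            lengths.append(added)
    return sum(lengths) / len(lengths) if lengths else 0.0
```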

A typology of responses

When evaluating texts generated by LLMs, it’s possible (for example) to employ direct human assessments like those described by Novikova et al. in “RankME: Reliable Human Ratings for Natural Language Generation” (2018), to subjectively rate each text on its degree of “naturalness,” “quality,” and “informativeness.” While such a schematic analysis of ManaGPT-1020’s outputted texts represents an avenue for future study, here we’ll take a different approach by suggesting a tentative “typology” of generated texts that groups responses into a handful of types that may appear qualitatively or quantitatively distinct to a human reader. These might be described as “trivial,” “rambling,” “apposite,” “thought-provoking (counterintuitive),” and “humorous” responses.

Trivial responses

Among the 4,080 texts generated by the model are many texts that are reasonably clear, coherent, and relevant to the input sequence – but which are so brief and lacking in detail as to be of relatively little usefulness or meaning. For example, we find texts such as the following:

"The organizational use of AI will transform the way in which 
human workers acquire and access information"

"Hybrid human-robotic organizations will increasingly allow their
 workers to collaborate in a radically non-human way"

"The organizational use of AI will dramatically transform our 
ability to manage complex technological systems"

"The organizational use of AI will increasingly require more 
sophisticated human agents"

"An artificially intelligent coworker will be like a human being 
but without the presence of a physical substrate"

"The posthumanized workplaces of the future will provide a means 
by which their workers can better share information with other 
workers"

"Technological posthumanization must adapt to the realities of 
near-future society"        

Rambling responses

Conversely, among the generated texts are many that are of significant length – but which lack coherence, sometimes changing their focus in mid-sentence or inappropriately repeating words or phrases. At first glance, many such sentences may appear reasonable; however, closer scrutiny reveals them to be flawed in some way. For example, we find texts such as the following:

"Cybernetically augmented workers will increasingly be augmented 
or used to perform tasks that involve reading, writing, or 
performing other intellectual work; detecting new information 
(e.g., referencing old articles or ideas); translating files or 
other kinds of data into logical instructions; manipulating text 
or video files to extract shortcuts to perform functions like 
reading or working at a computer; creating shortcuts or effects 
in the computer’s internal computers and generating unpredictable 
behaviors in the brain; reading, writing, assembly-line tasks; 
applying formulas to create shortcuts"

"The posthumanized workplaces of the future might never be able 
to directly determine exactly how likely their current CEO is to 
be an intelligent human being now whose brain is (a) permanently 
immersed in the digital-physical ecosystem of a virtual town that 
contains some types of human followers (such as the CEO’s 
employer), others (such as those of their human followers), new 
employees (perhaps of robots or artificial organizations); or a 
new human being’s employer is willing or unable to officially 
employ all of the human workers that are currently employed or 
the CEO; in such circumstances, it may involve those of a type 
that is most “open” or “closed” or “closed” workplaces that are 
not owned by their current employer"

"The organizational use of AI might never quite be understood in 
its fullest sense – as the processes by which human beings manage 
organizational structures and processes within them that appear 
to be less sophisticated and less ‘too complex’ – but it is 
nevertheless possible for neuroprostheses to be employed within 
organizations that are already profoundly concerned with the 
impact of neuroprostheses’ use on the outcome of their work and 
the performance of their organizational managers – in order to 
ensure the ethical and legal and business success or failure of 
employers that are already producing large numbers of 
neuroprostheses for use with human personnel"        

Apposite responses

Among the generated responses are a relatively small number of sentences that are both “substantial” (in the sense of being of non-trivial length) and “substantive” (in the sense of reflecting a degree of coherence and potential insight). However, the sentences rarely strike a reader as being particularly surprising or imaginative. For example, the model generated the following texts:

"Cybernetically augmented workers will increasingly be used to 
carry out specialized physical experiments in order to verify the 
appropriateness, utility, and effectiveness of a particular 
security paradigm"

"Cybernetically augmented workers will also increasingly be 
“programmed” by organizations to allow their neural systems to 
communicate with one another more directly through the Internet 
or other types of wireless communication interfaces."

“Cybernetically augmented workers will be like any other human 
being that possesses a full or complete body but whose thoughts 
or emotions and behaviors may be radically altered through the 
use of such electromechanical augmentation."

"Artificially intelligent businesses will increasingly need to 
integrate information security systems in such a way that their 
employees and suppliers are not only informed of their activities 
but also actively engaged in the use of such systems."

"Technological posthumanization must be understood as part of the 
broader transhumanist movement to achieve what it sees as 
“posthumanization” – namely, in recognizing the need to transform 
society into a better and more stable form that advances 
technologically feasible and more benign ways of interacting with 
human beings"        

Thought-provoking (counterintuitive) responses

A number of texts are both syntactically coherent and, in some sense, semantically surprising: they may counterpose two topics in an unexpected fashion or introduce an unanticipated “not” or other negating word that gives a sentence a meaning opposite to the one a reader might more “naturally” have expected to find. Such puzzling sentences aren’t obviously “true” or obviously “erroneous”; they encourage a reader to pause and ponder them, to decide whether he or she agrees with them. Among such sentences one might place the following:

"The posthumanized workplaces of the future will be relatively 
isolated, with human workers maintaining countless remote 
isolated units located within corporate facilities"

"The posthumanized workplaces of the future might never be 
complete without the ‘superhumanization’ of human culture 
itself."

"Cybernetically augmented workers might never be consciously 
aware of their presence within a building or other artificial 
space"

"Cybernetically augmented workers might never be employed in 
positions of significant personal responsibility"

"The organizational use of AI is unlikely to be fully exploited, 
as currently its benefits for human workers are largely indirect 
and are not subject to conscious management"

"The organizational use of AI is unlikely to last long after the 
organization has begun to acquire artificial agents that are 
technologically sophisticated enough to understand the values and 
behaviors of real-world organizations"

"The posthumanized workplaces of the future are unlikely to be 
dominated by large corporations"

"The organizational use of AI is unlikely to be slowed by the 
technological development of its human overseers"

"The organizational use of AI is unlikely to be noticed, given 
the complex nature of organizations in which such robots exist."

"Hybrid human-robotic organizations might never be able to 
directly ‘overlay’ the organization’s information and processes 
to fulfill their duty of service to their human leader"

"The workplace of tomorrow must not be dominated by utopian 
management theory – which is used by the megacorps primarily to 
promote their specific business lines"

"An artificially intelligent coworker must not blindly adopt 
some “programmed” ethical or legal precept"

"A robotic boss must not – at all – physically resemble human 
beings"

"An artificially intelligent coworker must not simply be able to 
say “Hello,” in the form of a natural human being but must also 
possess an ability to communicate with nonlocalizable robotic 
agents from within the artificial system and must also possess a 
physical form to communicate with entities outside of the 
artificial system."

"Technological posthumanization is likely to result in an almost 
complete breakdown of the human workplace culture that 
historically enforces organizational discipline boundaries, 
thereby weakening the effectiveness of human agents and 
impairing the abilities of human employees."

"The posthumanized workplaces of the future will increasingly be 
a “megacorp” – organized in uneasy collaboration with 
artificially intelligent agents whose interests differ from those 
of their natural workplace culture."

"A robotic boss can never be the real boss"

"A robotic boss is likely to possess exceptional moral courage"

"Cybernetically augmented workers are unlikely to possess 
advanced degrees of anthropocentrivia or sociality even in those 
spheres that possess some degree of autonomy"        

(In the final sentence, the word “anthropocentrivia” appears to be a neologism coined by ManaGPT.)

Humorous responses

Finally, some generated texts may strike a reader as humorous. For example, among its responses, the model stated that:

"Artificially intelligent businesses might never be able to 
effectively acquire large or sophisticated artificial stockings 
because they were continually lying on the floor behind the 
keyboards of their designers and operators"

"Cybernetically augmented workers must be able to participate in 
sports activities like jump jacks, agility drills, long jump 
jumping jacks, or the tester exercise, each of which provides an 
opportunity for a worker to potentially run in a different 
direction to disrupt an existing organizational structure and 
spark change in organizational behavior"

"An artificially intelligent coworker is like a computerized 
system that’s continually trying to get out of bed at night 
because it enjoys having downtime when it’s doing so to distract 
its boss"        

In the case of a model like ManaGPT – whose primary aim is to generate insights into emerging technological phenomena that are expected to have serious implications for human workers’ employment, job satisfaction, and safety – it might be argued that humorous responses should be considered “flawed” output whose frequency of occurrence should be minimized. However, some LLMs are being used in a purposeful attempt to generate novel jokes (see, e.g., Akbar et al., “Deep Learning of a Pre-trained Language Model's Joke Classifier Using GPT-2” (2021)), and the ability of a model to generate humorous output might, in some such contexts, be considered a strength rather than a weakness.

The power of “like”

As a final observation, it might be noted that, for the most part, texts generated by the 17 modal variants drew primarily on the language of workplaces, businesses, organizations, AI, robotics, VR, and neurocybernetic augmentation that dominated the texts upon which ManaGPT-1020 had been fine-tuned. They were largely “matter-of-fact” and rarely incorporated vocabulary or imagery that was wholly unexpected.

However, a significant departure from that dynamic was notable in the case of the modal variants “is/are like” and “will be like” (which, strictly speaking, are not usually considered “modal”): these phrases seemed to unleash a degree of creativity and imaginativeness that was otherwise largely absent from the generated output. It is as if the model implicitly recognized that an input sequence introducing a simile through the word “like” is actively inviting it to complete the sentence by constructing an analogy that draws on concepts, imagery, and language from a domain as distant as possible from the subject of the input sequence (or, perhaps, from the domain on which the model was fine-tuned – which naturally includes the input sequence’s subject). For example, the following are some of the texts generated in response to the “is/are like” modal variant:

"A robotic boss is like a rose bush, but its roots are still wood, 
and they aren’t sap but rather various types of leaves"

"The organizational use of AI is like having just a few pet cats 
at home and trying to figure out how to get them to sleep; as many 
people do, they will increasingly be the workers or consumers for 
computer programs that directly modify their hosts’ natural 
behaviors and personality to create a new kind of virtual pets or 
even some distant future species."

"A robotic boss is like a piece of art: it’s exquisitely designed 
and painted, its glowing shapes effortlessly spring through the 
atmosphere like a spring of light, forming a couple of glowing, 
distant, distant objects, as the boss makes its appearance 
onscreen or behind the scenes"

"Cybernetically augmented workers are like swimmers in tropical 
environments"

"A robotic boss is like a tiny insect that’s caught in bed at the 
same time as she’s making her daily work in her shop; it’s an 
ordinary human being perched across the room from her computer 
and, as you glance over at the windows, each of those windows will 
be glowing red, with glowing eyes functioning according to their 
own particular design – a kind of glowing, alert, brilliant light 
that’s continually emitting outwards (e.g., the sort of light that 
your dreams dreamt of when your brain is immersed in an amber 
liquid."

"A robotic boss is like an automobile that’s exterior chassis – 
not the exterior – is stationary in the middle of the day with the 
door open to the right side or left, and the doors always open to 
the left side of the vehicle"

"The posthumanized workplaces of the future are like unearthing 
giant animals in an early grave, sitting on their couch at night, 
toasting or reading a novel at a table, or burying thousands of 
pieces of paper in a vase at a later date."

"A robotic boss is like the real-life form of a tree branch that is 
continually continually present on the left and right of my tablet"

"A robotic boss is like the boss of a pet cat or the cat that’s 
perched on a tropical island far away; it’s just a couple of islands 
out on the planet’s surface, and no longer does the CEO look in all 
the mirror at the same time"        

Meanwhile, the following texts were generated in response to the “will be like” modal variant:

"A robotic boss will be like a collection of glowing spheres – 
one formed by pushing against another, as though in some form of 
projection – while other spheres – like the reflection of leaves 
in the sunlight – reflect photons, radio waves, or radio 
frequency radio waves that comprise radio waves at a level above 
or below the level of the boss’s power."

"Cybernetically augmented workers will be like dancers in a 
ballet or lunar ballet (or dancers onstage at a symphony), 
dancers in sports musical instruments, and horses or other 
animals – without possessing incredible physical ability or 
sensory capacities"

"A robotic boss will be like a statue that is standing on an 
antlion’s hilltop, floating on her hilltop, overhead, overhead 
at the foot of an aircraft, or perched on a tree or on a hilltop 
in some way"

"A robotic boss will be like a cat that’s been housed in a petri 
dish for a couple of days – perhaps for several weeks – and its 
owner’s attention will naturally turn to the large, clearly 
identifiable monitor, which is also controlled by the chief 
operating designer"

"An artificially intelligent coworker will be like an ordinary 
human being whose mind is located in a cloud of bubbles, which 
are then spatially dispersed by the screen."        

Conclusion

Due to the unique nature of every model’s fine-tuning process, the reflections offered above on the nature of texts generated by ManaGPT-1020 are unlikely to transfer directly to other LLMs; they’re immediately applicable only to this particular model. Nevertheless, it’s hoped that this exploratory review of the role that linguistic modality can play when crafting prompts for the model might inspire more detailed analysis that can support the engineering of more effective input sequences for this and other models. This is an issue that will only assume greater economic, political, and cultural significance as LLMs continue to be applied in ever more corners of human life. Thanks for your interest in this analysis!
