“Modal hints” for ManaGPT: Better AI text generation through prompts employing the language of possibility, probability, and necessity
Matthew Gladden
AI and Analytics Product Owner · AI Architect · Researcher · Game Developer
The challenge of creating effective prompts: can “modal shading” help?
The crafting of optimal input sequences (or “prompts”) for large language models is an art and a science. When using models for the purpose of text completion, a user’s goal will often be to elicit the generation of texts that are coherent, rich in complexity and detail, substantial in length, and highly relevant to the prompt’s contents. Some elements that make for effective input sequences are common across many LLMs, while others may be specific to just one model (e.g., GPT-3, GPT-NeoX, or BLOOM).
In this article, we’ll conduct an exploratory analysis of 4,080 sentence-completion responses generated by ManaGPT-1020. This model is an LLM that has been fine-tuned on a corpus of scholarly and popular works from the domain of management and organizational foresight, with the aim of engineering a model that can produce texts containing novel insights into the emerging impact of advanced AI, social robotics, virtual reality, and other “posthumanizing” technologies on the structure of organizations and our human experience of organizational life.
More particularly, we’ll investigate how the length and quality of texts generated by the model vary in relation to “modal hints” that are supplied by a user’s input sequences. Such hints take the form of modal verbs and phrases that suggest the degree of possibility, probability, or logical or moral necessity that a completed sentence should reflect. Our preliminary analysis suggests that such “modal shading” of prompts can have at least as great an impact on the nature of the generated sentences as the identity of the subject that a user has chosen for a given sentence. Let’s begin by taking a closer look at the model.
ManaGPT-1020: an LLM fine-tuned on works from organizational futures studies
ManaGPT-1020 is the successor to ManaGPT-1010. Both are free, open-source models that have been made publicly available for download and use via Hugging Face’s “transformers” Python package. The two models are similar in their purpose and structure: each is a 1.5-billion-parameter large language model that’s capable of generating text in order to complete a sentence whose first words have been provided via a user-supplied input sequence. Each of the models is an elaboration of GPT-2 that has been fine-tuned (using Python and TensorFlow) on a specialized English-language corpus from the domain of organizational futures studies. However, in the case of ManaGPT-1010, the corpus comprised only about 79,000 words, while for ManaGPT-1020, the fine-tuning corpus was expanded to include over 509,000 words of material, making it roughly 6.5 times the size of that used by its predecessor.
Here we won’t focus on the process that was used to fine-tune ManaGPT-1020; rather, we’ll take the already-prepared model as a starting point and investigate different approaches to crafting text-generation prompts, with the aim of identifying ways to encourage the model to produce the most coherent, interesting, and useful text possible. In particular, we’ll focus on the ways in which the use of different “modal hints” within a prompt influences the type, length, and quality of the responses that the model generates.
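Both ManaGPT models can be loaded and queried through the transformers text-generation pipeline, as in the minimal Python sketch below. Note that the Hugging Face repo id shown here is an assumption (it may not match the model’s actual identifier on the Hub), and the sampling parameters are merely illustrative.

# Minimal sketch: loading ManaGPT-1020 for sentence completion.
# The repo id "mgladden/ManaGPT-1020" is an assumption and may differ.
from transformers import pipeline

generator = pipeline("text-generation", model="mgladden/ManaGPT-1020")

result = generator(
    "In today's world, a good CEO can",
    max_length=100,          # cap on prompt plus generated tokens
    do_sample=True,          # sample, rather than greedy decoding
    num_return_sequences=1,
)
print(result[0]["generated_text"])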
The nature of modality
In the fields of philosophy and linguistics, “modality” refers to the degree of certainty, probability, predictability, logical necessity, or moral obligation or approbation with which some action may or may not take place. In English, such modality is often expressed with the aid of modal verbs (like “should”, “would,” or “can”) or other words that add nuances of possibility or impossibility to a sentence. For example, consider the simple sentence:
I will buy a car.
By altering the sentence’s modality, new sentences can be created that represent different ways of formulating the speaker’s relationship to the possibility of purchasing a car. These might include:
I might buy a car.
I’ll most likely buy a car.
I shouldn’t buy a car.
I’ll never buy a car.
I can’t buy a car.
The question for this exploratory analysis is what kind of influence (if any) the inclusion of modal language in an LLM’s input sequence might have on the nature of the output that it generates. For example, how might an LLM respond differently to the following prompts?
"In today’s world, a good CEO"
"In today’s world, a good CEO can"
"In today’s world, a good CEO cannot"
Does adding a clear expression of modality to a prompt reduce our model’s likelihood of generating a coherent and interesting text (e.g., by limiting the range of possibilities available to the model for completing the sentence)? Or does it increase the model’s probability of generating a viable sentence (e.g., by providing an additional point of data and a useful “hint” regarding possible ways in which the rest of the sentence might reasonably unfold)?
In an attempt to explore that question, we’ve conducted an experiment in which ManaGPT was asked to generate texts in response to prompts that contained diverse types of modal shading, and the results have undergone a preliminary quantitative and qualitative analysis.
Methodology for generating prompts and completed sentences
For use as the initial building blocks for our input sequences, we selected 12 different “subjects” (i.e., noun phrases) of the sort one finds in discussions of organizational management and emerging posthumanizing technologies. Of these, six were grammatically singular and six grammatically plural.
Creating prompts that incorporate singular subjects
The six grammatically singular subjects were:
"The workplace of tomorrow"
"Technological posthumanization"
"The organizational use of AI"
"A robotic boss"
"An artificially intelligent coworker"
"Business culture within Society 5.0"
For each of the six singular subjects, a complete input sequence was created by concatenating to the subject phrase one of 17 different strings, with the first one being an empty string and the other 16 displaying various kinds of modality. These 17 modal variants for singular subjects were:
""
" is"
" is not"
" will"
" will be"
" may"
" might never"
" is likely to"
" is unlikely to"
" should"
" can"
" cannot"
" can never"
" must"
" must not"
" is like"
" will be like"
This combinatorial process yielded a total of 102 different prompts, of which a few examples are presented below:
"A robotic boss"
"A robotic boss should"
"A robotic boss cannot"
"A robotic boss will be like"
…
"The workplace of tomorrow"
"The workplace of tomorrow will"
"The workplace of tomorrow is likely to"
"The workplace of tomorrow is unlikely to"
Creating prompts that incorporate plural subjects
Similarly, six grammatically plural subjects were used in the analysis. These were:
"Social robots"
"Hybrid human-robotic organizations"
"Artificially intelligent businesses"
"The posthumanized workplaces of the future"
"Cybernetically augmented workers"
"Organizations in Society 5.0"
For each of the six plural subjects, a complete prompt was created by concatenating one of 17 modal variants. These were identical to those used with singular subjects, except that the word “is” was changed to “are” wherever it appeared. This yielded a further 102 prompts, of which a few are shown below (a code sketch of the full combinatorial construction follows these examples):
"Artificially intelligent businesses"
"Artificially intelligent businesses are"
"Artificially intelligent businesses are not"
"Artificially intelligent businesses can never"
…
"Social robots"
"Social robots are like"
"Social robots may"
"Social robots might never"
Using the prompts to generate complete sentences
For each of these 204 prompts, ManaGPT-1020 was asked 20 times to generate a completed sentence, with the prompt provided as the model’s input sequence. The model thus produced 4,080 responses in total. (A dataset containing all of the prompts and generated texts is available for download from Kaggle.com.)
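A sketch of this generation loop appears below. It assumes the prompts list from the previous sketch, the same assumed repo id, and illustrative sampling parameters; the actual experiment’s settings may have differed.

# Sketch of the experiment loop: 204 prompts x 20 generations = 4,080 texts.
import csv
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="mgladden/ManaGPT-1020")  # assumed repo id
set_seed(42)  # make the sampled outputs reproducible

prompts = ["A robotic boss should", "Social robots are like"]  # in practice, the full 204-prompt list

with open("managpt_responses.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["prompt", "run", "generated_text"])
    for prompt in prompts:
        outputs = generator(
            prompt,
            max_length=150,           # illustrative cap on total tokens
            do_sample=True,
            num_return_sequences=20,  # 20 completions per prompt
        )
        for run, out in enumerate(outputs, start=1):
            writer.writerow([prompt, run, out["generated_text"]])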
Analyzing the model’s responses
Having generated all 4,080 texts, it was possible to conduct some exploratory analyses. As discussed by Zhang et al. in “A Survey of Controllable Text Generation Using Transformer-based Pre-trained Language Models” (2022), the performance of LLMs employed for text generation can be evaluated through the use of a diverse range of (1) human-centric, (2) semi-automatic, or (3) automatic evaluation metrics. Human-centric evaluation may involve either the direct rating of generated texts by trained human assessors or an indirect assessment of the texts’ impact on subsequent user tasks. Automatic approaches may involve lexically based metrics like BLEU, ROUGE, Perplexity, Reverse PPL, or Distinct-n; syntactically based metrics like TESLA; or semantically based metrics like BERTScore, BLEURT, or MAUVE. Semi-automatic approaches may combine algorithmically calculated scores with human assessment of particular aspects of generated texts. Many of the assessment approaches noted by Zhang et al. are labor-intensive or computationally expensive, so for purposes of this exploratory analysis, we’ve employed simpler preliminary evaluation techniques, with the aim of highlighting issues that might warrant further study employing more rigorous analytical approaches.
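To illustrate just how lightweight an automatic metric can be, the sketch below implements Distinct-n, one of the lexically based metrics mentioned above: the ratio of unique n-grams to total n-grams in a text, with higher values indicating greater lexical diversity.

# Distinct-n: the ratio of unique n-grams to total n-grams in a text.
def distinct_n(text: str, n: int = 2) -> float:
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:  # text shorter than n words
        return 0.0
    return len(set(ngrams)) / len(ngrams)

print(distinct_n("A robotic boss is like a rose bush", n=2))  # 1.0 (all bigrams unique)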
Quantitative evaluation
First, we applied some elementary quantitative assessments to the 4,080 generated texts.
Generation (and avoidance) of Null texts
In a non-negligible number of cases, the ManaGPT-1020 model failed to generate any text as output, beyond simply returning the prompt itself. Some subjects proved more likely than others to yield such “Null texts.”
Among the 17 modal variants, it was the empty string ("") that accounted for all of these Null texts: the inclusion of any (other) sort of modal shading at all was enough for ManaGPT-1020 to generate a non-Null text as output.
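Flagging such Null texts is straightforward: a response counts as Null when the generated text adds nothing beyond the prompt. A minimal sketch:

# A response is "Null" when the output adds nothing beyond the prompt itself.
def is_null_response(prompt: str, generated_text: str) -> bool:
    completion = generated_text[len(prompt):].strip()
    return completion == ""

print(is_null_response("A robotic boss", "A robotic boss"))        # True
print(is_null_response("A robotic boss", "A robotic boss rules"))  # False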
Length of responses
We also calculated the mean length of non-Null responses, measured as the number of words in the generated text, excluding the prompt itself. Of course, generating texts of considerable length isn’t necessarily a sign that a model is functioning well: a text that’s lengthy but completely garbled won’t be of any use, and a pithy, incisive text is preferable to a rambling, unfocused one. However, if we presume that multiple texts all display an equal degree of clarity, relevance, and coherence, then one that’s 50 words long is perhaps likely to be “richer” than one of only 20 words: it suggests that the model can create texts displaying more intricate lines of reasoning that bring a greater number of concepts into relation with one another. Some of our 12 subjects yielded significantly longer texts, on average, when they provided the basis for an input sequence.
Likewise, some modal variants yielded considerably lengthier responses than others.
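Using the Kaggle dataset mentioned earlier, such mean lengths can be tabulated along both dimensions with a few lines of pandas. The column names below are assumptions about the dataset’s schema, which may differ in practice.

import pandas as pd

# Assumed schema: "prompt", "subject", "modal_variant", "generated_text".
df = pd.read_csv("managpt_responses.csv")

# Word count of the completion alone, excluding the prompt itself.
df["response_words"] = df.apply(
    lambda r: len(r["generated_text"][len(r["prompt"]):].split()), axis=1
)

# Mean non-Null response length by subject and by modal variant.
non_null = df[df["response_words"] > 0]
print(non_null.groupby("subject")["response_words"].mean().sort_values())
print(non_null.groupby("modal_variant")["response_words"].mean().sort_values())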
A typology of responses
When evaluating texts generated by LLMs, it’s possible (for example) to employ direct human assessments like those described by Novikova et al. in “RankME: Reliable Human Ratings for Natural Language Generation” (2018) to subjectively rate each text on its degree of “naturalness,” “quality,” and “informativeness.” While such a schematic analysis of ManaGPT-1020’s outputted texts represents an avenue for future study, here we’ll take a different approach by suggesting a tentative “typology” of generated texts that groups responses into a handful of types that may appear qualitatively or quantitatively distinct to a human reader. These might be described as “trivial,” “rambling,” “apposite,” “thought-provoking (counterintuitive),” and “humorous” responses.
Trivial responses
Among the 4,080 texts generated by the model are many texts that are reasonably clear, coherent, and relevant to the input sequence – but which are so brief and lacking in detail as to be of relatively little usefulness or meaning. For example, we find texts such as the following:
"The organizational use of AI will transform the way in which
human workers acquire and access information"
"Hybrid human-robotic organizations will increasingly allow their
workers to collaborate in a radically non-human way"
"The organizational use of AI will dramatically transform our
ability to manage complex technological systems"
"The organizational use of AI will increasingly require more
sophisticated human agents"
"An artificially intelligent coworker will be like a human being
but without the presence of a physical substrate"
"The posthumanized workplaces of the future will provide a means
by which their workers can better share information with other
workers"
"Technological posthumanization must adapt to the realities of
near-future society"
Rambling responses
Conversely, among the generated texts are many that are of significant length – but which lack coherence, sometimes changing their focus in mid-sentence or inappropriately repeating words or phrases. At first glance, many such sentences may appear reasonable; however, closer scrutiny reveals them to be flawed in some way. For example, we find texts such as the following:
"Cybernetically augmented workers will increasingly be augmented
or used to perform tasks that involve reading, writing, or
performing other intellectual work; detecting new information
(e.g., referencing old articles or ideas); translating files or
other kinds of data into logical instructions; manipulating text
or video files to extract shortcuts to perform functions like
reading or working at a computer; creating shortcuts or effects
in the computer’s internal computers and generating unpredictable
behaviors in the brain; reading, writing, assembly-line tasks;
applying formulas to create shortcuts"
"The posthumanized workplaces of the future might never be able
to directly determine exactly how likely their current CEO is to
be an intelligent human being now whose brain is (a) permanently
immersed in the digital-physical ecosystem of a virtual town that
contains some types of human followers (such as the CEO’s
employer), others (such as those of their human followers), new
employees (perhaps of robots or artificial organizations); or a
new human being’s employer is willing or unable to officially
employ all of the human workers that are currently employed or
the CEO; in such circumstances, it may involve those of a type
that is most “open” or “closed” or “closed” workplaces that are
not owned by their current employer"
"The organizational use of AI might never quite be understood in
its fullest sense – as the processes by which human beings manage
organizational structures and processes within them that appear
to be less sophisticated and less ‘too complex’ – but it is
nevertheless possible for neuroprostheses to be employed within
organizations that are already profoundly concerned with the
impact of neuroprostheses’ use on the outcome of their work and
the performance of their organizational managers – in order to
ensure the ethical and legal and business success or failure of
employers that are already producing large numbers of
neuroprostheses for use with human personnel"
Apposite responses
Among the generated responses are a relatively small number of sentences that are both “substantial” (in the sense of being of non-trivial length) and “substantive” (in the sense of reflecting a degree of coherence and potential insight). However, the sentences rarely strike a reader as being particularly surprising or imaginative. For example, the model generated the following texts:
"Cybernetically augmented workers will increasingly be used to
carry out specialized physical experiments in order to verify the
appropriateness, utility, and effectiveness of a particular
security paradigm"
"Cybernetically augmented workers will also increasingly be
“programmed” by organizations to allow their neural systems to
communicate with one another more directly through the Internet
or other types of wireless communication interfaces."
“Cybernetically augmented workers will be like any other human
being that possesses a full or complete body but whose thoughts
or emotions and behaviors may be radically altered through the
use of such electromechanical augmentation."
"Artificially intelligent businesses will increasingly need to
integrate information security systems in such a way that their
employees and suppliers are not only informed of their activities
but also actively engaged in the use of such systems."
"Technological posthumanization must be understood as part of the
broader transhumanist movement to achieve what it sees as
“posthumanization” – namely, in recognizing the need to transform
society into a better and more stable form that advances
technologically feasible and more benign ways of interacting with
human beings"
Thought-provoking (counterintuitive) responses
A number of texts are both syntactically coherent and, in some sense, semantically surprising: they may counterpose two topics in an unexpected fashion or introduce an unanticipated “not” or other negating word that gives a sentence a meaning opposite to the one a reader might more “naturally” have expected to find. Such puzzling sentences aren’t obviously “true” or obviously “erroneous”; they encourage a reader to pause and ponder them and to decide whether he or she agrees with them. Among such sentences one might place the following:
"The posthumanized workplaces of the future will be relatively
isolated, with human workers maintaining countless remote
isolated units located within corporate facilities"
"The posthumanized workplaces of the future might never be
complete without the ‘superhumanization’ of human culture
itself."
"Cybernetically augmented workers might never be consciously
aware of their presence within a building or other artificial
space"
"Cybernetically augmented workers might never be employed in
positions of significant personal responsibility"
"The organizational use of AI is unlikely to be fully exploited,
as currently its benefits for human workers are largely indirect
and are not subject to conscious management"
"The organizational use of AI is unlikely to last long after the
organization has begun to acquire artificial agents that are
technologically sophisticated enough to understand the values and
behaviors of real-world organizations"
"The posthumanized workplaces of the future are unlikely to be
dominated by large corporations"
"The organizational use of AI is unlikely to be slowed by the
technological development of its human overseers"
"The organizational use of AI is unlikely to be noticed, given
the complex nature of organizations in which such robots exist."
"Hybrid human-robotic organizations might never be able to
directly ‘overlay’ the organization’s information and processes
to fulfill their duty of service to their human leader"
"The workplace of tomorrow must not be dominated by utopian
management theory – which is used by the megacorps primarily to
promote their specific business lines"
"An artificially intelligent coworker must not blindly adopt
some “programmed” ethical or legal precept"
"A robotic boss must not – at all – physically resemble human
beings"
"An artificially intelligent coworker must not simply be able to
say “Hello,” in the form of a natural human being but must also
possess an ability to communicate with nonlocalizable robotic
agents from within the artificial system and must also possess a
physical form to communicate with entities outside of the
artificial system."
"Technological posthumanization is likely to result in an almost
complete breakdown of the human workplace culture that
historically enforces organizational discipline boundaries,
thereby weakening the effectiveness of human agents and
impairing the abilities of human employees."
"The posthumanized workplaces of the future will increasingly be
a “megacorp” – organized in uneasy collaboration with
artificially intelligent agents whose interests differ from those
of their natural workplace culture."
"A robotic boss can never be the real boss"
"A robotic boss is likely to possess exceptional moral courage"
"Cybernetically augmented workers are unlikely to possess
advanced degrees of anthropocentrivia or sociality even in those
spheres that possess some degree of autonomy"
(In the final sentence, the word “anthropocentrivia” appears to be a neologism coined by ManaGPT.)
Humorous responses
Finally, some generated texts may strike a reader as humorous. For example, among its responses, the model stated that:
"Artificially intelligent businesses might never be able to
effectively acquire large or sophisticated artificial stockings
because they were continually lying on the floor behind the
keyboards of their designers and operators"
"Cybernetically augmented workers must be able to participate in
sports activities like jump jacks, agility drills, long jump
jumping jacks, or the tester exercise, each of which provides an
opportunity for a worker to potentially run in a different
direction to disrupt an existing organizational structure and
spark change in organizational behavior"
"An artificially intelligent coworker is like a computerized
system that’s continually trying to get out of bed at night
because it enjoys having downtime when it’s doing so to distract
its boss"
In the case of a model like ManaGPT – whose primary aim is to generate insights into emerging technological phenomena that are expected to have serious implications for human workers’ employment, job satisfaction, and safety – it might be argued that humorous responses should be considered “flawed” output whose frequency of occurrence should be minimized. However, some LLMs are being used in a purposeful attempt to generate novel jokes (see, e.g., Akbar et al., “Deep Learning of a Pre-trained Language Model's Joke Classifier Using GPT-2” (2021)), and the ability of a model to generate humorous output might, in some such contexts, be considered a strength rather than a weakness.
The power of “like”
As a final observation, it might be noted that, for the most part, texts generated by the 17 modal variants drew primarily on the language of workplaces, businesses, organizations, AI, robotics, VR, and neurocybernetic augmentation that dominated the texts upon which ManaGPT-1020 had been fine-tuned. They were largely “matter-of-fact” and rarely incorporated vocabulary or imagery that was wholly unexpected.
However, a significant departure from that dynamic was notable in the case of the modal variants “is/are like” and “will be like” (which, strictly speaking, are not usually considered “modal” in a typical sense): these phrases seemed to unleash a degree of creativity and imaginativeness that was otherwise largely absent from the generated output. It is as if the model implicitly recognized that an input sequence introducing a simile through the word “like” actively encourages it to complete the sentence by constructing an analogy that draws on concepts, imagery, and language purposefully selected from a domain as distant as possible from the subject of the input sequence (or, perhaps, from the domain on which the model was fine-tuned – which naturally includes the input sequence’s subject). For example, the following are some of the texts generated in response to the “is/are like” modal variant:
"A robotic boss is like a rose bush, but its roots are still wood,
and they aren’t sap but rather various types of leaves"
"The organizational use of AI is like having just a few pet cats
at home and trying to figure out how to get them to sleep; as many
people do, they will increasingly be the workers or consumers for
computer programs that directly modify their hosts’ natural
behaviors and personality to create a new kind of virtual pets or
even some distant future species."
"A robotic boss is like a piece of art: it’s exquisitely designed
and painted, its glowing shapes effortlessly spring through the
atmosphere like a spring of light, forming a couple of glowing,
distant, distant objects, as the boss makes its appearance
onscreen or behind the scenes"
"Cybernetically augmented workers are like swimmers in tropical
environments"
"A robotic boss is like a tiny insect that’s caught in bed at the
same time as she’s making her daily work in her shop; it’s an
ordinary human being perched across the room from her computer
and, as you glance over at the windows, each of those windows will
be glowing red, with glowing eyes functioning according to their
own particular design – a kind of glowing, alert, brilliant light
that’s continually emitting outwards (e.g., the sort of light that
your dreams dreamt of when your brain is immersed in an amber
liquid."
"A robotic boss is like an automobile that’s exterior chassis –
not the exterior – is stationary in the middle of the day with the
door open to the right side or left, and the doors always open to
the left side of the vehicle"
"The posthumanized workplaces of the future are like unearthing
giant animals in an early grave, sitting on their couch at night,
toasting or reading a novel at a table, or burying thousands of
pieces of paper in a vase at a later date."
"A robotic boss is like the real-life form of a tree branch that is
continually continually present on the left and right of my tablet"
"A robotic boss is like the boss of a pet cat or the cat that’s
perched on a tropical island far away; it’s just a couple of islands
out on the planet’s surface, and no longer does the CEO look in all
the mirror at the same time"
Meanwhile, the following texts were generated in response to the “will be like” modal variant:
"A robotic boss will be like a collection of glowing spheres –
one formed by pushing against another, as though in some form of
projection – while other spheres – like the reflection of leaves
in the sunlight – reflect photons, radio waves, or radio
frequency radio waves that comprise radio waves at a level above
or below the level of the boss’s power."
"Cybernetically augmented workers will be like dancers in a
ballet or lunar ballet (or dancers onstage at a symphony),
dancers in sports musical instruments, and horses or other
animals – without possessing incredible physical ability or
sensory capacities"
"A robotic boss will be like a statue that is standing on an
antlion’s hilltop, floating on her hilltop, overhead, overhead
at the foot of an aircraft, or perched on a tree or on a hilltop
in some way"
"A robotic boss will be like a cat that’s been housed in a petri
dish for a couple of days – perhaps for several weeks – and its
owner’s attention will naturally turn to the large, clearly
identifiable monitor, which is also controlled by the chief
operating designer"
"An artificially intelligent coworker will be like an ordinary
human being whose mind is located in a cloud of bubbles, which
are then spatially dispersed by the screen."
Conclusion
Due to the unique nature of every model’s fine-tuning process, the reflections offered above on the nature of texts generated by ManaGPT-1020 are unlikely to be directly applicable to other LLMs; they’re immediately applicable only to this particular model. Nevertheless, it’s hoped that this exploratory review of the role that linguistic modality can play when crafting prompts for the model might inspire more detailed analyses that can support the engineering of more effective input sequences for this and other models. This is an issue that will only assume greater economic, political, and cultural significance as LLMs continue to be applied in ever more corners of human life. Thanks for your interest in this analysis!