
Random Numbers Are Too Important To Be Left To Chance

Emile Bellott

Senior Business Systems Analyst

Published: June 23, 2018

Were the million monkeys underestimated?

For a number of years I’ve been fascinated by, and skeptical of, the ‘Million Monkey Paradox’: a cautionary tale about things that could occur in principle but whose probability is vanishingly small. Well before the advent of even primitive computing, Sir Arthur Eddington brought the monkeys to the stage in his 1927 Gifford Lectures at the University of Edinburgh. In a discussion of statistical mechanics, he famously remarked: “If an army of monkeys were strumming on typewriters they might write all the books in the British Museum.”

In a contemporary setting, we might well consider the power of (computational) artificial intelligence to pass the Turing Test, that is, to communicate in a way that is qualitatively indistinguishable from a human. Some would say that we are already there. IBM’s Project Debater recently debated a qualified debate champion, in real time, on a proposition about which neither opponent had any prior knowledge.

The monkeys raise all kinds of questions about chance or, in a post-digital era, about stochastic simulation.

Forty years ago, Yale professor W.R. Bennett, Jr. systematically studied the monkey problem with computer simulation. Not surprisingly, even when the natural frequencies of the 26 letters of the English language were taken into account, the results were, well, random, to say the least. Long paragraphs of random letter strings yielded little enlightenment. Successive probability-frequency matrices of letter pairs, triplets, and quartets did finally produce many “words” in succession, though these were still devoid of meaningful phrases or ideas.
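Bennett’s frequency matrices amount to what is now usually called a character-level Markov (n-gram) model. The sketch below is a minimal Python illustration of that idea, not Bennett’s own program; `build_model` and `generate` are hypothetical names, and the training text is whatever sample you supply. With `order=0` it draws letters purely by overall frequency, `order=1` conditions each letter on the previous one (pairs), and `order=3` on the previous three (quartets).

```python
import random
from collections import Counter, defaultdict


def build_model(text: str, order: int) -> dict:
    """Count how often each letter follows every `order`-letter context."""
    model = defaultdict(Counter)
    padded = " " * order + text  # pad so the first letters also have a context
    for i in range(len(text)):
        context = padded[i:i + order]
        model[context][padded[i + order]] += 1
    return model


def generate(model: dict, order: int, length: int, rng: random.Random) -> str:
    """Emit `length` letters, each drawn in proportion to how often it followed the context."""
    start = " " * order
    context = start
    out = []
    for _ in range(length):
        counts = model.get(context) or model[start]  # restart if the context was never seen
        letters, weights = zip(*counts.items())
        letter = rng.choices(letters, weights=weights)[0]
        out.append(letter)
        # Slide the context window forward by one letter (order 0 has no context).
        context = (context + letter)[1:] if order else ""
    return "".join(out)
```

For example, `generate(build_model(corpus, 3), 3, 200, random.Random(1))` produces 200 letters, each conditioned on the previous three, where `corpus` is any (hypothetical) training text. Raising the order makes the output more word-like, but never more meaningful.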

In fact, the fourth-order matrices were good enough that it was possible at least to recognize the linguistic signature of Shakespeare, Poe, Italian, Latin, or German monkeys, albeit still devoid of any deeper meaning.
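One way to make “recognizing the linguistic signature” concrete, assuming nothing about how Bennett actually did it, is to train one letter model per author or language and ask which model finds a new passage least surprising, i.e. assigns it the highest log-likelihood. The add-one smoothing and the function names below are illustrative choices, not taken from the article.

```python
import math
from collections import Counter, defaultdict


def train(text: str, order: int) -> dict:
    """Letter counts for every `order`-letter context (an n-gram table)."""
    counts = defaultdict(Counter)
    padded = " " * order + text
    for i in range(len(text)):
        counts[padded[i:i + order]][padded[i + order]] += 1
    return counts


def log_likelihood(text: str, counts: dict, order: int, alphabet_size: int) -> float:
    """Sum of log-probabilities of each letter given its context."""
    padded = " " * order + text
    total = 0.0
    for i in range(len(text)):
        seen = counts.get(padded[i:i + order], Counter())
        # Add-one (Laplace) smoothing: unseen letters get a small, nonzero probability.
        p = (seen[padded[i + order]] + 1) / (sum(seen.values()) + alphabet_size)
        total += math.log(p)
    return total


def attribute(sample: str, corpora: dict, order: int = 3) -> str:
    """Return the name of the corpus whose letter model best predicts `sample`."""
    alphabet = set(sample).union(*corpora.values())
    models = {name: train(text, order) for name, text in corpora.items()}
    scores = {name: log_likelihood(sample, m, order, len(alphabet))
              for name, m in models.items()}
    return max(scores, key=scores.get)
```

With realistic corpora (a few pages per author or language), even this crude letter-level score tends to separate languages easily; distinguishing individual authors needs more text.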

I’ve always been fond of random numbers. One fateful day, when I was learning to code, a fellow traveler showed me the RND function. On another proud day, I used this newfound facility and a loop to generate 100 random numbers. That they were only pseudo-random, and fraught with other statistical anomalies, did nothing to dampen my enthusiasm.
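“Pseudo-random” means the numbers come from a deterministic formula: the same seed always yields the same sequence, and the sequence eventually repeats. The sketch below is a toy linear congruential generator (LCG), one classic way such functions have been built; it is not claimed to be the algorithm behind any particular BASIC’s RND. The default constants are the widely published ones from Numerical Recipes, and the class and method names are hypothetical.

```python
class LCG:
    """Toy linear congruential generator: x_next = (a * x + c) mod m."""

    def __init__(self, seed: int, a: int = 1664525, c: int = 1013904223, m: int = 2**32):
        self.a, self.c, self.m = a, c, m
        self.state = seed % m

    def next_int(self) -> int:
        """Advance the state once and return it (an integer in 0 .. m - 1)."""
        self.state = (self.a * self.state + self.c) % self.m
        return self.state

    def rnd(self) -> float:
        """A float in [0, 1), loosely analogous to what RND returns."""
        return self.next_int() / self.m
```

With tiny parameters such as `LCG(0, a=5, c=3, m=16)`, the generator visits all 16 possible states and then repeats exactly, which makes the “pseudo” in pseudo-random easy to see; the large default modulus only makes the cycle much longer.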

Thus inspired, I submitted a short computer-generated “poem” in my Honors English class. It was created by assigning numbers to a list of words and using my old friend RND. That it earned any passing grade at all was purely a gesture of goodwill by the teacher, and a nod to the fact that I hadn’t submitted yet another composition of sophomoric rhyming couplets. And yes, shockingly, it did not pass the Turing Test.
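The word-list “poem” works by numbering the words and using a random number to pick an index. A minimal Python sketch follows; the vocabulary and fixed line lengths are hypothetical, since the original word list and layout are not described. It mimics the common BASIC idiom of scaling RND by the list length and truncating to an integer.

```python
import random


def rnd_poem(words: list, lines: int, words_per_line: int, rng: random.Random) -> str:
    """Build a 'poem' by picking words at random indices, like INT(RND * N) in BASIC."""
    poem_lines = []
    for _ in range(lines):
        # rng.random() is in [0, 1), so the index is always 0 .. len(words) - 1.
        picks = [words[int(rng.random() * len(words))] for _ in range(words_per_line)]
        poem_lines.append(" ".join(picks))
    return "\n".join(poem_lines)


# Hypothetical vocabulary; the original word list is not given in the article.
VOCAB = ["moon", "river", "silent", "runs", "gold", "winter", "dreams", "under"]
```

Calling `print(rnd_poem(VOCAB, 4, 5, random.Random(1)))` prints four five-word lines; since words are chosen independently, nothing ties one word to the next.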

A little later, at a used book sale, I found a wonderful old scientific book called “A Million Random Digits”, published by the aptly named RAND Corporation. It was first printed in 1955, when computers were not generally available, yet a reliable set of random numbers was already important for engineering and mathematics. Its primary utility today is to serve as a bookend and a memento of those heady days.
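Part of what makes a table of random digits “reliable” is that it passes statistical checks. A basic one is a chi-square frequency test: each of the ten digits should appear about one tenth of the time. The sketch below illustrates such a check in Python; it is not a description of RAND’s own procedure. The 16.919 threshold is the standard 5% critical value of the chi-square distribution with 9 degrees of freedom, and the function names are hypothetical.

```python
from collections import Counter

# Standard 5% critical value of the chi-square distribution with 9 degrees of freedom
# (10 digit categories minus 1).
CRITICAL_5PCT_9DF = 16.919


def chi_square_digits(digits: str) -> float:
    """Chi-square statistic comparing observed digit counts to a uniform 1/10 share."""
    counts = Counter(digits)
    expected = len(digits) / 10
    return sum((counts.get(str(d), 0) - expected) ** 2 / expected for d in range(10))


def looks_uniform(digits: str) -> bool:
    """True if the frequency test does not reject uniformity at the 5% level."""
    return chi_square_digits(digits) < CRITICAL_5PCT_9DF
```

A frequency test alone says nothing about order: the string "0123456789" repeated a hundred times passes it perfectly, which is why serial and runs tests are normally applied alongside it.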

A few weeks ago, my interest in coding was rekindled as I dissected 1,000 lines of code in a Microsoft Excel macro. I hadn’t written it, but I needed to make substantial changes for our business. Then it dawned on me that the macro-enabled Excel workbook was an interesting playground for coding. It was endowed by its creators with all the facilities of a modern programming language in one simple package that required no software beyond what you’d expect on a standard office PC. And, of course, there were commands to access and modify strings, facilities for input and output, and my old friend RND.

So I thought: what if the monkeys had a box of LEGO bricks (presumably units of higher complexity)? Could they build a bridge, or a house? A small village, perhaps? In Eddington’s vision, could phrases or whole sentences be selected at random from other works and then redeployed at random in a meaningful way?

The first experiment was to collect 50 proverbs, each parsed into three phrases. These became the raw material for generating random “new proverbs”, each composed of a first, middle, and last phrase drawn from three different source proverbs. The results looked something like this:

Partial list of source proverb examples:

A stitch in time saves nine.
A journey of 1000 miles begins with the first step
Experience is the best teacher
Too many cooks will spoil the broth
The early bird gets the worm
The nail that sticks up gets hammered down

Representative examples of new proverbs:

A golden key can open a thousand words
Familiarity is next to an island
Cleanliness lifts the sword
The early bird lifts all boats
Fortune is mightier than the worm
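The recombination step can be sketched in a few lines of Python. The phrase boundaries in the sample data are a guess at how such proverbs might be parsed (the article does not list its splits), and `new_proverb` is a hypothetical name; the one rule taken from the text is that the first, middle, and last phrases come from three different source proverbs.

```python
import random

# Hypothetical (first, middle, last) splits of a few proverbs quoted in the article.
PROVERBS = [
    ("A stitch in time", "saves", "nine"),
    ("Experience", "is", "the best teacher"),
    ("Too many cooks", "will spoil", "the broth"),
    ("The early bird", "gets", "the worm"),
    ("The nail that sticks up", "gets hammered", "down"),
]


def new_proverb(proverbs: list, rng: random.Random) -> str:
    """Join a first, middle and last phrase taken from three different proverbs."""
    i, j, k = rng.sample(range(len(proverbs)), 3)  # three distinct source indices
    return " ".join((proverbs[i][0], proverbs[j][1], proverbs[k][2]))
```

With 50 source proverbs this rule allows 50 × 49 × 48 = 117,600 ordered combinations, so a run of 1,000 new proverbs samples well under 1% of the possibilities.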

Looking over the 1,000 new “proverbs” created by this method, one readily sees that many are still meaningless. But keeping in mind that the algorithm made no attempt to match grammar, tense, or number across phrases, the results are recognizably prescriptive in a proverb-like way. As a group, the new proverbs span a spectrum from sense to nonsense, and some are downright amusing. One lesson learned is that meaning is conveyed not just by letters, words, or clusters of words, but by context and the presence of some higher-order structure.

Armed with this entertaining result, I next recombined, again at random, the first, second, and third lines of 50 selected Japanese haiku (in English translation).

Several examples of authentic Haiku are:

a

The lamp once out

Cool stars enter

The window frame

b

The crow has flown away

Swaying in the evening sun,

A leafless tree

c

Over the wintry forest

The winds howl with rage

With no leaves to blow

d

Consider me

As one who loved poetry

And persimmons

e

In the twilight rain

These brilliant-hued hibiscus

A lovely sunset

Examples of the output:

I

The summer grasses

No sign can foretell

As they were before my birth

II

The wren

In an old pond

In cool waves

III

Early summer rain

Against the sky

But slowly, slowly

IV

A dog howling

Cool in the moonlight

Tiger moth

V

Calligraphy of geese

Cool stars enter

Two of them

The resulting new haiku show more promise: a greater proportion of the output seems noteworthy. This is likely because the source poems all share similar themes, and because the free-verse form partly relaxes the constraints of number and tense. Free from the demands of grammar, each line conveys its own image.

The final test was an attempt to write poetry. The source file consisted of six Robert Frost poems: 1) “Stopping by Woods on a Snowy Evening”; 2) “October”; 3) “After Apple-Picking”; 4) “Birches”; 5) “Mending Wall”; and 6) “The Road Not Taken”. Twelve lines were selected at random from the combined set to generate each new poem. No rules were in place to preserve rhyme or grammar.
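This experiment is simpler still: pool every non-blank line from the six poems and draw twelve. The sketch below assumes lines are drawn without replacement (no line repeats within one poem), which the article does not state; `frost_poem` and the `sources` mapping are hypothetical. Duplicate lines in the pool, such as the repeated closing line of “Stopping by Woods on a Snowy Evening”, are collapsed first so they are not over-represented.

```python
import random


def frost_poem(sources: dict, n_lines: int = 12, rng=None) -> list:
    """Draw n_lines distinct non-blank lines from the pooled source poems."""
    if rng is None:
        rng = random.Random()
    pool = [line.strip() for text in sources.values()
            for line in text.splitlines() if line.strip()]
    pool = list(dict.fromkeys(pool))  # drop repeated lines, keeping first occurrence
    return rng.sample(pool, n_lines)  # without replacement; ValueError if pool is too small
```

Given a mapping from poem titles to their full texts (which you would load yourself), `print("\n".join(frost_poem(sources)))` prints one new twelve-line “poem”.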

Here are two representative examples of new free-verse “poems” thus created:

I

Not to return. Earth's the right place for love:

And to whom I was like to give offence.

He is all pine and I am apple orchard.

And spills the upper boulders in the sun;

Shattering and avalanching on the snow-crust—

He only says, "Good fences make good neighbours."

To ask if there is some mistake.

I let my neighbour know beyond the hill;

Slow, slow!

Cherish in hand, lift down, and not let fall.

That sends the frozen-ground-swell under it,

And they seem not to break; though once they are bowed

II

It melted, and I let it fall and break.

As ice-storms do. Often you must have seen them

Where they have left not one stone on a stone,

And to whom I was like to give offence.

The darkest evening of the year.

Where your face burns and tickles with the cobwebs

From a twig's having lashed across it open.

Something there is that doesn't love a wall,

O hushed October morning mild,

Clear to the ground. He always kept his poise

We have to use a spell to make them balance:

One could do worse than be a swinger of birches.

These free-verse results, random as they are, convey a sense of place and don’t feel overtly disjointed. The grammar is self-contained within each line. They largely preserve the original author’s voice, his cadence, and the unmistakable New England imagery. Some of these phrases probably also seem more acceptable because they tickle our memory: many of us learned them in school.

Conclusions

The million monkeys paradox has a certain philosophical attraction. It is as though, on a grand scale, the whole universe were careening headlong toward some greater level of organization and intrinsic perfection. Yet on deeper consideration we realize that the ability to create meaningful literature depends not merely on assembling pieces, but also on rating their fitness for the purpose and feeding back some measure of the success achieved.

Bennett found that assembling individual particles (letters) at random did not get very far. Aggregating them into pairs, triplets, and quartets, that is, using higher-order probability matrices, was a step in the right direction: the short strings became recognizably word-like, and real words were produced. The meaningfulness of those words, however, remained questionable.

In the experiments described here, the existence of real words is assumed. By working with larger aggregates (phrases), we encountered new challenges, namely matching grammar and context. All of this is to say that even the million monkeys, with the advantage of literacy, would require some rules about what goes with what, and about whether the new line is better than the old. Finally, as we look at the results, there is a clear need to rate the products and learn through critique. What is missing from this scenario, then, is a mechanism for comparison, evaluation, and correction: in other words, a feedback loop.
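The missing feedback loop can be illustrated with Richard Dawkins’ well-known “weasel” program from The Blind Watchmaker (1986): random variation, plus a fitness score, plus selection of the best candidate each generation. This is a swapped-in illustration, not one of the experiments above, and it simplifies in an instructive way: its fitness function just counts matches against a known target phrase, whereas a new proverb or poem has no such target to score against. The parameter values below are arbitrary.

```python
import random
import string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "


def score(candidate: str, target: str) -> int:
    """Fitness: number of positions where the candidate already matches the target."""
    return sum(c == t for c, t in zip(candidate, target))


def mutate(parent: str, rate: float, rng: random.Random) -> str:
    """Copy the parent, replacing each character with a random one at probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else ch for ch in parent)


def evolve(target: str = TARGET, offspring: int = 100, rate: float = 0.05,
           seed: int = 0, max_generations: int = 10_000) -> tuple:
    """Mutate, score, keep the best; return (generations used, final string)."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in target)  # pure monkey typing
    for generation in range(max_generations):
        if parent == target:
            return generation, parent
        brood = [mutate(parent, rate, rng) for _ in range(offspring)]
        # Selection is the feedback step; keeping the parent means fitness never drops.
        parent = max(brood + [parent], key=lambda s: score(s, target))
    return max_generations, parent
```

With selection, the target typically appears within a few hundred generations; remove the `max(...)` step and the same mutations merely wander, which is the monkeys’ situation.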
