Shuffling, jumbling, scrambling: whatever you call this operation, it is not as straightforward as it may seem when the target is a flow of characters. For example, a good Lorem Ipsum generator is not easy to implement, because it does not rely only on random tokens; it also requires these tokens to fit together in a coherent way. “The point of using Lorem Ipsum is that it has a more-or-less normal distribution of letters, as opposed to using 'Content here, content here', making it look like readable English.”

A basic random shuffle keeps the overall letter frequencies but destroys the way characters follow one another, so it fails to deliver a result that smells like the input text. Let's illustrate the problem with a naive script:

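```javascript
// Naive Text Scrambler
// (Select a text frame and run.)

var sto = app.selection[0].parentStory;

// Explode the story contents into an array of single characters.
var a = sto.texts[0].contents.split('');
var i = a.length;
var p,t;

// Swap each a[i] with a random earlier element a[p], 0 <= p < i.
// The `0 < --i` test also stops the loop on an empty story
// (a bare `--i` would never reach 0 when starting from i = 0).
while( 0 < --i )
{
    p = ~~(i*Math.random());
    t = a[i];
    a[i] = a[p];
    a[p] = t;
}

sto.contents = a.join('');
```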

The above code puts the contents of a Story into an array of characters, then randomly swaps each character with another one. For my experiment I use an excerpt from the story “How to Design an Addicting Product” by Jeff Davidson. Here is the typical outcome of running our naive script:

We instantly perceive that the alleged dummy text is not properly balanced. Letters, spaces, and punctuation clash chaotically, since no rule governs which character may follow which. For a more natural scrambling, we need to record, then play back, the statistics of how characters succeed one another. That's what a Markov chain can do.
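To make this diagnosis concrete, here is a minimal sketch in plain JavaScript, runnable outside InDesign. The helpers `shuffle`, `countGrams` and `sharedGrams` are hypothetical names introduced for illustration. `shuffle` repeats the swap loop of the naive script on a plain string; since it only permutes characters, the count of every character stays the same, while `sharedGrams` measures what fraction of the output's adjacent pairs actually occur in the source.

```javascript
// Same swap loop as the naive scrambler, applied to a plain string.
// `rand` defaults to Math.random; it is a parameter so a fixed source can be injected.
function shuffle(s, rand) {
  rand = rand || Math.random;
  var a = s.split(''), i = a.length, p, t;
  while (0 < --i) {
    p = ~~(i * rand()); // index in [0, i-1]: a[i] is never swapped with itself
    t = a[i]; a[i] = a[p]; a[p] = t;
  }
  return a.join('');
}

// Count every substring of length n (n = 1: characters, n = 2: adjacent pairs).
function countGrams(s, n) {
  var counts = {}, i, g;
  for (i = 0; i + n <= s.length; i++) {
    g = s.substr(i, n);
    counts[g] = (counts[g] || 0) + 1;
  }
  return counts;
}

// Fraction of the n-grams of `out` (counted by occurrence) that also occur in `src`.
function sharedGrams(src, out, n) {
  var ref = countGrams(src, n), seen = 0, total = 0, i;
  for (i = 0; i + n <= out.length; i++) {
    total++;
    if (ref.hasOwnProperty(out.substr(i, n))) seen++;
  }
  return total ? seen / total : 1;
}
```

On any sample text, `countGrams(src, 1)` and `countGrams(shuffle(src), 1)` hold identical counts, while `sharedGrams(src, shuffle(src), 2)` usually drops below 1: the letter frequencies survive, the sequences do not.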

## Using a Markov Chain

The good news is that IdExtenso, our InDesign/ExtendScript framework, provides a module, Markov.jsxlib, which implements a Markov chain: “Seen as a data structure a Markov chain is great at capturing the probabilistic behavior of a sequence. Given an input stream (string, array) formed of ordered elements ('tokens'), the Markov chain keeps track of how elements tend to succeed one another. It basically answers the question: given a token T preceded by the tokens T0,T1...Ti, what are the preferred tokens to come next. The sequence (T0,T1...Ti,T) is treated as the 'state' of the Markov chain at that particular instant. The size (or depth) of the Markov chain defines how many Ti have to be registered in each state.”
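The data structure described in this quote can be pictured with a short sketch. It is not the Markov.jsxlib code: `buildChain` and `stateKey` are hypothetical names, and the sketch assumes that a state is simply the `depth` tokens seen so far, mapped to every token that followed them in the input. Successors are stored with duplicates, so a uniform pick from a list reproduces the observed frequencies. Since it accepts either a string or an array, the same table works with characters or with words.

```javascript
// Hypothetical helpers illustrating a Markov chain table (not the IdExtenso API).

// Turn a state (array of tokens) into an object key. The '#' prefix keeps
// keys like "constructor" or "__proto__" from clashing with Object.prototype;
// '\u0001' separates tokens and is assumed never to occur inside a token.
function stateKey(state) {
  return '#' + state.join('\u0001');
}

// Map each state (the `depth` tokens preceding a position) to the list of
// tokens that followed it. `tokens` is a string (character tokens) or an array.
function buildChain(tokens, depth) {
  var seq = typeof tokens === 'string' ? tokens.split('') : tokens;
  var table = {}, i, key;
  for (i = depth; i < seq.length; i++) {
    key = stateKey(seq.slice(i - depth, i));
    if (!table.hasOwnProperty(key)) table[key] = [];
    table[key].push(seq[i]); // duplicates kept: they encode frequencies
  }
  return table;
}
```

For example, `buildChain("the then", 2)[stateKey(['t', 'h'])]` yields `['e', 'e']`, and with word tokens, `buildChain("to be or not to be".split(' '), 1)[stateKey(['to'])]` yields `['be', 'be']`.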

To understand all the subtleties of this algorithm, the best approach is to experiment yourself with the API exposed in the source code. You will discover, among other things, that words rather than characters can serve as elementary tokens. But for the basic example I want to illustrate, the minimal unit (the character) is fine.

Here is how this works:

1. Take a string S (for example, the contents of a Story).

2. Determine the depth to be used. For small texts, `depth=2` is a reasonable choice. The Markov chain then considers how characters are paired in the original text and keeps track of the state transitions and how often they occur.

3. Invoke `$$.Markov(S, depth)` to create the Markov chain from the input string.

4. Call `result = $$.Markov.run()` to build a new string using the default callback function, which picks characters randomly along the chain and creates an output of the same length.

5. Replace the original string with `result`.
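These steps can be sketched end to end in plain JavaScript. This is a minimal illustration, not the `$$.Markov` implementation: `markovScramble` and its `rand` parameter are hypothetical, and it assumes a circular source (the first `depth` characters are appended to the end) so that every state has a successor and the walk never gets stuck. The default callback of `$$.Markov.run()` may handle the end of the text differently. Step 5 depends on InDesign and is left out.

```javascript
// Hypothetical character-level Markov scrambler (steps 1 to 4 above).
// `rand` defaults to Math.random and must return numbers in [0, 1).
function markovScramble(s, depth, rand) {
  rand = rand || Math.random;
  if (depth < 1 || s.length <= depth) return s; // nothing to learn from

  // Step 3: record, for each `depth`-character state, the characters that follow it.
  // Assumption: the text is treated as circular (it wraps around to its start),
  // so every state has at least one successor.
  var ring = s + s.substr(0, depth), table = {}, i, key;
  for (i = 0; i < s.length; i++) {
    key = '#' + ring.substr(i, depth); // '#' avoids clashes with Object.prototype
    (table[key] || (table[key] = [])).push(ring.charAt(i + depth));
  }

  // Step 4: start from a random state of the source and walk the chain,
  // drawing each next character from the recorded successors of the last
  // `depth` characters, until the output is as long as the input.
  var out = ring.substr(~~(s.length * rand()), depth), next;
  while (out.length < s.length) {
    next = table['#' + out.substr(out.length - depth)];
    out += next[~~(next.length * rand())];
  }
  return out;
}
```

With `depth = 2`, every three-character sequence of the output also occurs in the (circular) source. Raising `depth` makes the output copy longer runs of the source verbatim; a depth of 1 keeps only pair statistics.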

The final script is only about thirty lines of code (note that including the Random module, although optional, improves the random functions):

```javascript
// IdExtenso entry point and includes.
// ---
#include '../$$.jsxinc'
#include '../etc/$$.Random.jsxlib'
#include '../etc/$$.Markov.jsxlib'

$$.load();

try
{
    const DEPTH = 2;
    var t, sto, n;

    // Get the first selected object, if any.
    t = (t=app.properties.selection) && t.length && t[0];
    if( t && (t instanceof TextFrame) )
    {
        // Build the chain from the story contents (only if longer than DEPTH).
        t = (sto=t.parentStory).texts[0].contents;
        if( DEPTH < (n=t.length) && $$.Markov(t,DEPTH) )
        {
            // Generate a scrambled string of the same length.
            t = $$.Markov.run();

            // Replace the story contents without redrawing the screen.
            app.scriptPreferences.enableRedraw = false;
            sto.texts[0].contents = t;
            app.scriptPreferences.enableRedraw = true;
        }
    }
    else
    {
        alert(__("Please, select a text frame."));
    }
}
catch(e)
{
    $$.receiveError(e);
}

$$.unload();
```

Now let's see what the result looks like:

The scrambled text now looks much closer to the input, because it is made of character pairs which, although chosen at random, already occur in the source.
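This property is easy to check on any output. The sketch below (the helper name `unseenPairs` is hypothetical) lists the adjacent character pairs of an output that never occur in the source. A naive shuffle of a long text typically yields many of them, whereas a character-level Markov scramble that only follows recorded transitions should yield none, except possibly where the generator handles the end of the text.

```javascript
// Return the adjacent character pairs of `out` that never occur in `src`.
// Each distinct unseen pair is reported once, in order of first appearance.
function unseenPairs(src, out) {
  var reported = {}, missing = [], i, pair;
  for (i = 0; i + 2 <= out.length; i++) {
    pair = out.substr(i, 2);
    if (src.indexOf(pair) < 0 && !reported.hasOwnProperty('#' + pair)) {
      reported['#' + pair] = true;
      missing.push(pair);
    }
  }
  return missing;
}
```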

Note: the source code is available on GitHub and requires IdExtenso to be installed. For those who don't want to install the framework, a standalone version of the script is also available.