One problem with steganography is that the embedding of
hidden text in the covertext changes the statistical
characteristics of the covertext. With large amounts
of covertext, the change becomes detectable. Niels Provos
addressed this in Outguess
by changing other bits in the covertext to minimize
the impact of the embedding on the chi-square test.
Would it be easier to embed undetectably if we could
generate the covertext ourselves? Definitely!
Mybal.pl does this. Supply
it with an ASCII text and it computes the
probabilities of characters following every sequence
of characters in the text. Supply it with a key,
a message to embed and a word, and it
will generate a covertext starting with that word.
The covertext has exactly the same probability
distribution as the original text, but the message
can be extracted from it if the key is known.
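
The probability table described above can be sketched in a few lines.
This is a minimal Python illustration, not the Perl code of Mybal.pl.
The function name build_model, the fixed context length "order" and the
sample sentence are assumptions made for the example; the original only
says that probabilities are computed for chars following every sequence.

```python
from collections import Counter, defaultdict

def build_model(text, order=2):
    """Map each `order`-char context to a Counter of the chars that follow it."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context][text[i + order]] += 1
    return model

# Hypothetical sample text, chosen only for illustration.
model = build_model("the cat sat on the mat", order=2)

followers = model["e "]                      # chars seen after "e "
total = sum(followers.values())
probs = {c: n / total for c, n in followers.items()}
print(probs)                                 # {'c': 0.5, 'm': 0.5}
```

Here 'c' and 'm' follow "e " equally often, which is exactly the kind
of tie that can later carry a message bit.
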
How does it work? Mybal takes the word to start with,
interprets it as a sequence of chars and checks which
chars would be next in the sequence, and how probable
each of them is. It then throws a biased die (a PRNG
seeded with the key) to decide which char is next.
It appends that char and interprets the result as another
sequence and so on. If the list of possible next characters
contains two chars with the same probability and
the keyed random number generator chooses one of them,
mybal looks for the next message bit to embed. If it's
a zero, then the randomly chosen char is appended.
If it's a one, the other equally likely char is appended.
Since only equally likely chars are ever swapped, this
guarantees that the probability distribution is
always the same as in the original.
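
A single step of this rule might look as follows. It is a Python sketch
under stated assumptions, not the actual Mybal.pl code: the names
partner and next_char, the follower counts and the key string are all
made up for the example. One further assumption: when more than two
chars are tied, they are paired off in sorted order and an unpaired
leftover carries no bit, so each char has at most one swap partner.

```python
import random

def partner(counts, pick):
    """Return the char paired with `pick` among equally frequent chars, or None."""
    group = sorted(c for c in counts if counts[c] == counts[pick])
    j = group.index(pick) ^ 1            # pair neighbours: (0,1), (2,3), ...
    return group[j] if j < len(group) else None

def next_char(counts, rng, bits):
    """Draw the next char; consume one message bit when a swap is possible."""
    chars = sorted(counts)               # fixed order so every run draws alike
    pick = rng.choices(chars, weights=[counts[c] for c in chars])[0]
    other = partner(counts, pick)
    if other is not None and bits:
        if bits.pop(0) == 1:             # one bit: emit the equally likely twin
            return other
    return pick                          # zero bit, or no bit to embed here

# Hypothetical follower counts for one context: 'a' and 'e' are tied.
counts = {"a": 3, "e": 3, "o": 1}
rng = random.Random("secret key")        # keyed PRNG (the key is an assumption)
bits = [1, 0, 1]
print("".join(next_char(counts, rng, bits) for _ in range(8)))
```

Note that the swap leaves the output distribution untouched only if
the message bits themselves look random, for example because the
message was encrypted first.
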
To extract the message, mybal starts with the first word
and walks along the covertext, always checking the list
of possible next chars. If the char in the covertext has
the same probability as another char in the list, then
a message bit could be embedded with that char. To check which
bit it was, mybal uses the keyed PRNG to generate the text
itself and thus sees which char it would have chosen for a
zero bit and which for a one bit.
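
The whole round trip can be sketched like this. Again this is Python
written for illustration, not the Perl code of Mybal.pl. All names
(build_model, draw, embed, extract), the fixed context length, the
sample text and the key are assumptions. The sketch also assumes that
the receiver knows how many bits were embedded, and that tied chars
are paired off in sorted order so each has at most one swap partner.

```python
import random
from collections import Counter, defaultdict

def build_model(text, order):
    """Count which chars follow each `order`-char context in `text`."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model

def draw(counts, rng):
    """Return the PRNG's pick and its equally frequent partner (or None)."""
    chars = sorted(counts)
    pick = rng.choices(chars, weights=[counts[c] for c in chars])[0]
    group = [c for c in chars if counts[c] == counts[pick]]
    j = group.index(pick) ^ 1
    return pick, (group[j] if j < len(group) else None)

def embed(model, key, start, bits, order, length):
    """Generate covertext from `start`, swapping tied chars on one bits."""
    rng, out, bits = random.Random(key), start, list(bits)
    while len(out) < length and out[-order:] in model:
        pick, other = draw(model[out[-order:]], rng)
        if other is not None and bits and bits.pop(0) == 1:
            pick = other
        out += pick
    return out

def extract(model, key, cover, start, order, n_bits):
    """Replay the keyed PRNG along `cover` and read back up to n_bits bits."""
    rng, bits = random.Random(key), []
    for i in range(len(start), len(cover)):
        pick, other = draw(model[cover[i - order:i]], rng)
        if other is not None and len(bits) < n_bits:
            bits.append(0 if cover[i] == pick else 1)
    return bits

# Hypothetical source text and key, chosen so that ties occur often.
text = "the cat sat on the mat. the rat sat on the hat. "
model = build_model(text, order=2)
message = [1, 0, 1, 1, 0, 0, 1, 0]
cover = embed(model, "secret key", "th", message, order=2, length=120)
print(cover)
print(extract(model, "secret key", cover, "th", order=2, n_bits=len(message)))
```

Both sides make exactly one PRNG draw per char, so the extractor stays
in step with the generator and can tell at every tie whether the char
in the covertext is the one the PRNG picked (zero) or its twin (one).
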