EFFector - Volume 4, Issue 5 - Hiding Data In Plain Sight: Some Key Questions About Cryptography

               //////////////  //////////////// //////////////
            ////             ////             ////
_________ /////////________ /////////_______ /////////________________
        ////               ////             ////
      //////////////////  ////             ////

//////////////////////////////////////////////////////////////////////
EFFector Online 4.05          1/7/1993                 editors@eff.org
A Publication of the Electronic Frontier Foundation     ISSN 1062-9424

                     -==--==--==-<>-==--==--==-
[Editor's Note: With this issue, EFFector Online will begin to
examine the technical, social, moral, legal and political issues 
surrounding the uses of encryption in computer-based communications. 
While many in various online communities around the world are highly 
conversant with cryptography and encryption, many others are not. 
Because of this we begin our series with Larry Loen's superb primer on
basic cryptography. This article originally appeared as a proto-FAQ
in the Usenet group, sci.crypt. Our readers with an interest in
learning about encryption on an ad-hoc basis are advised to read
sci-crypt and to participate in it. As with any other place on
the Net, "Ask. People know.How the world works is not a secret." -GV]
                     -==--==--==-<>-==--==--==-

                    HIDING DATA IN PLAIN SIGHT
               Some Key Questions About Cryptography
            BY LARRY LOEN (lwloen@rchland.vnet.ibm.com)

NOTE: The information and opinions expressed in this article
are those of the author and his collaborators and do not
represent the final word on these matters or the opinions,
views or policies of any company or organization.


Q1:  What is cryptography?  How, basically, does it work?
  What are the basic terms used to describe cryptography?

  Cryptography is the art and science of hiding data in plain sight.
  It is also the art and science of stealing data hidden in plain sight.
  There are at least three players. The first is the one who has
  the original data, which is presumed to have high value to
  others. This data is presumed to reside in a safe place that
  no one but the originator and his/her friends can see. (If the
  originator cannot physically secure the original data,
  cryptography is a waste of time). Now, for cryptography to be
  necessary, the data must, for some reason, have to be
  transmitted over some public means such as a telephone line, a
  letter through the mail; any means that cannot be physically
  secured by the owner to a legitimate receiver of the data. The
  receiver is the second party.

  Cryptography is any transformation of the data into a form
  that cannot (it is hoped) be recovered in a timely manner by an
  unknown party, which is called here 'the opponent'.
  The transformation is not some physical means, such as hiding the
  data on microfilm or  some such; it is some mathematical
  transformation that scrambles the original data in a way
  that the receiver on the other end knows how to unscramble.

  The process of scrambling (transforming) the data is called
  'encryption'. Unscrambling is called decryption. An
  encryption system has two basic parts. 1) A general
  transformation process called the encryption algorithm. 2) A
  customization of that algorithm called a cipher key. The
  sender and the receiver must find a secure means to exchange
  the cipher key. That is, the same public means used to send
  the encrypted data cannot be used. This may be a difficult
  problem, and has a variety of solutions, but will be assumed
  solved for now. Once the key is successfully exchanged, the
  two parties can separately implement the encryption algorithm
  and its inverse, the decryption algorithm.

  At this point, the data can be transmitted in its encrypted form
  using the agreed-to key to customize the general algorithm to a
  particular version that transforms the data. Since the
  encrypted data is sent over some insecure medium, it is assumed
  that an opponent (the third party) may intercept the data,
  possibly without being detected, and analyze the encrypted
  text, also called cipher text.

  In theory, any encryption system can be defeated, given enough
  time. The amount of time it takes cannot always be predicted.
  This is because the opponent can supply extra information
  that might reduce the computation time a great deal. For one
  thing, the sender and receiver may make a very poor choice of
  cipher key. If the opponent has a list of poor keys, a
  computer may permit a large list of such keys to be tried;
  if the poor key actually used is on the list, the opponent wins
  even if the encryption system is otherwise secure.

  All methods the opponent dreams up have one thing in common.
  It is an attempt to recover the original data without advance
  knowledge of the particular cipher key. There are a wide
  variety of means available and some will be described later on.
  The name for any of these methods is called 'cryptanalysis' and
  the person who does the penetration is called the cryptanalyst.

  In diagram form (the boxes indicated physically secure areas)--

  -------------|                   --------------
    Sender     |                   | Receiver
   "x"         |                   |  cipher key
    cipher key |------->  y  ----->|
   y=Encrypt(  |          |        | x=Decrypt(y,key)
      x,key)   |          |        |
  -------------|          |        |-------------
                          V
                     Opponent
                     z = Cryptanalysis(y,Encrypt Algorithm,
                         general knowledge of x, guesses about
                         secret key, statistical analysis of y)

       The opponent uses Cryptanalysis of message y until
       the value z is either equal to x or z is "enough" like
       x to accomplish the illicit purpose. The sender and
       receiver win whenever recovery of z takes too much time.

Q2:  I have invented this wonderful, fast-running encryption
  algorithm. How do I find out if it is as great as I think it
  is?

  It is one thousand times easier to invent an encryption
  algorithm than it is to discover if it is worthwhile. Most
  designers who have not learned cryptography are used to dealing
  with mathematics that discusses the general case. But,
  successful cryptanalysis often relies on any number of
  fortuitous special cases that the designer must anticipate lest
  a given key and data stream create even one of them. Many
  practical illicit decryptions astonish the newcomer; they seem
  like cheating, but they do work.

  It is easy to get superficial reassurance that a poor
  encryption algorithm seems good. Most people reading this file
  have probably attempted the kinds of cryptograms one finds in
  newspapers and puzzle books (usually called 'cryptograms').
  That encryption algorithm is simple -- one replaces each letter
  of the alphabet with exactly one other letter of the alphabet.
  In less than an hour, sixth graders have been taught to
  successfully solve this kind of cipher. Yet, it has 26
  factorial possible keys (about 2 to the 88th power), which is
  much more than the 2 to the 56th keys of the well known
  commercial algorithm, DES. A large number of keys is
  important, but is not by itself secure. Obviously, the
  successful sixth graders do not attempt all possible keys.
  They use their general knowledge of English to shortcut the
  process and eliminate all but a few possible keys.

  Since the gross mathematical properties of an encryption
  system prove nothing, only cryptanalytic attacks matter
  and these require some study.

Q3:  What is an "attack"?

  An attack is a particular form of cryptanalysis. There are
  generic types of attacks, and some very specific attacks. In
  the end, cryptography is a war of specifics. The opponent
  will either invent a very clever and unique attack or will
  customize a general attack to suit the needs at hand. Some
  attacks might even exploit the properties of a particular
  message or settle for a partial illicit decryption.

  However, in sci.crypt, there is a great deal of discussion
  about attacks, both general and specific, and an assumption
  that the reader can fill in missing details at times. Since
  those who post here usually have other things to do, to-the-bit
  details are often omitted.

Q4:  Hmm. In spite of questions 2 and 3, I still think I
  have a pretty good system. But, it seems pretty hard
  to know if it can really defeat an expert. Still, I know
  it will work if I can keep my method secret, right?

  Good luck. There are documented cases aplenty where an
  encryption system was deduced based entirely on clever analysis
  of the encrypted text alone. There are also known cases
  where encryption systems were deduced because the encrypted
  text was later published verbatim somewhere (for instance, a
  press release) and so the system was figured out, eliminating
  the presumed secrecy advantage for the next cipher text.

Q5:  What are the principal cryptanalytic attacks?

  The first is "cipher text only". This is recovering the text
  or the key by analysis of the text (using statistical means,
  for instance) and by knowing broad details such as whether it
  is the text of someone's speech, a PC-DOS EXE file, whether text
  is in English or French. For non-puzzle examples, such broad
  information is almost always reliably known. People have some
  idea of what they wish to steal. The weakest systems fall to
  this form of attack.

  The next attack is "known plaintext". If one works with a
  newspaper cryptogram, one may have little idea of what is in
  the text. However, if one was illicitly trying to decrypt the
  source code of Megabarfoocorp's C++ compiler, one would be much
  better off. There would be lots of things one could
  confidently expect, such as long strings of the space
  character, strings like "if (" and "if(" and the like.

  There would even be whole phrases like "Copyright 1990,
  Megabarfoocorp International" or some such. With imagination,
  surprisingly long strings can be predicted. Computers can
  tirelessly try a number of trivial variations of such known
  text at a great many locations. Knowledge of the encryption
  system may reveal the correct placement outright or a small
  number of places to try. A wide variety of systems can be
  broken if enough known plaintext can be successfully placed.

  The next attack is "chosen plaintext". In some situations, it
  is possible for the opponent to pose as the legitimate user
  of the encryption system. This is especially true in "public
  key" systems (described later). In this case, the opponent
  can present fairly arbitrary unencrypted data of his/her
  choosing, cause it to be encrypted, and study the results.
  Very few systems ever invented pass this test, but it should be
  seriously considered for any significant use. Why?  No
  designer can dream up all attacks. Moreover, at some point, a
  form of known plaintext attack may well enough approximate a
  chosen plaintext attack to make it worthwhile for the designer
  to allow for it to begin with, especially as it might not be
  dreamed up by the designer!

  There are other attack strategies. One worth mentioning for
  telecommunications applications is the "replay" attack.
  Suppose one has an Automatic Teller Machine (ATM) which uses a
  substitution cipher. Since one assumes the telephone line to
  the ATM can be tapped (why encrypt if it cannot?), one can also
  assume the opponent can _inject_ false ciphertext. Thus,
  without even being aware of the actual system, an opponent may
  be able to simply replay old ciphertext and get the cash drawer
  to give him/her $50 from your account. There are encryption
  systems which avoid this difficulty.

  Another general form of attack often not considered by
  newcomers is comparing multiple messages using the same key.
  It is impractical to use a different key for each cipher
  text (with one important exception called the 'one time
  pad'). Therefore, an opponent will have several different
  texts encrypted in the same key and may be able to exploit
  this fact. All transpositions algorithms (those which merely
  scramble the order of the bytes) are vulnerable to this
  attack. More sophisticated systems are also vulnerable to this
  attack in some cases as well.

  This is a vast area and one of the things that is difficult,
  even for experienced designers, is anticipation of all possible
  attacks. Moreover, there is no obligation on the attacker's
  part to be less mathematically sophisticated than the designer.
  A system that survives the attacks the designer invents may
  fall easily to a mathematical approach the designer of the
  system is unfamiliar with.

  And, one even has to worry about items like a rare bug in the
  program that injects the cipher key rather than the cipher text
  into the output stream. It is amazing how often trifling
  errors in the implementation make theory irrelevant.

Q6: What does make a system 'good'?

  What makes a system 'good' relies on many details specific to
  a given situation. Only after a lot of ingenious attacks are
  tried can a system be released for use. There never can be any
  absolute guarantees. All that can be said is that it defeated
  the best experts available. The opponent may be smarter.

  However, there are some agreed-to minimums that a good system
  must have to even be worth serious analysis --

  1)  It must be suitable for computer use. Ordinary byte streams
   (as arbitrary as possible) would be the input "plain"
   text and byte streams would be the output "cipher" text.
   There should be few cases where some kinds of input text
   produces poor results and these, if they exist, should be
   easily known, described, and avoided.

  2)  To be cost-competitive, it must produce about the same number of
   output cipher bytes as input plain bytes. Relaxing this restriction
   is not as helpful as one might think; the best historical systems
   meet this restriction, so a new system must meet it also.

  3)  Output bytes should have a complex relationship between the key,
   the plain text being encrypted, and possibly some amount of
   text previously encrypted. "Simple", general methods, such
   as creation/construction of minterm sums and matrix inversions should
   not produce the cipher key or an equivalent, usable illicit
   decryption method.

  4)  Trying all keys must not be practical. Trying a cleverly ordered
   subset of the keys must not work. This is hard to achieve; it means
   that the failure of a particular key X to illicitly decrypt must
   not also allow an opponent to conclude that some large class of keys
   "similar" to X need not be tried.

   All keys must be equally strong; any exceptions must be well
   known, few in number, and easily avoided.

  5)  Assume all details about the encryption algorithm are known.
   Relying on a secret method has failed repeatedly. It is prudent to
   assume only the variable part of the system, called the cipher key,
   that is selected by the customer, is unknown.

  6)  Classical attacks must be tried in great variety and ingenuity.
   Details of this are beyond the scope of this file. For a summary
   of the principal attacks, see Question 5, "What are the principal
   cryptanalytic attacks?". It is easy to do this particular step
   incompletely. Remember, there may be little effective limit to the
   resources or the brain power of one's opponent. The data may be
   very valuable and it make take only a couple of days rental of the
   right kind of consultant and a supercomputer to get the job
   done. The legitimate user will, by contrast, have a smaller
   budget that is begrudged as "overhead".

  7) Performance must be competitive with existing solutions. A
   practical problem is that every moment spent encrypting is
   regarded as "overhead". Therefore, the method must not be
   uncompetitive with existing algorithms regarded as secure.

  Inventing one's own system is an interesting pastime and quite
  educational. However, any hope of really securing data requires, at
  the very minimum, mastery of illicit decipherment. It is very easy
  to scramble data impressively and please oneself. This is not
  the same as keeping it actually secure.

Q7:  OK, you guys seem to know a lot about this stuff. I
  think I have a neat system. Here's a message I encrypted.
  Tell me if the system is any good. Oh, and can I keep my
  algorithm secret?  I think I want to patent it, so Q5 does not
  apply in my case.

  Most people read and understand questions 5 and 6 and their
  implications. But, a few individuals, because of what they
  invented, believe they have an exception of some kind. If that
  is you, you're setting yourself up for disappointment. Even if
  you are stone cold right about what you have invented, you are
  not going about proving it properly. The main issue is a
  mindset issue. Analyzing a cipher is not like a proposition
  in geometry or the denouement of a mystery novel. One's
  intuition about proofs may not hold for cryptography.

  Finding out if a cipher is any good is perhaps most like
  debugging a program. Just as you can never be sure you got all
  of the bugs out, you can never be sure you have a cipher that
  will withstand all attacks an opponent might come up with.

  So, even if you do publish some form of challenge, and even if
  the posters of sci.crypt attack it vigorously, it may prove
  nothing.

  Second, you may not be in a position to form a good, sound
  test. Beginners who get this far are peeved that they are
  asked to reveal their encryption algorithm. They may also be
  asked to reveal whole paragraphs of cipher text or to encrypt
  certain texts in a secret key and give the answer back. All
  these things seem like cheating. The answer is: real opponents
  _do_ cheat. Unlike those who post here, they are not above
  burglarizing your home to get a copy of your source code, if
  that is cheaper than hiring experts by the hour, to give a
  relevant example. Whatever we ask for will represent a close
  analogy of a real-world attack.

  If you are still convinced you have a good system, by all means
  go out and try to patent it; you are not legally obliged to
  ask our help, after all. But history is against you; against
  you so much that you will find few people here that are willing
  to trust your definition of a good test of a cipher.

  There is no 'royal road' to cryptography. The best thing to do
  is take a couple of months and seriously study crytanalysis.

Q8:  What are the legal restrictions on cryptography?

  You'd have to ask a lawyer. Most governments consider cryptography a
  sensitive topic for one reason or another. There are a variety
  of restrictions possible.

  Most governments don't give their reasons publicly, so one
  may not know why there are restrictions, just that there are
  restrictions to follow.

  One can expect to find laws about sending encrypted data over national
  borders and may often expect to find laws about the import and
  export of encryption systems.

  Since sending data over Internet, BitNet, Usenet, Fidonet, etc.
  may cross national borders (even if the originator does not
  realize it), a basic familiarity with these laws is recommended
  before sending out encryption systems or encrypted data.

Q9:  What is a public key system?

  A public key system is an encryption method with a unique
  property -- encryption and decryption use different keys and
  one of those keys can be published freely.

  Being able to pass around the 'decrypt' key part of one's
  encryption algorithm allows some very interesting things to
  happen. For one thing, messages can be exchanged by people who
  had not planned on doing so in advance. One merely encrypts in
  one's private key, decrypts using the receiver's public key and
  passes on the result to the receiver. The receiver encrypts in
  his/her own private key, then decrypts using the sender's public
  key, recovering the message.

Q10:  What is key distribution?

  Key distribution is the practical problem of exchanging keys
  between the parties that wish to set up an encryption system
  between the two of them. Especially in a network environment,
  passing keys one can trust back and forth, can be difficult.
  How can one be sure a cipher key was not sent by an impostor?
  Unless the keys are exchanged in a secret, secure place,
  face-to-face, getting keys securely exchanged and with
  knowledge of the fact that the key was sent authentically can
  be difficult. Yet, any practical system must permit reasonably
  convenient, but very secure exchange of cipher keys.

  Once a few special keys are securely exchanged, it may be
  possible to send new cipher keys in encrypted form between the
  sender and receiver that have a known lifetime. That is, the
  cipher key is sent in a special encrypted message using a
  special key used only for exchanging keys. In
  telecommunications environments, this allows frequent change of
  keys between the parties 'safely' over the same insecure medium
  used to send the cipher text. While this idea is at the heart
  of much commercial use of cryptography, it is not easily
  accomplished and less easily summarized.

Q11:  What is the 'one time pad'?

  The 'one time pad' is an encryption method that is known to be
  absolutely, provably secure. How it works is as follows --
  1. Generate a huge number of bits using a naturally random
  process. 2. Both sides exchange this data, which is as much
  data as they are going to exchange later on. 3. Exclusive OR the
  original text, bit by bit, with the 'one time pad' data, never
  reusing the 'one time pad' data. 4. Have elaborate rules to
  keep the two sides in synch so that the data can be recovered
  reliably by the receiver. (Both sides must know where they are
  in terms of how much 'one time pad' has been consumed).

  Note that only genuine, naturally random processes will do. There
  must be no relationship between any prior bit of the 'one time pad'
  and a future bit of the key. "Random number generators", in
  particular, may not substitute and still be a 'one time pad'. The
  reason the one time pad works is precisely because there is no
  relationship between any bits of the key stream. All cipher
  keys are equally probable. All original data messages are
  equally probable. There is no 'hat' to hang analysis upon.
  Even if one can inject as much text into a one time pad as one
  wishes, recovering the key stream tells nothing about the next
  message.

  Unfortunately, one time pads are very ungainly, so they are not
  typically used. The requirement to have a genuinely random process,
  with the right kind of statistical probability, is not easy to
  to set up. The ability to exchange vast amounts of data,
  securely, in advance, is not easy to achieve in environments
  when encryption is needed in the first place.

  There are a variety of cipher systems which generate "pseudo
  one time pad" streams of cipher key, but all have the same
  theoretical vulnerability; any algorithmic process introduces
  relationships between some old key bit(s) and the new key bit
  and so permits cryptanalysis. "Random number generators" are
  frequently dreamed up by newcomers as a "pseudo one time pad",
  but they are notoriously vulnerable to analysis, all
  independent of whether the pseudo-random stream satisfies
  randomness tests or not.

  The favorite example is a "standard" pseudo-random number
  generator of the form x = ((A*x) +C) mod M where A, C, and M
  are fixed and x is the most recent value used to form the last
  "random" number. Thus, the key of the cipher is the initial
  value of x. Let's set M equal to two to the 32nd for this example.
 
  Now, if one can predict or simply find the word
  "the " (the word "the" followed by a space character) on a
  even four byte boundary in the file, one can recover an old
  value of "x" and predict the rest of the keystream from that
  point, which may be enough of a "break" to accomplish the
  purpose. This is true even if the particular A, C, and M
  perfectly satisfy any randomness test that anyone ever devised.

  Naturally, this example can be improved upon, but it
  illustrates the potential problem all such methods have.

Q12:  What is the NSA (National Security Agency)?

  The NSA has several tasks. The most relevant here is that it employs
  the United States' government's cryptographers. Most nations have some
  department that handles cryptography for it, but the US' NSA tends to
  draw the most attention. It is considered equal to or superior to any
  such department in the world. It keeps an extremely low public
  profile, and has a "large", but secret budget. Because of these and
  other factors, there will be many posts speculating about the
  activities and motives of the NSA.

Q13:  What is the American Cryptogram Association?

  American Cryptogram Association Information, Sept 1992

  The American Cryptogram Association is an international group
  of individuals who study cryptography together and publish
  puzzle ciphers to challenge each other and get practical
  experience in solving ciphers. It is a nonprofit group.

  The American Cryptogram Association (ACA) publishes the
  bi-monthly magazine, "The Cryptogram", which contains
  a wide variety of simple substitution ciphers ("cryptograms")
  in English and other languages as well as cryptograms
  using cipher systems of historical interest (such as Playfair).

  The level of difficulty varies from easy to difficult. Except
  for "foreign language" cryptograms, all text is in English.

  The magazine also features "how to" articles at the hobbyist level
  and other features of interest to members. A "Computer Supplement"
  is also available which features articles on computerizing various
  phases of cryptogram solving; the level of the articles varies from
  simple programming examples to automatic algorithmic solutions of
  various cipher systems. The supplement is available as a separate
  subscription, and is published when material permits; usually two
  or three times per year.

  Where to write for subscription or other information --

  ACA Treasurer
  18789 West Hickory St
  Mundelein IL 60060

Q14:  What are some good books on cryptography?

  Good books on this topic that weren't government classified used
  to be rare. There are now a host of good books. One informal
  test of a library's quality is how many of these are on the
  shelves. These are widely available, but few libraries have
  them all.

    David Kahn, The Codebreakers, Macmillan, 1967 [history; excellent]
    H. F. Gaines, Cryptanalysis, Dover, 1956 [originally 1939, as
        Elementary Cryptanalysis]
    Abraham Sinkov, Elementary Cryptanalysis, Math. Assoc. of Amer.,
        1966
    D Denning, Cryptography and Data Security, Addison-Wesley, 1983
    Alan G. Konheim, Cryptography:  A Primer, Wiley-Interscience, 1981
    Meyer and Matyas, Cryptography:  A New Dimension in Computer Data
        Security, John Wiley & Sons, 1982.
  Books can be ordered from Aegan Park Press. They are not inexpensive,
        but they are also the only known public source for most of these
        and other books of historical and analytical interest.
        From the Aegean Park Press P.O. Box 2837, Laguna
        Hills, CA 92654-0837
        [write for current catalog].

  The following is a quality, scholarly journal. Libraries may carry
  it if they are into high technology or computer science.

     Cryptologia:  a cryptology journal, quarterly since Jan 1977.
        Cryptologia; Rose-Hulman Institute of Technology; Terre Haute
        Indiana 47803 [general: systems, analysis, history, ...]

  Thanks to
     cme@ellisun.sw.stratus.com (Carl Ellison)
     Gwyn@BRL.MIL (Doug Gwyn)
     smb@ulysses.att.com (Steven Bellovin)
  for assembling this list of books in bibliography form. I knew
  of each here, but getting all the details is a lot of work.
  Thanks again.
------------------------]
   Larry W. Loen        | My Opinions are decidedly my own,so please
                        | do not attribute them to my employer
------------------------]
=====================================================================
     EFFector Online is published by
     The Electronic Frontier Foundation
     155 Second Street, Cambridge MA 02141
     Phone: +1 617 864 0665 FAX: +1 617 864 0866
     Internet Address: eff@eff.org
 Reproduction of this publication in electronic media is encouraged.
 Signed articles do not necessarily represent the view of the EFF.
 To reproduce signed articles individually, please contact the authors
 for their express permission. EFFector Online is edited by Gerard
 Van der Leun (van@eff.org) and Rita Rouvalis (rita@eff.org).
=====================================================================
EFFector - Volume 4, Issue 5 - Hiding Data In Plain Sight: Some Key Questions About Cryptography

EFFector - Volume 4, Issue 5 - Hiding Data In Plain Sight: Some Key Questions About Cryptography

Contact

About

Issues

Updates

Press

Donate

EFFector - Volume 4, Issue 5 - Hiding Data In Plain Sight: Some Key Questions About Cryptography

EFFector - Volume 4, Issue 5 - Hiding Data In Plain Sight: Some Key Questions About Cryptography

Follow EFF:

Contact

About

Issues

Updates

Press

Donate