HOWTO: Using the SSL Observatory in the Cloud

This is a draft HOWTO for accessing and analysing SSL Observatory data using Amazon EC2. Some users have found our datasets unwieldy to download (even over BitTorrent), or too large to analyse on their machines. Amazon EC2 is a fairly practical and economical alternative if you want to work with the Observatory datasets.

Our Amazon EBS snapshot contains both a MySQL database that you can use to run queries against the set of all public SSL certificates, and copies of the raw server responses from public HTTPS servers.

  1. Make an EC2 account: http://aws.amazon.com/ec2
  2. Make a copy of the EBS snapshot that contains our data: https://console.aws.amazon.com/ec2/home#s=Snapshots

    select Viewing: "Public Snapshots"

    search for: snap-9010a2fc

    select the snapshot and click "Create Volume".

    You'll need to choose which data center to work in. Virginia is a bit cheaper than other parts of the world.

  3. Create a virtual machine instance for yourself. Amazon publishes lists of the available systems and their hourly prices. EC2 instances come in odd sizes. For queries over the valid_certs table you'll probably get good performance with ~8GB of RAM. For queries over all_certs 17GB *might* suffice, but 34GB is safer.

    We've mostly been using "spot instances", which are noticeably cheaper but can disappear if the spot price rises above your bid. You can pay more for a regular on-demand instance to avoid that possibility.

    We've been using the following "Community AMI" virtual machine image for our work:

    alestic-64/debian-6.0-squeeze-base-64-20090804.manifest.xml
    ami-2946a740

    Conceivably you could use any Linux image, provided it can mount JFS filesystems.

    Make sure you launch your instance in the same data center that you selected for your volume in step 2.

    Along the way you'll make an SSH key that you can use to log in to this system. With the debian image you log in as root. The IP address is visible under Amazon's "Instances" menu.

    Select the "quick start" security policy so that you can ssh into the machine without having to edit a security policy.

  4. Attach your volume to your virtual machine: https://console.aws.amazon.com/ec2/home#s=Volumes

    In the next step we assume you attach it as /dev/sdf

  5. Log in to your virtual machine and run:

    mkdir /space
    mount /dev/sdf1 /space
    cd /space
    ./setup-script

    You will be asked a question about language settings. en_US.UTF-8 works.

    You will be asked to set the MySQL root user password. Set it to "root"
    (or change dbconnect.py and obsdb to match your alternative password).

  6. Now you can run "obsdb", which will give you a MySQL prompt to query the observatory database.

    You can check that things are working correctly by running obsdb and pasting in a query like:

    SELECT CAST(RSA_Modulus_Bits as unsigned) as keylen, count(*)       
    FROM valid_certs
    GROUP BY keylen 
    ORDER BY keylen DESC;
    

    The first query may take a minute to run as the data is swapped into RAM. You should see a results table like this:

    +--------+----------+
    | keylen | count(*) |
    +--------+----------+
    |  16384 |        1 |
    |   8192 |       38 |
    |   6095 |        2 |
    |   5120 |        2 |
    |   4196 |        2 |
    |   4192 |        1 |
    |   4096 |    15574 |
    |   4092 |        2 |
    |   4069 |       18 |
    |   4028 |        1 |
    |   4000 |        2 |
    |   3889 |        1 |
    |   3584 |        3 |
    |   3333 |        1 |
    |   3072 |      119 |
    |   3071 |        1 |
    |   3048 |       11 |
    |   3000 |        3 |
    |   2560 |        1 |
    |   2408 |        1 |
    |   2345 |        1 |
    |   2176 |        1 |
    |   2096 |        2 |
    |   2084 |        9 |
    |   2080 |        2 |
    |   2066 |        1 |
    |   2058 |        5 |
    |   2056 |       11 |
    |   2052 |        1 |
    |   2049 |        5 |
    |   2048 |   564514 |
    |   2047 |      145 |
    |   2046 |        5 |
    |   2045 |        2 |
    |   2043 |        1 |
    |   2040 |        8 |
    |   2038 |        3 |
    |   2028 |        4 |
    |   2025 |        1 |
    |   2024 |       22 |
    |   2023 |        1 |
    |   2014 |        1 |
    |   2010 |        1 |
    |   2009 |        1 |
    |   1924 |        3 |
    |   1825 |        1 |
    |   1800 |        7 |
    |   1548 |        1 |
    |   1543 |        1 |
    |   1538 |        1 |
    |   1536 |      144 |
    |   1369 |        4 |
    |   1280 |       11 |
    |   1234 |        8 |
    |   1212 |        9 |
    |   1204 |        5 |
    |   1152 |        1 |
    |   1080 |        1 |
    |   1048 |       14 |
    |   1042 |        2 |
    |   1034 |        7 |
    |   1032 |        1 |
    |   1028 |       10 |
    |   1027 |        2 |
    |   1026 |        1 |
    |   1025 |       14 |
    |   1024 |   869402 |
    |   1023 |      977 |
    |    768 |       38 |
    |    767 |        1 |
    |    730 |        1 |
    |    512 |     4165 |
    |    511 |        3 |
    |   NULL |       25 |
    +--------+----------+
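
If you want to post-process a results table like the one above outside of MySQL, a minimal Python sketch along the following lines may help. It assumes you have saved the mysql client's table output as text. The function names parse_mysql_table and share_at_or_below are hypothetical and are not part of the Observatory tools, and the EXCERPT string holds only four rows copied from the table above, so the share it prints describes those rows rather than the full dataset.

```python
def parse_mysql_table(text):
    """Parse the ASCII table printed by the mysql client into (keylen, count) pairs.

    Hypothetical helper for illustration; not part of the Observatory scripts.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Header and data rows start with '|'; border rows start with '+'.
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the header row, whose second cell is "count(*)" rather than a number.
        if len(cells) != 2 or not cells[1].isdigit():
            continue
        keylen = None if cells[0] == "NULL" else int(cells[0])
        rows.append((keylen, int(cells[1])))
    return rows


def share_at_or_below(rows, max_bits):
    """Fraction of certificates with a known key length of at most max_bits."""
    known = [(k, n) for k, n in rows if k is not None]
    total = sum(n for _, n in known)
    small = sum(n for k, n in known if k <= max_bits)
    return small / total if total else 0.0


# Four rows copied from the results table above; NOT the full output.
EXCERPT = """
+--------+----------+
| keylen | count(*) |
+--------+----------+
|   2048 |   564514 |
|   1024 |   869402 |
|    512 |     4165 |
|   NULL |       25 |
+--------+----------+
"""

rows = parse_mysql_table(EXCERPT)
share = share_at_or_below(rows, 1024)
print(rows)
print(f"{share:.1%} of the certificates in these rows use keys of 1024 bits or fewer")
```

Rows with a NULL key length are kept by the parser but left out of the share calculation, since their size is unknown.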
    
