This pilot project collects problems and metrics/datasets from the AI research literature, and tracks progress on them.
You can use this Notebook to:
- see how things are progressing in specific subfields, or in AI/ML as a whole,
- report new results you've obtained,
- look for problems that might benefit from having new datasets/metrics designed for them, or
- build on this data for data science projects.
At EFF, we're ultimately most interested in how this data can influence our understanding of the likely implications of AI. To begin with, we're focused on gathering it.
Original authors: Peter Eckersley and Yomna Nasser at EFF. Contact: ai-metrics@eff.org.
With contributions from: Yann Bayle, Owain Evans, Gennie Gebhart and Dustin Schwenk.
Inspired by and merging data from a number of prior AI-progress-tracking projects, including the "Are We There Yet?" site scraped by scrapers/awty.py.
Thanks to many others for valuable conversations, suggestions and corrections, including: Dario Amodei, James Bradbury, Miles Brundage, Mark Burdett, Breandan Considine, Owen Cotton-Barratt, Marc Bellemare, Will Dabney, Eric Drexler, Otavio Good, Katja Grace, Hado van Hasselt, Anselm Levskaya, Clare Lyle, Toby Ord, Michael Page, Maithra Raghu, Anders Sandberg, Laura Schatzkin, Daisy Stanton, Gabriel Synnaeve, Stacey Svetlichnaya, Helen Toner, and Jason Weston. EFF's work on this project has been supported by the Open Philanthropy Project.
Problems, Metrics and Datasets
This notebook collates data with the following structure:
problem
  |
  +-- metrics -- measure[ment]s
  |
  +-- subproblems
        |
        +-- metrics
              |
              +-- measure[ment]s
Problems describe the ability to learn an important category of task.
Metrics should ideally be formulated in the form "software is able to learn to do X given training data of type Y". In some cases X is the interesting part, but sometimes Y is.
A Measurement is the score that a specific instance of a specific algorithm was able to achieve on a Metric.
- Problems are tagged with attributes: e.g. vision, abstract-games, language, world-modelling, safety.
- Some problems are about performance relative to humans (which is of course a very arbitrary standard, but one we're familiar with).
- Problems can have "subproblems", including simpler cases and preconditions for solving the problem in general.
- A "metric" is one way of measuring progress on a problem, commonly associated with a test dataset. There will often be several metrics for a given problem, but in some cases we'll start out with zero metrics and will need to start proposing some...
- A measure[ment] is a score on a given metric, achieved by a particular codebase/team/project at a particular time.
The present state of the actual taxonomy is at the bottom of this notebook.
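To make the structure concrete, here is a minimal sketch of how entries might be defined. The exact signatures live in taxonomy.py; the method names used here (Problem, add_subproblem, metric, measure) are assumptions based on the description above, and the problem, scores and URL are invented for illustration.
# A hedged sketch, not real data: the names, score and URL are placeholders.
from datetime import date
from taxonomy import Problem

# A Problem, tagged with attributes:
toy_vision = Problem("Toy vision problem", ["vision"])

# Simpler cases can be attached as subproblems:
toy_digits = Problem("Toy digit recognition", ["vision"])
toy_vision.add_subproblem(toy_digits)

# A Metric is one way of measuring progress on a problem:
toy_benchmark = toy_digits.metric("Toy digit benchmark")

# A measure[ment]: the score a particular system achieved on that
# metric at a particular time (all values here are placeholders):
toy_benchmark.measure(date(2017, 1, 1), 0.95, "ExampleNet",
                      url="https://example.com/paper")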
Most source data is now defined in a series of separate files by topic:
- data/stem.py for data on scientific & technical problems
- data imported from specific scrapers (and then subsequently edited by hand): scrapers/awty.py
Problems and Metrics are still defined in this Notebook, especially in areas that do not have many active results yet.
from IPython.display import HTML
HTML('''
<script>
// Track whether code cells are currently visible. code_toggle() runs once
// on page load, so we start from the inverted state (FIXME hack :/).
if (typeof code_show == "undefined") {
    code_show = false;
} else {
    code_show = !code_show;
}
// Re-show the single code cell whose "unhide code" button was clicked.
function toggle_one(mouse_event) {
    var button = mouse_event.target;
    var input = button.parentNode.querySelector(".input");
    input.style.display = "block";
}
// Hide (or re-show) every code cell that contains the marker comment.
function code_toggle() {
    if (!code_show) {
        var inputs = $('div.input');
        for (var n = 0; n < inputs.length; n++) {
            // The marker is split in two so this cell doesn't match itself.
            if (inputs[n].innerHTML.match('# hidd' + 'encode')) {
                inputs[n].style.display = "none";
                var button = document.createElement("button");
                button.innerHTML = "unhide code";
                button.style.width = "100px";
                button.style.marginLeft = "90px";
                button.addEventListener("click", toggle_one);
                button.classList.add("cell-specific-unhide");
                inputs[n].parentNode.appendChild(button);
            }
        }
    } else {
        $('div.input').show();
        $('button.cell-specific-unhide').remove();
    }
    code_show = !code_show;
}
$(document).ready(code_toggle);
</script>
<form action="javascript:code_toggle()">
<input type="submit" value="Click here to show/hide source code cells."> <br><br>(you can mark a cell as code with <tt># hiddencode</tt>)
</form>
''')
# hiddencode
from __future__ import print_function
%matplotlib inline
import matplotlib as mpl
try:
from lxml.cssselect import CSSSelector
except ImportError:
# terrifying magic for Azure Notebooks
import os
if os.getcwd() == "/home/nbuser":
!pip install cssselect
from lxml.cssselect import CSSSelector
else:
raise
import datetime
import json
import re
from matplotlib import pyplot as plt
date = datetime.date
import taxonomy
#reload(taxonomy)
from taxonomy import Problem, Metric, problems, metrics, measurements, all_attributes, offline, render_tables
from scales import *
The simplest vision subproblem is probably image classification, which determines what objects are present in a picture. From 2010 to 2017, ImageNet was a closely watched contest for progress in this domain.
Image classification includes not only recognising single things within an image, but also localising them and specifying which pixels belong to which object (semantic segmentation). MSRC-21 is a metric specifically for that task:
from data.vision import *
imagenet.graph()
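If you have a new result to report, the pattern is to add a measurement to the relevant metric and re-draw its graph. A hedged sketch follows: the date, score, system name and URL are placeholders, and it assumes a Metric.measure(date, value, name, url=...) method; check taxonomy.py for the real signature.
# Placeholder values only; this illustrates the reporting pattern.
imagenet.measure(date(2017, 6, 1), 0.031, "Hypothetical-Net",
                 url="https://example.com/paper")
imagenet.graph()  # re-render the plot with the new point included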
from data.vision import *
from data.awty import *
# Graph the remaining image classification metrics (ImageNet is plotted above)
for m in sorted(image_classification.metrics, key=lambda m: m.name):
    if m != imagenet:
        m.graph()