Measuring the Progress of AI Research

This pilot project collects problems and metrics/datasets from the AI research literature, and tracks progress on them.

You can use this Notebook to see how things are progressing in specific subfields, or in AI/ML as a whole; as a place to report new results you've obtained; as a place to look for problems that might benefit from having new datasets/metrics designed for them; or as a source to build on for data science projects.

At EFF, we're ultimately most interested in how this data can influence our understanding of the likely implications of AI. To begin with, we're focused on gathering it.

Original authors: Peter Eckersley and Yomna Nasser at EFF. Contact: ai-metrics@eff.org.

With contributions from: Yann Bayle, Owain Evans, Gennie Gebhart and Dustin Schwenk.

Inspired by and merging data from:

Thanks to many others for valuable conversations, suggestions and corrections, including: Dario Amodei, James Bradbury, Miles Brundage, Mark Burdett, Breandan Considine, Owen Cotton-Barratt, Marc Bellemare, Will Dabney, Eric Drexler, Otavio Good, Katja Grace, Hado van Hasselt, Anselm Levskaya, Clare Lyle, Toby Ord, Michael Page, Maithra Raghu, Anders Sandberg, Laura Schatzkin, Daisy Stanton, Gabriel Synnaeve, Stacey Svetlichnaya, Helen Toner, and Jason Weston. EFF's work on this project has been supported by the Open Philanthropy Project.

Taxonomy

The project collates data with the following structure:

problem
 ├── metrics ── measure[ment]s
 └── subproblems
      └── metrics
           └── measure[ment]s

Problems describe the ability to learn an important category of task.

Metrics should ideally be formulated in the form "software is able to learn to do X given training data of type Y". In some cases the interesting part is X; in others it is Y (or both).

Measurements are the score that a specific instance of a specific algorithm was able to get on a Metric.

Problems are tagged with attributes, e.g. vision, abstract-games, language, world-modelling, safety.

Some of these attributes describe performance relative to humans (of course a very arbitrary standard, but one we're familiar with):

  • agi -- most capable humans can do this, so AGIs can do this (note it's conceivable that an agent might pass the Turing test before all of these are won)
  • super -- the very best humans can do this, or human organisations can do this
  • verysuper -- neither humans nor human orgs can presently do this

Problems can have "subproblems": simpler cases of, or preconditions for, solving the problem in general.

A "metric" is one way of measuring progress on a problem, commonly associated with a test dataset. There will often be several metrics for a given problem, but in some cases we'll start out with zero metrics and will need to propose some...

A measure[ment] is a score on a given metric, obtained by a particular codebase/team/project at a particular time.

The present state of the actual taxonomy is at the bottom of this notebook.
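To make the structure concrete, here is a toy sketch of the Problem → Metric → Measurement hierarchy in Python. The real definitions live in taxonomy.py; the class and method names below only loosely mirror the ones used in this Notebook and are illustrative, not the actual taxonomy.py API.

```python
# Illustrative sketch only -- NOT the real taxonomy.py API.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Measurement:
    when: date        # when the result was obtained
    value: float      # the score achieved on the metric
    source: str       # the paper/codebase/team the score comes from

@dataclass
class Metric:
    name: str
    measurements: list = field(default_factory=list)

    def measure(self, when, value, source):
        """Record a score achieved on this metric at a given time."""
        m = Measurement(when, value, source)
        self.measurements.append(m)
        return m

@dataclass
class Problem:
    name: str
    attributes: list = field(default_factory=list)   # e.g. vision, agi, super
    metrics: list = field(default_factory=list)
    subproblems: list = field(default_factory=list)

    def metric(self, name):
        """One way of measuring progress on this problem."""
        m = Metric(name)
        self.metrics.append(m)
        return m

    def subproblem(self, name, attributes=()):
        """A simpler case of, or precondition for, this problem."""
        p = Problem(name, list(attributes))
        self.subproblems.append(p)
        return p

# A tiny slice of the taxonomy: problem -> subproblem -> metric -> measurement.
vision = Problem("Vision", ["vision", "agi"])
classification = vision.subproblem("Image classification", ["vision"])
imagenet = classification.metric("Imagenet Image Recognition")
imagenet.measure(date(2012, 10, 13), 0.16422, "AlexNet / SuperVision")
```

The sketch shows why measurements hang off metrics rather than problems: a problem like vision has no single score, but each of its metrics accumulates dated scores that can be plotted over time.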

Source Code

  • Code implementing the taxonomy of Problems and subproblems, Metrics and Measurements is defined in a free-standing Python file, taxonomy.py. scales.py contains definitions of various unit systems used by Metrics.
  • Most source data is now defined in a series of separate files by topic:

    • data/vision.py for hand-entered computer vision data
    • data/language.py for hand-entered and merged language data
    • data/strategy_games.py for data on abstract strategy games
    • data/video_games.py for a combination of hand-entered and scraped Atari data (other video game data can also go here)
    • data/stem.py for data on scientific & technical problems
    • data imported from specific scrapers (and then subsequently edited)
  • For now, some of the Problems and Metrics are still defined in this Notebook, especially in areas that do not have many active results yet.
  • Scrapers for specific data sources:
    • scrapers/awty.py for importing data from Rodrigo Benenson's Are We There Yet? site
    • scrapers/es.py for processing a pasted table of data from the Evolutionary Strategies Atari paper (probably a useful model for other Atari papers).
In [1]:
from IPython.display import HTML
HTML('''
<script>
    if (typeof code_show == "undefined") {
        code_show=false;
    } else {
        code_show = !code_show; // FIXME hack, because we toggle on load :/
    }
    function toggle_one(mouse_event) {
        // Unhide the input cell next to the button that was clicked.
        var button = mouse_event.target;
        var input = button.parentNode.querySelector(".input");
        input.style.display = "block";
    }
    function code_toggle() {
        if (!code_show) {
            inputs = $('div.input');
            for (n = 0; n < inputs.length; n++) {
                if (inputs[n].innerHTML.match('# hidd' + 'encode')) {
                    inputs[n].style.display = "none";
                    button = document.createElement("button");
                    button.innerHTML = "unhide code";
                    button.style.width = "100px";
                    button.style.marginLeft = "90px";
                    button.addEventListener("click", toggle_one);
                    button.classList.add("cell-specific-unhide");
                    // inputs[n].parentNode.appendChild(button);
                }
            }
        } else { 
            $('div.input').show();
            $('button.cell-specific-unhide').remove();
        } 
        code_show = !code_show;
    } 
    
    $( document ).ready(code_toggle);
    
</script>
<form action="javascript:code_toggle()">
    <input type="submit" value="Click here to show/hide source code cells."> <br><br>(you can mark a cell as code with <tt># hiddencode</tt>)
</form>
''')
Out[1]:
In [2]:
# hiddencode
from __future__ import print_function

%matplotlib inline  
import matplotlib as mpl
try:
    from lxml.cssselect import CSSSelector
except ImportError:
    # terrifying magic for Azure Notebooks
    import os
    if os.getcwd() == "/home/nbuser":
        !pip install cssselect
        from lxml.cssselect import CSSSelector
    else:
        raise

import datetime
import json
import re

from matplotlib import pyplot as plt

date = datetime.date

import taxonomy
#reload(taxonomy)
from taxonomy import Problem, Metric, problems, metrics, measurements, all_attributes, offline, render_tables
from scales import *

Problems, Metrics, and Datasets

Vision


(Imagenet example data)

The simplest vision subproblem is probably image classification: determining what objects are present in a picture. From 2010 to 2017, ImageNet has been a closely watched contest for progress in this domain.

Image classification includes not only recognising single things within an image, but also localising them and essentially specifying which pixels belong to which object. MSRC-21 is a metric specifically for that task:


(MSRC 21 example data)
In [3]:
from data.vision import *
imagenet.graph()
In [4]:
from data.vision import *
from data.awty import *
In [5]:
for m in sorted(image_classification.metrics, key=lambda m:m.name): 
    if m != imagenet: m.graph()

AWTY, not yet imported:

Handling 'Pascal VOC 2011 comp3' detection_datasets_results.html#50617363616c20564f43203230313120636f6d7033
Skipping 40.6 mAP Fisher and VLAD with FLAIR CVPR 2014
Handling 'Leeds Sport Poses' pose_estimation_datasets_results.html#4c656564732053706f727420506f736573
69.2 %                  Strong Appearance and Expressive Spatial Models for Human Pose Estimation  ICCV 2013
64.3 %                                    Appearance sharing for collective human pose estimation  ACCV 2012
63.3 %                                                   Poselet conditioned pictorial structures  CVPR 2013
60.8 %                                Articulated pose estimation with flexible mixtures-of-parts  CVPR 2011
 55.6%           Pictorial structures revisited: People detection and articulated pose estimation  CVPR 2009
Handling 'Pascal VOC 2007 comp3' detection_datasets_results.html#50617363616c20564f43203230303720636f6d7033
Skipping 22.7 mAP Ensemble of Exemplar-SVMs for Object Detection and Beyond ICCV 2011
Skipping 27.4 mAP Measuring the objectness of image windows PAMI 2012
Skipping 28.7 mAP Automatic discovery of meaningful object parts with latent CRFs CVPR 2010
Skipping 29.0 mAP Object Detection with Discriminatively Trained Part Based Models PAMI 2010
Skipping 29.6 mAP Latent Hierarchical Structural Learning for Object Detection CVPR 2010
Skipping 32.4 mAP Deformable Part Models with Individual Part Scaling BMVC 2013
Skipping 34.3 mAP Histograms of Sparse Codes for Object Detection CVPR 2013
Skipping 34.3 mAP Boosted local structured HOG-LBP for object localization CVPR 2011
Skipping 34.7 mAP Discriminatively Trained And-Or Tree Models for Object Detection CVPR 2013
Skipping 34.7 mAP Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection CVPR 2013
Skipping 34.8 mAP Color Attributes for Object Detection CVPR 2012
Skipping 35.4 mAP Object Detection with Discriminatively Trained Part Based Models PAMI 2010
Skipping 36.0 mAP Machine Learning Methods for Visual Object Detection archives-ouvertes 2011
Skipping 38.7 mAP Detection Evolution with Multi-Order Contextual Co-occurrence CVPR 2013
Skipping 40.5 mAP Segmentation Driven Object Detection with Fisher Vectors ICCV 2013
Skipping 41.7 mAP Regionlets for Generic Object Detection ICCV 2013
Skipping 43.7 mAP Beyond Bounding-Boxes: Learning Object Shape by Model-Driven Grouping ECCV 2012
Handling 'Pascal VOC 2007 comp4' detection_datasets_results.html#50617363616c20564f43203230303720636f6d7034
Skipping 59.2 mAP Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition ECCV 2014
Skipping 58.5 mAP Rich feature hierarchies for accurate object detection and semantic segmentation CVPR 2014
Skipping 29.0 mAP Multi-Component Models for Object Detection ECCV 2012
Handling 'Pascal VOC 2010 comp3' detection_datasets_results.html#50617363616c20564f43203230313020636f6d7033
Skipping 24.98 mAP Learning Collections of Part Models for Object Recognition CVPR 2013
Skipping 29.4 mAP Discriminatively Trained And-Or Tree Models for Object Detection CVPR 2013
Skipping 33.4 mAP Object Detection with Discriminatively Trained Part Based Models PAMI 2010
Skipping 34.1 mAP Segmentation as selective search for object recognition ICCV 2011
Skipping 35.1 mAP Selective Search for Object Recognition IJCV 2013
Skipping 36.0 mAP Latent Hierarchical Structural Learning for Object  Detection CVPR 2010
Skipping 36.8 mAP Object Detection by Context and Boosted HOG-LBP ECCV 2010
Skipping 38.4 mAP Segmentation Driven Object Detection with Fisher Vectors ICCV 2013
Skipping 39.7 mAP Regionlets for Generic Object Detection ICCV 2013
Skipping 40.4 mAP Fisher and VLAD with FLAIR CVPR 2014
Handling 'Pascal VOC 2010 comp4' detection_datasets_results.html#50617363616c20564f43203230313020636f6d7034
Skipping 53.7 mAP Rich feature hierarchies for accurate object detection and semantic segmentation CVPR 2014
Skipping 40.4 mAP Bottom-up Segmentation for Top-down Detection CVPR 2013
Skipping 33.1 mAP Multi-Component Models for Object Detection ECCV 2012
In [6]:
from IPython.display import HTML
HTML(image_classification.tables())
Out[6]:
CIFAR-10 Image Recognition
Date | Algorithm | % correct | Paper / Source
2011-07-01 | An Analysis of Single-Layer Networks in Unsupervised Feature Learning | 79.6 | An Analysis of Single-Layer Networks in Unsupervised Feature Learning
2011-07-01 | Hierarchical Kernel Descriptors | 80.0 | Object Recognition with Hierarchical Kernel Descriptors
2012-06-16 | MCDNN | 88.79 | Multi-Column Deep Neural Networks for Image Classification
2012-06-26 | Local Transformations | 82.2 | Learning Invariant Representations with Local Transformations
2012-07-03 | Improving neural networks by preventing co-adaptation of feature detectors | 84.4 | Improving neural networks by preventing co-adaptation of feature detectors
2012-12-03 | Learning with Recursive Perceptual Representations | 79.7 | Learning with Recursive Perceptual Representations
2012-12-03 | Discriminative Learning of Sum-Product Networks | 83.96 | Discriminative Learning of Sum-Product Networks
2012-12-03 | DCNN | 89.0 | ImageNet Classification with Deep Convolutional Neural Networks
2012-12-03 | GP EI | 90.5 | Practical Bayesian Optimization of Machine Learning Algorithms
2013-01-16 | Stochastic Pooling | 84.87 | Stochastic Pooling for Regularization of Deep Convolutional Neural Networks
2013-06-16 | Maxout Networks | 90.65 | Maxout Networks
2013-06-16 | DropConnect | 90.68 | Regularization of Neural Networks using DropConnect
2013-07-01 | Smooth Pooling Regions | 80.02 | Learning Smooth Pooling Regions for Visual Recognition
2014-04-14 | DNN+Probabilistic Maxout | 90.61 | Improving Deep Neural Networks with Probabilistic Maxout Units
2014-04-14 | NiN | 91.2 | Network In Network
2014-06-21 | PCANet | 78.67 | PCANet: A Simple Deep Learning Baseline for Image Classification?
2014-06-21 | Nonnegativity Constraints | 82.9 | Stable and Efficient Representation Learning with Nonnegativity Constraints
2014-07-01 | DSN | 91.78 | Deeply-Supervised Nets
2014-08-28 | CKN | 82.18 | Convolutional Kernel Networks
2014-09-22 | SSCNN | 93.72 | Spatially-sparse convolutional neural networks
2014-12-08 | Discriminative Unsupervised Feature Learning with Convolutional Neural Networks | 82.0 | Discriminative Unsupervised Feature Learning with Convolutional Neural Networks
2014-12-08 | Deep Networks with Internal Selective Attention through Feedback Connections | 90.78 | Deep Networks with Internal Selective Attention through Feedback Connections
2015-02-13 | An Analysis of Unsupervised Pre-training in Light of Recent Advances | 86.7 | An Analysis of Unsupervised Pre-training in Light of Recent Advances
2015-02-15 | ACN | 95.59 | Striving for Simplicity: The All Convolutional Net
2015-02-19 | NiN+APL | 92.49 | Learning Activation Functions to Improve Deep Neural Networks
2015-02-28 | Fractional MP | 96.53 | Fractional Max-Pooling
2015-05-02 | Tuned CNN | 93.63 | Scalable Bayesian Optimization Using Deep Neural Networks
2015-05-13 | APAC | 89.67 | APAC: Augmented PAttern Classification with Neural Networks
2015-05-31 | FLSCNN | 75.86 | Enhanced Image Classification With a Fast-Learning Shallow Convolutional Neural Network
2015-06-08 | RCNN-96 | 92.91 | Recurrent Convolutional Neural Network for Object Recognition
2015-06-12 | ReNet | 87.65 | ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks
2015-07-01 | ELC | 91.19 | Speeding up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves
2015-07-01 | MLR DNN | 91.88 | Multi-Loss Regularized Deep Neural Network
2015-07-01 | cifar.torch | 92.45 | cifar.torch
2015-07-12 | DCNN+GFE | 89.14 | Deep Convolutional Neural Networks as Generic Feature Extractors
2015-08-16 | RReLU | 88.8 | Empirical Evaluation of Rectified Activations in Convolution Network
2015-09-17 | MIM | 91.48 ± 0.2 | On the Importance of Normalisation Layers in Deep Learning with Piecewise Linear Activation Units
2015-10-05 | Tree+Max-Avg pooling | 93.95 | Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree
2015-10-11 | SWWAE | 92.23 | Stacked What-Where Auto-encoders
2015-11-09 | BNM NiN | 93.25 | Batch-normalized Maxout Network in Network
2015-11-18 | CMsC | 93.13 | Competitive Multi-scale Convolution
2015-12-07 | Spectral Representations for Convolutional Neural Networks | 91.4 | Spectral Representations for Convolutional Neural Networks
2015-12-07 | BinaryConnect | 91.73 | BinaryConnect: Training Deep Neural Networks with binary weights during propagations
2015-12-07 | VDN | 92.4 | Training Very Deep Networks
2015-12-10 | DRL | 93.57 | Deep Residual Learning for Image Recognition
2016-01-04 | Fitnet4-LSUV | 94.16 | All you need is a good init
2016-01-07 | Exponential Linear Units | 93.45 | Fast and Accurate Deep Network Learning by Exponential Linear Units
2016-05-15 | Universum Prescription | 93.34 | Universum Prescription: Regularization using Unlabeled Data
2016-05-20 | ResNet-1001 | 95.38 ± 0.2 | Identity Mappings in Deep Residual Networks
2016-07-10 | ResNet+ELU | 94.38 | Deep Residual Networks with Exponential Linear Unit
2017-02-15 | Neural Architecture Search | 96.35 | Neural Architecture Search with Reinforcement Learning
2017-04-22 | Evolution | 94.6 | Large-Scale Evolution of Image Classifiers
2017-04-22 | Evolution ensemble | 95.6 | Large-Scale Evolution of Image Classifiers
2017-05-30 | Deep Complex | 94.4 | Deep Complex Networks
2017-07-16 | RL+NT | 94.6 | Reinforcement Learning for Architecture Search by Network Transformation
CIFAR-100 Image Recognition
Date | Algorithm | % correct | Paper / Source
2012-06-16 | Receptive Field Learning | 54.23 | Beyond Spatial Pyramids: Receptive Field Learning for Pooled Image Features
2013-01-16 | Stochastic Pooling | 57.49 | Stochastic Pooling for Regularization of Deep Convolutional Neural Networks
2013-06-16 | Maxout Networks | 61.43 | Maxout Networks
2013-07-01 | Smooth Pooling Regions | 56.29 | Smooth Pooling Regions
2013-07-01 | Tree Priors | 63.15 | Discriminative Transfer Learning with Tree-based Priors
2014-04-14 | DNN+Probabilistic Maxout | 61.86 | Improving Deep Neural Networks with Probabilistic Maxout Units
2014-04-14 | NiN | 64.32 | Network in Network
2014-06-21 | Nonnegativity Constraints | 60.8 | Stable and Efficient Representation Learning with Nonnegativity Constraints
2014-07-01 | DSN | 65.43 | Deeply-Supervised Nets
2014-09-22 | SSCNN | 75.7 | Spatially-sparse convolutional neural networks
2014-12-08 | Deep Networks with Internal Selective Attention through Feedback Connections | 66.22 | Deep Networks with Internal Selective Attention through Feedback Connections
2015-02-15 | ACN | 66.29 | Striving for Simplicity: The All Convolutional Net
2015-02-19 | NiN+APL | 69.17 | Learning Activation Functions to Improve Deep Neural Networks
2015-02-28 | Fractional MP | 73.61 | Fractional Max-Pooling
2015-05-02 | Tuned CNN | 72.6 | Scalable Bayesian Optimization Using Deep Neural Networks
2015-06-08 | RCNN-96 | 68.25 | Recurrent Convolutional Neural Network for Object Recognition
2015-07-01 | Deep Representation Learning with Target Coding | 64.77 | Deep Representation Learning with Target Coding
2015-07-01 | HD-CNN | 67.38 | HD-CNN: Hierarchical Deep Convolutional Neural Network for Large Scale Visual Recognition
2015-07-01 | MLR DNN | 68.53 | Multi-Loss Regularized Deep Neural Network
2015-07-12 | DCNN+GFE | 67.68 | Deep Convolutional Neural Networks as Generic Feature Extractors
2015-08-16 | RReLU | 59.75 | Empirical Evaluation of Rectified Activations in Convolution Network
2015-09-17 | MIM | 70.8 ± 0.2 | On the Importance of Normalisation Layers in Deep Learning with Piecewise Linear Activation Units
2015-10-05 | Tree+Max-Avg pooling | 67.63 | Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree
2015-10-11 | SWWAE | 69.12 | Stacked What-Where Auto-encoders
2015-11-09 | BNM NiN | 71.14 | Batch-normalized Maxout Network in Network
2015-11-18 | CMsC | 72.44 | Competitive Multi-scale Convolution
2015-12-07 | VDN | 67.76 | Training Very Deep Networks
2015-12-07 | Spectral Representations for Convolutional Neural Networks | 68.4 | Spectral Representations for Convolutional Neural Networks
2016-01-04 | Fitnet4-LSUV | 72.34 | All you need is a good init
2016-01-07 | Exponential Linear Units | 75.72 | Fast and Accurate Deep Network Learning by Exponential Linear Units
2016-05-15 | Universum Prescription | 67.16 | Universum Prescription: Regularization using Unlabeled Data
2016-05-20 | ResNet-1001 | 77.29 ± 0.22 | Identity Mappings in Deep Residual Networks
2016-07-10 | ResNet+ELU | 73.45 | Deep Residual Networks with Exponential Linear Unit
2017-04-22 | Evolution | 77.0 | Large-Scale Evolution of Image Classifiers
2017-05-30 | Deep Complex | 72.91 | Deep Complex Networks
2017-06-06 | NiN+Superclass+CDJ | 69.0 | Deep Convolutional Decision Jungle for Image Classification
Imagenet Image Recognition
Date | Algorithm | Error | Paper / Source
2010-08-31 | NEC UIUC | 0.28191 | ImageNet Large Scale Visual Recognition Competition 2010 (ILSVRC2010)
2011-10-26 | XRCE | 0.2577 | ImageNet Large Scale Visual Recognition Competition 2011 (ILSVRC2011)
2012-10-13 | AlexNet / SuperVision | 0.16422 | ImageNet Large Scale Visual Recognition Competition 2012 (ILSVRC2012) (algorithm from ImageNet Classification with Deep Convolutional Neural Networks)
2013-11-14 | Clarifai | 0.11743 | ImageNet Large Scale Visual Recognition Competition 2013 (ILSVRC2013)
2014-08-18 | VGG | 0.07405 | ImageNet Large Scale Visual Recognition Competition 2014 (ILSVRC2014)
2015-04-10 | withdrawn | 0.0458 | Deep Image: Scaling up Image Recognition
2015-12-10 | MSRA | 0.03567 | ILSVRC2015 Results
2016-09-26 | Trimps-Soushen | 0.02991 | ILSVRC2016 Results
2017-07-21 | SE-ResNet152 / WMW | 0.02251 | ILSVRC2017 Results
MNIST handwritten digit recognition
Date | Algorithm | % error | Paper / Source
2002-07-01 | ISVM | 0.56 | Training Invariant Support Vector Machines
2002-07-01 | Shape contexts | 0.63 | Shape matching and object recognition using shape contexts
2003-07-01 | Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis | 0.4 | Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis
2003-07-01 | CNN+Gabor Filters | 0.68 | Handwritten Digit Recognition using Convolutional Neural Networks and Gabor Filters
2003-07-01 | CNN | 1.19 | Convolutional Neural Networks
2006-07-01 | Energy-Based Sparse Representation | 0.39 | Efficient Learning of Sparse Representations with an Energy-Based Model
2006-07-01 | Reducing the dimensionality of data with neural networks | 1.2 | Reducing the dimensionality of data with neural networks
2007-07-01 | Deformation Models | 0.54 | Deformation Models for Image Recognition
2007-07-01 | Trainable feature extractor | 0.54 | A trainable feature extractor for handwritten digit recognition
2007-07-01 | Invariant feature hierarchies | 0.62 | Unsupervised learning of invariant feature hierarchies with applications to object recognition
2008-07-01 | Sparse Coding | 0.59 | Simple Methods for High-Performance Digit Recognition Based on Sparse Coding
2008-07-01 | DBN | 1.12 | CS81: Learning words with Deep Belief Networks
2008-07-01 | Deep learning via semi-supervised embedding | 1.5 | Deep learning via semi-supervised embedding
2009-07-01 | The Best Multi-Stage Architecture | 0.53 | What is the Best Multi-Stage Architecture for Object Recognition?
2009-07-01 | CDBN | 0.82 | Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations
2009-07-01 | Large-Margin kNN | 0.94 | Large-Margin kNN Classification using a Deep Encoder Network
2009-07-01 | Deep Boltzmann Machines | 0.95 | Deep Boltzmann Machines
2010-03-01 | DBSNN | 0.35 | Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition
2010-07-01 | Supervised Translation-Invariant Sparse Coding | 0.84 | Supervised Translation-Invariant Sparse Coding
2011-07-01 | On Optimization Methods for Deep Learning | 0.69 | On Optimization Methods for Deep Learning
2012-06-16 | MCDNN | 0.23 | Multi-column Deep Neural Networks for Image Classification
2012-06-16 | Receptive Field Learning | 0.64 | Beyond Spatial Pyramids: Receptive Field Learning for Pooled Image Features
2013-02-28 | COSFIRE | 0.52 | Trainable COSFIRE Filters for Keypoint Detection and Pattern Recognition
2013-06-16 | DropConnect | 0.21 | Regularization of Neural Networks using DropConnect
2013-06-16 | Maxout Networks | 0.45 | Maxout Networks
2013-07-01 | Sparse Activity and Sparse Connectivity in Supervised Learning | 0.75 | Sparse Activity and Sparse Connectivity in Supervised Learning
2014-04-14 | NiN | 0.47 | Network in Network
2014-06-21 | PCANet | 0.62 | PCANet: A Simple Deep Learning Baseline for Image Classification?
2014-07-01 | DSN | 0.39 | Deeply-Supervised Nets
2014-07-01 | StrongNet | 1.1 | StrongNet: mostly unsupervised image recognition with strong neurons
2014-08-28 | CKN | 0.39 | Convolutional Kernel Networks
2015-02-03 | Explaining and Harnessing Adversarial Examples | 0.78 | Explaining and Harnessing Adversarial Examples
2015-02-28 | Fractional MP | 0.32 | Fractional Max-Pooling
2015-03-11 | C-SVDDNet | 0.35 | C-SVDDNet: An Effective Single-Layer Network for Unsupervised Feature Learning
2015-04-05 | HOPE | 0.4 | Hybrid Orthogonal Projection and Estimation (HOPE): A New Framework to Probe and Learn Neural Networks
2015-05-13 | APAC | 0.23 | APAC: Augmented PAttern Classification with Neural Networks
2015-05-31 | FLSCNN | 0.37 | Enhanced Image Classification With a Fast-Learning Shallow Convolutional Neural Network
2015-06-08 | RCNN-96 | 0.31 | Recurrent Convolutional Neural Network for Object Recognition
2015-06-12 | ReNet | 0.45 | ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks
2015-07-01 | MLR DNN | 0.42 | Multi-Loss Regularized Deep Neural Network
2015-07-01 | Deep Fried Convnets | 0.71 | Deep Fried Convnets
2015-07-12 | DCNN+GFE | 0.46 | Deep Convolutional Neural Networks as Generic Feature Extractors
2015-09-17 | MIM | 0.35 ± 0.03 | On the Importance of Normalisation Layers in Deep Learning with Piecewise Linear Activation Units
2015-10-05 | Tree+Max-Avg pooling | 0.29 | Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree
2015-11-09 | BNM NiN | 0.24 | Batch-normalized Maxout Network in Network
2015-11-18 | CMsC | 0.33 | Competitive Multi-scale Convolution
2015-12-07 | VDN | 0.45 | Training Very Deep Networks
2015-12-07 | BinaryConnect | 1.01 | BinaryConnect: Training Deep Neural Networks with binary weights during propagations
2016-01-02 | Convolutional Clustering | 1.4 | Convolutional Clustering for Unsupervised Learning
2016-01-04 | Fitnet-LSUV-SVM | 0.38 | All you need is a good init
MSRC-21 image semantic labelling (per-class)
Date | Algorithm | % correct | Paper / Source
2008-07-01 | STF | 67.0 | Semantic Texton Forests for Image Categorization and Segmentation
2009-07-01 | TextonBoost | 57.0 | TextonBoost for Image Understanding
2010-07-01 | Auto-Context | 69.0 | Auto-Context and Its Application to High-Level Vision Tasks and 3D Brain Image Segmentation
2010-07-01 | HCRF+CO | 77.0 | Graph Cut based Inference with Co-occurrence Statistics
2011-07-01 | Are Spatial and Global Constraints Really Necessary for Segmentation? | 77.0 | Are Spatial and Global Constraints Really Necessary for Segmentation?
2011-12-17 | FC CRF | 78.0 | Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
2012-06-16 | Describing the Scene as a Whole | 79.0 | Describing the Scene as a Whole: Joint Object Detection, Scene Classification and Semantic Segmentation
2012-07-01 | Harmony Potentials | 80.0 | Harmony Potentials - Fusing Local and Global Scale for Semantic Image Segmentation
2012-10-07 | PMG | 72.8 | PatchMatchGraph: Building a Graph of Dense Patch Correspondences for Label Transfer
2012-10-07 | Kernelized SSVM/CRF | 76.0 | Structured Image Segmentation using Kernelized Features
2013-10-29 | MPP | 78.2 | Morphological Proximity Priors: Spatial Relationships for Semantic Segmentation
2014-07-01 | Large FC CRF | 80.9 | Large-Scale Semantic Co-Labeling of Image Sets
MSRC-21 image semantic labelling (per-pixel)
Date | Algorithm | % correct | Paper / Source
2008-07-01 | STF | 72.0 | Semantic Texton Forests for Image Categorization and Segmentation
2009-07-01 | TextonBoost | 72.0 | TextonBoost for Image Understanding
2010-07-01 | Auto-Context | 78.0 | Auto-Context and Its Application to High-Level Vision Tasks and 3D Brain Image Segmentation
2010-07-01 | HCRF+CO | 87.0 | Graph Cut based Inference with Co-occurrence Statistics
2011-07-01 | Are Spatial and Global Constraints Really Necessary for Segmentation? | 85.0 | Are Spatial and Global Constraints Really Necessary for Segmentation?
2011-12-17 | FC CRF | 86.0 | Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
2012-06-16 | Describing the Scene as a Whole | 86.0 | Describing the Scene as a Whole: Joint Object Detection, Scene Classification and Semantic Segmentation
2012-07-01 | Harmony Potentials | 83.0 | Harmony Potentials - Fusing Local and Global Scale for Semantic Image Segmentation
2012-10-07 | PatchMatchGraph | 79.0 | PatchMatchGraph: Building a Graph of Dense Patch Correspondences for Label Transfer
2012-10-07 | Kernelized SSVM/CRF | 82.0 | Structured Image Segmentation using Kernelized Features
2013-10-29 | MPP | 85.0 | Morphological Proximity Priors: Spatial Relationships for Semantic Segmentation
2014-07-01 | Large FC CRF | 86.8 | Large-Scale Semantic Co-Labeling of Image Sets
STL-10 Image Recognition
Date | Algorithm | % correct | Paper / Source
2011-12-17 | Receptive Fields | 60.1 | Selecting Receptive Fields in Deep Networks
2012-06-26 | Invariant Representations with Local Transformations | 58.7 | Learning Invariant Representations with Local Transformations
2012-07-01 | Simulated Fixations | 61.0 | Deep Learning of Invariant Features via Simulated Fixations in Video
2012-07-01 | RGB-D Based Object Recognition | 64.5 | Unsupervised Feature Learning for RGB-D Based Object Recognition
2012-12-03 | Deep Learning of Invariant Features via Simulated Fixations in Video | 56.5 | Deep Learning of Invariant Features via Simulated Fixations in Video
2012-12-03 | Discriminative Learning of Sum-Product Networks | 62.3 | Discriminative Learning of Sum-Product Networks
2013-01-15 | Pooling-Invariant | 58.28 | Pooling-Invariant Image Feature Learning
2013-07-01 | Multi-Task Bayesian Optimization | 70.1 | Multi-Task Bayesian Optimization
2014-02-24 | No more meta-parameter tuning in unsupervised sparse feature learning | 61.0 | No more meta-parameter tuning in unsupervised sparse feature learning
2014-06-21 | Nonnegativity Constraints | 67.9 | Stable and Efficient Representation Learning with Nonnegativity Constraints
2014-06-23 | DFF Committees | 68.0 | Committees of deep feedforward networks trained with few data
2014-08-28 | CKN | 62.32 | Convolutional Kernel Networks
2014-12-08 | Discriminative Unsupervised Feature Learning with Convolutional Neural Networks | 72.8 | Discriminative Unsupervised Feature Learning with Convolutional Neural Networks
2015-02-13 | An Analysis of Unsupervised Pre-training in Light of Recent Advances | 70.2 | An Analysis of Unsupervised Pre-training in Light of Recent Advances
2015-03-11 | C-SVDDNet | 68.23 | C-SVDDNet: An Effective Single-Layer Network for Unsupervised Feature Learning
2015-07-01 | Deep Representation Learning with Target Coding | 73.15 | Deep Representation Learning with Target Coding
2015-10-11 | SWWAE | 74.33 | Stacked What-Where Auto-encoders
2016-01-02 | Convolutional Clustering | 74.1 | Convolutional Clustering for Unsupervised Learning
2016-11-19 | CC-GAN² | 77.79 ± 0.8 | Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks
Street View House Numbers (SVHN)
Date | Algorithm | % error | Paper / Source
2012-07-01 | Convolutional neural networks applied to house numbers digit classification | 4.9 | Convolutional neural networks applied to house numbers digit classification
2013-01-16 | Stochastic Pooling | 2.8 | Stochastic Pooling for Regularization of Deep Convolutional Neural Networks
2013-06-16 | DropConnect | 1.94 | Regularization of Neural Networks using DropConnect
2013-06-16 | Maxout | 2.47 | Maxout Networks
2014-04-14 | DCNN | 2.16 | Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks
2014-04-14 | NiN | 2.35 | Network in Network
2014-07-01 | DSN | 1.92 | Deeply-Supervised Nets
2015-05-31 | FLSCNN | 3.96 | Enhanced Image Classification With a Fast-Learning Shallow Convolutional Neural Network
2015-06-08 | RCNN-96 | 1.77 | Recurrent Convolutional Neural Network for Object Recognition
2015-06-12 | ReNet | 2.38 | ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks
2015-07-01 | MLR DNN | 1.92 | Multi-Loss Regularized Deep Neural Network
2015-09-17 | MIM | 1.97 ± 0.08 | On the Importance of Normalisation Layers in Deep Learning with Piecewise Linear Activation Units
2015-10-05 | Tree+Max-Avg pooling | 1.69 | Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree
2015-11-09 | BNM NiN | 1.81 | Batch-normalized Maxout Network in Network
2015-11-18 | CMsC | 1.76 | Competitive Multi-scale Convolution
2015-12-07 | BinaryConnect | 2.15 | BinaryConnect: Training Deep Neural Networks with binary weights during propagations
2017-05-30 | Deep Complex | 3.3 | Deep Complex Networks

Visual Question Answering

Comprehending an image involves more than recognising what objects or entities are within it: it also requires recognising events, relationships, and context. Solving this problem demands sophisticated image recognition, language understanding, and world-modelling combined into "image comprehension". There are several datasets in use. The illustration is from VQA, which was generated by asking Amazon Mechanical Turk workers to propose questions about photos from Microsoft's COCO image collection.

In [7]:
plot = vqa_real_oe.graph(keep=True, title="COCO Visual Question Answering (VQA) real open ended", llabel="VQA 1.0")
vqa2_real_oe.graph(reuse=plot, llabel="VQA 2.0", fcol="#00a0a0", pcol="#a000a0")
for m in image_comprehension.metrics:
    if not m.graphed:
        m.graph()
In [8]:
HTML(image_comprehension.tables())
Out[8]:
COCO Visual Question Answering (VQA) abstract 1.0 multiple choice
Date | Algorithm | % correct | Paper / Source
2016-07-01 | LSTM blind | 61.41 | VQA: Visual Question Answering (algorithm from Yin and Yang: Balancing and Answering Binary Visual Questions)
2016-07-01 | LSTM + global features | 69.21 | VQA: Visual Question Answering (algorithm from Yin and Yang: Balancing and Answering Binary Visual Questions)
2016-07-01 | Dualnet ensemble | 71.18 | VQA: Visual Question Answering (algorithm from DualNet: Domain-Invariant Network for Visual Question Answering)
2016-09-19 | Graph VQA | 74.37 | Graph-Structured Representations for Visual Question Answering
COCO Visual Question Answering (VQA) abstract images 1.0 open ended
Date | Algorithm | % correct | Paper / Source
2016-07-01 | LSTM blind | 57.19 | VQA: Visual Question Answering (algorithm from Yin and Yang: Balancing and Answering Binary Visual Questions)
2016-07-01 | LSTM + global features | 65.02 | VQA: Visual Question Answering (algorithm from Yin and Yang: Balancing and Answering Binary Visual Questions)
2016-07-01 | Dualnet ensemble | 69.73 | VQA: Visual Question Answering (algorithm from DualNet: Domain-Invariant Network for Visual Question Answering)
2016-09-19 | Graph VQA | 70.42 | Graph-Structured Representations for Visual Question Answering
COCO Visual Question Answering (VQA) real images 1.0 multiple choice
Date | Algorithm | % correct | Paper / Source
2015-05-03 | LSTM Q+I | 63.1 | VQA: Visual Question Answering
2015-12-15 | iBOWIMG baseline | 61.97 | Simple Baseline for Visual Question Answering
2016-04-06 | FDA | 64.2 | A Focused Dynamic Attention Model for Visual Question Answering
2016-05-31 | HQI+ResNet | 66.1 | Hierarchical Co-Attention for Visual Question Answering
2016-06-05 | MRN | 66.33 | Multimodal Residual Learning for Visual QA
2016-06-06 | MCB 7 att. | 70.1 | Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding (source code)
2016-08-06 | joint-loss | 67.3 | Training Recurrent Answering Units with Joint Loss Minimization for VQA
COCO Visual Question Answering (VQA) real images 1.0 open ended
Date | Algorithm | % correct | Paper / Source
2015-05-03 | LSTM Q+I | 58.2 | VQA: Visual Question Answering
2015-12-15 | iBOWIMG baseline | 55.89 | Simple Baseline for Visual Question Answering
2016-01-26 | SAN | 58.9 | Stacked Attention Networks for Image Question Answering
2016-03-09 | CNN-RNN | 59.5 | Image Captioning and Visual Question Answering Based on Attributes and Their Related External Knowledge
2016-03-19 | SMem-VQA | 58.24 | Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
2016-04-06 | FDA | 59.5 | A Focused Dynamic Attention Model for Visual Question Answering
2016-05-31 | HQI+ResNet | 62.1 | Hierarchical Co-Attention for Visual Question Answering
2016-06-05 | MRN + global features | 61.84 | Multimodal Residual Learning for Visual QA
2016-06-06 | MCB 7 att. | 66.5 | Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding (source code)
2016-08-06 | joint-loss | 63.2 | Training Recurrent Answering Units with Joint Loss Minimization for VQA
2017-08-06 | N2NMN | 64.2 | Learning to Reason: End-to-End Module Networks for Visual Question Answering (source code)
COCO Visual Question Answering (VQA) real images 2.0 open ended
Date | Algorithm | % correct | Paper / Source
2016-12-02 | d-LSTM+nI | 54.22 | Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering (algorithm from GitHub - VT-vision-lab/VQA_LSTM_CNN: Train a deeper LSTM and normalized CNN Visual Question Answering model. This current code can get 58.16 on OpenEnded and 63.09 on Multiple-Choice on test-standard.)
2016-12-02 | MCB | 62.27 | Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering (algorithm from Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding)
2017-07-25 | Up-Down | 70.34 | Bottom-Up and Top-Down Attention for Image Captioning and VQA
2017-07-26 | DLAIT | 68.07 | VQA: Visual Question Answering
2017-07-26 | HDU-USYD-UNCC | 68.16 | VQA: Visual Question Answering
Visual7W
Date | Algorithm | % correct | Paper / Source
2016-06-06 | MCB+Att. | 62.2 | Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
2016-11-30 | CMN | 72.53 | Modeling Relationships in Referential Expressions with Compositional Modular Networks

Game Playing

In principle, games are a sufficiently open-ended framework that all of intelligence could be captured within them. We can imagine a "ladder of games" that grows in sophistication and complexity, from simple strategy and arcade games to others that require very sophisticated language, world-modelling, vision and reasoning ability. At present, published reinforcement learning agents are climbing the first few rungs of this ladder.

Abstract Strategy Games

As an easier case, abstract games like chess, go and checkers can be played with no knowledge of the human world or physics. Although this domain has largely been solved to super-human performance levels, a few loose ends remain to be tied up, especially having agents learn the rules of arbitrary abstract games effectively from various plausible starting points (eg, textual descriptions of the rules or examples of correct play).
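The computer chess table below reports strength as Elo ratings. Under the standard Elo model, a player's expected score against an opponent depends only on the rating difference; a minimal sketch of that relationship:

```python
def elo_expected_score(r_a, r_b):
    """Expected score (win = 1, draw = 0.5, loss = 0) of a player
    rated r_a against a player rated r_b under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
```

A 400-point rating advantage corresponds to an expected score of about 0.91, which is why the ~1,700-point climb in the table below, from the Novag Super Constellation to modern Stockfish, represents an enormous gulf in playing strength.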

In [9]:
from data.strategy_games import *
computer_chess.graph()
In [10]:
HTML(computer_chess.table())
Out[10]:
Computer Chess
Date | Algorithm | Elo | Paper / Source
1984-12-31 | Novag Super Constellation 6502 4 MHz | 1631 | Swedish Chess Computer Association - Wikipedia
1985-12-31 | Mephisto Amsterdam 68000 12 MHz | 1827 | Swedish Chess Computer Association - Wikipedia
1986-12-31 | Mephisto Amsterdam 68000 12 MHz | 1827 | Swedish Chess Computer Association - Wikipedia
1987-12-31 | Mephisto Dallas 68020 14 MHz | 1923 | Swedish Chess Computer Association - Wikipedia
1988-12-31 | Mephisto MM 4 Turbo Kit 6502 16 MHz | 1993 | Swedish Chess Computer Association - Wikipedia
1989-12-31 | Mephisto Portorose 68020 12 MHz | 2027 | Swedish Chess Computer Association - Wikipedia
1990-12-31 | Mephisto Portorose 68030 36 MHz | 2138 | Swedish Chess Computer Association - Wikipedia
1991-12-31 | Mephisto Vancouver 68030 36 MHz | 2127 | Swedish Chess Computer Association - Wikipedia
1992-12-31 | Chess Machine Schroder 3.0 ARM2 30 MHz | 2174 | Swedish Chess Computer Association - Wikipedia
1993-12-31 | Mephisto Genius 2.0 486/50-66 MHz | 2235 | Swedish Chess Computer Association - Wikipedia
1995-12-31 | MChess Pro 5.0 Pentium 90 MHz | 2306 | Swedish Chess Computer Association - Wikipedia
1996-12-31 | Rebel 8.0 Pentium 90 MHz | 2337 | Swedish Chess Computer Association - Wikipedia
1997-05-11 | Deep Blue | 2725 ± 25 | What was Deep Blue's Elo rating? - Quora
1997-12-31 | HIARCS 6.0 49MB P200 MMX | 2418 | Swedish Chess Computer Association - Wikipedia
1998-12-31 | Fritz 5.0 PB29% 67MB P200 MMX | 2460 | Swedish Chess Computer Association - Wikipedia
1999-12-31 | Chess Tiger 12.0 DOS 128MB K6-2 450 MHz | 2594 | Swedish Chess Computer Association - Wikipedia
2000-12-31 | Fritz 6.0 128MB K6-2 450 MHz | 2607 | Swedish Chess Computer Association - Wikipedia
2001-12-31 | Chess Tiger 14.0 CB 256MB Athlon 1200 | 2709 | Swedish Chess Computer Association - Wikipedia
2002-12-31 | Deep Fritz 7.0 256MB Athlon 1200 MHz | 2759 | Swedish Chess Computer Association - Wikipedia
2003-12-31 | Shredder 7.04 UCI 256MB Athlon 1200 MHz | 2791 | Swedish Chess Computer Association - Wikipedia
2004-12-31 | Shredder 8.0 CB 256MB Athlon 1200 MHz | 2800 | Swedish Chess Computer Association - Wikipedia
2005-12-31 | Shredder 9.0 UCI 256MB Athlon 1200 MHz | 2808 | Swedish Chess Computer Association - Wikipedia
2006-05-27 | Rybka 1.1 64bit | 2995 ± 25 | CCRL 40/40 - Complete list
2006-12-31 | Rybka 1.2 256MB Athlon 1200 MHz | 2902 | Swedish Chess Computer Association - Wikipedia
2007-12-31 | Rybka 2.3.1 Arena 256MB Athlon 1200 MHz | 2935 | Swedish Chess Computer Association - Wikipedia
2008-12-31 | Deep Rybka 3 2GB Q6600 2.4 GHz | 3238 | Swedish Chess Computer Association - Wikipedia
2009-12-31 | Deep Rybka 3 2GB Q6600 2.4 GHz | 3232 | Swedish Chess Computer Association - Wikipedia
2010-08-07 | Rybka 4 64bit | 3269 ± 22 | CCRL 40/40 - Complete list
2010-12-31 | Deep Rybka 3 2GB Q6600 2.4 GHz | 3227 | Swedish Chess Computer Association - Wikipedia
2011-12-31 | Deep Rybka 4 2GB Q6600 2.4 GHz | 3216 | Swedish Chess Computer Association - Wikipedia
2012-12-31 | Deep Rybka 4 x64 2GB Q6600 2.4 GHz | 3221 | Swedish Chess Computer Association - Wikipedia
2013-07-20 | Houdini 3 64bit | 3248 ± 16 | Wayback Machine
2013-12-31 | Komodo 5.1 MP x64 2GB Q6600 2.4 GHz | 3241 | Swedish Chess Computer Association - Wikipedia
2014-12-31 | Komodo 7.0 MP x64 2GB Q6600 2.4 GHz | 3295 | Swedish Chess Computer Association - Wikipedia
2015-07-04 | Komodo 9 | 3332 ± 24 | CCRL 40/40 - Complete list
2015-12-31 | Stockfish 6 MP x64 2GB Q6600 2.4 GHz | 3334 | Swedish Chess Computer Association - Wikipedia
2016-12-31 | Komodo 9.1 MP x64 2GB Q6600 2.4 GHz | 3366 | Swedish Chess Computer Association - Wikipedia
2017-02-27 | Stockfish | 3393 ± 50 | CCRL 40/40 - Index

Real-time video games

Computer and video games are a very open-ended domain. It is possible that some existing or future games could be so elaborate that they are "AI complete". In the meantime, a lot of interesting progress is likely in exploring the "ladder of games" of increasing complexity on various fronts.

Atari 2600

Atari 2600 games have been a popular target for reinforcement learning, especially at DeepMind and OpenAI. RL agents now play most, but not all, of these games better than humans.

In the Atari 2600 data, the label "noop" indicates that the game was played with a random number (up to 30) of "no-op" moves at the beginning, while the "hs" label indicates that the starting condition was a state sampled from 100 games played by expert human players. Both forms of randomisation give RL systems a diversity of game states to learn from.
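The "noop" condition can be sketched as a thin wrapper around an Atari-style environment. This is an illustrative implementation, not the evaluation code used in the papers above; it assumes a gym-style interface where `reset()` returns an observation, `step(action)` returns `(obs, reward, done, info)`, and action 0 is the no-op (the Arcade Learning Environment convention):

```python
import random

class NoopStartEnv:
    """Begin each episode with a random number (1..max_noops) of
    no-op actions, so the agent sees varied starting states."""

    NOOP_ACTION = 0  # ALE convention: action 0 does nothing

    def __init__(self, env, max_noops=30):
        self.env = env
        self.max_noops = max_noops

    def reset(self):
        obs = self.env.reset()
        for _ in range(random.randint(1, self.max_noops)):
            obs, _, done, _ = self.env.step(self.NOOP_ACTION)
            if done:  # rare: the no-ops ended the episode, so restart
                obs = self.env.reset()
        return obs

    def step(self, action):
        return self.env.step(action)
```

The "hs" condition would instead draw the starting state from a pool of snapshots taken along expert human trajectories.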

In [11]:
from data.video_games import *
from scrapers.atari import *
simple_games.graphs()