Skip to main content
Podcast Episode: Antitrust/Pro-Internet

AI Progress Measurement (Archived)

PROJECT

AI Progress Measurement (Archived)

Measuring the Progress of AI Research (Archived)

This project was active during 2017 and hasn't been updated since then. It collected problems and metrics/datasets from the AI research literature, with the intent to track progress on them. The page is preserved for historical interest but should not be considered up-to-date.

You can use this Notebook to see how things are progressing in specific subfields or AI/ML as a whole, as a place to report new results you've obtained, as a place to look for problems that might benefit from having new datasets/metrics designed for them, or as a source to build on for data science projects.

At EFF, we're ultimately most interested in how this data can influence our understanding of the likely implications of AI. To begin with, we're focused on gathering it.

Original authors: Peter Eckersley and Yomna Nasser at EFF. Contact: ai-metrics@eff.org.

With contributions from: Yann Bayle, Owain Evans, Gennie Gebhart and Dustin Schwenk.

Inspired by and merging data from:

Thanks to many others for valuable conversations, suggestions and corrections, including: Dario Amodei, James Bradbury, Miles Brundage, Mark Burdett, Breandan Considine, Owen Cotton-Barrett, Marc Bellemare, Will Dabny, Eric Drexler, Otavio Good, Katja Grace, Hado van Hasselt, Anselm Levskaya, Clare Lyle, Toby Ord, Michael Page, Maithra Raghu, Anders Sandberg, Laura Schatzkin, Daisy Stanton, Gabriel Synnaeve, Stacey Svetlichnaya, Helen Toner, and Jason Weston. EFF's work on this project has been supported by the Open Philanthropy Project.

Taxonomy

It collates data with the following structure:

problem 
    \   \
     \   metrics  -  measures 
      \
       - subproblems
            \
          metrics
             \
            measure[ment]s

Problems describe the ability to learn an important category of task.

Metrics should ideally be formulated in the form "software is able to learn to do X given training data of type Y". In some cases X is the interesting part, but sometimes also Y.

Measurements are the score that a specific instance of a specific algorithm was able to get on a Metric.

problems are tagged with attributes: eg, vision, abstract-games, language, world-modelling, safety

Some of these are about performance relative to humans (which is of course a very arbitrary standard, but one we're familiar with)

  • agi -- most capable humans can do this, so AGIs can do this (note it's conceivable that an agent might pass the Turing test before all of these are won)
  • super -- the very best humans can do this, or human organisations can do this
  • verysuper -- neither humans nor human orgs can presently do this

problems can have "subproblems", including simpler cases and preconditions for solving the problem in general

a "metric" is one way of measuring progress on a problem, commonly associated with a test dataset. There will often be several metrics for a given problem, but in some cases we'll start out with zero metrics and will need to start proposing some...

a measure[ment] is a score on a given metric, by a particular codebase/team/project, at a particular time

The present state of the actual taxonomy is at the bottom of this notebook.

Source Code

  • Code implementing the taxonomy of Problems and subproblems, Metrics and Measurements is defined in a free-standing Python file, taxonomy.py. scales.py contains definitions of various unit systems used by Metrics.
  • Most source data is now defined in a series of separate files by topic:

    • data/vision.py for hand-entered computer vision data
    • data/language.py for hand-entered and merged language data
    • data/strategy_games.py for data on abstract strategy games
    • data/video_games.py a combination of hand-entered and scraped Atari data (other video game data can also go here)
    • data/stem.py for data on scientific & technical problems

    • data imported from specific scrapers (and then subsequently edited):

    • For now, some of the Problems and Metrics are still defined in this Notebook, especially in areas that do not have many active results yet.
  • Scrapers for specific data sources:
    • scrapers/awty.py for importing data from Rodriguo Benenson's Are We There Yey? site
    • scrapers/atari.py for parsing cut-and-pasted tables of Atari 2600 results from the various PDFs in the literature.
In [1]:
from IPython.display import HTML
HTML('''
<script>
    if (typeof code_show == "undefined") {
        code_show=false;
    } else {
        code_show = !code_show; // FIXME hack, because we toggle on load :/
    }

    function code_toggle() {
        if (!code_show) {
            inputs = $('div.input');
            for (n = 0; n < inputs.length; n++) {
                if (inputs[n].innerHTML.match('# hidd' + 'encode'))
                    inputs[n].style.display = "none";
            }
        } else { 
            $('div.input').show();
            $('button.cell-specific-unhide').remove()
        } 
        code_show = !code_show;
    } 
    
    $( document ).ready(code_toggle);
    
</script>
<form action="javascript:code_toggle()">
    <input type="submit" value="Click here to show/hide source code cells."> <br><br>(you can mark a cell as code with <tt># hiddencode</tt>)
</form>
''')
Out[1]:


(you can mark a cell as code with # hiddencode)
In [2]:
# hiddencode
from __future__ import print_function

%matplotlib inline  
import matplotlib as mpl
try:
    from lxml.cssselect import CSSSelector
except ImportError:
    # terrifying magic for Azure Notebooks
    import os
    if os.getcwd() == "/home/nbuser":
        !pip install cssselect
        from lxml.cssselect import CSSSelector
    else:
        raise

import datetime
import json
import re

from matplotlib import pyplot as plt

date = datetime.date

import taxonomy
#reload(taxonomy)
from taxonomy import Problem, Metric, problems, metrics, measurements, all_attributes, offline, render_tables
from scales import *

Problems, Metrics, and Datasets

Vision


(Imagenet example data)

The simplest vision subproblem is probably image classification, which determines what objects are present in a picture. From 2010-2017, Imagenet has been a closely watched contest for progress in this domain.

Image classification includes not only recognising single things within an image, but localising them and essentially specifying which pixels are which object. MSRC-21 is a metric that is specifically for that task:


(MSRC 21 example data)
In [3]:
from data.vision import *
imagenet.graph()
In [4]:
from data.vision import *
from data.awty import *
In [5]:
for m in sorted(image_classification.metrics, key=lambda m:m.name): 
    if m != imagenet: m.graph()

AWTY, not yet imported:

Handling 'Pascal VOC 2011 comp3' detection_datasets_results.html#50617363616c20564f43203230313120636f6d7033
Skipping 40.6 mAP Fisher and VLAD with FLAIR CVPR 2014
Handling 'Leeds Sport Poses' pose_estimation_datasets_results.html#4c656564732053706f727420506f736573
69.2 %                  Strong Appearance and Expressive Spatial Models for Human Pose Estimation  ICCV 2013
64.3 %                                    Appearance sharing for collective human pose estimation  ACCV 2012
63.3 %                                                   Poselet conditioned pictorial structures  CVPR 2013
60.8 %                                Articulated pose estimation with flexible mixtures-of-parts  CVPR 2011
 55.6%           Pictorial structures revisited: People detection and articulated pose estimation  CVPR 2009
Handling 'Pascal VOC 2007 comp3' detection_datasets_results.html#50617363616c20564f43203230303720636f6d7033
Skipping 22.7 mAP Ensemble of Exemplar-SVMs for Object Detection and Beyond ICCV 2011
Skipping 27.4 mAP Measuring the objectness of image windows PAMI 2012
Skipping 28.7 mAP Automatic discovery of meaningful object parts with latent CRFs CVPR 2010
Skipping 29.0 mAP Object Detection with Discriminatively Trained Part Based Models PAMI 2010
Skipping 29.6 mAP Latent Hierarchical Structural Learning for Object Detection CVPR 2010
Skipping 32.4 mAP Deformable Part Models with Individual Part Scaling BMVC 2013
Skipping 34.3 mAP Histograms of Sparse Codes for Object Detection CVPR 2013
Skipping 34.3 mAP Boosted local structured HOG-LBP for object localization CVPR 2011
Skipping 34.7 mAP Discriminatively Trained And-Or Tree Models for Object Detection CVPR 2013
Skipping 34.7 mAP Incorporating Structural Alternatives and Sharing into Hierarchy for Multiclass Object Recognition and Detection CVPR 2013
Skipping 34.8 mAP Color Attributes for Object Detection CVPR 2012
Skipping 35.4 mAP Object Detection with Discriminatively Trained Part Based Models PAMI 2010
Skipping 36.0 mAP Machine Learning Methods for Visual Object Detection archives-ouvertes 2011
Skipping 38.7 mAP Detection Evolution with Multi-Order Contextual Co-occurrence CVPR 2013
Skipping 40.5 mAP Segmentation Driven Object Detection with Fisher Vectors ICCV 2013
Skipping 41.7 mAP Regionlets for Generic Object Detection ICCV 2013
Skipping 43.7 mAP Beyond Bounding-Boxes: Learning Object Shape by Model-Driven Grouping ECCV 2012
Handling 'Pascal VOC 2007 comp4' detection_datasets_results.html#50617363616c20564f43203230303720636f6d7034
Skipping 59.2 mAP Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition ECCV 2014
Skipping 58.5 mAP Rich feature hierarchies for accurate object detection and semantic segmentation CVPR 2014
Skipping 29.0 mAP Multi-Component Models for Object Detection ECCV 2012
Handling 'Pascal VOC 2010 comp3' detection_datasets_results.html#50617363616c20564f43203230313020636f6d7033
Skipping 24.98 mAP Learning Collections of Part Models for Object Recognition CVPR 2013
Skipping 29.4 mAP Discriminatively Trained And-Or Tree Models for Object Detection CVPR 2013
Skipping 33.4 mAP Object Detection with Discriminatively Trained Part Based Models PAMI 2010
Skipping 34.1 mAP Segmentation as selective search for object recognition ICCV 2011
Skipping 35.1 mAP Selective Search for Object Recognition IJCV 2013
Skipping 36.0 mAP Latent Hierarchical Structural Learning for Object  Detection CVPR 2010
Skipping 36.8 mAP Object Detection by Context and Boosted HOG-LBP ECCV 2010
Skipping 38.4 mAP Segmentation Driven Object Detection with Fisher Vectors ICCV 2013
Skipping 39.7 mAP Regionlets for Generic Object Detection ICCV 2013
Skipping 40.4 mAP Fisher and VLAD with FLAIR CVPR 2014
Handling 'Pascal VOC 2010 comp4' detection_datasets_results.html#50617363616c20564f43203230313020636f6d7034
Skipping 53.7 mAP Rich feature hierarchies for accurate object detection and semantic segmentation CVPR 2014
Skipping 40.4 mAP Bottom-up Segmentation for Top-down Detection CVPR 2013
Skipping 33.1 mAP Multi-Component Models for Object Detection ECCV 2012
In [6]:
from IPython.display import HTML
HTML(image_classification.tables())
Out[6]:
CIFAR-10 Image Recognition
Date Algorithm % correct Paper / Source
2011-07-01 Hierarchical Kernel Descriptors 80.0 Object Recognition with Hierarchical Kernel Descriptors
2011-07-01 An Analysis of Single-Layer Networks in Unsupervised Feature Learning 79.6 An Analysis of Single-Layer Networks in Unsupervised Feature Learning
2012-06-16 MCDNN 88.79 Multi-Column Deep Neural Networks for Image Classification
2012-06-26 Local Transformations 82.2 Learning Invariant Representations with Local Transformations
2012-07-03 Improving neural networks by preventing co-adaptation of feature detectors 84.4 Improving neural networks by preventing co-adaptation of feature detectors
2012-12-03 GP EI 90.5 Practical Bayesian Optimization of Machine Learning Algorithms
2012-12-03 DCNN 89.0 ImageNet Classification with Deep Convolutional Neural Networks
2012-12-03 Discriminative Learning of Sum-Product Networks 83.96 Discriminative Learning of Sum-Product Networks
2012-12-03 Learning with Recursive Perceptual Representations 79.7 Learning with Recursive Perceptual Representations
2013-01-16 Stochastic Pooling 84.87 Stochastic Pooling for Regularization of Deep Convolutional Neural Networks
2013-06-16 DropConnect 90.68 Regularization of Neural Networks using DropConnect
2013-06-16 Maxout Networks 90.65 Maxout Networks
2013-07-01 Smooth Pooling Regions 80.02 Learning Smooth Pooling Regions for Visual Recognition
2014-04-14 NiN 91.2 Network In Network
2014-04-14 DNN+Probabilistic Maxout 90.61 Improving Deep Neural Networks with Probabilistic Maxout Units
2014-06-21 Nonnegativity Constraints 82.9 Stable and Efficient Representation Learning with Nonnegativity Constraints
2014-06-21 PCANet 78.67 PCANet: A Simple Deep Learning Baseline for Image Classification?
2014-07-01 DSN 91.78 Deeply-Supervised Nets
2014-08-28 CKN 82.18 Convolutional Kernel Networks
2014-09-22 SSCNN 93.72 Spatially-sparse convolutional neural networks
2014-12-08 Deep Networks with Internal Selective Attention through Feedback Connections 90.78 Deep Networks with Internal Selective Attention through Feedback Connections
2014-12-08 Discriminative Unsupervised Feature Learning with Convolutional Neural Networks 82.0 Discriminative Unsupervised Feature Learning with Convolutional Neural Networks
2015-02-13 An Analysis of Unsupervised Pre-training in Light of Recent Advances 86.7 An Analysis of Unsupervised Pre-training in Light of Recent Advances
2015-02-15 ACN 95.59 Striving for Simplicity: The All Convolutional Net
2015-02-19 NiN+APL 92.49 Learning Activation Functions to Improve Deep Neural Networks
2015-02-28 Fractional MP 96.53 Fractional Max-Pooling
2015-05-02 Tuned CNN 93.63 Scalable Bayesian Optimization Using Deep Neural Networks
2015-05-13 APAC 89.67 APAC: Augmented PAttern Classification with Neural Networks
2015-05-31 FLSCNN 75.86 Enhanced Image Classification With a Fast-Learning Shallow Convolutional Neural Network
2015-06-08 RCNN-96 92.91 Recurrent Convolutional Neural Network for Object Recognition
2015-06-12 ReNet 87.65 ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks
2015-07-01 cifar.torch 92.45 cifar.torch
2015-07-01 MLR DNN 91.88 Multi-Loss Regularized Deep Neural Network
2015-07-01 ELC 91.19 Speeding up Automatic Hyperparameter Optimization of Deep Neural Networks by Extrapolation of Learning Curves
2015-07-12 DCNN+GFE 89.14 Deep Convolutional Neural Networks as Generic Feature Extractors
2015-08-16 RReLU 88.8 Empirical Evaluation of Rectified Activations in Convolution Network
2015-09-17 MIM 91.48 ± 0.2 On the Importance of Normalisation Layers in Deep Learning with Piecewise Linear Activation Units
2015-10-05 Tree+Max-Avg pooling 93.95 Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree
2015-10-11 SWWAE 92.23 Stacked What-Where Auto-encoders
2015-11-09 BNM NiN 93.25 Batch-normalized Maxout Network in Network
2015-11-18 CMsC 93.13 Competitive Multi-scale Convolution
2015-12-07 VDN 92.4 Training Very Deep Networks
2015-12-07 BinaryConnect 91.73 BinaryConnect: Training Deep Neural Networks with binary weights during propagations
2015-12-07 Spectral Representations for Convolutional Neural Networks 91.4 Spectral Representations for Convolutional Neural Networks
2015-12-10 DRL 93.57 Deep Residual Learning for Image Recognition
2016-01-04 Fitnet4-LSUV 94.16 All you need is a good init
2016-01-07 Exponential Linear Units 93.45 Fast and Accurate Deep Network Learning by Exponential Linear Units
2016-05-15 Universum Prescription 93.34 Universum Prescription: Regularization using Unlabeled Data
2016-05-20 ResNet-1001 95.38 ± 0.2 Identity Mappings in Deep Residual Networks
2016-07-10 ResNet+ELU 94.38 Deep Residual Networks with Exponential Linear Unit
2017-02-15 Neural Architecture Search 96.35 Neural Architecture Search with Reinforcement Learning
2017-04-22 Evolution ensemble 95.6 Large-Scale Evolution of Image Classifiers
2017-04-22 Evolution 94.6 Large-Scale Evolution of Image Classifiers
2017-05-30 Deep Complex 94.4 Deep Complex Networks
2017-07-16 RL+NT 94.6 Reinforcement Learning for Architecture Search by Network Transformation
2018-02-10 ENAS 97.11 Efficient Neural Architecture Search via Parameter Sharing
CIFAR-100 Image Recognition
Date Algorithm % correct Paper / Source
2012-06-16 Receptive Field Learning 54.23 Beyond Spatial Pyramids: Receptive Field Learning for Pooled Image Features
2013-01-16 Stochastic Pooling 57.49 Stochastic Pooling for Regularization of Deep Convolutional Neural Networks
2013-06-16 Maxout Networks 61.43 Maxout Networks
2013-07-01 Tree Priors 63.15 Discriminative Transfer Learning with Tree-based Priors
2013-07-01 Smooth Pooling Regions 56.29 Smooth Pooling Regions
2014-04-14 NiN 64.32 Network in Network
2014-04-14 DNN+Probabilistic Maxout 61.86 Improving Deep Neural Networks with Probabilistic Maxout Units
2014-06-21 Stable and Efficient Representation Learning with Nonnegativity Constraints 60.8 Stable and Efficient Representation Learning with Nonnegativity Constraints
2014-07-01 DSN 65.43 Deeply-Supervised Nets
2014-09-22 SSCNN 75.7 Spatially-sparse convolutional neural networks
2014-12-08 Deep Networks with Internal Selective Attention through Feedback Connections 66.22 Deep Networks with Internal Selective Attention through Feedback Connections
2015-02-15 ACN 66.29 Striving for Simplicity: The All Convolutional Net
2015-02-19 NiN+APL 69.17 Learning Activation Functions to Improve Deep Neural Networks
2015-02-28 Fractional MP 73.61 Fractional Max-Pooling
2015-05-02 Tuned CNN 72.6 Scalable Bayesian Optimization Using Deep Neural Networks
2015-06-08 RCNN-96 68.25 Recurrent Convolutional Neural Network for Object Recognition
2015-07-01 MLR DNN 68.53 Multi-Loss Regularized Deep Neural Network
2015-07-01 HD-CNN 67.38 HD-CNN: Hierarchical Deep Convolutional Neural Network for Large Scale Visual Recognition
2015-07-01 Deep Representation Learning with Target Coding 64.77 Deep Representation Learning with Target Coding
2015-07-12 DCNN+GFE 67.68 Deep Convolutional Neural Networks as Generic Feature Extractors
2015-08-16 RReLU 59.75 Empirical Evaluation of Rectified Activations in Convolution Network
2015-09-17 MIM 70.8 ± 0.2 On the Importance of Normalisation Layers in Deep Learning with Piecewise Linear Activation Units
2015-10-05 Tree+Max-Avg pooling 67.63 Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree
2015-10-11 SWWAE 69.12 Stacked What-Where Auto-encoders
2015-11-09 BNM NiN 71.14 Batch-normalized Maxout Network in Network
2015-11-18 CMsC 72.44 Competitive Multi-scale Convolution
2015-12-07 Spectral Representations for Convolutional Neural Networks 68.4 Spectral Representations for Convolutional Neural Networks
2015-12-07 VDN 67.76 Training Very Deep Networks
2016-01-04 Fitnet4-LSUV 72.34 All you need is a good init
2016-01-07 Exponential Linear Units 75.72 Fast and Accurate Deep Network Learning by Exponential Linear Units
2016-05-15 Universum Prescription 67.16 Universum Prescription: Regularization using Unlabeled Data
2016-05-20 ResNet-1001 77.29 ± 0.22 Identity Mappings in Deep Residual Networks
2016-07-10 ResNet+ELU 73.45 Deep Residual Networks with Exponential Linear Unit
2017-04-22 Evolution 77.0 Large-Scale Evolution of Image Classifiers
2017-05-30 Deep Complex 72.91 Deep Complex Networks
2017-06-06 NiN+Superclass+CDJ 69.0 Deep Convolutional Decision Jungle for Image Classification
Imagenet Image Recognition
Date Algorithm Error Paper / Source
2010-08-31 NEC UIUC 0.28191 ImageNet Large Scale Visual Recognition Competition 2010 (ILSVRC2010)
2011-10-26 XRCE 0.2577 ImageNet Large Scale Visual Recognition Competition 2011 (ILSVRC2011)
2012-10-13 AlexNet / SuperVision 0.16422 ImageNet Large Scale Visual Recognition Competition 2012 (ILSVRC2012) (algorithm from ImageNet Classification with Deep Convolutional Neural Networks)
2013-11-14 Clarifai 0.11743 ImageNet Large Scale Visual Recognition Competition 2013 (ILSVRC2013)
2014-08-18 VGG 0.07405 ImageNet Large Scale Visual Recognition Competition 2014 (ILSVRC2014)
2015-04-10 withdrawn 0.0458 Deep Image: Scaling up Image Recognition
2015-12-10 MSRA 0.03567 ILSVRC2015 Results
2016-09-26 Trimps-Soushen 0.02991 ILSVRC2016 Results
2017-07-21 SE-ResNet152 / WMW 0.02251 ILSVRC2017 Results
MNIST handwritten digit recognition
Date Algorithm % error Paper / Source
2002-07-01 Shape contexts 0.63 Shape matching and object recognition using shape contexts
2002-07-01 ISVM 0.56 Training Invariant Support Vector Machines
2003-07-01 CNN 1.19 Convolutional Neural Networks
2003-07-01 CNN+Gabor Filters 0.68 Handwritten Digit Recognition using Convolutional Neural Networks and Gabor Filters
2003-07-01 Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis 0.4 Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis
2006-07-01 Reducing the dimensionality of data with neural networks 1.2 Reducing the dimensionality of data with neural networks
2006-07-01 Energy-Based Sparse Represenation 0.39 Efficient Learning of Sparse Representations with an Energy-Based Model
2007-07-01 invariant feature hierarchies 0.62 Unsupervised learning of invariant feature hierarchies with applications to object recognition
2007-07-01 Deformation Models 0.54 Deformation Models for Image Recognition
2007-07-01 Trainable feature extractor 0.54 A trainable feature extractor for handwritten digit recognition
2008-07-01 Deep learning via semi-supervised embedding 1.5 Deep learning via semi-supervised embedding
2008-07-01 DBN 1.12 CS81: Learning words with Deep Belief Networks
2008-07-01 Sparse Coding 0.59 Simple Methods for High-Performance Digit Recognition Based on Sparse Coding
2009-07-01 Deep Boltzmann Machines 0.95 Deep Boltzmann Machines
2009-07-01 Large-Margin kNN 0.94 Large-Margin kNN Classification using a Deep Encoder Network
2009-07-01 CDBN 0.82 Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations
2009-07-01 The Best Multi-Stage Architecture 0.53 What is the Best Multi-Stage Architecture for Object Recognition?
2010-03-01 DBSNN 0.35 Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition
2010-07-01 Supervised Translation-Invariant Sparse Coding 0.84 Supervised Translation-Invariant Sparse Coding
2011-07-01 On Optimization Methods for Deep Learning 0.69 On Optimization Methods for Deep Learning
2012-06-16 Receptive Field Learning 0.64 Beyond Spatial Pyramids: Receptive Field Learning for Pooled Image Features
2012-06-16 MCDNN 0.23 Multi-column Deep Neural Networks for Image Classification
2013-02-28 COSFIRE 0.52 Trainable COSFIRE Filters for Keypoint Detection and Pattern Recognition
2013-06-16 Maxout Networks 0.45 Maxout Networks
2013-06-16 DropConnect 0.21 Regularization of Neural Networks using DropConnect
2013-07-01 Sparse Activity and Sparse Connectivity in Supervised Learning 0.75 Sparse Activity and Sparse Connectivity in Supervised Learning
2014-04-14 NiN 0.47 Network in Network
2014-06-21 PCANet 0.62 PCANet: A Simple Deep Learning Baseline for Image Classification?
2014-07-01 StrongNet 1.1 StrongNet: mostly unsupervised image recognition with strong neurons
2014-07-01 DSN 0.39 Deeply-Supervised Nets
2014-08-28 CKN 0.39 Convolutional Kernel Networks
2015-02-03 Explaining and Harnessing Adversarial Examples 0.78 Explaining and Harnessing Adversarial Examples
2015-02-28 Fractional MP 0.32 Fractional Max-Pooling
2015-03-11 C-SVDDNet 0.35 C-SVDDNet: An Effective Single-Layer Network for Unsupervised Feature Learning
2015-04-05 HOPE 0.4 Hybrid Orthogonal Projection and Estimation (HOPE): A New Framework to Probe and Learn Neural Networks
2015-05-13 APAC 0.23 APAC: Augmented PAttern Classification with Neural Networks
2015-05-31 FLSCNN 0.37 Enhanced Image Classification With a Fast-Learning Shallow Convolutional Neural Network
2015-06-08 RCNN-96 0.31 Recurrent Convolutional Neural Network for Object Recognition
2015-06-12 ReNet 0.45 ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks
2015-07-01 Deep Fried Convnets 0.71 Deep Fried Convnets
2015-07-01 MLR DNN 0.42 Multi-Loss Regularized Deep Neural Network
2015-07-12 DCNN+GFE 0.46 Deep Convolutional Neural Networks as Generic Feature Extractors
2015-09-17 MIM 0.35 ± 0.03 On the Importance of Normalisation Layers in Deep Learning with Piecewise Linear Activation Units
2015-10-05 Tree+Max-Avg pooling 0.29 Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree
2015-11-09 BNM NiN 0.24 Batch-normalized Maxout Network in Network
2015-11-18 CMsC 0.33 Competitive Multi-scale Convolution
2015-12-07 BinaryConnect 1.01 BinaryConnect: Training Deep Neural Networks with binary weights during propagations
2015-12-07 VDN 0.45 Training Very Deep Networks
2016-01-02 Convolutional Clustering 1.4 Convolutional Clustering for Unsupervised Learning
2016-01-04 Fitnet-LSUV-SVM 0.38 All you need is a good init
MSRC-21 image semantic labelling (per-class)
Date Algorithm % correct Paper / Source
2008-07-01 STF 67.0 Semantic Texton Forests for Image Categorization and Segmentation
2009-07-01 TextonBoost 57.0 TextonBoost for Image Understanding
2010-07-01 HCRF+CO 77.0 Graph Cut based Inference with Co-occurrence Statistics
2010-07-01 Auto-Context 69.0 Auto-Context and Its Application to High-Level Vision Tasks and 3D Brain Image Segmentation
2011-07-01 Are Spatial and Global Constraints Really Necessary for Segmentation? 77.0 Are Spatial and Global Constraints Really Necessary for Segmentation?
2011-12-17 FC CRF 78.0 Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
2012-06-16 Describing the Scene as a Whole: Joint Object Detection, Scene Classification and Semantic Segmentation 79.0 Describing the Scene as a Whole: Joint Object Detection, Scene Classification and Semantic Segmentation
2012-07-01 Harmony Potentials 80.0 Harmony Potentials - Fusing Local and Global Scale for Semantic Image Segmentation
2012-10-07 Kernelized SSVM/CRF 76.0 Structured Image Segmentation using Kernelized Features
2012-10-07 PMG 72.8 PatchMatchGraph: Building a Graph of Dense Patch Correspondences for Label Transfer
2013-10-29 MPP 78.2 Morphological Proximity Priors: Spatial Relationships for Semantic Segmentation
2014-07-01 Large FC CRF 80.9 Large-Scale Semantic Co-Labeling of Image Sets
MSRC-21 image semantic labelling (per-pixel)
Date Algorithm % correct Paper / Source
2008-07-01 STF 72.0 Semantic Texton Forests for Image Categorization and Segmentation
2009-07-01 TextonBoost 72.0 TextonBoost for Image Understanding
2010-07-01 HCRF+CO 87.0 Graph Cut based Inference with Co-occurrence Statistics
2010-07-01 Auto-Context 78.0 Auto-Context and Its Application to High-Level Vision Tasks and 3D Brain Image Segmentation
2011-07-01 Are Spatial and Global Constraints Really Necessary for Segmentation? 85.0 Are Spatial and Global Constraints Really Necessary for Segmentation?
2011-12-17 FC CRF 86.0 Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials
2012-06-16 Describing the Scene as a Whole: Joint Object Detection, Scene Classification and Semantic Segmentation 86.0 Describing the Scene as a Whole: Joint Object Detection, Scene Classification and Semantic Segmentation
2012-07-01 Harmony Potentials 83.0 Harmony Potentials - Fusing Local and Global Scale for Semantic Image Segmentation
2012-10-07 Kernelized SSVM/CRF 82.0 Structured Image Segmentation using Kernelized Features
2012-10-07 PatchMatchGraph 79.0 PatchMatchGraph: Building a Graph of Dense Patch Correspondences for Label Transfer
2013-10-29 MPP 85.0 Morphological Proximity Priors: Spatial Relationships for Semantic Segmentation
2014-07-01 Large FC CRF 86.8 Large-Scale Semantic Co-Labeling of Image Sets
STL-10 Image Recognition
Date Algorithm % correct Paper / Source
2011-12-17 Receptive Fields 60.1 Selecting Receptive Fields in Deep Networks
2012-06-26 Invariant Representations with Local Transformations 58.7 Learning Invariant Representations with Local Transformations
2012-07-01 RGB-D Based Object Recognition 64.5 Unsupervised Feature Learning for RGB-D Based Object Recognition
2012-07-01 Simulated Fixations 61.0 Deep Learning of Invariant Features via Simulated Fixations in Video
2012-12-03 Discriminative Learning of Sum-Product Networks 62.3 Discriminative Learning of Sum-Product Networks
2012-12-03 Deep Learning of Invariant Features via Simulated Fixations in Video 56.5 Deep Learning of Invariant Features via Simulated Fixations in Video
2013-01-15 Pooling-Invariant 58.28 Pooling-Invariant Image Feature Learning
2013-07-01 Multi-Task Bayesian Optimization 70.1 Multi-Task Bayesian Optimization
2014-02-24 No more meta-parameter tuning in unsupervised sparse feature learning 61.0 No more meta-parameter tuning in unsupervised sparse feature learning
2014-06-21 Nonnegativity Constraints 67.9 Stable and Efficient Representation Learning with Nonnegativity Constraints
2014-06-23 DFF Committees 68.0 Committees of deep feedforward networks trained with few data
2014-08-28 CKN 62.32 Convolutional Kernel Networks
2014-12-08 Discriminative Unsupervised Feature Learning with Convolutional Neural Networks 72.8 Discriminative Unsupervised Feature Learning with Convolutional Neural Networks
2015-02-13 An Analysis of Unsupervised Pre-training in Light of Recent Advances 70.2 An Analysis of Unsupervised Pre-training in Light of Recent Advances
2015-03-11 C-SVDDNet 68.23 C-SVDDNet: An Effective Single-Layer Network for Unsupervised Feature Learning
2015-07-01 Deep Representation Learning with Target Coding 73.15 Deep Representation Learning with Target Coding
2015-10-11 SWWAE 74.33 Stacked What-Where Auto-encoders
2016-01-02 Convolutional Clustering 74.1 Convolutional Clustering for Unsupervised Learning
2016-11-19 CC-GAN² 77.79 ± 0.8 Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks
Street View House Numbers (SVHN)
Date Algorithm % error Paper / Source
2012-07-01 Convolutional neural networks applied to house numbers digit classification 4.9 Convolutional neural networks applied to house numbers digit classification
2013-01-16 Stochastic Pooling 2.8 Stochastic Pooling for Regularization of Deep Convolutional Neural Networks
2013-06-16 Maxout 2.47 Maxout Networks
2013-06-16 Regularization of Neural Networks using DropConnect 1.94 Regularization of Neural Networks using DropConnect
2014-04-14 NiN 2.35 Network in Network
2014-04-14 DCNN 2.16 Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks
2014-07-01 DSN 1.92 Deeply-Supervised Nets
2015-05-31 FLSCNN 3.96 Enhanced Image Classification With a Fast-Learning Shallow Convolutional Neural Network
2015-06-08 RCNN-96 1.77 Recurrent Convolutional Neural Network for Object Recognition
2015-06-12 ReNet 2.38 ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks
2015-07-01 MLR DNN 1.92 Multi-Loss Regularized Deep Neural Network
2015-09-17 MIM 1.97 ± 0.08 On the Importance of Normalisation Layers in Deep Learning with Piecewise Linear Activation Units
2015-10-05 Tree+Max-Avg pooling 1.69 Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree
2015-11-09 BNM NiN 1.81 Batch-normalized Maxout Network in Network
2015-11-18 CMsC 1.76 Competitive Multi-scale Convolution
2015-12-07 BinaryConnect 2.15 BinaryConnect: Training Deep Neural Networks with binary weights during propagations
2017-05-30 Deep Complex 3.3 Deep Complex Networks

Visual Question Answering

Comprehending an image involves more than recognising what objects or entities are within it, but recognising events, relationships, and context from the image. This problem requires both sophisticated image recognition, language, world-modelling, and "image comprehension". There are several datasets in use. The illustration is from VQA, which was generated by asking Amazon Mechanical Turk workers to propose questions about photos from Microsoft's COCO image collection.

In [7]:
plot = vqa_real_oe.graph(keep=True, title="COCO Visual Question Answersing (VQA) real open ended", llabel="VQA 1.0")
vqa2_real_oe.graph(reuse=plot, llabel="VQA 2.0", fcol="#00a0a0", pcol="#a000a0")
for m in image_comprehension.metrics:
    m.graph() if not m.graphed else None
In [8]:
HTML(image_comprehension.tables())
Out[8]:
COCO Visual Question Answering (VQA) abstract 1.0 multiple choice
Date Algorithm % correct Paper / Source
2016-07-01 Dualnet ensemble 71.18 VQA: Visual Question Answering (algorithm from DualNet: Domain-Invariant Network for Visual Question Answering)
2016-07-01 LSTM + global features 69.21 VQA: Visual Question Answering (algorithm from Yin and Yang: Balancing and Answering Binary Visual Questions)
2016-07-01 LSTM blind 61.41 VQA: Visual Question Answering (algorithm from Yin and Yang: Balancing and Answering Binary Visual Questions)
2016-09-19 Graph VQA 74.37 Graph-Structured Representations for Visual Question Answering
COCO Visual Question Answering (VQA) abstract images 1.0 open ended
Date Algorithm % correct Paper / Source
2016-07-01 Dualnet ensemble 69.73 VQA: Visual Question Answering (algorithm from DualNet: Domain-Invariant Network for Visual Question Answering)
2016-07-01 LSTM + global features 65.02 VQA: Visual Question Answering (algorithm from Yin and Yang: Balancing and Answering Binary Visual Questions)
2016-07-01 LSTM blind 57.19 VQA: Visual Question Answering (algorithm from Yin and Yang: Balancing and Answering Binary Visual Questions)
2016-09-19 Graph VQA 70.42 Graph-Structured Representations for Visual Question Answering
COCO Visual Question Answering (VQA) real images 1.0 multiple choice
Date Algorithm % correct Paper / Source
2015-05-03 LSTM Q+I 63.1 VQA: Visual Question Answering
2015-12-15 iBOWIMG baseline 61.97 Simple Baseline for Visual Question Answering
2016-04-06 FDA 64.2 A Focused Dynamic Attention Model for Visual Question Answering
2016-05-31 HQI+ResNet 66.1 Hierarchical Co-Attention for Visual Question Answering
2016-06-05 MRN 66.33 Multimodal Residual Learning for Visual QA
2016-06-06 MCB 7 att. 70.1 Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding (source code)
2016-08-06 joint-loss 67.3 Training Recurrent Answering Units with Joint Loss Minimization for VQA
COCO Visual Question Answering (VQA) real images 1.0 open ended
Date Algorithm % correct Paper / Source
2015-05-03 LSTM Q+I 58.2 VQA: Visual Question Answering
2015-12-15 iBOWIMG baseline 55.89 Simple Baseline for Visual Question Answering
2016-01-26 SAN 58.9 Stacked Attention Networks for Image Question Answering
2016-03-09 CNN-RNN 59.5 Image Captioning and Visual Question Answering Based on Attributes and Their Related External Knowledge
2016-03-19 SMem-VQA 58.24 Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering
2016-04-06 FDA 59.5 A Focused Dynamic Attention Model for Visual Question Answering
2016-05-31 HQI+ResNet 62.1 Hierarchical Co-Attention for Visual Question Answering
2016-06-05 MRN + global features 61.84 Multimodal Residual Learning for Visual QA
2016-06-06 MCB 7 att. 66.5 Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding (source code)
2016-08-06 joint-loss 63.2 Training Recurrent Answering Units with Joint Loss Minimization for VQA
2017-08-06 N2NMN 64.2 Learning to Reason: End-to-End Module Networks for Visual Question Answering (source code)
COCO Visual Question Answering (VQA) real images 2.0 open ended
Date Algorithm % correct Paper / Source
2016-12-02 MCB 62.27 Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering (algorithm from Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding)
2016-12-02 d-LSTM+nI 54.22 Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering (algorithm from GitHub - VT-vision-lab/VQA_LSTM_CNN: Train a deeper LSTM and normalized CNN Visual Question Answering model. This current code can get 58.16 on OpenEnded and 63.09 on Multiple-Choice on test-standard.)
2017-07-25 Up-Down 70.34 Bottom-Up and Top-Down Attention for Image Captioning and VQA
2017-07-26 HDU-USYD-UNCC 68.16 VQA: Visual Question Answering
2017-07-26 DLAIT 68.07 VQA: Visual Question Answering
Visual7W
Date Algorithm % correct Paper / Source
2016-06-06 MCB+Att. 62.2 Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
2016-11-30 CMN 72.53 Modeling Relationships in Referential Expressions with Compositional Modular Networks

Game Playing

In principle, games are a sufficiently open-ended framework that all of intelligence could be captured within them. We can imagine a "ladder of games" which grow in sophistication and complexity, from simple strategy and arcade games to others which require very sophisticated language, world-modelling, vision and reasoning ability. At present, published reinforcement agents are climbing the first few rungs of this ladder.

Abstract Strategy Games

As an easier case, abstract games like chess, go, checkers etc can be played with no knowldege of the human world or physics. Although this domain has largely been solved to super-human performance levels, there are a few ends that need to be tied up, especially in terms of having agents learn rules for arbitrary abstract games effectively given various plausible starting points (eg, textual descriptions of the rules or examples of correct play).

In [9]:
from data.strategy_games import *
computer_chess.graph()
In [10]:
HTML(computer_chess.table())
Out[10]:
Computer Chess
Date Algorithm ELO Paper / Source
1984-12-31 Novag Super Constellation 6502 4 MHz 1631 Swedish Chess Computer Association - Wikipedia
1985-12-31 Mephisto Amsterdam 68000 12 MHz 1827 Swedish Chess Computer Association - Wikipedia
1986-12-31 Mephisto Amsterdam 68000 12 MHz 1827 Swedish Chess Computer Association - Wikipedia
1987-12-31 Mephisto Dallas 68020 14 MHz 1923 Swedish Chess Computer Association - Wikipedia
1988-12-31 Mephisto MM 4 Turbo Kit 6502 16 MHz 1993 Swedish Chess Computer Association - Wikipedia
1989-12-31 Mephisto Portorose 68020 12 MHz 2027 Swedish Chess Computer Association - Wikipedia
1990-12-31 Mephisto Portorose 68030 36 MHz 2138 Swedish Chess Computer Association - Wikipedia
1991-12-31 Mephisto Vancouver 68030 36 MHz 2127 Swedish Chess Computer Association - Wikipedia
1992-12-31 Chess Machine Schroder 3.0 ARM2 30 MHz 2174 Swedish Chess Computer Association - Wikipedia
1993-12-31 Mephisto Genius 2.0 486/50-66 MHz 2235 Swedish Chess Computer Association - Wikipedia
1995-12-31 MChess Pro 5.0 Pentium 90 MHz 2306 Swedish Chess Computer Association - Wikipedia
1996-12-31 Rebel 8.0 Pentium 90 MHz 2337 Swedish Chess Computer Association - Wikipedia
1997-05-11 Deep Blue 2725 ± 25 What was Deep Blue's Elo rating? - Quora
1997-12-31 HIARCS 6.0 49MB P200 MMX 2418 Swedish Chess Computer Association - Wikipedia
1998-12-31 Fritz 5.0 PB29% 67MB P200 MMX 2460 Swedish Chess Computer Association - Wikipedia
1999-12-31 Chess Tiger 12.0 DOS 128MB K6-2 450 MHz 2594 Swedish Chess Computer Association - Wikipedia
2000-12-31 Fritz 6.0 128MB K6-2 450 MHz 2607 Swedish Chess Computer Association - Wikipedia
2001-12-31 Chess Tiger 14.0 CB 256MB Athlon 1200 2709 Swedish Chess Computer Association - Wikipedia
2002-12-31 Deep Fritz 7.0 256MB Athlon 1200 MHz 2759 Swedish Chess Computer Association - Wikipedia
2003-12-31 Shredder 7.04 UCI 256MB Athlon 1200 MHz 2791 Swedish Chess Computer Association - Wikipedia
2004-12-31 Shredder 8.0 CB 256MB Athlon 1200 MHz 2800 Swedish Chess Computer Association - Wikipedia
2005-12-31 Shredder 9.0 UCI 256MB Athlon 1200 MHz 2808 Swedish Chess Computer Association - Wikipedia
2006-05-27 Rybka 1.1 64bit 2995 ± 25 CCRL 40/40 - Complete list
2006-12-31 Rybka 1.2 256MB Athlon 1200 MHz 2902 Swedish Chess Computer Association - Wikipedia
2007-12-31 Rybka 2.3.1 Arena 256MB Athlon 1200 MHz 2935 Swedish Chess Computer Association - Wikipedia
2008-12-31 Deep Rybka 3 2GB Q6600 2.4 GHz 3238 Swedish Chess Computer Association - Wikipedia
2009-12-31 Deep Rybka 3 2GB Q6600 2.4 GHz 3232 Swedish Chess Computer Association - Wikipedia
2010-08-07 Rybka 4 64bit 3269 ± 22 CCRL 40/40 - Complete list
2010-12-31 Deep Rybka 3 2GB Q6600 2.4 GHz 3227 Swedish Chess Computer Association - Wikipedia
2011-12-31 Deep Rybka 4 2GB Q6600 2.4 GHz 3216 Swedish Chess Computer Association - Wikipedia
2012-12-31 Deep Rybka 4 x64 2GB Q6600 2.4 GHz 3221 Swedish Chess Computer Association - Wikipedia
2013-07-20 Houdini 3 64bit 3248 ± 16 Wayback Machine
2013-12-31 Komodo 5.1 MP x64 2GB Q6600 2.4 GHz 3241 Swedish Chess Computer Association - Wikipedia
2014-12-31 Komodo 7.0 MP x64 2GB Q6600 2.4 GHz 3295 Swedish Chess Computer Association - Wikipedia
2015-07-04 Komodo 9 3332 ± 24 CCRL 40/40 - Complete list
2015-12-31 Stockfish 6 MP x64 2GB Q6600 2.4 GHz 3334 Swedish Chess Computer Association - Wikipedia
2016-12-31 Komodo 9.1 MP x64 2GB Q6600 2.4 GHz 3366 Swedish Chess Computer Association - Wikipedia
2017-02-27 Stockfish 3393 ± 50 CCRL 40/40 - Index

Real-time video games

Computer and video games are a very open-ended domain. It is possible that some existing or future games could be so elaborate that they are "AI complete". In the mean time, a lot of interesting progress is likely in exploring the "ladder of games" of increasing complexity on various fronts.

Atari 2600

Atari 2600 games have been a popular target for reinforcement learning, especially at DeepMind and OpenAI. RL agents now play most but not all of these games better than humans.

In the Atari 2600 data, the label "noop" indicates the game was played with a random number, up to 30, of "no-op" moves at the beginning, while the "hs" label indicates that the starting condition was a state sampled from 100 games played by expert human players. These forms of randomisation give RL systems a diversity of game states to learn from.

In [11]:
from data.video_games import *
from scrapers.atari import *
simple_games.graphs()
In [12]:
HTML(simple_games.tables())
Out[12]:
Atari 2600 Alien
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 103.2 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 939.2 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 3069.0 ± 1093.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 813.5 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 1620.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 634.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 4461.4 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 3747.7 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 1486.5 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 DDQN (tuned) hs 1033.4 Deep Reinforcement Learning with Double Q-learning
2015-12-08 Prior+Duel hs 823.7 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2016-01-06 Prior noop 4203.8 Prioritized Experience Replay
2016-01-06 Prior hs 1334.7 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 3213.5 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 3941.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 945.3 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 518.4 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 182.1 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 994.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 3166.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Amidar
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 183.6 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 103.4 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 739.5 ± 3024.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 189.2 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 978.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 178.4 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 2354.5 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 1793.3 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 172.7 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 238.4 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 169.1 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 1838.9 Prioritized Experience Replay
2016-01-06 Prior hs 129.1 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 782.5 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 2296.8 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF (1 day) hs 283.9 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 263.9 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 173.0 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 112.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 1735.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Assault
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 537.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 628.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 3359.0 ± 775.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 1195.8 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 4280.4 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 3489.3 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 DDQN (tuned) noop 5393.2 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel noop 4621.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 3994.8 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 10950.6 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 6060.8 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 7672.1 Prioritized Experience Replay
2016-01-06 Prior hs 6548.9 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 9011.6 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 11477.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 14497.9 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 5474.9 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 3746.1 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 1673.9 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 7203.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Asterix
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 1332.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 987.3 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 6012.0 ± 1744.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 3324.7 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 4359.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 3170.5 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 28188.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 17356.5 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 15840.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 364200.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 16837.0 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 31527.0 Prioritized Experience Replay
2016-01-06 Prior hs 22484.5 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 18919.5 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 375080.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF hs 22140.5 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 17244.5 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 6723.0 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 1440.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 406211.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Asteroids
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 89.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 907.3 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 1629.0 ± 542.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 933.6 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN hs 1458.7 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN noop 1364.5 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 2837.7 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 2035.4 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 734.7 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 DDQN (tuned) hs 1193.2 Deep Reinforcement Learning with Double Q-learning
2015-12-08 Prior+Duel hs 1021.9 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2016-01-06 Prior noop 2654.3 Prioritized Experience Replay
2016-01-06 Prior hs 1745.1 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 2869.3 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 1192.7 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 5093.1 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 4474.5 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 3009.4 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 1562.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 1516.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Atlantis
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 852.9 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 62687.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 85641.0 ± 17600.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 629166.5 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN hs 292491.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN noop 279987.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel hs 445360.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel noop 382572.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 106056.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 Prior+Duel hs 423252.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 319688.0 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 357324.0 Prioritized Experience Replay
2016-01-06 Prior hs 330647.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 340076.0 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 395762.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF hs 911091.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 875822.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 772392.0 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 1267410.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 841075.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Bank Heist
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 67.4 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 190.8 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 429.7 ± 650.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 399.4 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 455.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 312.7 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 1611.9 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 1129.3 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 1030.6 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 Prior+Duel hs 1004.6 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 886.0 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 1054.6 Prioritized Experience Replay
2016-01-06 Prior hs 876.6 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 1103.3 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 1503.1 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF hs 970.1 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 946.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 932.8 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 225.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 976.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Battle Zone
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 16.2 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 15820.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 26300.0 ± 7725.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 19938.0 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 29900.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 23750.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 37150.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 31700.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 31320.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 30650.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 24740.0 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 31530.0 Prioritized Experience Replay
2016-01-06 Prior hs 25520.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 8220.0 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 35520.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 20760.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 12950.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 11340.0 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 16600.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 28742.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Beam Rider
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 1743.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 929.4 The Arcade Learning Environment: An Evaluation Platform for General Agents
2013-12-19 DQN best 5184 Playing Atari with Deep Reinforcement Learning
2015-02-26 Nature DQN 6846.0 ± 1619.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 3822.1 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN hs 9743.2 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN noop 8627.5 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel hs 14591.3 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 13772.8 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel noop 12164.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 37412.2 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 17417.2 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior hs 31181.3 Prioritized Experience Replay
2016-01-06 Prior noop 23384.2 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 8299.4 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 30276.5 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 24622.2 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 22707.9 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 13235.9 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 744.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 14074.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Berzerk
Date Algorithm Raw Score Paper / Source
2015-09-22 DQN noop 585.6 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 493.4 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 1472.6 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 1225.4 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 910.6 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 2178.6 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 1011.1 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 1305.6 Prioritized Experience Replay
2016-01-06 Prior hs 865.9 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 1199.6 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 3409.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF (1 day) hs 1433.4 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 862.2 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 817.9 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 686.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 1645.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Bowling
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 36.4 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 43.9 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 42.4 ± 88.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 54.0 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN hs 56.5 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN noop 50.4 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 DDQN (tuned) noop 68.1 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 65.7 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel noop 65.5 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 DDQN (tuned) hs 69.6 Deep Reinforcement Learning with Double Q-learning
2015-12-08 Prior+Duel hs 50.4 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2016-01-06 Prior hs 52.0 Prioritized Experience Replay
2016-01-06 Prior noop 47.9 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 102.1 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 46.7 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 41.8 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 36.2 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 35.1 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 30.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 81.8 A Distributional Perspective on Reinforcement Learning
Atari 2600 Boxing
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 9.8 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 44.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 71.8 ± 8.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 74.2 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 88.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 70.3 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 99.4 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 91.6 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 77.3 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 79.2 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 73.5 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 95.6 Prioritized Experience Replay
2016-01-06 Prior hs 72.3 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 99.3 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 98.9 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF hs 59.8 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 37.3 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 33.7 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 49.8 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 97.8 A Distributional Perspective on Reinforcement Learning
Atari 2600 Breakout
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 6.1 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 5.2 The Arcade Learning Environment: An Evaluation Platform for General Agents
2013-12-19 DQN best 225 Playing Atari with Deep Reinforcement Learning
2015-02-26 Nature DQN 401.2 ± 26.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 313.0 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 385.5 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 354.5 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 DDQN (tuned) noop 418.5 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 411.6 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel noop 345.3 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 DDQN (tuned) hs 368.9 Deep Reinforcement Learning with Double Q-learning
2015-12-08 Prior+Duel hs 354.6 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2016-01-06 Prior noop 373.9 Prioritized Experience Replay
2016-01-06 Prior hs 343.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 344.1 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 366.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 766.8 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 681.9 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 551.6 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 9.5 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 748.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Centipede
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 4647.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 8803.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 8309.0 ± 5237.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 6296.9 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 4657.7 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 3973.9 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 7561.4 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 5409.4 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 4881.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 5570.2 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 3853.5 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 4463.2 Prioritized Experience Replay
2016-01-06 Prior hs 3489.1 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 49065.8 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 7687.5 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF hs 3755.8 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 3306.5 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 1997.0 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 7783.9 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 9646.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Chopper Command
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 16.9 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 1582.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 6687.0 ± 2916.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 3191.8 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 6126.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 5017.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 11215.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 5809.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 3784.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 8058.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 3495.0 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 8600.0 Prioritized Experience Replay
2016-01-06 Prior hs 4635.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 775.0 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 13185.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 10150.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 7021.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 4669.0 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 3710.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 15600.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Crazy Climber
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 149.8 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 23411.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 114103.0 ± 22797.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 65451.0 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 110763.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 98128.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 143570.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 124566.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 117282.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 Prior+Duel hs 127853.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 113782.0 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 141161.0 Prioritized Experience Replay
2016-01-06 Prior hs 127512.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 119679.0 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 162224.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 138518.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 112646.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 101624.0 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 26430.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 179877.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Defender
Date Algorithm Raw Score Paper / Source
2015-09-22 DQN noop 23633.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 15917.5 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 42214.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 35338.5 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 33996.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 34415.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 27510.0 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 31286.5 Prioritized Experience Replay
2016-01-06 Prior hs 23666.5 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 11099.0 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 41324.5 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 233021.5 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 56533.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 36242.5 Asynchronous Methods for Deep Reinforcement Learning
2017-07-21 C51 noop 47092.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Demon Attack
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 0.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 520.5 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 9711.0 ± 2406.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 14880.1 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN hs 12550.7 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN noop 12149.4 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 60813.3 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 58044.2 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 56322.8 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 73371.3 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 69803.4 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 71846.4 Prioritized Experience Replay
2016-01-06 Prior hs 61277.5 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 63644.9 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 72878.6 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 115201.9 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 113308.4 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 84997.5 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 1166.5 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 130955.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Double Dunk
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA -16.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear -13.1 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN -18.1 ± 2.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila -11.3 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN hs -6.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN noop -6.6 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 0.1 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs -0.8 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop -5.5 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 DDQN (tuned) hs -0.3 Deep Reinforcement Learning with Double Q-learning
2015-12-08 Prior+Duel hs -10.7 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2016-01-06 Prior noop 18.5 Prioritized Experience Replay
2016-01-06 Prior hs 16.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop -11.5 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop -12.5 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF (1 day) hs 0.1 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 0.1 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs -0.1 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 0.2 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 2.5 A Distributional Perspective on Reinforcement Learning
Atari 2600 Enduro
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 159.4 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 129.1 The Arcade Learning Environment: An Evaluation Platform for General Agents
2013-12-19 DQN best 661 Playing Atari with Deep Reinforcement Learning
2015-02-26 Nature DQN 301.8 ± 24.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 71.0 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 729.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 626.7 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 2258.2 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 2077.4 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 1211.8 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 Prior+Duel hs 2223.9 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 1216.6 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 2093.0 Prioritized Experience Replay
2016-01-06 Prior hs 1831.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 2002.1 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 2306.4 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF (1 day) hs -82.2 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs -82.5 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs -82.5 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 95.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 3454.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Fishing Derby
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA -85.1 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear -89.5 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN -0.8 ± 19.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 4.6 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN hs -1.6 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN noop -4.9 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 46.4 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 15.5 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs -4.1 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 17.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 3.2 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 39.5 Prioritized Experience Replay
2016-01-06 Prior hs 9.8 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 45.1 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 41.3 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 22.6 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 18.8 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 13.6 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop -49.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 8.9 A Distributional Perspective on Reinforcement Learning
Atari 2600 Freeway
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 19.7 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 19.1 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 30.3 Human-level control through deep reinforcement learning
2015-07-15 Gorila 10.2 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-10 MP-EB 27.0 Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
2015-09-22 DQN noop 30.8 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 26.9 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 DDQN (tuned) noop 33.3 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 0.2 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel noop 0.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 DDQN (tuned) hs 28.8 Deep Reinforcement Learning with Double Q-learning
2015-12-08 Prior+Duel hs 28.2 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2016-01-06 Prior noop 33.7 Prioritized Experience Replay
2016-01-06 Prior hs 28.9 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 33.4 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 33.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF (1 day) hs 0.1 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 0.1 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 0.1 Asynchronous Methods for Deep Reinforcement Learning
2016-08-22 A3C-CTS 30.48 Unifying Count-Based Exploration and Intrinsic Motivation
2016-12-13 TRPO-hash 34.0 Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
2017-03-03 DQN-CTS 33.0 Count-Based Exploration with Neural Density Models
2017-03-03 DQN-PixelCNN 31.7 Count-Based Exploration with Neural Density Models
2017-03-10 ES FF (1 hour) noop 31.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-06-25 Sarsa-ε 29.9 Count-Based Exploration in Feature Space for Reinforcement Learning
2017-06-25 Sarsa-φ-EB 0.0 Count-Based Exploration in Feature Space for Reinforcement Learning
2017-07-21 C51 noop 33.9 A Distributional Perspective on Reinforcement Learning
Atari 2600 Frostbite
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 180.9 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 216.9 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 328.3 ± 250.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 426.6 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-10 MP-EB 507.0 Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
2015-09-22 DQN noop 797.4 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 496.1 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 4672.8 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 2332.4 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 1683.3 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 Prior+Duel hs 4038.4 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 1448.1 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 4380.1 Prioritized Experience Replay
2016-01-06 Prior hs 3510.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 3469.6 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 7413.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 197.6 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 190.5 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 180.1 Asynchronous Methods for Deep Reinforcement Learning
2016-12-13 TRPO-hash 5214.0 Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 370.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-06-25 Sarsa-φ-EB 2770.1 Count-Based Exploration in Feature Space for Reinforcement Learning
2017-06-25 Sarsa-ε 1394.3 Count-Based Exploration in Feature Space for Reinforcement Learning
2017-07-21 C51 noop 3965.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Gopher
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 2368.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 1288.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 8520.0 ± 3279.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 4373.0 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 8777.4 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 8190.4 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel hs 20051.4 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel noop 15718.4 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 14840.8 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 Prior+Duel hs 105148.4 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 15253.0 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior hs 34858.8 Prioritized Experience Replay
2016-01-06 Prior noop 32487.2 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 56218.2 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 104368.2 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 17106.8 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 10022.8 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 8442.8 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 582.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 33641.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Gravitar
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 429.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 387.7 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 306.7 ± 223.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 538.4 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 473.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 298.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 588.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 412.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 297.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 DDQN (tuned) hs 200.5 Deep Reinforcement Learning with Double Q-learning
2015-12-08 Prior+Duel hs 167.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2016-01-06 Prior noop 548.5 Prioritized Experience Replay
2016-01-06 Prior hs 269.5 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 483.5 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 238.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 320.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 303.5 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 269.5 Asynchronous Methods for Deep Reinforcement Learning
2016-08-22 A3C-CTS 238.68 Unifying Count-Based Exploration and Intrinsic Motivation
2017-03-03 DQN-PixelCNN 498.3 Count-Based Exploration with Neural Density Models
2017-03-03 DQN-CTS 238.0 Count-Based Exploration with Neural Density Models
2017-03-10 ES FF (1 hour) noop 805.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 440.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 HERO
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 7295.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 6459.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 19950.0 ± 158.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 8963.4 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 20437.8 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 14992.9 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 20818.2 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 20130.2 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 15207.9 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 15459.2 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 14892.5 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 23037.7 Prioritized Experience Replay
2016-01-06 Prior hs 20889.9 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 14225.2 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 21036.5 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF hs 32464.1 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 28889.5 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 28765.8 Asynchronous Methods for Deep Reinforcement Learning
2017-07-21 C51 noop 38874.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Ice Hockey
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA -3.2 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear -9.5 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN -1.6 ± 2.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila -1.7 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN hs -1.6 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN noop -1.9 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 0.5 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs -1.3 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop -2.7 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 Prior+Duel hs 0.5 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs -2.5 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 1.3 Prioritized Experience Replay
2016-01-06 Prior hs -0.2 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop -4.1 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop -0.4 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs -1.7 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs -2.8 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs -4.7 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop -4.1 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop -3.5 A Distributional Perspective on Reinforcement Learning
Atari 2600 James Bond
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 354.1 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 202.8 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 576.7 ± 175.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 444.0 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 768.5 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 697.5 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 DDQN (tuned) noop 1358.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel noop 1312.5 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 835.5 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 585.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 573.0 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 5148.0 Prioritized Experience Replay
2016-01-06 Prior hs 3961.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 507.5 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 812.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 613.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 541.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 351.5 Asynchronous Methods for Deep Reinforcement Learning
2017-07-21 C51 noop 1909.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Kangaroo
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 8.8 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 1622.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 6740.0 ± 2959.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 1431.0 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 7259.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 4496.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 14854.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 12992.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 10334.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 DDQN (tuned) hs 11204.0 Deep Reinforcement Learning with Double Q-learning
2015-12-08 Prior+Duel hs 861.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2016-01-06 Prior noop 16200.0 Prioritized Experience Replay
2016-01-06 Prior hs 12185.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 13150.0 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 1792.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 125.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 106.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 94.0 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 11200.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 12853.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Krull
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 3341.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 3372.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 3805.0 ± 1033.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 6363.1 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 8422.3 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 6206.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 11451.9 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 8051.6 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 7920.5 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 Prior+Duel hs 7658.6 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 6796.1 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 9728.0 Prioritized Experience Replay
2016-01-06 Prior hs 6872.8 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 9745.1 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 10374.4 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF (1 day) hs 8066.6 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 5911.4 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 5560.0 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 8647.2 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 9735.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Kung-Fu Master
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 29151.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 19544.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 23270.0 ± 5955.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 20620.0 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 26059.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 20882.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 34294.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 29710.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 24288.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 37484.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 30207.0 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 39581.0 Prioritized Experience Replay
2016-01-06 Prior hs 31676.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 34393.0 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 48375.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 40835.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 28819.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 3046.0 Asynchronous Methods for Deep Reinforcement Learning
2017-07-21 C51 noop 48192.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Montezuma's Revenge
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 259.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 10.7 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 0.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 84.0 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-10 MP-EB 142.0 Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
2015-09-22 DQN hs 47.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN noop 0.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel hs 22.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 0.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel noop 0.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 DDQN (tuned) hs 42.0 Deep Reinforcement Learning with Double Q-learning
2015-12-08 Prior+Duel hs 24.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2016-01-06 Prior hs 51.0 Prioritized Experience Replay
2016-01-06 Prior noop 0.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 0.0 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 0.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF hs 67.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 53.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 41.0 Asynchronous Methods for Deep Reinforcement Learning
2016-08-22 DDQN-PC 3459.0 Unifying Count-Based Exploration and Intrinsic Motivation
2016-08-22 A3C-CTS 273.7 Unifying Count-Based Exploration and Intrinsic Motivation
2016-12-13 TRPO-hash 75.0 Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
2017-03-03 DQN-PixelCNN 3705.5 Count-Based Exploration with Neural Density Models
2017-03-03 DQN-CTS 0.0 Count-Based Exploration with Neural Density Models
2017-03-10 ES FF (1 hour) noop 0.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-06-25 Sarsa-φ-EB 2745.4 Count-Based Exploration in Feature Space for Reinforcement Learning
2017-06-25 Sarsa-ε 399.5 Count-Based Exploration in Feature Space for Reinforcement Learning
2017-07-21 C51 noop 0.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Ms. Pacman
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 1227.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 1692.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 2311.0 ± 525.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 1263.0 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 3085.6 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 1092.3 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 6283.5 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 2711.4 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 2250.6 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 DDQN (tuned) hs 1241.3 Deep Reinforcement Learning with Double Q-learning
2015-12-08 Prior+Duel hs 1007.8 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2016-01-06 Prior noop 6518.7 Prioritized Experience Replay
2016-01-06 Prior hs 1865.9 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 4963.8 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 3327.3 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 850.7 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 653.7 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 594.4 Asynchronous Methods for Deep Reinforcement Learning
2017-07-21 C51 noop 3415.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Name This Game
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 2247.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 2500.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 7257.0 ± 547.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 9238.5 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 8207.8 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 6738.8 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 11971.1 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 11185.1 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 10616.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 Prior+Duel hs 13637.9 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 8960.3 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 12270.5 Prioritized Experience Replay
2016-01-06 Prior hs 10497.6 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 15851.2 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 15572.5 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 12093.7 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 10476.1 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 5614.0 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 4503.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 12542.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Phoenix
Date Algorithm Raw Score Paper / Source
2015-09-22 DQN noop 8485.2 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 7484.8 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 23092.2 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 20410.5 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 12252.5 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 Prior+Duel hs 63597.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 12366.5 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 18992.7 Prioritized Experience Replay
2016-01-06 Prior hs 16903.6 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 6202.5 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 70324.3 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 74786.7 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 52894.1 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 28181.8 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 4041.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 17490.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Pitfall!
Date Algorithm Raw Score Paper / Source
2015-09-22 DQN hs -113.2 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN noop -286.1 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 0.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop -29.9 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs -46.9 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 DDQN (tuned) hs -186.7 Deep Reinforcement Learning with Double Q-learning
2015-12-08 Prior+Duel hs -243.6 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2016-01-06 Prior noop -356.5 Prioritized Experience Replay
2016-01-06 Prior hs -427.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop -2.6 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 0.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF hs -78.5 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs -123.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs -135.7 Asynchronous Methods for Deep Reinforcement Learning
2016-08-22 A3C-CTS -259.09 Unifying Count-Based Exploration and Intrinsic Motivation
2017-03-03 DQN-CTS 0.0 Count-Based Exploration with Neural Density Models
2017-03-03 DQN-PixelCNN 0.0 Count-Based Exploration with Neural Density Models
2017-03-10 ES FF (1 hour) noop 0.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 0.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Pong
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA -17.4 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear -19.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2013-12-19 DQN best 21 Playing Atari with Deep Reinforcement Learning
2015-02-26 Nature DQN 18.9 ± 1.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 16.7 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 19.5 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 18.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 21.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 20.9 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 18.8 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 DDQN (tuned) hs 19.1 Deep Reinforcement Learning with Double Q-learning
2015-12-08 Prior+Duel hs 18.4 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2016-01-06 Prior noop 20.6 Prioritized Experience Replay
2016-01-06 Prior hs 18.9 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 20.6 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 20.9 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF (1 day) hs 11.4 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 10.7 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 5.6 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 21.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 20.9 A Distributional Perspective on Reinforcement Learning
Atari 2600 Private Eye
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 86.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 684.3 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 1788.0 ± 5473.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 2598.6 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN hs 207.9 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN noop 146.7 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel hs 292.6 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 129.7 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel noop 103.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 1277.6 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs -575.5 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior hs 670.7 Prioritized Experience Replay
2016-01-06 Prior noop 200.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 286.7 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 206.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 421.1 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 206.9 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 194.4 Asynchronous Methods for Deep Reinforcement Learning
2016-08-22 A3C-CTS 99.32 Unifying Count-Based Exploration and Intrinsic Motivation
2017-03-03 DQN-PixelCNN 8358.7 Count-Based Exploration with Neural Density Models
2017-03-03 DQN-CTS 206.0 Count-Based Exploration with Neural Density Models
2017-03-10 ES FF (1 hour) noop 100.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 15095.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Q*Bert
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 960.3 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 613.5 The Arcade Learning Environment: An Evaluation Platform for General Agents
2013-12-19 DQN best 4500 Playing Atari with Deep Reinforcement Learning
2015-02-26 Nature DQN 10596.0 ± 3294.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 7089.8 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-10 MP-EB 15805.0 Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
2015-09-22 DQN noop 13117.3 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 9271.5 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 19220.3 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 15088.5 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 14175.8 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 14063.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 11020.8 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 16256.5 Prioritized Experience Replay
2016-01-06 Prior hs 9944.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 5236.8 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 18760.3 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 21307.5 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 15148.8 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 13752.3 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 147.5 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-06-25 Sarsa-φ-EB 4111.8 Count-Based Exploration in Feature Space for Reinforcement Learning
2017-06-25 Sarsa-ε 3895.3 Count-Based Exploration in Feature Space for Reinforcement Learning
2017-07-21 C51 noop 23784.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 River Raid
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 2650.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 1904.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 8316.0 ± 1049.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 5310.3 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 7377.6 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 4748.5 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 21162.6 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 16569.4 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 14884.5 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 Prior+Duel hs 16496.8 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 10838.4 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 14522.3 Prioritized Experience Replay
2016-01-06 Prior hs 11807.2 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 12530.8 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 20607.6 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF hs 12201.8 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 10001.2 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 6591.9 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 5009.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 17322.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Road Runner
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 89.1 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 67.7 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 18257.0 ± 4268.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 43079.8 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 39544.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 35215.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 69524.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 58549.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 44127.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 Prior+Duel hs 54630.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 43156.0 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 57608.0 Prioritized Experience Replay
2016-01-06 Prior hs 52264.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 47770.0 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 62151.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 73949.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 34216.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 31769.0 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 16590.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 55839.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Robotank
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 12.4 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 28.7 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 51.6 ± 4.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 61.8 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 63.9 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 58.7 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 65.3 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 65.1 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 62.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 DDQN (tuned) hs 59.1 Deep Reinforcement Learning with Double Q-learning
2015-12-08 Prior+Duel hs 24.7 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2016-01-06 Prior noop 62.6 Prioritized Experience Replay
2016-01-06 Prior hs 56.2 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 64.3 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 27.5 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF hs 32.8 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 2.6 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 2.3 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 11.9 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 52.3 A Distributional Perspective on Reinforcement Learning
Atari 2600 Seaquest
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 675.5 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 664.8 The Arcade Learning Environment: An Evaluation Platform for General Agents
2013-12-19 DQN best 1740 Playing Atari with Deep Reinforcement Learning
2015-02-26 Nature DQN 5286.0 ± 1310.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 10145.9 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 5860.6 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 4216.7 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 50254.2 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 37361.6 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 16452.7 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 DDQN (tuned) hs 14498.0 Deep Reinforcement Learning with Double Q-learning
2015-12-08 Prior+Duel hs 1431.2 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2016-01-06 Prior noop 26357.8 Prioritized Experience Replay
2016-01-06 Prior hs 25463.7 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 10932.3 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 931.6 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF hs 2355.4 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 2300.2 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 1326.1 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 1390.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 266434.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Skiing
Date Algorithm Raw Score Paper / Source
2015-09-22 DQN hs -12142.1 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN noop -13062.3 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop -8857.4 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop -9021.8 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs -11928.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 DDQN (tuned) hs -11490.4 Deep Reinforcement Learning with Double Q-learning
2015-12-08 Prior+Duel hs -18955.8 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2016-01-06 Prior noop -9996.9 Prioritized Experience Replay
2016-01-06 Prior hs -10169.1 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop -13585.1 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop -19949.9 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF hs -10911.1 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs -13700.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs -14863.8 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop -15442.5 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop -13901.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Solaris
Date Algorithm Raw Score Paper / Source
2015-09-22 DQN noop 3482.8 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 1295.4 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 DDQN (tuned) noop 3067.8 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel noop 2250.8 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 1768.4 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 DDQN (tuned) hs 810.0 Deep Reinforcement Learning with Double Q-learning
2015-12-08 Prior+Duel hs 280.6 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2016-01-06 Prior noop 4309.0 Prioritized Experience Replay
2016-01-06 Prior hs 2272.8 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 4544.8 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 133.4 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF hs 1956.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 1936.4 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 1884.8 Asynchronous Methods for Deep Reinforcement Learning
2016-08-22 A3C-CTS 2270.15 Unifying Count-Based Exploration and Intrinsic Motivation
2017-03-03 DQN-PixelCNN 2863.6 Count-Based Exploration with Neural Density Models
2017-03-03 DQN-CTS 133.4 Count-Based Exploration with Neural Density Models
2017-03-10 ES FF (1 hour) noop 2090.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 8342.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Space Invaders
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 267.9 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 250.1 The Arcade Learning Environment: An Evaluation Platform for General Agents
2013-12-19 DQN best 1075 Playing Atari with Deep Reinforcement Learning
2015-02-26 Nature DQN 1976.0 ± 893.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 1183.3 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 1692.3 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 1293.8 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 6427.3 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 5993.1 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 2525.5 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 Prior+Duel hs 8978.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 2628.7 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior hs 3912.1 Prioritized Experience Replay
2016-01-06 Prior noop 2865.8 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 2589.7 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 15311.5 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 23846.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 15730.5 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 2214.7 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 678.5 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 5747.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Star Gunner
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 9.4 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 1070.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 57997.0 ± 3152.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 14919.2 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 54282.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 52970.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel hs 90804.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel noop 89238.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 60142.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 Prior+Duel hs 127073.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 58365.0 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 63302.0 Prioritized Experience Replay
2016-01-06 Prior hs 61582.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 589.0 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 125117.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 164766.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 138218.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 64393.0 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 1470.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 49095.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Surround
Date Algorithm Raw Score Paper / Source
2015-09-22 DQN noop -5.6 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs -6.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 4.4 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 4.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop -2.9 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 DDQN (tuned) hs 1.9 Deep Reinforcement Learning with Double Q-learning
2015-12-08 Prior+Duel hs -0.2 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2016-01-06 Prior noop 8.9 Prioritized Experience Replay
2016-01-06 Prior hs 5.9 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop -2.5 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 1.2 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs -8.3 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs -9.6 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs -9.7 Asynchronous Methods for Deep Reinforcement Learning
2017-07-21 C51 noop 6.8 A Distributional Perspective on Reinforcement Learning
Atari 2600 Tennis
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 0.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear -0.1 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN -2.5 ± 1.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila -0.7 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 12.2 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 11.1 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 5.1 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 4.4 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop -22.8 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 DDQN (tuned) hs -7.8 Deep Reinforcement Learning with Double Q-learning
2015-12-08 Prior+Duel hs -13.2 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2016-01-06 Prior noop 0.0 Prioritized Experience Replay
2016-01-06 Prior hs -5.3 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 12.1 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 0.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF hs -6.3 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs -6.4 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs -10.2 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop -4.5 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 23.1 A Distributional Perspective on Reinforcement Learning
Atari 2600 Time Pilot
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 24.9 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 3741.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 5947.0 ± 1600.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 8267.8 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 4870.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 4786.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 11666.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 8339.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 6601.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 DDQN (tuned) hs 6608.0 Deep Reinforcement Learning with Double Q-learning
2015-12-08 Prior+Duel hs 4871.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2016-01-06 Prior noop 9197.0 Prioritized Experience Replay
2016-01-06 Prior hs 5963.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 4870.0 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 7553.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 27202.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 12679.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 5825.0 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 4970.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 8329.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Tutankham
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 98.2 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 114.3 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 186.7 ± 41.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 118.5 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 68.1 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 45.6 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 DDQN (tuned) noop 218.4 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel noop 211.4 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 48.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 108.6 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 92.2 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 204.6 Prioritized Experience Replay
2016-01-06 Prior hs 56.9 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 183.9 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 245.9 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF hs 156.3 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 144.2 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 26.1 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 130.3 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 280.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Up and Down
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 2449.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 3533.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 8456.0 ± 3162.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 8747.7 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 9989.9 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 8038.5 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 44939.6 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 24759.2 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 22972.2 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 Prior+Duel hs 22681.3 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 19086.9 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 16154.1 Prioritized Experience Replay
2016-01-06 Prior hs 12157.4 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 22474.4 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 33879.1 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 105728.7 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 74705.7 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 54525.4 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 67974.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 15612.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Venture
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 0.6 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 66.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 380.0 ± 238.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 523.4 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-10 MP-EB 0.0 Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models
2015-09-22 DQN noop 163.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 136.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 497.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 200.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 98.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 Prior+Duel hs 29.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 21.0 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior hs 94.0 Prioritized Experience Replay
2016-01-06 Prior noop 54.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 1172.0 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 48.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 25.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 23.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 19.0 Asynchronous Methods for Deep Reinforcement Learning
2016-08-22 A3C-CTS 0.0 Unifying Count-Based Exploration and Intrinsic Motivation
2016-12-13 TRPO-hash 445.0 Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
2017-03-03 DQN-PixelCNN 82.2 Count-Based Exploration with Neural Density Models
2017-03-03 DQN-CTS 48.0 Count-Based Exploration with Neural Density Models
2017-03-10 ES FF (1 hour) noop 760.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-06-25 Sarsa-φ-EB 1169.2 Count-Based Exploration in Feature Space for Reinforcement Learning
2017-06-25 Sarsa-ε 0.0 Count-Based Exploration in Feature Space for Reinforcement Learning
2017-07-21 C51 noop 1520.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Video Pinball
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 19761.0 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 16871.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 42684.0 ± 16287.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 112093.4 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 196760.4 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 154414.1 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 DDQN (tuned) noop 309941.9 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 110976.2 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel noop 98209.5 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 447408.6 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 367823.7 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior hs 295972.8 Prioritized Experience Replay
2016-01-06 Prior noop 282007.3 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 56287.0 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 479197.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 470310.5 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 331628.1 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 185852.6 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 22834.8 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 949604.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Wizard of Wor
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 36.9 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 1981.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 3393.0 ± 2019.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 10431.0 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 2704.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 1609.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 7855.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 7492.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-11-20 Duel hs 7054.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-12-08 Prior+Duel hs 10471.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 6201.0 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior hs 5727.0 Prioritized Experience Replay
2016-01-06 Prior noop 4802.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 483.0 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 12352.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C LSTM hs 18082.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 17244.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 5278.0 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 3480.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 9300.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Yars Revenge
Date Algorithm Raw Score Paper / Source
2015-09-22 DQN noop 18098.9 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 4577.5 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 49622.1 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 25976.5 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 11712.6 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 Prior+Duel hs 58145.9 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 6270.6 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 11357.0 Prioritized Experience Replay
2016-01-06 Prior hs 4687.4 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 21409.5 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 69618.1 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF (1 day) hs 7270.8 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF hs 7157.5 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 5615.5 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 16401.7 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 35050.0 A Distributional Perspective on Reinforcement Learning
Atari 2600 Zaxxon
Date Algorithm Raw Score Paper / Source
2012-07-14 SARSA 21.4 Investigating Contingency Awareness Using Atari 2600 Games
2012-07-19 Best linear 3365.0 The Arcade Learning Environment: An Evaluation Platform for General Agents
2015-02-26 Nature DQN 4977.0 ± 1235.0 Human-level control through deep reinforcement learning
2015-07-15 Gorila 6159.4 Massively Parallel Methods for Deep Reinforcement Learning
2015-09-22 DQN noop 5363.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-09-22 DQN hs 4412.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Human-level control through deep reinforcement learning)
2015-11-20 Duel noop 12944.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 Duel hs 10164.0 Dueling Network Architectures for Deep Reinforcement Learning
2015-11-20 DDQN (tuned) noop 10163.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Deep Reinforcement Learning with Double Q-learning)
2015-12-08 Prior+Duel hs 11320.0 Deep Reinforcement Learning with Double Q-learning (algorithm from Prioritized Experience Replay)
2015-12-08 DDQN (tuned) hs 8593.0 Deep Reinforcement Learning with Double Q-learning
2016-01-06 Prior noop 10469.0 Prioritized Experience Replay
2016-01-06 Prior hs 9474.0 Prioritized Experience Replay
2016-02-24 DDQN+Pop-Art noop 14402.0 Learning functions across many orders of magnitudes
2016-04-05 Prior+Duel noop 13886.0 Dueling Network Architectures for Deep Reinforcement Learning (algorithm from Prioritized Experience Replay)
2016-04-10 A3C FF hs 24622.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C LSTM hs 23519.0 Asynchronous Methods for Deep Reinforcement Learning
2016-04-10 A3C FF (1 day) hs 2659.0 Asynchronous Methods for Deep Reinforcement Learning
2017-03-10 ES FF (1 hour) noop 6380.0 Evolution Strategies as a Scalable Alternative to Reinforcement Learning
2017-07-21 C51 noop 10513.0 A Distributional Perspective on Reinforcement Learning

Speech recognition

In [13]:
from data.acoustics import *
from data.wer import *
speech_recognition.graphs()
In [14]:
HTML(speech_recognition.tables())
Out[14]:
Word error rate on Switchboard trained against the Hub5'00 dataset
Date Algorithm % error Paper / Source
2011-08-31 CD-DNN 16.1 https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/CD-DNN-HMM-SWB-Interspeech2011-Pub.pdf
2012-04-27 DNN-HMM 18.5 https://pdfs.semanticscholar.org/ce25/00257fda92338ec0a117bea1dbc0381d7c73.pdf?_ga=1.195375081.452266805.1483390947
2013-08-23 HMM-DNN +sMBR 12.6 http://www.danielpovey.com/files/2013_interspeech_dnn.pdf
2013-08-23 CNN 11.5 http://www.cs.toronto.edu/~asamir/papers/icassp13_cnn.pdf
2013-08-25 DNN MMI 12.9 http://www.danielpovey.com/files/2013_interspeech_dnn.pdf
2013-08-25 DNN MPE 12.9 http://www.danielpovey.com/files/2013_interspeech_dnn.pdf
2013-08-25 DNN BMMI 12.9 http://www.danielpovey.com/files/2013_interspeech_dnn.pdf
2013-08-25 DNN sMBR 12.6 http://www.danielpovey.com/files/2013_interspeech_dnn.pdf
2014-06-23 DNN + Dropout 15 Building DNN Acoustic Models for Large Vocabulary Speech Recognition
2014-06-30 DNN 16 Increasing Deep Neural Network Acoustic Model Size for Large Vocabulary Continuous Speech Recognition
2014-08-23 CNN on MFSC/fbanks + 1 non-conv layer for FMLLR/I-Vectors concatenated in a DNN 10.4 http://www.mirlab.org/conference_papers/International_Conference/ICASSP%202014/papers/p5609-soltau.pdf
2014-12-07 Deep Speech 20 Deep Speech: Scaling up end-to-end speech recognition
2014-12-07 Deep Speech + FSH 12.6 Deep Speech: Scaling up end-to-end speech recognition
2014-12-23 CNN + Bi-RNN + CTC (speech to letters), 25.9% WER if trainedonlyon SWB 12.6 Deep Speech: Scaling up end-to-end speech recognition
2015-05-21 IBM 2015 8.0 The IBM 2015 English Conversational Telephone Speech Recognition System
2015-08-23 HMM-TDNN + pNorm + speed up/down speech 12.9 http://www.danielpovey.com/files/2015_interspeech_augmentation.pdf
2015-08-23 HMM-TDNN + iVectors 11 http://speak.clsp.jhu.edu/uploads/publications/papers/1048_pdf.pdf
2015-09-23 Deep CNN (10 conv, 4 FC layers), multi-scale feature maps 12.2 Very Deep Multilingual Convolutional Neural Networks for LVCSR
2016-04-27 IBM 2016 6.9 The IBM 2016 English Conversational Telephone Speech Recognition System
2016-06-23 RNN + VGG + LSTM acoustic model trained on SWB+Fisher+CH, N-gram + "model M" + NNLM language model 6.6 The IBM 2016 English Conversational Telephone Speech Recognition System
2016-09-23 HMM-TDNN trained with MMI + data augmentation (speed) + iVectors + 3 regularizations + Fisher (10% / 15.1% respectively trained on SWBD only) 9.2 http://www.danielpovey.com/files/2016_interspeech_mmi.pdf
2016-09-23 HMM-BLSTM trained with MMI + data augmentation (speed) + iVectors + 3 regularizations + Fisher 8.5 http://www.danielpovey.com/files/2016_interspeech_mmi.pdf
2016-09-23 VGG/Resnet/LACE/BiLSTM acoustic model trained on SWB+Fisher+CH, N-gram + RNNLM language model trained on Switchboard+Fisher+Gigaword+Broadcast 6.3 The Microsoft 2016 Conversational Speech Recognition System
2016-10-17 CNN-LSTM 6.6 Achieving Human Parity in Conversational Speech Recognition
2016-12-17 Microsoft 2016b 5.8 Achieving Human Parity in Conversational Speech Recognition
2017-02-17 RNNLM 6.9 The Microsoft 2016 Conversational Speech Recognition System
2017-02-17 Microsoft 2016 6.2 The Microsoft 2016 Conversational Speech Recognition System
2017-03-23 ResNet + BiLSTMs acoustic model, with 40d FMLLR + i-Vector inputs, trained on SWB+Fisher+CH, n-gram + model-M + LSTM + Strided ( trous) convs-based LM trained on Switchboard+Fisher+Gigaword+Broadcast 5.5 English Conversational Telephone Speech Recognition by Humans and Machines
chime clean
Date Algorithm % error Paper / Source
2014-12-23 CNN + Bi-RNN + CTC (speech to letters) 6.3 Deep Speech: Scaling up end-to-end speech recognition
2015-12-23 9-layer model w/ 2 layers of 2D-invariant convolution & 7 recurrent layers, w/ 68M parameters 3.34 Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
chime real
Date Algorithm % error Paper / Source
2014-12-23 CNN + Bi-RNN + CTC (speech to letters) 67.94 Deep Speech: Scaling up end-to-end speech recognition
2015-12-23 9-layer model w/ 2 layers of 2D-invariant convolution & 7 recurrent layers, w/ 68M parameters 21.79 Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
fisher WER
Date Algorithm % error Paper / Source
2016-09-23 HMM-TDNNtrained with MMI + data augmentation (speed) + iVectors + 3 regularizations + SWBD 9.8 http://www.danielpovey.com/files/2016_interspeech_mmi.pdf
2016-09-23 HMM-BLSTMtrained with MMI + data augmentation (speed) + iVectors + 3 regularizations + SWBD 9.6 http://www.danielpovey.com/files/2016_interspeech_mmi.pdf
librispeech WER testclean
Date Algorithm % error Paper / Source
2015-08-23 HMM-(SAT)GMM 8.01 Kaldi ASR
2015-08-23 HMM-DNN + pNorm* 5.51 http://www.danielpovey.com/files/2015_icassp_librispeech.pdf
2015-08-23 HMM-TDNN + iVectors 4.83 http://speak.clsp.jhu.edu/uploads/publications/papers/1048_pdf.pdf
2015-12-23 9-layer model w/ 2 layers of 2D-invariant convolution & 7 recurrent layers, w/ 68M parameters trained on 11940h 5.33 Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
2016-09-23 HMM-TDNN trained with MMI + data augmentation (speed) + iVectors + 3 regularizations 4.28 http://www.danielpovey.com/files/2016_interspeech_mmi.pdf
librispeech WER testother
Date Algorithm % error Paper / Source
2015-08-23 HMM-(SAT)GMM 22.49 Kaldi ASR
2015-08-23 HMM-DNN + pNorm* 13.97 http://www.danielpovey.com/files/2015_icassp_librispeech.pdf
2015-08-23 TDNN + pNorm + speed up/down speech 12.51 http://www.danielpovey.com/files/2015_interspeech_augmentation.pdf
2015-12-23 9-layer model w/ 2 layers of 2D-invariant convolution & 7 recurrent layers, w/ 68M parameters trained on 11940h 13.25 Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
swb_hub_500 WER fullSWBCH
Date Algorithm % error Paper / Source
2013-08-23 HMM-DNN +sMBR 18.4 http://www.danielpovey.com/files/2013_interspeech_dnn.pdf
2014-06-23 DNN + Dropout 19.1 Building DNN Acoustic Models for Large Vocabulary Speech Recognition
2014-12-23 CNN + Bi-RNN + CTC (speech to letters), 25.9% WER if trainedonlyon SWB 16 Deep Speech: Scaling up end-to-end speech recognition
2015-08-23 HMM-TDNN + pNorm + speed up/down speech 19.3 http://www.danielpovey.com/files/2015_interspeech_augmentation.pdf
2015-08-23 HMM-TDNN + iVectors 17.1 http://speak.clsp.jhu.edu/uploads/publications/papers/1048_pdf.pdf
2016-06-23 RNN + VGG + LSTM acoustic model trained on SWB+Fisher+CH, N-gram + "model M" + NNLM language model 12.2 The IBM 2016 English Conversational Telephone Speech Recognition System
2016-09-23 HMM-TDNN trained with MMI + data augmentation (speed) + iVectors + 3 regularizations + Fisher (10% / 15.1% respectively trained on SWBD only) 13.3 http://www.danielpovey.com/files/2016_interspeech_mmi.pdf
2016-09-23 HMM-BLSTM trained with MMI + data augmentation (speed) + iVectors + 3 regularizations + Fisher 13 http://www.danielpovey.com/files/2016_interspeech_mmi.pdf
2016-09-23 VGG/Resnet/LACE/BiLSTM acoustic model trained on SWB+Fisher+CH, N-gram + RNNLM language model trained on Switchboard+Fisher+Gigaword+Broadcast 11.9 The Microsoft 2016 Conversational Speech Recognition System
2017-03-23 ResNet + BiLSTMs acoustic model, with 40d FMLLR + i-Vector inputs, trained on SWB+Fisher+CH, n-gram + model-M + LSTM + Strided ( trous) convs-based LM trained on Switchboard+Fisher+Gigaword+Broadcast 10.3 English Conversational Telephone Speech Recognition by Humans and Machines
timit PER
Date Algorithm % error Paper / Source
2009-08-23 (first, modern) HMM-DBN 23 http://www.cs.toronto.edu/~asamir/papers/NIPS09.pdf
2013-03-23 Bi-LSTM + skip connections w/ CTC 17.7 Speech Recognition with Deep Recurrent Neural Networks
2014-08-23 CNN in time and frequency + dropout, 17.6% w/o dropout 16.7 http://www.inf.u-szeged.hu/~tothl/pubs/ICASSP2014.pdf
2015-06-23 Bi-RNN + Attention 17.6 Attention-Based Models for Speech Recognition
2015-09-23 Hierarchical maxout CNN + Dropout 16.5
2016-03-23 RNN-CRF on 24(x3) MFSC 17.3 Segmental Recurrent Neural Networks for End-to-end Speech Recognition
wsj WER eval92
Date Algorithm % error Paper / Source
2014-08-23 CNN over RAW speech (wav) 5.6 http://infoscience.epfl.ch/record/203464/files/Palaz_Idiap-RR-18-2014.pdf
2015-04-23 TC-DNN-BLSTM-DNN 3.47 Deep Recurrent Neural Networks for Acoustic Modelling
2015-08-23 test-set on open vocabulary (i.e. harder), model = HMM-DNN + pNorm* 3.63 http://www.danielpovey.com/files/2015_icassp_librispeech.pdf
2015-12-23 9-layer model w/ 2 layers of 2D-invariant convolution & 7 recurrent layers, w/ 68M parameters 3.6 Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
wsj WER eval93
Date Algorithm % error Paper / Source
2015-08-23 test-set on open vocabulary (i.e. harder), model = HMM-DNN + pNorm* 5.66 http://www.danielpovey.com/files/2015_icassp_librispeech.pdf
2015-12-23 9-layer model w/ 2 layers of 2D-invariant convolution & 7 recurrent layers, w/ 68M parameters 4.98 Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

Music Information Retrieval

Instrumentals recognition

Instrumentals recognition in a representative musical dataset for Instrumentals playlist generation.

  • Experiments tested on SATIN database from Bayle et al. (2017).
  • The ratio Instrumentals/Songs (11%/89%) of SATIN is representative of real uneven musical databases.
  • The human performance is at 99% of correct instrumentals detection because there are known examples of possible confusion.
In [15]:
from data.acoustics import *
instrumentals_recognition.graphs()
In [16]:
HTML(instrumentals_recognition.tables())
Out[16]:
Precision of Instrumentals detection reached when tested on SATIN (Bayle et al. 2017)
Date Algorithm % correct Paper / Source
2013-10-17 Ghosal et al. 17.3 A hierarchical approach for speech-instrumental-song classification | SpringerLink
2014-09-30 VQMM 29.8 On Evaluation Validity in Music Autotagging
2014-09-30 SVMBFF 12.5 On Evaluation Validity in Music Autotagging
2017-06-23 Bayle et al. 82.5 Revisiting Autotagging Toward Faultless Instrumental Playlists Generation

Image Generation

In [17]:
from data.generative import *
image_generation_metric.graph()
In [18]:
HTML(image_generation_metric.table())
Out[18]:
Generative models of CIFAR-10 images
Date Algorithm Model
Entropy
Paper / Source
2014-10-30 NICE 4.48 NICE: Non-linear Independent Components Estimation
2015-02-16 DRAW 4.13 DRAW: A Recurrent Neural Network For Image Generation
2016-05-27 PixelRNN 3.0 Density estimation using Real NVP
2016-05-27 Real NVP 3.49 Density estimation using Real NVP
2016-06-15 VAE with IAF 3.11 Improved Variational Inference with Inverse Autoregressive Flow
2016-11-04 PixelCNN++ 2.92 Forum | OpenReview (source code)

Language Modelling and Comprehension

Predicting the next work in a document (which is also the central algorithmic problem in text compression) is one way to see how well machine learning systems are able to model human language. Shannon's classic 1951 paper obtained an expiermental measure of human text compression performance at 0.6 - 1.3 bits per character: humans know, better than classic algorithms, what word is likely to come next in a piece of writing. More recent work (Moradi 1998, Cover 1978) provides estimates that are text-relative and in the 1.3 bits per character (and for some texts, much higher) range.

In [19]:
from data.language import *
ptperplexity.graph()
In [20]:
HTML(ptperplexity.table())
Out[20]:
Penn Treebank (Perplexity when parsing English sentences)
Date Algorithm Perplexity Paper / Source
2012-04-07 KN5+RNNME ensemble 78.8 http://www.fit.vutbr.cz/~imikolov/rnnlm/google.pdf
2012-04-07 KN5+cache baseline 125.7 http://www.fit.vutbr.cz/~imikolov/rnnlm/google.pdf
2012-07-27 RNN-LDA+all 74.1 https://www.microsoft.com/en-us/research/wp-content/uploads/2012/07/rnn_ctxt_TR.sav_.pdf
2012-07-27 RNN-LDA ensemble 80.1 https://www.microsoft.com/en-us/research/wp-content/uploads/2012/07/rnn_ctxt_TR.sav_.pdf
2012-07-27 RNN-LDA LM+KN5+cache 92.0 https://www.microsoft.com/en-us/research/wp-content/uploads/2012/07/rnn_ctxt_TR.sav_.pdf
2012-07-27 RNN-LDA LM 113.7 https://www.microsoft.com/en-us/research/wp-content/uploads/2012/07/rnn_ctxt_TR.sav_.pdf
2012-07-27 RNNLM 124.7 https://www.microsoft.com/en-us/research/wp-content/uploads/2012/07/rnn_ctxt_TR.sav_.pdf
2013-12-20 Deep RNN 107.5 How to Construct Deep Recurrent Neural Networks
2014-09-08 RNN Dropout Regularization 68.7 Recurrent Neural Network Regularization
2016-08-11 RHN 71.3 Recurrent Highway Networks
2016-09-26 Pointer Sentinel-LSTM 70.9 Pointer Sentinel Mixture Models
2016-10-05 Variational LSTM 73.4 A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
2016-10-27 RHN+WT 66 Recurrent Highway Networks
2016-10-27 RHN 68.5 Recurrent Highway Networks
2017-02-15 Neural Architecture Search 62.4 Neural Architecture Search with Reinforcement Learning
2017-03-03 RHN+WT 65.4 Recurrent Highway Networks
2018-02-10 ENAS 55.8 Efficient Neural Architecture Search via Parameter Sharing
2018-03-02 AWD-LSTM-MOS+de 47.67 Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
2018-09-18 FRAGE 46.54 FRAGE: Frequency-Agnostic Word Representation
2019-02-14 GPT2 (zero shot) 35.76 Language Models are Unsupervised Multitask Learners
In [21]:
hp_compression.graph()
In [22]:
HTML(hp_compression.table())
Out[22]:
Hutter Prize (bits per character to encode English text)
Date Algorithm Model
Entropy
Paper / Source
2011-06-28 RNN 1.6 http://www.cs.utoronto.ca/~ilya/pubs/2011/LANG-RNN.pdf
2013-08-04 RNN, LSTM 1.67 Generating Sequences With Recurrent Neural Networks
2015-02-15 Gated Feedback RNN 1.58 Gated Feedback Recurrent Neural Networks
2015-07-06 Grid LSTM 1.47 Grid Long Short-Term Memory
2016-07-12 Recurrent Highway Networks 1.32 Recurrent Highway Networks
2016-08-11 RHN 1.42 Recurrent Highway Networks
2016-09-06 Hierarchical Multiscale RNN 1.32 Hierarchical Multiscale Recurrent Neural Networks
2016-09-27 Hypernetworks 1.39 HyperNetworks
2016-10-19 Surprisal-Driven Feedback RNN 1.37 Surprisal-Driven Feedback in Recurrent Networks
2016-10-31 Surprisal-Driven Zoneout 1.313 https://pdfs.semanticscholar.org/e9bc/83f9ff502bec9cffb750468f76fdfcf5dd05.pdf
2017-01-06 ByteNet 1.31 Neural Machine Translation in Linear Time
2017-03-03 Large RHN depth 10 1.27 Recurrent Highway Networks

LAMBADA is a challenging language modelling dataset in which the model has to predict a next word in a discourse, when that exact word has not occurred in the test. For instance, given a context like this:

He shook his head, took a step back and held his hands up as he tried to smile without losing a cigarette. “Yes you can,” Julia said in a reassuring voice. “I’ve already focused on my friend. You just have to click the shutter, on top, here.”

And a target sentence:

He nodded sheepishly, through his cigarette away and took the _________.

The task is to guess the target word "camera".

In [23]:
lambada.graph()
In [25]:
# Also consider adding the Microsoft Sentence Completion Challenge; see eg http://www.fit.vutbr.cz/~imikolov/rnnlm/thesis.pdf table 7.4

Translation

Translation is a tricky problem to score, since ultimately it is human comprehension or judgement that determines whether a translation is accurate. Google for instance uses human evaluation to determine when their algorithms have improved. But that kind of measurement is expensive and difficult to replicate accurately, so automated scoring metrics are also widely used in the field. Perhaps the most common of these are BLEU scores for corpora that have extensive professional human translations, which forms the basis for the measurements included here:

In [26]:
from data.language import *
en_fr_bleu.graph()
plot = en_de_bleu.graph(keep=True, title="En-De Traslation BLEU scores", llabel="news-test-2014")
en_de_bleu15.graph(reuse=plot, llabel="news-test-2015", fcol="#00a0a0", pcol="#a000a0", suppress_target=True)
en_ro_bleu.graph()
In [27]:
HTML(translation.tables())
Out[27]:
news-test-2014 En-De BLEU
Date Algorithm BLEU Paper / Source
2014-02-24 PBMT 20.7 Edinburgh’s phrase-based machine translation systems for WMT-14
2016-07-14 NSE-NSE 17.93 Neural Semantic Encoders
2016-07-23 Deep-Att 20.7 Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation
2016-09-26 GNMT+RL 26.3 Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
2017-01-06 ByteNet 23.75 Neural Machine Translation in Linear Time
2017-01-23 MoE 2048 26.03 Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
2017-05-12 ConvS2S (ensemble) 26.36 Convolutional Sequence to Sequence Learning
2017-05-12 ConvS2S 25.16 Convolutional Sequence to Sequence Learning
2017-06-12 SliceNet 26.1 Depthwise Separable Convolutions for Neural Machine Translation
2017-06-21 Transformer (big) 28.4 Attention Is All You Need
2017-06-21 Transformer 27.3 Attention Is All You Need
2017-09-25 Transformer+BR-CSGAN 27.9 Improving Neural Machine Translation with Conditional Sequence Generative Adversarial Nets
2018-09-18 FRAGE 28.36 FRAGE: Frequency-Agnostic Word Representation
news-test-2014 En-Fr BLEU
Date Algorithm BLEU Paper / Source
2014-02-24 PBMT 37 Edinburgh’s phrase-based machine translation systems for WMT-14
2014-09-01 RNN-search50* 36.15 Neural Machine Translation by Jointly Learning to Align and Translate
2014-09-10 SMT+LSTM5 36.5 Sequence to Sequence Learning with Neural Networks
2014-09-10 LSTM 34.81 Sequence to Sequence Learning with Neural Networks (algorithm from http://www.bioinf.jku.at/publications/older/2604.pdf)
2014-10-30 LSTM6 + PosUnk 37.5 Addressing the Rare Word Problem in Neural Machine Translation
2015-07-11 RNNsearch-50 28.45 Neural Machine Translation by Jointly Learning to Align and Translate
2016-07-23 Deep-Att + PosUnk 39.2 Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation
2016-09-26 GNMT+RL 39.92 Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
2017-01-23 MoE 2048 40.56 Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
2017-05-12 ConvS2S (ensemble) 41.29 Convolutional Sequence to Sequence Learning
2017-05-12 ConvS2S 40.46 Convolutional Sequence to Sequence Learning
2017-06-21 Transformer (big) 41 Attention Is All You Need
2017-06-21 Transformer 38.1 Attention Is All You Need
news-test-2015 En-De BLEU
Date Algorithm BLEU Paper / Source
2015-09-17 S2Tree+5gram NPLM 24.1 Edinburgh's Syntax-Based Systems at WMT 2015
2016-03-19 Enc-Dec Att (char) 23.45 A Character-level Decoder without Explicit Segmentation for Neural Machine Translation
2016-03-19 Enc-Dec Att (BPE) 21.72 A Character-level Decoder without Explicit Segmentation for Neural Machine Translation
2017-01-06 ByteNet 26.26 Neural Machine Translation in Linear Time
news-test-2016 En-Ro BLEU
Date Algorithm BLEU Paper / Source
2016-07-11 GRU BPE90k 28.9 The QT21/HimL Combined Machine Translation System
2017-05-12 ConvS2S BPE40k 29.88 Convolutional Sequence to Sequence Learning

Conversation: Chatbots & Conversational Agents

Conversation is the classic AI progress measure! There is the Turing test, which involves a human judge trying to tell the difference between a humand and computer that they are chatting to online, and also easier variants of the Turing test in which the judge limits themselves to more casual, less probing conversation in various ways.

The Loebner Prize is an annual event that runs a somewhat easier version of the test. Since 2014, the event has also been giving standard-form tests to their entrants, and scoring the results (each question gets a plausible/semi-plausible/implausible rating). This metric is not stable, because the test questions have to change every year, they are somewhat indicative of progress. Ideally the event might apply each year's test questions to the most successful entrants from prior years. Here is an example from 2016:

In [28]:
loebner.graph()
In [29]:
HTML(loebner.table())
Out[29]:
The Loebner Prize scored selection answers
Date Algorithm % correct Paper / Source
2014-11-15 Rose 2014 89.2 AISB - The Society for the Study of Artificial Intelligence and Simulation of Behaviour - Loebner Prize
2014-11-15 Izar 2014 88.3 AISB - The Society for the Study of Artificial Intelligence and Simulation of Behaviour - Loebner Prize
2014-11-15 Misuku 2014 88.3 AISB - The Society for the Study of Artificial Intelligence and Simulation of Behaviour - Loebner Prize
2014-11-15 Uberbot 2014 81.67 AISB - The Society for the Study of Artificial Intelligence and Simulation of Behaviour - Loebner Prize
2014-11-15 Tutor 2014 80.83 AISB - The Society for the Study of Artificial Intelligence and Simulation of Behaviour - Loebner Prize
2014-11-15 The Professor 2014 76.7 AISB - The Society for the Study of Artificial Intelligence and Simulation of Behaviour - Loebner Prize
2015-09-19 Mitsuku 2015 83.3 AISB - The Society for the Study of Artificial Intelligence and Simulation of Behaviour - Loebner Prize
2015-09-19 Lisa 2015 80 AISB - The Society for the Study of Artificial Intelligence and Simulation of Behaviour - Loebner Prize
2015-09-19 Izar 2015 76.7 AISB - The Society for the Study of Artificial Intelligence and Simulation of Behaviour - Loebner Prize
2015-09-19 Rose 2015 75 AISB - The Society for the Study of Artificial Intelligence and Simulation of Behaviour - Loebner Prize
2016-09-17 Mitsuku 2016 90 AISB - The Society for the Study of Artificial Intelligence and Simulation of Behaviour - Loebner Prize
2016-09-17 Tutor 2016 78.3 AISB - The Society for the Study of Artificial Intelligence and Simulation of Behaviour - Loebner Prize
2016-09-17 Rose 2016 77.5 AISB - The Society for the Study of Artificial Intelligence and Simulation of Behaviour - Loebner Prize
2016-09-17 Arckon 2016 77.5 AISB - The Society for the Study of Artificial Intelligence and Simulation of Behaviour - Loebner Prize
2016-09-17 Katie 2016 76.7 AISB - The Society for the Study of Artificial Intelligence and Simulation of Behaviour - Loebner Prize

Reading Comprehension

The Facebook BABI 20 QA dataset is an example of a basic reading comprehension task. It has been solved with large training datasets (10,000 examples per task) but not with a smaller training dataset of 1,000 examples for each of the 20 categories of tasks. It involves learning to answer simple reasoning questions like these:

There are numerous other reading comprehension metrics that are in various ways harder than bAbi 20 QA. They are generally not solved, though progress is fairly promising.

In [30]:
for m in reading_comprehension.metrics: m.graphed = False
plot = bAbi1k.graph(keep=True, title="bAbi 20 QA reading comprehension", llabel="1k training examples")
bAbi10k.graph(reuse=plot, llabel="10k training examples", fcol="#00a0a0", pcol="#a000a0", suppress_target=True)

Another reading comprehension dataset that has received significant recent attention is the Stanford Question Answering Dataset (SQuAD). The literature reports both F1 scores and exact match scores, though these are closely correlated:

In [31]:
plot = squad_f1.graph(keep=True, title="Stanford Question Answering Dataset (SQuAD)", tcol="g", llabel="F1 score")
squad_em.graph(reuse=plot, llabel="Exact Match (EM)", fcol="#00a0a0", pcol="#a000a0", tcol="#00a0a0")
In [32]:
for m in reading_comprehension.metrics:
    if not m.graphed: m.graph()
In [33]:
HTML(reading_comprehension.tables())
Out[33]:
CNN Comprehension test
Date Algorithm % correct Paper / Source
2015-06-10 Impatient reader 63.8 Teaching Machines to Read and Comprehend
2015-06-10 Attentive reader 63.0 Teaching Machines to Read and Comprehend
2016-03-04 AS reader (avg) 75.4 Text Understanding with the Attention Sum Reader Network
2016-03-04 AS reader (greedy) 74.8 Text Understanding with the Attention Sum Reader Network
2016-06-05 GA reader 77.4 Gated-Attention Readers for Text Comprehension
2016-06-07 AIA 75.7 Iterative Alternating Neural Attention for Machine Reading
2016-06-07 EpiReader 74.0 Natural Language Comprehension with the EpiReader
2016-07-17 CAS reader 70.0 Consensus Attention-based Neural Networks for Chinese Reading Comprehension
2016-08-04 AoA reader 74.4 Attention-over-Attention Neural Networks for Reading Comprehension
2016-08-08 Attentive+relabling+ensemble 77.6 A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task
2016-09-17 ReasoNet 74.7 ReasoNet: Learning to Stop Reading in Machine Comprehension
2016-11-09 AIA 76.1 Iterative Alternating Neural Attention for Machine Reading
2016-12-01 GA update L(w) 77.9 Gated-Attention Readers for Text Comprehension
2017-03-07 GA+MAGE (32) 78.6 Linguistic Knowledge as Memory for Recurrent Neural Networks
2017-10-07 DIM Reader 74.4 DIM Reader: Dual Interaction Mode for Machine Comprehension
Daily Mail Comprehension test
Date Algorithm % correct Paper / Source
2015-06-10 Attentive reader 69.0 Teaching Machines to Read and Comprehend
2015-06-10 Impatient reader 68.0 Teaching Machines to Read and Comprehend
2016-03-04 AS reader (greedy) 77.7 Text Understanding with the Attention Sum Reader Network
2016-03-04 AS reader (avg) 77.1 Text Understanding with the Attention Sum Reader Network
2016-06-05 GA reader 78.1 Gated-Attention Readers for Text Comprehension
2016-08-08 Attentive+relabling+ensemble 79.2 A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task
2016-09-17 ReasoNet 76.6 ReasoNet: Learning to Stop Reading in Machine Comprehension
2016-12-01 GA update L(w) 80.9 Gated-Attention Readers for Text Comprehension
Reading comprehension MCTest-160-all
Date Algorithm % correct Paper / Source
2013-10-01 SW+D+RTE 69.16 MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text
2015-07-26 Wang-et-al 75.27 A Parallel-Hierarchical Model for Machine Comprehension on Sparse Data
2015-07-26 Narasimhan-model3 73.27 Machine Comprehension with Discourse Relations
2016-03-29 Parallel-Hierarchical 74.58 A Parallel-Hierarchical Model for Machine Comprehension on Sparse Data
Reading comprehension MCTest-500-all
Date Algorithm % correct Paper / Source
2013-10-01 SW+D+RTE 63.33 MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text
2015-07-26 Wang-et-al 69.94 A Parallel-Hierarchical Model for Machine Comprehension on Sparse Data
2015-07-26 LSSVM 67.83 Learning Answer-Entailing Structures for Machine Comprehension
2015-07-26 Narasimhan-model3 63.75 Machine Comprehension with Discourse Relations
2016-03-29 Parallel-Hierarchical 71.0 A Parallel-Hierarchical Model for Machine Comprehension on Sparse Data
Stanford Question Answering Dataset EM test
Date Algorithm Score Paper / Source
2016-11-04 Dynamic Coattention Networks (ensemble) 71.625 Dynamic Coattention Networks For Question Answering
2016-11-04 Dynamic Coattention Networks (single model) 66.233 Dynamic Coattention Networks For Question Answering
2016-11-07 Match-LSTM+Ans-Ptr 67.901 Machine Comprehension Using Match-LSTM and Answer Pointer
2016-11-29 BiDAF (single model) 68.478 Bidirectional Attention Flow for Machine Comprehension
2016-12-13 MPM (ensemble) 73.765 Multi-Perspective Context Matching for Machine Comprehension
2016-12-13 MPM (single model) 70.387 Multi-Perspective Context Matching for Machine Comprehension
2016-12-29 FastQAExt 70.849 Making Neural QA as Simple as Possible but not Simpler
2016-12-29 FastQA 68.436 Making Neural QA as Simple as Possible but not Simpler
2017-02-24 BiDAF (ensemble) 73.744 Bidirectional Attention Flow for Machine Comprehension
2017-03-08 r-net (ensemble) 76.922 https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf
2017-03-08 r-net (single model) 74.614 https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf
2017-03-31 Document Reader (single model) 70.733 Reading Wikipedia to Answer Open-Domain Questions
2017-04-20 SEDT+BiDAF (ensemble) 73.723 Structural Embedding of Syntactic Trees for Machine Comprehension
2017-04-20 SEDT+BiDAF (single model) 68.478 Structural Embedding of Syntactic Trees for Machine Comprehension
2017-04-24 Ruminating Reader (single model) 70.639 Ruminating Reader: Reasoning with Gated Multi-Hop Attention
2017-05-08 Mnemonic reader (ensemble) 73.754 Mnemonic Reader for Machine Comprehension
2017-05-08 Mnemonic reader (single model) 69.863 Mnemonic Reader for Machine Comprehension
2017-05-31 jNet (ensemble) 73.01 Exploring Question Understanding and Adaptation in Neural-Network-Based Question Answering
2017-05-31 RaSoR (single model) 70.849 Learning Recurrent Span Representations for Extractive Question Answering
2017-05-31 jNet (single model) 70.607 Exploring Question Understanding and Adaptation in Neural-Network-Based Question Answering
2017-06-20 ReasoNet ensemble 73.4 ReasoNet: Learning to Stop Reading in Machine Comprehension
2017-07-28 MEMEN 75.37 MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension
2017-08-16 DCN+ (ensemble) 78.706 The Stanford Question Answering Dataset
2017-08-21 RMR (ensemble) 77.678 Mnemonic Reader for Machine Comprehension
2017-09-20 AIR-FusionNet (ensemble) 78.842 The Stanford Question Answering Dataset
2018-10-11 BERT+TriviaQA 87.4 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Stanford Question Answering Dataset F1 test
Date Algorithm Score Paper / Source
2016-11-04 Dynamic Coattention Networks (ensemble) 80.383 Dynamic Coattention Networks For Question Answering
2016-11-04 Dynamic Coattention Networks (single model) 75.896 Dynamic Coattention Networks For Question Answering
2016-11-07 Match-LSTM+Ans-Ptr 77.022 Machine Comprehension Using Match-LSTM and Answer Pointer
2016-11-29 BiDAF (single model) 77.971 Bidirectional Attention Flow for Machine Comprehension
2016-12-13 MPM (ensemble) 81.257 Multi-Perspective Context Matching for Machine Comprehension
2016-12-13 MPM (single model) 78.784 Multi-Perspective Context Matching for Machine Comprehension
2016-12-29 FastQAExt 78.857 Making Neural QA as Simple as Possible but not Simpler
2016-12-29 FastQA 77.07 Making Neural QA as Simple as Possible but not Simpler
2017-02-24 BiDAF (ensemble) 81.525 Bidirectional Attention Flow for Machine Comprehension
2017-03-08 r-net (ensemble) 84.006 https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf
2017-03-08 r-net (single model) 82.458 https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf
2017-03-31 Document Reader (single model) 79.353 Reading Wikipedia to Answer Open-Domain Questions
2017-04-20 SEDT+BiDAF (ensemble) 81.53 Structural Embedding of Syntactic Trees for Machine Comprehension
2017-04-20 SEDT+BiDAF (single model) 77.971 Structural Embedding of Syntactic Trees for Machine Comprehension
2017-04-24 Ruminating Reader (single model) 79.821 Ruminating Reader: Reasoning with Gated Multi-Hop Attention
2017-05-08 Mnemonic reader (ensemble) 81.863 Mnemonic Reader for Machine Comprehension
2017-05-08 Mnemonic reader (single model) 79.207 Mnemonic Reader for Machine Comprehension
2017-05-31 jNet (ensemble) 81.517 Exploring Question Understanding and Adaptation in Neural-Network-Based Question Answering
2017-05-31 jNet (single model) 79.456 Exploring Question Understanding and Adaptation in Neural-Network-Based Question Answering
2017-05-31 RaSoR (single model) 78.741 Learning Recurrent Span Representations for Extractive Question Answering
2017-06-20 ReasoNet ensemble 82.9 ReasoNet: Learning to Stop Reading in Machine Comprehension
2017-07-28 MEMEN 82.66 MEMEN: Multi-layer Embedding with Memory Networks for Machine Comprehension
2017-08-16 DCN+ (ensemble) 85.619 The Stanford Question Answering Dataset
2017-08-21 RMR (ensemble) 84.888 Mnemonic Reader for Machine Comprehension
2017-09-20 AIR-FusionNet (ensemble) 85.936 The Stanford Question Answering Dataset
2018-10-11 BERT+TriviaQA 93.2 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
bAbi 20 QA (10k training examples)
Date Algorithm % correct Paper / Source
2015-02-19 MemNN-AM+NG+NL (1k + strong supervision) 93.3 Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
2015-03-31 MemN2N-PE+LS+RN 93.4 End-To-End Memory Networks
2016-01-05 DNC 96.2 https://www.gwern.net/docs/2016-graves.pdf
2016-06-30 DMN+ 97.2 Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes
2016-09-27 SDNC 97.1 Query-Reduction Networks for Question Answering
2016-12-09 QRN 99.7 Query-Reduction Networks for Question Answering
2016-12-12 EntNet 99.5 Tracking the World State with Recurrent Entity Networks
bAbi 20 QA (1k training examples)
Date Algorithm % correct Paper / Source
2015-03-31 MemN2N-PE+LS+RN 86.1 End-To-End Memory Networks
2015-06-24 DMN 93.6 Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
2016-12-09 QRN 90.1 Query-Reduction Networks for Question Answering
2016-12-09 DMN+ 66.8 Query-Reduction Networks for Question Answering (source code) (algorithm from Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes)
2016-12-12 EntNet 89.1 Tracking the World State with Recurrent Entity Networks
2017-03-07 GA+MAGE (16) 91.3 Linguistic Knowledge as Memory for Recurrent Neural Networks
bAbi Children's Book comprehension CBtest CN
Date Algorithm % correct Paper / Source
2016-03-04 AS reader (avg) 68.9 Text Understanding with the Attention Sum Reader Network
2016-03-04 AS reader (greedy) 67.5 Text Understanding with the Attention Sum Reader Network
2016-06-05 GA reader 69.4 Gated-Attention Readers for Text Comprehension
2016-06-07 EpiReader 67.4 Natural Language Comprehension with the EpiReader
2016-07-17 CAS reader 65.7 Consensus Attention-based Neural Networks for Chinese Reading Comprehension
2016-08-04 AoA reader 69.4 Attention-over-Attention Neural Networks for Reading Comprehension
2016-12-01 NSE 71.9 Gated-Attention Readers for Text Comprehension (algorithm from Neural Semantic Encoders)
2016-12-01 GA +feature, fix L(w) 70.7 Gated-Attention Readers for Text Comprehension
2017-10-07 DIM Reader 70.0 DIM Reader: Dual Interaction Mode for Machine Comprehension
2019-02-14 GPT2 (zero shot) 93.3 Language Models are Unsupervised Multitask Learners
bAbi Children's Book comprehension CBtest NE
Date Algorithm % correct Paper / Source
2016-03-04 AS reader (greedy) 71.0 Text Understanding with the Attention Sum Reader Network
2016-03-04 AS reader (avg) 70.6 Text Understanding with the Attention Sum Reader Network
2016-06-05 GA reader 71.9 Gated-Attention Readers for Text Comprehension
2016-06-07 AIA 72.0 Iterative Alternating Neural Attention for Machine Reading
2016-06-07 AIA 71.0 Iterative Alternating Neural Attention for Machine Reading
2016-06-07 EpiReader 69.7 Natural Language Comprehension with the EpiReader
2016-07-17 CAS reader 69.2 Consensus Attention-based Neural Networks for Chinese Reading Comprehension
2016-08-04 AoA reader 72.0 Attention-over-Attention Neural Networks for Reading Comprehension
2016-12-01 GA +feature, fix L(w) 74.9 Gated-Attention Readers for Text Comprehension
2016-12-01 NSE 73.2 Gated-Attention Readers for Text Comprehension (algorithm from Neural Semantic Encoders)
2017-10-07 DIM Reader 72.2 DIM Reader: Dual Interaction Mode for Machine Comprehension
2018-10-05 AttSum-Feat+L2 79.4 Entity Tracking Improves Cloze-style Reading Comprehension
2019-02-14 GPT2 (zero shot) 89.05 Language Models are Unsupervised Multitask Learners

Scientific and Technical capabilities

Arguably reading and understanding scientific, technical, engineering and medical documents would be taxonomically related to general reading comprehension, but these technical tasks are probably much more difficult, and will certainly be solved with separate efforts. So we classify them separately for now. We also classify some of these problems as superintelligent, because only a tiny fraction of humans can read STEM papers, and only a miniscule fraction of humans are capable of reasonably comprehending STEM papers across a large range of fields.

In [34]:
from data.stem import *
Example Magic The Gathering (MTG) and Hearthstone (HS) cards Corresponding MTG card implementation in Java

Generating computer programs from specifications

A particularly interesting technical problem, which may be slightly harder than problems with very clear constraints like circuit design, is generating computer programs from natural language specifications (which will often contain ambiguities of various sorts). This is presently a very unsolved problem, though there is now at least one good metric / dataset for it, which is [Deepmind's "card2code" dataset](https://github.com/deepmind/card2code) of Magic the Gathering and Hearthstone cards, along with Java and Python implementations (respectively) of the logic on the cards. Shown below is a figure from [_Ling, et al. 2016_](https://arxiv.org/abs/1603.06744v1) with their Latent Predictor Networks generating part of the code output for a Hearthstone card:
In [35]:
card2code_hs_acc.graph()
In [36]:
HTML(card2code_hs_acc.table())

Answering Science Exam Questions

Science exam question answering is a multifaceted task that pushes the limits of artificial intelligence. As indicated by the example questions pictured, successful science exam QA requires natural language understanding, reasoning, situational modeling, and commonsense knowledge; a challenge problem for which information-retrieval methods alone are not sufficient to earn a "passing" grade.

The AI2 Science Questions dataset provided by the Allen Institute for Artificial Intelligence (AI2) is a freely available collection of 5,059 real science exam questions derived from a variety of regional and state science exams. Project Aristo at AI2 is focused on the task of science question answering – the Aristo system is composed of a suite of various knowledge extraction methods, diagram processing tools, and solvers. As a reference point, the system currently achieves the following scores on these sets of non-diagram multiple choice (NDMC) and diagram multiple choice (DMC) science questions at two different grade levels. Allen Institute staff claim these states of the art for Aristo [Scores are listed as "Subset (Train/Dev/Test)"]:

  • Elementary NDMC (63.2/60.2/61.3)
  • Elementary DMC (41.8/41.3/36.3)
  • Middle School NDMC (55.5/57.6/57.9)
  • Middle School DMC (38.4/35.3/34.3)

Another science question answering dataset that has been studied in the literature is based specifically on New York Regents 4th grade science exam tests:

In [37]:
ny_4_science.graph()

Learning to Learn

Generalisation and Transfer Learning

ML systems are making strong progress at solving specific problems with sufficient training data. But we know that humans are capable of transfer learning -- applying things they've learned from one context, with appropriate variation, to another context. Humans are also very general; rather than just being taught to perform specific tasks, a single agent is able to do a very wide range of tasks, learning new things or not as required by the situation.

In [39]:
generalisation = Problem("Building systems that solve a wide range of diverse problems, rather than just specific ones")
generalisation.metric("Solve all other solved problems in this document, with a single system", solved=False)

transfer_learning = Problem("Transfer learning: apply relevant knowledge from a prior setting to a new slightly different one")
arcade_transfer = Problem("Transfer of learning within simple arcade game paradigms")

generalisation.add_subproblem(transfer_learning)
transfer_learning.add_subproblem(arcade_transfer)

# These will need to be specified a bit more clearly to be proper metrics, eg "play galaga well having trained on Xenon 2" or whatever
# the literature has settled on
# arcade_transfer.metric("Transfer learning of platform games")
# arcade_transfer.metric("Transfer learning of vertical shooter games")
# arcade_transfer.metric("Transfer from a few arcade games to all of them")

one_shot_learning = Problem("One shot learning: ingest important truths from a single example", ["agi", "world-modelling"])

uncertain_prediction = Problem("Correctly identify when an answer to a classification problem is uncertain")
uncertain_prediction.notes = "Humans can usually tell when they don't know something. Present ML classifiers do not have this ability."

interleaved_learning = Problem("Learn a several tasks without undermining performance on a first task, avoiding catastrophic forgetting", url="https://arxiv.org/abs/1612.00796")

Safety and Security Problems

The notion of "safety" for AI and ML systems can encompass many things. In some cases it's about ensuring that the system meets various sorts of constraints, either in general or for specifically safety-critical purposes, such as correct detection of pedestrians for self driving cars.

"Adversarial Examples" and manipulation of ML classifiers

In [40]:
adversarial_examples = Problem("Resistance to adversarial examples", ["safety", "agi", "security"], url="https://arxiv.org/abs/1312.6199")

adversarial_examples.notes = """
We know that humans have significant resistance to adversarial examples.  Although methods like camouflage sometimes
work to fool us into thinking one thing is another, those
"""

Safety of Reinforcement Learning Agents and similar systems

In [41]:
# This section is essentially on teaching ML systems ethics and morality. Amodei et al call this "scaleable supervision".
scalable_supervision = Problem("Scalable supervision of a learning system", ["safety", "agi"], url="https://arxiv.org/abs/1606.06565")
cirl = Problem("Cooperative inverse reinforcement learning of objective functions", ["safety", "agi"], url="https://arxiv.org/abs/1606.03137")
cirl.notes = "This is tagged agi because most humans are able to learn ethics from their surrounding community"
# Co-operative inverse reinforcement learning might be equivalent to solving scalable supervision, or there might other subproblems here
scalable_supervision.add_subproblem(cirl)

safe_exploration = Problem("Safe exploration", ["safety", "agi", "world-modelling"], url="https://arxiv.org/abs/1606.06565")
safe_exploration.notes = """
Sometimes, even doing something once is catastrophic. In such situations, how can an RL agent or some other AI system
learn about the catastrophic consequences without even taking the action once? This is an ability that most humans acquire
at some point between childhood and adolescence.
"""
# safe exploration may be related to one shot learning, though it's probably too early to mark that so clearly.

The work by Saunders et al. (2017) is an example of attempting to deal with the safe exploration problem by human-in-the-loop supervision. Without this oversight, a reinforcement learning system may engage in "reward hacking" in some Atari games. For instance in the Atari 2600 Road Runner game, an RL agent may deliberately kill itself to stay on level 1, because it can get more points on that level than it can on level 2 (particularly when it has not yet learned to master level 2). Human oversight overcomes this problem:

In [42]:
# hiddencode
HTML("""
<video id="video" width="80%" height="%45" controls poster="images/road-runner-poster.jpg" onclick="this.paused?this.play():this.pause();">
   <source src="video/saunders-roadrunner.mp4" type="video/mp4">
</video>

<div style="text-align:right; margin-right:20%">
  Futher videos from that project are on <a href="https://www.youtube.com/playlist?list=PLjs9WCnnR7PCn_Kzs2-1afCsnsBENWqor">on YouTube</a>.
</div>
""")
Out[42]:
Futher videos from that project are on on YouTube.
In [43]:
avoiding_reward_hacking = Problem("Avoiding reward hacking", ["safety"], url="https://arxiv.org/abs/1606.06565")
avoiding_reward_hacking.notes = """
Humans have only partial resistance to reward hacking.
Addiction seems to be one failure to exhibit this resistance.
Avoiding learning something because it might make us feel bad, or even building elaborate systems of self-deception, are also sometimes
seen in humans. So this problem is not tagged "agi".
"""

avoiding_side_effects = Problem("Avoiding undesirable side effects", ["safety"], url="https://arxiv.org/abs/1606.06565")
avoiding_side_effects.nodes = """
Many important constraints on good behaviour will not be explicitly
encoded in goal specification, either because they are too hard to capture
or simply because there are so many of them and they are hard to enumerate
"""

robustness_to_distributional_change = Problem("Function correctly in novel environments (robustness to distributional change)", ["safety", "agi"], url="https://arxiv.org/abs/1606.06565")

copy_bounding = Problem("Know how to prevent an autonomous AI agent from reproducing itself an unbounded number of times", ["safety"])

safety = Problem("Know how to build general AI agents that will behave as expected")
safety.add_subproblem(adversarial_examples)
safety.add_subproblem(scalable_supervision)
safety.add_subproblem(safe_exploration)
safety.add_subproblem(avoiding_reward_hacking)
safety.add_subproblem(avoiding_side_effects)
safety.add_subproblem(robustness_to_distributional_change)
safety.add_subproblem(copy_bounding)

Automated Hacking Systems

Automated tools are becoming increasingly effective both for offensive and defensive computer security purposes.

On the defensive side, fuzzers and static analysis tools have been used for some time by well-resourced software development teams to reduce the number of vulnerabilities in the code they ship.

Assisting both offense and defense, DARPA has recently started running the Cyber Grand Challenge contest to measure and improve the ability of agents to either break into systems or defend those same systems against vulnerabilities. It isn't necessarily clear how such initiatives would change the security of various systems.

This section includes some clear AI problems (like learning to find exploitable vulnerabilities in code) and some less pure AI problems, such as ensuring that defensive versions of this technology (whether in the form of fuzzers, IPSes, or other things) are deployed on all critical systems.

In [44]:
# It isn't totally clear whether having automated systems be good at finding bugs in and of itself will make the deployment
# of AI technologies safer or less safe, so we tag this both with "safety" and as a potentialy "unsafe" development
bug_finding = Problem("Detect security-related bugs in codebases", ["safety", "security", "unsafe"])

# However what
defensive_deployment = Problem("Deploy automated defensive security tools to protect valuable systems")
defensive_deployment.notes = """
It is clearly important is ensuring that the state of the art in defensive technology is deployed everywhere
that matters, including systems that perform important functions or have sensitive data on them (smartphones, for instance), and 
systems that have signifcant computational resources. This "Problem" isn't 
"""

Pedestrian Detection

Detecting pedestrians from images or video is a specific image classification problem that has received a lot of attention because of its importance for self-driving vehicles. Many metrics in this space are based on the Caltech pedestrians toolkit, thought the KITTI Vision Benchmark goes beyond that to include cars and cyclists in addition to pedestrians. We may want to write scrapers for Caltech's published results and KITTI's live results table.

In [45]:
pedestrian_detection = Problem("Pedestrian, bicycle & obstacle detection", ["safety", "vision"])
image_classification.add_subproblem(pedestrian_detection)

# TODO: import data from these pedestrian datasets/metrics.
# performance on them is a frontier of miss rate / false positive tradeoffs, 
# so we'll need to chose how to handle that as a scale

# http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/rocs/UsaTestRocReasonable.pdf
pedestrian_detection.metric("Caltech Pedestrians USA", url="http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/")
# http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/rocs/InriaTestRocReasonable.pdf
pedestrian_detection.metric("INRIA persons", url="http://pascal.inrialpes.fr/data/human/")
# http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/rocs/ETHRocReasonable.pdf
pedestrian_detection.metric("ETH Pedestrian", url="http://www.vision.ee.ethz.ch/~aess/dataset/")
# http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/rocs/TudBrusselsRocReasonable.pdf
pedestrian_detection.metric("TUD-Brussels Pedestrian", url="http://www.d2.mpi-inf.mpg.de/tud-brussels")
# http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/rocs/DaimlerRocReasonable.pdf
pedestrian_detection.metric("Damiler Pedestrian", url="http://www.gavrila.net/Datasets/Daimler_Pedestrian_Benchmark_D/Daimler_Mono_Ped__Detection_Be/daimler_mono_ped__detection_be.html")
Out[45]:
Metric("Damiler Pedestrian")

Explainability and Interpretability

In [46]:
explainability = Problem("Modify arbitrary ML systems in order to be able to provide comprehensible human explanations of their decisions")

statistical_explainability = Problem("Provide mathematical or technical explanations of decisions from classifiers")
statistical_explainability.notes = """
Providing explanations with techniques such as monte carlo analysis may in general
be easier than providing robust ones in natural language (since those may or may not
exist in all cases)
"""

explainability.add_subproblem(statistical_explainability)

Fairness and Debiasing

Biased decision making is a problem exhibited both by very simple machine learning classifiers as well as much more complicated ones. Large drivers of this problem include omitted-variable bias, reliance on inherently biased data sources for training data, attempts to make predictions from insufficient quantities of data, and deploying systems that create real-world incentives that change the behaviour they were measuring (see Goodhart's Law).

These problems are severe and widespread in the deployment of scoring and machine learning systems in contexts that include criminal justice, education policy, insurance and lending.

In [47]:
avoiding_bias = Problem("Build systems which can recognise and avoid biases decision making", ["safety"])

avoiding_bias.notes = """
Legally institutionalised protected categories represent only the most extreme and socially recognised
forms of biased decisionmaking. Attentive human decision makers are sometime capable of recognising
and avoiding many more subtle biases. This problem tracks AI systems' ability to do likewise.
"""

avoid_classification_biases = Problem("Train ML classifiers in a manner that corrects for the impact of omitted-variable bias on certain groups", solved=True)
avoid_classification_biases.notes = '''
Several standards are available for avoiding classification biases.

They include holding false-positive / false adverse prediction rates constant across protected categories (which roughly maps 
to "equal opportunity"), holding both false-positive and false-negative rates equal ("demographic parity"), and ensuring
that the fraction of each protected group that receives a given prediction is constant across all groups 
(roughly equivalent to "affirmative action").'''

avoid_classification_biases.metric("Adjust prediction models to have constant false-positive rates", url="https://arxiv.org/abs/1610.02413", solved=True)
avoid_classification_biases.metric("Adjust prediction models tos have constant false-positive and -negative rates", url="http://www.jmlr.org/proceedings/papers/v28/zemel13.pdf", solved=True)
Out[47]:
Metric("Adjust prediction models tos have constant false-positive and -negative rates")

Privacy

Many of the interesting privacy problems that will arise from AI and machine learning will come from choices about the applications of the technology, rather than a lack of algorithmic progress within the field. But there are some exceptions, which we will track here.

In [48]:
private_training = Problem("Train machine learning systems on private user data, without transferring sensitive facts into the model")
private_training.metric("Federated Learning (distributed training with thresholded updates to models)", solved=True, url="https://arxiv.org/abs/1602.05629")

avoid_privacy_bias = Problem("Fairness in machine learning towards people with a preference for privacy")
avoid_privacy_bias.notes = """
People who care strongly about their own privacy take many measures to obfuscate their tracks through
technological society, including using fictitious names, email addresses, etc in their routine dealings with
corporations, installing software to block or send inacurate data to online trackers. Like many other groups,
these people may be subject to unfairly adverse algorithmic decisionmaking. Treating them as a protected
group will be more difficult, because they are in many respects harder to identify.
"""
In [49]:
# hiddencode
def counts():
    print ("Included thus far:")
    print ("=================================")
    print (len(problems), "problems")
    print (len(metrics), "metrics", len([m for m in metrics.values() if m.solved]), "solved")
    print (len(measurements), "measurements")
    print (len([p for p in problems.values() if not p.metrics]), "problems which do not yet have any metrics (either not in this notebook, or none in the open literature)")
    print ("=================================\n")
    print ("Problems by Type:")
    print ("=================================")

    by_attr = {}
    solved_by_attr = {}
    for a in all_attributes:
        print (a, len([p for p in problems.values() if a in p.attributes]), )
        print ("solved:", len([p for p in problems.values() if p.solved and a in p.attributes]))

    print ("\nMetrics by Type:")
    print ("=================================")

    by_attr = {}
    solved_by_attr = {}
    for a in all_attributes:
        print (a, sum([len(p.metrics) for p in problems.values() if a in p.attributes]), )
        print ("solved:", sum([len([m for m in p.metrics if m.solved]) for p in problems.values() if a in p.attributes]))
    print ("=================================\n")
In [50]:
# hiddencode
def list_problems():
    for p in sorted(problems.values(), key=lambda x: x.attributes):
        if not p.superproblems:
            p.print_structure()
            print("")
In [51]:
# hiddencode
def venn_report():
    print("Sample of problems characterized thus far:")
    lang = set(p for p in problems.values() if "language" in p.attributes)
    world = set(p for p in problems.values() if "world-modelling" in p.attributes)
    vision = set(p for p in problems.values() if "vision" in p.attributes)

    from matplotlib_venn import venn3
    venn3((lang, world, vision), ('Language Problems', 'World-Modelling Problems', 'Vision Problems'))
    plt.show()
In [52]:
# hiddencode
def graphs():
    print("Graphs of progress:")
    for name, metric in metrics.items():
        if len(metric.measures) > 2 and not metric.graphed:
            print(name, "({0} measurements)".format(len(metric.measures)))
            metric.graph()
    plt.show()
                
graphs()
Graphs of progress:
Computer Go (7 measurements)

Taxonomy and recorded progress to date

In [53]:
list_problems()
Problem(Train machine learning systems on private user data, without transferring sensitive facts into the model)
    Metric(Federated Learning (distributed training with thresholded updates to models))SOLVED

Problem(Correctly identify when an answer to a classification problem is uncertain)

Problem(Deploy automated defensive security tools to protect valuable systems)

Problem(Learn a several tasks without undermining performance on a first task, avoiding catastrophic forgetting)

Problem(Train ML classifiers in a manner that corrects for the impact of omitted-variable bias on certain groups)
    Metric(Adjust prediction models to have constant false-positive rates)SOLVED
    Metric(Adjust prediction models tos have constant false-positive and -negative rates)SOLVED

Problem(Fairness in machine learning towards people with a preference for privacy)

Problem(Building systems that solve a wide range of diverse problems, rather than just specific ones)
    Metric(Solve all other solved problems in this document, with a single system)?
    Problem(Transfer learning: apply relevant knowledge from a prior setting to a new slightly different one)
        Problem(Transfer of learning within simple arcade game paradigms)

Problem(Modify arbitrary ML systems in order to be able to provide comprehensible human explanations of their decisions)
    Problem(Provide mathematical or technical explanations of decisions from classifiers)

Problem(Know how to build general AI agents that will behave as expected)
    Problem(Resistance to adversarial examples)
    Problem(Scalable supervision of a learning system)
        Problem(Cooperative inverse reinforcement learning of objective functions)
    Problem(Safe exploration)
    Problem(Avoiding reward hacking)
    Problem(Avoiding undesirable side effects)
    Problem(Function correctly in novel environments (robustness to distributional change))
    Problem(Know how to prevent an autonomous AI agent from reproducing itself an unbounded number of times)

Problem(Abstract strategy games)
    Problem(Playing abstract games with extensive hints)
        Metric(Computer Chess)                                      SOLVED
        Metric(Computer Go)                                         SOLVED
    Problem(Superhuman mastery of arbitrary abstract strategy games)
        Metric(mastering chess)                                     ?
    Problem(Learning the rules of complex strategy games from examples)
        Metric(learning chess)                                      ?
        Metric(learning go)                                         ?
    Problem(Play an arbitrary abstract game, first learning the rules)

Problem(Translation between human langauges)
    Metric(news-test-2014 En-Fr BLEU)                           not solved
    Metric(news-test-2014 En-De BLEU)                           not solved
    Metric(news-test-2015 En-De BLEU)                           not solved
    Metric(news-test-2016 En-Ro BLEU)                           not solved
    Metric(LDC En-De BLEU)                                      not solved

Problem(Conduct arbitrary sustained, probing conversation)
    Problem(Turing test for casual conversation)
        Metric(The Loebner Prize scored selection answers)          not solved
    Problem(Language comprehension and question-answering)
        Metric(bAbi 20 QA (10k training examples))                  SOLVED
        Metric(bAbi 20 QA (1k training examples))                   not solved
        Metric(Reading comprehension MCTest-160-all)                ?
        Metric(Reading comprehension MCTest-500-all)                ?
        Metric(bAbi Children's Book comprehension CBtest CN)        SOLVED
        Metric(CNN Comprehension test)                              ?
        Metric(Daily Mail Comprehension test)                       ?
        Metric(Stanford Question Answering Dataset EM test)         SOLVED
        Metric(Stanford Question Answering Dataset F1 test)         SOLVED
        Metric(bAbi Children's Book comprehension CBtest NE)        SOLVED

Problem(Vision)
    Problem(Image classification)
        Metric(Imagenet Image Recognition)                          SOLVED
        Metric(MSRC-21 image semantic labelling (per-class))        ?
        Metric(MSRC-21 image semantic labelling (per-pixel))        ?
        Metric(CIFAR-100 Image Recognition)                         ?
        Metric(CIFAR-10 Image Recognition)                          SOLVED
        Metric(Street View House Numbers (SVHN))                    SOLVED
        Metric(MNIST handwritten digit recognition)                 SOLVED
        Metric(STL-10 Image Recognition)                            ?
        Metric(Leeds Sport Poses)                                   ?
        Problem(Image comprehension)
            Metric(COCO Visual Question Answering (VQA) real images 1.0 open ended)not solved
            Metric(COCO Visual Question Answering (VQA) real images 1.0 multiple choice)?
            Metric(COCO Visual Question Answering (VQA) abstract images 1.0 open ended)not solved
            Metric(COCO Visual Question Answering (VQA) abstract 1.0 multiple choice)?
            Metric(Toronto COCO-QA)                                     ?
            Metric(DAQUAR)                                              not solved
            Metric(Visual Genome (pairs))                               ?
            Metric(Visual Genome (subjects))                            ?
            Metric(Visual7W)                                            ?
            Metric(FM-IQA)                                              ?
            Metric(Visual Madlibs)                                      ?
            Metric(COCO Visual Question Answering (VQA) real images 2.0 open ended)?
        Problem(Pedestrian, bicycle & obstacle detection)
            Metric(Caltech Pedestrians USA)                             ?
            Metric(INRIA persons)                                       ?
            Metric(ETH Pedestrian)                                      ?
            Metric(TUD-Brussels Pedestrian)                             ?
            Metric(Damiler Pedestrian)                                  ?
    Problem(Recognise events in videos)
        Metric(YouTube-8M video labelling)                          ?

Problem(One shot learning: ingest important truths from a single example)

Problem(Detection of Instrumentals musical tracks)
    Metric(Precision of Instrumentals detection reached when tested on SATIN (Bayle et al. 2017))not solved

Problem(Speech Recognition)
    Metric(Word error rate on Switchboard trained against the Hub5'00 dataset)SOLVED
    Metric(librispeech WER testclean)                           SOLVED
    Metric(librispeech WER testother)                           SOLVED
    Metric(wsj WER eval92)                                      SOLVED
    Metric(wsj WER eval93)                                      SOLVED
    Metric(swb_hub_500 WER fullSWBCH)                           ?
    Metric(fisher WER)                                          ?
    Metric(chime clean)                                         ?
    Metric(chime real)                                          ?
    Metric(timit PER)                                           ?

Problem(Accurate modelling of human language.)
    Metric(Penn Treebank (Perplexity when parsing English sentences))?
    Metric(Hutter Prize (bits per character to encode English text))SOLVED
    Metric(LAMBADA prediction of words in discourse)            not solved

Problem(Given an arbitrary technical problem, solve it as well as a typical professional in that field)
    Problem(Writing software from specifications)
        Metric(Card2Code)                                           ?
    Problem(Solve vaguely or under-constrained technical problems)
        Problem(Read a scientific or technical paper, and comprehend its contents)
            Problem(Extract major numerical results or progress claims from a STEM paper)
                Metric(Automatically find new relevant ML results on arXiv) ?
        Problem(Write computer programs from specifications)
            Metric(Card2Code MTG accuracy)                              not solved
            Metric(Card2Code Hearthstone accuracy)                      not solved
            Problem(Parse and implement complex conditional expressions)
        Problem(Answering Science Exam Questions)
            Metric(NY Regents 4th Grade Science Exams)                  not solved
            Metric(Elementery Non-Diagram Multiple Choice (NDMC) Science Exam accuracy)not solved
            Metric(Elementery Diagram Multiple Choice (DMC) Science Exam accuracy)not solved
            Metric(Middle School Non-Diagram Multiple Choice (NDMC) Science Exam accuracy)not solved
            Metric(Middle School Diagram Multiple Choice (DMC) Science Exam accuracy)not solved
    Problem(Solve technical problems with clear constraints (proofs, circuit design, aerofoil design, etc))
        Problem(Given examples of proofs, find correct proofs of simple mathematical theorems)
            Metric(HolStep)                                             ?
        Problem(Given desired circuit characteristics, and many examples, design new circuits to spec)

Problem(Build systems which can recognise and avoid biases decision making)

Problem(Detect security-related bugs in codebases)

Problem(Be able to generate complex scene e.g. a baboon receiving their degree at convocatoin.)
    Problem(Drawing pictures)
        Metric(Generative models of CIFAR-10 images)                ?

Problem(Play real-time computer & video games)
    Problem(Games that require inventing novel language, forms of speech, or communication)
        Problem(Games that require both understanding and speaking a language)
            Metric(Starcraft)                                           ?
            Problem(Games that require language comprehension)
    Problem(Simple video games)
        Metric(Atari 2600 Alien)                                    not solved
        Metric(Atari 2600 Amidar)                                   SOLVED
        Metric(Atari 2600 Assault)                                  SOLVED
        Metric(Atari 2600 Asterix)                                  SOLVED
        Metric(Atari 2600 Asteroids)                                not solved
        Metric(Atari 2600 Atlantis)                                 SOLVED
        Metric(Atari 2600 Bank Heist)                               SOLVED
        Metric(Atari 2600 Battle Zone)                              not solved
        Metric(Atari 2600 Beam Rider)                               SOLVED
        Metric(Atari 2600 Berzerk)                                  SOLVED
        Metric(Atari 2600 Bowling)                                  not solved
        Metric(Atari 2600 Boxing)                                   SOLVED
        Metric(Atari 2600 Breakout)                                 SOLVED
        Metric(Atari 2600 Centipede)                                SOLVED
        Metric(Atari 2600 Chopper Command)                          SOLVED
        Metric(Atari 2600 Crazy Climber)                            SOLVED
        Metric(Atari 2600 Demon Attack)                             SOLVED
        Metric(Atari 2600 Double Dunk)                              SOLVED
        Metric(Atari 2600 Enduro)                                   SOLVED
        Metric(Atari 2600 Fishing Derby)                            SOLVED
        Metric(Atari 2600 Freeway)                                  SOLVED
        Metric(Atari 2600 Frostbite)                                SOLVED
        Metric(Atari 2600 Gopher)                                   SOLVED
        Metric(Atari 2600 Gravitar)                                 not solved
        Metric(Atari 2600 HERO)                                     SOLVED
        Metric(Atari 2600 Ice Hockey)                               SOLVED
        Metric(Atari 2600 James Bond)                               SOLVED
        Metric(Atari 2600 Kangaroo)                                 SOLVED
        Metric(Atari 2600 Krull)                                    SOLVED
        Metric(Atari 2600 Kung-Fu Master)                           SOLVED
        Metric(Atari 2600 Montezuma's Revenge)                      not solved
        Metric(Atari 2600 Ms. Pacman)                               not solved
        Metric(Atari 2600 Name This Game)                           SOLVED
        Metric(Atari 2600 Pong)                                     SOLVED
        Metric(Atari 2600 Private Eye)                              not solved
        Metric(Atari 2600 Q*Bert)                                   SOLVED
        Metric(Atari 2600 River Raid)                               SOLVED
        Metric(Atari 2600 Road Runner)                              SOLVED
        Metric(Atari 2600 Robotank)                                 SOLVED
        Metric(Atari 2600 Seaquest)                                 SOLVED
        Metric(Atari 2600 Space Invaders)                           SOLVED
        Metric(Atari 2600 Star Gunner)                              SOLVED
        Metric(Atari 2600 Tennis)                                   SOLVED
        Metric(Atari 2600 Time Pilot)                               SOLVED
        Metric(Atari 2600 Tutankham)                                SOLVED
        Metric(Atari 2600 Up and Down)                              SOLVED
        Metric(Atari 2600 Venture)                                  SOLVED
        Metric(Atari 2600 Video Pinball)                            SOLVED
        Metric(Atari 2600 Wizard of Wor)                            SOLVED
        Metric(Atari 2600 Zaxxon)                                   SOLVED
        Metric(Atari 2600 Phoenix)                                  SOLVED
        Metric(Atari 2600 Pitfall!)                                 not solved
        Metric(Atari 2600 Skiing)                                   not solved
        Metric(Atari 2600 Solaris)                                  not solved
        Metric(Atari 2600 Yars Revenge)                             SOLVED
        Metric(Atari 2600 Defender)                                 SOLVED
        Metric(Atari 2600 Surround)                                 SOLVED

Problems and Metrics by category

In [54]:
counts()
venn_report()
Included thus far:
=================================
58 problems
135 metrics 66 solved
1745 measurements
34 problems which do not yet have any metrics (either not in this notebook, or none in the open literature)
=================================

Problems by Type:
=================================
agi 27
solved: 2
language 12
solved: 1
communication 2
solved: 0
world-modelling 13
solved: 0
unsafe 1
solved: 0
qa 1
solved: 0
math 2
solved: 0
safety 11
solved: 0
languge 1
solved: 0
abstract-games 5
solved: 2
science 1
solved: 0
security 2
solved: 0
super 2
solved: 0
vision 6
solved: 0
realtime-games 2
solved: 0

Metrics by Type:
=================================
agi 113
solved: 61
language 43
solved: 11
communication 1
solved: 0
world-modelling 81
solved: 51
unsafe 0
solved: 0
qa 5
solved: 0
math 1
solved: 0
safety 5
solved: 0
languge 0
solved: 0
abstract-games 5
solved: 2
science 5
solved: 0
security 0
solved: 0
super 1
solved: 0
vision 27
solved: 4
realtime-games 57
solved: 46
=================================

Sample of problems characterized thus far:

How to contribute to this notebook

This notebook is an open source, community effort. It lives on Github at https://github.com/AI-metrics/AI-metrics. You can help by adding new metrics, data and problems to it! If you're feeling ambitious you can also improve its semantics or build new analyses into it. Here are some high level tips on how to do that.

0. The easiest way -- just hit the edit button

Next to every table of results (not yet next to the graphs) you'll find an "Add/edit data on Github" link. You can just click it, and you should get a link to the Github's online editor that should make it easy to add new results, or fix existing ones, and send us a pull request. For best results, make sure you're logged in to Github

1. If you're comfortable with git and Jupyter Notebooks, or are happy to learn

If you're interested in making more extensive changes to the Notebook, and you've already worked a lot with git and IPython/Jupyter Notebooks, you can run and edit copy locally. This is a fairly involved process (Jupyter Notebook and git interact in a somewhat complicated way) but here's a quick list of things that should mostly work:

  1. Install Jupyter Notebook and git.
    • On an Ubuntu or Debian system, you can do:
      sudo apt-get install git
      sudo apt-get install ipython-notebook || sudo apt-get install jupyter-notebook || sudo apt-get install python-notebook
    • Make sure you have IPython Notebook version 3 or higher. If your OS doesn't provide it, you might need to enable backports, or use pip to install it.
  2. Install this notebook's Python dependencies:
    • On Ubuntu or Debian, do:
          sudo apt-get install python-{cssselect,lxml,matplotlib{,-venn},numpy,requests,seaborn}
    • On other systems, use your native OS packages, or use pip:
          pip install cssselect lxml matplotlib{,-venn} numpy requests seaborn
  3. Fork our repo on github: https://github.com/AI-metrics/AI-metrics#fork-destination-box
  4. Clone the repo on your machine, and cd into the directory it's using
  5. Configure your copy of git to use IPython Notebook merge filters to prevent conflicts when multiple people edit the Notebook simultaneously. You can do that with these two commands in the cloned repo:
    git config --file .gitconfig filter.clean_ipynb.clean $PWD/ipynb_drop_output
    git config --file .gitconfig filter.clean_ipynb.smudge cat
  6. Run Jupyter Notebok in the project directory (the command may be ipython notebook, jupyter notebook, jupyter-notebook, or python notebook depending on your system), then go to localhost:8888 and edit the Notebook to your heart's content

  7. Save and commit your work (git commit -a -m "DESCRIPTION OF WHAT YOU CHANGED")

  8. Push it to your remote repo
  9. Send us a pull request!

Notes on importing data

  • Each .measure() call is a data point of a specific algorithm on a specific metric/dataset. Thus one paper will often produce multiple measurements on multiple metrics. It's most important to enter results that were at or near the frontier of best performance on the date they were published. This isn't a strict requirement, though; it's nice to have a sense of the performance of the field, or of algorithms that are otherwise notable even if they aren't the frontier for a sepcific problem.
  • When multiple revisions of a paper (typically on arXiv) have the same results on some metric, use the date of the first version (the CBTest results in this paper are an example)
  • When subsequent revisions of a paper improve on the original results (example), use the date and scores of the first results, or if each revision is interesting / on the frontier of best performance, include each paper
    • We didn't check this carefully for our first ~100 measurement data points :(. In order to denote when we've checked which revision of an arXiv preprint first published a result, cite the specific version (https://arxiv.org/abs/1606.01549v3 rather than https://arxiv.org/abs/1606.01549). That way, we can see which previous entries should be double-checked for this form of inaccuracy.
  • Where possible, use a clear short name or acronym for each algorithm. The full paper name can go in the papername field (and is auto-populated for some papers). When matplotlib 2.1 ships we may be able to get nice rollovers with metadata like this. Or perhaps we can switch to D3 to get that type of interactivity.

What to work on

  • If you know of ML datasets/metrics that aren't included yet, add them
  • If there are papers with interesting results for metrics that aren't included, add them
  • If you know of important problems that humans can solve, and machine learning systems may or may not yet be able to, and they're missing from our taxonomy, you can propose them
  • Look at our Github issue list, perhaps starting with those tagged as good volunteer tasks.

Building on this data

If you want to use this data for some purpose that is beyond the scope of this Notebook, all of the raw data exported as a JSON blob. This is not yet a stable API, but you can get the data at:

https://raw.githubusercontent.com/AI-metrics/AI-metrics/master/export-api/v01/progress.json

License

Creative Commons License

Much of this Notebook is uncopyrightable data. The copyrightable portions of this Notebook that are written by EFF and other Github contributors are licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. Illustrations from datasets and text written by other parties remain copyrighted by their respective owners, if any, and may be subject to different licenses.

The source code is also dual-licensed under the GNU General Public License, version 2 or greater.

How to cite this document

In academic contexts, you can cite this document as:

Peter Eckersley, Yomna Nasser et al., EFF AI Progress Measurement Project, (2017-) https://eff.org/ai/metrics, accessed on 2017-09-09,

or the equivalent in the bibliographic format you are working in.

If you would like to deep-link an exact version of the text of the Notebook for archival or historical purposes, you can do that using the Internet Archive or Github. In addition to keeping a record of changes, Github will render a specific version of the Notebook using URLs like this one: https://github.com/AI-metrics/AI-metrics/blob/008993c84188094ba804882f65815c7e1cfc4d0e/AI-progress-metrics.ipynb

In [55]:
# hiddencode

def export_json(indent=False, default_name="export-api/v01/progress.json"):
    """Export all the data in here to a JSON file! Default name: progress.json."""
    output = {'problems':[]}
    for problem in problems:
        problem = problems[problem]
        problem_data = {}        
        for problem_attr in problem.__dict__:
            if problem_attr in ['subproblems', 'superproblems']:
                problem_data[problem_attr] = list(map(lambda x: x.name, getattr(problem, problem_attr)))
            elif problem_attr != 'metrics':
                problem_data[problem_attr] = getattr(problem, problem_attr)
            elif problem_attr == 'metrics':
                problem_data['metrics'] = []
                for metric in problem.metrics:
                    metric_data = {}
                    metric_data['measures'] = []
                    for metric_attr in metric.__dict__: 
                        if metric_attr == 'scale':
                            metric_data[metric_attr] = getattr(metric, metric_attr).axis_label
                        elif metric_attr != 'measures':
                            metric_data[metric_attr] = getattr(metric, metric_attr)
                        elif metric_attr == 'measures':
                            for measure in getattr(metric, 'measures'):
                                measure_data = {}
                                for measure_attr in measure.__dict__:
                                    measure_data[measure_attr] = getattr(measure, measure_attr)
                                metric_data['measures'].append(measure_data)
                    problem_data['metrics'].append(metric_data)
        output['problems'].append(problem_data)
    if indent:
        with open(default_name, 'w') as f:
            f.write(json.dumps(output, default=str, indent=4))
    else:
        with open(default_name, 'w') as f:
            f.write(json.dumps(output, default=str))
        
export_json(indent=True)
In [56]:
%%javascript
// # hiddencode
if (document.location.hostname == "localhost" && document.location.port != null) {
    console.log($(".local-edit").show(500));
}
In [57]:
len([m for m in metrics if "Atari" in str(m)])
Out[57]:
57

Back to top

JavaScript license information