ICLR 2020 Paper List

8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020.

Federated Learning with Matched Averaging.
Differentiable Reasoning over a Virtual Knowledge Base.
Adversarial Training and Provable Defenses: Bridging the Gap.
Gradient Descent Maximizes the Margin of Homogeneous Neural Networks.
Convolutional Conditional Neural Processes.
Meta-Learning with Warped Gradient Descent.
SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference.
High Fidelity Speech Synthesis with Adversarial Networks.
A Generalized Training Approach for Multiagent Learning.
Building Deep Equivariant Capsule Networks.
Restricting the Flow: Information Bottlenecks for Attribution.
Intrinsically Motivated Discovery of Diverse Patterns in Self-Organizing Systems.
Causal Discovery with Reinforcement Learning.
Rotation-invariant clustering of neuronal responses in primary visual cortex.
Reformer: The Efficient Transformer.
Target-Embedding Autoencoders for Supervised Representation Learning.
Watch the Unobserved: A Simple Approach to Parallelizing Monte Carlo Tree Search.
RNA Secondary Structure Prediction By Learning Unrolled Algorithms.
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distribution Tasks.
Fast Task Inference with Variational Intrinsic Successor Features.
Implementation Matters in Deep RL: A Case Study on PPO and TRPO.
A Closer Look at Deep Policy Gradients.
Understanding and Robustifying Differentiable Architecture Search.
Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds.
Geometric Analysis of Nonconvex Optimization Landscapes for Overcomplete Learning.
A Theory of Usable Information under Computational Constraints.
Mathematical Reasoning in Latent Space.
Meta-Q-Learning.
Comparing Rewinding and Fine-tuning in Neural Network Pruning.
Harnessing Structures for Value-Based Planning and Reinforcement Learning.
GraphZoom: A Multi-level Spectral Approach for Accurate and Scalable Graph Embedding.
Optimal Strategies Against Generative Attacks.
Dynamics-Aware Unsupervised Discovery of Skills.
Your classifier is secretly an energy based model and you should treat it like one.
Cyclical Stochastic Gradient MCMC for Bayesian Deep Learning.
Mirror-Generative Neural Machine Translation.
Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech.
Mogrifier LSTM.
Posterior sampling for multi-agent reinforcement learning: solving extensive games with imperfect information.
Why Gradient Clipping Accelerates Training: A Theoretical Justification for Adaptivity.
Neural Network Branching for Neural Network Verification.
Contrastive Learning of Structured World Models.
Data-dependent Gaussian Prior Objective for Language Generation.
On the Convergence of FedAvg on Non-IID Data.
Principled Weight Initialization for Hypernetworks.
GenDICE: Generalized Offline Estimation of Stationary Values.
BackPACK: Packing more into Backprop.
CATER: A diagnostic dataset for Compositional Actions & TEmporal Reasoning.
Geom-GCN: Geometric Graph Convolutional Networks.
An Exponential Learning Rate Schedule for Deep Learning.
Progressive Learning and Disentanglement of Hierarchical Representations.
Reconstructing continuous distributions of 3D protein structure from cryo-EM images.
Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint.
Depth-Width Trade-offs for ReLU Networks via Sharkovsky's Theorem.
Energy-based models for atomic-resolution protein conformations.
A Mutual Information Maximization Perspective of Language Representation Learning.
Unbiased Contrastive Divergence Algorithm for Training Energy-Based Latent Variable Models.
Making Sense of Reinforcement Learning and Probabilistic Inference.
Deep Learning For Symbolic Mathematics.
SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models.
DeepSphere: a graph-based spherical CNN.
Neural Arithmetic Units.
Truth or backpropaganda? An empirical investigation of deep learning theory.
Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks.
Improving Generalization in Meta Reinforcement Learning using Learned Objectives.
Kaleidoscope: An Efficient, Learnable Representation For All Structured Linear Maps.
Disentanglement by Nonlinear ICA with General Incompressible-flow Networks (GIN).
At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks?
Compression based bound for non-compressed network: unified generalization error analysis of large compressible deep neural network.
Explanation by Progressive Exaggeration.
Directional Message Passing for Molecular Graphs.
Learning from Rules Generalizing Labeled Exemplars.
Training individually fair ML models with sensitive subspace robustness.
What Can Neural Networks Reason About?
word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement.
Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models.
Spectral Embedding of Regularized Block Models.
Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning.
Meta-Learning Acquisition Functions for Transfer Learning in Bayesian Optimization.
The Ingredients of Real World Robotic Reinforcement Learning.
Scaling Autoregressive Video Models.
Differentiation of Blackbox Combinatorial Solvers.
Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks.
The intriguing role of module criticality in the generalization of deep networks.
Self-labelling via simultaneous clustering and representation learning.
Neural Tangents: Fast and Easy Infinite Neural Networks in Python.
Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue.
Measuring the Reliability of Reinforcement Learning Algorithms.
Stable Rank Normalization for Improved Generalization in Neural Networks and GANs.
Disagreement-Regularized Imitation Learning.
Model Based Reinforcement Learning for Atari.
Understanding Why Neural Networks Generalize Well Through GSNR of Parameters.
A Latent Morphology Model for Open-Vocabulary Neural Machine Translation.
And the Bit Goes Down: Revisiting the Quantization of Neural Networks.
Kernelized Wasserstein Natural Gradient.
FreeLB: Enhanced Adversarial Training for Natural Language Understanding.
Behaviour Suite for Reinforcement Learning.
Strategies for Pre-training Graph Neural Networks.
NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search.
Emergent Tool Use From Multi-Agent Autocurricula.
A Probabilistic Formulation of Unsupervised Text Style Transfer.
Dream to Control: Learning Behaviors by Latent Imagination.
Real or Not Real, that is the Question.
Neural Symbolic Reader: Scalable Integration of Distributed and Symbolic Representations for Reading Comprehension.
Network Deconvolution.
Simplified Action Decoder for Deep Multi-Agent Reinforcement Learning.
Is a Good Representation Sufficient for Sample Efficient Reinforcement Learning?
Learning The Difference That Makes A Difference With Counterfactually-Augmented Data.
Asymptotics of Wide Networks from Feynman Diagrams.
Symplectic Recurrent Neural Networks.
Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees.
Disentangling neural mechanisms for perceptual grouping.
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations.
The Break-Even Point on Optimization Trajectories of Deep Neural Networks.
The Logical Expressiveness of Graph Neural Networks.
CLEVRER: Collision Events for Video Representation and Reasoning.
Learning Compositional Koopman Operators for Model-Based Control.
Doubly Robust Bias Reduction in Infinite Horizon Off-Policy Estimation.
Deep neuroethology of a virtual rodent.
Emergence of functional and structural properties of the head direction system by optimization of recurrent neural networks.
Duration-of-Stay Storage Assignment under Uncertainty.
Inductive Matrix Completion Based on Graph Neural Networks.
Conditional Learning of Fair Representations.
Gradientless Descent: High-Dimensional Zeroth-Order Optimization.
Estimating counterfactual treatment outcomes over time through adversarially balanced representations.
CoPhy: Counterfactual Learning of Physical Dynamics.
Hamiltonian Generative Networks.
How much Position Information Do Convolutional Neural Networks Encode?
Sliced Cramer Synaptic Consolidation for Preserving Deeply Learned Representations.
Hoppity: Learning Graph Transformations to Detect and Fix Bugs in Programs.
Influence-Based Multi-Agent Exploration.
Meta-Learning without Memorization.
Finite Depth and Width Corrections to the Neural Tangent Kernel.
Ridge Regression: Structure, Cross-Validation, and Sketching.
Cross-Domain Few-Shot Classification via Learned Feature-Wise Transformation.
DDSP: Differentiable Digital Signal Processing.
Encoding word order in complex embeddings.
Enhancing Adversarial Defense by k-Winners-Take-All.
Online and stochastic optimization beyond Lipschitz continuity: A Riemannian approach.
PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search.
Tranquil Clouds: Neural Networks for Learning Temporally Coherent Features in Point Clouds.
Neural Machine Translation with Universal Visual Representation.
White Noise Analysis of Neural Networks.
Skip Connections Matter: On the Transferability of Adversarial Examples Generated with ResNets.
A Signal Propagation Perspective for Pruning Neural Networks at Initialization.
Intensity-Free Learning of Temporal Point Processes.
Learning to Control PDEs with Differentiable Physics.
Estimating Gradients for Discrete Random Variables by Sampling without Replacement.
Defending Against Physically Realizable Attacks on Image Classification.
On Robustness of Neural Ordinary Differential Equations.
InfoGraph: Unsupervised and Semi-supervised Graph-Level Representation Learning via Mutual Information Maximization.
Multi-Scale Representation Learning for Spatial Feature Distributions using Grid Cells.
Graph Neural Networks Exponentially Lose Expressive Power for Node Classification.
Sparse Coding with Gated Learned ISTA.
Program Guided Agent.
Pay Attention to Features, Transfer Learn Faster CNNs.
Gradients as Features for Deep Representation Learning.
Monotonic Multihead Attention.
Massively Multilingual Sparse Word Representations.
Query-efficient Meta Attack to Deep Neural Networks.
Breaking Certified Defenses: Semantic Adversarial Examples with Spoofed Robustness Certificates.
Enabling Deep Spiking Neural Networks with Hybrid Conversion and Spike Timing Dependent Backpropagation.
How to 0wn the NAS in Your Spare Time.
The Shape of Data: Intrinsic Distance for Data Distributions.
Understanding Generalization in Recurrent Neural Networks.
Conservative Uncertainty Estimation By Fitting Prior Networks.
NAS-Bench-1Shot1: Benchmarking and Dissecting One-shot Neural Architecture Search.
Learning to Coordinate Manipulation Skills via Skill Behavior Diversification.
Robust Subspace Recovery Layer for Unsupervised Anomaly Detection.
Learning Nearly Decomposable Value Functions Via Communication Minimization.
Extreme Classification via Adversarial Softmax Approximation.
Information Geometry of Orthogonal Initializations and Training.
Mixed Precision DNNs: All you need is a good parametrization.
Co-Attentive Equivariant Neural Networks: Focusing Equivariance On Transformations Co-Occurring in Data.
Deep Orientation Uncertainty Learning based on a Bingham Loss.
Critical initialisation in continuous approximations of binary neural networks.
Sub-policy Adaptation for Hierarchical Reinforcement Learning.
Episodic Reinforcement Learning with Associative Memory.
Pitfalls of In-Domain Uncertainty Estimation and Ensembling in Deep Learning.
DiffTaichi: Differentiable Programming for Physical Simulation.
Domain Adaptive Multibranch Networks.
Prediction Poisoning: Towards Defenses Against DNN Model Stealing Attacks.
SCALOR: Generative World Models with Scalable Object Representations.
Neural tangent kernels, transportation mappings, and universal approximation.
Learning to Move with Affordance Maps.
Differentiable learning of numerical rules in knowledge graphs.
Consistency Regularization for Generative Adversarial Networks.
On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning.
Scale-Equivariant Steerable Networks.
Classification-Based Anomaly Detection for General Data.
Unrestricted Adversarial Examples via Semantic Manipulation.
Discriminative Particle Filter Reinforcement Learning for Complex Partial Observations.
Learning to Group: A Bottom-Up Framework for 3D Part Discovery in Unseen Categories.
State Alignment-based Imitation Learning.
Lipschitz constant estimation of Neural Networks via sparse polynomial optimization.
Effect of Activation Functions on the Training of Overparametrized Neural Nets.
Provable Filter Pruning for Efficient Neural Networks.
End to End Trainable Active Contours via Differentiable Rendering.
Compositional Language Continual Learning.
Adversarial Lipschitz Regularization.
A Function Space View of Bounded Norm Infinite Width ReLU Nets: The Multivariate Case.
Lite Transformer with Long-Short Range Attention.
Mutual Information Gradient Estimation for Representation Learning.
Regularizing activations in neural networks via distribution matching with the Wasserstein metric.
Transferring Optimality Across Data Distributions via Homotopy Methods.
Latent Normalizing Flows for Many-to-Many Cross-Domain Mappings.
Dynamic Model Pruning with Feedback.
On the interaction between supervision and self-play in emergent communication.
A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms.
Expected Information Maximization: Using the I-Projection for Mixture Density Estimation.
Deep Audio Priors Emerge From Harmonic Convolutional Networks.
A closer look at the approximation capabilities of neural networks.
Residual Energy-Based Models for Text Generation.
AtomNAS: Fine-Grained End-to-End Neural Architecture Search.
AugMix: A Simple Data Processing Method to Improve Robustness and Uncertainty.
Memory-Based Graph Networks.
Variational Template Machine for Data-to-Text Generation.
Phase Transitions for the Information Bottleneck in Representation Learning.
Continual learning with hypernetworks.
Permutation Equivariant Models for Compositional Generalization in Language.
Training binary neural networks with real-to-binary convolutions.
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding.
Smooth markets: A basic mechanism for organizing gradient-based learners.
Fair Resource Allocation in Federated Learning.
Never Give Up: Learning Directed Exploration Strategies.
AdvectiveNet: An Eulerian-Lagrangian Fluidic Reservoir for Point Cloud Processing.
You CAN Teach an Old Dog New Tricks! On Training Knowledge Graph Embeddings.
Functional Regularisation for Continual Learning with Gaussian Processes.
Dynamics-Aware Embeddings.
RaPP: Novelty Detection with Reconstruction along Projection Pathway.
Hypermodels for Exploration.
Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies.
BayesOpt Adversarial Attack.
Model-based reinforcement learning for biological sequence design.
BinaryDuo: Reducing Gradient Mismatch in Binary Activation Network by Coupling Binary Activations.
Mixed-curvature Variational Autoencoders.
Demystifying Inter-Class Disentanglement.
Understanding the Limitations of Conditional Generative Models.
Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning.
Empirical Bayes Transductive Meta-Learning with Synthetic Gradients.
Spike-based causal inference for weight alignment.
Lookahead: A Far-sighted Alternative of Magnitude-based Pruning.
VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning.
Making Efficient Use of Demonstrations to Solve Hard Exploration Problems.
Meta-learning curiosity algorithms.
vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations.
Infinite-horizon Off-Policy Policy Evaluation with Multiple Behavior Policies.
MMA Training: Direct Input Space Margin Maximization through Adversarial Training.
Incorporating BERT into Neural Machine Translation.
Structured Object-Aware Physics Prediction for Video Modeling and Planning.
Learning-Augmented Data Stream Algorithms.
On the Relationship between Self-Attention and Convolutional Layers.
SpikeGrad: An ANN-equivalent Computation Model for Implementing Backpropagation with Spikes.
Gradient $\ell_1$ Regularization for Quantization Robustness.
Toward Evaluating Robustness of Deep Reinforcement Learning with Continuous Control.
Decentralized Deep Learning with Arbitrary Communication Compression.
Training Generative Adversarial Networks from Incomplete Observations using Factorised Discriminators.
Combining Q-Learning and Search with Amortized Value Estimates.
Infinite-Horizon Differentiable Model Predictive Control.
Projection-Based Constrained Policy Optimization.
You Only Train Once: Loss-Conditional Training of Deep Networks.
GraphSAINT: Graph Sampling Based Inductive Learning Method.
Efficient Probabilistic Logic Reasoning with Graph Neural Networks.
Low-dimensional statistical manifold embedding of directed graphs.
RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments.
SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition.
Cross-Lingual Ability of Multilingual BERT: An Empirical Study.
Reducing Transformer Depth on Demand with Structured Dropout.
Neural Outlier Rejection for Self-Supervised Keypoint Learning.
B-Spline CNNs on Lie groups.
Quantifying Point-Prediction Uncertainty in Neural Networks via Residual Estimation with an I/O Kernel.
EMPIR: Ensembles of Mixed Precision Deep Networks for Increased Robustness Against Adversarial Attacks.
Learning To Explore Using Active Neural SLAM.
Understanding and Improving Information Transfer in Multi-Task Learning.
A Stochastic Derivative Free Optimization Method with Momentum.
Compressive Transformers for Long-Range Sequence Modelling.
Reinforced Genetic Algorithm Learning for Optimizing Computation Graphs.
Lagrangian Fluid Simulation with Continuous Convolutions.
Learning to Guide Random Search.
Robust anomaly detection and backdoor attack detection via differential privacy.
Deep probabilistic subsampling for task-adaptive compressed sensing.
Learning Robust Representations via Multi-View Information Bottleneck.
Batch-shaping for learning conditional channel gated networks.
Inductive and Unsupervised Representation Learning on Graph Structured Objects.
U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation.
Masked Based Unsupervised Content Transfer.
DropEdge: Towards Deep Graph Convolutional Networks on Node Classification.
Why Not to Use Zero Imputation? Correcting Sparsity Bias in Training Neural Networks.
Probability Calibration for Knowledge Graph Embedding Models.
On the Equivalence between Positional Node Embeddings and Structural Graph Representations.
Neural Epitome Search for Architecture-Agnostic Network Compression.
Hyper-SAGNN: a self-attention based graph neural network for hypergraphs.
A Neural Dirichlet Process Mixture Model for Task-Free Continual Learning.
On Solving Minimax Optimization Locally: A Follow-the-Ridge Approach.
Distributionally Robust Neural Networks.
Kernel of CycleGAN as a principal homogeneous space.
Don't Use Large Mini-batches, Use Local SGD.
Provable robustness against all adversarial $l_p$-perturbations for $p\geq 1$.
On Universal Equivariant Set Networks.
Tensor Decompositions for Temporal Knowledge Base Completion.
Network Randomization: A Simple Technique for Generalization in Deep Reinforcement Learning.
Robustness Verification for Transformers.
Fantastic Generalization Measures and Where to Find Them.
Nesterov Accelerated Gradient and Scale Invariance for Adversarial Attacks.
Weakly Supervised Disentanglement with Guarantees.
Discrepancy Ratio: Evaluating Model Performance When Even Experts Disagree on the Truth.
Abductive Commonsense Reasoning.
Variance Reduction With Sparse Gradients.
BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget.
Learning transport cost from subset correspondence.
Rényi Fair Inference.
Meta Dropout: Learning to Perturb Latent Features for Generalization.
Adversarial AutoAugment.
State-only Imitation with Transition Dynamics Mismatch.
Measuring and Improving the Use of Graph Information in Graph Neural Networks.
Universal Approximation with Certified Networks.
Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution.
Deep Symbolic Superoptimization Without Human Knowledge.
Sample Efficient Policy Gradient Methods with Recursive Variance Reduction.
Certified Defenses for Adversarial Patches.
Contrastive Representation Distillation.
A Framework for Robustness Certification of Smoothed Classifiers using F-Divergences.
Padé Activation Units: End-to-end Learning of Flexible Activation Functions in Deep Networks.
Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering.
A Baseline for Few-Shot Image Classification.
Abstract Diagrammatic Reasoning with Multiplex Graph Networks.
Environmental drivers of systematicity and generalization in a situated agent.
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators.
Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning.
Thinking While Moving: Deep Reinforcement Learning with Concurrent Control.
Jacobian Adversarially Regularized Networks for Robustness.
Towards Verified Robustness under Text Deletion Interventions.
RGBD-GAN: Unsupervised 3D Representation Learning From Natural Image Datasets via RGBD Image Synthesis.
Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks.
Plug and Play Language Models: A Simple Approach to Controlled Text Generation.
Rethinking the Hyperparameters for Fine-tuning.
Query2box: Reasoning over Knowledge Graphs in Vector Space Using Box Embeddings.
Implementing Inductive bias for different navigation tasks through diverse RNN attractors.
PCMC-Net: Feature-based Pairwise Choice Markov Chains.
Multi-Agent Interactions Modeling with Correlated Policies.
Once-for-All: Train One Network and Specialize it for Efficient Deployment.
Generalized Convolutional Forest Networks for Domain Generalization and Visual Recognition.
SNODE: Spectral Discretization of Neural ODEs for System Identification.
Guiding Program Synthesis by Learning to Generate Examples.
Fast Neural Network Adaptation via Parameter Remapping and Architecture Search.
Exploratory Not Explanatory: Counterfactual Analysis of Saliency Maps for Deep Reinforcement Learning.
Meta-Learning Deep Energy-Based Memory Models.
Graph Convolutional Reinforcement Learning.
The Curious Case of Neural Text Degeneration.
Multilingual Alignment of Contextual Word Representations.
The Gambler's Problem and Beyond.
GraphAF: a Flow-based Autoregressive Model for Molecular Graph Generation.
Double Neural Counterfactual Regret Minimization.
Neural Policy Gradient Methods: Global Optimality and Rates of Convergence.
Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference.
Dynamic Sparse Training: Find Efficient Sparse Network From Scratch With Trainable Masked Layers.
Beyond Linearization: On Quadratic and Higher-Order Approximation of Wide Neural Networks.
Deep Graph Matching Consensus.
Self-Supervised Learning of Appliance Usage.
Quantum Algorithms for Deep Convolutional Neural Networks.
Understanding $\ell^4$-based Dictionary Learning: Interpretation, Stability, and Robustness.
Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control.
Controlling generative models with continuous factors of variations.
Simple and Effective Regularization Methods for Training on Noisily Labeled Data with Generalization Guarantee.
Unsupervised Clustering using Pseudo-semi-supervised Learning.
PairNorm: Tackling Oversmoothing in GNNs.
Black-box Off-policy Estimation for Infinite-Horizon Reinforcement Learning.
Empirical Studies on the Properties of Linear Regions in Deep Neural Networks.
SNOW: Subscribing to Knowledge via Channel Pooling for Transfer & Lifelong Learning of Convolutional Neural Networks.
Smoothness and Stability in GANs.
Adaptive Correlated Monte Carlo for Contextual Categorical Sequence Generation.
On Bonus Based Exploration Methods In The Arcade Learning Environment.
IMPACT: Importance Weighted Asynchronous Architectures with Clipped Target Networks.
HiLLoC: lossless image compression with hierarchical latent variable models.
Physics-aware Difference Graph Networks for Sparsely-Observed Dynamics.
Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks.
Learning representations for binary-classification without backpropagation.
Frequency-based Search-control in Dyna.
Towards Stable and Efficient Training of Verifiably Robust Neural Networks.
Iterative energy-based projection on a normal data manifold for anomaly localization.
Towards neural networks that provably know when they don't know.
BatchEnsemble: an Alternative Approach to Efficient Ensemble and Lifelong Learning.
Inductive representation learning on temporal graphs.
Generative Models for Effective ML on Private, Decentralized Datasets.
Picking Winning Tickets Before Training by Preserving Gradient Flow.
Curriculum Loss: Robust Learning and Generalization against Label Corruption.
Uncertainty-guided Continual Learning with Bayesian Neural Networks.
Training Recurrent Neural Networks Online by Learning Explicit State Variables.
Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework.
Robust Reinforcement Learning for Continuous Control with Model Misspecification.
Decoupling Representation and Classifier for Long-Tailed Recognition.
Learning from Unlabelled Videos Using Contrastive Predictive Neural 3D Mapping.
From Inference to Generation: End-to-end Fully Self-supervised Generation of Human Face from Speech.
LambdaNet: Probabilistic Type Inference using Graph Neural Networks.
Model-Augmented Actor-Critic: Backpropagating through Paths.
Variational Autoencoders for Highly Multivariate Spatial Point Processes Intensities.
Denoising and Regularization via Exploiting the Structural Bias of Convolutional Generators.
Revisiting Self-Training for Neural Sequence Generation.
Towards a Deep Network Architecture for Structured Smoothness.
On the Global Convergence of Training Deep Linear ResNets.
A Closer Look at the Optimization Landscapes of Generative Adversarial Networks.
Biologically inspired sleep algorithm for increased generalization and adversarial robustness in deep neural networks.
Distributed Bandit Learning: Near-Optimal Regret with Efficient Communication.
Shifted and Squeezed 8-bit Floating Point format for Low-Precision Training of Deep Neural Networks.
Intriguing Properties of Adversarial Training at Scale.
Deep Double Descent: Where Bigger Models and More Data Hurt.
Decoding As Dynamic Programming For Recurrent Autoregressive Models.
Synthesizing Programmatic Policies that Inductively Generalize.
Transformer-XH: Multi-Evidence Reasoning with eXtra Hop Attention.
Generalization through Memorization: Nearest Neighbor Language Models.
Single Episode Policy Transfer in Reinforcement Learning.
Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization.
NeurQuRI: Neural Question Requirement Inspector for Answerability Prediction in Machine Reading Comprehension.
The Early Phase of Neural Network Training.
RNNs Incrementally Evolving on an Equilibrium Manifold: A Panacea for Vanishing and Exploding Gradients?
Extreme Tensoring for Low-Memory Preconditioning.
Non-Autoregressive Dialog State Tracking.
Bayesian Meta Sampling for Fast Uncertainty Adaptation.
Economy Statistical Recurrent Units For Inferring Nonlinear Granger Causality.
MEMO: A Deep Network for Flexible Combination of Episodic Memories.
Probabilistic Connection Importance Inference and Lossless Compression of Deep Neural Networks.
Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization.
Jelly Bean World: A Testbed for Never-Ending Learning.
Learning from Explanations with Neural Execution Tree.
Discovering Motor Programs by Recomposing Demonstrations.
Convergence of Gradient Methods on Bilinear Zero-Sum Games.
Composing Task-Agnostic Policies with Deep Reinforcement Learning.
The Local Elasticity of Neural Networks.
Gradient-Based Neural DAG Learning.
Composition-based Multi-Relational Graph Convolutional Networks.
Capsules with Inverted Dot-Product Attention Routing.
FSNet: Compression of Deep Convolutional Neural Networks by Filter Summary.
On the Need for Topology-Aware Generative Models for Manifold-Based Defenses.
Neural Execution of Graph Algorithms.
BERTScore: Evaluating Text Generation with BERT.
Augmenting Non-Collaborative Dialog Systems with Explicit Semantic and Strategic Dialog History.
Prediction, Consistency, Curvature: Representation Learning for Locally-Linear Control.
Graph Constrained Reinforcement Learning for Natural Language Action Spaces.
Towards Fast Adaptation of Neural Architectures with Meta Learning.
Variational Hetero-Encoder Randomized GANs for Joint Image-Text Modeling.
Higher-Order Function Networks for Learning Composable 3D Object Representations.
Neural Module Networks for Reasoning over Text.
Improved memory in recurrent neural networks with sequential non-normal dynamics.
Learn to Explain Efficiently via Neural Logic Inductive Learning.
Improving Neural Language Generation with Spectrum Control.
Span Recovery for Deep Neural Networks with Applications to Input Obfuscation.
Oblique Decision Trees from Derivatives of ReLU Networks.
Precision Gating: Improving Neural Network Efficiency with Dynamic Dual-Precision Activations.
PAC Confidence Sets for Deep Neural Networks via Calibrated Prediction.
DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames.
Learning to Learn by Zeroth-Order Oracle.
MetaPix: Few-Shot Video Retargeting.
SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum.
Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving.
Four Things Everyone Should Know to Improve Batch Normalization.
Learning to solve the credit assignment problem.
Sampling-Free Learning of Bayesian Quantized Neural Networks.
DeFINE: Deep Factorized Input Token Embeddings for Neural Sequence Modeling.
DBA: Distributed Backdoor Attacks against Federated Learning.
Fast is better than free: Revisiting adversarial training.
Thieves on Sesame Street! Model Extraction of BERT-based APIs.
Understanding Knowledge Distillation in Non-autoregressive Machine Translation.
Locality and Compositionality in Zero-Shot Learning.
Recurrent neural circuits for contour detection.
Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation.
Intrinsic Motivation for Encouraging Synergistic Behavior.
RaCT: Toward Amortized Ranking-Critical Training For Collaborative Filtering.
Sign-OPT: A Query-Efficient Hard-label Adversarial Attack.
Playing the lottery with rewards and multiple languages: lottery tickets in RL and NLP.
Learning Space Partitions for Nearest Neighbor Search.
DeepV2D: Video to Depth with Differentiable Structure from Motion.
Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets.
Robust And Interpretable Blind Image Denoising Via Bias-Free Convolutional Neural Networks.
CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning.
Deep Imitative Models for Flexible Inference, Planning, and Control.
Pre-training Tasks for Embedding-based Large-scale Retrieval.
Are Transformers universal approximators of sequence-to-sequence functions?
Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples.
One-Shot Pruning of Recurrent Neural Networks by Jacobian Spectrum Evaluation.
Differentially Private Meta-Learning.
Bridging Mode Connectivity in Loss Landscapes and Adversarial Robustness.
Overlearning Reveals Sensitive Attributes.
Adversarially robust transfer learning.
Learning to Link.
Detecting Extrapolation with Local Ensembles.
Global Relational Models of Source Code.
Selection via Proxy: Efficient Data Selection for Deep Learning.
Short and Sparse Deconvolution - A Geometric Approach.
Stochastic Weight Averaging in Parallel: Large-Batch Training That Generalizes Well.
Adjustable Real-time Style Transfer.
Unpaired Point Cloud Completion on Real Scans using Adversarial Training.
Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform.
AMRL: Aggregated Memory For Reinforcement Learning.
Scalable Model Compression by Entropy Penalized Reparameterization.
Dynamic Time Lag Regression: Predicting What & When.
Semi-Supervised Generative Modeling for Controllable Speech Synthesis.
Neural Text Generation With Unlikelihood Training.
Pure and Spurious Critical Points: a Geometric Study of Linear Networks.
Learning Heuristics for Quantified Boolean Formulas through Reinforcement Learning.
CAQL: Continuous Action Q-Learning.
Adaptive Structural Fingerprints for Graph Attention Networks.
ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring.
Identity Crisis: Memorization and Generalization Under Extreme Overparameterization.
Improved Sample Complexities for Deep Neural Networks and Robust Classification via an All-Layer Margin.
Augmenting Genetic Algorithms with Deep Neural Networks for Exploring the Chemical Space.
Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML.
Imitation Learning via Off-Policy Distribution Matching.
Reanalysis of Variance Reduced Temporal Difference Learning.
Minimizing FLOPs to Learn Efficient Sparse Representations.
Budgeted Training: Rethinking Deep Neural Network Training Under Resource Constraints.
Deep Semi-Supervised Anomaly Detection.
Sign Bits Are All You Need for Black-Box Attacks.
Reinforced active learning for image segmentation.
On the "steerability" of generative adversarial networks.
Learned Step Size Quantization.
Stochastic Conditional Generative Networks with Basis Decomposition.
Language GANs Falling Short.
GENESIS: Generative Scene Inference and Sampling with Object-Centric Latent Representations.
Understanding the Limitations of Variational Mutual Information Estimators.
Feature Interaction Interpretability: A Case for Explaining Ad-Recommendation Systems via Neural Interaction Detection.
Unsupervised Model Selection for Variational Disentangled Representation Learning.
A Theoretical Analysis of the Number of Shots in Few-Shot Learning.
Dynamical Distance Learning for Semi-Supervised and Unsupervised Skill Discovery.
On the Variance of the Adaptive Learning Rate and Beyond.
Quantifying the Cost of Reliable Photo Authentication via High-Performance Learned Lossy Representations.
Option Discovery using Deep Skill Chaining.
V4D: 4D Convolutional Neural Networks for Video-level Representation Learning.
Learning to Represent Programs with Property Signatures.
Generative Ratio Matching Networks.
In Search for a SAT-friendly Binarized Neural Network Architecture.
Actor-Critic Provably Finds Nash Equilibria of Linear-Quadratic Mean-Field Games.
The asymptotic spectrum of the Hessian of DNN throughout training.
Tree-Structured Attention with Hierarchical Accumulation.
Deep 3D Pan via local adaptive "t-shaped" convolutions with global and local adaptive dilations.
Low-Resource Knowledge-Grounded Dialogue Generation.
A Target-Agnostic Attack on Deep Models: Exploiting Security Vulnerabilities of Transfer Learning.
On Computation and Generalization of Generative Adversarial Imitation Learning.
Few-Shot Learning on graphs via super-Classes based on Graph spectral Measures.
Multiplicative Interactions and Where to Find Them.
Continual Learning with Bayesian Neural Networks for Non-Stationary Data.
SAdam: A Variant of Adam for Strongly Convex Functions.
Generalization bounds for deep convolutional neural networks.
A Fair Comparison of Graph Neural Networks for Graph Classification.
Finding and Visualizing Weaknesses of Deep Reinforcement Learning Agents.
Computation Reallocation for Object Detection.
From Variational to Deterministic Autoencoders.
Adversarially Robust Representations with Smooth Encoders.
AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures.
ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning.
Order Learning and Its Application to Age Estimation.
Efficient and Information-Preserving Future Frame Prediction and Beyond.
NAS evaluation is frustratingly hard.
CLN2INV: Learning Loop Invariants with Continuous Logic Networks.
Scalable Neural Methods for Reasoning With a Symbolic Knowledge Base.
A Constructive Prediction of the Generalization Error Across Scales.
An Inductive Bias for Distances: Neural Nets that Respect the Triangle Inequality.
Physics-as-Inverse-Graphics: Unsupervised Physical Parameter Estimation from Video.
Counterfactuals uncover the modular structure of deep generative models.
Gap-Aware Mitigation of Gradient Staleness.
Ensemble Distribution Distillation.
Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation.
VL-BERT: Pre-training of Generic Visual-Linguistic Representations.
Optimistic Exploration even with a Pessimistic Initialisation.
Certified Robustness for Top-k Predictions against Adversarial Perturbations via Randomized Smoothing.
Identifying through Flows for Recovering Latent Representations.
Robust training with ensemble consensus.
Self-Adversarial Learning with Comparative Discrimination for Text Generation.
Vid2Game: Controllable Characters Extracted from Real-World Videos.
Action Semantics Network: Considering the Effects of Actions in Multiagent Systems.
Learning Efficient Parameter Server Synchronization Policies for Distributed SGD.
Relational State-Space Model for Stochastic Multi-Object Systems.
Piecewise linear activations substantially shape the loss surfaces of neural networks.
Novelty Detection Via Blurring.
Bounds on Over-Parameterization for Guaranteed Existence of Descent Paths in Shallow ReLU Networks.
Data-Independent Neural Pruning via Coresets.
Deep Network Classification by Scattering and Homotopy Dictionary Learning.
Q-learning with UCB Exploration is Sample Efficient for Infinite-Horizon MDP.
Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models.
I Am Going MAD: Maximum Discrepancy Competition for Comparing Classifiers Adaptively.
Black-Box Adversarial Attack with Transferable Model-based Embedding.
Compositional languages emerge in a neural iterated learning model.
Population-Guided Parallel Policy Search for Reinforcement Learning.
Variational Recurrent Models for Solving Partially Observable Control Tasks.
GAT: Generative Adversarial Training for Adversarial Example Detection and Robust Classification.
Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions.
MACER: Attack-free and Scalable Robust Training via Maximizing Certified Radius.
Semantically-Guided Representation Learning for Self-Supervised Monocular Depth.
Stochastic AUC Maximization with Deep Neural Networks.
Difference-Seeking Generative Adversarial Network - Unseen Sample Generation.
FasterSeg: Searching for Faster Real-time Semantic Segmentation.
Learning Execution through Neural Code Fusion.
Editable Neural Networks.
Can gradient clipping mitigate label noise?
Pretrained Encyclopedia: Weakly Supervised Knowledge-Pretrained Language Model.
Pruned Graph Scattering Transforms.
GLAD: Learning Sparse Graph Recovery.
VideoFlow: A Conditional Flow-Based Model for Stochastic Video Generation.
Adversarial Policies: Attacking Deep Reinforcement Learning.
Escaping Saddle Points Faster with Stochastic Momentum.
Few-shot Text Classification with Distributional Signatures.
Geometric Insights into the Convergence of Nonlinear TD Learning.
Learning Self-Correctable Policies and Value Functions from Demonstrations with Negative Sampling.
Exploring Model-based Planning with Policy Networks.
On Identifiability in Transformers.
Automated curriculum generation through setter-solver interactions.
Progressive Memory Banks for Incremental Domain Adaptation.
What graph neural networks cannot learn: depth vs width.
RTFM: Generalising to New Environment Dynamics via Reading.
Input Complexity and Out-of-distribution Detection with Likelihood-based Generative Models.
Functional vs. parametric equivalence of ReLU networks.
Disentangling Factors of Variations Using Few Labels.
A critical analysis of self-supervision, or what we can learn from a single image.
Accelerating SGD with momentum for over-parameterized learning.
Interpretable Complex-Valued Neural Networks for Privacy Protection.
V-MPO: On-Policy Maximum a Posteriori Policy Optimization for Discrete and Continuous Control.
Improving Adversarial Robustness Requires Revisiting Misclassified Examples.
DivideMix: Learning with Noisy Labels as Semi-supervised Learning.
Fooling Detection Alone is Not Enough: Adversarial Attack against Multiple Object Tracking.
Watch, Try, Learn: Meta-Learning from Demonstrations and Rewards.
Logic and the 2-Simplicial Transformer.
AE-OT: A New Generative Model based on Extended Semi-discrete Optimal Transport.
Exploration in Reinforcement Learning with Deep Covering Options.
Learning Disentangled Representations for CounterFactual Regression.
Analysis of Video Feature Learning in Two-Stream CNNs on the Example of Zebrafish Swim Bout Classification.
Robust Local Features for Improving the Generalization of Adversarial Training.
Reinforcement Learning with Competitive Ensembles of Information-Constrained Primitives.
Learning the Arrow of Time for Problems in Reinforcement Learning.
The Variational Bandwidth Bottleneck: Stochastic Evaluation on an Information Budget.
The Implicit Bias of Depth: How Incremental Learning Drives Generalization.
Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness.
Measuring Compositional Generalization: A Comprehensive Method on Realistic Data.
Theory and Evaluation Metrics for Learning Disentangled Representations.
Mixup Inference: Better Exploiting Mixup to Defend Adversarial Attacks.
Dynamically Pruned Message Passing Networks for Large-scale Knowledge Graph Reasoning.
Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction.
FSPool: Learning Set Representations with Featurewise Sort Pooling.
Multi-agent Reinforcement Learning for Networked System Control.
Hierarchical Foresight: Self-Supervised Learning of Long-Horizon Tasks via Visual Subgoal Generation.
Neural Stored-program Memory.
ES-MAML: Simple Hessian-Free Meta Learning.
TabFact: A Large-scale Dataset for Table-based Fact Verification.
Implicit Bias of Gradient Descent based Adversarial Training on Separable Data.
Image-guided Neural Object Rendering.
Knowledge Consistency between Neural Networks and Beyond.
Lazy-CFR: fast and near-optimal regret minimization for extensive games with imperfect information.
Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks.
Enhancing Transformation-Based Defenses Against Adversarial Attacks with a Distribution Classifier.
Observational Overfitting in Reinforcement Learning.
On Mutual Information Maximization for Representation Learning.
Ranking Policy Gradient.
SVQN: Sequential Variational Soft Q-Learning Networks.
Understanding Architectures Learnt by Cell-based Neural Architecture Search.
AutoQ: Automated Kernel-Wise Neural Network Quantization.
Poly-encoders: Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring.
A Learning-based Iterative Method for Solving Vehicle Routing Problems.
Transferable Perturbations of Deep Feature Distributions.
Continual Learning with Adaptive Weights (CLAW).
Scalable and Order-robust Continual Learning with Additive Parameter Decomposition.
Weakly Supervised Clustering by Exploiting Unique Class Count.
Linear Symmetric Quantization of Neural Networks for Low-precision Integer Hardware.
To Relieve Your Headache of Training an MRF, Take AdVIL.
Automated Relational Meta-learning.
N-BEATS: Neural basis expansion analysis for interpretable time series forecasting.
Deep Learning of Determinantal Point Processes via Proper Spectral Sub-gradient.
Distance-Based Learning from Errors for Confidence Calibration.
Curvature Graph Network.
Learning Expensive Coordination: An Event-Based Deep RL Approach.
LAMOL: LAnguage MOdeling for Lifelong Language Learning.
ProxSGD: Training Structured Neural Networks under Regularization and Constraints.
Diverse Trajectory Forecasting with Determinantal Point Processes.
Evaluating The Search Phase of Neural Architecture Search.
DeepHoyer: Learning Sparser Neural Network with Differentiable Scale-Invariant Sparsity Measures.
Depth-Adaptive Transformer.
Federated Adversarial Domain Adaptation.
Maxmin Q-learning: Controlling the Estimation Bias of Q-learning.
Automatically Discovering and Learning New Visual Categories with Ranking Statistics.
Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification.
Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data.
SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards.
Graph inference learning for semi-supervised classification.
Learning deep graph matching with channel-independent embedding and Hungarian attention.
StructPool: Structured Graph Pooling via Conditional Random Fields.
On the Weaknesses of Reinforcement Learning for Neural Machine Translation.
Sharing Knowledge in Multi-Task Deep Reinforcement Learning.
Reinforcement Learning Based Graph-to-Sequence Model for Natural Question Generation.
SELF: Learning to Filter Noisy Labels with Self-Ensembling.
Large Batch Optimization for Deep Learning: Training BERT in 76 minutes.