ICLR 2021 Paper List

9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021.

Evaluation of Neural Architectures trained with square Loss vs Cross-Entropy in Classification Tasks.
Explainable Subgraph Reasoning for Forecasting on Temporal Knowledge Graphs.
Simple Spectral Graph Convolution.
PolarNet: Learning to Optimize Polar Keypoints for Keypoint Based Object Detection.
Deconstructing the Regularization of BatchNorm.
Generative Scene Graph Networks.
Learnable Embedding sizes for Recommender Systems.
Overfitting for Fun and Profit: Instance-Adaptive Data Compression.
Acting in Delayed Environments with Non-Stationary Markov Policies.
ARMOURED: Adversarially Robust MOdels using Unlabeled data by REgularizing Diversity.
Solving Compositional Reinforcement Learning Problems via Task Reduction.
A Geometric Analysis of Deep Generative Image Models and Its Applications.
Fast and Complete: Enabling Complete Neural Network Verification with Rapid and Massively Parallel Incomplete Verifiers.
Is Attention Better Than Matrix Decomposition?
Communication in Multi-Agent Reinforcement Learning: Intention Sharing.
A Discriminative Gaussian Mixture Model with Sparsity.
Interpretable Neural Architecture Search via Bayesian Optimisation with Weisfeiler-Lehman Kernels.
New Bounds For Distributed Mean Estimation and Variance Reduction.
Certify or Predict: Boosting Certified Robustness with Compositional Architectures.
Taming GANs with Lookahead-Minmax.
Bowtie Networks: Generative Modeling for Joint Few-Shot Recognition and Novel-View Synthesis.
Learning Subgoal Representations with Slow Dynamics.
GAN2GAN: Generative Noise Learning for Blind Denoising with Single Noisy Images.
CO2: Consistent Contrast for Unsupervised Visual Representation Learning.
CPR: Classifier-Projection Regularization for Continual Learning.
Fooling a Complete Neural Network Verifier.
Representation Learning via Invariant Causal Mechanisms.
Interpreting and Boosting Dropout from a Game-Theoretic View.
BOIL: Towards Representation Change for Few-shot Learning.
Generating Adversarial Computer Programs using Optimized Obfuscations.
On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis.
FOCAL: Efficient Fully-Offline Meta-Reinforcement Learning via Distance Metric Learning and Behavior Regularization.
Seq2Tens: An Efficient Representation of Sequences by Low-Rank Tensor Projections.
Neural Jump Ordinary Differential Equations: Consistent Continuous-Time Prediction and Filtering.
Hyperbolic Neural Networks++.
Parameter-Based Value Functions.
Bayesian Few-Shot Classification with One-vs-Each Pólya-Gamma Augmented Gaussian Processes.
Spatially Structured Recurrent Modules.
Teaching Temporal Logics to Neural Networks.
Robust Learning of Fixed-Structure Bayesian Networks in Nearly-Linear Time.
Reset-Free Lifelong Learning with Skill-Space Planning.
TropEx: An Algorithm for Extracting Linear Terms in Deep Neural Networks.
Adapting to Reward Progressivity via Spectral Reinforcement Learning.
Attentional Constellation Nets for Few-Shot Learning.
Decentralized Attribution of Generative Models.
Randomized Ensembled Double Q-Learning: Learning Fast Without a Model.
Learning Value Functions in Deep Policy Gradients using Residual Variance.
Knowledge Distillation as Semiparametric Inference.
Combining Physics and Machine Learning for Network Flow Estimation.
Personalized Federated Learning with First Order Model Optimization.
Large Batch Simulation for Deep Reinforcement Learning.
MetaNorm: Learning to Normalize Few-Shot Batches Across Domains.
Byzantine-Resilient Non-Convex Stochastic Gradient Descent.
Accurate Learning of Graph Representations with Graph Multiset Pooling.
NBDT: Neural-Backed Decision Tree.
Estimating informativeness of samples with Smooth Unique Information.
Estimating Lipschitz constants of monotone deep equilibrium models.
GraPPa: Grammar-Augmented Pre-Training for Table Semantic Parsing.
Improving Zero-Shot Voice Style Transfer via Disentangled Representation Learning.
HyperDynamics: Meta-Learning Object and Agent Dynamics with Hypernetworks.
Anytime Sampling for Autoregressive Models via Ordered Autoencoding.
Disentangling 3D Prototypical Networks for Few-Shot Concept Learning.
Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning.
Generating Furry Cars: Disentangling Object Shape and Appearance across Multiple Domains.
Learning to Make Decisions via Submodular Regularization.
Adaptive and Generative Zero-Shot Learning.
CompOFA - Compound Once-For-All Networks for Faster Multi-Platform Deployment.
Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs.
Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning.
Batch Reinforcement Learning Through Continuation Method.
Class Normalization for (Continual)? Generalized Zero-Shot Learning.
VTNet: Visual Transformer Network for Object Goal Navigation.
More or Less: When and How to Build Convolutional Neural Network Ensembles.
Efficient Empowerment Estimation for Unsupervised Stabilization.
SOLAR: Sparse Orthogonal Learned and Random Embeddings.
Combining Ensembles and Data Augmentation Can Harm Your Calibration.
Fourier Neural Operator for Parametric Partial Differential Equations.
Saliency is a Possible Red Herring When Diagnosing Poor Generalization.
Provably robust classification of adversarial examples with detection.
SAFENet: A Secure, Accurate and Fast Neural Network Inference.
Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning.
Combining Label Propagation and Simple Models out-performs Graph Neural Networks.
Local Search Algorithms for Rank-Constrained Convex Optimization.
Pre-training Text-to-Text Transformers for Concept-centric Common Sense.
Decoupling Global and Local Representations via Invertible Generative Flows.
SCoRe: Pre-Training for Context Representation in Conversational Semantic Parsing.
Evaluating the Disentanglement of Deep Generative Models through Manifold Topology.
End-to-End Egospheric Spatial Memory.
Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification.
BREEDS: Benchmarks for Subpopulation Shift.
PAC Confidence Predictions for Deep Neural Network Classifiers.
Dance Revolution: Long-Term Dance Generation with Music via Curriculum Learning.
Sparse encoding for more-interpretable feature-selecting representations in probabilistic matrix factorization.
Hopper: Multi-hop Transformer for Spatiotemporal Reasoning.
Prediction and generalisation over directed actions by grid cells.
Why resampling outperforms reweighting for correcting sampling bias with stochastic gradients.
MELR: Meta-Learning via Modeling Episode-Level Relationships for Few-Shot Learning.
In Defense of Pseudo-Labeling: An Uncertainty-Aware Pseudo-label Selection Framework for Semi-Supervised Learning.
Vector-output ReLU Neural Network Problems are Copositive Programs: Convex Analysis of Two Layer Networks and Polynomial-time Algorithms.
One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks.
AdaFuse: Adaptive Temporal Fusion Network for Efficient Action Recognition.
Dataset Meta-Learning from Kernel Ridge-Regression.
Repurposing Pretrained Models for Robust Out-of-domain Few-Shot Learning.
On Position Embeddings in BERT.
Conditional Negative Sampling for Contrastive Learning of Visual Representations.
Learning and Evaluating Representations for Deep One-Class Classification.
Faster Binary Embeddings for Preserving Euclidean Distances.
Coping with Label Shift via Distributionally Robust Optimisation.
On Dyadic Fairness: Exploring and Mitigating Bias in Graph Connections.
Model-Based Offline Planning.
Neural Networks for Learning Counterfactual G-Invariances from Single Environments.
Learning Energy-Based Models by Diffusion Recovery Likelihood.
QPLEX: Duplex Dueling Multi-Agent Q-Learning.
Directed Acyclic Graph Neural Networks.
Does enhanced shape bias improve neural network robustness to common corruptions?
OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning.
PDE-Driven Spatiotemporal Disentanglement.
Mapping the Timescale Organization of Neural Language Models.
X2T: Training an X-to-Text Typing Interface with Online Learning from User Feedback.
Stochastic Security: Adversarial Defense Using Long-Run Dynamics of Energy-Based Models.
CoCo: Controllable Counterfactuals for Evaluating Dialogue State Trackers.
Beyond Categorical Label Representations for Image Classification.
Long Range Arena: A Benchmark for Efficient Transformers.
Generalized Energy Based Models.
SALD: Sign Agnostic Learning with Derivatives.
WaveGrad: Estimating Gradients for Waveform Generation.
Learning advanced mathematical computations from examples.
Linear Last-iterate Convergence in Constrained Saddle-point Optimization.
Go with the flow: Adaptive control for Neural ODEs.
Understanding Over-parameterization in Generative Adversarial Networks.
Multiscale Score Matching for Out-of-Distribution Detection.
Tradeoffs in Data Augmentation: An Empirical Study.
Rapid Task-Solving in Novel Environments.
Temporally-Extended ε-Greedy Exploration.
Extracting Strong Policies for Robotics Tasks from Zero-Order Trajectory Optimizers.
Global optimality of softmax policy gradient with single hidden layer neural networks in the mean-field regime.
Unsupervised Discovery of 3D Physical Objects from Video.
Sample-Efficient Automated Deep Reinforcement Learning.
Learning Structural Edits via Incremental Tree Transformations.
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis.
Private Post-GAN Boosting.
Modeling the Second Player in Distributionally Robust Optimization.
Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks.
Multiplicative Filter Networks.
Representation learning for improved interpretability and classification accuracy of clinical factors from EEG.
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding.
Multi-Level Local SGD: Distributed SGD for Heterogeneous Hierarchical Networks.
R-GAP: Recursive Gradient Attack on Privacy.
Isometric Transformation Invariant and Equivariant Graph Convolutional Networks.
Chaos of Learning Beyond Zero-sum and Coordination via Game Decompositions.
Grounding Language to Autonomously-Acquired Skills via Goal Generation.
Trajectory Prediction using Equivariant Continuous Convolution.
On the role of planning in model-based deep reinforcement learning.
A Hypergradient Approach to Robust Regression without Correspondence.
Fast convergence of stochastic subgradient method under interpolation.
Representing Partial Programs with Blended Abstract Semantics.
Meta-learning with negative learning rates.
Wasserstein Embedding for Graph Learning.
Convex Potential Flows: Universal Probability Distributions with Optimal Transport and Convex Optimization.
Shape or Texture: Understanding Discriminative Features in CNNs.
Neurally Augmented ALISTA.
Learning from Demonstration with Weakly Supervised Disentanglement.
On Data-Augmentation and Consistency-Based Semi-Supervised Learning.
Unsupervised Meta-Learning through Latent-Space Interpolation in Generative Models.
Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch.
The role of Disentanglement in Generalisation.
Shapley Explanation Networks.
C-Learning: Horizon-Aware Cumulative Accessibility Estimation.
Multi-resolution modeling of a discrete stochastic process identifies causes of cancer.
PC2WF: 3D Wireframe Reconstruction from Raw Point Clouds.
Universal Weakly Supervised Segmentation by Pixel-to-Segment Contrastive Learning.
DINO: A Conditional Energy-Based GAN for Domain Translation.
HeteroFL: Computation and Communication Efficient Federated Learning for Heterogeneous Clients.
AdaSpeech: Adaptive Text to Speech for Custom Voice.
Few-Shot Bayesian Optimization with Deep Kernel Surrogates.
Simple Augmentation Goes a Long Way: ADRL for DNN Quantization.
Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds.
SkipW: Resource Adaptable RNN with Strict Upper Computational Limit.
Pruning Neural Networks at Initialization: Why Are We Missing the Mark?
Analyzing the Expressive Power of Graph Neural Networks in a Spectral Perspective.
On the Origin of Implicit Regularization in Stochastic Gradient Descent.
Transient Non-stationarity and Generalisation in Deep Reinforcement Learning.
Adversarial score matching and improved sampling for image generation.
LiftPool: Bidirectional ConvNet Pooling.
Exemplary Natural Images Explain CNN Activations Better than State-of-the-Art Feature Visualization.
Scalable Bayesian Inverse Reinforcement Learning.
Return-Based Contrastive Representation Learning for Reinforcement Learning.
Implicit Gradient Regularization.
Variational Intrinsic Control Revisited.
Bayesian Context Aggregation for Neural Processes.
Understanding and Improving Lexical Choice in Non-Autoregressive Translation.
Generalized Variational Continual Learning.
FedMix: Approximation of Mixup under Mean Augmented Federated Learning.
Relating by Contrasting: A Data-efficient Framework for Multimodal Generative Models.
Rapid Neural Architecture Search by Learning to Generate Graphs from Datasets.
Sharper Generalization Bounds for Learning with Gradient-dominated Objective Functions.
Provable Rich Observation Reinforcement Learning with Combinatorial Latent States.
Learning to Sample with Local and Global Contexts in Experience Replay Buffer.
Gradient Origin Networks.
Nonseparable Symplectic Neural Networks.
Revisiting Locally Supervised Learning: an Alternative to End-to-end Training.
Deep Repulsive Clustering of Ordered Data Based on Order-Identity Decomposition.
Spatial Dependency Networks: Neural Layers for Improved Generative Image Modeling.
Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning.
Reweighting Augmented Samples by Minimizing the Maximal Expected Loss.
Identifying Physical Law of Hamiltonian Systems via Meta-Learning.
Robust early-learning: Hindering the memorization of noisy labels.
Monte-Carlo Planning and Learning with Language Action Value Estimates.
Drop-Bottleneck: Learning Discrete Compressed Representation for Noise-Robust Exploration.
DrNAS: Dirichlet Neural Architecture Search.
InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective.
Graph Edit Networks.
Capturing Label Characteristics in VAEs.
Neural Delay Differential Equations.
A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning.
Group Equivariant Stand-Alone Self-Attention For Vision.
Risk-Averse Offline Reinforcement Learning.
Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech.
Learning Hyperbolic Representations of Topological Features.
Lipschitz Recurrent Neural Networks.
Explaining the Efficacy of Counterfactually Augmented Data.
Refining Deep Generative Models via Discriminator Gradient Flow.
Layer-adaptive Sparsity for the Magnitude-based Pruning.
Prototypical Representation Learning for Relation Extraction.
Learning Reasoning Paths over Semantic Graphs for Video-grounded Dialogues.
When does preconditioning help or hurt generalization?
Group Equivariant Conditional Neural Processes.
PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences.
Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies.
Molecule Optimization by Explainable Evolution.
Predicting Inductive Biases of Pre-Trained Models.
Learning Better Structured Representations Using Low-rank Adaptive Label Smoothing.
Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation.
Multi-timescale Representation Learning in LSTM Language Models.
Adaptive Procedural Task Generation for Hard-Exploration Problems.
Interactive Weak Supervision: Learning Useful Heuristics for Data Labeling.
Extreme Memorization via Scale of Initialization.
Prototypical Contrastive Learning of Unsupervised Representations.
Learning from others' mistakes: Avoiding dataset biases without modeling them.
LowKey: Leveraging Adversarial Attacks to Protect Social Media Users from Facial Recognition.
WaNet - Imperceptible Warping-based Backdoor Attack.
Neural representation and generation for RNA secondary structures.
Can a Fruit Fly Learn Word Embeddings?
RNNLogic: Learning Logic Rules for Reasoning on Knowledge Graphs.
Evaluations and Methods for Explanation through Robustness Analysis.
gradSim: Differentiable simulation for system identification and visuomotor control.
Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization.
Spatio-Temporal Graph Scattering Transform.
Isotropy in the Contextual Embedding Space: Clusters and Manifolds.
Reinforcement Learning with Random Delays.
Deep Learning meets Projective Clustering.
Dual-mode ASR: Unify and Improve Streaming ASR with Full-context Modeling.
On Graph Neural Networks versus Graph-Augmented MLPs.
NeMo: Neural Mesh Models of Contrastive Features for Robust 3D Pose Estimation.
MoVie: Revisiting Modulated Convolutions for Visual Counting and Beyond.
NOVAS: Non-convex Optimization via Adaptive Stochastic Search for End-to-end Learning and Control.
Understanding the effects of data parallelism and sparsity on neural network training.
Planning from Pixels using Inverse Dynamics Models.
Benchmarks for Deep Off-Policy Evaluation.
BiPointNet: Binary Neural Network for Point Clouds.
Meta-Learning of Structured Task Distributions in Humans and Machines.
Training independent subnetworks for robust prediction.
Better Fine-Tuning by Reducing Representational Collapse.
Selective Classification Can Magnify Disparities Across Groups.
Zero-shot Synthesis with Group-Supervised Learning.
Learning Task-General Representations with Generative Neuro-Symbolic Modeling.
BERTology Meets Biology: Interpreting Attention in Protein Language Models.
AutoLRS: Automatic Learning-Rate Schedule by Bayesian Optimization on the Fly.
BSQ: Exploring Bit-Level Sparsity for Mixed-Precision Neural Network Quantization.
Economic Hyperparameter Optimization with Blended Search Strategy.
Average-case Acceleration for Bilinear Games and Normal Matrices.
Multi-Prize Lottery Ticket Hypothesis: Finding Accurate Binary Neural Networks by Pruning A Randomly Weighted Network.
IsarStep: a Benchmark for High-level Mathematical Reasoning.
Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis.
Genetic Soft Updates for Policy Evolution in Deep Reinforcement Learning.
Distance-Based Regularisation of Deep Networks for Fine-Tuning.
Ringing ReLUs: Harmonic Distortion Analysis of Nonlinear Feedforward Networks.
No Cost Likelihood Manipulation at Test Time for Making Better Mistakes in Deep Networks.
Efficient Continual Learning with Modular Networks and Task-Driven Priors.
Activation-level uncertainty in deep neural networks.
Local Convergence Analysis of Gradient Descent Ascent with Finite Timescale Separation.
Scaling the Convex Barrier with Active Sets.
NAS-Bench-ASR: Reproducible Neural Architecture Search for Speech Recognition.
PseudoSeg: Designing Pseudo Labels for Semantic Segmentation.
Symmetry-Aware Actor-Critic for 3D Molecular Design.
Long Live the Lottery: The Existence of Winning Tickets in Lifelong Learning.
Robust Overfitting may be mitigated by properly learned smoothening.
Characterizing signal propagation to close the performance gap in unnormalized ResNets.
Learning continuous-time PDEs from sparse data with graph neural networks.
Latent Skill Planning for Exploration and Transfer.
Uncertainty-aware Active Learning for Optimal Bayesian Classifier.
Self-supervised Adversarial Robustness for the Low-label, High-data Regime.
Single-Photon Image Classification.
CcGAN: Continuous Conditional Generative Adversarial Networks for Image Generation.
Plan-Based Relaxed Reward Shaping for Goal-Directed Tasks.
ANOCE: Analysis of Causal Effects with Multiple Mediators via Constrained Structural Learning.
Transformer protein language models are unsupervised structure learners.
Uncertainty Estimation in Autoregressive Structured Prediction.
Learning to live with Dale's principle: ANNs with separate excitatory and inhibitory units.
CT-Net: Channel Tensorization Network for Video Classification.
On the Universality of Rotation Equivariant Point Cloud Networks.
Universal approximation power of deep residual neural networks via nonlinear control theory.
Learning a Latent Search Space for Routing Problems using Variational Autoencoders.
A teacher-student framework to distill future trajectories.
What they do when in doubt: a study of inductive biases in seq2seq learners.
Group Equivariant Generative Adversarial Networks.
CoCon: A Self-Supervised Approach for Controlled Text Generation.
Cross-Attentional Audio-Visual Fusion for Weakly-Supervised Action Localization.
Robust Curriculum Learning: from clean label detection to noisy label self-correction.
In Search of Lost Domain Generalization.
Graph Information Bottleneck for Subgraph Recognition.
Online Adversarial Purification based on Self-supervised Learning.
Learning Deep Features in Instrumental Variable Regression.
Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning.
Differentiable Segmentation of Sequences.
Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation.
Network Pruning That Matters: A Case Study on Retraining Variants.
Degree-Quant: Quantization-Aware Training for Graph Neural Networks.
Boost then Convolve: Gradient Boosting Meets Graph Neural Networks.
Learning Associative Inference Using Fast Weight Memory.
Task-Agnostic Morphology Evolution.
SaliencyMix: A Saliency Guided Data Augmentation Strategy for Better Regularization.
Differentiable Trust Region Layers for Deep Reinforcement Learning.
Discovering Non-monotonic Autoregressive Orderings with Variational Inference.
Rethinking Positional Encoding in Language Pre-training.
Improving Relational Regularized Autoencoders with Spherical Sliced Fused Gromov Wasserstein.
Calibration of Neural Networks using Splines.
Exploring Balanced Feature Spaces for Representation Learning.
UMEC: Unified model and embedding compression for efficient recommendation systems.
Adversarially-Trained Deep Nets Transfer Better: Illustration on Image Classification.
Learning Task Decomposition with Ordered Memory Policy Network.
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning.
SEDONA: Search for Decoupled Neural Networks toward Greedy Block-wise Learning.
VA-RED2: Video Adaptive Redundancy Reduction.
On InstaHide, Phase Retrieval, and Sparse Matrix Factorization.
Greedy-GQ with Variance Reduction: Finite-time Analysis and Improved Complexity.
HyperGrid Transformers: Towards A Single Model for Multiple Tasks.
Statistical inference for individual fairness.
Towards Robust Neural Networks via Close-loop Control.
Measuring Massive Multitask Language Understanding.
Kanerva++: Extending the Kanerva Machine With Differentiable, Locally Block Allocated Latent Memory.
Aligning AI With Shared Human Values.
Learning Manifold Patch-Based Representations of Man-Made Shapes.
Filtered Inner Product Projection for Crosslingual Embedding Alignment.
Progressive Skeletonization: Trimming more fat from a network at initialization.
Learning What To Do by Simulating the Past.
High-Capacity Expert Binary Networks.
Fuzzy Tiling Activations: A Simple Approach to Learning Sparse Representations Online.
Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting.
Contrastive Syn-to-Real Generalization.
Incremental few-shot learning via vector quantization in deep embedded space.
In-N-Out: Pre-Training and Self-Training using Auxiliary Information for Out-of-Distribution Robustness.
Reducing the Computational Cost of Deep Generative Models with Binary Neural Networks.
MALI: A memory efficient and reverse accurate integrator for Neural ODEs.
FedBE: Making Bayesian Model Ensemble Applicable to Federated Learning.
My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control.
Neural Learning of One-of-Many Solutions for Combinatorial Problems in Structured Output Spaces.
Adaptive Universal Generalized PageRank Graph Neural Network.
Latent Convergent Cross Mapping.
Property Controllable Variational Autoencoder via Invertible Mutual Dependence.
Semantic Re-tuning with Contrastive Tension.
ResNet After All: Neural ODEs and Their Numerical Solution.
GANs Can Play Lottery Tickets Too.
Efficient Conformal Prediction via Cascaded Inference with Expanded Admission.
Learning Parametrised Graph Shift Operators.
Disambiguating Symbolic Expressions in Informal Documents.
Neural networks with late-phase weights.
Lossless Compression of Structured Convolutional Models via Lifting.
Uncertainty in Gradient Boosting via Ensembles.
An Unsupervised Deep Learning Approach for Real-World Image Denoising.
Conformation-Guided Molecular Representation with Hamiltonian Neural Networks.
Neural ODE Processes.
Multi-Class Uncertainty Calibration via Mutual Information Maximization-based Binning.
Effective Distributed Learning with Random Features: Improved Bounds and Algorithms.
On Learning Universal Representations Across Languages.
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech.
Cut out the annotator, keep the cutout: better segmentation with weak supervision.
Factorizing Declarative and Procedural Knowledge in Structured, Dynamical Environments.
Self-Supervised Learning of Compressed Video Representations.
Learning to Generate 3D Shapes with Generative Cellular Automata.
Initialization and Regularization of Factorized Neural Layers.
i-Mix: A Domain-Agnostic Strategy for Contrastive Representation Learning.
Trusted Multi-View Classification.
Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics.
On Fast Adversarial Robustness Adaptation in Model-Agnostic Meta-Learning.
Probing BERT in Hyperbolic Spaces.
Efficient Wasserstein Natural Gradients for Reinforcement Learning.
Robust Pruning at Initialization.
Parameter Efficient Multimodal Transformers for Video Representation Learning.
Active Contrastive Learning of Audio-Visual Video Representations.
Enforcing robust control guarantees within neural network policies.
Unsupervised Representation Learning for Time Series with Temporal Neighborhood Coding.
Domain-Robust Visual Imitation Learning with Mutual Information Constraints.
Theoretical bounds on estimation error for meta-learning.
Towards Impartial Multi-task Learning.
Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data.
Counterfactual Generative Networks.
IOT: Instance-wise Layer Reordering for Transformer Structures.
A statistical theory of cold posteriors in deep neural networks.
The inductive bias of ReLU networks on orthogonally separable data.
A Unified Approach to Interpreting and Boosting Adversarial Transferability.
Contextual Transformation Networks for Online Continual Learning.
Private Image Reconstruction from System Side Channels Using Generative Models.
Model-based micro-data reinforcement learning: what are the crucial model properties and which model to choose?
HalentNet: Multimodal Trajectory Forecasting with Hallucinative Intents.
MiCE: Mixture of Contrastive Experts for Unsupervised Image Clustering.
AdamP: Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights.
Variational State-Space Models for Localisation and Dense 3D Mapping in 6 DoF.
Intrinsic-Extrinsic Convolution and Pooling for Learning on 3D Protein Structures.
Robust and Generalizable Visual Representation Learning via Random Convolutions.
Linear Mode Connectivity in Multitask and Continual Learning.
Non-asymptotic Confidence Intervals of Off-policy Evaluation: Primal and Dual Bounds.
Model Patching: Closing the Subgroup Performance Gap with Data Augmentation.
Blending MPC & Value Function Approximation for Efficient Reinforcement Learning.
Using latent space regression to analyze and leverage compositionality in GANs.
Shape-Texture Debiased Neural Network Training.
Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study.
DC3: A learning method for optimization with hard constraints.
Deep Partition Aggregation: Provable Defenses against General Poisoning Attacks.
On the geometry of generalization and memorization in deep neural networks.
Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit.
Usable Information and Evolution of Optimal Representations During Training.
Zero-Cost Proxies for Lightweight NAS.
Perceptual Adversarial Robustness: Defense Against Unseen Threat Models.
Enjoy Your Editing: Controllable GANs for Image Editing via Latent Space Navigation.
Noise or Signal: The Role of Image Backgrounds in Object Recognition.
Shapley explainability on the data manifold.
Improving Transformation Invariance in Contrastive Representation Learning.
Learning "What-if" Explanations for Sequential Decision-Making.
A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention.
Continual learning in recurrent neural networks.
Meta-Learning with Neural Tangent Kernels.
Learning Robust State Abstractions for Hidden-Parameter Block MDPs.
FedBN: Federated Learning on Non-IID Features via Local Batch Normalization.
Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization.
Colorization Transformer.
Separation and Concentration in Deep Networks.
Influence Functions in Deep Learning Are Fragile.
Training GANs with Stronger Augmentations via Contrastive Discriminator.
Language-Agnostic Representation Learning of Source Code from Structure and Context.
Clustering-friendly Representation Learning via Instance Discrimination and Feature Decorrelation.
Set Prediction without Imposing Structure as Conditional Density Estimation.
Do not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning.
Effective Abstract Reasoning with Dual-Contrast Network.
Loss Function Discovery for Object Detection via Convergence-Simulation Driven Search.
Learning Accurate Entropy Model with Global Reference for Image Compression.
What Makes Instance Discrimination Good for Transfer Learning?
A unifying view on implicit bias in training linear neural networks.
Representation Learning for Sequence Data with Deep Autoencoding Predictive Components.
A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks.
Policy-Driven Attack: Learning to Query for Hard-label Black-box Adversarial Examples.
Fast And Slow Learning Of Recurrent Independent Mechanisms.
Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate.
Learning Safe Multi-agent Control with Decentralized Neural Barrier Certificates.
Federated Semi-Supervised Learning with Inter-Client Consistency & Disjoint Learning.
Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers.
Neural Architecture Search on ImageNet in Four GPU Hours: A Theoretically Inspired Perspective.
Semi-supervised Keypoint Localization.
Learning to Set Waypoints for Audio-Visual Navigation.
Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth.
What Should Not Be Contrastive in Contrastive Learning.
A Design Space Study for LISTA and Beyond.
Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance Tradeoff Perspective.
Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval.
Hierarchical Reinforcement Learning by Discovering Intrinsic Options.
Denoising Diffusion Implicit Models.
Sliced Kernelized Stein Discrepancy.
Intraclass clustering: an implicit learning ability that regularizes DNNs.
Contrastive Learning with Hard Negative Samples.
Discrete Graph Structure Learning for Forecasting Multiple Time Series.
Selectivity considered harmful: evaluating the causal impact of class selectivity in DNNs.
A Block Minifloat Representation for Training Deep Neural Networks.
On the Impossibility of Global Convergence in Multi-Loss Optimization.
Offline Model-Based Optimization via Normalized Maximum Likelihood Estimation.
Self-supervised Representation Learning with Relative Predictive Coding.
Clairvoyance: A Pipeline Toolkit for Medical Time Series.
A PAC-Bayesian Approach to Generalization Bounds for Graph Neural Networks.
Heating up decision boundaries: isocapacitory saturation, adversarial scenarios and generalization bounds.
CaPC Learning: Confidential and Private Collaborative Learning.
Incorporating Symmetry into Deep Dynamics Models for Improved Generalization.
Learning with AMIGo: Adversarially Motivated Intrinsic Goals.
Explaining by Imitating: Understanding Decisions by Interpretable Policy Learning.
IDF++: Analyzing and Improving Integer Discrete Flows for Lossless Compression.
not-MIWAE: Deep Generative Modelling with Missing not at Random Data.
Discovering Diverse Multi-Agent Strategic Behavior via Reward Randomization.
Distilling Knowledge from Reader to Retriever for Question Answering.
Adaptive Extra-Gradient Methods for Min-Max Optimization and Games.
Training with Quantization Noise for Extreme Model Compression.
The Role of Momentum Parameters in the Optimal Convergence of Adaptive Polyak's Heavy-ball Methods.
IEPT: Instance-Level and Episode-Level Pretext Tasks for Few-Shot Learning.
Learning to Deceive Knowledge Graph Augmented Models via Targeted Perturbation.
ChipNet: Budget-Aware Pruning with Heaviside Continuous Approximations.
Learning Long-term Visual Dynamics with Region Proposal Interaction Networks.
Text Generation by Learning from Demonstrations.
Conditional Generative Modeling via Learning the Latent Space.
When Optimizing f-Divergence is Robust with Label Noise.
Contrastive Learning with Adversarial Perturbations for Conditional Text Generation.
Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks.
Unbiased Teacher for Semi-Supervised Object Detection.
Efficient Reinforcement Learning in Factored MDPs with Application to Constrained RL.
Learning with Instance-Dependent Label Noise: A Sample Sieve Approach.
Bag of Tricks for Adversarial Training.
DynaTune: Dynamic Tensor Program Optimization in Deep Neural Network Compilation.
The Risks of Invariant Risk Minimization.
DOP: Off-Policy Multi-Agent Decomposed Policy Gradients.
Generative Time-series Modeling with Fourier Flows.
Neural Spatio-Temporal Point Processes.
Contemplating Real-World Object Classification.
Learning Neural Event Functions for Ordinary Differential Equations.
Mastering Atari with Discrete World Models.
Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy.
DeLighT: Deep and Light-weight Transformer.
Domain Generalization with MixStyle.
Concept Learners for Few-Shot Learning.
Creative Sketch Generation.
Rethinking Embedding Coupling in Pre-trained Language Models.
Lifelong Learning of Compositional Structures.
Debiasing Concept-based Explanations with Causal Analysis.
Learning to Represent Action Values as a Hypergraph on the Action Vertices.
Categorical Normalizing Flows via Continuous Transformations.
Achieving Linear Speedup with Partial Worker Participation in Non-IID Federated Learning.
Entropic gradient descent algorithms and wide flat minima.
Collective Robustness Certificates: Exploiting Interdependence in Graph Neural Networks.
Efficient Generalized Spherical CNNs.
Learning explanations that are hard to vary.
Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System.
Physics-aware, probabilistic model order reduction with guaranteed stability.
Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction.
RODE: Learning Roles to Decompose Multi-Agent Tasks.
Neural gradients are near-lognormal: improved quantized and sparse training.
Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics.
Neural Thompson Sampling.
Continuous Wasserstein-2 Barycenter Estimation without Minimax Optimization.
Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization.
Effective and Efficient Vote Attack on Capsule Networks.
Isometric Propagation Network for Generalized Zero-shot Learning.
SEED: Self-supervised Distillation For Visual Representation.
A Learning Theoretic Perspective on Local Explainability.
Unsupervised Audiovisual Synthesis via Exemplar Autoencoders.
Learning Energy-Based Generative Models via Coarse-to-Fine Expanding and Sampling.
Multi-Time Attention Networks for Irregularly Sampled Time Series.
DialoGraph: Incorporating Interpretable Strategy-Graph Networks into Negotiation Dialogues.
Mirostat: a Neural Text decoding Algorithm that directly controls perplexity.
Contextual Dropout: An Efficient Sample-Dependent Dropout Module.
Proximal Gradient Descent-Ascent: Variable Convergence under KŁ Geometry.
Protecting DNNs from Theft using an Ensemble of Diverse Models.
Large Associative Memory Problem in Neurobiology and Machine Learning.
Efficient Transformers in Reinforcement Learning using Actor-Learner Distillation.
Linear Convergent Decentralized Optimization with Compression.
On the Dynamics of Training Attention Models.
Adaptive Federated Optimization.
Improved Estimation of Concentration Under ℓp-Norm Distance Metrics Using Half Spaces.
INT: An Inequality Benchmark for Evaluating Generalization in Theorem Proving.
A Critique of Self-Expressive Deep Subspace Clustering.
Learning to Recombine and Resample Data For Compositional Generalization.
You Only Need Adversarial Supervision for Semantic Image Synthesis.
Overparameterisation and worst-case generalisation: friend or foe?
Calibration tests beyond classification.
On the Transfer of Disentangled Representations in Realistic Settings.
Deep Neural Tangent Kernel and Laplace Kernel Have the Same RKHS.
Tilted Empirical Risk Minimization.
Revisiting Few-sample BERT Fine-tuning.
Ask Your Humans: Using Human Instructions to Improve Generalization in Reinforcement Learning.
SSD: A Unified Framework for Self-Supervised Outlier Detection.
Auxiliary Task Update Decomposition: the Good, the Bad and the neutral.
Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval.
Partitioned Learned Bloom Filters.
Rank the Episodes: A Simple Approach for Exploration in Procedurally-Generated Environments.
Federated Learning via Posterior Averaging: A New Perspective and Practical Algorithms.
LEAF: A Learnable Frontend for Audio Classification.
Monotonic Kronecker-Factored Lattice.
Tomographic Auto-Encoder: Unsupervised Bayesian Recovery of Corrupted Data.
Vulnerability-Aware Poisoning Mechanism for Online RL with Unknown Dynamics.
Wasserstein-2 Generative Networks.
Emergent Road Rules In Multi-Agent Driving Environments.
Generative Language-Grounded Policy in Vision-and-Language Navigation with Bayes' Rule.
Understanding the failure modes of out-of-distribution generalization.
Uncertainty Estimation and Calibration with Finite-State Probabilistic RNNs.
Hopfield Networks is All You Need.
Interpreting Knowledge Graph Relation Representation from Word Embeddings.
The Importance of Pessimism in Fixed-Dataset Policy Optimization.
Accelerating Convergence of Replica Exchange Stochastic Gradient MCMC via Variance Reduction.
Representation Balancing Offline Model-based Reinforcement Learning.
FairBatch: Batch Selection for Model Fairness.
Inductive Representation Learning in Temporal Networks via Causal Anonymous Walks.
DICE: Diversity in Deep Ensembles via Conditional Redundancy Adversarial Estimation.
Efficient Inference of Flexible Interaction in Spiking-neuron Networks.
Early Stopping in Deep Networks: Double Descent and How to Eliminate it.
Graph Coarsening with Neural Networks.
Deep Equals Shallow for ReLU Networks in Kernel Regimes.
Optimal Conversion of Conventional Artificial Neural Networks to Spiking Neural Networks.
Are wider nets better given the same number of parameters?
DARTS-: Robustly Stepping out of Performance Collapse Without Indicators.
Adversarially Guided Actor-Critic.
Balancing Constraints and Rewards with Meta-Gradient D4PG.
Auxiliary Learning by Implicit Differentiation.
Fully Unsupervised Diversity Denoising with Convolutional Variational Autoencoders.
Distributed Momentum for Byzantine-resilient Stochastic Gradient Descent.
Large-width functional asymptotics for deep Gaussian neural networks.
Generalized Multimodal ELBO.
Targeted Attack against Deep Neural Networks via Flipping Limited Weight Bits.
Convex Regularization behind Neural Reconstruction.
Efficient Certified Defenses Against Patch Attacks on Image Classifiers.
Learning Neural Generative Dynamics for Molecular Conformation Generation.
Individually Fair Rankings.
Hierarchical Autoregressive Modeling for Neural Video Compression.
Optimizing Memory Placement using Evolutionary Graph Reinforcement Learning.
Robust Reinforcement Learning on State Observations with Learned Optimal Adversary.
Auction Learning as a Two-Player Game.
How Much Over-parameterization Is Sufficient to Learn Deep ReLU Networks?
A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima.
Evaluation of Similarity-based Explanations.
Open Question Answering over Tables and Text.
The Unreasonable Effectiveness of Patches in Deep Convolutional Kernels Methods.
Integrating Categorical Semantics into Unsupervised Domain Translation.
Self-supervised Learning from a Multi-view Perspective.
Fair Mixup: Fairness via Interpolation.
On the Universality of the Double Descent Peak in Ridgeless Regression.
Mind the Gap when Conditioning Amortised Inference in Sequential Latent-Variable Models.
Removing Undesirable Feature Contributions Using Out-of-Distribution Data.
Meta-learning Symmetries by Reparameterization.
Interpretable Models for Granger Causality Using Self-explaining Neural Networks.
How to Find Your Friendly Neighborhood: Graph Attention Design with Self-Supervision.
For self-supervised learning, Rationality implies generalization, provably.
A Temporal Kernel Approach for Deep Learning with Continuous-time Information.
Improve Object Detection with Feature-based Knowledge Distillation: Towards Accurate and Efficient Detectors.
Conservative Safety Critics for Exploration.
GraphCodeBERT: Pre-training Code Representations with Data Flow.
No MCMC for me: Amortized sampling for fast and stable training of energy-based models.
BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction.
Predicting Classification Accuracy When Adding New Unobserved Classes.
Estimating and Evaluating Regression Predictive Uncertainty in Deep Object Detectors.
Learning the Pareto Front with Hypernetworks.
Projected Latent Markov Chain Monte Carlo: Conditional Sampling of Normalizing Flows.
The Recurrent Neural Tangent Kernel.
MODALS: Modality-agnostic Automated Data Augmentation in the Latent Space.
Impact of Representation Learning in Linear Bandits.
EEC: Learning to Encode and Regenerate Images for Continual Learning.
What Can You Learn From Your Muscles? Learning Visual Representation from Human Interactions.
Improving VAEs' Robustness to Adversarial Attack.
The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers.
C-Learning: Learning to Achieve Goals via Recursive Classification.
Control-Aware Representations for Model-based Reinforcement Learning.
Scaling Symbolic Methods using Gradients for Neural Model Explanation.
Empirical or Invariant Risk Minimization? A Sample Complexity Perspective.
CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning.
Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability.
The geometry of integration in text classification RNNs.
On the Bottleneck of Graph Neural Networks and its Practical Implications.
On the Critical Role of Conventions in Adaptive Human-AI Collaboration.
CopulaGNN: Towards Integrating Representational and Correlational Roles of Graphs in Graph Neural Networks.
Learning perturbation sets for robust machine learning.
Primal Wasserstein Imitation Learning.
A Universal Representation Transformer Layer for Few-Shot Image Classification.
MoPro: Webly Supervised Learning with Momentum Prototypes.
Signatory: differentiable computations of the signature and logsignature transforms, on both CPU and GPU.
Diverse Video Generation using a Gaussian Process Trigger.
Graph Traversal with Tensor Functionals: A Meta-Algorithm for Scalable Learning.
Optimism in Reinforcement Learning with Generalized Linear Function Approximation.
DeBERTa: Decoding-enhanced BERT with Disentangled Attention.
Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching.
Variational Information Bottleneck for Effective Low-Resource Fine-Tuning.
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines.
Computational Separation Between Convolutional and Fully-Connected Networks.
Probabilistic Numeric Convolutional Neural Networks.
FairFil: Contrastive Neural Debiasing Method for Pretrained Text Encoders.
MixKD: Towards Efficient Distillation of Large-scale Language Models.
Teaching with Commentaries.
CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding.
Fantastic Four: Differentiable and Efficient Bounds on Singular Values of Convolution Layers.
Negative Data Augmentation.
Scalable Transfer Learning with Expert Models.
Viewmaker Networks: Learning Views for Unsupervised Representation Learning.
A Wigner-Eckart Theorem for Group Equivariant Convolution Kernels.
Learning A Minimax Optimizer: A Pilot Study.
Meta Back-Translation.
Optimal Regularization can Mitigate Double Descent.
Net-DNF: Effective Deep Modeling of Tabular Data.
MultiModalQA: complex question answering over text, tables and images.
AdaGCN: Adaboosting Graph Convolutional Networks into Deep Models.
Few-Shot Learning via Learning the Representation, Provably.
Wandering within a world: Online contextualized few-shot learning.
WrapNet: Neural Net Inference with Ultra-Low-Precision Arithmetic.
Nearest Neighbor Machine Translation.
Knowledge distillation via softmax regression representation learning.
Deep Networks and the Multiple Manifold Problem.
Empirical Analysis of Unlabeled Entity Problem in Named Entity Recognition.
Practical Massively Parallel Monte-Carlo Tree Search Applied to Molecular Design.
Neural Pruning via Growing Regularization.
Mixed-Features Vectors and Subspace Splitting.
Taking Notes on the Fly Helps Language Pre-Training.
Explainable Deep One-Class Classification.
Revisiting Dynamic Convolution via Matrix Decomposition.
RMSprop converges with proper hyper-parameter.
Gauge Equivariant Mesh CNNs: Anisotropic convolutions on geometric graphs.
MARS: Markov Molecular Sampling for Multi-objective Drug Discovery.
Quantifying Differences in Reward Functions.
Stabilized Medical Image Attacks.
Memory Optimization for Deep Networks.
Neural Topic Model via Optimal Transport.
Identifying nonlinear dynamical systems with multiple time scales and long-range dependencies.
Self-supervised Visual Reinforcement Learning with Object-centric Representations.
Retrieval-Augmented Generation for Code Summarization via Hybrid GNN.
On Self-Supervised Image Representations for GAN Evaluation.
Fidelity-based Deep Adiabatic Scheduling.
Fast Geometric Projections for Local Robustness Certification.
Meta-GMVAE: Mixture of Gaussian VAE for Unsupervised Meta-Learning.
Mind the Pad - CNNs Can Develop Blind Spots.
LambdaNetworks: Modeling long-range Interactions without Attention.
Orthogonalizing Convolutional Layers with the Cayley Transform.
A Panda? No, It's a Sloth: Slowdown Attacks on Adaptive Multi-Exit Neural Network Inference.
Neural Approximate Sufficient Statistics for Implicit Models.
Generalization bounds via distillation.
Disentangled Recurrent Wasserstein Autoencoder.
Sequential Density Ratio Estimation for Simultaneous Optimization of Speed and Accuracy.
Uncertainty Sets for Image Classifiers using Conformal Prediction.
The Intrinsic Dimension of Images and Its Impact on Learning.
Recurrent Independent Mechanisms.
A Gradient Flow Framework For Analyzing Network Pruning.
Random Feature Attention.
Practical Real Time Recurrent Learning with a Sparse Approximation.
HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark.
Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels.
Multivariate Probabilistic Time Series Forecasting via Conditioned Normalizing Flows.
Undistillable: Making A Nasty Teacher That CANNOT teach students.
Behavioral Cloning from Noisy Demonstrations.
Learning from Protein Structure with Geometric Vector Perceptrons.
DeepAveragers: Offline Reinforcement Learning By Solving Derived Non-Parametric MDPs.
Async-RED: A Provably Convergent Asynchronous Block Parallel Stochastic Method using Deep Denoising Priors.
Distributional Sliced-Wasserstein and Applications to Generative Modeling.
Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters.
Mathematical Reasoning via Self-supervised Skip-tree Training.
Generalization in data-driven models of primary visual cortex.
Implicit Convex Regularizers of CNN Architectures: Convex Optimization of Two- and Three-Layer Networks in Polynomial Time.
On Statistical Bias In Active Learning: How and When to Fix It.
Unsupervised Object Keypoint Learning using Local Spatial Predictability.
Differentially Private Learning Needs Better Features (or Much More Data).
Long-tailed Recognition by Routing Diverse Distribution-Aware Experts.
Grounded Language Learning Fast and Slow.
The Traveling Observer Model: Multi-task Learning Through Spatial Variable Embeddings.
Learning Incompressible Fluid Dynamics from Scratch - Towards Fast, Differentiable Fluid Models that Generalize.
Winning the L2RPN Challenge: Power Grid Management via Semi-Markov Afterstate Actor-Critic.
Support-set bottlenecks for video-text representation learning.
Implicit Normalizing Flows.
PlasticineLab: A Soft-Body Manipulation Benchmark with Differentiable Physics.
Influence Estimation for Generative Adversarial Networks.
Emergent Symbols through Binding in External Memory.
Correcting experience replay for multi-agent communication.
How Benign is Benign Overfitting?
Structured Prediction as Translation between Augmented Natural Languages.
On the Theory of Implicit Deep Learning: Global Convergence with Implicit Layers.
Towards Robustness Against Natural Language Word Substitutions.
Minimum Width for Universal Approximation.
Regularization Matters in Policy Optimization - An Empirical Study on Continuous Control.
Predicting Infectiousness for Proactive Contact Tracing.
Are Neural Rankers still Outperformed by Gradient Boosted Decision Trees?
Topology-Aware Segmentation Using Discrete Morse Theory.
Contrastive Divergence Learning is a Time Reversal Adversarial Game.
GAN "Steerability" without optimization.
Tent: Fully Test-Time Adaptation by Entropy Minimization.
Deep Neural Network Fingerprinting by Conferrable Adversarial Examples.
Interpreting Graph Neural Networks for NLP With Differentiable Edge Masking.
Regularized Inverse Reinforcement Learning.
Graph Convolution with Low-rank Learnable Local Filters.
Locally Free Weight Sharing for Network Width Search.
Learning Mesh-Based Simulation with Graph Networks.
Unlearnable Examples: Making Personal Data Unexploitable.
What are the Statistical Limits of Offline RL with Linear Function Approximation?
Improving Adversarial Robustness via Channel-wise Activation Suppressing.
BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration.
UPDeT: Universal Multi-agent RL via Policy Decoupling with Transformers.
A Good Image Generator Is What You Need for High-Resolution Video Synthesis.
Watch-And-Help: A Challenge for Social Perception and Human-AI Collaboration.
Data-Efficient Reinforcement Learning with Self-Predictive Representations.
Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images.
PMI-Masking: Principled masking of correlated spans.
Sharpness-aware Minimization for Efficiently Improving Generalization.
Self-Supervised Policy Adaptation during Deployment.
Large Scale Image Completion via Co-Modulated Generative Adversarial Networks.
Individually Fair Gradient Boosting.
Dataset Inference: Ownership Resolution in Machine Learning.
How Does Mixup Help With Robustness and Generalization?
Benefit of deep learning with non-convex noisy gradient descent: Provable excess risk bound and superiority to kernel methods.
Mutual Information State Intrinsic Control.
Information Laundering for Model Privacy.
Learning with Feature-Dependent Label Noise: A Progressive Approach.
DDPNOpt: Differential Dynamic Programming Neural Optimizer.
Long-tail learning via logit adjustment.
Understanding the role of importance weighting for deep learning.
Iterative Empirical Game Solving via Single Policy Best Response.
Systematic generalisation with group invariant predictions.
Autoregressive Entity Retrieval.
Deciphering and Optimizing Multi-Task Learning: a Random Matrix Approach.
Learning-based Support Estimation in Sublinear Time.
Geometry-Aware Gradient Algorithms for Neural Architecture Search.
VAEBM: A Symbiosis between Variational Autoencoders and Energy-based Models.
Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning.
Noise against noise: stochastic label noise helps combat inherent label noise.
Model-Based Visual Planning with Self-Supervised Functional Distances.
Discovering a set of policies for the worst case reward.
Expressive Power of Invariant and Equivariant Graph Neural Networks.
Learning a Latent Simplex in Input Sparsity Time.
CPT: Efficient Deep Neural Network Training via Cyclic Precision.
Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models.
Dynamic Tensor Rematerialization.
Graph-Based Continual Learning.
Sparse Quantized Spectral Clustering.
Coupled Oscillatory Recurrent Neural Network (coRNN): An accurate and (gradient) stable architecture for learning long time dependencies.
Iterated learning for emergent systematicity in VQA.
Why Are Convolutional Nets More Sample-Efficient than Fully-Connected Nets?
Gradient Projection Memory for Continual Learning.
MONGOOSE: A Learnable LSH Framework for Efficient Neural Network Training.
Improved Autoregressive Modeling with Distribution Smoothing.
Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions.
How Neural Networks Extrapolate: From Feedforward to Graph Neural Networks.
Image GANs meet Differentiable Rendering for Inverse Graphics and Interpretable 3D Neural Rendering.
Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation.
Score-Based Generative Modeling through Stochastic Differential Equations.
Self-training For Few-shot Transfer Across Extreme Task Differences.
Towards Nonlinear Disentanglement in Natural Data with Temporal Sparse Coding.
Complex Query Answering with Neural Link Predictors.
Augmenting Physical Models with Deep Networks for Complex Dynamics Forecasting.
EigenGame: PCA as a Nash Equilibrium.
Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Growing Efficient Deep Networks by Structured Continuous Sparsification.
Evolving Reinforcement Learning Algorithms.
SMiRL: Surprise Minimizing Reinforcement Learning in Unstable Environments.
On the mapping between Hopfield networks and Restricted Boltzmann Machines.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.
DiffWave: A Versatile Diffusion Model for Audio Synthesis.
Neural Synthesis of Binaural Speech From Mono Audio.
Rethinking the Role of Gradient-based Attribution Methods for Model Interpretability.
VCNet and Functional Targeted Regularization For Learning Causal Effects of Continuous Treatments.
Do 2D GANs Know 3D Shape? Unsupervised 3D Shape Reconstruction from 2D Image GANs.
Learning Invariant Representations for Reinforcement Learning without Reconstruction.
Parrot: Data-Driven Behavioral Priors for Reinforcement Learning.
Human-Level Performance in No-Press Diplomacy via Equilibrium Search.
Learning Cross-Domain Correspondence for Control with Dynamics Cycle-Consistency.
A Distributional Approach to Controlled Text Generation.
Rethinking Architecture Selection in Differentiable NAS.
Dataset Condensation with Gradient Matching.
End-to-end Adversarial Text-to-Speech.
SenSeI: Sensitive Set Invariance for Enforcing Individual Fairness.
Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity.
Geometry-aware Instance-reweighted Adversarial Training.
Federated Learning Based on Dynamic Regularization.
When Do Curricula Work?
Getting a CLUE: A Method for Explaining Uncertainty Estimates.
Rethinking Attention with Performers.
Rao-Blackwellizing the Straight-Through Gumbel-Softmax Gradient Estimator.
Global Convergence of Three-layer Neural Networks in the Mean Field Regime.
Learning Generalizable Visual Representations via Interactive Gameplay.
Randomized Automatic Differentiation.
Scalable Learning and MAP Inference for Nonsymmetric Determinantal Point Processes.
Free Lunch for Few-shot Learning: Distribution Calibration.
Optimal Rates for Averaged Stochastic Gradient Descent under Neural Tangent Kernel Regime.
Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients.
Learning to Reach Goals via Iterated Supervised Learning.
Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data.
What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study.