2020/01/19 10:50

By Hang Chu (楚航) and Ruotian Luo (罗若天)

ArXiv Weekly Radiostation: 30 Selected NLP, CV, and ML Papers of the Week (Jan. 13-19)

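Synced (机器之心) partners with ArXiv Weekly Radiostation, founded by Hang Chu and Ruotian Luo, to select 10 important papers each week in each of NLP, CV, and ML. This week's picks are as follows: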

ArXiv Weekly: 10 NLP Papers You May Want to Read

Why you may want to read this: Newest paper from Gerhard Weikum (Professor of Computer Science, Max Planck Institute for Informatics, Saarland Informatics …).

Joint Reasoning for Multi-Faceted Commonsense Knowledge.

Yohan Chalier, Simon Razniewski, Gerhard Weikum

Commonsense knowledge (CSK) supports a variety of AI applications, from visual understanding to chatbots. Prior works on acquiring CSK, such as ConceptNet, have compiled statements that associate concepts, like everyday objects or activities, with properties that hold for most or some instances of the concept. Each concept is treated in isolation from other concepts, and the only quantitative measure (or ranking) of properties is a confidence score that the statement is valid. This paper aims to overcome these limitations by introducing a multi-faceted model of CSK statements and methods for joint reasoning over sets of inter-related statements. Our model captures four different dimensions of CSK statements: plausibility, typicality, remarkability and salience, with scoring and ranking along each dimension. For example, hyenas drinking water is typical but not salient, whereas hyenas eating carcasses is salient. For reasoning and ranking, we develop a method with soft constraints, to couple the inference over concepts that are related in a taxonomic hierarchy. The reasoning is cast into an integer linear program (ILP), and we leverage the theory of reduction costs of a relaxed LP to compute informative rankings. This methodology is applied to several large CSK collections. Our evaluation shows that we can consolidate these inputs into much cleaner and more expressive knowledge. Results are available at https://dice.mpi-inf.mpg.de.

Why you may want to read this: Newest paper from Minlie Huang (computer science, Tsinghua University).

A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation.

Jian Guan, Fei Huang, Zhihao Zhao, Xiaoyan Zhu, Minlie Huang

Story generation, namely generating a reasonable story from a leading context, is an important but challenging task. In spite of the success in modeling fluency and local coherence, existing neural language generation models (e.g., GPT-2) still suffer from repetition, logic conflicts, and lack of long-range coherence in generated stories. We conjecture that this is because of the difficulty of associating relevant commonsense knowledge, understanding the causal relationships, and planning entities and events with proper temporal order. In this paper, we devise a knowledge-enhanced pretraining model for commonsense story generation. We propose to utilize commonsense knowledge from external knowledge bases to generate reasonable stories. To further capture the causal and temporal dependencies between the sentences in a reasonable story, we employ multi-task learning which combines a discriminative objective to distinguish true and fake stories during fine-tuning. Automatic and manual evaluation shows that our model can generate more reasonable stories than state-of-the-art baselines, particularly in terms of logic and global coherence.

Why you may want to read this: Newest paper from Yueting Zhuang (Professor of Computer Science, Zhejiang University), Deng Cai (Professor of Computer Science, Zhejiang University).

Bi-Decoder Augmented Network for Neural Machine Translation.

Boyuan Pan, Yazheng Yang, Zhou Zhao, Yueting Zhuang, Deng Cai

Neural Machine Translation (NMT) has become a popular technology in recent years, and the encoder-decoder framework is the mainstream among all the methods. The quality of the semantic representations from encoding is crucial and can significantly affect the performance of the model. However, existing unidirectional source-to-target architectures may hardly produce a language-independent representation of the text because they rely heavily on the specific relations of the given language pairs. To alleviate this problem, in this paper, we propose a novel Bi-Decoder Augmented Network (BiDAN) for the neural machine translation task. Besides the original decoder which generates the target language sequence, we add an auxiliary decoder to generate back the source language sequence at training time. Since each decoder transforms the representations of the input text into its corresponding language, jointly training with two target ends can give the shared encoder the potential to produce a language-independent semantic space. We conduct extensive experiments on several NMT benchmark datasets and the results demonstrate the effectiveness of our proposed approach.

Why you may want to read this: Newest paper from Zhenyu Xuan (University of Texas at Dallas).

FGN: Fusion Glyph Network for Chinese Named Entity Recognition.

Zhenyu Xuan, Rui Bao, Chuyu Ma, Shengyi Jiang

Chinese NER is a challenging task. As pictographs, Chinese characters contain latent glyph information, which is often overlooked. We propose FGN, the Fusion Glyph Network for Chinese NER. This method may offer glyph information for fusion representation learning with BERT. The major innovations of FGN include: (1) a novel CNN structure called CGS-CNN, proposed to capture glyph information from both character graphs and their neighboring graphs; and (2) a method with sliding window and Slice-Attention to extract interactive information between the BERT representation and the glyph representation. Experiments are conducted on four NER datasets, showing that FGN with LSTM-CRF as tagger achieves new state-of-the-art performance for Chinese NER. Further experiments are conducted to investigate the influences of various components and settings in FGN.

Why you may want to read this: Newest paper from Nigam H. Shah (Associate Professor of Medicine, Stanford University).

Language Models Are An Effective Patient Representation Learning Technique For Electronic Health Record Data.

Ethan Steinberg, Ken Jung, Jason A. Fries, Conor K. Corbin, Stephen R. Pfohl, Nigam H. Shah

Widespread adoption of electronic health records (EHRs) has fueled development of clinical outcome models using machine learning. However, patient EHR data are complex, and how to optimally represent them is an open question. This complexity, along with often small training set sizes available to train these clinical outcome models, are two core challenges for training high quality models. In this paper, we demonstrate that learning generic representations from the data of all the patients in the EHR enables better performing prediction models for clinical outcomes, allowing for these challenges to be overcome. We adapt common representation learning techniques used in other domains and find that representations inspired by language models enable a 3.5% mean improvement in AUROC on five clinical outcomes compared to standard baselines, with the average improvement rising to 19% when only a small number of patients are available for training a prediction model for a given clinical outcome.

Why you may want to read this: Newest paper from Thomas Hain (Professor of Speech Technology, University of Sheffield).

Robust Speaker Recognition Using Speech Enhancement And Attention Model.

Yanpei Shi, Qiang Huang, Thomas Hain

In this paper, a novel architecture for speaker recognition is proposed by cascading speech enhancement and speaker processing. Its aim is to improve speaker recognition performance when speech signals are corrupted by noise. Instead of individually processing speech enhancement and speaker recognition, the two modules are integrated into one framework by a joint optimisation using deep neural networks. Furthermore, to increase robustness against noise, a multi-stage attention mechanism is employed to highlight the speaker-related features learned from context information in the time and frequency domains. To evaluate speaker identification and verification performance of the proposed approach, we test it on VoxCeleb1, one of the most widely used benchmark datasets. Moreover, the robustness of our proposed approach is also tested on VoxCeleb1 data corrupted by three types of interference (general noise, music, and babble) at different signal-to-noise ratio (SNR) levels. The obtained results show that the proposed approach using speech enhancement and multi-stage attention models outperforms two strong baselines not using them in most acoustic conditions in our experiments.

Why you may want to read this: Newest paper from Ruofei Zhang (Microsoft).

ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training.

Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou

In this paper, we present a new sequence-to-sequence pre-training model called ProphetNet, which introduces a novel self-supervised objective named future n-gram prediction and the proposed n-stream self-attention mechanism. Instead of the optimization of one-step ahead prediction in traditional sequence-to-sequence models, ProphetNet is optimized by n-step ahead prediction, which predicts the next n tokens simultaneously based on previous context tokens at each time step. The future n-gram prediction explicitly encourages the model to plan for the future tokens and prevents overfitting on strong local correlations. We pre-train ProphetNet using a base scale dataset (16GB) and a large scale dataset (160GB) respectively. Experimental results show ProphetNet achieves the best performance on both abstractive summarization and question generation tasks compared to the models using the same base scale pre-training dataset. For the large scale dataset pre-training, ProphetNet achieves new state-of-the-art results on Gigaword and comparable results on CNN/DailyMail using only about 1/5 pre-training epochs of the previous model.
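The n-step ahead objective described above can be illustrated by how training targets are laid out. The following is a minimal sketch, not the authors' implementation: the function name `future_ngram_targets` is hypothetical, and it only shows target construction, not the n-stream self-attention mechanism.

```python
from typing import List, Optional


def future_ngram_targets(tokens: List[str], n: int) -> List[List[Optional[str]]]:
    """For each position t, collect the next n tokens as prediction targets.

    Positions that run past the end of the sequence are padded with None.
    With n = 1 this reduces to ordinary next-token prediction.
    """
    targets = []
    for t in range(len(tokens)):
        window = []
        for k in range(1, n + 1):
            idx = t + k
            window.append(tokens[idx] if idx < len(tokens) else None)
        targets.append(window)
    return targets


# Illustrative usage: each context position is paired with its next 2 tokens.
sentence = ["the", "cat", "sat", "down"]
for token, target in zip(sentence, future_ngram_targets(sentence, n=2)):
    print(token, "->", target)
```

In a real model each of the n targets would contribute its own loss term; how those terms are weighted is not stated in the abstract and is left out here.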

Why you may want to read this: Newest paper from Maosong Sun (Professor of Computer Science and Technology, Tsinghua University).

Lexical Sememe Prediction using Dictionary Definitions by Capturing Local Semantic Correspondence.

Jiaju Du, Fanchao Qi, Maosong Sun, Zhiyuan Liu

Sememes, defined as the minimum semantic units of human languages in linguistics, have been proven useful in many NLP tasks. Since manual construction and update of sememe knowledge bases (KBs) are costly, the task of automatic sememe prediction has been proposed to assist sememe annotation. In this paper, we explore the approach of applying dictionary definitions to predicting sememes for unannotated words. We find that sememes of each word are usually semantically matched to different words in its dictionary definition, and we name this matching relationship local semantic correspondence. Accordingly, we propose a Sememe Correspondence Pooling (SCorP) model, which is able to capture this kind of matching to predict sememes. We evaluate our model and baseline methods on the well-known sememe KB HowNet and find that our model achieves state-of-the-art performance. Moreover, further quantitative analysis shows that our model can properly learn the local semantic correspondence between sememes and words in dictionary definitions, which explains the effectiveness of our model. The source code of this paper can be obtained from https://github.com/thunlp/scorp.

Why you may want to read this: Newest paper from Cong Yu (Senior Staff Research Scientist, Google Research NYC).

CLUENER2020: Fine-grained Name Entity Recognition for Chinese.

Liang Xu, Qianqian Dong, Cong Yu, Yin Tian, Weitang Liu, Lu Li, Xuanwei Zhang

In this paper, we introduce the NER dataset from the CLUE organization (CLUENER2020), a well-defined fine-grained dataset for named entity recognition in Chinese. CLUENER2020 contains 10 categories. Apart from common labels like person, organization and location, it contains more diverse categories. It is more challenging than other current Chinese NER datasets and could better reflect real-world applications. For comparison, we implement several state-of-the-art baselines as sequence labelling tasks and report human performance, as well as its analysis. To facilitate future work on fine-grained NER for Chinese, we release our dataset, baselines and leaderboard.

Why you may want to read this: Newest paper from Gholamreza Haffari (Associate Professor, Faculty of Information Technology, Monash University).

Learning to Multi-Task Learn for Better Neural Machine Translation.

Poorya Zaremoodi, Gholamreza Haffari

Scarcity of parallel sentence pairs is a major challenge for training high quality neural machine translation (NMT) models in bilingually low-resource scenarios, as NMT is data-hungry. Multi-task learning is an elegant approach to inject linguistic-related inductive biases into NMT, using auxiliary syntactic and semantic tasks, to improve generalisation. The challenge, however, is to devise effective training schedules, prescribing when to make use of the auxiliary tasks during the training process to fill the knowledge gaps of the main translation task, a setting referred to as biased-MTL. Current approaches for the training schedule are based on hand-engineered heuristics, whose effectiveness varies across MTL settings. We propose a novel framework for learning the training schedule, i.e., learning to multi-task learn, for the MTL setting of interest. We formulate the training schedule as a Markov decision process, which paves the way to employ policy learning methods to learn the scheduling policy. We effectively and efficiently learn the training schedule policy within the imitation learning framework using an oracle policy algorithm that dynamically sets the importance weights of auxiliary tasks based on their contributions to the generalisability of the main NMT task. Experiments on low-resource NMT settings show the resulting automatically learned training schedulers are competitive with the best heuristics, and lead to up to +1.1 BLEU score improvements.

ArXiv Weekly: 10 CV Papers You May Want to Read

Why you may want to read this: Newest paper from Alan Bovik (Cockrell Family Regents Endowed Chair Professor, The University of Texas at Austin).

180-degree Outpainting from a Single Image.

Zhenqiang Ying, Alan Bovik

Presenting context images to a viewer's peripheral vision is one of the most effective techniques to enhance immersive visual experiences. However, most images only present a narrow view, since the field-of-view (FoV) of standard cameras is small. To overcome this limitation, we propose a deep learning approach that learns to predict a 180° panoramic image from a narrow-view image. Specifically, we design a foveated framework that applies different strategies on near-periphery and mid-periphery regions. Two networks are trained separately, and then are employed jointly to sequentially perform narrow-to-90° generation and 90°-to-180° generation. The generated outputs are then fused with their aligned inputs to produce expanded equirectangular images for viewing. Our experimental results show that single-view-to-panoramic image generation using deep learning is both feasible and promising.

Why you may want to read this: Newest paper from Leonidas J. Guibas (Professor of Computer Science, Stanford University).

Learning multiview 3D point cloud registration.

Zan Gojcic, Caifa Zhou, Jan D. Wegner, Leonidas J. Guibas, Tolga Birdal

We present a novel, end-to-end learnable, multiview 3D point cloud registration algorithm. Registration of multiple scans typically follows a two-stage pipeline: the initial pairwise alignment and the globally consistent refinement. The former is often ambiguous due to the low overlap of neighboring point clouds, symmetries and repetitive scene parts. Therefore, the latter global refinement aims at establishing the cyclic consistency across multiple scans and helps in resolving the ambiguous cases. In this paper we propose, to the best of our knowledge, the first end-to-end algorithm for joint learning of both parts of this two-stage problem. Experimental evaluation on well accepted benchmark datasets shows that our approach outperforms the state-of-the-art by a significant margin, while being end-to-end trainable and computationally less costly. Moreover, we present detailed analysis and an ablation study that validate the novel components of our approach. The source code and pretrained models will be made publicly available under https://github.com/zgojcic/3D_multiview_reg.

Why you may want to read this: Newest paper from Ming-Hsuan Yang (University of California at Merced).

Visual Question Answering on 360° Images.

Shih-Han Chou, Wei-Lun Chao, Wei-Sheng Lai, Min Sun, Ming-Hsuan Yang

In this work, we introduce VQA 360, a novel task of visual question answering on 360° images. Unlike a normal field-of-view image, a 360° image captures the entire visual content around the optical center of a camera, demanding more sophisticated spatial understanding and reasoning. To address this problem, we collect the first VQA 360 dataset, containing around 17,000 real-world image-question-answer triplets for a variety of question types. We then study two different VQA models on VQA 360, including one conventional model that takes an equirectangular image (with intrinsic distortion) as input and one dedicated model that first projects a 360° image onto cubemaps and subsequently aggregates the information from multiple spatial resolutions. We demonstrate that the cubemap-based model with multi-level fusion and attention diffusion performs favorably against other variants and the equirectangular-based models. Nevertheless, the gap between the humans' and machines' performance reveals the need for more advanced VQA 360 algorithms. We, therefore, expect our dataset and studies to serve as the benchmark for future development in this challenging task. Dataset, code, and pre-trained models are available online.

Why you may want to read this: Newest paper from Ming-Hsuan Yang (University of California at Merced).

CrDoCo: Pixel-level Domain Transfer with Cross-Domain Consistency.

Yun-Chun Chen, Yen-Yu Lin, Ming-Hsuan Yang, Jia-Bin Huang

Unsupervised domain adaptation algorithms aim to transfer the knowledge learned from one domain to another (e.g., synthetic to real images). The adapted representations often do not capture pixel-level domain shifts that are crucial for dense prediction tasks (e.g., semantic segmentation). In this paper, we present a novel pixel-wise adversarial domain adaptation algorithm. By leveraging image-to-image translation methods for data augmentation, our key insight is that while the translated images between domains may differ in styles, their predictions for the task should be consistent. We exploit this property and introduce a cross-domain consistency loss that enforces our adapted model to produce consistent predictions. Through extensive experimental results, we show that our method compares favorably against the state-of-the-art on a wide variety of unsupervised domain adaptation tasks.
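The idea of a cross-domain consistency loss can be sketched as a penalty on disagreement between the predictions for an image and for its translated counterpart. The sketch below assumes a symmetric KL divergence averaged over pixels; the abstract does not specify the exact form of the loss, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) with a small epsilon to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_loss(logits_a: Sequence[Sequence[float]],
                     logits_b: Sequence[Sequence[float]]) -> float:
    """Mean symmetric KL between per-pixel predictions of two image versions.

    logits_a[i] and logits_b[i] are the class logits for the same pixel i,
    predicted from the original image and from its translated version.
    """
    total = 0.0
    for la, lb in zip(logits_a, logits_b):
        p, q = softmax(la), softmax(lb)
        total += kl_divergence(p, q) + kl_divergence(q, p)
    return total / len(logits_a)


# Two "pixels", three classes: identical predictions give zero loss.
original = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
translated = [[1.0, 1.5, -1.0], [0.1, 0.2, 3.0]]
print(consistency_loss(original, original))
print(consistency_loss(original, translated))
```

The loss is zero when the two prediction maps agree and grows as they diverge, which is the property the abstract relies on: styles may differ across domains, but task predictions should not.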

Why you may want to read this: Newest paper from Kevin Bowyer (Schubmehl-Prein Family Professor of Computer Science and Engineering, University of …).

Learning Transformation-Aware Embeddings for Image Forensics.

Aparna Bharati, Daniel Moreira, Patrick Flynn, Anderson Rocha, Kevin Bowyer, Walter Scheirer

A dramatic rise in the flow of manipulated image content on the Internet has led to an aggressive response from the media forensics research community. New efforts have incorporated increased usage of techniques from computer vision and machine learning to detect and profile the space of image manipulations. This paper addresses Image Provenance Analysis, which aims at discovering relationships among different manipulated image versions that share content. One of the main sub-problems for provenance analysis that has not yet been addressed directly is the edit ordering of images that share full content or are near-duplicates. The existing large networks that generate image descriptors for tasks such as object recognition may not encode the subtle differences between these image covariates. This paper introduces a novel deep learning-based approach to provide a plausible ordering to images that have been generated from a single image through transformations. Our approach learns transformation-aware descriptors using weak supervision via composited transformations and a rank-based quadruplet loss. To establish the efficacy of the proposed approach, comparisons with state-of-the-art handcrafted and deep learning-based descriptors, and image matching approaches are made. Further experimentation validates the proposed approach in the context of image provenance analysis.

Why you may want to read this: Newest paper from Wen Gao (Professor of Computer Science, Peking University).

Video Coding for Machines: A Paradigm of Collaborative Compression and Intelligent Analytics.

Ling-Yu Duan, Jiaying Liu, Wenhan Yang, Tiejun Huang, Wen Gao

Video coding, which aims to compress and reconstruct the whole frame, and feature compression, which only preserves and transmits the most critical information, stand at two ends of the scale. That is, one with compactness and efficiency to serve machine vision, and the other with full fidelity, bowing to human perception. The recent endeavors in imminent trends of video compression, e.g. deep learning based coding tools and end-to-end image/video coding, and MPEG-7 compact feature descriptor standards, i.e. Compact Descriptors for Visual Search and Compact Descriptors for Video Analysis, promote the sustainable and fast development in their own directions, respectively. In this paper, thanks to booming AI technology, e.g. prediction and generation models, we carry out exploration in the new area, Video Coding for Machines (VCM), arising from the emerging MPEG standardization efforts. Towards collaborative compression and intelligent analytics, VCM attempts to bridge the gap between feature coding for machine vision and video coding for human vision. Aligning with the rising Analyze then Compress instance Digital Retina, the definition, formulation, and paradigm of VCM are given first. Meanwhile, we systematically review state-of-the-art techniques in video compression and feature compression from the unique perspective of MPEG standardization, which provides the academic and industrial evidence to realize the collaborative compression of video and feature streams in a broad range of AI applications. Finally, we come up with potential VCM solutions, and the preliminary results have demonstrated the performance and efficiency gains. Future directions are discussed as well.

Why you may want to read this: Newest paper from Jia Li (Professor of Statistics, The Pennsylvania State University), Xiaogang Wang (Associate Professor of Electronic Engineering, the Chinese University of Hong Kong).

Single Image Dehazing Using Ranking Convolutional Neural Network.

Yafei Song, Jia Li, Xiaogang Wang, Xiaowu Chen

Single image dehazing, which aims to recover the clear image solely from an input hazy or foggy image, is a challenging ill-posed problem. A common key step in existing approaches is to estimate the haze density of each pixel. To this end, many approaches rely on heuristically designed haze-relevant features. Several recent works also automatically learn the features by directly exploiting Convolutional Neural Networks (CNN). However, this may be insufficient to fully capture the intrinsic attributes of hazy images. To obtain effective features for single image dehazing, this paper presents a novel Ranking Convolutional Neural Network (Ranking-CNN). In Ranking-CNN, a novel ranking layer is proposed to extend the structure of CNN so that the statistical and structural attributes of hazy images can be simultaneously captured. By training Ranking-CNN in a well-designed manner, powerful haze-relevant features can be automatically learned from massive hazy image patches. Based on these features, haze can be effectively removed by using a haze density prediction model trained through random forest regression. Experimental results show that our approach outperforms several previous dehazing approaches on synthetic and real-world benchmark images. Comprehensive analyses are also conducted to interpret the proposed Ranking-CNN from both the theoretical and experimental aspects.

Why you may want to read this: Newest paper from Philip H. S. Torr (Professor, University of Oxford).

Few-shot Action Recognition via Improved Attention with Self-supervision.

Hongguang Zhang, Li Zhang, Xiaojuan Qi, Hongdong Li, Philip H. S. Torr, Piotr Koniusz

Most existing few-shot learning methods in computer vision focus on class recognition given a few still images as the input. In contrast, this paper tackles a more challenging task of few-shot action recognition from video clips. We propose a simple framework which is both flexible and easy to implement. Our approach exploits joint spatial and temporal attention mechanisms in conjunction with self-supervised representation learning on videos. This design encourages the model to discover and encode spatial and temporal attention hotspots important during the similarity learning between dynamic video sequences for which locations of discriminative patterns vary in the spatio-temporal sense. Our method compares favorably with several state-of-the-art baselines on HMDB51, miniMIT and UCF101 datasets, demonstrating its superior performance.

Why you may want to read this: Newest paper from Philip H. S. Torr (Professor, University of Oxford).

Rethinking Class Relations: Absolute-relative Few-shot Learning.

Hongguang Zhang, Philip H. S. Torr, Hongdong Li, Songlei Jian, Piotr Koniusz

The majority of existing few-shot learning methods describe image relations with {0,1} binary labels. However, such binary relations are insufficient to teach the network complicated real-world relations, due to the lack of decision smoothness. Furthermore, current few-shot learning models capture only the similarity via relation labels, but they are not exposed to class concepts associated with objects, which is likely detrimental to the classification performance due to underutilization of the available class labels. To paraphrase, while children learn the concept of tiger from a few examples with ease, and while they learn from comparisons of tiger to other animals, they are also taught the actual concept names. Thus, we hypothesize that in fact both similarity and class concept learning must be occurring simultaneously. With these observations at hand, we study the fundamental problem of simplistic class modeling in current few-shot learning, rethink the relations between class concepts, and propose a novel absolute-relative learning paradigm to fully take advantage of label information to refine the image representations and correct the relation understanding. Our proposed absolute-relative learning paradigm improves the performance of several state-of-the-art models on publicly available datasets.

Why you may want to read this: Newest paper from Philip H.S. Torr (Professor, University of Oxford).

Unifying Training and Inference for Panoptic Segmentation.

Qizhu Li, Xiaojuan Qi, Philip H.S. Torr

We present an end-to-end network to bridge the gap between training and inference pipeline for panoptic segmentation, a task that seeks to partition an image into semantic regions for "stuff" and object instances for "things". In contrast to recent works, our network exploits a parametrised, yet lightweight panoptic segmentation submodule, powered by an end-to-end learnt dense instance affinity, to capture the probability that any pair of pixels belong to the same instance. This panoptic submodule gives rise to a novel propagation mechanism for panoptic logits and enables the network to output a coherent panoptic segmentation map for both "stuff" and "thing" classes, without any post-processing. Reaping the benefits of end-to-end training, our full system sets new records on the popular street scene dataset, Cityscapes, achieving 61.4 PQ with a ResNet-50 backbone using only the fine annotations. On the challenging COCO dataset, our ResNet-50-based network also delivers state-of-the-art accuracy of 43.4 PQ. Moreover, our network flexibly works with and without object mask cues, performing competitively under both settings, which is of interest for applications with computation budgets.

ArXiv Weekly: 10 ML Papers You May Want to Read

Why you may want to read this: Newest paper from Salvador García (Full Professor of Computer Science and Artificial Intelligence. University of Granada.), Francisco Herrera (Professor Computer Science and AI, Granada Univ.; Senior Associate Researcher in …).

Smart Data based Ensemble for Imbalanced Big Data Classification.

Diego García-Gil, Johan Holmberg, Salvador García, Ning Xiong, Francisco Herrera

Big Data scenarios pose a new challenge to traditional data mining algorithms, since they are not prepared to work with such amounts of data. Smart Data refers to data of enough quality to improve the outcome from a data mining algorithm. The inability of existing data mining algorithms to handle Big Datasets prevents the transition from Big to Smart Data. Automation in data acquisition that characterizes Big Data also brings some problems, such as differences in data size per class. This will lead classifiers to lean towards the most represented classes. This problem is known as imbalanced data distribution, where one class is underrepresented in the dataset. Ensembles of classifiers are machine learning methods that improve the performance of a single base classifier by the combination of several of them. Ensembles are not exempt from the imbalanced classification problem. To deal with this issue, ensemble methods have to be designed specifically. In this paper, a data preprocessing ensemble for imbalanced Big Data classification is presented, with focus on two-class problems. Experiments carried out on 21 Big Datasets have shown that our ensemble classifier outperforms classic machine learning models with an added data balancing method, such as Random Forests.

Why you may want to read this: Newest paper from George Em Karniadakis (The Charles Pitts Robinson and John Palmer Barstow Professor of Applied Mathematics …).

Symplectic networks: Intrinsic structure-preserving networks for identifying Hamiltonian systems.

Pengzhan Jin, Aiqing Zhu, George Em Karniadakis, Yifa Tang

This work presents a framework for constructing neural networks that preserve the symplectic structure, so-called symplectic networks (SympNets). With the symplectic networks, we show some numerical results about (i) solving the Hamiltonian systems by learning abundant data points over the phase space, and (ii) predicting the phase flows by learning a series of points depending on time. All the experiments show that the symplectic networks perform much better than fully-connected networks without any prior information, especially in the prediction task, which conventional numerical methods are unable to perform.
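What "preserving the symplectic structure" means can be checked numerically for a linear map M on phase space (q, p): M is symplectic when M^T J M = J, with J the canonical 2x2 matrix. The sketch below is not the SympNet architecture; it only contrasts one step of two classical integrators for the harmonic oscillator H = (q^2 + p^2) / 2, chosen here as an illustrative assumption.

```python
from typing import List

Matrix = List[List[float]]

# Canonical symplectic matrix for one degree of freedom, coordinates (q, p).
J: Matrix = [[0.0, 1.0], [-1.0, 0.0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a: Matrix) -> Matrix:
    return [[a[j][i] for j in range(2)] for i in range(2)]


def is_symplectic(m: Matrix, tol: float = 1e-12) -> bool:
    """Check the defining condition M^T J M = J up to a tolerance."""
    mtjm = matmul(matmul(transpose(m), J), m)
    return all(abs(mtjm[i][j] - J[i][j]) < tol
               for i in range(2) for j in range(2))


def explicit_euler_matrix(h: float) -> Matrix:
    """One explicit Euler step: q' = q + h p, p' = p - h q."""
    return [[1.0, h], [-h, 1.0]]


def symplectic_euler_matrix(h: float) -> Matrix:
    """One symplectic Euler step: p' = p - h q, then q' = q + h p'."""
    return [[1.0 - h * h, h], [-h, 1.0]]


print(is_symplectic(explicit_euler_matrix(0.1)))
print(is_symplectic(symplectic_euler_matrix(0.1)))
```

For 2x2 matrices the condition reduces to det M = 1: explicit Euler has determinant 1 + h^2 and fails it, while symplectic Euler has determinant exactly 1. A network built only from symplectic layers satisfies the condition by construction, which is the kind of prior information the abstract refers to.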

Why you may want to read this: Newest paper from Michael L. Littman (Brown University).

Lipschitz Lifelong Reinforcement Learning.

Erwan Lecarpentier, David Abel, Kavosh Asadi, Yuu Jinnai, Emmanuel Rachelson, Michael L. Littman

We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes (MDPs) and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the task space. These theoretical results lead us to a value transfer method for Lifelong RL, which we use to build a PAC-MDP algorithm with improved convergence rate. We illustrate the benefits of the method in Lifelong RL experiments.

Why you may want to read this: Newest paper from Tong Zhang (HKUST).

Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems.

Luo Luo, Haishan Ye, Tong Zhang

We consider nonconvex-concave minimax problems of the form $\min_{\bf x}\max_{\bf y} f({\bf x},{\bf y})$, where $f$ is strongly-concave in ${\bf y}$ but possibly nonconvex in ${\bf x}$. We focus on the stochastic setting, where we can only access an unbiased stochastic gradient estimate of $f$ at each iteration. This formulation includes many machine learning applications as special cases such as adversary training and certifying robustness in deep learning. We are interested in finding an ${\mathcal O}(\varepsilon)$-stationary point of the function $\Phi(\cdot)=\max_{\bf y} f(\cdot, {\bf y})$. The most popular algorithm to solve this problem is stochastic gradient descent ascent, which requires ${\mathcal O}(\kappa^3\varepsilon^{-4})$ stochastic gradient evaluations, where $\kappa$ is the condition number. In this paper, we propose a novel method called Stochastic Recursive gradiEnt Descent Ascent (SREDA), which estimates gradients more efficiently using variance reduction. This method achieves the best known stochastic gradient complexity of ${\mathcal O}(\kappa^3\varepsilon^{-3})$, and its dependency on $\varepsilon$ is optimal for this problem.
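
For readers unfamiliar with the baseline named above, here is a minimal sketch of plain stochastic gradient descent ascent on a toy problem. It is not SREDA and uses no variance reduction. The objective f(x, y) = x*y - y**2/2 (strongly concave in y, with Phi(x) = x**2/2 and stationary point x = 0), the Gaussian gradient noise, and the step sizes are all assumptions chosen for illustration.

```python
import random


def sgda(x0, y0, lr_x=0.05, lr_y=0.2, noise=0.1, steps=2000, seed=0):
    """Simultaneous stochastic gradient descent (in x) and ascent (in y)."""
    rng = random.Random(seed)
    x, y = x0, y0
    for _ in range(steps):
        # Unbiased noisy gradients of f(x, y) = x*y - y**2/2.
        gx = y + rng.gauss(0.0, noise)        # df/dx = y
        gy = (x - y) + rng.gauss(0.0, noise)  # df/dy = x - y
        # Descend in x, ascend in y, both from the same iterate.
        x, y = x - lr_x * gx, y + lr_y * gy
    return x, y


x_final, y_final = sgda(2.0, 0.0)
```

With these step sizes the iterates settle into a small noise-driven neighborhood of (0, 0); the size of that neighborhood is governed by the gradient variance, which is the quantity variance-reduced methods aim to shrink.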

Why you may want to read this: Newest paper from Lior Wolf (The School of Computer Science at Tel Aviv University).

On the Convex Behavior of Deep Neural Networks in Relation to the Layers' Width.

Etai Littwin, Lior Wolf

The Hessian of neural networks can be decomposed into a sum of two matrices: (i) the positive semidefinite generalized Gauss-Newton matrix G, and (ii) the matrix H containing negative eigenvalues. We observe that for wider networks, minimizing the loss with the gradient descent optimization maneuvers through surfaces of positive curvatures at the start and end of training, and close to zero curvatures in between. In other words, it seems that during crucial parts of the training process, the Hessian in wide networks is dominated by the component G. To explain this phenomenon, we show that when initialized using common methodologies, the gradients of over-parameterized networks are approximately orthogonal to H, such that the curvature of the loss surface is strictly positive in the direction of the gradient.
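
The decomposition mentioned above can be written out with the chain rule. The following is a sketch in notation that is assumed here rather than taken from the paper: $\theta$ denotes the parameters, $f(\theta) \in \mathbb{R}^K$ the network output, $\mathcal{L}$ a loss that is convex in $f$, and $J = \partial f / \partial \theta$ the output Jacobian.

```latex
\nabla_\theta^2 \mathcal{L}
  = \underbrace{J^\top \left( \nabla_f^2 \mathcal{L} \right) J}_{G}
  + \underbrace{\sum_{k=1}^{K} \frac{\partial \mathcal{L}}{\partial f_k}\,
      \nabla_\theta^2 f_k}_{H}
```

Since $\nabla_f^2 \mathcal{L}$ is positive semidefinite for a convex loss, so is $G$; negative curvature can only enter through $H$. For a gradient direction $g$, the curvature is $g^\top G g + g^\top H g$, so one reading of the abstract's claim is that $g^\top H g \approx 0$ in wide networks, which leaves the nonnegative term $g^\top G g$ to dominate.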

Why you may want to read this: Newest paper from Lior Wolf (The School of Computer Science at Tel Aviv University).

Unsupervised Learning of the Set of Local Maxima.

Lior Wolf, Sagie Benaim, Tomer Galanti

This paper describes a new form of unsupervised learning, whose input is a set of unlabeled points that are assumed to be local maxima of an unknown value function v in an unknown subset of the vector space. Two functions are learned: (i) a set indicator c, which is a binary classifier, and (ii) a comparator function h that, given two nearby samples, predicts which sample has the higher value of the unknown function v. Loss terms are used to ensure that all training samples x are local maxima of v according to h, and satisfy c(x)=1. Therefore, c and h provide training signals to each other: a point x' in the vicinity of x either satisfies c(x')=-1 or is deemed by h to be lower in value than x. We present an algorithm, show an example where it is more efficient to use local maxima as an indicator function than to employ conventional classification, and derive a suitable generalization bound. Our experiments show that the method is able to outperform one-class classification algorithms in the task of anomaly detection and also provides an additional signal that is extracted in a completely unsupervised way.

Why you may want to read this: Newest paper from Lior Wolf (The School of Computer Science at Tel Aviv University).

A Formal Approach to Explainability.

Lior Wolf, Tomer Galanti, Tamir Hazan

We regard explanations as a blending of the input sample and the model's output and offer a few definitions that capture various desired properties of the function that generates these explanations. We study the links between these properties and between explanation-generating functions and intermediate representations of learned models and are able to show, for example, that if the activations of a given layer are consistent with an explanation, then so are those of all subsequent layers. In addition, we study the intersection and union of explanations as a way to construct new explanations.

Why you may want to read this: Newest paper from Carsten Rother (Professor Uni Heidelberg / Germany).

Disentanglement by Nonlinear ICA with General Incompressible-flow Networks (GIN).

Peter Sorrenson, Carsten Rother, Ullrich Köthe

A central question of representation learning asks under which conditions it is possible to reconstruct the true latent variables of an arbitrarily complex generative process. Recent breakthrough work by Khemakhem et al. (2019) on nonlinear ICA has answered this question for a broad class of conditional generative processes. We extend this important result in a direction relevant for application to real-world data. First, we generalize the theory to the case of unknown intrinsic problem dimension and prove that in some special (but not very restrictive) cases, informative latent variables will be automatically separated from noise by an estimating model. Furthermore, the recovered informative latent variables will be in one-to-one correspondence with the true latent variables of the generating process, up to a trivial component-wise transformation. Second, we introduce a modification of the RealNVP invertible neural network architecture (Dinh et al. (2016)) which is particularly suitable for this type of problem: the General Incompressible-flow Network (GIN). Experiments on artificial data and EMNIST demonstrate that theoretical predictions are indeed verified in practice. In particular, we provide a detailed set of exactly 22 informative latent variables extracted from EMNIST.

Why you may want to read this: Newest paper from Stephen Roberts (Professor of Engineering Science (Machine Learning, Information Engineering), University …).

HumBug Zooniverse: a crowd-sourced acoustic mosquito dataset.

Ivan Kiskin, Adam D. Cobb, Lawrence Wang, Stephen Roberts

Mosquitoes are the only known vector of malaria, which leads to hundreds of thousands of deaths each year. Understanding the number and location of potential mosquito vectors is of paramount importance to aid the reduction of malaria transmission cases. In recent years, deep learning has become widely used for bioacoustic classification tasks. In order to enable further research applications in this field, we release a new dataset of mosquito audio recordings. With over a thousand contributors, we obtained 195,434 labels of two-second duration, of which approximately 10 percent signify mosquito events. We present an example use of the dataset, in which we train a convolutional neural network on log-Mel features, showcasing the information content of the labels. We hope this will become a vital resource for those researching all aspects of malaria, and add to the existing audio datasets for bioacoustic detection and signal processing.

Why you may want to read this: Newest paper from Jeppe Johan Waarkjær Olsen.

Autoencoding undirected molecular graphs with neural networks.

Jeppe Johan Waarkjær Olsen, Peter Ebert Christensen, Martin Hangaard Hansen, Alexander Rosenberg Johansen

We propose a machine learning model, inspired by language modeling from natural language processing, which can automatically correct molecules in discrete representations using a structure rule learned from a collection of undirected molecular graphs. Using discrete representations of molecules allows cheap, fast, and coarse-grained insights. We introduce an adaptation of a modern neural network architecture, the Transformer, which can learn relationships between atoms and bonds. The algorithm thereby solves the unsupervised task of recovering partially observed molecules represented as undirected graphs. This is, to our knowledge, the first work that can automatically learn any discrete molecular structure rule with input exclusively consisting of a training set of molecules. In this work the neural network successfully approximates the octet rule and relations in hypervalent molecules and ions when trained on the ZINC and QM9 datasets. These results provide encouraging evidence that neural networks can learn advanced molecular structure rules and dataset-specific properties, as the Transformer surpasses a strong octet-rule baseline.

You are welcome to subscribe to the daily edition of the paper radio station: http://www.buzzsprout.com/632479.