Unsupervised Abstractive Summaries of Controllable Length

Automatic Summarization

Automatic summarization is concerned with compressing one or more documents into a concise and fluent summary while preserving the most important information. There are many approaches to the automatic generation of summaries. One of the most important distinctions to be made is whether the summary generation system is developed with supervision, that is, using examples of documents paired with their corresponding summaries, or without supervision, that is, without such examples. Although supervised approaches have recently made great progress (Lewis, Liu, Goyal, Ghazvininejad, Mohamed, Levy, Stoyanov & Zettlemoyer, 2019; Liu & Lapata, 2019; Nallapati, Zhou, Santos, Gulcehre & Xiang, 2016; Rush, Chopra & Weston, 2015; See, Liu & Manning, 2016; Zhang, Zhao, Saleh & Liu, 2019), they are not without shortcomings when it comes to real-world applications. One of their main drawbacks is that currently available datasets are rarely an optimal fit for specific application domains. Another limitation is the lack of datasets in languages other than English. Collecting and annotating large amounts of aligned data requires great effort; the ability to train summarization models in an unsupervised fashion is therefore attractive because it eliminates the need to provide reference summaries. For these reasons, this work focuses on the unsupervised training of automatic summarization systems.

Summarization algorithms can be split into two main categories: extractive and abstractive. Extractive algorithms (Erkan & Radev, 2004; Zheng & Lapata, 2019) compose summaries by concatenating relevant portions of the input, while abstractive algorithms (Liu & Lapata, 2019; Rush et al., 2015) generate new text that may use terms not present in the input (Das & Martins, 2007; Nenkova & McKeown, 2011). It has been observed that human-written summaries tend to be abstractive (Kryściński, Keskar, McCann, Xiong & Socher, 2019).
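
To make the distinction concrete, the toy sketch below (in Python, with the sentence-scoring function left entirely abstract) shows the sentence-selection step that underlies most extractive systems; an abstractive system, by contrast, must generate new text token by token and may use words that never appear in the input.

```python
def extractive_summary(sentences, scores, k=2):
    """Toy extractive baseline: keep the k highest-scoring sentences,
    concatenated in their original document order. Real extractive
    systems differ mainly in how the scores are computed, e.g. graph
    centrality in Erkan & Radev (2004)."""
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))

# An abstractive summarizer would instead generate a new sentence,
# which is why it requires a text generation model rather than a
# simple selection rule.
```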

Artificial Neural Networks

Inspired by the mammalian brain, Artificial Neural Networks (ANN) have proven to be powerful tools for a wide range of Artificial Intelligence (AI) tasks in recent years. This section offers an introduction to the most common types of ANNs currently used in the field of Natural Language Processing (NLP).

Artificial Neural Networks are a vast and ever-evolving subject, and for this reason an in-depth survey is beyond the scope of this work. Readers interested in the matter are invited to consult one of the many resources on the subject, such as the excellent Deep Learning textbook (Goodfellow, Bengio & Courville, 2016).

Recurrent Neural Networks

Recurrent Neural Networks (RNN) are a class of Artificial Neural Networks whose hidden units use recurrent connections to persist information as they go through a sequence of states (Elman, 1990). Persistence of information through time allows RNNs to model temporal dynamic behaviors.
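A minimal NumPy sketch of this recurrent update follows; the weight matrices W_xh and W_hh and the bias b_h are hypothetical parameters introduced purely for illustration. It shows how the hidden state carries information from one time step to the next.

```python
import numpy as np

def elman_rnn(inputs, W_xh, W_hh, b_h):
    """Run a simple Elman-style RNN over a sequence of input vectors.

    At each time step the hidden state is updated from the current
    input and the previous hidden state, which is how the network
    persists information through time."""
    h = np.zeros(W_hh.shape[0])        # initial hidden state
    hidden_states = []
    for x_t in inputs:                 # one step per element of the sequence
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hidden_states.append(h)
    return hidden_states
```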

One of the most widely used RNN architectures is the Long Short-Term Memory (LSTM) (Hochreiter & Schmidhuber, 1997). The key concept behind LSTMs is the memory cell. A memory cell is a recurrent unit that persists information in a cell state vector and computes a hidden state vector at each time step. Memory cells also use gates to filter their informational content. LSTMs typically use three types of gates: the forget gate, the input gate and the output gate.
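
For illustration only, the following NumPy sketch spells out one LSTM time step; the dictionary-based parameter layout is a convention of this sketch, not of the cited work.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U and b are dictionaries holding the
    parameters of the forget (f), input (i), output (o) and
    candidate (g) transformations."""
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget gate
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input gate
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # output gate
    g = np.tanh(W["g"] @ x_t + U["g"] @ h_prev + b["g"])   # candidate cell content
    c = f * c_prev + i * g                                 # new cell state
    h = o * np.tanh(c)                                     # new hidden state
    return h, c
```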

Autoencoders

Modern autoencoder neural network architectures were introduced by Bengio, Lamblin, Popovici & Larochelle (2007) and Ranzato, Poultney, Chopra & Cun (2007) as a means of learning “good” representations for initializing deep architectures. A good representation can be defined as a representation that is potentially useful for addressing tasks of interest, that is, one that helps a system reach higher performance on a task more quickly than it would have without first learning that representation (Vincent, Larochelle, Lajoie, Bengio & Manzagol, 2010). A good representation is typically expected to retain a significant amount of information about the input.
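
The NumPy sketch below (with hypothetical encoder and decoder parameters) shows the basic structure: an encoder maps the input to a code, a decoder reconstructs the input from that code, and training minimizes the reconstruction error so that the code retains information about the input.

```python
import numpy as np

def encode(x, W_enc, b_enc):
    """Map the input to a lower-dimensional code (the representation)."""
    return np.tanh(W_enc @ x + b_enc)

def decode(code, W_dec, b_dec):
    """Attempt to reconstruct the original input from the code."""
    return W_dec @ code + b_dec

def reconstruction_loss(x, W_enc, b_enc, W_dec, b_dec):
    """Squared error between the input and its reconstruction;
    training minimizes this quantity, which forces the code to
    preserve information about x."""
    x_hat = decode(encode(x, W_enc, b_enc), W_dec, b_dec)
    return np.mean((x - x_hat) ** 2)
```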

Attention Mechanisms

One of the main limitations of basic RNN-based Encoder-Decoders is that the encoder compresses all the relevant information of an input sequence into a fixed-dimensional vector representation. This architectural bottleneck was found to hurt performance, especially as input sequences grow longer than the sequences observed during training (Cho, van Merriënboer, Bahdanau & Bengio, 2014b).

Attention mechanisms (Bahdanau et al., 2014) propose to overcome this limitation by encoding an input sequence into a sequence of vectors and by adaptively choosing a subset of these vectors during decoding in order to better model the task at hand. In other words, attention allows a model to focus on the relevant parts of an input sequence depending on the current output needs.
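A minimal NumPy sketch of additive attention in the spirit of Bahdanau et al. (2014) follows; the projection matrices W_enc and W_dec and the vector v stand for learned parameters and are simply assumed here.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def additive_attention(encoder_states, decoder_state, W_enc, W_dec, v):
    """Additive attention sketch.

    encoder_states: (T, d_enc) matrix, one vector per input position.
    decoder_state:  (d_dec,) current decoder hidden state.
    Returns the context vector, a weighted sum of the encoder states."""
    # One alignment score per input position.
    scores = np.array([
        v @ np.tanh(W_enc @ h_j + W_dec @ decoder_state)
        for h_j in encoder_states
    ])
    weights = softmax(scores)            # how much to attend to each position
    context = weights @ encoder_states   # weighted sum of encoder vectors
    return context, weights
```

The context vector is recomputed at every decoding step, which is what lets the decoder focus on different parts of the input as the output is generated.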

Previous work

Supervised Abstractive Summarization

Supervised approaches have recently made great progress in the field of abstractive single-document summarization. Rush et al. (2015) first introduced the use of RNN-based Encoder-Decoders and attention mechanisms for the task of sentence compression. Nallapati et al. (2016) soon followed and introduced the CNN/DailyMail corpus, the first large-scale dataset for the training of abstractive summarizers. Then, See et al. (2016) proposed two important improvements to the basic architecture: a pointer-generator mechanism to help summarizers handle out-of-vocabulary tokens, and a coverage mechanism to reduce the amount of unnecessary repetition that plagued the outputs generated by this first wave of deep neural summarizers. Recent proposals have switched to the more modern Transformer-based Encoder-Decoder architecture and explored various forms of pretrained representations (Lewis et al., 2019; Liu & Lapata, 2019; Zhang et al., 2019).

Bert for Summarization, or BertSum (Liu & Lapata, 2019), is a document-level encoder specifically designed to provide deep bidirectional representations for the task of single-document summarization (both extractive and abstractive). One key difference between Bert and BertSum lies in the preprocessing of the input data.

In order to leverage the pretrained Bert model for summarization purposes, BertSum's preprocessing procedure treats an input document as a sequence of n sentences. Bert's special tokens [CLS] and [SEP] are inserted before and after each sentence of the sequence, respectively. Segment embeddings are assigned in alternation depending on whether sent_i is odd or even; for example, document D = [sent_1, sent_2, sent_3, sent_4, sent_5] would be assigned the segment embeddings [e_A, e_B, e_A, e_B, e_A]. Finally, BertSum uses Transformer position encodings that can be fine-tuned to support sequences longer than 512 tokens.
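
The sketch below illustrates this input layout in plain Python; it follows the description above, but the function itself is illustrative rather than taken from the BertSum implementation, and the segment labels "e_A"/"e_B" are placeholders for the actual embedding vectors.

```python
def bertsum_style_preprocess(sentences):
    """Wrap each sentence in [CLS] ... [SEP] and assign alternating
    segment labels, mirroring the BertSum-style layout described above."""
    tokens, segments = [], []
    for i, sentence in enumerate(sentences):
        sentence_tokens = ["[CLS]"] + sentence.split() + ["[SEP]"]
        segment = "e_A" if i % 2 == 0 else "e_B"
        tokens.extend(sentence_tokens)
        segments.extend([segment] * len(sentence_tokens))
    return tokens, segments

# A five-sentence document thus receives the alternating sentence-level
# segment pattern [e_A, e_B, e_A, e_B, e_A].
```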

Unsupervised Abstractive Summarization

Research in unsupervised single-document summarization has historically focused on extractive approaches (Erkan & Radev, 2004; Hirao, Yoshida, Nishino, Yasuda & Nagata, 2013; Marcu, 1997; Mihalcea, 2004; Parveen, Ramsl & Strube, 2015; Yin & Pei, 2015). Recent work in this area includes (Fevry & Phang, 2018), in which a denoising autoencoder (Vincent, Larochelle, Bengio & Manzagol, 2008) is trained to remove and reorder words from an input sequence, and (Zheng & Lapata, 2019), which uses Bert (Devlin et al., 2018) to capture sentential meaning and compute sentence similarity for the purpose of sentence selection.
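
As a rough illustration of the denoising objective used in that line of work, the sketch below corrupts a sentence by inserting filler tokens and shuffling the result; a model trained to recover the clean sentence must then learn to drop and reorder words. The exact noising procedure in Fevry & Phang (2018) differs in its details, and the filler_tokens source and extra_ratio parameter are assumptions of this sketch.

```python
import random

def add_noise(sentence_tokens, filler_tokens, extra_ratio=0.4, seed=None):
    """Corrupt a tokenized sentence for denoising-autoencoder training:
    extra tokens drawn from filler_tokens are inserted, then the whole
    sequence is shuffled. The training target is the original clean,
    ordered sentence."""
    rng = random.Random(seed)
    n_extra = int(len(sentence_tokens) * extra_ratio)
    noisy = list(sentence_tokens) + rng.choices(filler_tokens, k=n_extra)
    rng.shuffle(noisy)
    return noisy
```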

Table of Contents

INTRODUCTION
CHAPTER 1 BACKGROUND
1.1 Automatic Summarization
1.2 Artificial Neural Networks
1.2.1 Feed-Forward Neural Networks
1.2.2 Recurrent Neural Networks
1.2.3 Autoencoders
1.2.4 Encoder-Decoders
1.2.5 Attention Mechanisms
1.2.6 Transformer
1.2.7 Encoder-Decoder Output Control
1.2.8 Artificial Neural Networks and Natural Languages
1.3 Previous work
1.3.1 Supervised Abstractive Summarization
1.3.2 Unsupervised Abstractive Summarization
1.3.2.1 MeanSum: unsupervised multi-document abstractive summarization
1.4 Summary
CHAPTER 2 METHODOLOGY
2.1 Datasets
2.2 Preprocessing
2.2.1 RNN-based experiments
2.2.2 Transformer-based experiments
2.3 Evaluation
2.3.1 ROUGE: Recall-Oriented Understudy for Gisting Evaluation
2.3.1.1 ROUGE-N: n-gram cooccurrence statistics
2.3.1.2 ROUGE-L: Longest Common Subsequence
2.3.1.3 Limitations
2.3.2 Output length control evaluation
2.4 Summary
CHAPTER 3 UNSUPERVISED ABSTRACTIVE SUMMARIZATION BASED ON RNN
3.1 Proposed Model
3.2 Experiments
3.3 Results
3.4 Discussion
3.5 Future work
3.6 Summary
CHAPTER 4 UNSUPERVISED ABSTRACTIVE SUMMARIZATION BASED ON TRANSFORMER
4.1 Proposed Model
4.2 Experiments
4.2.1 Unsupervised Experiments’ Baseline Models
4.3 Results and Discussion
4.4 Future work
4.5 Summary
CONCLUSION 
