Keywords: Deep Learning, Natural Language Processing
Webpages:
https://blogs.technet.microsoft.com/machinelearning/2017/02/13/cloud-scale-text-classification-with-convolutional-neural-networks-on-microsoft-azure/,
https://github.com/dmlc/mxnet/tree/master/R-package,
https://github.com/Azure/Cortana-Intelligence-Gallery-Content/tree/master/Tutorials/Deep-Learning-for-Text-Classification-in-Azure
The use of deep learning for natural language processing (NLP) has attracted considerable research interest in recent years. This talk describes how deep learning techniques can be applied to NLP tasks using R. We demonstrate how the MXNet deep learning framework can be used to implement, train and deploy deep neural networks that solve text categorization and sentiment analysis problems.
We begin by briefly discussing the motivation and theory behind applying deep learning to NLP tasks. Deep learning has been highly successful in image recognition. State-of-the-art image classification systems employ convolutional neural networks (CNNs) with a large number of layers. These networks perform well because they can learn hierarchical representations of the input with increasing levels of abstraction. Neural networks have also been shown to achieve good results on NLP tasks. In particular, recurrent neural networks such as Long Short-Term Memory (LSTM) networks perform well on problems where the input is a sequence, such as speech recognition and text understanding. In this talk we explore an approach that takes inspiration from the image recognition domain and applies CNNs to NLP problems. This is achieved by encoding segments of text in an image-like matrix, where each encoded word or character is equivalent to a pixel in the image.
CNNs have achieved excellent performance for text categorization and sentiment analysis. In this talk, we demonstrate how to implement a CNN for these tasks in R. As an example, we describe in detail the code to implement the Crepe model. To train this network, each input sentence is transformed into a matrix in which each column is the one-hot encoding of one character. We describe the code needed to perform this transformation and how to specify the structure of the network and its hyperparameters using the R bindings to MXNet provided in the mxnet package. We show how we implemented a custom C++ iterator class to efficiently manage the input and output of data. This allows us to process CSV files in chunks, taking batches of raw text and transforming them into matrices in memory, whilst distributing the computation over multiple GPUs. We describe how to set up a virtual machine with GPUs on Microsoft Azure to train the network, including installation of the necessary drivers and libraries. The network is trained on the Amazon categories dataset, which consists of a training set of 2.38 million sentences, each of which maps to one of 7 categories including Books, Electronics and Home & Kitchen.
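The character-level encoding described above can be sketched in a few lines of base R. This is a minimal illustration, not the code presented in the talk: the helper name `encode_text`, the 36-character alphabet and the length of 16 are assumptions chosen to keep the example small, and the alphabet and sequence length used for the Crepe model may differ. Characters outside the alphabet, and positions beyond the end of the sentence, are left as all-zero columns.

```r
# Illustrative alphabet (an assumption, not the talk's actual alphabet).
alphabet <- strsplit("abcdefghijklmnopqrstuvwxyz0123456789", "")[[1]]

# Hypothetical helper, not part of the mxnet package: returns a
# length(alphabet) x max_len matrix with one column per character position.
encode_text <- function(text, alphabet, max_len) {
  chars <- strsplit(tolower(text), "")[[1]]
  chars <- head(chars, max_len)              # truncate sentences that are too long
  mat <- matrix(0L, nrow = length(alphabet), ncol = max_len)
  idx <- match(chars, alphabet)              # NA for characters outside the alphabet
  keep <- !is.na(idx)
  # Set exactly one entry per known character: (row = alphabet index, col = position).
  mat[cbind(idx[keep], which(keep))] <- 1L
  mat
}

m <- encode_text("Great book!", alphabet, max_len = 16)
dim(m)
```

A batch of such matrices would then be stacked into a single array before being passed to the network; how that array is laid out depends on the iterator and the network definition, which this sketch does not cover.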
The talk concludes with a demo of how a trained network can be used to classify new sentences. We demonstrate how the model can be deployed as a web service that is consumed from a simple web app: the user queries the web service with a sentence and the API returns a product category. Finally, we show how the Crepe model can be applied to the sentiment analysis task using exactly the same network structure and training methods.
Through this talk, we aim to give the audience insight into the motivation for employing CNNs to solve NLP problems. Attendees will also gain an understanding of how these networks can be implemented, efficiently trained and deployed in R.