So let’s write about NLP!

After some experimentation with ULMFiT during the fast.ai course on sentiment analysis, I wanted to explore the new language model released by OpenAI in February.

I used the language model (LM) published by OpenAI (GPT-2 117M), which is a small version of their cutting-edge LM released in February 2019. This small version is already quite fun to work with, and I was entertained by the idea of using it in this blog post.

So what’s GPT-2?

GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset of 8 million web pages. The dataset was built to emphasize diversity of content by scraping the Internet; to preserve document quality, OpenAI used only pages that had been curated/filtered by humans, specifically outbound links from Reddit that received at least 3 karma. This can be thought of as a heuristic indicator for whether other users found the link interesting (whether educational or funny), leading to higher data quality than similar datasets such as CommonCrawl. GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text. The diversity of the dataset causes this simple goal to contain naturally occurring demonstrations of many tasks across diverse domains.
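Put more formally, that objective is just the usual autoregressive factorization of a sequence’s probability (the textbook formulation, written here in LaTeX, not something spelled out in OpenAI’s description):

p(w_1, \dots, w_n) = \prod_{i=1}^{n} p(w_i \mid w_1, \dots, w_{i-1})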

More info in OpenAI’s announcement post.

Fine-tuning the model

I used a technique called transfer learning to train the model on 360 Medium articles about Data Science and Artificial Intelligence. The dataset is available on Kaggle.
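For anyone who wants to reproduce something similar, here is a minimal sketch of the fine-tuning step using the gpt-2-simple Python package. This is just one possible setup, not necessarily the exact tooling used here, and articles.txt is a placeholder for a plain-text dump of the 360 articles.

import gpt_2_simple as gpt2

# Fetch the small 117M GPT-2 checkpoint released by OpenAI
gpt2.download_gpt2(model_name="117M")

# articles.txt is a placeholder for the concatenated Medium articles
sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="articles.txt",
              model_name="117M",
              steps=4000,        # placeholder; the training log note below shows a counter of 4199
              print_every=100,
              sample_every=500,
              save_every=1000)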

The goal was to help the model answer some questions about AI by giving it some literature on the subject! A bit like what I did when I asked myself that question a few years ago.

So basically, there are two types of output. One is conditional: you give the model an input and it completes it by predicting the following words. The other is unconditional: the model just generates text without any input. In this case I had specific topics I wanted the model to write about, so I used the conditional output.
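With a gpt-2-simple style setup like the sketch above (again, one possible setup rather than the exact one used here), the two modes differ only by whether you pass a prefix:

import gpt_2_simple as gpt2

# Reload the fine-tuned checkpoint saved by the finetune step
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess)

# Unconditional output: the model generates text with no input at all
gpt2.generate(sess, length=300)

# Conditional output: the model completes a given prompt
gpt2.generate(sess, prefix="Let me tell you a story about AI", length=300)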

I trained the model for 10 hours and got an average loss of 0.09 at the end.

Note: [4199 : 18881.07] loss=0.06 avg=0.09

Samples

Let’s look at some of the outputs to see if they make a bit of sense!

Model prompt: Let me tell you a story about AI


and how would you say it? Don’t worry, I’ve got you covered. The science of AI is still very young and relatively new. The main reason is that neural networks are not hardware friendly. You need a powerful graphics processor and a decent RAM card to feed it. Even then, you might be limited by the computer’s graphic cards. The memory card isn’t big for neural networks and most applications. Also, we do not have any hardware right now. The Intel Xeon processor E5–1620 v4 (8 GB L3 cache) is the single most powerful processor in the world. This processor supports 2.4 GB of RAM. Unfortunately, we had to bring it from Slovenia in order to make this recipe. If you don’t have the cash to do IT work, but can afford the computer, then this might be your best bet. There are two reasons why I say this recipe is known as “cheap hardware hardware hardware hardware hardware.” The first reason is that neural networks are very hard to tune and would usually fail when this happened. The second reason is that there are a huge number of GPUs on this planet with very little on earth that isn’t tuned to look the part. If we could just import every single pixel of each global illumination into a neural network,we’d be able to create something incredibly impressive like this: Unfortunately, we can’t do this — the neural network tends to fail often. So we’re stuck with trying.

Model prompt: Let me tell you a story about AI


and how it’s all going to improve you. First, it all came down to one easy answer: don’t worry about it. I’ll just tell you what to expect. There are dozens of easy answers to “Why should I care about AI?” that I’ve seen over the past 15 years of research and development. Most obviously, deep learning is a technology that can learn. Most significantly, convolutional neural networks are an industry-leading technology. However, there are challenges. Most obviously, the use of arrays. As an AI, do not open sourced their neural networks unless specifically noted. However, a lot of efforts are being made to bring computational arrays to market. Most notably, the Azure CTO are shipping a staggering amount of software code in bundled updates that are available for both the CLI and the IoT environment. So, while these updates are generally welcomed, most people who are using AI should expect them to be running in a black box.


Let’s consider how AI will affect us in the future. Will we have more money, will we take drugs, will we have better relationships, will we have larger families? Will we experience less pain, because we become aware of our limitations more effectively? Many people fear that AI will overload the brain with irrelevant thoughts, reducing activities like planning, learning, and acting. This is the very harmful effect of hyper-automation — which is to inhibit the activities that enhance the benefits of high-level engagement and exploration. Impulsive purchases like chess players and car buyers are being trained to acquire more and more of our cognitive potential. Autonomous vehicles are already helping to create quite the social order in the driving world: they can drive like we do, take control of our car like we do, and so forth. But the mechanism behind our cognition is a question that is being driven less and explored, at least in the US. Since the days of the great researchers of machine learning and neural networks, the mechanism by which our cognition functions is unknown. It’s still not understood how effectively these networks can deceive us, or what we’re really concerned with.” We’re only scratching the surface here. It’s even at the very least possible that the altered us might help push advances in artificial cognition, though our mechanisms of communication are not fully understood (this is the main question on which the major advance in Deep RL is based). Another question that is likely to be of major interest in the near future is: where do we draw the line? We’re not entirely sure where the line draws it, but it’s clear that our reasoning regarding how we learn and how we can learn from experiences is fundamentally based on almost nothing. Many researchers have pointed out that the very first steps in our learning process are looking at the environment around us, which is primarily a representation of what we are looking at. Obviously, since we don’t know what the environment is right now, we attempt to learn it from the viewpoint of X and above.

Results

Some of these outputs were quite funny, but it’s not exactly ready to write a blog post for me ;) I didn’t push it too far because it was just a test to practice transfer learning with this LM, but I will definitely try more useful things with it soon.