Introduction to Artificial Intelligence

Author

Andreas Handel

Modified

2024-03-20

Overview

In this unit, we will very briefly talk about Artificial Intelligence from a general perspective.

Learning Objectives

  • Know what Deep Learning and Artificial Intelligence are.
  • Know how one can (maybe) use R for such tasks.

Introduction

Artificial intelligence (AI) is everywhere these days. AI approaches have led to some impressive recent technological advances, such as self-driving cars, very good language translation, biometrics and image recognition, and computers that beat humans at games such as Chess, Go, and Poker. Most recently, we have seen large language model (LLM) AI tools (which we’ll discuss separately) that can be very useful for data analysis and coding tasks.

I have no doubt that progress in AI will continue to be rapid and lead to many important advances. As with everything “new and shiny,” there also comes a good bit of hype. The goal of this short unit is to give you a very brief introduction to what you should think of when you hear AI, how AI is related to data analysis/machine learning, and how you could – if you wanted – do AI yourself (using R).

Neural networks (NN)

Currently, the main workhorse of artificial intelligence (AI) is a type of model/algorithm called a neural network (NN), or neural net for short.

These are specific kinds of models that originated as an attempt to capture the functioning of biological neurons, and, by extension, brains. While neural nets have been around for a while and have had some successes (e.g. in digit recognition), they have really taken off in recent years with the advent of more data, faster computers, and better algorithms.

Without going into details, you can think of how neural nets work in a few ways. One is that they are collections of individual, in silico, neurons, which are combined into layers. On one end, the input is fed into the model; at the other end, the output is produced. See e.g. the Wikipedia page on artificial neural networks for a schematic drawing. The inputs are your predictor variables, e.g. lots of characteristics measured for a patient, data from an -omics array, pixels of an image, words of some text, sounds of an audio file, etc. The output is whatever label you want to predict, e.g. for some images it could be the 4 categories cat/dog/neither/both. You feed the data to the model and train it. The training of a NN is conceptually similar to training other machine learning (ML) algorithms. Each neuron has parameters associated with it, and as you fit the model, these parameters are tweaked to optimize performance. The bigger your neural network (more neurons and more layers), the more flexible the model. This means it is potentially more powerful, but also more data-hungry.
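To make this a bit more concrete, here is a minimal sketch in R using the nnet package, which fits a neural net with a single hidden layer. The iris data and all settings are just illustrative choices, not recommendations.

```r
# Minimal sketch: fit a small single-hidden-layer neural net in R
# with the nnet package. Data and settings are illustrative only.
library(nnet)

set.seed(123) # fitting starts from random weights, so set a seed

# Inputs: the 4 flower measurements. Output: the species label.
fit <- nnet(Species ~ ., data = iris, size = 5, maxit = 200, trace = FALSE)

# Each connection between neurons has a weight (parameter) that
# gets tweaked during fitting; more neurons means more weights.
length(fit$wts)

# Predicted labels for the training data
head(predict(fit, iris, type = "class"))
```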

While the analogy to a biological brain is apt, this comparison does sometimes make NN sound more complicated and fancy than they are. To demystify NN somewhat, you can switch out the comparison to a biological brain, and instead think of them as a coupled set of logistic regressions. Each neuron is more or less described by some type of logistic regression function. It gets some input from multiple variables (either the original input, or a previous layer of neurons), and then based on that input and its parameter setting returns either 0/No or Yes/1. The output is then fed to the next layer of networks. The NN is thus a combination of individual logistic regression models, coupled together in some way.
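As a small illustration of that idea (with made-up numbers), here is a single in silico neuron written out in R; it is essentially the logistic regression equation applied to its inputs.

```r
# A single "neuron": weighted inputs pushed through a logistic
# (sigmoid) function -- essentially a logistic regression.
sigmoid <- function(z) 1 / (1 + exp(-z))

neuron <- function(x, w, b) {
  sigmoid(sum(w * x) + b) # returns a value between 0 and 1
}

x <- c(0.5, -1.2, 3.0) # inputs (predictors, or outputs of a previous layer)
w <- c(0.8, 0.1, -0.4) # weights (the neuron's parameters)
b <- 0.2               # intercept/bias term

neuron(x, w, b) # this output would be fed to the next layer of neurons
```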

This idea of connecting individual models to make a bigger, better model should be familiar by now. You saw it when we discussed making models like random forests that combine individual trees, and it was also mentioned as a general idea of ensemble models. You can think of NN as an ensemble of simpler, logistic-type models.

Of course, neural networks are complicated. At this point, there are many different versions of NN in use, and understanding them in detail is time-consuming. However, as a general user, you don’t need to understand all the details. Instead, as with other complex ML models, as long as you know how to use them and evaluate what they return, it is ok to not fully understand their inner workings. Properly training/fitting NN can be tricky, because they generally have many different parameters that need tuning to achieve good performance. Thus, a lot of development has gone into methods that allow for efficient tuning/fitting. Fortunately for us as users, these methods have become fairly good, such that it is possible to use NN algorithms built by others and generally trust that they work well – similar to us using other models (GLM, trees, SVM, etc.) – without having to worry about all the details.
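As a hedged sketch of what such tuning can look like in practice, here is one way to tune a small neural net with the caret package; the tuning grid values below are arbitrary examples.

```r
# Sketch: tune the number of hidden neurons (size) and the
# regularization penalty (decay) of an nnet model via caret,
# using 5-fold cross-validation. Grid values are arbitrary.
library(caret)

set.seed(123)
tuned <- train(
  Species ~ ., data = iris,
  method = "nnet",
  tuneGrid = expand.grid(size = c(2, 5, 10), decay = c(0, 0.1)),
  trControl = trainControl(method = "cv", number = 5),
  trace = FALSE # passed on to nnet to suppress fitting output
)

tuned$bestTune # parameter combination with the best CV performance
```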

If you want to learn more about neural nets, the Wikipedia entry is a good place to start. Also check the AI Resources page.

Deep Learning (DL)

You might also hear the term Deep Learning (DL) used in the context of AI. Some folks distinguish DL and AI, considering the former a subset of the latter. DL generally refers to using a specific class of neural networks, namely those that have multiple layers of (in silico) neurons (it has nothing to do with deep as in especially insightful). These days, DL is sometimes used a bit more loosely and can refer to any NN-based complex algorithm (or sometimes even a non-NN complex model) applied to a problem. However, most often, if someone says/writes that they use deep learning to address some problem, they mean using a type of neural net to fit data.
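To show what “multiple layers” looks like in code, here is a minimal sketch using the keras R package (discussed more below); the layer sizes and the 20-variable input are arbitrary examples.

```r
# Sketch: a "deep" network is just several layers of neurons
# stacked between input and output. Sizes are arbitrary examples.
library(keras)

model <- keras_model_sequential() %>%
  layer_dense(units = 32, activation = "relu", input_shape = c(20)) %>% # hidden layer 1
  layer_dense(units = 16, activation = "relu") %>%                      # hidden layer 2
  layer_dense(units = 8, activation = "relu") %>%                       # hidden layer 3
  layer_dense(units = 1, activation = "sigmoid")                        # output: a 0/1 label

summary(model) # lists the layers and the number of parameters in each
```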

Artificial Intelligence (AI)

Artificial Intelligence (AI) is definitely a trendy topic these days. It is widely used, and also widely mis-used, in many contexts, especially outside academia. Roughly speaking, AI is the use of complex models, usually NN, to solve difficult problems. If one wanted to differentiate DL and AI, one might say that AI is the use of DL approaches applied to “complex” problems. However, it seems that these days DL and AI are terms that are used largely interchangeably. Overall, if you hear AI or DL, you can think of it roughly as fitting a neural net model to data.

If you want to learn a bit more on the distinction between DL and AI (and just in general), the Wikipedia entries on AI and DL are good starting points.

As with all “new and shiny” things, there seems to be a current trend to use DL/AI approaches even when they are not needed. As an example, DeVries et al. 2018 used deep learning to analyze their data; this follow-up correspondence article by Mignan and Broccardo 2019 showed that a single logistic regression produced more or less the same results. The reply by one of the original authors, Meade 2019, is also worth reading.

Overall, DL and AI are certainly very promising approaches and will undoubtedly lead to significant improvements in our ability to harness data. As with most technologies going through a “bubble,” there is currently work that is both substantial and important, and work that is fluffy and full of hype.

DL and AI in R

This section is about using R to train/fit NN models, not how to use LLM AI tools like ChatGPT with R. That is covered in a separate unit.

While R is a great tool for data analysis, it’s currently probably not the best choice for doing AI/fitting NN models. If you really want to go deep into AI work, using something like Python, Julia, or a more specialized programming language is likely better.

That said, there are ways to use R to fit NN models. The best and easiest is likely to use R packages that let you interact with powerful tools such as TensorFlow and Keras from within R.
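Roughly, the setup looks like this (a sketch; see the TensorFlow for R website mentioned below for current, detailed instructions):

```r
# One-time setup sketch: install the keras R package, then let it
# install the underlying Python-based Keras/TensorFlow libraries.
install.packages("keras")
library(keras)
install_keras() # downloads and installs TensorFlow behind the scenes
```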

The RStudio TensorFlow website has a lot of good information and documentation on how to use Keras through R to do DL/AI. Starting there with the Tutorials section is probably the best way to get going. After that, you can branch out to some of the other resources. While I’m sure a DL/AI expert uses more than just Keras/TensorFlow as tools, you can get very far with those. And for playing around with DL/AI, Keras through R is a great place to start. See the exercise for suggested starting points.
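To give a flavor of the workflow, here is a hedged sketch of the usual steps after defining a model (like the one in the earlier snippet): compile it, then fit it to data. The objects x_train and y_train are placeholders for your own predictor matrix and labels.

```r
# Sketch of the typical keras workflow; x_train/y_train are
# placeholders for your own predictor matrix and 0/1 labels.
library(keras)

model %>% compile(
  loss = "binary_crossentropy", # suitable for a 0/1 outcome
  optimizer = "adam",
  metrics = "accuracy"
)

history <- model %>% fit(
  x_train, y_train,
  epochs = 20, batch_size = 32,
  validation_split = 0.2 # hold out 20% of the data to monitor overfitting
)

plot(history) # training/validation loss and accuracy by epoch
```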

Since DL/AI usually involves fitting large amounts of data to complex models, time constraints are often crucial, and, at times, powerful computers are needed. An important development is the use of GPU (graphics processing unit) computing. While modern computers usually have more than one CPU core, they are still generally limited to a small number; even a very powerful single desktop will generally have fewer than 100 CPU cores. In contrast, modern graphics cards often have >1000 GPU cores that can all be used in parallel to perform model fitting. Products such as TensorFlow allow one to use (mainly NVIDIA) GPUs to fit complex models to a lot of data in an often reasonable amount of time, without requiring a supercomputer cluster. Unfortunately, last time I tried, R still didn’t have great GPU support. I have not recently tried to use Keras with GPUs through R.
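If you do want to experiment, one quick way to check from R whether TensorFlow can see a GPU is via the tensorflow package (a sketch; it assumes a working TensorFlow installation):

```r
# Check whether TensorFlow detects a GPU; an empty list means CPU-only.
library(tensorflow)
tf$config$list_physical_devices("GPU")
```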

Further Resources

See the AI Resources page for some additional information on various AI topics.