
LLMs for Builders: Jargon, Theory & History

Equipping you with the knowledge to start building AI applications

Image by the author (generated with DALL-E)

AI is a hot topic, with abundant consumer-focused content and continuous research breakthroughs.
For a typical software engineer or data engineer starting their journey into the world of Large Language Models (LLMs) to build, it can be overwhelming. At least, I know I felt that way.
What level of understanding is truly essential?

This post aims to demystify just enough theory, terminology, and history so you can grasp how these elements interconnect. My goal is to provide a comprehensive yet accessible overview, equipping you with the knowledge to start building.

In this blog, we'll explore the jargon and history around LLMs and cover the key features that define them. To keep things practical and true to our mission of building, we'll conclude by running an LLM on a local machine.

Note: this article is the first part of a series.

Image by the Author

A brief look back: from complexity to accessibility

To appreciate where we stand today with technologies like ChatGPT, it's helpful to rewind and see the journey that led to these advancements.

Up to 2017 - RNNs & LSTMs

Initially, deep neural network models such as Recurrent Neural Networks (RNNs) and their advanced variant, Long Short-Term Memory networks (LSTMs), were predominant. They process text sequentially, but they face two challenges:

  • Handling long sequences and fully understanding broader contexts was difficult.

  • Due to their sequential nature, RNNs and LSTMs are hard to parallelize, which limits how much data you can effectively feed the model.

2017 - A Paradigm shift with transformers

The transformer model, introduced in the "Attention Is All You Need" paper, changed the landscape.
Unlike RNNs and LSTMs, transformers process input in parallel and use an attention mechanism, handling context and long-range dependencies more effectively.
In brief, the transformer's attention mechanism allows the model to "focus" on different parts of the input data at once, much like how you focus on different speakers at a noisy cocktail party.
It can weigh the importance of each part of the input data, no matter how far apart they are in the sequence.
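To make the cocktail-party analogy concrete, here is a minimal, illustrative sketch of scaled dot-product attention in plain Python. The toy vectors are my own invention; real models operate on learned, high-dimensional embeddings and many attention heads at once.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    The query attends over every position at once: each key gets a
    weight, no matter how far apart it is in the sequence.
    """
    d = len(query)
    # Similarity of the query to every key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # Weighted sum of the values: where the model "focuses".
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
print(attention([1.0, 0.0], keys, values))
```

Notice that the output leans towards the values whose keys best match the query; that weighting is the whole trick.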

2018 - Post-Transformer

Transformers enabled us to move away from linear processing to a more dynamic, context-aware approach.
Two major milestones:

  • BERT: This encoder-only model excelled at capturing context. It changed the game in areas like natural language understanding and sentiment analysis.

  • GPT: The GPT series, such as GPT-3, is decoder-only and became famous for generating text that feels like a human wrote it, excelling at a wide range of text-generation tasks.

But hold on, what exactly do we mean by 'encoding' and 'decoding'?

Encoder-decoder models combine input understanding and output generation, ideal for machine translation. For instance, in Google Translate, the encoder comprehends an English sentence, and the decoder then produces its French equivalent.

Encoder-only models like BERT are geared towards understanding inputs, excelling in tasks like sentiment analysis where deep text comprehension is essential.

Decoder-only models, such as GPT, specialize in generating text. They're less focused on input interpretation but excel in creating coherent outputs, perfect for text generation and chatbots.

Decoder-only models have become quite the trend because they're versatile and simpler to use. This makes them a favorite for all sorts of tasks, and they keep getting better thanks to improvements in how they're trained and the hardware they run on.

2021 - Multimodal era

In 2021, with DALL-E's release, we saw the expansion of LLM capabilities, similar to those in GPT, into the realm of multimodal applications. 'Multimodal' means these models handle more than just text - they understand images too! DALL-E, built on the foundations of GPT, used its language understanding skills to interpret text and then creatively generate corresponding images. This was a big deal because it showed that the techniques used in text-based models like GPT could also revolutionize how AI interacts with visual content.

For reference, the below-left image was pre-DALL-E from a 2020 paper, and the one to the right was taken from today’s Midjourney. Things are moving fast.

2022 - Release of ChatGPT and mass adoption

ChatGPT has become the user interface of AI, democratizing access for anyone who can type on a laptop - and it's free. It's also the fastest-growing application in history, reaching 100 million users in just two months.

Source : Peter Yang

Since then, there's been an explosion of new models, both open-sourced (Llama 2, Mistral, etc.) and proprietary (Claude, Cohere, etc.), and a whole bunch of startups have sprung up. Not only have images become more impressive, but we've also started seeing things like text-to-video or text-to-audio. AI is branching out in so many directions, and this is just the beginning. Big players like Adobe are even integrating AI into their products, showing just how mainstream this technology is becoming.

Image by the Author, inspired by the great Intro to LLM from Andrej Karpathy

Running a model on your laptop

Now that we've got a handle on the basics, it's time to run our first model.
Thanks to the surge in open-source projects like Llama2 and Mistral, we've got many tools to help us run these models right on our laptops. While big cloud platforms and services like HuggingFace are often the go-to for hosting LLMs, there's a lot of progress in making it possible to run them efficiently on your own computer, using the full potential of your CPU and/or GPU.
Ollama is a great example that lets you run, create, and share large language models with a command-line interface.
You can think of it as "Docker for LLMs".

Setup

If you are on macOS, you can use the Homebrew package manager to install it:

brew install ollama

Or visit their download page for other platforms.

Downloading and running a model

Let's say we want to try the latest Mixtral 8x7B from Mistral; we simply run:

ollama run mixtral

Time to get your coffee ready! The first time you run the command, it will download the model, which is a 26 GB download.
Once that's done, you can simply type in your prompt and press enter!

Of course, you can run many other supported models. Have a look at their model library.
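Beyond the interactive prompt, Ollama also serves a local HTTP API (by default on port 11434) that you can call from code. The sketch below assumes the default `/api/generate` endpoint and a model you have already pulled with `ollama run`; adjust the model name to your setup.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model, prompt):
    # Payload for Ollama's local REST API; stream=False asks for a
    # single JSON response instead of a token-by-token stream.
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model, prompt):
    data = json.dumps(build_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example usage (assumes the Ollama server is running locally):
# print(ask("mixtral", "Explain transformers in one sentence."))
```

This is handy once you want to move from chatting in a terminal to wiring a local model into your own application.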

Two great resources to help you choose the right model are:

Onward and Upward

Well done! You've navigated through the complex jargon and now have a solid grasp of the key elements in the world of LLMs. Plus, you've even managed to run a model on your own computer!

What's next? Only the exciting stuff. In the upcoming blog, I'll cover how to craft effective prompts and explore various techniques to make the most of LLMs.

Stay tuned for level 2 🧗‍♂️ !