August 22, 2024

LLM terminology. The top 50 terms to know

A comprehensive list and simple explanations of the top 50 terms in the Large Language Model (LLM) space.

1. Large Language Model (LLM)
  • A type of artificial intelligence model designed to understand, generate, and manipulate human language by predicting the next word in a sequence.
2. Transformer Architecture
  • A neural network architecture that uses self-attention mechanisms to process sequences of data, foundational for many LLMs.
3. Self-Attention Mechanism
  • A technique within transformers where each word in a sentence is weighted by its relevance to other words, allowing the model to capture context more effectively (see the attention sketch after this list).
4. Tokenization
  • The process of converting text into tokens (words, subwords, or characters) that the model can process (see the tokenizer sketch after this list).
5. GPT (Generative Pre-trained Transformer)
  • A series of LLMs developed by OpenAI that are pre-trained on a large corpus of text and fine-tuned for specific tasks.
6. Pre-training
  • The initial phase where a model is trained on a vast dataset to learn general language patterns.
7. Fine-tuning
  • The process of further training a pre-trained model on a smaller, task-specific dataset to adapt it for specific tasks.
8. Context Window
  • The amount of text the model can consider at once, typically defined by the number of tokens.
9. BERT (Bidirectional Encoder Representations from Transformers)
  • A type of LLM designed to understand the context of a word in a sentence by looking at the words before and after it.
10. Masked Language Model (MLM)
  • A training approach used in models like BERT where certain words are masked and the model is trained to predict them.
11. Zero-Shot Learning
  • The ability of a model to perform a task without having been explicitly trained on examples of that task.
12. Few-Shot Learning
  • The model's ability to perform a task after being given only a few examples, typically supplied directly in the prompt (see the few-shot prompt sketch after this list).
13. Prompt Engineering
  • The process of designing inputs (prompts) to guide the behavior of an LLM to generate desired outputs.
14. Natural Language Processing (NLP)
  • The field of AI focused on the interaction between computers and human language.
15. Sequence-to-Sequence Model (Seq2Seq)
  • A model architecture used for tasks where the input is a sequence of tokens and the output is another sequence, like translation.
16. Decoder
  • The part of a transformer model responsible for generating output sequences from encoded inputs.
17. Encoder
  • The component of a transformer that processes the input sequence and encodes it into a format suitable for decoding.
18. Attention Head
  • One of the multiple sub-units within the self-attention mechanism, each capturing different aspects of the input context.
19. Multi-Head Attention
  • A process where multiple attention heads work in parallel to capture various types of relationships in the data.
20. Position Embedding
  • The technique used to add information about the position of words in a sequence to the model, since transformers don’t inherently understand word order (see the positional-encoding sketch after this list).
21. Embedding
  • A dense vector representation of words, sentences, or other data types that captures semantic meaning.
22. Transfer Learning
  • The process of using a pre-trained model on a new, related task, taking advantage of the knowledge the model has already learned.
23. Backpropagation
  • The algorithm used to adjust the weights in a neural network during training by propagating the error backward through the network.
24. Gradient Descent
  • An optimization algorithm that minimizes the model's error by iteratively adjusting its parameters in the direction that reduces the loss (see the gradient-descent sketch after this list).
25. Overfitting
  • A situation where a model performs well on the training data but poorly on unseen data due to excessive complexity.
26. Underfitting
  • When a model is too simple to capture the underlying patterns in the data, leading to poor performance on both training and test data.
27. Hyperparameters
  • The parameters of a model that are set before training begins, such as learning rate, batch size, and the number of layers.
28. Epoch
  • One complete pass through the entire training dataset.
29. Batch Size
  • The number of training examples used in one iteration of training.
30. Learning Rate
  • A hyperparameter that controls how much to change the model in response to the estimated error each time the model's weights are updated.
31. Regularization
  • Techniques used to prevent overfitting by penalizing complex models, such as L2 regularization.
32. Dropout
  • A regularization technique where randomly selected units in the network are ignored during training to prevent overfitting (see the dropout sketch after this list).
33. Activation Function
  • A function applied to the output of a neural network layer, introducing non-linearity into the model (e.g., ReLU, Sigmoid).
34. Softmax
  • An activation function often used in the output layer of a classifier to convert logits into probabilities (see the softmax sketch after this list).
35. Logits
  • The raw, unnormalized scores output by a model before applying a softmax function.
36. Language Model
  • A statistical model that assigns probabilities to sequences of words or tokens, predicting the likelihood of a given sequence.
37. Beam Search
  • A decoding algorithm that generates sequences by expanding several candidate sequences in parallel at each step and keeping only the highest-scoring ones.
38. Perplexity
  • A measurement of how well a language model predicts a sample; lower perplexity indicates better performance (computed from cross-entropy in the sketch after this list).
39. Latent Space
  • The abstract, multi-dimensional space where the model represents different features or concepts learned during training.
40. Neural Network
  • A computational model made up of layers of interconnected units ("neurons"), loosely inspired by the brain, that learns to recognize patterns in data and make decisions.
41. Parameters
  • The weights and biases within a model that are learned from data during training.
42. Evaluation
  • The process of measuring an LLM's capabilities and output quality, typically against benchmark datasets or with human judges.
43. Attention Score
  • The value calculated during the self-attention process that determines the importance of one word to another in a sequence.
44. Language Understanding
  • The ability of an AI model to comprehend and make sense of human language, beyond just processing it.
45. Inference
  • The process of running a trained model on new inputs to produce outputs, as opposed to training it.
46. Natural Language Generation (NLG)
  • The process of generating human-like text from a model, often used in chatbots, translation, and summarization.
47. Reinforcement Learning from Human Feedback (RLHF)
  • A technique where a model is fine-tuned using human preference feedback, typically via a learned reward model, to better align its outputs with what people want.
48. Autoencoder
  • A type of neural network used to learn efficient representations (codings) of data, typically for dimensionality reduction or noise reduction.
49. Cross-Entropy Loss
  • A common loss function used in classification problems that measures the difference between the predicted and actual distributions (see the cross-entropy and perplexity sketch after this list).
50. Knowledge Distillation
  • A technique where a smaller model (student) is trained to mimic the behavior of a larger model (teacher) to reduce complexity while maintaining performance.
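
A few of the terms above become clearer with a little code. The sketches below are minimal Python/NumPy illustrations, not production implementations; any names, shapes, and example values in them are illustrative assumptions.

First, the self-attention mechanism (terms 3, 18, and 43): one attention head computes queries, keys, and values, scores every token against every other token, and mixes the values accordingly. The random projection matrices here stand in for learned weights.

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention for a single attention head.

    x: (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_head) projection matrices (learned in a real model)
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_head = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_head)                        # attention scores (term 43)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ v                                        # each token becomes a weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))                       # a toy "sentence" of 4 token embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)                 # (4, 8): one context-aware vector per token
```

Multi-head attention (term 19) runs several such heads in parallel with different projection matrices and concatenates their outputs.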
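
Tokenization (term 4), shown here with a toy word-level vocabulary. Real LLMs use subword schemes such as byte-pair encoding, but the idea is the same: text in, integer token IDs out.

```python
# Toy word-level tokenizer with a hand-written vocabulary (illustrative only).
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5}

def tokenize(text):
    """Map each lowercased word to its ID, falling back to <unk> for unknown words."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

def detokenize(token_ids):
    """Map token IDs back to words."""
    inverse = {i: w for w, i in vocab.items()}
    return " ".join(inverse[i] for i in token_ids)

ids = tokenize("The cat sat on the mat")
print(ids)              # [1, 2, 3, 4, 1, 5]
print(detokenize(ids))  # the cat sat on the mat
```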
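
Few-shot learning and prompt engineering (terms 12 and 13) in practice: a hypothetical prompt that includes a couple of worked examples to steer the model's output format without any fine-tuning.

```python
# A hypothetical few-shot prompt; the examples show the model the task and the
# expected output format before asking it to complete the final case.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: Positive

Review: It stopped working after a week and support never replied.
Sentiment: Negative

Review: Setup took five minutes and everything just worked.
Sentiment:"""

# Sent to an LLM, this prompt would typically be completed with "Positive".
print(few_shot_prompt)
```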
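
Position embeddings (term 20): the fixed sinusoidal scheme from the original transformer paper gives every position a unique pattern of sines and cosines that can be added to the token embeddings. (Many newer models use learned or rotary position embeddings instead.)

```python
import numpy as np

def sinusoidal_position_embedding(seq_len, d_model):
    """Return a (seq_len, d_model) matrix of sinusoidal position embeddings."""
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # the even embedding dimensions
    angles = positions / (10000 ** (dims / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # sine on even dimensions
    pe[:, 1::2] = np.cos(angles)                   # cosine on odd dimensions
    return pe

print(sinusoidal_position_embedding(seq_len=6, d_model=16).shape)  # (6, 16)
```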
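
Gradient descent (term 24) on a one-dimensional toy loss, loss(w) = (w - 3)^2, whose gradient is 2(w - 3); the learning rate (term 30) controls the step size.

```python
def gradient(w):
    """Gradient of the toy loss (w - 3)^2 with respect to w."""
    return 2 * (w - 3)

w = 0.0                 # initial parameter value
learning_rate = 0.1
for step in range(50):  # 50 updates; one full pass over a real dataset would be an epoch (term 28)
    w -= learning_rate * gradient(w)

print(round(w, 4))      # ~3.0, the value that minimizes the loss
```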
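
Dropout (term 32), in its common "inverted" form: during training each unit is zeroed with probability p and the survivors are rescaled so the expected activation stays the same; at inference time the layer does nothing.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: zero units with probability p during training."""
    if not training or p == 0.0:
        return activations
    mask = rng.random(activations.shape) >= p      # keep each unit with probability 1 - p
    return activations * mask / (1.0 - p)          # rescale so the expected value is unchanged

x = np.ones((2, 5))
print(dropout(x, p=0.4))           # some entries zeroed, survivors scaled up by 1/0.6
print(dropout(x, training=False))  # unchanged at inference time
```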
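
Softmax and logits (terms 34 and 35): the raw scores a model outputs are turned into a probability distribution by exponentiating and normalizing; subtracting the maximum first is a standard numerical-stability trick.

```python
import numpy as np

def softmax(logits):
    """Convert raw, unnormalized scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])      # e.g. scores over a tiny 3-token vocabulary
probs = softmax(logits)
print(probs.round(3))                   # [0.659 0.242 0.099]
print(probs.sum())                      # ~1.0
```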
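
Cross-entropy loss and perplexity (terms 49 and 38): the loss is the average negative log-probability the model assigned to the tokens that actually occurred, and perplexity is simply its exponential. The probabilities below are made-up illustrative values.

```python
import numpy as np

def cross_entropy(predicted_probs, true_token_ids):
    """Average negative log-probability of the correct token at each position."""
    picked = predicted_probs[np.arange(len(true_token_ids)), true_token_ids]
    return -np.mean(np.log(picked))

# Toy example: the model's probabilities over a 4-token vocabulary at 3 positions.
probs = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.80, 0.10, 0.05],
    [0.25, 0.25, 0.25, 0.25],
])
targets = np.array([0, 1, 3])           # the tokens that actually came next

loss = cross_entropy(probs, targets)
perplexity = np.exp(loss)               # perplexity is exp(cross-entropy)
print(round(float(loss), 3), round(float(perplexity), 3))  # 0.655 1.926
```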
