1. Large Language Model (LLM)
- A type of artificial intelligence model designed to understand, generate, and manipulate human language, typically by predicting the next token in a sequence.
2. Transformer Architecture
- A neural network architecture that uses self-attention mechanisms to process sequences of data, foundational for many LLMs.
3. Self-Attention Mechanism
- A technique within transformers where each word in a sentence is weighted by its relevance to other words, allowing the model to capture context more effectively.
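As a rough illustration, the core computation can be sketched in a few lines of NumPy; the shapes, random inputs, and single attention head below are simplifying assumptions, not how a full transformer layer is implemented.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project the inputs into queries, keys, and values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Attention scores: how relevant each token is to every other token.
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)   # one probability row per token
    return weights @ V                   # weighted sum of value vectors

# Toy example: 4 tokens, embedding size 8, head size 8 (illustrative numbers).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (4, 8)
```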
4. Tokenization
- The process of converting text into tokens (words, subwords, or characters) that the model can process.
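For example, one popular subword tokenizer can be invoked as follows (a sketch assuming the third-party `tiktoken` package is installed; other tokenizers use different vocabularies and APIs).

```python
# Requires: pip install tiktoken (assumption: the cl100k_base encoding is available).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Tokenization splits text into model-readable pieces.")
print(tokens)                              # list of integer token IDs
print(enc.decode(tokens))                  # round-trips back to the original string
print([enc.decode([t]) for t in tokens])   # the individual subword pieces
```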
5. GPT (Generative Pre-trained Transformer)
- A series of LLMs developed by OpenAI that are pre-trained on a large corpus of text and fine-tuned for specific tasks.
6. Pre-training
- The initial phase where a model is trained on a vast dataset to learn general language patterns.
7. Fine-tuning
- The process of further training a pre-trained model on a smaller, task-specific dataset to adapt it for specific tasks.
8. Context Window
- The amount of text the model can consider at once, typically defined by the number of tokens.
9. BERT (Bidirectional Encoder Representations from Transformers)
- A type of LLM designed to understand the context of a word in a sentence by looking at the words before and after it.
10. Masked Language Model (MLM)
- A training approach used in models like BERT where certain words are masked and the model is trained to predict them.
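A toy sketch of how MLM training pairs can be built; real BERT-style masking also sometimes keeps or randomly replaces the selected tokens rather than always inserting `[MASK]`.

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", mask_prob=0.15, seed=0):
    """Randomly replace tokens with [MASK]; the originals become the targets."""
    random.seed(seed)
    masked, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)     # the model must predict this token
        else:
            masked.append(tok)
            labels.append(None)    # no loss is computed for unmasked positions
    return masked, labels

tokens = "the quick brown fox jumps over the lazy dog".split()
print(mask_tokens(tokens))
```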
11. Zero-Shot Learning
- The ability of a model to perform a task without having been explicitly trained on examples of that task.
12. Few-Shot Learning
- The model's ability to perform a task after being given only a few examples.
13. Prompt Engineering
- The process of designing inputs (prompts) to guide the behavior of an LLM to generate desired outputs.
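Few-shot learning and prompt engineering are easiest to see in a concrete prompt. Below is a minimal sketch of assembling a few-shot sentiment prompt; the task, examples, and formatting are illustrative assumptions rather than a prescribed template.

```python
# Hypothetical labeled examples supplied in-context ("shots").
examples = [
    ("The movie was fantastic!", "positive"),
    ("I wasted two hours of my life.", "negative"),
    ("An instant classic.", "positive"),
]

def build_prompt(query, examples):
    lines = ["Classify the sentiment of each review as positive or negative.", ""]
    for text, label in examples:                   # the few examples the model learns from in-context
        lines.append(f"Review: {text}\nSentiment: {label}\n")
    lines.append(f"Review: {query}\nSentiment:")   # the model completes this line
    return "\n".join(lines)

print(build_prompt("The plot made no sense at all.", examples))
```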
14. Natural Language Processing (NLP)
- The field of AI focused on the interaction between computers and human language.
15. Sequence-to-Sequence Model (Seq2Seq)
- A model architecture used for tasks where the input is a sequence of tokens and the output is another sequence, like translation.
16. Decoder
- The part of a transformer model responsible for generating output sequences from encoded inputs.
17. Encoder
- The component of a transformer that processes the input sequence and encodes it into a format suitable for decoding.
18. Attention Head
- One of the multiple sub-units within the self-attention mechanism, each capturing different aspects of the input context.
19. Multi-Head Attention
- A process where multiple attention heads work in parallel to capture various types of relationships in the data.
20. Position Embedding
- The technique used to add information about the position of words in a sequence to the model, since transformers don’t inherently understand word order.
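One common scheme is the fixed sinusoidal encoding from the original Transformer paper (learned position embeddings are a frequent alternative); a NumPy sketch with illustrative sizes:

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Fixed sinusoidal position encodings, one row per position."""
    positions = np.arange(seq_len)[:, None]       # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]      # (1, d_model / 2)
    angles = positions / (10000 ** (dims / d_model))
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles)                 # even dimensions use sine
    enc[:, 1::2] = np.cos(angles)                 # odd dimensions use cosine
    return enc

print(sinusoidal_positions(seq_len=6, d_model=8).shape)  # (6, 8); added to token embeddings
```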
21. Embedding
- A dense vector representation of words, sentences, or other data types that captures semantic meaning.
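A toy illustration of why embeddings are useful: related items end up close together in the vector space, which can be measured with cosine similarity. The 4-dimensional vectors below are made-up values; real embeddings have hundreds or thousands of dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two embedding vectors: 1 means same direction.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings chosen for illustration only.
king  = np.array([0.8, 0.6, 0.1, 0.0])
queen = np.array([0.7, 0.7, 0.2, 0.0])
apple = np.array([0.0, 0.1, 0.9, 0.8])

print(cosine_similarity(king, queen))  # high: semantically related
print(cosine_similarity(king, apple))  # low: unrelated
```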
22. Transfer Learning
- The process of using a pre-trained model on a new, related task, taking advantage of the knowledge the model has already learned.
23. Backpropagation
- The algorithm that computes how much each weight in a neural network contributed to the error by propagating the error backward through the network, providing the gradients used to adjust the weights during training.
24. Gradient Descent
- An optimization algorithm used to minimize the error in the model by iteratively adjusting the model's parameters.
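A minimal sketch of gradient descent on a one-parameter toy loss; the loss function, learning rate, and step count are arbitrary choices for illustration (the learning rate itself is defined in entry 30).

```python
# Minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
def gradient_descent(w=0.0, learning_rate=0.1, steps=50):
    for _ in range(steps):
        grad = 2 * (w - 3)           # derivative of the loss at the current w
        w -= learning_rate * grad    # step against the gradient
    return w

print(gradient_descent())  # converges toward the minimum at w = 3
```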
25. Overfitting
- A situation where a model performs well on the training data but poorly on unseen data due to excessive complexity.
26. Underfitting
- When a model is too simple to capture the underlying patterns in the data, leading to poor performance on both training and test data.
27. Hyperparameters
- The parameters of a model that are set before training begins, such as learning rate, batch size, and the number of layers.
28. Epoch
- One complete pass through the entire training dataset.
29. Batch Size
- The number of training examples used in one iteration of training.
30. Learning Rate
- A hyperparameter that controls how much to change the model in response to the estimated error each time the model's weights are updated.
31. Regularization
- Techniques used to prevent overfitting by penalizing complex models, such as L2 regularization.
32. Dropout
- A regularization technique where random units in the network are ignored during training to prevent overfitting.
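A sketch of "inverted" dropout with NumPy, one common formulation; the rescaling keeps the expected activation unchanged, and dropout is switched off at inference time.

```python
import numpy as np

def dropout(activations, p=0.5, training=True, rng=np.random.default_rng(0)):
    """Inverted dropout: zero out units with probability p and rescale the rest."""
    if not training or p == 0.0:
        return activations                        # dropout is disabled at inference time
    mask = rng.random(activations.shape) >= p     # keep each unit with probability 1 - p
    return activations * mask / (1.0 - p)         # rescale so the expected value is unchanged

x = np.ones((2, 6))
print(dropout(x, p=0.5))
```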
33. Activation Function
- A function applied to the output of a neural network layer, introducing non-linearity into the model (e.g., ReLU, Sigmoid).
34. Softmax
- An activation function often used in the output layer of a classifier to convert logits to probabilities.
35. Logits
- The raw, unnormalized scores output by a model before applying a softmax function.
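Entries 34 and 35 fit together naturally: the softmax turns a vector of logits into a probability distribution. A NumPy sketch with made-up logits:

```python
import numpy as np

def softmax(logits):
    # Shift by the max logit for numerical stability; the result is unchanged.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])   # raw, unnormalized scores from the model
probs = softmax(logits)
print(probs)          # approximately [0.659 0.242 0.099]
print(probs.sum())    # 1.0
```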
36. Language Model
- A statistical model that assigns probabilities to sequences of words or tokens, predicting the likelihood of a given sequence.
37. Beam Search
- A decoding algorithm that generates sequences by expanding several candidate sequences in parallel and keeping only the highest-scoring ones (the beam) at each step.
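A toy sketch of beam search over a hypothetical next-token distribution; real decoders score candidates with a language model and handle end-of-sequence tokens, which are omitted here.

```python
import math

def beam_search(step_probs, beam_width=2, length=3):
    """Toy beam search: step_probs(sequence) returns {token: probability}.
    Keeps the beam_width highest-scoring partial sequences at every step."""
    beams = [([], 0.0)]                  # (token sequence, summed log-probability)
    for _ in range(length):
        candidates = []
        for seq, score in beams:
            for token, prob in step_probs(seq).items():
                candidates.append((seq + [token], score + math.log(prob)))
        # Keep only the best beam_width candidates for the next step.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

# Hypothetical next-token distribution, independent of the prefix for simplicity.
def step_probs(seq):
    return {"a": 0.5, "b": 0.3, "c": 0.2}

print(beam_search(step_probs))  # the two highest-probability length-3 sequences
```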
38. Perplexity
- A measurement of how well a language model predicts a sample; lower perplexity indicates better performance.
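A small sketch computing perplexity from the probabilities a model assigned to each actual token; the probability values below are made up for illustration.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# Hypothetical probabilities the model assigned to each actual next token.
confident = [0.9, 0.8, 0.95, 0.7]
uncertain = [0.2, 0.1, 0.3, 0.25]
print(perplexity(confident))  # low perplexity: the model predicted the text well
print(perplexity(uncertain))  # high perplexity: the model was surprised
```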
39. Latent Space
- The abstract, multi-dimensional space where the model represents different features or concepts learned during training.
40. Neural Network
- A computational model made up of layers of interconnected nodes with learnable weights, loosely inspired by the brain, that learns to recognize patterns and make predictions from data.
41. Parameters
- The weights and biases within a model that are learned from data during training.
42. Evaluation
- The process of measuring the capabilities and quality of an LLM, typically against benchmark datasets or through human judgment.
43. Attention Score
- The value calculated during the self-attention process that determines the importance of one word to another in a sequence.
44. Language Understanding
- The ability of an AI model to comprehend and make sense of human language, beyond just processing it.
45. Inference
- The process of using a trained model to produce an output from a new input, as opposed to training the model.
46. Natural Language Generation (NLG)
- The process of generating human-like text from a model, often used in chatbots, translation, and summarization.
47. Reinforcement Learning from Human Feedback (RLHF)
- A technique where a model is fine-tuned using a reward signal derived from human preference feedback, typically via reinforcement learning, so that its outputs better match human expectations.
48. Autoencoder
- A type of neural network used to learn efficient representations (codings) of data, typically for dimensionality reduction or noise reduction.
49. Cross-Entropy Loss
- A common loss function used in classification problems that measures the difference between the predicted and actual distributions.
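A minimal sketch with made-up class probabilities, showing why a confident correct prediction yields a low loss and a confident wrong one a high loss.

```python
import numpy as np

def cross_entropy(predicted_probs, true_class):
    """Negative log-probability the model assigned to the correct class."""
    return -np.log(predicted_probs[true_class])

probs = np.array([0.7, 0.2, 0.1])           # model's predicted distribution over 3 classes
print(cross_entropy(probs, true_class=0))   # ~0.357: correct class got 0.7, low loss
print(cross_entropy(probs, true_class=2))   # ~2.303: correct class got 0.1, high loss
```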
50. Knowledge Distillation
- A technique where a smaller model (student) is trained to mimic the behavior of a larger model (teacher) to reduce complexity while maintaining performance.
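A sketch of the soft-target part of a distillation loss, using a temperature to soften both distributions; in practice this term is usually combined with a standard cross-entropy loss on the true labels, and the logits below are made-up values.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target loss: the student matches the teacher's softened distribution."""
    teacher = softmax(teacher_logits / temperature)   # softened teacher probabilities
    student = softmax(student_logits / temperature)
    # Cross-entropy between the teacher (target) and student distributions.
    return float(-(teacher * np.log(student)).sum())

teacher_logits = np.array([4.0, 1.0, 0.5])   # hypothetical large-model outputs
student_logits = np.array([3.0, 1.5, 0.5])   # hypothetical small-model outputs
print(distillation_loss(student_logits, teacher_logits))
```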