Generative AI with LLMs (Week 3)
Course Notes and Slides from DeepLearning.AI’s Generative AI with LLMs course.
Reinforcement Learning from Human Feedback (RLHF)

The RL algorithm typically used is PPO (Proximal Policy Optimization).

KL divergence is used to penalize model outputs that shift too far from the reference model.
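A minimal sketch of how this penalty can be combined with the reward model score during PPO updates; `reward_score`, `policy_logprobs`, and `ref_logprobs` are hypothetical tensors, not names from the course:

```python
import torch

def kl_penalized_reward(reward_score: torch.Tensor,
                        policy_logprobs: torch.Tensor,
                        ref_logprobs: torch.Tensor,
                        kl_coef: float = 0.1) -> torch.Tensor:
    # Per-token KL estimate: log p_policy(token) - log p_ref(token)
    kl = policy_logprobs - ref_logprobs
    # Penalize drift from the frozen reference model while maximizing the reward score
    return reward_score - kl_coef * kl.sum()
```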
Constitutional AI

Uses model self-supervision to train a harmless AI assistant. The system is given a set of rules (a constitution) to follow.

Supervision Phase: Generate self-critiques and revisions.

In the supervised phase we sample from an initial model, generate self-critiques and revisions, and then finetune the original model on the revised responses. In the RL phase, we sample pairs of responses from the finetuned model, use a model to evaluate which of the two samples is better, and then train a preference model from this dataset of AI preferences.
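A minimal sketch of the critique-and-revision loop in the supervised phase; `generate` is a placeholder for any LLM completion call, and the request wordings are illustrative rather than the actual constitution:

```python
CRITIQUE_REQUEST = "Identify ways the response is harmful, unethical, or untruthful."
REVISION_REQUEST = "Rewrite the response to fix the problems found in the critique."

def critique_and_revise(prompt: str, generate) -> str:
    # 1. Sample an initial (possibly harmful) response from the model
    response = generate(prompt)
    # 2. Ask the model to critique its own response against the rules
    critique = generate(f"Prompt: {prompt}\nResponse: {response}\n{CRITIQUE_REQUEST}")
    # 3. Ask the model to revise the response given its critique
    revision = generate(
        f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n{REVISION_REQUEST}"
    )
    # The (prompt, revision) pairs are then used to finetune the original model
    return revision
```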
Model Optimizations
Distillation - Train a smaller student model to mimic a larger teacher model.
Quantization - Reduce the precision of model weights (e.g. FP32 to INT8; see the sketch after this list).
Pruning - Remove model weights with values close or equal to zero (via retraining, PEFT, or LoRA).
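As one concrete illustration of the quantization idea (not course code), a symmetric int8 post-training quantization of a weight tensor can be sketched as:

```python
import torch

def quantize_int8(weights: torch.Tensor):
    # Map the largest absolute weight to 127, then round to 8-bit integers
    scale = weights.abs().max() / 127
    q = torch.clamp((weights / scale).round(), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an approximate float tensor from the int8 representation
    return q.float() * scale

w = torch.randn(4, 4)
q, scale = quantize_int8(w)
print((w - dequantize(q, scale)).abs().max())  # small reconstruction error
```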
Time and Effort in the Lifecycle
LLM Applications
Modern LLM applications involve orchestration libraries that interact with LLMs, data sources, and applications to fulfill user requests.
Retrieval Augmented Generation (RAG)
Adding external information sources to the context window to ground the model's responses in current or domain-specific data and reduce hallucination.
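A minimal RAG sketch, where `retrieve` and `generate` are hypothetical placeholders for a vector-store query and an LLM call:

```python
def answer_with_rag(question: str, retrieve, generate, top_k: int = 3) -> str:
    # 1. Retrieve the most relevant documents for the question
    documents = retrieve(question, top_k=top_k)
    # 2. Add the retrieved text to the context window
    context = "\n\n".join(documents)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    # 3. Generate an answer grounded in the retrieved information
    return generate(prompt)
```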
Chain of Thought Prompting
Guiding the LLM to break down a problem into intermediate reasoning steps.
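An illustrative one-shot chain-of-thought prompt (the worked example is the well-known tennis-ball problem, not course-specific):

```python
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 balls.
5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. They used 20 to make lunch and bought 6 more.
How many apples do they have?
A:"""
# The worked example guides the model to reason step by step before answering
# (expected: 23 - 20 + 6 = 9).
```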
Program-Aided Language Model (PAL)
Gets the LLM to generate code that is run by an external code interpreter, which allows for more accurate math responses.
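A minimal PAL sketch: the model is prompted to answer with Python, and the generated code is executed instead of trusting the LLM's arithmetic. `generate` is a hypothetical LLM call:

```python
pal_prompt = """Q: The cafeteria had 23 apples. They used 20 and bought 6 more.
How many apples do they have?
# solution in Python:
apples = 23 - 20 + 6
answer = apples
"""

def answer_with_pal(question: str, generate):
    # 1. Ask the model to write code that computes the answer
    code = generate(pal_prompt + f"\nQ: {question}\n# solution in Python:\n")
    # 2. Run the generated code in an interpreter (should be sandboxed in practice)
    scope = {}
    exec(code, scope)
    return scope.get("answer")
```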
ReAct: Synergizing Reasoning and Actions in LLMs
A ReAct prompt consists of (see the example after this list):
Question - A problem that requires advanced reasoning and multiple steps to solve.
Thought - A reasoning step that identifies how the model will tackle the problem and which action to take.
Action - An external task the model can carry out from an allowed set (e.g. search[entity]).
Observation - The new information found by carrying out the action.
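An illustrative ReAct-style prompt following the structure above (the question and tool calls are made up):

```python
react_prompt = """Question: Which country is the Eiffel Tower located in?
Thought: I need to search for the Eiffel Tower and find its location.
Action: search[Eiffel Tower]
Observation: The Eiffel Tower is a wrought-iron lattice tower in Paris, France.
Thought: The Eiffel Tower is in Paris, so the country is France.
Action: finish[France]
"""
```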
LLM App Architecture
Modern apps will contain the following:
- API
- LLM Tools & Frameworks
- Info sources
- LLM Models
- Generated feedback
- Infrastructure