Large Language Models
Introduction to Large Language Models

Language models
  - Remember the simple n-gram language model
      - Assigns probabilities to sequences of words
      - Generates text by sampling possible next words
      - Is trained on counts computed from lots of text
  - Large language models are similar and different:
      - Assign probabilities to sequences of words
      - Generate text by sampling possible next words
      - Are trained by learning to guess the next word

Large language models
  - Even though pretrained only to predict words
  - Learn a lot of useful language knowledge
  - Since training on a lot of text

Three architectures for large language models
  Decoders: GPT, Claude, Llama, Mixtral
  Encoders: BERT family, HuBERT
  Encoder-decoders: Flan-T5, Whisper

Pretraining for three types of architectures
The neural architecture influences the type of pretraining, and natural use cases.

Decoders
  - Language models! What we've seen so far.
  - Nice to generate from; can't condition on future words

Encoders
  - Get bidirectional context: can condition on future words!
  - How do we train them to build strong representations?

Encoder-Decoders
  - Good parts of decoders and encoders?
  - What's the best way to pretrain them?
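
The contrast between decoders (cannot condition on future words) and encoders (bidirectional context) can be sketched in a few lines of plain Python. This is only an illustration of which positions each token may look at, not an implementation of attention; the helper name `visible_positions` and the three-word example sentence are assumptions made up for this sketch.

```python
def visible_positions(seq_len, causal):
    """For each position i, list the positions j it may condition on."""
    if causal:
        # Decoder-style: position i sees only positions 0..i (no future words).
        return [list(range(i + 1)) for i in range(seq_len)]
    # Encoder-style: every position sees the whole sequence, future included.
    return [list(range(seq_len)) for _ in range(seq_len)]


# Hypothetical example sentence, already split into tokens.
tokens = ["the", "cat", "sat"]
decoder_view = visible_positions(len(tokens), causal=True)
encoder_view = visible_positions(len(tokens), causal=False)

for name, view in [("decoder", decoder_view), ("encoder", encoder_view)]:
    print(name)
    for i, positions in enumerate(view):
        context = [tokens[j] for j in positions]
        print(f"  {tokens[i]!r} can condition on {context}")
```

In the decoder view, "the" sees only itself while "sat" sees all three words; in the encoder view every word sees the full sentence. That restriction is what makes decoders convenient to generate from left to right, and lifting it is what gives encoders their bidirectional context.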