DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation
Seongmin Hong¹, Seungjae Moon¹, Junsoo Kim¹, Sungjae Lee², Minsub Kim², Dongsoo Lee², and Joo-Young Kim¹
¹CastLab, School of EE, KAIST, ²NAVER CLOVA
HOTCHIPS22 Poster Session

Text Generation
- Text generation
  - Automatic generation of human-readable text by a computer
  - Examples: dialogue systems, topic-to-essay generation, and code generation
- Generative Pre-trained Transformer (GPT)
  - State-of-the-art model in natural language processing that scales up to 175B parameters
  - High-quality text generation and remarkable inference accuracy on benchmarks (e.g., LAMBADA)

[Figure: the input tokens "Hello, my name" pass through the language model in the summarization stage; the generation stage then produces the output tokens "is", "James", "Smith", "and", ... one at a time. The language model is GPT, a stack of decoder layers.]

Transformer-based Text Generation
- Transformer-based text generation consists of summarization and generation stages
  - Summarization stage: processes the input words given by a user
  - Generation stage: sequentially produces output words with the language model

[Figure: model structure for both stages. Input tokens ("Hello, my name") and output tokens ("is James Smith") go through token embedding and positional encoding, then decoder layers 1 to N, then the LM head. Each decoder layer contains self-attention (fully-connected, Concat K, V, softmax) and a feed-forward network (fully-connected, GELU, fully-connected), each with LayerNorm and a residual connection. The token embedding table (WTE) maps each token ID to an embedding vector.]
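The summarization and generation stages described above can be sketched as a simple decoding loop. This is a minimal illustration, not the DFX implementation: `toy_language_model`, the `NEXT_TOKEN` lookup table, and the list-based `Cache` are hypothetical stand-ins for a real GPT model and its key/value cache, chosen only so that the control flow is runnable.

```python
# Toy sketch of the two-stage text generation flow (not the DFX implementation).
# The "model" is a hypothetical stand-in: a fixed next-token lookup table.
from typing import List, Tuple

# Hypothetical next-token table used in place of a real GPT model.
NEXT_TOKEN = {"name": "is", "is": "James", "James": "Smith", "Smith": "."}

Cache = List[str]  # stand-in for the per-layer key/value (K, V) cache


def toy_language_model(new_tokens: List[str], cache: Cache) -> Tuple[str, Cache]:
    """Consume new tokens, extend the cache, and predict one next token."""
    cache = cache + new_tokens  # loosely analogous to "Concat K, V" in the figure
    next_token = NEXT_TOKEN.get(cache[-1], ".")
    return next_token, cache


def generate(input_tokens: List[str], max_new_tokens: int) -> List[str]:
    # Summarization stage: process all input tokens in a single pass.
    token, cache = toy_language_model(input_tokens, [])
    outputs = [token]
    # Generation stage: feed each output token back in, one token per step.
    while len(outputs) < max_new_tokens and token != ".":
        token, cache = toy_language_model([token], cache)
        outputs.append(token)
    return outputs


result = generate(["Hello", ",", "my", "name"], max_new_tokens=8)
print(result)
```

The point of the sketch is the asymmetry between the stages: the summarization stage sees the whole input at once, while the generation stage is inherently sequential because each step depends on the token produced by the previous one.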