Accelerating GenAI Workloads by Enabling RISC-V Microkernel Support in IREE
Adeel Ahmad, Ahmad Tameem, Nouman Amir, Bilal Zafar, Saad Bin Nasir
10xEngineers

Outline
- Generative AI workloads
- IREE compilation with custom microkernels (ukernels)
- Custom RISC-V matrix multiplication ukernels: implementation
- Kernel- and model-level results
- Summary

Generative AI Workloads
Figure: Conversational LLMs (Source: ChatGPT)
Generative AI workloads are dominated by transformer-based, auto-regressive large language models (LLMs). Text, image, and code generation, chatbots, content writing, video generation, and other common use cases rely heavily on LLMs. Matrix-matrix and matrix-vector multiplications dominate these workloads.

IREE Compilation with Custom Kernels
IREE is an open-source, MLIR-based compiler and runtime that performs direct code generation. It uses a host/device programming model that supports multiple target architectures through a hardware abstraction layer (HAL). The stack is mostly architecture agnostic, a step toward heterogeneous compilation.
The host handles scheduling and uses VM bytecode for runtime portability, while code generation happens on the device side. Upstream IREE has RVV code generation through LLVM.

Microkernels
Microkernels are intended to avoid the dichotomy between compiler-generated code and hand-written kernels. They perform arithmetic but no memory allocation. Because they can be developed and unit-tested standalone in C, development is quicker.

Matrix Multiplication ukernel (mmt4d) Compilation in IREE
For the x86_64 and ARM64 architectures, IREE uses the linalg dialect's mmt4d op for matrix multiplication. The mmt4d op is meticulously optimized to exploit hardware-specific vector instructions and cache hierarchies.
Figure (only relevant parts of the MLIR and pass pipeline are shown): matmul.mlir: matmul → MaterializeHostEncodingPass → pack + mmt4d + unpack → CPULowerToUKernelsPass → mmt4d as an iree_uk_mmt4d ukernel → LowerUKernelOpsToCallsPass → ukernel call → ConvertToLLVMPass; precompiled ukernel bitcode: ukernel_bitcode_*.bc; Static l