GPU-Accelerated Data Processing in Recommendation Systems (PDF, 45 pages)
NVIDIA
GPU-ACCELERATED DATA PROCESSING FOR RECOMMENDATION SYSTEM
GTC-China
Yingcan Wei (魏英灿), 2020/12
#page#
SPEAKER INTRODUCTION
Yingcan Wei (魏英灿), GPU Computing Expert
Experience: Graduated from the University of Hong Kong. Research areas include deep-learning domain adaptation, generative adversarial networks, and the design and optimization of recommendation algorithms. Before joining NVIDIA, worked at European and American foreign-invested companies and at internet companies, with many years of experience in image processing, data mining, and recommendation system design and development. Currently mainly responsible for the algorithm design and architecture of HugeCTR.
Work history: Joined NVIDIA Semiconductor (Shanghai) Co., Ltd. in 2020.
#page#
AGENDA
INTRODUCTION
DASK-RAPIDS
NVTABULAR
SPARK 3.0
BENCHMARK
CONCLUSION
#page#
INTRODUCTION
#page#
FLOW OF RECOMMENDATION SYSTEM
[Diagram labels: Data Source 1, Data Source 2, Data Source 3, Structured Data, Feature Engineering, Feature, Model Training, Model, Inference, result]
#page#
REAL-TIME ETL ARCHITECTURE
[Diagram labels: Sensors, DR Site, Mobile Devices, App Logs, Kafka, Mirror Maker, Spark Streaming, Map Reduce, Hadoop, HBase, elastic, MySQL, mongoDB, "Interactive exploration by Data Scientists", "Real-time intelligence at the NOC"]
#page#
FEATURE ENGINEERING
Normalization: min-max, Gaussian, Quantile
Missing Fill: Mean, Median, Mode, Interpolation
Sampling: EasyEnsemble, BalanceCascade, NearMiss
Nonlinear Transformation: Logarithm, Polynomial, Logistic transformation
Outlier & Truncation: Anomaly detection, 3σ criterion
#page#
SUMMARY OF ETL AND FEATURE ENGINEERING
ETL - Convert raw data to structured data
  Extract: Aggregate raw data from different sources
  Transform: Data cleaning and validation; generate structured data
  Load: Load structured data into a data warehouse or lake
Feature Engineering - Extract features from structured data
  Feature Extraction: Extract features from the feature space constructed by structured data
  Feature Selection: Filter/Wrapper based on various operators
  Feature Evaluation: Iteration based on model performance
#page#
HIERARCHY - RAPIDS/SPARK3.0/NVTABULAR/DASK
[Diagram labels: GPU (x7), RAPIDS (x4), DASK, Spark 3.0, NVTabular]
#page#
DASK-RAPIDS
#page#
RAPIDS
GPU-accelerated data science ecosystem
A suite of open source software libraries and PyData-like APIs to execute end-to-end data science and analytics pipelines entirely on GPUs without paying typical serialization costs. RAPIDS als