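ES Vector Search and AI Agent Development
Liu Xiaoguo | Elastic

Liu Xiaoguo, Chief Evangelist, Elastic China Community
Currently Chief Evangelist for the Elastic community. He holds a master's degree from the National University of Singapore, and bachelor's and master's degrees from Northwestern Polytechnical University. He has worked at Singapore Technologies, Compaq, General Motors, Ericsson, Nokia, the non-profit Linaro (Linux for ARM), Ubuntu, LinkMotion Future, Vantiq and other companies, in fields including computer design, real-time systems, mobile phones, automotive electronics, computer operating systems, telecommunications, cloud real-time event processing and big data. Counting Ericsson, Nokia, Ubuntu and now Elastic, he has nearly 20 years of experience in community work. He enjoys sharing what he has learned and firmly believes that helping others is helping yourself. He hopes to learn, improve and share with everyone. You are welcome to read the official Elastic Chinese blog.

Contents
I. Search Requirements in the AI Era
II. ES Vector Search and Recent Progress
III. Agent Development

PART 01 Search Requirements in the AI Era

Search Requirements in the AI Era
User needs in the past: full-text search, structured search, aggregations and statistics, complex combined search, ranking tuning, tokenization
User needs today: vector search, hybrid of vector and classic search, semantic search, model-based reranking, RAG

Search paradigms
Keyword: Ctrl+F
Vector: Similar items
Hybrid: Combines Keyword + Semantic
GenAI (Retrieval Augmented Generation): Generates based on retrieved documents

Demo for image search
Find images through a text search: "snow-covered mountain peaks"
Find similar images through image comparison
Elasticsearch: How to implement image similarity search in Elastic
Elasticsearch: Chatbot tutorial (Part 1)

PART 02 ES Vector Search and Recent Progress

There are two kinds of vector models
SPARSE Vector: Token Weighted Pairs
- A vocabulary of hundreds of thousands to millions of tokens
- Token weighted pairs (Token:Weight)
- Per document, only the N highest-weighted tokens are stored (the rest are 0)
- Semantic search through DotProduct
- Lower memory requirements than dense vector search
- Sparse models can achieve "late interaction"
- Trained on a dataset to achieve high "in-domain" performance
DENSE Vector: a long array of numbers, one per dimension
- Low-dimensional (312, 512, 1536, ...)
- Captures semantic meaning
- Useful for similarity and clustering
- Multimodal support: Text, Image, Audio
- Large datasets use a lot of memory
- Poor interpretability

Elastic has two kinds of vector models
SPARSE Vector Output: ELSER
DENSE Vector Output: e5-small
Text Input: An image of the Death Star from Star Wars, a large, spherical space station with a gray, metallic surface. It has a distinct circular superlaser dish on its surface, making it instantly recognizable as a powerful weapon from the Galactic Empire
0.0039048328, 0.00070659374, -0.006999771, 0.05141522, -0.005965463, -0.045049522, 0.00019682708, 0.062380187, 0.0110530555, -0.014826156, 0.026582375, -0.0076479157, 0.07870574, -0.020013824, -0.015210986, -0.03071503, 0.021925598, 0.014036275, -0.020098573, 0.0013701725, -0.02552182, -0.045320384, 0.023408612, 0.029272491, 0.027291939, 0.027002065, 0.009618439, 0.025841322, -0.03824202, -0.031804346, -0.005024673, 0.019800879, 0.014722629, 0.016817614, 0.0025832115, 0.020656556, -0.01515848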