
State of play of the Data Acquisition Service (DAS) of the Web Intelligence Hub (WIH).pdf

Uploaded by: Fl****zo  ID: 718578  2025-06-22  11 pages  906.63 KB

2025/02/04
State of play of the Data Acquisition Service (DAS) of the Web Intelligence Hub (WIH)
Mészáros Mátyás
Web Intelligence Network Conference - From Web to Data, Gdansk, 4-5 February 2025

WIH platform components
- Online Job Advertisements Data Production System (OJA-DPS)
- Datalab
- WIH Data Acquisition Service (DAS)

DAS development principles
- Scalable
- Build on open-source tools
- Try to use the state of the art
- Can handle static and dynamic content
- No coding, only configuration by the user
- Universal, can be used for several use cases (OJA, MNE, price, etc.)
- Separation of use cases with possible collaboration in the same use case

The beginning of the DAS
Version 1 was released in 2021 Nov
- Generic data acquisition service with API access for static and dynamic web pages
- Using StormCrawler and Selenium
- EUi frontend (Dashboard) with user authentication (EU Login and AWS Cognito)
- Deployment using Infrastructure as Code (IaC)
Version 2 was released in 2022 Apr
- Adding the playground for data acquisition to test filters and dynamic web pages
Version 3 was released in 2022 Sep
- Additional Selenium filters based on the needs of tourism websites

- Dashboard: EUi frontend to manage the APIs
- DAS: SpringBoot API with StormCrawler and Selenium in the background
- Playground: SpringBoot with Selenium in the background
- First testing by the WIN

Based on the feedback
Version 4 was released in 2022 Dec
- Multitenancy was introduced
- Moving authentication from AWS Cognito to Keycloak
Version 5 was released in 2023 Feb
- 1st security testing and update
- Adding new functionalities like advanced search, copy configuration and acquisition action history
Version 6 was released in 2023 Mar
- Introduction of user roles (guest, developer and admin)
- Possibility to use sitemap discove

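This document describes the state of development of the Data Acquisition Service (DAS) of the Web Intelligence Hub (WIH). Core facts:
1. DAS follows development principles such as scalability, building on open-source tools, using state-of-the-art technology, and handling both static and dynamic content.
2. Since its first release in November 2021, DAS has gone through multiple versions; the latest is Version 9, released in January 2025.
Key points:
- DAS provides a generic data acquisition service with API access for static and dynamic web pages.
- It uses StormCrawler and Selenium and has a frontend management interface with user authentication.
- Security testing and updates are ongoing, and features such as multitenancy, user roles, URL queues and incremental crawling have been introduced.
- The fourth round of security testing is currently under way, and there are plans to migrate the OJA-DPS crawlers to DAS and update related components.
In summary, DAS is a generic, scalable data acquisition service that continues to be optimized and upgraded to meet the needs of different use cases.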