
Web Content-Based Statistics: The Challenges Ahead.pdf

Uploaded by: Fl****zo | ID: 718604 | 2025-06-22 | PDF | 10 pages | 168.43 KB

Part of the collection: 2025 Web Intelligence Network Conference - From Web to Data, guest speaker presentation slides



31/01/2025
Web Content-Based Statistics: The Challenges Ahead
Fernando REIS
Web Intelligence Network Conference - From Web to Data
Gdansk, 4-5 February 2025

Challenges Overview
- Instability of the Web
- Duplication of objects
- Automatic information extraction
- Fakery and misinformation
- Representativeness

Instability of the Web
- Websites appear, disappear, or change
- Downtime and access restrictions
- Impact on continuity and time series consistency
- It's unavoidable
- We need methods to address this instability
- E.g. chaining: promising, but we need to address breakdowns

Duplication of Objects
- A curse and a blessing
- Duplicates lead to over-estimation of totals
- Redundancy across websites reduces the impact of the instability of the web
- Duplication happens across websites and within websites
- Possible solutions:
  - Restrict the web sources: eliminates the curse, but also the blessing
  - Increase the effectiveness of the deduplication
  - Surveys on web sources owners and statistical units (enterprises, individuals)

Automatic Information Extraction
- Need for automated methods (NLP, AI)
- Human annotation/labelling is very expensive
- Precision of latest AI developments (LLM) puts algorithms on a par with humans
- Trade-off between cost and precision of AI
- Measurement errors introduced by algorithms bias our statistics
- We must be able to measure the precision of the algorithms
- Solution(s): we urgently need gold standards/test datasets to estimate precision using LLMs

Fakery and misinformation
- How fakery differs from noise bias
- Intentional distortions targeting key variables
- Not much work done in official statistics
- Solutions:
  - Source validation and trustworthiness assessment
  - Detection using AI
  - Cross-validation with other data sources
  - Human expert oversight & hybrid approaches

Representativeness
- Coverage and selectivity
- Bias in web-based data [...]
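The slides call for gold standards/test datasets so that the precision of extraction algorithms can be measured. A minimal sketch of what such a measurement might look like, assuming a hand-labelled test set is available: it computes precision for one class and attaches a Wilson score interval, since test sets are usually small. The function name, the example labels, and the choice of the Wilson interval are illustrative assumptions, not something the presentation specifies.

```python
import math


def precision_with_wilson_interval(predicted, gold, positive=True, z=1.96):
    """Precision of `predicted` against hand-labelled `gold`, with a Wilson interval.

    Hypothetical helper for illustration; `z=1.96` gives roughly a 95% interval.
    """
    # Keep the gold labels of the items the algorithm flagged as positive.
    flagged = [g for p, g in zip(predicted, gold) if p == positive]
    n = len(flagged)
    if n == 0:
        raise ValueError("no positive predictions; precision is undefined")

    true_positives = sum(1 for g in flagged if g == positive)
    p_hat = true_positives / n

    # Wilson score interval for a binomial proportion.
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return p_hat, max(0.0, centre - half), min(1.0, centre + half)


# Made-up labels: 10 web pages, algorithm output vs. human annotation.
algorithm_says = [True] * 8 + [False] * 2
annotator_says = [True] * 6 + [False] * 2 + [True, False]
estimate, lower, upper = precision_with_wilson_interval(algorithm_says, annotator_says)
print(f"precision = {estimate:.2f}, interval = [{lower:.2f}, {upper:.2f}]")
```

With only eight flagged items the interval is wide, which is one way to see why the size of a gold standard matters as much as its existence.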


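This presentation discusses the challenges facing statistics based on web content. The key points are:
1. Instability of the web: websites appear, disappear, or change, which affects data continuity and the consistency of time series. Methods to cope with this, such as chaining, need to be developed.
2. Duplication of objects: duplicates lead to over-estimation, but redundancy across websites also reduces the impact of web instability. Solutions include restricting the data sources and improving deduplication.
3. Automatic information extraction: it relies on automated methods (such as natural language processing and artificial intelligence), but the measurement errors that algorithms introduce affect the accuracy of the statistics. Gold standards/test datasets are needed to assess the precision of the algorithms.
4. Fakery and misinformation: intentional distortions of key variables, which differ from noise; official statistics has so far done little work on this. Remedies include source validation, detection using AI, and cross-validation.
5. Representativeness: the coverage and selectivity of web data lead to bias. Estimation methods that correct for selectivity are needed.
The presentation stresses that interdisciplinary cooperation and investment in infrastructure and expertise are essential for future progress.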
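The chaining approach mentioned for web instability can be illustrated with a small sketch. It assumes that "chaining" means chain-linking in the index-number sense: each period-to-period change is computed only on the websites observed in both periods, and the changes are multiplied together. The function name, the data layout, and the decision to raise an error when the overlap is empty (the "breakdown" case) are assumptions made for illustration.

```python
def chained_index(periods, base=100.0):
    """Chain-linked index over a changing set of websites.

    `periods` is a list of dicts mapping website -> count observed in that
    period (for example, number of job ads). Hypothetical layout.
    """
    index = [base]
    for prev, curr in zip(periods, periods[1:]):
        # Only websites present in both periods contribute to the link.
        common = prev.keys() & curr.keys()
        prev_total = sum(prev[site] for site in common)
        if not common or prev_total == 0:
            # No usable overlap: the chain breaks and needs another method.
            raise ValueError("chain breaks: no usable overlap between periods")
        link = sum(curr[site] for site in common) / prev_total
        index.append(index[-1] * link)
    return index


# Made-up counts: site "c" appears in period 2, site "a" disappears in period 3.
observations = [
    {"a": 100, "b": 50},
    {"a": 110, "b": 55, "c": 40},
    {"b": 66, "c": 48},
]
print(chained_index(observations))
```

In this example the index moves from 100 to roughly 110 and then roughly 132, even though the raw totals (150, 205, 114) are dominated by sites entering and leaving. If two consecutive periods share no website, the function raises rather than guessing, which corresponds to the breakdowns the slides say still need to be addressed.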

