
Online Job Advertisement Classification Using Encoder-like Large Language Models.pdf

Uploaded by: Fl****zo | ID: 718584 | 2025-06-22 | 17 pages | 553.38 KB

ONLINE JOB ADVERTISEMENTS CLASSIFICATION USING ENCODER-LIKE LARGE LANGUAGE MODEL
MIKOAJ TYM, JAKUB EREBECKI
WEB INTELLIGENCE CHALLENGE
EUROPEAN STATISTICS AWARDS

Web intelligence classification challenge
The challenge was announced by the European Statistics Awards.
Each team could make 10 submissions, each containing predicted occupations for online job advertisements.
Predicted classes were evaluated with the Lowest Common Ancestor (LCA) metric.
Competitors had to provide fully documented scripts in R or Python.
Approaches were evaluated not only for accuracy but also for reusability, so they should be scalable and open.

The International Standard Classification of Occupations
A four-level classification of occupation groups managed by the International Labour Organisation.
There are 436 occupation classes.
Some classes are strongly semantically related, yet they sit in different branches of the ISCO tree:
    Accountants: Professionals
    Accounting and bookkeeping clerks: Clerical support workers
The LCA metric heavily penalizes such mistakes.

Dataset
The competition dataset contains 26,000 multilingual online job advertisements.
They were retrieved from around 400 websites active in the European Union.
These advertisements were scraped from the web, so they contain much irrelevant content:
    GDPR clause
    HTML tags
    Job benefits
    Company policies

DUTY CLASSIFIER
DATA PREPROCESSING STEP TO CLEAN JOB ADVERTISEMENTS

Job offer example
An advertisement contains many sections.
Not all of them are relevant for classification.
The employer overview is misleading.
The key part of a job offer is the requirements, but the text describing them is often shorter than the other details.
1. Employer overview
2. Requirements
3. Benefits
4. Equal employment opportunity statement

Filtering non-meaningful information
We have trained a m

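Summary: This document presents a study on classifying online job advertisements with an encoder-like large language model. The main points are:
1. Challenge: announced by the European Statistics Awards. Teams submit occupation classifications for job advertisements, and the predicted classes are evaluated with the Lowest Common Ancestor (LCA) metric.
2. International Standard Classification of Occupations (ISCO): managed by the International Labour Organisation, with 436 occupation classes.
3. Dataset: 26,000 online job advertisements from around 400 websites active in the European Union. Preprocessing is needed to remove irrelevant information.
4. Data preprocessing: a model is trained to identify sentences that contain job requirements, so that irrelevant information can be filtered out precisely.
5. Training set: labelled training samples are created from the ESCO dataset to address the missing labels and ambiguous information in the original dataset.
6. Model tuning: a lightweight large language model is fine-tuned to classify the 436 occupation classes. The best LCA score is 0.58, the human baseline is 0.58, the model score is 0.52, and the Top-5 accuracy is 0.8.
Key figures: 436 occupation classes, 26,000 job advertisements, LCA score 0.58, accuracy 0.8.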
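To illustrate why confusing accountants with accounting clerks is costly under an LCA-based metric, here is a minimal sketch in Python. It assumes the standard ISCO-08 coding, in which each of the four digits of a unit-group code adds one tree level, so the ancestors of a code are its digit prefixes. The codes 2411 (Accountants), 4311 (Accounting and bookkeeping clerks) and 2412 (Financial and investment advisers) are taken from ISCO-08 and do not appear in the slides. The function names are hypothetical, and the challenge's exact scoring formula is not given here, so the sketch only finds the lowest common ancestor and its depth.

```python
def isco_path(code: str) -> list[str]:
    """Path from the root ('') down to a 4-digit ISCO code, one prefix per level."""
    return [code[:i] for i in range(len(code) + 1)]


def lowest_common_ancestor(a: str, b: str) -> str:
    """Longest shared digit prefix of two ISCO codes; '' denotes the root."""
    shared = ""
    for digit_a, digit_b in zip(a, b):
        if digit_a != digit_b:
            break
        shared += digit_a
    return shared


def lca_depth(a: str, b: str) -> int:
    """Depth of the lowest common ancestor: 0 = root, 4 = identical unit groups."""
    return len(lowest_common_ancestor(a, b))


# Assumed ISCO-08 codes (not stated in the slides).
ACCOUNTANTS = "2411"                  # major group 2: Professionals
ACCOUNTING_CLERKS = "4311"            # major group 4: Clerical support workers
FINANCIAL_ADVISERS = "2412"           # same minor group (241) as Accountants

# Semantically close classes, but they only meet at the root of the tree.
far_apart = lca_depth(ACCOUNTANTS, ACCOUNTING_CLERKS)

# A near miss inside the same minor group shares three of four levels.
near_miss = lca_depth(ACCOUNTANTS, FINANCIAL_ADVISERS)
```

Under this assumption, predicting 4311 for a true 2411 shares no level at all with the correct answer, while predicting 2412 shares three of four levels, which is why the first mistake is penalized far more heavily.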