[10-Paper Notes][03] MS MARCO Dataset Notes
Author: Internet
Paper: https://arxiv.org/pdf/1611.09268.pdf (NIPS 2016)
Related background:
- 2016 | Breaking | Microsoft releases the MS MARCO dataset, aiming to build the "ImageNet" of reading comprehension
- 100K-question dataset
- NLG
- passage ranking
- keyphrase extraction
- conversational search
Task 1: Document Retrieval (2020/11/08 to present)
Based on the questions in the Question Answering Dataset (the original MRC dataset) and the documents that answered those questions, a document ranking task was formulated. There are 3.2 million documents, and the goal is to rank them by their relevance to each query. In other words, the MRC task is extended into a query-to-web-page ranking task over 3.2 million web documents.
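To make the shape of the task concrete, here is a minimal sketch that ranks a few documents for one query using a naive term-overlap score. The scoring function, document IDs, and texts are all hypothetical and chosen only for illustration; this is not the benchmark's baseline, and real systems for this task use much stronger rankers.

```python
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption: lowercase, no punctuation handling).
    return text.lower().split()


def overlap_score(query, document):
    # Toy relevance score: total occurrences of distinct query terms in the document.
    doc_counts = Counter(tokenize(document))
    return sum(doc_counts[term] for term in set(tokenize(query)))


def rank_documents(query, docs):
    # Sort document IDs by descending score; ties are broken by document ID.
    scored = [(overlap_score(query, text), doc_id) for doc_id, text in docs.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Hypothetical miniature collection standing in for the 3.2 million documents.
docs = {
    "D1": "the capital of france is paris",
    "D2": "berlin is the capital of germany",
    "D3": "bananas are rich in potassium",
}
ranking = rank_documents("capital of france", docs)
print(ranking)
```

The output of a ranker in this setting is an ordered list of document IDs per query, which is then compared against the relevance labels.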
Relevance labels are derived from which passages were marked as containing the answer in the QnA dataset, making this one of the largest relevance datasets ever. See the MS MARCO website for details.
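The label derivation can be sketched roughly as follows: a document is treated as relevant to a query if one of its passages was marked as containing the answer. The record layout used here (`query_id`, `passages`, `is_selected`, `url`) and the example URLs are assumptions made for illustration; check the MS MARCO website for the actual file format and the official label files.

```python
def derive_doc_labels(qna_records):
    # Map each query ID to the set of document URLs that have at least one
    # passage marked as containing the answer (is_selected == 1).
    qrels = {}
    for record in qna_records:
        relevant = {
            passage["url"]
            for passage in record["passages"]
            if passage["is_selected"] == 1
        }
        if relevant:  # Queries with no selected passage get no label.
            qrels[record["query_id"]] = relevant
    return qrels


# Hypothetical QnA-style records (field names are assumed, not confirmed).
records = [
    {
        "query_id": 1,
        "passages": [
            {"url": "https://example.com/a", "is_selected": 1},
            {"url": "https://example.com/b", "is_selected": 0},
        ],
    },
    {
        "query_id": 2,
        "passages": [
            {"url": "https://example.com/c", "is_selected": 0},
        ],
    },
]
qrels = derive_doc_labels(records)
print(qrels)
```

Under this scheme the labels are positive-only: a document without a selected passage is unlabeled rather than confirmed irrelevant.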
This dataset is the focus of the 2019 and 2020 TREC Deep Learning Track and has been used as a teaching aid for the ACM SIGIR/SIGKDD AFIRM Summer School on Machine Learning for Data Mining and Search.
Task 2:
Source: https://www.cnblogs.com/NSEW/p/16328009.html