CHEN Shuqiao, QIU Dong*, JIANG Haihuan. Query Expansion Based on Word Embedding in Fuzzy Document Retrieval[J]. Journal of Sichuan Normal University (Natural Science), 2019, (01): 92. [doi:10.3969/j.issn.1001-8395.2019.01.014]

Query Expansion Based on Word Embedding in Fuzzy Document Retrieval

Journal of Sichuan Normal University (Natural Science) [ISSN: 1001-8395 / CN: 51-1295/N]

Volume:
Issue:
2019, No. 01
Pages:
92
Section:
Basic Theory
Publication date:
2018-12-15

Article Info

Title:
Query Expansion Based on Word Embedding in Fuzzy Document Retrieval
Article number:
1001-8395(2019)01-0092-06
Authors (Chinese names):
陈淑巧 邱东* 江海欢
School of Science (理学院), Chongqing University of Posts and Telecommunications, Chongqing 400065
Author(s):
CHEN Shuqiao, QIU Dong*, JIANG Haihuan
College of Mathematics and Physics, Chongqing University of Posts and Telecommunications, Chongqing 400065
Keywords:
word embedding; fuzzy query expansion; information retrieval
CLC number:
O159
DOI:
10.3969/j.issn.1001-8395.2019.01.014
Document code:
A
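Abstract (translated from the Chinese version):
In Chinese text, the same meaning is often expressed in several different ways, and different people may also understand the same word somewhat differently. In information retrieval this leads to a "term mismatch" between query terms and the documents being searched. Fuzzy retrieval is one effective way to alleviate this problem, but fuzzy retrieval that relies only on already known information can no longer meet the needs of the Internet era, which is flooded with large-scale unlabeled text. This paper proposes a query expansion method for fuzzy retrieval based on word embeddings: words similar to each query term are computed from word vectors and then used to expand the query. Compared with the traditional fuzzy retrieval method on the same test set, the word-embedding-based fuzzy query expansion method achieves higher recall, precision, and harmonic mean of the two.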
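The abstract says that similar words are computed from word vectors and then used to expand the query, but it does not give the similarity measure, the number of expansion terms, or how added terms are weighted in the fuzzy query. The following is a minimal sketch under stated assumptions: cosine similarity, a hypothetical top-`k` cutoff and similarity `threshold`, and the similarity itself used as the fuzzy membership degree of each added term. None of these choices are confirmed as the paper's. In practice the vectors would come from a trained CBOW model (for example gensim's `Word2Vec` with `sg=0`); here a toy dictionary of hypothetical vectors stands in for them.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_query(query_terms, embeddings, k=2, threshold=0.5):
    """Build a fuzzy query mapping each term to a membership degree in [0, 1].

    Original query terms keep degree 1.0. For every in-vocabulary query term,
    its k nearest vocabulary words by cosine similarity are added if their
    similarity reaches `threshold`; the similarity is used as the membership
    degree (keeping the maximum if a word is reached from two query terms).
    `k`, `threshold` and this weighting rule are illustrative assumptions.
    """
    query_set = set(query_terms)
    fuzzy_query = {term: 1.0 for term in query_terms}
    for term in query_terms:
        if term not in embeddings:
            continue  # out-of-vocabulary term: nothing to expand with
        candidates = [
            (cosine(embeddings[term], vec), word)
            for word, vec in embeddings.items()
            if word not in query_set
        ]
        candidates.sort(reverse=True)  # most similar first
        for sim, word in candidates[:k]:
            if sim >= threshold:
                fuzzy_query[word] = max(fuzzy_query.get(word, 0.0), sim)
    return fuzzy_query


# Hypothetical 3-dimensional vectors standing in for trained CBOW embeddings.
toy_embeddings = {
    "computer": [1.0, 0.0, 0.0],
    "pc": [0.9, 0.1, 0.0],
    "laptop": [0.7, 0.3, 0.0],
    "banana": [0.0, 0.0, 1.0],
}

if __name__ == "__main__":
    print(expand_query(["computer"], toy_embeddings, k=2, threshold=0.5))
```

On the toy vectors, expanding the query `computer` adds `pc` (similarity about 0.994) and `laptop` (about 0.919) and leaves out `banana` (similarity 0). Keeping the original terms at degree 1.0 means expansion can only add candidate matches, never weaken the user's own terms.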
Abstract:
There are many ways to express the same word sense in Chinese, and different individuals may learn and understand the same word differently. This results in term mismatch between queries and documents. Fuzzy document retrieval is one effective way to address this problem; however, it does not achieve satisfactory results on large-scale unlabeled data. To address this, this paper proposes a query expansion approach for fuzzy document retrieval based on word embeddings. Word embeddings trained on a large corpus with the continuous bag-of-words (CBOW) model are used to find words similar to each query term, and the fuzzy query is then expanded with these words. Compared with the traditional fuzzy retrieval method on the same test set, recall, precision, and their harmonic mean all increase.
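The evaluation compares recall, precision, and their harmonic mean. This page does not state the formulas, so the sketch below uses the standard set-based definitions for a single query, with the harmonic mean computed as F1. How the paper aggregates the scores over its test queries is not stated here, and the document IDs are hypothetical.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall and their harmonic mean (F1) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0  # avoid division by zero
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical document IDs, for illustration only.
    print(precision_recall_f1(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"]))
```

For four retrieved documents, two of which are among three relevant ones, this gives precision 0.5, recall 2/3, and F1 = 4/7 (about 0.571). Query expansion typically raises recall by matching more relevant documents; the harmonic mean penalizes any precision lost in the process.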


Memo:
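Received: 2018-06-14; Accepted: 2018-09-07
Funding: National Natural Science Foundation of China (11671001 and 61472056)
*Corresponding author: QIU Dong (born 1977), male, professor; research interests: fuzzy analysis and its applications in natural language processing. E-mail: dongqiumath@163.com
Last update: 2018-12-15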