Research and construction of web crawler based forest management knowledge collection system

LIU Jiancheng; WU Baoguo; CHEN Dong

doi:10.11833/j.issn.2095-0756.2017.04.022

Volume 34 Issue 4

Jul. 2017

LIU Jiancheng, WU Baoguo, CHEN Dong. Research and construction of web crawler based forest management knowledge collection system[J]. Journal of Zhejiang A&F University, 2017, 34(4): 743-750. doi: 10.11833/j.issn.2095-0756.2017.04.022

School of Information Science and Technology, Beijing Forestry University, Beijing 100083, China

Received Date: 2016-07-14
Revised Date: 2016-11-04
Publish Date: 2017-08-20

Abstract

A knowledge collection system for forest management was built to retrieve forest management information from the Internet accurately. Based on an analysis of the collection task, the system workflow, functional modules, and database were designed; the web crawler rules were improved and constrained; and the workflow and algorithm of the topic crawler were studied. The system summarizes the characteristics of forest management webpages and uses a forest management feature vector to identify relevant content among the collected data. It also denoises forest management webpages, extracts information through intelligent matching, and eliminates duplicate knowledge through fingerprint recognition based on Euclidean distance. Experimental results indicated that the system achieves high topic relevance, high accuracy, and a low repetition rate, so it can meet the needs of a forest management decision support system.
Keywords: forest management; forest management knowledge; knowledge base; knowledge collection; web crawler

References

[1]	WU Baoguo, LI Chengzan, MA Chi, et al. An expert decision support system for silviculture [J]. J Beijing For Univ, 2009, 31(supp 2): 1-8.
[2]	ZHANG Jianhui. Application of professional intelligent search system in veterinary medicine [J]. J Northeast Agric Univ, 2009, 40(9): 141-144.
[3]	SHEN Jin. Study and implementation of forest vertical search engine based on Lucene and Nutch [J]. Agric Network Inf, 2008(4): 16-18.
[4]	YUAN Jinsheng, GUO Yanfen. Algorithm research and design of forestry focused web crawler [J]. Comput Eng Des, 2011, 32(6): 2003-2006.
[5]	ZHANG Lisha, ZHANG Gui, LONG Chaoxi, et al. Search and integration of thematic dynamic information on forestry [J]. J Cent South Univ For Technol, 2013, 33(5): 47-51.
[6]	LI Jia, XU Qian, WANG Zi, et al. Forest products trading Web messages extraction algorithm based on semantic [J]. Comput Eng Appl, 2014, 50(19): 199-204.
[7]	DENG Houping, WU Gang. Discovery of topic-specific information source based on web crawler and website classification [J]. Comput Eng Appl, 2016, 52(3): 59-65.
[8]	LIU Jinhong, LU Yuliang. Survey on topic-focused Web crawler [J]. Appl Res Comput, 2007, 24(10): 26-29.
[9]	WANG Juan, WU Jinpeng. The design and implementation of Web crawler [J]. Software Guide, 2012, 11(4): 136-137.
[10]	GONG Bingjiang, HUANG Yanxin, JIA Haixin. Studying and designing topic crawler for mining equipments field [J]. Comput Appl Software, 2014, 31(11): 122-124.
[11]	DING Baoqiong, XIE Yuanping, WU Qiong. Noise elimination method in Web page based on improved DOM tree [J]. J Comput Appl, 2009, 29(supp 1): 175-177.
[12]	JIN Yuefu, FAN Jianying, FENG Yang. Design and realization of distributed Web crawler [J]. J Harbin Univ Sci Technol, 2010, 15(1): 116-119.
[13]	QIN Jie, YAN Fuliang, ZHU Haifeng, et al. A webpage classification algorithm based on link information [J]. Microelectron Comput, 2012, 29(6): 108-112.

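The problem-solving capability of a decision support system^[1] is determined by the richness of the knowledge in its knowledge base, and increasing that richness is difficult. One solution is to collect information with a web crawler, identify the forest management knowledge it contains, and then evaluate, extract, and deduplicate that knowledge. Conventional search engines have powerful web crawlers and broad coverage, but their classification is not specialized, their search results are often unsatisfactory^[2], and they do not interpret forestry vocabulary accurately. Take the common forestry term "小班" (subcompartment) as an example: the great majority of the results returned by Baidu concern the junior class in kindergartens, which is written with the same characters, so they do not meet the retrieval needs of forestry users. Research on information collection in forestry has mostly concentrated on forestry topic search engines, focusing on algorithm optimization problems such as search engine design, topic crawler algorithms, and methods for discovering information sources^[3-7], while the identification and extraction of forest management knowledge has received little attention. By analyzing the main forest management websites, the authors designed the basic workflow, functional modules, and database of a forest management knowledge collection system, improved the web crawler rules, and studied a forest management topic crawler algorithm, denoising of forest management webpages, intelligent matching of forest management knowledge, and deduplication of forest management knowledge.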
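The mismatch described above, where a general-purpose engine returns kindergarten pages for a forestry term, is the kind of problem topic filtering is meant to address. The following is a minimal sketch only, not the authors' algorithm: the term list, the weights, the cosine-similarity scoring, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import math

# Hypothetical topic feature vector: term -> weight (illustrative values only).
TOPIC_FEATURES = {
    "forest management": 3.0,
    "subcompartment": 2.0,
    "thinning": 1.5,
    "silviculture": 1.5,
}


def term_vector(text, features):
    """Count how often each feature term occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return [lowered.count(term) for term in features]


def topic_relevance(text, features=TOPIC_FEATURES):
    """Cosine similarity between the page's term counts and the topic weights."""
    counts = term_vector(text, features)
    weights = list(features.values())
    dot = sum(c * w for c, w in zip(counts, weights))
    norm_counts = math.sqrt(sum(c * c for c in counts))
    norm_weights = math.sqrt(sum(w * w for w in weights))
    if norm_counts == 0 or norm_weights == 0:
        return 0.0  # no topic term present: treat the page as off-topic
    return dot / (norm_counts * norm_weights)


def is_on_topic(text, threshold=0.5):
    """Keep a page only if its relevance reaches the (assumed) threshold."""
    return topic_relevance(text) >= threshold
```

Substring counting is used so that the same sketch would also work on unsegmented Chinese text; a production system would more likely rely on a domain lexicon and a word segmenter.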

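5. Conclusions and discussion

The richness of its knowledge determines the problem-solving capability of a decision support system. The forest management knowledge collection system developed in this study solves the problem of obtaining forest management knowledge from the Internet and increases the knowledge richness of the forest management decision support system.

Based on an analysis of the forest management knowledge collection problem, this study built a forestry-specific lexicon, improved the web crawler rules, and designed and implemented the collection system using a forest management topic crawler algorithm, webpage denoising, intelligent knowledge matching, and knowledge deduplication. The study analyzed the characteristics of forest management websites, built a forest management feature vector to filter the collected content, and used Euclidean distance for fingerprint recognition of forest management knowledge, obtaining knowledge with high relevance, high accuracy, and low duplication.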
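Deduplication by Euclidean distance between fingerprints can be sketched as follows. The excerpt does not say how the fingerprint is constructed, so the hashed character-trigram vector, the 64 dimensions, and the 0.05 threshold below are assumptions made for illustration, not the system's actual design.

```python
import hashlib
import math


def fingerprint(text, dims=64, n=3):
    """Hash character n-grams into a fixed-length, length-normalized count vector."""
    cleaned = "".join(text.lower().split())  # ignore case and whitespace
    grams = [cleaned[i:i + n] for i in range(max(len(cleaned) - n + 1, 0))]
    vec = [0.0] * dims
    for gram in grams:
        digest = hashlib.sha256(gram.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dims
        vec[bucket] += 1.0
    total = len(grams)
    return [v / total for v in vec] if total else vec


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def deduplicate(texts, threshold=0.05):
    """Keep a text only if its fingerprint is far enough from every kept one."""
    kept, prints = [], []
    for text in texts:
        fp = fingerprint(text)
        if all(euclidean(fp, seen) >= threshold for seen in prints):
            kept.append(text)
            prints.append(fp)
    return kept
```

Comparing each new item against every stored fingerprint is quadratic in the number of items; this is acceptable for a sketch, but a larger collection would need an index or a blocking step.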

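The system has been applied in the National High Technology Research and Development Program of China project "Research on key technologies for digital forest and pasture management" (数字化森林与牧场经营管理关键技术研究), where it provides a long-term knowledge collection service for the forest management decision support system.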

Method	Links crawled	Links saved	On-topic links	On-topic links as a percentage of crawled links/%
Improved crawler	12 785	6 377	6 377	49.87
Ordinary crawler	24 543	21 312	4 523	18.43
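The percentage column can be rechecked from the counts in the table. The short sketch below does that and also derives the on-topic share of saved links, which is an additional quantity computed here and not reported in the source; the function and variable names are hypothetical.

```python
def share_percent(part, whole):
    """Return part/whole as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# Counts copied from the table: (links crawled, links saved, on-topic links).
RESULTS = {
    "improved crawler": (12785, 6377, 6377),
    "ordinary crawler": (24543, 21312, 4523),
}

for name, (crawled, saved, on_topic) in RESULTS.items():
    of_crawled = share_percent(on_topic, crawled)
    of_saved = share_percent(on_topic, saved)
    print(f"{name}: {of_crawled:.2f}% of crawled, {of_saved:.2f}% of saved")
```

Rounded to two decimals, the improved crawler's share of crawled links comes out as 49.88, which differs from the printed 49.87 only in the last digit and is consistent with truncation rather than rounding. Every link the improved crawler saved was on topic, against roughly one in five for the ordinary crawler.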
