A Multi-Feature Malicious Web Page Detection Method Using a Web Crawler

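Abstract: With the rapid development and spread of communication networks and terminals, people today depend more and more on the Internet. The Internet brings great convenience, but it also brings security risks. As network technology advances, malicious websites disguise themselves more effectively and become harder to spot. Some attackers exploit vulnerabilities to turn websites into malicious ones. When users visit such a site, their computers may be compromised without their noticing, which can lead to a system crash.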

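This thesis uses a web crawler to fetch web pages. Regular expressions then match the benign and malicious scripts (JavaScript code snippets) that we need, and these snippets are extracted and saved to files. Known malicious and benign code is compared to identify distinguishing features. The extracted features are saved to files and serve as input for training classifiers. Classifiers are trained on the different categories of malicious script code to obtain the corresponding classifier models. Finally, the classifiers are tested to verify the usability and accuracy of the models. The malicious web page detection function is implemented in software.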

Keywords: malicious script, JavaScript, web crawler, classifier, feature extraction
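The crawling and script-extraction steps summarized in the abstract could be sketched as follows. This is a minimal, hypothetical illustration that uses only the Python standard library. It is not the thesis's own program. The following are all assumptions:

- the function names (`crawl`, `fetch`, `extract_links`, `extract_scripts`, `save_snippets`);
- both regular expressions;
- the breadth-first crawl order;
- the one-file-per-snippet layout.

The regexes only approximate HTML parsing. They capture the bodies of inline `<script>` elements, skip scripts loaded through `src`, and miss code placed in event-handler attributes such as `onload`.

```python
import http.client
import re
import urllib.request
from collections import deque
from pathlib import Path
from urllib.parse import urldefrag, urljoin

# Hypothetical patterns; regexes only approximate a real HTML parser.
HREF_RE = re.compile(r"""<a\b[^>]*\bhref\s*=\s*["']([^"'#]+)""", re.IGNORECASE)
SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script\s*>", re.IGNORECASE | re.DOTALL)


def extract_links(base_url: str, html: str) -> list[str]:
    """Resolve <a href> targets against base_url and keep only http(s) URLs."""
    links = []
    for href in HREF_RE.findall(html):
        url = urldefrag(urljoin(base_url, href.strip()))[0]
        if url.startswith(("http://", "https://")):
            links.append(url)
    return links


def extract_scripts(html: str) -> list[str]:
    """Return the non-empty inline bodies of all <script> elements (src-only tags yield nothing)."""
    return [body.strip() for body in SCRIPT_RE.findall(html) if body.strip()]


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download one page and decode it with the charset the server declares (default UTF-8)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def crawl(seeds: list[str], max_pages: int = 50) -> dict[str, list[str]]:
    """Breadth-first crawl from the seed URLs; map each fetched URL to its inline scripts."""
    queue, seen, found = deque(seeds), set(seeds), {}
    while queue and len(found) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError, LookupError, http.client.HTTPException):
            continue  # unreachable host, HTTP error, malformed URL or unknown charset
        found[url] = extract_scripts(html)
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return found


def save_snippets(snippets: list[str], out_dir: Path, prefix: str) -> list[Path]:
    """Write each snippet to its own .js file, e.g. out_dir/benign_0000.js, and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, code in enumerate(snippets):
        path = out_dir / f"{prefix}_{i:04d}.js"
        path.write_text(code, encoding="utf-8")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Offline demo on an inline page; crawl(["https://example.com/"]) would fetch real pages.
    page = """<html><head><script src="lib.js"></script>
    <SCRIPT type="text/javascript">eval(unescape("%61%6c%65%72%74%28%31%29"));</SCRIPT>
    </head><body><a href="/news.html">news</a><script>var x = 1;</script></body></html>"""
    print(extract_links("https://example.com/", page))
    print(extract_scripts(page))
```

`crawl` returns a mapping from each fetched URL to its inline scripts. Once the scripts have been labelled as benign or malicious, `save_snippets` can write each group to its own directory. A crawler used on real sites would also need to honour `robots.txt` and rate-limit its requests.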

Graduation Project Thesis: English Abstract

Title: A Multi-Feature Malicious Web Page Detection Method

Abstract: With the rapid development and popularization of communication networks and terminals, people depend more and more on the network. While the Internet brings great convenience, it also brings security risks. As network technology advances, malicious websites disguise themselves more effectively and become harder to detect. Some attackers exploit network vulnerabilities to turn websites into malicious ones. When people visit such a site, their computers may be compromised without their noticing, which can crash the system.

In this thesis, a web crawler fetches web pages, and regular expressions match the benign and malicious scripts we need. These JavaScript snippets are extracted and stored in files. Known malicious and benign code is then compared to summarize and extract features, which are saved to files as classifier input. Classifiers are trained on the different kinds of malicious script code to obtain the corresponding models. Classification tests then verify the usability and accuracy of these models. A malicious web page detection program is implemented.

Keywords: malicious script, JavaScript, web crawler, classification, feature extraction
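The remaining steps could look roughly like the sketch below: turning each script into a feature vector, training a classifier, and checking its accuracy. It is an illustration built on assumptions, not the thesis's method.

The four features are hypothetical choices:

- the number of calls such as `eval`, `unescape`, `atob`, `String.fromCharCode` and `document.write`;
- the share of characters inside `%XX`, `\xXX` or `\uXXXX` escapes;
- character entropy;
- log length.

The from-scratch logistic regression stands in for whichever classifiers the thesis actually trains. The `BENIGN` and `MALICIOUS` snippets are invented toy data.

```python
import math
import re
from collections import Counter

# Hypothetical feature definitions, for illustration only.
CALL_RE = re.compile(r"\b(?:eval|unescape|escape|atob|fromCharCode)\s*\(|document\.write\s*\(")
ESCAPE_RE = re.compile(r"%[0-9a-fA-F]{2}|\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}")


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in Counter(text).values())


def extract_features(code: str) -> list[float]:
    """Map one JavaScript snippet to a fixed-length numeric feature vector."""
    escaped = sum(len(m) for m in ESCAPE_RE.findall(code))
    return [
        float(len(CALL_RE.findall(code))),  # eval/unescape/atob/fromCharCode/document.write calls
        escaped / max(len(code), 1),        # share of characters inside %XX, \xXX, \uXXXX escapes
        shannon_entropy(code),              # character-level entropy
        math.log1p(len(code)),              # snippet length on a log scale
    ]


def sigmoid(s: float) -> float:
    """Numerically stable logistic function: math.exp only sees non-positive arguments."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


class LogisticClassifier:
    """Logistic regression fitted by full-batch gradient descent on standardized features."""

    def __init__(self, lr: float = 0.5, epochs: int = 500) -> None:
        self.lr, self.epochs = lr, epochs

    def _standardize(self, x: list[float]) -> list[float]:
        return [(v - m) / s for v, m, s in zip(x, self.mean, self.std)]

    def _logit(self, z: list[float]) -> float:
        return sum(w * zj for w, zj in zip(self.w, z)) + self.b

    def fit(self, X: list[list[float]], y: list[int]) -> "LogisticClassifier":
        n, d = len(X), len(X[0])
        self.mean = [sum(r[j] for r in X) / n for j in range(d)]
        # Population standard deviation; a constant feature falls back to 1.0.
        self.std = [math.sqrt(sum((r[j] - self.mean[j]) ** 2 for r in X) / n) or 1.0
                    for j in range(d)]
        Z = [self._standardize(r) for r in X]
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for z, t in zip(Z, y):
                err = sigmoid(self._logit(z)) - t  # derivative of cross-entropy w.r.t. the logit
                grad_w = [g + err * zj for g, zj in zip(grad_w, z)]
                grad_b += err
            self.w = [w - self.lr * g / n for w, g in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict_proba(self, x: list[float]) -> float:
        return sigmoid(self._logit(self._standardize(x)))

    def predict(self, x: list[float]) -> int:
        return int(self.predict_proba(x) >= 0.5)


def accuracy(model: LogisticClassifier, X: list[list[float]], y: list[int]) -> float:
    return sum(model.predict(x) == t for x, t in zip(X, y)) / len(y)


# Made-up toy snippets for demonstration (0 = benign, 1 = malicious); not real samples.
BENIGN = [
    "var total = 0; for (var i = 0; i < items.length; i++) { total += items[i].price; }",
    "function greet(name) { return 'Hello, ' + name; }",
    "document.getElementById('menu').style.display = 'none';",
    "window.onload = function () { initSlider(); };",
]
MALICIOUS = [
    'eval(unescape("%64%6f%63%75%6d%65%6e%74%2e%77%72%69%74%65"));',
    'document.write(unescape("%3c%69%66%72%61%6d%65%20%73%72%63%3d%68%74%74%70"));',
    'var s = String.fromCharCode(104,116,116,112); eval(atob("ZG9jdW1lbnQ="));',
    r'eval("\x61\x6c\x65\x72\x74\x28\x31\x29");',
]

if __name__ == "__main__":
    X = [extract_features(code) for code in BENIGN + MALICIOUS]
    y = [0] * len(BENIGN) + [1] * len(MALICIOUS)
    model = LogisticClassifier().fit(X, y)
    print("training accuracy:", accuracy(model, X, y))
    print("P(malicious):", model.predict_proba(extract_features('eval(unescape("%61"));')))
```

Standardizing the features keeps gradient descent well conditioned, because counts, ratios and lengths live on very different scales. Accuracy on the training snippets only shows that the pipeline runs. Estimating real accuracy, as in the abstract's final testing step, requires a separate set of labelled test scripts.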

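1 Introduction

1.1 Background and Significance

1.2 Malicious Web Page Detection

1.3 Organization of This Thesis

2 Malicious Scripts in Web Pages

2.1 Malicious Web Page Scripts

2.2 Scripting Languages

3 Sample Feature Selection and Extraction

3.1 Data Acquisition

3.2 Sample Feature Selection and Extraction

4 Classifier Training and Testing

4.1 Overview of Classifiers

4.2 Classifier Training

4.3 Classifier Testing and Results

Conclusion

References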

1 Introduction

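With the rapid development and spread of communication networks and terminals, people depend more and more on the Internet. At the same time, malicious websites of every kind keep appearing. According to data from Google's search center, more than 10% of web pages are malicious. In China in particular, malicious pages make up as much as 43.21% of all web pages, so the network security situation is increasingly severe [1]. Malicious web page detection is therefore a topic of real practical value. As network technology advances, malicious websites disguise themselves more effectively and hide better, so detection methods must keep pace with the rapid development of the Internet. This project analyzes the characteristics of malicious websites comprehensively, at several levels, in order to build a detection method suited to today's network environment.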