Thesis Information

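Chinese title:	Research on the Application of an HDFS-Based Cloud Storage Platform in Enterprise Information Management Systems
Name:	牛茜
Student ID:	201208381
Discipline code:	0835
Discipline:	Software Engineering
Student type:	Master of Engineering
Degree year:	2015
School:	College of Computer Science and Technology
Major:	Software Engineering
Research area:	Enterprise informatization
First supervisor:	牟琦
First supervisor's institution:	Xi'an University of Science and Technology
Second supervisor:	吕俊明
English title:	Based on HDFS application of the enterprise information system in cloud storage platform
Chinese keywords:	small files; secondary index mechanism; HDFS; cloud storage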
English keywords:	cloud storage; HDFS; small file; secondary index mechanism
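Abstract (Chinese):	With the development of modern technology, information of all kinds is growing explosively. Conventional storage architectures and methods can no longer keep up with the explosive growth in data that modern coal enterprises must store, and cloud storage systems built on cloud computing have emerged in response. HDFS (Hadoop Distributed File System), part of the Hadoop framework, is a distributed file storage system. Many large enterprises in China and abroad now use HDFS to store and manage massive amounts of data. HDFS was originally designed to store large files, but as it is applied more widely, some environments contain large numbers of small files that create a storage bottleneck in the distributed file system, and handling such files efficiently has become a pressing problem. This thesis studies the problem of storing small files in HDFS, proposes improvements both to how small files are processed before storage and to how they are retrieved afterwards, and applies these improvements to a cloud storage platform for a coal enterprise. First, a small-file processing unit is added to the original HDFS storage architecture to identify small files and merge them: the index and content of each small file are written to a merged file in append mode, which eliminates the wasted space caused by storing large numbers of small files individually. Second, on top of the improved storage architecture, a secondary index mechanism is proposed. The merge index is stored on the DataNodes together with the merged file, and the NameNode keeps a single metadata record per merged file that holds the names of the small files it contains. A small file is located by resolving the indexes level by level, which saves NameNode memory and improves access efficiency. Finally, an HDFS-based cloud storage platform is developed, and, using a coal enterprise's production and statistics system as an example, the thesis describes in detail how the HDFS cloud storage platform is applied in an enterprise information management system. The system uses Hadoop 0.20.1 as its development environment, and a simulation platform of one NameNode and three DataNodes is used to test the performance of the improved small-file storage system in terms of memory consumption, small-file read time, and small-file write time, with good results.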
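The merging step performed by the small-file processing unit can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the 1 MB "small file" threshold, the merged-file capacity of one 64 MB block (the Hadoop 0.20 default block size), the `merged-NNNN` naming scheme, and the in-memory `MergedFile` and `SmallFileProcessor` classes are all hypothetical. A real system would write through the HDFS client instead of into `bytearray` buffers.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical parameters; the abstract does not give concrete values.
SMALL_FILE_THRESHOLD = 1 * 1024 * 1024   # files under 1 MB are treated as "small"
MERGED_FILE_CAPACITY = 64 * 1024 * 1024  # one HDFS block (64 MB default in Hadoop 0.20)


@dataclass
class MergedFile:
    name: str
    data: bytearray = field(default_factory=bytearray)
    # Merge index: small-file name -> (offset, length) inside `data`.
    index: dict[str, tuple[int, int]] = field(default_factory=dict)

    def append(self, file_name: str, content: bytes) -> None:
        # Append-only write: record where the content starts, then add it at the end.
        self.index[file_name] = (len(self.data), len(content))
        self.data.extend(content)


class SmallFileProcessor:
    """Hypothetical small-file processing unit placed in front of HDFS."""

    def __init__(self, threshold: int = SMALL_FILE_THRESHOLD,
                 capacity: int = MERGED_FILE_CAPACITY) -> None:
        self.threshold = threshold
        self.capacity = capacity
        self.merged: list[MergedFile] = []
        self.large: dict[str, bytes] = {}  # stand-in for ordinary HDFS files

    def is_small(self, content: bytes) -> bool:
        return len(content) < self.threshold

    def store(self, file_name: str, content: bytes) -> str:
        """Return the name of the HDFS file that now holds `file_name`."""
        if not self.is_small(content):
            self.large[file_name] = content  # large files bypass merging
            return file_name
        current = self.merged[-1] if self.merged else None
        if current is None or len(current.data) + len(content) > self.capacity:
            # Start a new merged file so that none grows past one block.
            current = MergedFile(name=f"merged-{len(self.merged):04d}")
            self.merged.append(current)
        current.append(file_name, content)
        return current.name
```

Capping each merged file at one block is a design choice assumed here so that a small file never straddles two blocks; the thesis may size merged files differently.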
Abstract (English):	With the continuous development of science and technology, digital information is growing explosively. Traditional storage methods can no longer meet the current demand for massive data storage in coal enterprises, and cloud-based storage systems have emerged in response. HDFS (Hadoop Distributed File System) is Hadoop's distributed file storage system. Many large enterprises in China and abroad now use HDFS to store and manage vast amounts of data. HDFS was originally designed and developed to store large files, but as its range of applications widens, its shortcomings have gradually been exposed, and how to process and store small files efficiently has become an urgent problem. This thesis studies the problem of storing small files in HDFS and the development of an HDFS-based enterprise cloud storage platform. First, it presents an architectural improvement: a small-file processing unit is added to the original HDFS storage structure to identify and merge small files. The index and content of each small file are appended to a merged file, which solves the wasted space caused by storing large numbers of small files separately. Second, on the improved storage structure, a secondary index mechanism is proposed: the merge index is stored on the DataNodes together with the merged file, and the NameNode holds only one metadata record for the merged file, containing the names of its small files. Small files are located by resolving the indexes level by level, which saves NameNode memory and improves access efficiency. Finally, taking the construction of a coal enterprise's cloud storage platform as an example, the thesis describes in detail the application of the HDFS cloud storage platform in enterprise information systems.
This thesis uses Hadoop 0.20.1 and Eclipse as the development environment and one NameNode and three DataNodes as the simulation platform for performance tests of the improved small-file storage system, measuring memory consumption, small-file read time, and small-file write time, with good results.
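The level-by-level lookup of the secondary index mechanism can likewise be sketched in Python. In this hypothetical model, `NameNode.records` keeps one entry per merged file listing only its small-file names (level 1), and `DataNode` stores the merge index next to the merged file as a JSON blob with an `.idx` suffix mapping each name to `[offset, length]` (level 2). The class names, the JSON format, the `.idx` naming, and the `pack` helper are all assumptions for illustration; the abstract does not specify the on-disk layout.

```python
from __future__ import annotations

import json


class NameNode:
    """Hypothetical NameNode model: one metadata record per merged file,
    listing only the names of the small files packed inside it."""

    def __init__(self) -> None:
        self.records: dict[str, frozenset[str]] = {}

    def register(self, merged_name: str, small_names: list[str]) -> None:
        self.records[merged_name] = frozenset(small_names)

    def locate(self, small_name: str) -> str | None:
        # Level 1: find the merged file whose record lists this small file.
        for merged_name, names in self.records.items():
            if small_name in names:
                return merged_name
        return None


class DataNode:
    """Hypothetical DataNode model: each merged file is stored next to its
    merge index, serialized as JSON mapping name -> [offset, length]."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}

    def put(self, merged_name: str, data: bytes,
            index: dict[str, tuple[int, int]]) -> None:
        self.blobs[merged_name] = data
        self.blobs[merged_name + ".idx"] = json.dumps(index).encode("utf-8")

    def read(self, merged_name: str, small_name: str) -> bytes:
        # Level 2: parse the merge index, then slice out just that file.
        index = json.loads(self.blobs[merged_name + ".idx"])
        offset, length = index[small_name]
        return self.blobs[merged_name][offset:offset + length]


def pack(files: dict[str, bytes]) -> tuple[bytes, dict[str, tuple[int, int]]]:
    """Append each small file's content in order, recording (offset, length)."""
    data = bytearray()
    index: dict[str, tuple[int, int]] = {}
    for name, content in files.items():
        index[name] = (len(data), len(content))
        data.extend(content)
    return bytes(data), index


def read_small_file(nn: NameNode, dn: DataNode, small_name: str) -> bytes:
    merged_name = nn.locate(small_name)
    if merged_name is None:
        raise FileNotFoundError(small_name)
    return dn.read(merged_name, small_name)


if __name__ == "__main__":
    nn, dn = NameNode(), DataNode()
    files = {"a.txt": b"alpha", "b.txt": b"bravo!", "c.txt": b"c"}
    data, index = pack(files)
    dn.put("merged-0000", data, index)
    nn.register("merged-0000", list(files))
    print(read_small_file(nn, dn, "b.txt"))  # b'bravo!'
```

In this sketch the NameNode tracks one record per merged file rather than separate file and block metadata for every small file, while the byte offsets live on the DataNode; level 1 is a simple scan over the records, which a real implementation might replace with a faster structure.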
CLC classification:	TP393.09; TP311.52
Release date:	2015-06-16