View thesis information

Thesis title (Chinese):

 基于Hadoop的电信大数据分析的设计与实现    

Name:

 曹茜茜    

Student ID:

 201207304    

Discipline code:

 081001    

Discipline name:

 Communication and Information Systems

Student type:

 Master's

Degree year:

 2015    

School:

 School of Communication and Information Engineering

Major:

 Communication and Information Systems

First supervisor:

 冀汶莉    

Thesis title (English):

 Design and Implementation of Telecom Data Analysis Based on Hadoop    

Keywords (Chinese):

 大数据 ; Hadoop ; 电信流量数据    

Keywords (English):

 Big Data ; Hadoop ; Telecommunication Traffic Data    

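Abstract (translated from the Chinese):
In 2010 China's mobile Internet entered a stage of rapid development. However, as Internet companies moved in and handset makers built around the app-store model quickly joined the market, telecom operators saw growth in data-service revenue slow and faced the threat of being reduced to mere data pipes. At the same time, as mobile Internet applications became widespread, the volume of data stored by operators grew from the GB scale to the TB and even PB scale. In commercial competition, using data analysis to support business operations has become an effective tool, but traditional data analysis architectures can no longer meet the need for massive data processing and fast, in-depth mining. Hadoop, as a big data processing framework, offers a new approach to these problems.

Against this background, the system described here was designed and implemented as preliminary research for the Shaanxi Telecom big data platform construction project. By building a Hadoop platform for processing telecom data, the project explores the feasibility of using Hadoop's offline processing to clean, analyze, and mine data on the order of ten billion records per day. It builds a simulated BI front-end system that uses the processed data to analyze service plans and optimize the design of data plans. It constructs a user analysis framework from behaviors such as browsing, searching, call duration, and SMS usage, identifying user interests and preferences along multiple dimensions to form customer profiles. Finally, it establishes a decision-making system for the telecom service department.

This thesis first analyzes the Hadoop framework and the HDFS and MapReduce technologies it uses, then describes data collection and data storage on the Hadoop platform. It focuses on methods of parallel computation with MapReduce. The processed data is stored in the HDFS file system and then transferred to a relational database through the Sqoop component. The BI front end is built on a J2EE development framework and is designed in detail. On top of the back-end data processing, it implements traffic monitoring, operations support, customer profiling, and decision support functions, with a clustering algorithm used in the decision support function. A development environment was set up and configured in the laboratory, and tests were run on the transfer of large data sets, offline data processing under Hadoop, and the BI front-end display. The system runs correctly and effectively, and this preliminary study indicates that a Hadoop-based platform can meet the needs of telecom data preprocessing, data storage, and data analysis.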
Abstract (English):
The domestic mobile Internet entered a stage of rapid development in 2010. However, with the entry of Internet companies and the rapid arrival of terminal manufacturers built on the application-store model, telecom operators have seen data-service revenue growth slow and face the threat of becoming mere pipes. At the same time, with the spread of mobile Internet applications, the data held by telecom operators has grown from the GB scale to the TB or even PB scale. In commercial competition, supporting business operations with data analysis has become an effective tool, but traditional data analysis infrastructure cannot meet the demand for massive data processing and rapid, deep mining. Hadoop, as a big data processing framework, provides a new way of addressing these problems. In this context, the system was designed and implemented as a pre-research project for the construction of the Shaanxi Telecom big data platform. The project builds a Hadoop platform for telecom data and explores the feasibility of cleaning, analyzing, and mining data on the order of ten billion records per day with offline Hadoop processing. It builds a simulated BI front-end system that analyzes service plans from the processed data in order to optimize the design of traffic packages. It constructs a user analysis system from access, search, call duration, SMS usage, and other behaviors, locating user interests and preferences along multiple dimensions to form customer portraits. It also establishes a decision-making system for the telecom service department.
This thesis first analyzes the Hadoop framework and the HDFS and MapReduce techniques it uses, then describes data acquisition and data storage on the Hadoop platform. It focuses on parallel computing with MapReduce. After processing, the data is stored in the HDFS file system and then exported to a relational database by the Sqoop component. The BI front end of the system uses the J2EE development framework and is designed in detail. On the basis of the back-end data processing, it implements traffic monitoring, operational support, customer portrait, and decision support functions, with a clustering algorithm used in the decision support function. A development environment was configured in the laboratory, and tests were carried out separately on the transmission of large data sets, offline data processing under Hadoop, and the front-end BI display. The system operates normally and effectively, and the experiments indicate that the Hadoop platform can meet the basic needs of telecom data preprocessing and data storage.
Chinese Library Classification (CLC) number:

 TP311.13    

Open access date:

 2015-06-23    
