Thesis title (Chinese): | 区间型符号数据主成分分析及有效性研究 |
Name: | |
Student ID: | 16208009002 |
Discipline code: | 070104 |
Discipline name: | Applied Mathematics |
Student type: | Master's |
Degree year: | 2019 |
Department: | |
Major: | |
First advisor: | |
Thesis title (English): | Principal Component Analysis of Interval Symbol Data and Validity Study |
Keywords (Chinese): | |
Keywords (English): | Interval data; Principal component analysis; Dimensionality reduction; Empirical correlation matrix |
Abstract (Chinese): |
随着计算机的产生与普及,数据呈爆炸式增长。面对繁多复杂的高维数据,选择合适的数据分析技术,能够有效解决维数灾难。传统的数据分析技术仅针对点数据,难以把握数据的内在属性,符号数据分析技术基于分类“打包”的思想,能够从全局把握数据的内在关系。本文主要对区间型符号数据进行主成分分析研究,并对其有效性进行深入对比和分析。
首先针对现有的区间型符号数据主成分分析法计算量大和分析结果不准确等缺陷,提出两种改进的算法:IMO-PCA和ECM-PCA。IMO-PCA根据区间矩阵运算法则,推导出区间矩阵的协方差矩阵和相关系数矩阵等定义,借助谱半径法得到相关系数矩阵的区间特征向量,从区间特征向量出发得到区间主成分;ECM-PCA通过类比实数变量,得出区间变量的经验联合分布函数,进而推导出区间变量的均值、方差、协方差和相关系数,从相关系数矩阵出发得到区间主成分。以上两种改进的算法均假设样本数据为正态分布的区间数,更符合现实数据的分布情况。
然后对现有的区间主成分分析法和改进后两种算法的有效性进行比较研究。针对传统主成分效度指标度量方法的单一性缺陷,提出两种影响效度指标的度量因子,设计并实施随机模拟实验,并结合实例进一步对比研究。实验结果显示,本文提出的两种方法相比现有的区间主成分分析法具有明显的优势,分析结果更为准确,能客观的反映现实情况。最后,对本文提出的算法加以应用。
|
Abstract (English): |
With the advent and spread of computers, data volumes have grown explosively. When faced with large, complex, high-dimensional data, choosing an appropriate data analysis technique can effectively address the curse of dimensionality. Traditional data analysis techniques handle only point data and struggle to capture the intrinsic properties of the data, whereas symbolic data analysis, which is built on the idea of "packaging" data by category, can capture the internal relationships of the data as a whole. This thesis studies principal component analysis (PCA) of interval-valued symbolic data and compares and analyzes the validity of the methods in depth.
First, to address the heavy computation and inaccurate results of existing PCA methods for interval-valued symbolic data, two improved algorithms are proposed: IMO-PCA and ECM-PCA. IMO-PCA uses the rules of interval matrix arithmetic to derive definitions of the covariance matrix and the correlation coefficient matrix of an interval matrix, obtains the interval eigenvectors of the correlation coefficient matrix by the spectral radius method, and constructs the interval principal components from those eigenvectors. ECM-PCA, by analogy with real-valued variables, derives the empirical joint distribution function of interval variables, from which the mean, variance, covariance, and correlation coefficient of interval variables follow; the interval principal components are then obtained from the correlation coefficient matrix. Both algorithms assume that the sample data are normally distributed interval numbers, which better matches the distribution of real-world data.
Next, the validity of the existing interval PCA methods and of the two improved algorithms is compared. Because traditional measures of principal component validity rely on a single index, two measurement factors that affect the validity index are proposed; randomized simulation experiments are designed and carried out, and the comparison is extended with real examples. The experimental results show that the two proposed methods have clear advantages over the existing interval PCA methods: their results are more accurate and reflect real conditions objectively. Finally, the proposed algorithms are applied.
|
Chinese Library Classification number: | O29 |
Release date: | 2019-06-20 |
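
The abstract describes ECM-PCA as deriving the mean, variance, covariance, and correlation coefficient of interval variables from an empirical joint distribution. The Python sketch below illustrates that general idea only; it is not the thesis's algorithm. It assumes each observation is spread independently over its interval in each variable, centered on the midpoint, with the within-interval variance given by a hypothetical `spread` parameter times the squared interval width. A value of 1/12 corresponds to a uniform distribution on each interval; a value such as 1/36 would correspond to a normal distribution whose ±3σ range spans the interval, which is just one possible reading of the normality assumption. The function name `interval_corr` and the sample data are made up for the example.

```python
import math


def interval_corr(x, y, spread=1.0 / 12.0):
    """Empirical moments of two interval variables (illustrative sketch).

    x, y:   lists of (lower, upper) pairs, one pair per observation.
    spread: ASSUMPTION, within-interval variance as a fraction of the
            squared interval width (1/12 = uniform on each interval).
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")

    def mid(iv):
        # Midpoint of one interval.
        return (iv[0] + iv[1]) / 2.0

    def moments(ivs):
        # Mean of the mixture = average of the midpoints.
        mean = sum(mid(iv) for iv in ivs) / n
        # Second moment = average of (midpoint^2 + within-interval variance).
        second = sum(mid(iv) ** 2 + spread * (iv[1] - iv[0]) ** 2 for iv in ivs) / n
        return mean, second - mean ** 2

    mean_x, var_x = moments(x)
    mean_y, var_y = moments(y)
    # Independence inside each rectangle makes E[XY] the average of
    # the products of midpoints.
    cov = sum(mid(a) * mid(b) for a, b in zip(x, y)) / n - mean_x * mean_y
    # Note: fails with a zero division if either variance is zero.
    corr = cov / math.sqrt(var_x * var_y)
    return {"mean": (mean_x, mean_y), "var": (var_x, var_y), "cov": cov, "corr": corr}


# Made-up sample: three observations of two interval variables.
x = [(0, 2), (2, 4), (4, 6)]
y = [(1, 3), (3, 5), (5, 7)]
result = interval_corr(x, y)
```

In this sample the midpoints are perfectly correlated, yet the interval correlation comes out below 1 because the spread inside each interval adds to the variances but not to the covariance. Collecting such pairwise correlations into a matrix and taking its eigen-decomposition would give loadings for a correlation-based interval PCA under these assumptions.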