Thesis Information

Thesis title (Chinese):

 基于强化学习的协作水声通信中继选择

Name:

 苏越

Student ID:

 19207205048

Confidentiality level:

 Open (public)

Thesis language:

 chi

Subject code:

 085208

Subject name:

 Engineering - Electronic and Communication Engineering

Student type:

 Master's

Degree level:

 Master of Engineering

Degree year:

 2022

Degree-granting institution:

 西安科技大学

School:

 School of Communication and Information Engineering

Major:

 Electronic and Communication Engineering

Research direction:

 Underwater acoustic communication

First supervisor:

 张育芝

Supervisor's institution:

 西安科技大学

Submission date:

 2022-06-22

Defense date:

 2022-06-06

Thesis title (English):

 Reinforcement Learning based Relay Selection for Underwater Acoustic Cooperative Communication

Keywords (Chinese):

 协作通信; 中继选择; 强化学习; 水声网络

Keywords (English):

 Cooperative communication; relay selection; reinforcement learning; underwater acoustic network

Abstract (Chinese):

In the complex and dynamically varying underwater acoustic (UWA) channel, transmission attenuation is severe, and the combined time-space-frequency variability makes data links highly unstable, so data transmission is prone to interruption. To select relay nodes more effectively and improve the throughput of UWA sensor networks, this thesis builds a cooperative data transmission network model and, based on an analysis of UWA channel characteristics, proposes relay selection schemes based on reinforcement learning (RL):
(1) Because the actual UWA channel has long propagation delays, relay selection suffers from inaccurate decisions caused by outdated channel state information (CSI). The RL-based relay selection scheme defines the action set, state set, and action selection policy, and a Markov prediction model is used to predict the channel state, yielding an RL-based relay selection scheme under delayed CSI. Simulation results show that this scheme achieves higher throughput than a relay selection scheme without channel prediction.
(2) Considering the variability and long propagation delay of UWA channels, the state and reward functions of RL-based UWA cooperative communication are constructed, the simulated annealing (SA) algorithm is combined with RL, and a fast reinforcement learning (FRL) scheme is proposed. The state is defined as the combination of delayed CSI and system mutual information, and the reward is a joint function of the system mutual information and access delay associated with each candidate relay node. During learning, the greedy (exploration) factor of RL is dynamically adjusted by the cooling coefficient of the SA algorithm. In addition, an FRL scheme with a pre-training stage is proposed for practical UWA network implementation. Simulation results show that, compared with schemes that ignore access delay, the proposed scheme selects the best cooperative relay node with good channel quality and small access delay, and the proposed SA-FRL scheme converges faster and achieves higher throughput.
In summary, compared with cooperative relay selection schemes that do not jointly consider channel quality and transmission delay, the RL-based cooperative UWA communication schemes proposed in this thesis converge quickly, achieve high throughput, and remain robust under measured UWA channels.
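The Markov prediction step described in contribution (1) can be sketched roughly as follows. This is an illustrative reconstruction, not the thesis's actual implementation: it assumes the channel (e.g. quantized SNR) is discretized into a small number of states, estimates a transition matrix from observed history, and propagates the state distribution forward to compensate for CSI that is several steps old. All function names are hypothetical.

```python
import numpy as np

def estimate_transitions(state_seq, n_states):
    """Estimate a Markov transition matrix from a sequence of
    quantized channel states (integers in 0..n_states-1)."""
    counts = np.ones((n_states, n_states))  # Laplace smoothing avoids zero rows
    for s, s_next in zip(state_seq[:-1], state_seq[1:]):
        counts[s, s_next] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def predict_state(trans, current_state, delay_steps=1):
    """Predict the channel-state distribution delay_steps ahead,
    compensating for CSI that is delay_steps old."""
    dist = np.zeros(trans.shape[0])
    dist[current_state] = 1.0
    for _ in range(delay_steps):
        dist = dist @ trans  # one Markov step per delay step
    return int(np.argmax(dist)), dist
```

Under this sketch, relay selection would rank candidates by their predicted (rather than last-reported) channel state, which is the mechanism the abstract credits for the throughput gain over prediction-free selection.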
 

Abstract (English):

In the complex and dynamically changing underwater acoustic (UWA) channel, transmission attenuation is high, and the combined time-space-frequency variability makes data links very unstable, so data transmission is prone to interruption. To better select relay nodes and improve the throughput of UWA sensor networks, this thesis constructs a cooperative data transmission network model and, guided by an analysis of UWA channel characteristics, proposes relay selection schemes based on reinforcement learning (RL).
(1) Because the actual UWA channel has long transmission delays, relay selection suffers from inaccurate decisions caused by outdated channel state information (CSI). The RL-based relay selection scheme is constructed by defining the action set, state set, and action selection policy, and a Markov prediction model is used to predict the channel state. Simulation results show that the scheme obtains higher throughput than a relay selection scheme without channel prediction.
(2) Considering the time-varying channels and long transmission delays, the state and reward functions of RL-based UWA cooperative communication are constructed, the simulated annealing (SA) algorithm is combined with RL, and a fast reinforcement learning (FRL) scheme is proposed. The state is the combination of delayed CSI and system mutual information, and the reward is a joint function of the system mutual information and access delay associated with each candidate relay node. During learning, the exploration factor of RL is dynamically adjusted by the cooling coefficient of the SA algorithm. An FRL scheme with a pre-training process is further proposed for practical UWA network implementation. Simulation results show that the scheme selects the best cooperative relay node with good channel quality and small access delay, and the proposed SA-FRL scheme achieves faster convergence and higher throughput than schemes that ignore access delay.
In summary, compared with cooperative relay selection schemes that do not jointly consider channel quality and transmission delay, the RL-based cooperative UWA communication scheme proposed in this thesis offers fast convergence, high throughput, and robustness under measured UWA channels.
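The SA-FRL idea in contribution (2) — Q-learning whose exploration probability is driven down by a simulated-annealing cooling schedule — can be sketched as below. This is a minimal illustration under assumed parameters, not the thesis's implementation; `reward_fn` stands in for the joint mutual-information/access-delay reward, and the cooling constants are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def sa_frl_relay_selection(reward_fn, n_relays, n_states,
                           episodes=500, alpha=0.1, gamma=0.9,
                           t0=1.0, cooling=0.99, t_min=0.01):
    """Q-learning over relay choices; the epsilon-greedy exploration
    probability is tied to an SA-style temperature that cools each episode."""
    q = np.zeros((n_states, n_relays))
    temp = t0
    state = 0
    for _ in range(episodes):
        eps = max(temp, t_min)                 # exploration prob = current temperature
        if rng.random() < eps:
            action = int(rng.integers(n_relays))   # explore a random relay
        else:
            action = int(np.argmax(q[state]))      # exploit best-known relay
        reward, next_state = reward_fn(state, action)
        # standard Q-learning temporal-difference update
        q[state, action] += alpha * (reward + gamma * np.max(q[next_state])
                                     - q[state, action])
        state = next_state
        temp *= cooling                        # SA cooling: shift toward exploitation
    return q
```

Early on the high temperature forces broad exploration of relays; as it cools, the agent settles on the relay with the best reward, which is the faster-convergence behavior the abstract attributes to SA-FRL. The pre-training stage mentioned in the abstract would amount to running such episodes offline on recorded channel data before deployment.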
 

CLC number:

 TN929.3    

Open access date:

 2022-06-22    
