This book presents solutions based on reinforcement learning (RL) for user-centric online network selection optimization. The second part (Chapters 4 and 5) focuses on how to meet dynamic user demand in complex and uncertain heterogeneous wireless networks within the Markov decision process (MDP) framework.