This post covers sample.py.
# -*- coding: utf-8 -*-
"""Contains class representing an LSPI sample."""


class Sample(object):

    """Represents an LSPI sample tuple ``(s, a, r, s', absorb)``.

    Parameters  # input parameters
    ----------

    state : numpy.array  # state vector
        State of the environment at the start of the sample.
        ``s`` in the sample tuple.
        (The usual type is a numpy array.)
    action : int  # index of the action that was executed
        Index of action that was executed.
        ``a`` in the sample tuple
    reward : float  # reward obtained from the environment
        Reward received from the environment.
        ``r`` in the sample tuple
    next_state : numpy.array  # environment state after taking the sample's action
        State of the environment after executing the sample's action.
        ``s'`` in the sample tuple
        (The type should match that of state.)
    absorb : bool, optional  # True if this sample ended the episode
        True if this sample ended the episode. False otherwise.
        ``absorb`` in the sample tuple
        (The default is False, which implies that this is a
        non-episode-ending sample)


    Assumes that this is a non-absorbing sample (as the vast majority
    of samples will be non-absorbing).
    # Note: by default the sample is assumed not to end the episode.
    # Note: this is written as a class so that it is convenient to use
    # from different calling code.
    This class is just a dumb data holder so the types of the different
    fields can be anything convenient for the problem domain.

    For states represented by vectors a numpy array works well.

    """

    def __init__(self, state, action, reward, next_state, absorb=False):  # initialization
        """Initialize Sample instance."""
        self.state = state
        self.action = action
        self.reward = reward
        self.next_state = next_state
        self.absorb = absorb

    def __repr__(self):  # called when the object is printed
        """Create string representation of tuple."""
        return 'Sample(%s, %s, %s, %s, %s)' % (self.state,
                                               self.action,
                                               self.reward,
                                               self.next_state,
                                               self.absorb)
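As a rough illustration of how the Sample class described above might be used, the sketch below builds one ordinary sample and one episode-ending sample and prints them. The class body is a minimal copy of the one in this post so the snippet runs on its own. The state values, action indices, and rewards are made up for the example, and plain Python lists stand in for numpy arrays, which the docstring allows since the class is only a data holder.

```python
class Sample(object):
    """Minimal copy of the Sample class from this post (docstring shortened)."""

    def __init__(self, state, action, reward, next_state, absorb=False):
        self.state = state
        self.action = action
        self.reward = reward
        self.next_state = next_state
        self.absorb = absorb

    def __repr__(self):
        return 'Sample(%s, %s, %s, %s, %s)' % (self.state,
                                               self.action,
                                               self.reward,
                                               self.next_state,
                                               self.absorb)


# Hypothetical 2-dimensional states; the values are invented for illustration.
# absorb is omitted here, so it falls back to the default of False.
step = Sample([0.0, 1.0], 1, -1.0, [0.5, 1.0])

# A sample that ends the episode must set absorb=True explicitly.
terminal = Sample([0.5, 1.0], 0, 10.0, [1.0, 1.0], absorb=True)

# print() uses __repr__ because the class does not define __str__.
print(step)
print(terminal)
```

Because absorb defaults to False, only the rare episode-ending samples need the extra argument, which matches the assumption stated in the docstring.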