Deep Reinforcement Learning
| State | Action (Joystick) | Reward Signal |
| --- | --- | --- |
| Cart at x=0, Pole angle θ=0 | Pull the cart left | +1 (good) |
| Cart at x=-2, Pole angle θ=30° | Do nothing | -1 (bad) |
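Each row of the table is a (state, action, reward) record. A minimal sketch of how such rows might be represented in code follows. The names `Transition` and `toy_reward` are hypothetical, and the 12° limit is an assumed threshold chosen for illustration; the table itself does not state one.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    """One row of the table: what the agent saw, did, and received."""
    cart_x: float          # cart position
    pole_angle_deg: float  # pole angle θ in degrees
    action: str            # joystick action
    reward: float          # reward signal


def toy_reward(pole_angle_deg, limit_deg=12.0):
    """Hypothetical rule: +1 while the pole is near upright, -1 otherwise."""
    return 1.0 if abs(pole_angle_deg) <= limit_deg else -1.0


rows = [
    Transition(0.0, 0.0, "left", toy_reward(0.0)),     # first table row
    Transition(-2.0, 30.0, "none", toy_reward(30.0)),  # second table row
]
```

With these assumptions, the first row receives a positive reward and the second a negative one, matching the table.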
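```python
import random

import gym
import torch
import torch.nn as nn


class CartPoleAgent(nn.Module):
    def __init__(self, num_states, num_actions):
        super().__init__()
        self.fc1 = nn.Linear(num_states, 128)   # hidden layer with 128 units
        self.fc2 = nn.Linear(128, num_actions)  # output layer: one Q-value per action

    def forward(self, x):
        x = torch.relu(self.fc1(x))  # ReLU activation for the hidden layer
        return self.fc2(x)


# Initialize the agent and environment.
# Assumes the classic Gym API (gym < 0.26): reset() returns the observation
# and step() returns (obs, reward, done, info).
agent = CartPoleAgent(num_states=4, num_actions=2)
optimizer = torch.optim.Adam(agent.parameters(), lr=1e-3)
env = gym.make('CartPole-v0')
gamma, epsilon = 0.99, 0.1  # discount factor and exploration rate (untuned)

# Train with a simplified DQN-style update (no replay buffer or target network)
for episode in range(100):
    state = torch.as_tensor(env.reset(), dtype=torch.float32).unsqueeze(0)
    done = False
    while not done:
        # Select an action with an ε-greedy policy
        if random.random() < epsilon:
            action = int(env.action_space.sample())
        else:
            with torch.no_grad():
                action = agent(state).argmax(dim=1).item()
        # Take the action and observe the new state and reward signal
        next_obs, reward, done, _ = env.step(action)
        next_state = torch.as_tensor(next_obs, dtype=torch.float32).unsqueeze(0)
        # TD target: r + γ · max_a' Q(s', a'), with no bootstrap at terminal states
        with torch.no_grad():
            target = reward + gamma * agent(next_state).max(dim=1).values * (1.0 - float(done))
        # Update Q-values by minimizing the squared TD error
        td_error = target - agent(state)[0, action]
        loss = td_error.pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        state = next_state
```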
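For reference, the TD error named in the training loop is usually defined by the one-step Q-learning rule. The symbols γ (discount factor) and α (learning rate) are introduced here for illustration:

$$
\delta = r + \gamma \max_{a'} Q(s', a') - Q(s, a), \qquad Q(s, a) \leftarrow Q(s, a) + \alpha\,\delta
$$

With a neural network, the same idea is typically applied by taking a gradient step on the squared error δ² with respect to the network weights, and by dropping the bootstrap term when s' is terminal.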
Note: This is a simplified example. In practice, you would need to tune hyperparameters, choose a suitable network architecture, manage the exploration-exploitation trade-off (for example with a decaying ε), and add techniques such as experience replay or target networks to stabilize training.
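As a rough illustration of two of the techniques mentioned in the note, here is a minimal sketch of an experience replay buffer and a linearly decaying exploration rate. The names `ReplayBuffer` and `epsilon_at`, and all default values, are assumptions made for this sketch rather than part of any library.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop off first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps
        batch = random.sample(list(self.buffer), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)


def epsilon_at(step, start=1.0, end=0.05, decay_steps=10_000):
    """Linearly anneal the exploration rate from `start` to `end`."""
    fraction = min(step / decay_steps, 1.0)
    return start + fraction * (end - start)
```

In a training loop, each transition would be pushed into the buffer after `env.step`, and updates would be computed on sampled mini-batches once the buffer holds at least `batch_size` items. A target network would additionally keep a periodically refreshed copy of the Q-network for computing TD targets.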
Hope this helps!