This artificial intelligence mechanism learns by interacting with its environment and receiving rewards (as it observes the result of its interaction) which will help it decide if its actions are relevant over the long term.
As the relevance is unknown at the outset, the AI will learn by its successive interactions – just like a human learning a new game. As it tests each action in each possible situation of the environment, the AI will update each action’s estimated long-term impact. The “long term” notion is crucial in reinforcement learning. When humans play chess, for example, it is better to sacrifice a pawn (short-term loss) to win the game. In the same way, to go from point A to point B in a car, sometimes it makes more sense to travel a bit farther to reach a highway and shorten total travel time.
By weighing each action in terms of the satisfaction it will bring, AI will adapt its behavior that aims to be rational and optimal.