Left: A 5x5 input feature map of depth 1.
Instead of taking the largest element, we could also take the average (Average Pooling) or the sum of all elements in that window.
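To make the three options concrete, here is a minimal NumPy sketch that applies max, average, and sum pooling to the same windows. The function name `pool2d`, the 2x2 window, the stride of 2, and the example values are assumptions chosen for illustration; they are not taken from the text above.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Pool a 2D feature map with a square window (hypothetical helper)."""
    # Map each pooling mode to the reduction applied inside a window.
    reducers = {"max": np.max, "avg": np.mean, "sum": np.sum}
    reduce_fn = reducers[mode]

    h, w = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=float)

    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride
            window = x[r:r + size, c:c + size]
            out[i, j] = reduce_fn(window)
    return out

# Illustrative 4x4 feature map (values are made up for this example).
feature_map = np.array([[1, 1, 2, 4],
                        [5, 6, 7, 8],
                        [3, 2, 1, 0],
                        [1, 2, 3, 4]])

max_pooled = pool2d(feature_map, mode="max")  # largest element per window
avg_pooled = pool2d(feature_map, mode="avg")  # mean of each window
sum_pooled = pool2d(feature_map, mode="sum")  # sum of each window
```

For this input, the top-left window holds 1, 1, 5, 6, so max pooling keeps 6, average pooling gives 3.25, and sum pooling gives 13. All three variants shrink the 4x4 map to 2x2; they differ only in how each window is summarized.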
Parameter sharing relies on the assumption that if a patch feature is useful to compute at some spatial position, then it should also be useful to compute at other positions.
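The assumption can be illustrated with a small sketch in which one set of filter weights is slid over the whole input, so the same feature detector fires wherever the pattern appears. This reads the sentence above as describing weight sharing in a convolutional layer; the helper `conv2d_valid`, the 3x3 plus-shaped filter, and the 5x5 inputs are all hypothetical choices for the example.

```python
import numpy as np

def conv2d_valid(x, kernel):
    """Slide one kernel over x with no padding and stride 1 (hypothetical helper)."""
    kh, kw = kernel.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # The same kernel weights are reused at every position (i, j).
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

# A single 3x3 filter that responds to a plus-shaped patch.
plus = np.array([[0, 1, 0],
                 [1, 1, 1],
                 [0, 1, 0]], dtype=float)

def image_with_plus(center_row, center_col, size=5):
    """Build a size x size image containing one plus pattern."""
    img = np.zeros((size, size))
    img[center_row - 1:center_row + 2, center_col - 1:center_col + 2] = plus
    return img

top_left = image_with_plus(1, 1)      # pattern near the top-left corner
bottom_right = image_with_plus(3, 3)  # same pattern near the bottom-right

resp_a = conv2d_valid(top_left, plus)
resp_b = conv2d_valid(bottom_right, plus)

# Location of the strongest response in each 3x3 output map.
peak_a = tuple(int(v) for v in np.unravel_index(np.argmax(resp_a), resp_a.shape))
peak_b = tuple(int(v) for v in np.unravel_index(np.argmax(resp_b), resp_b.shape))

shared_weights = plus.size                  # 9 weights, reused everywhere
unshared_weights = plus.size * resp_a.size  # 81 if each output had its own filter
```

The strongest response moves from output position (0, 0) to (2, 2) as the pattern moves, while the filter itself never changes. With sharing, the layer needs 9 weights for this filter; giving each of the 9 output positions its own 3x3 filter would need 81.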