The yaw-pitch-roll system rotates an object (in this case the camera) about its own y, x, and z axes, respectively. When the object is upright, the y-axis points up and down, the x-axis points to your left and right, and the z-axis points toward and away from you.
The best way I've found to visualize the camera's viewing angle is by analogy with your monitor:
- Spinning your monitor around its base is a change in yaw (rotation about the y-axis)
- Tilting your monitor's viewing angle up and down is a change in pitch (rotation about the x-axis)
- Flipping your monitor 90 degrees to a vertical orientation is a change in roll (rotation about the z-axis)
The direction your monitor points here is the camera's perspective. Changing the yaw, pitch, and roll changes where the camera points from a fixed position.
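
To make the three rotations concrete, here is a minimal sketch using the standard right-handed rotation formulas about each axis. The function names and the choice of `(0, 0, 1)` as the resting view direction are assumptions for illustration; a real camera library may use the opposite sign convention.

```python
import math

def rotate_yaw(v, degrees):
    """Rotate vector v about the y-axis (yaw)."""
    a = math.radians(degrees)
    x, y, z = v
    return (math.cos(a) * x + math.sin(a) * z, y, -math.sin(a) * x + math.cos(a) * z)

def rotate_pitch(v, degrees):
    """Rotate vector v about the x-axis (pitch)."""
    a = math.radians(degrees)
    x, y, z = v
    return (x, math.cos(a) * y - math.sin(a) * z, math.sin(a) * y + math.cos(a) * z)

def rotate_roll(v, degrees):
    """Rotate vector v about the z-axis (roll)."""
    a = math.radians(degrees)
    x, y, z = v
    return (math.cos(a) * x - math.sin(a) * y, math.sin(a) * x + math.cos(a) * y, z)

# Assumed resting orientation: looking along z, with y as "up".
forward = (0.0, 0.0, 1.0)
up = (0.0, 1.0, 0.0)

print(rotate_yaw(forward, 90))    # view direction swings sideways, onto the x-axis
print(rotate_pitch(forward, 90))  # view direction tilts onto the y-axis
print(rotate_roll(forward, 90))   # view direction is unchanged...
print(rotate_roll(up, 90))        # ...but "up" now lies along the x-axis
```

Roll is the only one of the three that leaves the view direction alone, which matches the monitor picture: turning the screen to portrait does not change what it faces.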

But the camera doesn't have to be fixed. If you change where the camera is positioned in 3D space but keep the viewing angle fixed, it will clearly point at something else. The camera's focal point is the position in 3D space that the camera bases its own position on.
Imagine a sphere (or a dome) defined around the focal point. The camera lives on that sphere, and you can then think of the focal point as a center point at "ground level".
The radius of the sphere is the distance from the camera to the focal point.
The elevation angle is the angle from the ground, moving into the sky (or negative, moving beneath the ground). Put another way, increasing the elevation angle is like fixing the x-position of the camera and moving it along the sphere, starting on the z-axis at 0 degrees and reaching the y-axis at 90 degrees.
The azimuth angle is the angle circling around the focal point at a fixed elevation. If the elevation is 0 degrees, for example, changing the azimuth is like circling around the focal point at ground level. If the elevation is 90 degrees, changing the azimuth keeps you in the same place. In terms of the axes, increasing the azimuth angle is like fixing the y-position of the camera and moving it around the sphere, starting on the z-axis at 0 degrees and reaching the x-axis at 90 degrees.
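
The radius, elevation, and azimuth described above can be turned into a camera position with a little trigonometry. This is only a sketch under the axis convention used in this answer (y up, zero angles on the z-axis); `camera_position` is a hypothetical helper, not a function from any particular library.

```python
import math

def camera_position(focal_point, radius, elevation_deg, azimuth_deg):
    """Place the camera on a sphere of the given radius around focal_point.

    Convention assumed here: elevation 0 / azimuth 0 sits on the z-axis,
    elevation 90 sits on the y-axis, azimuth 90 sits on the x-axis.
    """
    el = math.radians(elevation_deg)
    az = math.radians(azimuth_deg)
    fx, fy, fz = focal_point
    x = fx + radius * math.cos(el) * math.sin(az)
    y = fy + radius * math.sin(el)
    z = fz + radius * math.cos(el) * math.cos(az)
    return (x, y, z)

focal = (0.0, 0.0, 0.0)
print(camera_position(focal, 5.0, 0, 0))    # on the z-axis
print(camera_position(focal, 5.0, 90, 0))   # on the y-axis (directly overhead)
print(camera_position(focal, 5.0, 0, 90))   # on the x-axis
print(camera_position(focal, 5.0, 90, 45))  # still overhead: azimuth has no effect here
```

Because the horizontal components are scaled by `cos(el)`, they vanish at an elevation of 90 degrees, which is why changing the azimuth there leaves the camera in place.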

It's important to note that, by default, camera systems always try to aim at the focal point. Whether you increase the elevation or the azimuth, the camera will keep aiming at the focal point. Because of this, changing the yaw, pitch, and roll is more like changing the camera's offsets in perspective, away from the focal point. So when trying to relate the two images together, note that with zero yaw and pitch, the roll axis will point directly from the camera to the focal point. Additionally, the camera will try to stay upright relative to the focal point, so if you increase the elevation beyond 90 degrees it'll flip itself over.
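
The "always aims at the focal point" behavior amounts to using the normalized vector from the camera to the focal point as the default view direction. A minimal sketch, with `view_direction` as a hypothetical helper name:

```python
import math

def view_direction(camera_position, focal_point):
    """Unit vector pointing from the camera toward the focal point."""
    dx, dy, dz = (f - c for f, c in zip(focal_point, camera_position))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length == 0:
        raise ValueError("camera and focal point coincide")
    return (dx / length, dy / length, dz / length)

focal = (0.0, 0.0, 0.0)
print(view_direction((0.0, 0.0, 5.0), focal))  # camera on the z-axis looks along -z
print(view_direction((5.0, 0.0, 0.0), focal))  # camera on the x-axis looks along -x
```

With zero yaw and pitch, this vector is the roll axis; yaw and pitch then act as offsets applied on top of it.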
Lastly, if you want to move the camera's absolute position without changing the perspective, just change the focal point's position.
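
As a rough illustration of why this works: the camera's position is defined relative to the focal point, so shifting the focal point carries the camera along and leaves the offset between them untouched. The helper below is hypothetical and only models that relationship; an actual camera API would typically do this for you when you set the focal point.

```python
def move_focal_point(camera_position, focal_point, delta):
    """Shift the focal point by delta and carry the camera along with it."""
    new_focal = tuple(f + d for f, d in zip(focal_point, delta))
    new_camera = tuple(c + d for c, d in zip(camera_position, delta))
    return new_camera, new_focal

camera = (0.0, 0.0, 5.0)
focal = (0.0, 0.0, 0.0)
new_camera, new_focal = move_focal_point(camera, focal, (2.0, 1.0, 0.0))

# The camera-to-focal-point offset, and therefore the perspective, is the same as before.
old_offset = tuple(f - c for f, c in zip(focal, camera))
new_offset = tuple(f - c for f, c in zip(new_focal, new_camera))
print(old_offset == new_offset)
```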