Micro and Macro Fusion
Conroe's performance is further improved by the micro and macro fusion capabilities of the new core. In the example above, a compare instruction and a jump instruction are combined into a single operation, so the three instructions now execute in the time it would normally take to execute two.
Thanks to macro and micro fusion, the four execution units may in certain cases be executing what were originally five or even more instructions. We will see some examples of fusion at work later in our Sandra CPU tests.
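
To make the counting concrete, here is a minimal Python sketch of the idea, not a model of Conroe's actual decoder. It assumes a simplified rule in which a compare (`cmp` or `test`) immediately followed by a conditional jump is merged into one operation. The function names and the sample instruction streams are hypothetical and chosen purely for illustration; real hardware applies more restrictive conditions.

```python
# Toy model only: real decoders apply more restrictive fusion rules.
COMPARE_OPS = {"cmp", "test"}


def is_conditional_jump(mnemonic):
    # Treat any "j..." mnemonic other than the unconditional "jmp" as conditional.
    return mnemonic.startswith("j") and mnemonic != "jmp"


def fuse_compare_and_jump(instructions):
    """Merge each adjacent compare + conditional jump pair into a single op."""
    fused = []
    i = 0
    while i < len(instructions):
        current = instructions[i]
        following = instructions[i + 1] if i + 1 < len(instructions) else None
        if (
            current in COMPARE_OPS
            and following is not None
            and is_conditional_jump(following)
        ):
            fused.append(current + "+" + following)
            i += 2  # both instructions were consumed by the fused op
        else:
            fused.append(current)
            i += 1
    return fused


# Hypothetical three-instruction sequence: the compare and jump become one op.
stream = ["add", "cmp", "jne"]
fused_stream = fuse_compare_and_jump(stream)
print(len(stream), "instructions ->", len(fused_stream), "ops:", fused_stream)
```

Under the same assumed rule, a five-instruction stream such as `["mov", "add", "cmp", "jl", "sub"]` shrinks to four operations, which is how four execution units can end up working on what were originally five instructions.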

Smart Memory Access
Memory access scheduling allows loads and stores to execute out of order, as long as they do not depend on one another. Basically, the processor has a better prediction mechanism that lets it schedule the loading of data from main memory into the L2 cache before the code asks for it. This increases throughput because the processor has to wait on main memory less often, which helps keep the pipeline full.
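
The "not interdependent" condition can be illustrated with a small Python sketch. This is a toy model under a loud assumption: every address is known up front, whereas the real processor has to predict whether a load and an earlier store will touch the same location. The function name and the sample addresses are hypothetical.

```python
def loads_safe_to_hoist(ops):
    """ops: list of ("load" | "store", address) tuples in program order.

    Return the indices of loads that do not read an address written by any
    earlier store, i.e. loads that could be moved ahead of those stores.
    """
    stored_addresses = set()
    safe = []
    for index, (kind, address) in enumerate(ops):
        if kind == "store":
            stored_addresses.add(address)
        elif kind == "load" and address not in stored_addresses:
            safe.append(index)
    return safe


# Hypothetical program: one store followed by two loads.
program = [
    ("store", 0x100),
    ("load", 0x200),  # different address: independent of the store
    ("load", 0x100),  # same address: must wait for the store
]
print(loads_safe_to_hoist(program))
```

In this sketch only the load from `0x200` is reported as safe to move ahead of the store; the load from `0x100` depends on the stored value and has to stay behind it.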

I think the diagram could have been a bit clearer, as the current one shows the two instructions already out of order at the top!