Core 2 Extreme QX9650 Launch & Review - 45nm Yorkfield aka Penryn is here! - PAGE 3
William Henning - Monday, October 29th, 2007
50+ New SSE4 instructions
- two types of 32 bit integer vector multiply operations
- 8 bit unsigned and 16 & 32 bit signed and unsigned min/max instructions
- Blends, Tests, Rounds
- Zero and sign extensions
- Inserts, extracts, scatters
- Strided loads and stores
- Video encode acceleration instructions
- Floating point dot product operations
- Streaming load instructions

According to Intel, these SSE4 additions can lead to dramatic performance gains, so let’s take a closer look at them:
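PMULLD, PMULDQ – 32 bit integer multiplies: PMULLD multiplies four packed 32 bit values and keeps the low 32 bits of each product, while PMULDQ multiplies two signed 32 bit values into two full 64 bit results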
DPPS, DPPD – single and double precision dot product instructions, used in matrix multiplication and 3D code
BLENDPS, BLENDPD, BLENDVPS, BLENDVPD, PBLENDVB, PBLENDW – conditional copying of fields in packed SSE registers
PMINSB, PMAXSB, PMINUW, PMAXUW, PMINUD, PMAXUD, PMINSD, PMAXSD – min and max operations for packed signed and unsigned bytes, words and dwords
ROUNDPS, ROUNDSS, ROUNDPD, ROUNDSD – rounding of packed single and double precision floating point data
INSERTPS, PINSRB, PINSRD, PINSRQ, EXTRACTPS, PEXTRB, PEXTRD, PEXTRW, PEXTRQ – data insertion/extraction between XMM registers and memory or CPU general purpose registers
PMOVSXBW, PMOVZXBW, PMOVSXBD, PMOVZXBD, PMOVSXBQ, PMOVZXBQ, PMOVSXWD, PMOVZXWD, PMOVSXWQ, PMOVZXWQ, PMOVSXDQ, PMOVZXDQ – convert from packed integer to zero or sign extended integer of a wider type
PTEST – packed test
PCMPEQQ, PCMPGTQ – compare packed qwords
PACKUSDW – convert packed signed DWORDS to packed unsigned WORDS, with saturation
PCMPESTRI, PCMPESTRM, PCMPISTRI, PCMPISTRM – advanced string comparison instructions
CRC32 – accumulate a CRC checksum (the instruction uses the CRC-32C polynomial)
POPCNT – count number of bits set to 1
Ok, you can un-glaze your eyes now.
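For those who want to see what a few of these actually compute, here is a rough scalar sketch in plain C of PMULLD, PMULDQ, PACKUSDW and DPPS. These are not intrinsics and not Intel's code; the function names (`pmulld_model` and friends) are made up for illustration, and each function simply spells out, one lane at a time, what the corresponding instruction does across a whole XMM register in one go.

```c
#include <assert.h>
#include <stdint.h>

/* PMULLD model: four 32 bit lanes, only the low 32 bits of each product survive. */
void pmulld_model(const int32_t a[4], const int32_t b[4], uint32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        /* Widen first so the multiply cannot overflow, then truncate. */
        out[i] = (uint32_t)((int64_t)a[i] * b[i]);
    }
}

/* PMULDQ model: signed multiply of lanes 0 and 2, giving two full 64 bit results. */
void pmuldq_model(const int32_t a[4], const int32_t b[4], int64_t out[2])
{
    out[0] = (int64_t)a[0] * b[0];
    out[1] = (int64_t)a[2] * b[2];
}

/* PACKUSDW model for one lane: signed dword to unsigned word, clamped to 0..65535. */
uint16_t packusdw_lane(int32_t v)
{
    if (v < 0) return 0;
    if (v > 65535) return 65535;
    return (uint16_t)v;
}

/* DPPS model: the high 4 bits of imm8 pick which products are summed,
 * the low 4 bits pick which output lanes receive the sum (others get 0).
 * Assumption: products are summed left to right here; the hardware adds
 * them in a different order, so rounding can differ in the last bit. */
void dpps_model(const float a[4], const float b[4], unsigned imm8, float out[4])
{
    assert(imm8 <= 0xFFu); /* the immediate is a single byte */

    float sum = 0.0f;
    for (int i = 0; i < 4; i++) {
        if (imm8 & (0x10u << i)) {
            sum += a[i] * b[i];
        }
    }
    for (int i = 0; i < 4; i++) {
        out[i] = (imm8 & (1u << i)) ? sum : 0.0f;
    }
}
```

As a usage example, calling `dpps_model` with `imm8 = 0xF1` sums all four products and writes the result to lane 0 only, which is the usual way to get a plain 4-element dot product out of DPPS.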

Counting every variation, there are just over fifty new instructions, but really there are only 14 totally unique instructions, with variations based on data type. Still, the new instructions will improve the quality of vector code, string comparisons, CRC calculations and more, so they will definitely help once compiler support for them arrives and applications are recompiled to take advantage of them.
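To give a feel for what the CRC32 and POPCNT instructions replace, here is a minimal portable C sketch of the same work done in software. It assumes the CRC-32C (Castagnoli) polynomial in its bit-reflected form, 0x82F63B78, and the conventional all-ones initial value and final inversion; the function names are hypothetical. The inner loop of `crc32c_step_u8` is roughly the work a single byte-wide CRC32 instruction does in hardware, and the loop in `popcount32` is what one POPCNT replaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One byte of CRC-32C, bit by bit. Hypothetical helper name. */
uint32_t crc32c_step_u8(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++) {
        /* Reflected form: shift right, fold in the polynomial when a 1 falls out. */
        crc = (crc & 1u) ? (crc >> 1) ^ 0x82F63B78u : (crc >> 1);
    }
    return crc;
}

/* Whole-buffer CRC-32C with the usual init and final XOR of 0xFFFFFFFF. */
uint32_t crc32c(const void *data, size_t len)
{
    assert(data != NULL || len == 0);

    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc = crc32c_step_u8(crc, p[i]);
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Count set bits by clearing the lowest set bit until none remain. */
unsigned popcount32(uint32_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1u;
        n++;
    }
    return n;
}
```

A quick sanity check: the standard CRC-32C check value for the nine ASCII bytes "123456789" is 0xE3069283, so `crc32c("123456789", 9)` should return exactly that.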