Core 2 Extreme QX9650 Launch & Review - 45nm Yorkfield aka Penryn is here! - PAGE 3
William Henning - Monday, October 29th, 2007


50+ New SSE4 instructions

  • two types of 32 bit integer vector multiply operations
  • 8 bit signed, 16 bit unsigned and 32 bit signed and unsigned min/max instructions
  • Blends, Tests, Rounds
  • Zero and sign extensions
  • Inserts, extracts, scatters
  • Strided loads and stores
  • Video encode acceleration instructions
  • Floating point dot product operations
  • Streaming load instructions

According to Intel, these SSE4 additions can lead to dramatic performance gains, so let’s take a closer look at them:

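PMULLD, PMULDQ – 32 bit integer multiplies: PMULLD multiplies four packed 32 bit values and keeps the low 32 bits of each product, while PMULDQ multiplies two signed 32 bit values into two full 64 bit results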
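
To make the difference between the two multiplies concrete, here is a minimal scalar sketch in Python. It models the lane arithmetic only, not real SIMD execution, and the function names `pmulld`, `pmuldq` and `to_signed32` are illustrative helpers rather than any real API. Registers are assumed to be plain lists of four 32 bit lane values.

```python
# Scalar model of the two 32 bit multiplies; a sketch of the semantics only.
# All names here are hypothetical helpers, not part of any real library.
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def to_signed32(x):
    """Interpret the low 32 bits of x as a signed two's complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def pmulld(a, b):
    """Four 32 bit lanes in, four 32 bit lanes out: low half of each product."""
    return [(x * y) & MASK32 for x, y in zip(a, b)]

def pmuldq(a, b):
    """Signed multiply of lanes 0 and 2 only, giving two 64 bit products."""
    return [(to_signed32(a[i]) * to_signed32(b[i])) & MASK64 for i in (0, 2)]
```

Note how `pmulld` silently drops the upper half of a product that overflows 32 bits, while `pmuldq` keeps the full result at the cost of handling only two lanes.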

DPPS, DPPD – dot product instruction, used in matrix multiplication, 3D code
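
As a rough illustration of what DPPS does, the sketch below models it in scalar Python. It assumes the usual reading of the 8 bit immediate: the high four bits choose which lanes are multiplied and summed, and the low four bits choose which result lanes receive the sum (the rest are zeroed). The name `dpps` is just a label for this model, and Python floats are double precision, so rounding will not match real single precision hardware exactly.

```python
# Scalar model of a four-lane dot product with an 8 bit control mask.
# Hypothetical helper for illustration; real DPPS works in single precision.
def dpps(a, b, imm8):
    # High nibble: which lanes take part in the multiply-and-add.
    total = sum(a[i] * b[i] for i in range(4) if imm8 & (1 << (4 + i)))
    # Low nibble: which destination lanes receive the sum, others become 0.
    return [total if imm8 & (1 << i) else 0.0 for i in range(4)]
```

With a mask of 0x71 this gives a three component dot product in lane 0, which is the typical case for 3D vector math.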

BLENDPS, BLENDPD, BLENDVPS, BLENDVPD, PBLENDVB, PBLENDW – conditional copying of fields in packed SSE registers
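
The blend idea can be sketched in a few lines of scalar Python. This is a model under stated assumptions, not the real instructions: registers are lists of four lanes, `blend_imm` stands in for the immediate-controlled forms (one mask bit per lane), and `blend_var` stands in for the variable forms, where the top bit of each mask lane makes the choice. Both function names are made up for this example.

```python
# Scalar model of per-lane conditional copy ("blend"); names are hypothetical.
def blend_imm(dst, src, imm8):
    """Bit i of imm8 set: take lane i from src, otherwise keep dst."""
    return [s if imm8 & (1 << i) else d
            for i, (d, s) in enumerate(zip(dst, src))]

def blend_var(dst, src, mask):
    """Top bit of each 32 bit mask lane set: take src, otherwise keep dst."""
    return [s if m & 0x80000000 else d
            for d, s, m in zip(dst, src, mask)]
```

The appeal is that a branch per element turns into a single select, which is what lets compilers vectorize simple if/else loops.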

PMINSB, PMAXSB, PMINUW, PMAXUW, PMINUD, PMAXUD, PMINSD, PMAXSD – min and max operations for packed signed bytes, unsigned words, and signed and unsigned dwords

ROUNDPS, ROUNDSS, ROUNDPD, ROUNDSD – rounding of packed and scalar single and double precision floating point data

INSERTPS, PINSRB, PINSRD, PINSRQ, EXTRACTPS, PEXTRB, PEXTRD, PEXTRW, PEXTRQ – data insertion/extraction between XMM registers and memory or CPU general purpose registers

PMOVSXBW, PMOVZXBW, PMOVSXBD, PMOVZXBD, PMOVSXBQ, PMOVZXBQ, PMOVSXWD, PMOVZXWD, PMOVSXWQ, PMOVZXWQ, PMOVSXDQ, PMOVZXDQ – convert from packed integer to zero or sign extended integer of a wider type
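
For readers who want to see what sign versus zero extension means at the bit level, here is a small scalar sketch in Python using the byte-to-word pair as the example. The helper names are invented for illustration, and lanes are assumed to be raw unsigned bit patterns held in a list.

```python
# Scalar model of widening conversions; all names are hypothetical helpers.
def zero_extend(value, from_bits):
    """Keep the low from_bits bits and fill the upper bits with zeros."""
    return value & ((1 << from_bits) - 1)

def sign_extend(value, from_bits, to_bits):
    """Copy the sign bit of a from_bits value into the new upper bits."""
    value &= (1 << from_bits) - 1
    if value & (1 << (from_bits - 1)):
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)

def pmovsxbw(lanes):
    """Eight bytes in, eight sign extended 16 bit words out."""
    return [sign_extend(b, 8, 16) for b in lanes[:8]]

def pmovzxbw(lanes):
    """Eight bytes in, eight zero extended 16 bit words out."""
    return [zero_extend(b, 8) for b in lanes[:8]]
```

The byte 0x80 shows the difference: it becomes 0xFF80 (still -128) when sign extended, but 0x0080 (128) when zero extended.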

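PTEST – packed bitwise test of two 128 bit operands that sets the zero and carry flags without modifying either operand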
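
A quick scalar sketch of the PTEST flag logic, treating each 128 bit register as one Python integer. The function name `ptest` and the tuple return are assumptions made for this model; the real instruction writes the CPU flags instead.

```python
# Scalar model of the PTEST flag results; a hypothetical helper.
MASK128 = (1 << 128) - 1

def ptest(a, b):
    """Return (ZF, CF) for 128 bit values a (first operand) and b (second)."""
    zf = int((a & b) == 0)              # ZF: the operands share no set bits
    cf = int((~a & MASK128 & b) == 0)   # CF: every set bit of b is set in a
    return zf, cf
```

Testing a register against itself gives a one step "is this whole register zero?" check, which previously took a compare, a mask move and a scalar test.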

PCMPEQQ, PCMPGTQ – compare packed qwords

PACKUSDW – convert packed signed DWORDS to packed unsigned WORDS, saturating out of range values
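
The packing step is easy to model in scalar Python. In this sketch the two source registers are assumed to be lists of four signed integers each, and `packusdw` is simply a label for the model. Values that do not fit in an unsigned 16 bit word are clamped to 0 or 65535 rather than wrapped.

```python
# Scalar model of packing signed dwords into unsigned words with saturation.
# Hypothetical helper; inputs are ordinary signed Python integers.
def packusdw(a, b):
    """Eight dwords (four from a, four from b) in, eight clamped words out."""
    return [min(max(x, 0), 0xFFFF) for x in a + b]
```

This is the sort of clamp that shows up when converting 32 bit intermediate results back to 16 bit pixel or audio samples.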

PCMPESTRI, PCMPESTRM, PCMPISTRI, PCMPISTRM – advanced string comparison instructions

CRC32 – accumulate a CRC-32C checksum (polynomial 0x11EDC6F41)
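
For reference, this is roughly the work the CRC32 instruction replaces, written as a bit at a time scalar sketch in Python. It assumes the CRC-32C (Castagnoli) polynomial in its bit reflected form, 0x82F63B78, and the customary software convention of starting from all ones and inverting at the end. The function names are illustrative only.

```python
# Bitwise software CRC-32C; a slow reference sketch with hypothetical names.
POLY_REFLECTED = 0x82F63B78  # CRC-32C polynomial 0x11EDC6F41, bit reflected

def crc32c_step(crc, byte):
    """Fold one byte into the running 32 bit CRC value."""
    crc ^= byte
    for _ in range(8):
        crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
    return crc

def crc32c(data):
    """CRC-32C of a bytes object, with the usual initial and final inversion."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = crc32c_step(crc, byte)
    return crc ^ 0xFFFFFFFF
```

The inner loop runs eight shift and XOR steps per byte; the hardware instruction folds in a whole byte, word, dword or qword at once.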

POPCNT – count number of bits set to 1
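
Without POPCNT, counting set bits takes a loop or a lookup table. The scalar Python sketch below shows one classic software approach, clearing the lowest set bit until nothing is left; `popcnt` and its `width` parameter are names chosen for this example.

```python
# Software population count; a hypothetical helper showing what POPCNT replaces.
def popcnt(value, width=64):
    """Count the bits set to 1 in the low `width` bits of value."""
    value &= (1 << width) - 1
    count = 0
    while value:
        value &= value - 1  # clears the lowest set bit
        count += 1
    return count
```

The loop runs once per set bit, so a dense 64 bit value costs 64 iterations where the instruction needs a single operation.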

Ok, you can un-glaze your eyes now.



Counting every variation, there are just over fifty new instructions, but really only 14 are totally unique, with the rest being variations based on data type. Keep in mind that the string comparison, CRC32, POPCNT and PCMPGTQ instructions belong to SSE4.2, which is slated for Intel's next generation Nehalem architecture; Penryn implements the 47 SSE4.1 instructions. Still, the new instructions will improve the quality of vector code, string comparisons, CRC calculations and more, so they definitely will help once compiler support for them arrives and applications are recompiled to take advantage of them.

