CPU Architecture and FAH
Posted: Fri Oct 17, 2008 10:30 am
I originally posted this detailed information in a specific thread, but some people asked me to create a dedicated thread to discuss CPU architecture and how FAH benefits from the different parts of the CPU. This thread can also be used to discuss how stability and heat generation are affected by the use of the different parts of the CPU.
A CPU can be divided into different subparts that do not perform the same operations. Here are the most common ones in a modern CPU (for example a Core 2, an Athlon X2 or a Phenom X4). Keep in mind that there are usually several units of each type in a single CPU:
- ALU (Arithmetic and Logic Unit): the main job of this unit is to do basic arithmetic and logic operations such as addition, comparison, shifting and boolean operators. This is the oldest unit in a CPU and it can only work on integers. This unit is not used a lot in FAH, but it is used in cryptography or compression algorithms.
- FPU (Floating Point Unit): this unit was first available as a separate coprocessor (the 287 for the 286 and the 387 for the 386) and has been integrated into the CPU since the 486. It performs more advanced math operations such as multiplication, division and powers. It works on floating point numbers (the most common way to represent non-integer values) in single or double precision. This unit is used in many applications, for example games (3D rendering) or multimedia applications. In FAH, this unit was used by the Tinker core, and is now used by the Amber core, or by Gromacs when the message (using standard loops) is printed.
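To make the difference between single and double precision more concrete, here is a small Python sketch. It is only an illustration and has nothing to do with the actual FAH cores: the helper name `to_single` is mine, and it relies on the fact that a Python float is stored in double precision.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (double precision) to the nearest single precision value."""
    # Packing as "<f" keeps only 32 bits; unpacking converts back to a Python float.
    return struct.unpack("<f", struct.pack("<f", x))[0]


if __name__ == "__main__":
    value = 0.1
    single = to_single(value)
    print("bytes per single:", struct.calcsize("<f"))
    print("bytes per double:", struct.calcsize("<d"))
    print("double precision:", repr(value))
    print("single precision:", repr(single))
    print("rounding error  :", abs(single - value))
```

A single precision number takes half the space of a double, at the price of a larger rounding error. This is why some projects are fine with the regular Gromacs core while others need the double precision one.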
The following units have been added as improvements to the two basic units:
- MMX (MultiMedia eXtensions): this unit was added by Intel to the Pentium core to speed up multimedia applications. It is an extension to the ALU, so it can only work on integers too. In practice this extension is of limited use, because multimedia applications mostly use floating point operations. It can speed up file compression or cryptography operations, but it is usually not used by FAH.
- SSE (Streaming SIMD Extensions): this unit was first implemented in the Pentium 3 CPU. This is one of the most interesting units in a CPU: on Intel CPUs, it is the first unit able to apply one instruction to several floating point values at the same time (SIMD: Single Instruction, Multiple Data). Because it works with floating point numbers (it extends the FPU), it is used by many applications (multimedia, games, computing, etc.). In FAH, this unit is massively used by the Gromacs core and its variants (Gromacs, Gromacs33, GroST, GroSMP), and is signaled with the message "Extra SSE boost OK". The original SSE floating point instructions can only be used in single precision calculations. AMD's equivalent is 3DNow!, which was introduced with the K6-2 and carried over to the Athlon (all current AMD processors support both SSE and 3DNow!).
- SSE2: these are additional instructions added to the original SSE (in Pentium 4 CPUs) to speed up calculations in double precision. They apply to the same type of jobs (games, multimedia, computing, etc.) as SSE. In FAH, these instructions are used by the Double Gromacs core and its variants (Dgromacs, Dgromacs B and Dgromacs C).
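Here is a rough sketch of the SIMD idea, written in plain Python. It is only a model: nothing here runs on the real SSE unit, and the function names (`lanes`, `packed_add`, `simd_add`) are made up for the illustration. The only hardware fact it uses is that an SSE register is 128 bits wide, so it holds 4 single precision values or 2 double precision values.

```python
REGISTER_BITS = 128  # width of one SSE register


def lanes(element_bits: int) -> int:
    """Number of values of the given size that fit in one register."""
    return REGISTER_BITS // element_bits


def packed_add(a: list, b: list) -> list:
    """Model of one packed add: all lanes are added by a single instruction."""
    if len(a) != len(b):
        raise ValueError("both operands must have the same number of lanes")
    return [x + y for x, y in zip(a, b)]


def simd_add(xs: list, ys: list, element_bits: int) -> tuple:
    """Add two equal-length lists and count the packed instructions needed."""
    width = lanes(element_bits)
    result, instructions = [], 0
    for start in range(0, len(xs), width):
        result.extend(packed_add(xs[start:start + width], ys[start:start + width]))
        instructions += 1
    return result, instructions


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    ys = [10.0] * 8
    print("single precision:", simd_add(xs, ys, 32))
    print("double precision:", simd_add(xs, ys, 64))
```

In this model the same 8 additions need half as many packed instructions in single precision as in double precision, while a plain FPU would need one instruction per addition.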
In addition to the processing units, you need to understand how a CPU gets its data from memory. There are usually 3 "levels" of memory:
- Level 1 cache
- Level 2 cache
- System memory
When the CPU works on a small data set that fits in the L1 cache, there are no accesses to the other "memories". As the data size grows, the CPU starts using the L2 cache, and then system memory.
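The rule above can be sketched in a few lines of Python. The cache sizes are assumptions picked only for the example (32 KiB of L1 and 4 MiB of L2); check the specifications of your own CPU for the real values. The function name is also mine.

```python
# Assumed sizes, for illustration only; real values depend on the CPU model.
L1_BYTES = 32 * 1024
L2_BYTES = 4 * 1024 * 1024


def slowest_level_touched(working_set_bytes: int) -> str:
    """Return the slowest memory level a working set of this size has to use."""
    if working_set_bytes < 0:
        raise ValueError("size must not be negative")
    if working_set_bytes <= L1_BYTES:
        return "L1 cache"
    if working_set_bytes <= L2_BYTES:
        return "L2 cache"
    return "system memory"


if __name__ == "__main__":
    for size in (16 * 1024, 1024 * 1024, 64 * 1024 * 1024):
        print(size, "bytes ->", slowest_level_touched(size))
```

This is a simplification: a real CPU moves data between levels line by line and the caches are shared with other programs, but it shows why a bigger WU stresses more of the memory subsystem.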
Now we can talk about power consumption and stability issues. The worst case is of course when a lot of processing units are used, with a lot of data to move between the CPU and memory. Here are some examples, with FAH cores and WUs:
- Amber core: it's the lightest load we can find in FAH as it only uses the ALU and FPU. These units are usually small, and it won't stress the caches a lot either.
- Gromacs (Dgromacs) core: it's the hardest thing to do for the CPU. It uses the ALU, FPU and SSE (SSE2). If you have opted for BigWU, it will also stress the caches and memory.
- Gromacs33 is like Gromacs but with newer code, so it tends to be more optimized and more stressful.
- GroSMP is a bit different: it doesn't stress the CPU as hard as regular Gromacs because processing power is limited by data transfers between CPU cores but, as you can guess, it puts a lot of stress on the caches and memory subsystem. The A2 SMP core is progressively changing this rule as it makes better use of the CPU cores, so we can say it is one of the "worst" cases, using the ALU, FPU, SSE, caches and system memory.
Tell me if there is something wrong or something that you don't understand.