AVX-512
- Keygen Master
- Jun 18, 2022
AVX-512: Advanced Vector Extensions 512
Advanced Vector Extensions (AVX) are Intel’s instructions for carrying out vector processing. AVX-512 instructions are SIMD (Single Instruction, Multiple Data) instructions that make use of 32 vector registers, each 512 bits wide. Each of these 512-bit registers can pack 16 single-precision floating-point numbers (32 bits each) or 8 double-precision floating-point numbers (64 bits each). The registers can also hold integers (32 or 64 bits).
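The lane counts above are just the register width divided by the element size. Here is a minimal sketch of that arithmetic in C; the names `ZMM_BITS` and `lanes_per_register` are made up for this illustration and are not part of any Intel API.

```c
#include <assert.h>
#include <stddef.h>

/* Width of one AVX-512 vector register, in bits. */
enum { ZMM_BITS = 512 };

/* How many elements ("lanes") of the given size fit in one 512-bit register.
   Hypothetical helper, for illustration only. */
size_t lanes_per_register(size_t element_bytes)
{
    assert(element_bytes > 0);
    return (ZMM_BITS / 8) / element_bytes;
}
```

Calling `lanes_per_register(4)` covers the 32-bit case (single-precision floats or 32-bit integers) and `lanes_per_register(8)` covers the 64-bit case.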
SIMD means a single instruction operates on multiple pieces of data. A very simple example is adding up all the numbers in an array. Instead of adding one element at a time, the processor adds them in chunks, for example 16 single-precision floats per instruction.
So next time you want to add all elements in an array, check that your compiler and your processor support vector extensions and use them. This can make your code faster.
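As a rough sketch of what "adding in chunks" looks like, the function below sums an array of floats 16 at a time using AVX-512 intrinsics from `immintrin.h`. It assumes a compiler that defines `__AVX512F__` when AVX-512 is enabled (for example GCC or Clang with `-mavx512f`), and it falls back to a plain loop otherwise. The function name `sum_array` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Sum n floats. Uses AVX-512 when the compiler targets it,
   otherwise an ordinary scalar loop. */
float sum_array(const float *values, size_t n)
{
    assert(values != NULL || n == 0);
    size_t i = 0;
    float total = 0.0f;

#if defined(__AVX512F__)
    __m512 acc = _mm512_setzero_ps();               /* 16 running partial sums */
    for (; i + 16 <= n; i += 16) {
        __m512 chunk = _mm512_loadu_ps(values + i); /* load 16 floats */
        acc = _mm512_add_ps(acc, chunk);            /* 16 additions in one instruction */
    }
    total = _mm512_reduce_add_ps(acc);              /* fold the 16 lanes into one float */
#endif

    /* Leftover elements, or the whole array when AVX-512 is not enabled. */
    for (; i < n; ++i)
        total += values[i];
    return total;
}
```

Two caveats: a binary built with AVX-512 enabled will fault on a processor that lacks it, and because the vector version adds the elements in a different order, the result can differ slightly from the scalar loop for values that are not exactly representable.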
Prefetch Instructions: Caches are small blocks of memory that are built into the processor and are therefore much faster to access than the RAM. Intel processors have 3 levels of cache (the sizes given here are examples; actual sizes vary by processor model):
L1 cache: This is the closest to the processor and the fastest for the processor to access. The L1 cache is 32 KB: 16 KB for code and 16 KB for data.
L2 cache: This is the next level of cache after L1. It takes a bit longer to access than the L1 cache. The L2 cache is 128 KB (combined code + data).
L3 cache: The L3 cache is 3 MB, and access to it is much slower than to the other 2 but still faster than fetching from the RAM. Just like L2, this is a combined code + data cache.
Intel prefetch instructions fetch data (and code) into the L1 and L2 caches before it is needed. This makes later access to that data and code much faster. Intel advises, though, that you usually need not use these instructions: Intel processors already do a good job of caching on their own.
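For illustration only, here is a minimal sketch of issuing a software prefetch from C. It assumes GCC or Clang, whose `__builtin_prefetch` builtin typically compiles to a prefetch instruction on x86; on other compilers the macro below does nothing. `PREFETCH_DISTANCE` and `sum_with_prefetch` are made-up names, and the distance of 64 elements is an arbitrary value, not a tuned one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning knob: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 64

#if defined(__GNUC__) || defined(__clang__)
/* Second argument 0 = prefetch for reading; third argument 3 = keep the
   data in all cache levels if possible. */
#define PREFETCH_READ(addr) __builtin_prefetch((addr), 0, 3)
#else
#define PREFETCH_READ(addr) ((void)0)
#endif

/* Sum an int array while asking the processor to start loading
   elements that will be needed a little later. */
long sum_with_prefetch(const int *values, size_t n)
{
    assert(values != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DISTANCE < n)
            PREFETCH_READ(&values[i + PREFETCH_DISTANCE]);
        total += values[i];
    }
    return total;
}
```

A straight walk through an array like this is exactly the pattern the hardware already handles well, so the prefetch here only shows the mechanics. Measure before keeping one in real code.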
The processor fetches data from the RAM into the caches in chunks called cache lines, and each cache line is 64 bytes wide. In other words, when the processor has to get stuff from the RAM, it fetches 64 bytes at once. This is done by fetching data 8 times, getting 8 bytes each time (this is called burst mode).
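The numbers above can be checked with a little arithmetic. This sketch, using made-up helper names, works out which 64-byte cache line an address falls in and how many lines a read touches. It treats addresses as plain integers and assumes cache lines start at multiples of 64.

```c
#include <assert.h>
#include <stdint.h>

enum {
    CACHE_LINE_BYTES   = 64, /* width of one cache line */
    BURST_TRANSFERS    = 8,  /* transfers per burst */
    BYTES_PER_TRANSFER = 8   /* bytes moved per transfer: 8 x 8 = 64 */
};

/* Index of the cache line that contains this address. */
uint64_t cache_line_index(uint64_t addr)
{
    return addr / CACHE_LINE_BYTES;
}

/* Position of this address inside its cache line (0..63). */
uint64_t cache_line_offset(uint64_t addr)
{
    return addr % CACHE_LINE_BYTES;
}

/* Number of cache lines touched by reading len bytes starting at addr. */
uint64_t cache_lines_touched(uint64_t addr, uint64_t len)
{
    assert(len > 0);
    return cache_line_index(addr + len - 1) - cache_line_index(addr) + 1;
}
```

For example, an 8-byte value starting at address 60 straddles two cache lines, while the same value at address 64 sits inside one. This is one reason data alignment matters.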
This caching mechanism exploits the concept of spatial locality of reference (when you access a piece of data or code, you will soon need to access the data or code around it). This is where arrays have an advantage over linked lists. Remember that the elements of an array live next to each other in RAM, so when you access an element of the array there’s a very high chance that the next element is already in the cache. The nodes of a linked list, on the other hand, can be scattered all over memory.
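To make the comparison concrete, here is a small sketch of the two traversals. The struct and function names are made up for this example. Both functions do the same work, but the array version reads consecutive addresses, while the list version follows pointers that may lead anywhere in memory.

```c
#include <assert.h>
#include <stddef.h>

/* One node of a singly linked list. */
struct node {
    int value;
    struct node *next;
};

/* Array traversal: element i+1 sits right after element i in memory,
   so one 64-byte cache line serves many consecutive elements. */
long sum_contiguous(const int *values, size_t n)
{
    assert(values != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; ++i)
        total += values[i];
    return total;
}

/* List traversal: each node can live anywhere, so every step
   may land on a cache line that has not been loaded yet. */
long sum_linked(const struct node *head)
{
    long total = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        total += p->value;
    return total;
}
```

How big the difference is depends on how the list nodes were allocated and on the size of the data, so treat this as a picture of the access pattern rather than a benchmark.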
Translation Lookaside Buffer: This is a special kind of memory for caching address lookups. It works like this: when the processor accesses an address in memory, it converts the given address from its virtual form into a physical address. The result of this conversion is then cached in the TLB to make the process faster next time.

