Vir Campestris
2023-12-20 17:17:23 UTC
This is not the right group for this, but I don't know which one is.
Suggestions on a postcard please...
For reasons I won't go into I've been writing some code to evaluate
memory performance on my AMD Ryzen 5 3400G.
It says in the stuff I've found that each core has an 8-way set
associative L1 data cache of 128k (and an L1 instruction cache); an L2
cache of 512k, also set associative; and there's an L3 cache of 4MB.
To measure the performance I have three nested loops.
The innermost one goes around a loop incrementing a series of 64 bit
memory locations. The length of the series is set by the outermost loop.
The middle one repeats the innermost loop so that the number of memory
accesses is constant regardless of the series length.
The outermost one sets the series length. It starts at 1, and doubles it
each time.
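For reference, the three loops described above might look roughly like this. It is a minimal sketch, not the actual code: the names `time_series` and `sweep`, the `volatile` pointer (used so the compiler can't collapse the repeated increments), and the timing via `std::chrono::steady_clock` are all assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical helper: increments each of the first `length` 64-bit words
// in `buf`, repeating so that about `total_accesses` increments are done
// in total. Returns the elapsed wall-clock time in seconds.
double time_series(std::vector<std::uint64_t>& buf, std::size_t length,
                   std::size_t total_accesses) {
    assert(length > 0 && length <= buf.size());
    const std::size_t repeats = total_accesses / length;
    // Assumption: volatile access forces every load and store to happen.
    volatile std::uint64_t* p = buf.data();

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t r = 0; r < repeats; ++r) {       // middle loop
        for (std::size_t i = 0; i < length; ++i) {    // innermost loop
            p[i] = p[i] + 1;
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Outermost loop: series length starts at 1 word and doubles each time.
void sweep(std::size_t max_words, std::size_t total_accesses) {
    std::vector<std::uint64_t> buf(max_words, 0);
    for (std::size_t length = 1; length <= max_words; length *= 2) {
        const double secs = time_series(buf, length, total_accesses);
        const double done =
            static_cast<double>((total_accesses / length) * length);
        std::printf("%zu bytes: %.1f M increments/s\n",
                    length * sizeof(std::uint64_t), done / secs / 1e6);
    }
}
```

Calling something like `sweep(1 << 22, 1 << 28)` from `main` would cover series from 8 bytes up to 32MB. Both arguments are illustrative values, not the ones actually used.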
I _thought_ what would happen is that as I increase the length of the
series after a while the data won't fit in the cache, and I'll see a
sudden slowdown.
What I actually see is:
With a series length of 56 to 128 bytes I get the highest speed.
With a series length of 500B to 1.5MB, I get a consistent speed of about
2/3 the highest speed.
Once the series length exceeds 1.5MB the speed drops, and is consistent
from then on. That, as far as I can see, is main memory speed, and is
about 40% of the highest.
OK so far.
But...
Series length 8B is about the same as the 56 to 128B speed. Series length
16B is a bit less. Series length 32B is a lot less: not as slow as main
memory, but not much more than half the peak speed. My next step up is
the peak speed. Series length 144 to 448B is slower still; slower, in
fact, than the main memory speed.
WTF?
I can post the code (C++, but not very complex) if that would help.
Andy