TL;DR: DSS is almost entirely CPU-limited and is decently, but not perfectly, multi-threaded. Modern desktop processors, say 4-core or 6-core parts, do very well. Higher core counts offer some gains, but with diminishing returns after a certain point. Clock speed is king.
I ran some tests and could find no bottleneck with regard to storage or RAM speed. Even the terribly slow 4200rpm disk, compared to a SATA3 SSD in the Arrandale laptop, did not seem to affect the outcome much. An NVMe SSD also offered no discernible advantage on the Ryzen. Similarly, higher RAM clocks on the Ryzen changed nothing.
The Test:
10x raw light frames from a Canon 700D, no darks, flats or bias. I time from registration through to displaying the final stacked image on screen, in both single-thread and multi-thread modes.
RAW/FITS - AHD debayer, no tickboxes checked for blackpoint or white balance
Register / Stacking - Register already registered frames, Auto detect hot pixels, Stack 100% of images, star detection 25% with median filter on
Stacking parameters - Result standard, no background calibration, Light = Median Kappa-Sigma*, Kappa = 2.5, Iterations = 5, Alignment = auto, Intermediate files yes FITS, Cosmetic Hot pixel detect 1px, threshold 80%, Cold pixel detect 1px, threshold 99%, Output auto save final image. Use all threads as required.
*I realise KS median is maybe not the best choice for light frames, but since I started with that I kept that setting on all tests for consistency.
Real World Results Single Thread Speed:
Laptop i5-520m, Arrandale architecture, 2933MHz; 345s
Laptop i5-6200u, Skylake-U architecture, 2700MHz; 275s
Desktop i5-3570k stock, Ivy Bridge architecture, 3800MHz; 210s
Desktop i5-3570k overclocked, Ivy Bridge architecture, 4500MHz; 178s
Desktop i5-4460, Haswell architecture, 3400MHz; 206s
Desktop AMD Ryzen 3700x stock, Zen2 architecture, 4400MHz; 151s
Real World Results Multi Thread Speed:
Laptop i5-520m, Arrandale architecture, 2 cores 4 threads, 2700MHz; 190s
Laptop i5-6200u, Skylake-U architecture, 2 cores 4 threads, 2700MHz; 141s
Desktop i5-3570k stock, Ivy Bridge architecture, 4 cores 4 threads, 3400MHz; 97s
Desktop i5-3570k overclocked, Ivy Bridge architecture, 4 cores 4 threads, 4500MHz; 77s
Desktop i5-4460 stock, Haswell architecture, 4 cores 4 threads, 3200MHz; 91s
Desktop AMD Ryzen 3700x stock, Zen2 architecture, 8 cores 16 threads, 4000MHz*; 47s
*Multi-core clock speeds for Ryzen (and indeed modern Intel parts) are variable; 4GHz is an estimate here.
From an architecture standpoint, the 2010 Arrandale architecture is at least 20% slower than even the 2012 Ivy Bridge clock for clock, which itself is 15% and 23% slower than 2014 Haswell and 2019 Zen2 respectively, clock for clock.
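For anyone who wants to redo the clock-for-clock comparison, here is a minimal Python sketch using the single-thread results listed above. It assumes that "clock for clock" can be approximated by multiplying run time by clock speed to get a rough cycle count; the names `single_thread`, `cycles` and `relative_cycles` are just for illustration. How the percentages quoted above were worked out is not stated, so the ratios this produces may differ from them by a few points.

```python
# Single-thread results from the post: name -> (clock in MHz, time in seconds).
single_thread = {
    "Arrandale i5-520m": (2933, 345),
    "Skylake-U i5-6200u": (2700, 275),
    "Ivy Bridge i5-3570k stock": (3800, 210),
    "Ivy Bridge i5-3570k overclocked": (4500, 178),
    "Haswell i5-4460": (3400, 206),
    "Zen2 Ryzen 3700x": (4400, 151),
}


def cycles(mhz, seconds):
    """Rough work estimate in mega-cycles: clock (MHz) x run time (s)."""
    return mhz * seconds


# Use stock Ivy Bridge as the reference point (an arbitrary choice).
baseline = cycles(*single_thread["Ivy Bridge i5-3570k stock"])

# A ratio above 1 means more cycles needed than Ivy Bridge (slower per clock),
# below 1 means fewer cycles needed (faster per clock).
relative_cycles = {
    name: cycles(*result) / baseline for name, result in single_thread.items()
}

for name, ratio in relative_cycles.items():
    print(f"{name}: {ratio:.2f}x the cycles of stock Ivy Bridge")
```

The stock and overclocked Ivy Bridge runs should come out almost identical here, which is a quick sanity check that run time really does track clock speed on the same chip.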
From a multithreading perspective, the 2-core 4-thread laptop parts are about 1.9 times faster than their single-thread speed, whereas the Ryzen part is "only" 3.2 times faster than its single-thread speed with 8 cores and 16 threads.
The overclocked Ivy Bridge part scales near-linearly with clock speed.
You can do some maths with simultaneous equations - not shown here! - and conclude that this DSS workload has about 77% of its work able to be multi-threaded, and 23% which remains single-threaded. I can use this 77% figure to estimate the speeds of other CPUs which I don't currently have access to.
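The maths is not shown in this post, but one plausible way to arrive at a figure like this is Amdahl's law. The sketch below is an assumption about the method, not a record of it: it counts every hardware thread as one full worker, converts times to cycle counts so the different single-thread and all-core clocks cancel out, and then solves for the parallel fraction. The names `desktop_results` and `parallel_fraction` are made up for the example.

```python
# Assumption: Amdahl's law, with every hardware thread counted as one worker.
# Measured desktop results from the post:
# name -> (single-thread s, single-thread MHz, multi-thread s, multi-thread MHz, threads)
desktop_results = {
    "i5-3570k stock": (210, 3800, 97, 3400, 4),
    "i5-3570k overclocked": (178, 4500, 77, 4500, 4),
    "i5-4460": (206, 3400, 91, 3200, 4),
    "Ryzen 3700x": (151, 4400, 47, 4000, 16),
}


def parallel_fraction(t_single, mhz_single, t_multi, mhz_multi, workers):
    """Solve Amdahl's law for the parallel fraction p.

    Times are converted to cycle counts (MHz x s) so that the different
    single-thread and all-core clock speeds cancel out.
    """
    ratio = (t_multi * mhz_multi) / (t_single * mhz_single)
    # Amdahl: ratio = (1 - p) + p / workers, rearranged for p.
    return (1 - ratio) / (1 - 1 / workers)


fractions = {name: parallel_fraction(*row) for name, row in desktop_results.items()}
mean_fraction = sum(fractions.values()) / len(fractions)

for name, p in fractions.items():
    print(f"{name}: about {p:.0%} parallel")
print(f"mean: about {mean_fraction:.0%} parallel")
```

With these inputs the four desktop parts all land in the mid-to-high 70s percent, which is consistent with the 77% figure. The two laptops come out lower under the same assumption, which may mean that hyperthreads on a 2-core part should not be counted as full workers.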
Estimated Intel i9 10900k 10c/20t all cores 4400MHz multithread time: 40s
Estimated AMD Ryzen 9 3950x 16c/32t all cores 4200MHz multithread time: 38s
Of course, this is a test of stacking only 10 frames. If you are stacking hundreds of frames these gains will add up, and 20% could be several minutes!
As we can see, we are into the territory of diminishing returns, as both the £700 3950x and the £530 i9 are "only" ~20% faster than the £280 3700x.
Therefore I would expect that the 4c/8t Ryzen 3300x and Intel's equivalent i3 part, as well as the 6c/12t 3600 / i5 parts, are likely to offer the best bang for buck for stacking.
Estimated AMD 3300x 4c/8t all cores 4200MHz multithread time: 56s
Estimated AMD 3600 6c/12t all cores 3800MHz multithread time: 52s
Estimated Intel i3 10100 4c/8t all cores 4300MHz multithread time: 61s
Estimated Intel i5 10600 6c/12t all cores 3300MHz multithread time: 50s
A lot depends here on the exact multi-core clock speeds, which are very difficult to predict accurately. But I would expect all these parts to score between 50 and 60 seconds.
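As a rough cross-check on estimates like these, here is a hedged Python sketch that applies Amdahl's law with the 77% parallel figure to other Zen2 parts. It is not the calculation used for the numbers above. It assumes the measured 3700x single-thread run (151s at 4400MHz) describes the per-clock speed of every Zen2 chip, and that each hardware thread counts as a full worker. The names `estimate_seconds` and `zen2_parts` are illustrative only. Intel parts would need their own per-clock baseline, so they are left out.

```python
PARALLEL_FRACTION = 0.77  # figure quoted in the post

# Single-thread work on the 3700x from the measured result: 151 s at 4400 MHz.
ZEN2_MEGACYCLES = 151 * 4400


def estimate_seconds(all_core_mhz, threads,
                     megacycles=ZEN2_MEGACYCLES, p=PARALLEL_FRACTION):
    """Amdahl's-law estimate of the multi-thread stacking time in seconds.

    Assumes each hardware thread is a full worker and that the chip has
    the same per-clock speed as the measured 3700x.
    """
    single_thread_seconds = megacycles / all_core_mhz
    return single_thread_seconds * ((1 - p) + p / threads)


# name -> (assumed all-core clock in MHz, hardware threads), taken from the post.
zen2_parts = {
    "3700x": (4000, 16),  # measured at 47s, included as a sanity check
    "3950x": (4200, 32),
    "3600": (3800, 12),
    "3300x": (4200, 8),
}

estimates = {
    name: estimate_seconds(mhz, threads)
    for name, (mhz, threads) in zen2_parts.items()
}

for name, seconds in estimates.items():
    print(f"{name}: roughly {seconds:.0f}s")
```

The 3700x line should land within a second or two of the measured 47s. The other parts come out close to, but not identical to, the estimates above, and any error in the guessed all-core clock feeds straight through to the result.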