
I benchmarked DSS on 5 different PCs!


calorno


TL;DR: DSS is almost entirely CPU limited, and is decently but not perfectly multi-threaded. Modern 4-core or 6-core desktop processors do very well. Higher core counts offer some gains, but returns diminish after a certain point. Clock speed is king.
I did some tests and could find no bottleneck with regard to storage or RAM speed. Even the terribly slow 4200rpm disk, compared with a SATA3 SSD in the Arrandale laptop, did not seem to affect the outcome much. An NVMe SSD also offered no discernible advantage on the Ryzen. Similarly, higher RAM clocks on the Ryzen changed nothing.

The Test:

10x raw light frames from a Canon 700D, no darks, flats or bias. I time from the start of registration through to the final stacked image being displayed on screen, in both single-thread and multi-thread modes.
RAW/FITS - AHD debayer, no tickboxes checked for blackpoint or white balance
Register / Stacking - Register already registered frames, Auto detect hot pixels, Stack 100% of images, star detection 25% with median filter on
Stacking parameters - Result standard, no background calibration, Light = Median Kappa-Sigma*, Kappa = 2.5, Iterations = 5, Alignment = auto, Intermediate files yes FITS, Cosmetic Hot pixel detect 1px, threshold 80%, Cold pixel detect 1px, threshold 99%, Output auto save final image. Use all threads as required.
*I realise KS median is maybe not the best choice for light frames, but since I started with it I kept that setting on all tests for consistency.


Real World Results Single Thread Speed:
Laptop i5-520m, Arrandale architecture, 2933MHz; 345s
Laptop i5-6200u, Skylake-U architecture, 2700MHz; 275s
Desktop i5-3570k stock, Ivy Bridge architecture, 3800MHz; 210s
Desktop i5-3570k overclocked, Ivy Bridge architecture, 4500MHz; 178s
Desktop i5-4460, Haswell architecture, 3400MHz; 206s
Desktop AMD Ryzen 3700x stock, Zen2 architecture, 4400MHz; 151s

Real World Results Multi Thread Speed:
Laptop i5-520m, Arrandale architecture, 2 cores 4 threads, 2700MHz; 190s
Laptop i5-6200u, Skylake-U architecture, 2 cores 4 threads, 2700MHz; 141s
Desktop i5-3570k stock, Ivy Bridge architecture, 4 cores 4 threads, 3400MHz; 97s
Desktop i5-3570k overclocked, Ivy Bridge architecture, 4 cores 4 threads, 4500MHz; 77s
Desktop i5-4460 stock, Haswell architecture, 4 cores 4 threads, 3200MHz; 91s
Desktop AMD Ryzen 3700x stock, Zen2 architecture, 8 cores 16 threads, 4000MHz*; 47s

*Multi-core clock speeds for Ryzen (and indeed modern Intel parts) are variable; 4GHz is an estimate here.

From an architecture standpoint, the 2010 Arrandale architecture is at least 20% slower clock for clock than even 2012 Ivy Bridge, which itself is 15% and 23% slower clock for clock than 2014 Haswell and 2019 Zen2 respectively.
From a multithreading perspective, the 2 core 4 thread laptop parts are about 1.9 times faster than their single-thread speed, whereas the Ryzen part is "only" 3.2 times faster than its single-thread speed with 8 cores and 16 threads.
The overclocked Ivy Bridge part scales near-linearly with clock speed.
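
For anyone who wants to redo the clock-for-clock comparison, here is a minimal Python sketch. It assumes that "work" can be approximated as clock speed multiplied by run time, and that the listed clocks were actually held for the whole run. The function names are just for illustration. The percentages it prints depend on the clock figures assumed for each run, so they will not exactly match the ones quoted above.

```python
# (clock in MHz, time in seconds) for the single-thread runs listed above.
single_thread = {
    "Arrandale": (2933, 345),
    "Ivy Bridge stock": (3800, 210),
    "Ivy Bridge overclocked": (4500, 178),
    "Haswell": (3400, 206),
    "Zen2": (4400, 151),
}

def mega_cycles(name):
    """Clock ticks (in millions) the run took: MHz x seconds."""
    mhz, seconds = single_thread[name]
    return mhz * seconds

def percent_slower(slow, fast):
    """How much less work `slow` gets done per clock tick than `fast`, in percent."""
    return 100 * (1 - mega_cycles(fast) / mega_cycles(slow))

for slow, fast in [
    ("Arrandale", "Ivy Bridge stock"),
    ("Ivy Bridge stock", "Haswell"),
    ("Ivy Bridge stock", "Zen2"),
]:
    print(f"{slow} vs {fast}: {percent_slower(slow, fast):.0f}% slower per clock")

# If run time scales linearly with clock speed, the stock and overclocked
# runs should need almost the same number of ticks (ratio close to 1).
oc_ratio = mega_cycles("Ivy Bridge overclocked") / mega_cycles("Ivy Bridge stock")
print(f"overclocked / stock ticks: {oc_ratio:.3f}")
```

The last line is the check behind the near-linear scaling remark: a ratio close to 1 means the overclock bought almost exactly its share of speed.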

You can do some maths with simultaneous equations (not shown here!) and conclude that about 77% of this DSS workload can be multi-threaded, while 23% remains single threaded. I can use this 77% figure to estimate the speeds of other CPUs which I don't currently have access to.
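
One way to arrive at a figure like this is Amdahl's law. The sketch below is not necessarily the calculation used for the 77%; it assumes Amdahl's law as the model, takes the thread count as the number of parallel workers, and normalises for the different single-thread and multi-thread clocks. Only the desktop runs are included, which is also an assumption on my part. The function names are just for illustration.

```python
# (threads, single MHz, single s, multi MHz, multi s), copied from the results above.
runs = {
    "i5-3570k stock": (4, 3800, 210, 3400, 97),
    "i5-3570k overclocked": (4, 4500, 178, 4500, 77),
    "i5-4460": (4, 3400, 206, 3200, 91),
    "Ryzen 3700x": (16, 4400, 151, 4000, 47),
}

def parallel_fraction(threads, mhz_1, sec_1, mhz_n, sec_n):
    """Fraction of the work that runs in parallel, according to Amdahl's law."""
    # Clock-normalised speedup: ratio of clock ticks used, not of seconds.
    speedup = (mhz_1 * sec_1) / (mhz_n * sec_n)
    # Amdahl's law, speedup = 1 / ((1 - p) + p / threads), solved for p.
    return (1 - 1 / speedup) / (1 - 1 / threads)

def predicted_seconds(p, threads, mhz_1, sec_1, mhz_n):
    """Multi-thread time predicted from a single-thread run of the same chip."""
    # Single-thread work scaled by the Amdahl factor, run at the multi-thread clock.
    return mhz_1 * sec_1 * ((1 - p) + p / threads) / mhz_n

fractions = {name: parallel_fraction(*run) for name, run in runs.items()}
mean_p = sum(fractions.values()) / len(fractions)

for name, p in fractions.items():
    print(f"{name}: p = {p:.2f}")
print(f"mean p = {mean_p:.2f}")

# Sanity check: feed the 77% figure back in for the 3700x and compare with the measured 47s.
print(f"3700x predicted: {predicted_seconds(0.77, 16, 4400, 151, 4000):.1f} s")
```

To estimate a CPU you don't own, you would also need a guess at its single-thread time and its all-core clock, so treat any such prediction as a rough figure.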

Estimated Intel i9 10900k 10c/20t all cores 4400MHz multithread time: 40s
Estimated AMD Ryzen 9 3950x 16c/32t all cores 4200MHz multithread time: 38s

Of course this is a test of stacking only 10 frames. If you are stacking hundreds of frames these gains will add up, and 20% could be several minutes!

As we can see, we are into the territory of diminishing returns, as both the £700 3950x and the £530 i9 are "only" ~20% faster than the £280 3700x.


Therefore I would expect that the 4c/8t Ryzen 3300x and Intel's equivalent i3 part, as well as the 6c/12t 3600 / i5 parts, are likely to offer the best bang for buck for stacking.

Estimated AMD 3300x 4c/8t all cores 4200MHz multithread time: 56s
Estimated AMD 3600 6c/12t all cores 3800MHz multithread time: 52s
Estimated Intel i3 10100 4c/8t all cores 4300MHz multithread time: 61s
Estimated Intel i5 10600 6c/12t all cores 3300MHz multithread time: 50s

A lot depends here on the exact multi-core clock speeds, which are very difficult to predict accurately. But I would expect all these parts to score between 50 and 60 seconds.

 


Nice detailed report you have made calorno, really useful investigation too. I guess, as with a lot of equipment, the law of diminishing returns on outcome v cost is often overlooked. I know I quite often get carried away, convincing myself to push the budget a bit further chasing a perceived gain. Funny, but I was watching Astrobiscuit's video on his £6000 v £600 astro imaging rig challenge and I think there's a common message between your two investigations.

 

Jim 


Thanks Jim, I do mostly agree, although this benchmark was for 10 images. Many people are now stacking 100s of shorter-exposure frames, and perhaps a beefy processor will shave several minutes off processing time in those scenarios!


21 hours ago, calorno said:

.... are likely to offer the best bang for buck for stacking.

Interesting test, but since imaging takes hours, the best bang for buck is some patience.

Normally a huge number of megabytes are processed. I'm surprised HDD vs SSD doesn't make a real difference.

Edited by han59

On 29/06/2020 at 10:17, han59 said:

Normally a huge number of megabytes are processed. I'm surprised HDD vs SSD doesn't make a real difference.

Yes I was also surprised but I have thought about this.
For this test, there are 10 raw image files at 22MB each, and then at the end DSS will write a ~200MB autosave FTS file.
These should be sequential reads and writes, and even low-speed laptop spinning disks can deliver 50MB/sec sequential read and write. For a desktop spinning disk that might be closer to ~150MB/sec.
So even with a ridiculously slow disk, I would expect only around 8 seconds to be spent in I/O. That might well affect the ultra-fast machines delivering sub-1-minute scores in this test, but on more modest machines it's not a big factor. Nobody is running a laptop spinning disk in a high-end desktop.
Of course, if we changed the test to use files from higher resolution cameras, with maybe 16-bit per channel colour depth, we might increase the data volumes significantly. Then an SSD may show some speed-up.
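
The back-of-envelope I/O figure can be checked with a few lines of Python. This is only a sketch: it assumes each raw frame is read once, the final file is written once, everything is purely sequential, and it ignores intermediate files and seek time. The constant and function names are just for illustration.

```python
RAW_FRAMES = 10   # number of light frames in the test
RAW_MB = 22       # size of each raw frame, in MB
OUTPUT_MB = 200   # approximate size of the autosave file, in MB

def io_seconds(mb_per_sec):
    """Time to read every raw frame once and write the final file once."""
    total_mb = RAW_FRAMES * RAW_MB + OUTPUT_MB
    return total_mb / mb_per_sec

for label, rate in [("laptop spinning disk", 50), ("desktop spinning disk", 150)]:
    print(f"{label} at {rate} MB/s: {io_seconds(rate):.1f} s")
```

Swapping in larger file sizes for a higher resolution camera shows how quickly the disk share of the total time grows.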






Following a suggestion from another user, I have now uploaded the sample images so people can run this same test themselves:

https://drive.google.com/file/d/1s3ajgapHzT8iv_zZZ9P7MUTNmr68ocoa/view?usp=sharing


Additionally, I have created a spreadsheet containing my results, where users can add their own scores for comparison:
https://ethercalc.org/1h8mcjg5yngc
 
