
Best ways of optimising processing?



Hello,

Apologies if this has been discussed before - I was wondering what might be good ways of optimising processing.

Basically, I recently got 8h+ of data on IC410 - 60s lights coming to 11GB of raw data.  After discarding some dodgy frames, I still managed to get 7h 24m of data (here), which took my (admittedly v old) laptop 22hrs to process in APP (w 140GB of working memory needed on the hard drive).

I'm v glad I did it b/c it's a far better image than an earlier 2.5h exploration - more data clearly makes a difference.

But I'd eventually (weather permitting) like to be able to get to 24-30h of data on a target, and that would probably make my machine keel over (88hrs of integration in APP?!).

So what might be some good optimisations to go for?

Some ideas below, but I've no clue whether they're daft, or which would be better or worse.

1.  Integrate in batches & then register & integrate (or pixel math) the output of each batch-integration? (See the rough sketch after this list.)

2.  Switch to mono, take 5-7h w each filter and then integrate each of those - then register & integrate (or pixel math)?  This is a bit like 1, but perhaps this would give more holistic information because of the different filters (as opposed to just 24-30h through the same OSC filter)?

3.  Take longer lights (120s instead of 60s), effectively halving the total GB?  Although that would introduce blow-out risk on brighter stars/nebulosity etc, so might make processing slightly trickier?

4.  Get a faster machine (but how much faster would that make it - would it be multiples?  My mac is 10y old w 16GB of 1333 MHz DDR3 memory & 2.4GHz Intel i5 processor)

5.  Forget about 24-30h b/c it would be diminishing marginal returns after X hrs?  What is X though?
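On idea 1, if I've got the logic right, a plain average done in batches and then recombined (weighting each batch result by its number of subs) should come out identical to one big stack. A rough numpy sketch of that idea with toy arrays (rejection methods like sigma clipping wouldn't decompose this cleanly, so treat it as the simple-average case only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for calibrated, registered subs (e.g. 444 x 60s lights).
subs = rng.normal(loc=100.0, scale=10.0, size=(12, 4, 4))  # 12 tiny "frames"

# One big integration: plain average of everything.
full_stack = subs.mean(axis=0)

# Batch route: average each batch, then combine the batch results,
# weighted by how many subs went into each batch.
batches = [subs[0:5], subs[5:8], subs[8:12]]            # e.g. 3 imaging sessions
batch_means = np.array([b.mean(axis=0) for b in batches])
weights = np.array([len(b) for b in batches], dtype=float)

batch_of_batches = np.average(batch_means, axis=0, weights=weights)

# Identical (up to floating point) for a simple mean.
print(np.allclose(full_stack, batch_of_batches))  # True
```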

Any other angles I am missing?

Thank you & stay safe all,

Vin


Hi Vin

I think all those ideas sound feasible, but I imagine the simplest would be to use a solid-state drive if the disk is being used as working memory. First, though, it would be worth monitoring exactly what it is doing (in terms of disk/CPU) while processing your data, which is something you can do using Activity Monitor on the Mac.

Martin

 

 

 


I agree that an SSD is a key component - so look into that first.

Try different stacking software - maybe give Siril a go.

APP is written in Java, and Java will eat up some of your computer's resources - it is not a champion of speed.

Take a smaller set of subs and time:

Siril

APP

DeepSkyStacker

and see if there is a significant difference in speed between the programs. You may be surprised.
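To keep the comparison fair, run each program on the exact same subset and time it - a stopwatch is fine, or a small wrapper along these lines (the commands below are placeholders only, not the real command lines for these programs - substitute however you actually launch each one on your machine):

```python
import subprocess
import time

# Placeholder commands - replace with the actual script/CLI invocation
# for each stacker on your system. These are illustrations, not the
# real command lines.
commands = {
    "Siril": ["siril-cli", "-s", "my_stacking_script.ssf"],
    "DeepSkyStacker": ["DeepSkyStackerCL.exe", "my_file_list.txt"],
}

for name, cmd in commands.items():
    start = time.perf_counter()
    subprocess.run(cmd, check=True)          # blocks until the stack finishes
    elapsed = time.perf_counter() - start
    print(f"{name}: {elapsed / 60:.1f} minutes")
```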


I would consider longer subframes - your setup is around f/6, so if you do not have tracking/guiding issues you can easily go to 2-3 minutes if you plan to collect many hours in total. It may require some gain adjustments. 

With regard to performance, I have a few times processed 300-400 subframes (16Mpx) of one target using PixInsight, and that took about 1 hour of processing time on an i7 / 16GB machine - aligning and stacking; I do not calibrate in PixInsight.

PS - some time ago I did a small performance comparison - you can check it at https://astrojolo.com/gears/astro-applications-cpu-hunger/ , but that was for a small number of frames.


Thanks all.  V helpful.

Sorry I should have clarified I am already using an SSD external hard drive.

@Martin Meredith I will keep Activity Monitor running next time I process.  But while looking at those aspects I checked the config settings on my APP install, and it was only using 4GB of my system RAM.  I'm guessing that will be dragging performance, so I have upped that to 15GB (the highest choice it gives).

@Laurin Dave my old laptop only supports USB2, so I am probably constrained by that wrt the external SSD read/write, but that is s/t to keep in mind when I eventually get round to upgrading the laptop (it's so old that it can only handle High Sierra OS 10.13 - the newest PI features which need a later OS can't run on it - so I know it's only a matter of time before the poor thing says "enough").

@vlaiv that's a task for if the conditions remain so poor over the holidays :)  I initially started w Siril but found it cumbersome to use (but that may be b/c it was the first processing software I ever tried, so this was all completely new to me).  I switched to DSS & found that much more user-friendly, but then noticed it consistently gave weird edge artefacts compared to APP on the widefield images I was taking with a flattened Evoguide50, so I switched to APP.  I stopped using PI v quickly for anything other than post-processing b/c it just guzzled memory & time.

@drjolo yes I think I will start trying longer subs.  I use guiding & my RMS is usually <1 (unless it's windy and I've got the 1m+ physical length frac mounted), so I have been chewing over increasing exposure time.  I currently use 125 gain b/c it's just above the level where read noise steps down on an ASI294MCP, so I'm hoping that gain is fine for longer subs too (else read noise will increase quite a bit, if the graphs are accurate).  1hr for 400 subs of 16Mpx each would be awesome!  That's a v handy comparison table, thank you.  Just eyeballing it, my machine is an i5 w a much slower CPU (1.3GHz) & effectively APP was only using 4GB of RAM as per above, so that would put it waaaay at the bottom of that table, which would explain the 22h (!).

I guess I can't do anything about the CPU speed for now, so I will try a subset of lights running 15GB of memory in APP & see how fast that goes.  And then also try a like-for-like comparison w DSS & Siril - will report back if I get the chance to do that.

Thanks again for all your help - it's always worth asking on SGL :) - happy holidays all!


As the processing is a one-time thing, have you thought of getting a cloud computing account (e.g. Azure, AWS) rather than using the laptop, and doing the registration / stacking / integration in the cloud? You only pay for the actual time you use it, then power down the VM or discard it.
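The rough flow would be: upload the subs, run the stack on the VM, download the result, then stop the instance so you are no longer billed for compute. Just to illustrate the start/stop part with boto3 (the instance ID and region are placeholders, and the upload/stacking step is whatever tooling you prefer):

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")
instance_id = "i-0123456789abcdef0"   # placeholder - your stacking VM

# Spin the VM up only for the stacking run...
ec2.start_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

# ...upload data, run the stacking job, download the result (ssh/scp/etc.)...

# ...then stop it, so you pay for storage only, not compute.
ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
```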

The PixInsight / Photoshop work probably needs to be done locally though - licence considerations, and visual judgements can be tricky over a remote desktop session.


Oh that's interesting @gilesco - I've never used one of those before.  So it's like a VM - I'd have to upload all the data to the cloud, and also install APP/Siril/DSS etc on the cloud machine <does APP allow that?> & then run the processing on that (ie, essentially use the much faster machines of AWS)?  I've always thought of AWS as commercial pricing, didn't realise normal retail accounts could also use it.


11 minutes ago, vineyard said:

Oh that's interesting @gilesco - I've never used one of those before.  So it's like a VM - I'd have to upload all the data to the cloud, and also install APP/Siril/DSS etc on the cloud machine <does APP allow that?> & then run the processing on that (ie, essentially use the much faster machines of AWS)?  I've always thought of AWS as commercial pricing, didn't realise normal retail accounts could also use it.

Yes, you would have to upload your data, and if you don't have a fast upload speed (thankfully I do) it could take a while.

APP allows multiple installs to a limit (I have installed it on 3 machines simultaneously). If you reach the limit I think you can de-register old installs - not sure about the process for that.

I run a small VM in Azure for other purposes; it costs around £10 a month, but it runs as a server continuously - if I were to shut it down, the subscription would cost close to zero (perhaps not quite zero). Renting a larger VM would cost more per month, but for this purpose you only need to rent it for the processing time.


Personally, I would say definitely upgrade your computer hardware. It is unfortunately not the cheap option, but if you can buy a high spec machine, you'll probably get another 10 years of use out of it.

As an example of the speed of Siril and a decent-spec PC: I recently calibrated and stacked 360 frames (23MB each), which I also drizzled (so ~92MB per frame). I did the whole process manually, and from start to finish it took about 30 mins. The actual stacking part took <5 mins. This is with an i9, 32GB RAM, and an M.2 NVMe SSD.

This is probably just my personal bias here, because I loathe working on laptops, but I think you get more for your money if you go for a desktop.


As you are a Mac user, do you have a separate monitor?

If you have, get a Mac Mini M1 - that will be massively faster than what you have now and is a cost-effective upgrade (as these things go).

I prefer to work locally so this would be my route to faster processing :)


I would upgrade the laptop to a self-built desktop PC. If assembling a PC is off the table, look into non-branded desktops - big-brand desktops (HP, Acer etc) are the worst money sinks there are. Gaming PCs built by a dedicated gaming-oriented store are the best place to look for less-bad prebuilts.

I have an i7 6700K, reasonably fast 16GB DDR4 RAM, a GTX 1080 and SSDs for storage.

Yesterday I split-stacked 800 subs of OSC data, 50MB each, into monochrome RGB stacks with Siril and Sirilic. It took about 3 hours and 400GB of storage, while a previous stack I did with APP in mosaic mode took maybe 6-7h for only 300 subs. Maybe Siril is just faster, so give it a try?

Split stacking means first calibrating as 32-bit files and saving the calibrated files for later use, then splitting the colour channels into 4 separate subs with no debayering (1x R, 1x B, 2x G), then stacking those into mono RGB stacks. So the 800 subs turn into 3200 subs, which are then stacked. That should give you an idea of how much faster you can get with a better PC - and newer hardware would be faster still, so definitely look into that!
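Roughly what the split step does, in numpy terms (this assumes an RGGB Bayer pattern - check your camera's CFA order - and the real work is of course done by Siril/Sirilic):

```python
import numpy as np

def split_cfa(raw: np.ndarray):
    """Split an undebayered RGGB frame into its four CFA sub-frames.

    Each output is half the width and half the height of the raw frame,
    so one OSC sub becomes 4 mono subs (1x R, 2x G, 1x B).
    """
    r  = raw[0::2, 0::2]   # top-left pixel of each 2x2 cell
    g1 = raw[0::2, 1::2]   # top-right
    g2 = raw[1::2, 0::2]   # bottom-left
    b  = raw[1::2, 1::2]   # bottom-right
    return r, g1, g2, b

# Toy example with a fake 8x8 "raw" frame
raw = np.arange(64, dtype=np.uint16).reshape(8, 8)
r, g1, g2, b = split_cfa(raw)
print(r.shape)  # (4, 4) - a quarter of the pixels per channel frame
```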


Thanks all for the further feedback, much appreciated!

Yes, I fear it's looking like new hardware. @dannybgoode I do have an old surplus-to-requirements HD-ready TV & so was indeed thinking about a Mac Mini - do you use one? @ollypenrice unfortunately AstroArt only seems to be Windows & I'm Mac-based (I guess I could use a virtual Windows machine, which is how I use DSS). @The Lazy Astronomer & @ONIKKINEN those look like so much faster speeds in Siril - I will try that & see (btw the self-built PC idea runs into my Mac restriction, as well as me not being sufficiently hardware-literate).

Anyway, I left the machine running today while I did other things, and tried 193 lights of 60s with both APP & DSS.  Images below are ABE & HT from PI (no other processing done), 2x downsampled & saved as JPGs.

The APP version is the first one, the DSS the second.  APP took 4h07 (quite a bit faster than 22h for 444 lights!) but still quite slow - DSS whizzed through the same set in 1h10.

There does seem to be a difference in image quality between the two though - DSS has a bit more pop (?) but also more noise/granularity (if you look in and near the nebula at the dark space/cloud areas, ie ignoring the untrimmed edges from the APP mosaic, since DSS clipped those off automatically).

So unless I can somehow get Siril to work wonders, it's probably looking like new hardware - joy 😕

Cheers,

Vin

IC410_APP_session3_only_ABE.png

IC410_DSS_session3_only_ABE.png


I'm not a Mac person, but these M1 chips are seriously tempting me! My workstation has serious grunt but also eats electricity, and whilst I'd need it for gaming, I quite like the thought of a Mini for image processing and the like.

Much more efficient than a hulking great PC…


Crikey @Ags - 12 hrs!

Well, I tried Siril last night.  I downloaded the latest version & found a tutorial on the OSC processing script.  Once I realised the directory structure I had to set up, I moved the files across & pressed go.  The image below is what came out (again just ABE & HT in PI, then 2x downsampled & PNG'd).  There still seems to be more graininess (noise?) in the background vs APP (which gives the cleanest).  But the stars are the best in the Siril image - less blocky than APP & much less blocky than DSS.

The whole thing (193 lights) took c 1h22 or so.  That's not quite a LFL comparison b/c (a) it wouldn't accept the calibration master files (said s/t about the channels not matching the lights), so I had to load all the individual calibration files & the 1h22 includes the time for stacking those as well, and (b) my virtual Windows machine decided to auto-update at one stage, so that ate up some resources until I shut it down.

An ideal scenario would be to combine the clean background of APP w the better stars of Siril.  Would that be possible by changing Siril settings somehow (I just used the script)?  Also nice would be the ability to weed out individual lights in Siril (eg: while I use Blink in PI to weed out obviously bad lights, APP's workflow also allows further weeding once quality scores & shape coefficients are calculated).  I guess I need to start learning a bit more about Siril's tweakability now.

Or else bite the bullet on a Windows mini-PC (cheaper than a Mac mini) just for processing (after all EKOS can work on Windows 10 too...).

Cheers & happy Christmas everyone.

IC410_Siril_session3_only_ABE.png


It's very interesting to see such differences in timing. I've not used any of these pieces of software myself, but it would be useful to know how much time they spend on each stage of processing (it might be in the logs). In my own experience coding this stuff, it is the star extraction that is most compute-intensive; registration can be pretty fast. Another issue is whether the app needs access to all subs simultaneously (e.g. for sigma-clipped stack combination) or not. Knowing some of these things will point to an optimisation strategy - e.g. if star extraction is the hold-up it won't be worth processing in batches of subs and then processing the resulting stacks, but if all subs need to be loaded at once then the batch strategy could work.
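For example, a plain average can be accumulated one sub at a time, so memory stays flat however many hours you collect, whereas sigma-clipped combination needs the whole column of pixel values before it can reject anything. A toy illustration of the first case:

```python
import numpy as np

rng = np.random.default_rng(1)
subs = rng.normal(100.0, 10.0, size=(50, 8, 8))   # stand-in for 50 registered subs

# Straight average, built up one sub at a time: memory use stays flat
# no matter how many subs there are.
acc = np.zeros(subs.shape[1:], dtype=np.float64)
for sub in subs:                  # in real life: load one file per iteration
    acc += sub
running_mean = acc / len(subs)

print(np.allclose(running_mean, subs.mean(axis=0)))   # True

# Sigma-clipped combination, by contrast, needs the full set of values for
# each pixel (or at least a slab of rows from every sub) before it can
# decide what to reject - which is where the memory pressure comes from.
```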

Martin


For APP, if you are doing multi-channel and/or multi-session, then I believe the recommended method is to stack each channel and each session separately and then stack the resulting stacks again, before finally combining the channels for your LRGBHaOIIISII results. The most recent versions of APP lead you in this direction as they ask for a session number when importing your subs.

If doing a mosaic, then the recommended way is to stack each pane of the mosaic as above for each channel, register the mosaic panels, integrate the mosaics, and then combine the LRGBHaOIIISII channels.

If you just load all the frames and try the basic workflow, the time it takes can quickly balloon, especially if you enable things like LNC, MBB etc. The APP forums have better information about this than I can probably put into words. :)

 


2 hours ago, Martin Meredith said:

It's very interesting to see such differences in timing. I've not used any of these pieces of software myself, but it would be useful to know how much time they spend on each stage of processing (it might be in the logs). In my own experience coding this stuff, it is the star extraction that is most compute-intensive; registration can be pretty fast. Another issue is whether the app needs access to all subs simultaneously (e.g. for sigma-clipped stack combination) or not. Knowing some of these things will point to an optimisation strategy - e.g. if star extraction is the hold-up it won't be worth processing in batches of subs and then processing the resulting stacks, but if all subs need to be loaded at once then the batch strategy could work.

Martin

For APP, I think the bottleneck is memory management under Java - or rather, automatic memory management applied to what are, in general, very memory-intensive algorithms.

@vineyard

Differences in results will be down to the algorithms used. The choice of interpolation algorithm can have a significant impact on the result. Bilinear interpolation will reduce noise, but it will also create softer stars - it is like adding a bit of blur to the image. More sophisticated interpolation algorithms add the least blur to the image, which gives the best stars but also the least noise reduction from that blur.
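You can see the effect with a quick experiment: shift a frame of pure noise by half a pixel using bilinear vs cubic interpolation and compare the noise level afterwards (scipy is used here purely for illustration - each stacker has its own resampling code):

```python
import numpy as np
from scipy.ndimage import shift

rng = np.random.default_rng(42)
noise = rng.normal(0.0, 1.0, size=(512, 512))   # a "sub" made of pure noise

# Sub-pixel shift, as happens during registration.
bilinear = shift(noise, (0.5, 0.5), order=1)    # bilinear interpolation
cubic    = shift(noise, (0.5, 0.5), order=3)    # cubic spline interpolation

# Bilinear averages neighbouring pixels, so the noise std drops noticeably
# (the frame has been blurred); the higher-order interpolation preserves
# much more of it, i.e. it blurs the data - stars included - far less.
print(noise.std(), bilinear.std(), cubic.std())
```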

Another thing is the choice of stacking algorithm and normalization method. For example, straight sum / average does not need normalization and is best for comparison purposes. Sigma reject needs normalized frames, and depending on the type of normalization and an often-changing light pollution gradient, it can reject too much data. To be most efficient, the normalization used needs to deal with LP gradients and align them.

The choice of stacking method will also have a significant impact on the software's performance. Straight sum / average should be the fastest, while sigma reject can incur quite a high memory-management cost, as one needs the complete set of samples for each resulting pixel value. Here some clever optimizations can work miracles (like not trying to stack the whole image at once, but doing it a scan line - or several of them - at a time, depending on available memory).
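A rough sketch of the scan-line idea: rather than holding every full frame in RAM, take the same band of rows from each sub (memory-mapped files in a real implementation), sigma-clip that slab, write it out and move on:

```python
import numpy as np

def sigma_clip_mean(stack, sigma=3.0):
    """Mean along axis 0 with a simple one-pass sigma rejection."""
    mean = stack.mean(axis=0)
    std = stack.std(axis=0)
    keep = np.abs(stack - mean) <= sigma * std
    return np.where(keep, stack, 0).sum(axis=0) / np.maximum(keep.sum(axis=0), 1)

def stack_in_slabs(subs, rows_per_slab=256):
    """Sigma-clip stack processed one band of rows at a time.

    `subs` would normally be a list of np.memmap arrays, so only the rows
    actually touched get pulled into RAM.
    """
    n_rows, n_cols = subs[0].shape
    out = np.empty((n_rows, n_cols), dtype=np.float32)
    for r0 in range(0, n_rows, rows_per_slab):
        r1 = min(r0 + rows_per_slab, n_rows)
        slab = np.stack([s[r0:r1, :] for s in subs])   # (n_subs, slab_rows, n_cols)
        out[r0:r1, :] = sigma_clip_mean(slab)
    return out

# Toy run with in-memory arrays standing in for memory-mapped subs
rng = np.random.default_rng(2)
subs = [rng.normal(100.0, 10.0, size=(1024, 1024)) for _ in range(20)]
print(stack_in_slabs(subs).shape)   # (1024, 1024)
```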

With Java software there is an interesting optimization that is often used when programming games (which need performance): leaving memory allocated and reusing buffers, so as to minimize memory-management overhead.


Thanks @vlaiv.  So theoretically could you use one algorithm for the background (eg bilinear) & then extract the stars, and use a different algorithm for the stars, extract those & then combine the best of both worlds?  Sounds v faffy, but if that works, could this be written as a scripted process?  If Siril takes 1/4 of the time of APP then you could run those two different stacks & have the best of both & still be done way before APP.

(I wonder if the APP creators are aware of this difference in speed & can re-engineer the software - it's got to be a competitive disadvantage for them?)

Btw, are there any lectures or videos which explain the intuitive logic behind these different stacking algorithms?  My maths is too rusty to follow the algebra, but it would be nice to understand the rough logic of the different approaches & settings!

Cheers & happy Christmas!


2 minutes ago, vineyard said:

Thanks @vlaiv.  So theoretically could you use one algorithm for the background (eg bilinear) & then extract the stars, and use a different algorithm for the stars, extract those & then combine the best of both worlds?  Sounds v faffy, but if that works, could this be written as a scripted process?

In theory you could do that - but it's not only the stars: everything gets a bit blurred by bilinear interpolation, which means the stars are a bit fatter but also a bit of detail in the target (galaxy, nebula) is lost.

In my view it is best to use advanced interpolation algorithms that produce the sharpest results; denoising can later be applied selectively to deal with the noise.
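Something along these lines - build a crude star mask and smooth only the non-star areas (real tools detect stars far better than this; it is just to show the 'selective' part):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, binary_dilation

def selective_denoise(img, star_sigma=5.0, blur_sigma=1.5):
    """Smooth the background while leaving (crudely detected) stars alone."""
    background = np.median(img)
    noise = np.std(img - gaussian_filter(img, 10))         # rough noise estimate
    star_mask = img > background + star_sigma * noise      # very crude star detection
    star_mask = binary_dilation(star_mask, iterations=3)   # grow mask around star edges

    smoothed = gaussian_filter(img, blur_sigma)             # stand-in for real denoising
    return np.where(star_mask, img, smoothed)

# Toy usage: noisy background with one bright "star"
rng = np.random.default_rng(3)
img = rng.normal(100.0, 5.0, size=(256, 256))
img[128, 128] += 500.0
result = selective_denoise(img)
```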

5 minutes ago, vineyard said:

(I wonder if the APP creators are aware of this difference in speed & can re-engineer the software - it's got to be a competitive disadvantage for them?)

To be honest, I have no idea. I guess people who were once interested in high-performance software are aware of the different optimization techniques, but software developers in general might not be.

5 minutes ago, vineyard said:

Btw, are there any lectures or videos which explain the intuitive logic behind these different stacking algorithms?  My maths is too rusty to follow the algebra, but it would be nice to understand the rough logic of the different approaches & settings!

Not that I'm aware of, but you can see some of the differences in a post on a related topic that I made some time ago (it deals with types of interpolation when aligning images):

 

