
Handling and Archiving Large Datasets for Later Reuse



Not quite sure what to put in the title here! I've been thinking about possible future approaches to deep-sky imaging, especially Emil Kraaikamp's approach of taking thousands of 1 sec images. The datasets this would produce are obviously huge - a single frame from my QHY163M is ~31.2MB. Even at 5 sec subs, that's 22+GB/hr of data.
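For reference, the back-of-envelope numbers look roughly like this (a quick Python sketch, assuming the ~31.2MB frame size above and no download or dither overhead between frames):

# Rough data-rate estimate for the frame size quoted above (~31.2MB per sub).
# Assumes no download/dither overhead, so real-world rates will be a bit lower.
FRAME_MB = 31.2

for exposure_s in (1, 5, 30, 300):
    frames_per_hour = 3600 / exposure_s
    gb_per_hour = frames_per_hour * FRAME_MB / 1024
    print(f"{exposure_s:>4} sec subs: {frames_per_hour:6.0f} frames/hr, ~{gb_per_hour:6.1f} GB/hr")

That works out to roughly 110GB/hr at 1 sec subs and ~22GB/hr at 5 sec.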

Now, that's fine in theory to process (though it'll take a *long* time to chew through them, even on an 8-core i7!) - sufficient disk space is practical, and I'd scrub the unreduced files once happy anyhow. However, long-term storage is an issue. I couldn't keep all the subs here: even with poor UK weather, you're potentially looking at tens of terabytes of data in the relatively short term, and worse again once it's all backed up.

Is there a feasible approach to keeping only stacked, reduced data (which is obviously much smaller) whereby you can still add to the data at a later point in time? I was thinking along the lines of:

Take x hrs - save reduced, stacked data as a single linear frame.
Later on (possibly years later!), take another y hrs of images - somehow combine with the x hrs before?
(and repeat as needed)

 

Could this be achieved while still allowing relevant pixel rejection, weightings, noise reduction, etc.? Does anyone have a PI workflow that allows this? I guess the same approach applies to standard CCD data, though the data savings would be orders of magnitude less...

Thanks!


Hi Coatesg, you can get some quite large drives for a reasonable price these days. I recently bought an 8TB drive from Argos online and had it delivered: not only could you specify the delivery date within a three-hour time-slot, but the delivery driver phoned me 15 minutes before to say he was nearby. Now that's service for you. The other thing is that the drive (a Seagate 8TB Back Up Plus Desktop Hard Drive with USB Hub, £189) had a couple of spare USB ports, which gave you a net gain of one USB 2.0 port, as the drive connection was USB 3.0 to 2.0 anyway.


Throwing storage at it is one option, but having a massive storage drive also means having a massive backup drive. I suppose 6TB would probably give enough space for a while, but it's still a bit of a cost.

I'd be interested to know if there is a way of using partial stacks and adding to them without having to go right back to the individual subs, though I'm struggling to see how pixel rejection would work there.


A lot of people keep the *calibrated lights* from a run, and once they have done the preprocessing to get those, they delete the flats and original subs. I don't quite have the confidence to do that just yet at this early-ish stage of my AP career, so I keep everything for now. But by keeping only the calibrated lights, filter by filter, you can organise and reduce your datasets and chuck out a lot of junk, e.g.:

M51 > 28 May > ATIK > Luminance > Sub1, sub2 etc

And because I have CCDs where I can control the sensor temperature, I use a master BIAS and master dark at each binning level. So there's no need to do those every session, and you can spare the gigabytes needed for those too. But be sure to refresh the BIAS and darks every couple of months (which I have forgotten to do). My new rule is that at the start of every other month (Jan/Mar/May etc.) I redo the darks and BIAS - 30 darks and 100 BIAS frames to create the masters.

As Robin and Graeme have said, you can buy huge amounts of storage cheaply nowadays. I remember my first Western Digital hard drive: 40 megabytes in size and £450 in 1991... You can now get 8 terabytes - 200 thousand times more - for £150 or so.

Hope my experience/rant/lecture helps!


On 24/05/2018 at 18:47, coatesg said:

 

I'd be interested to know if there is a way of using partial stacks and adding to them without having to go right back to the individual subs, though I'm struggling to see how pixel rejection would work there.

There is a way, generally known as 'stacking the stacks.' I do this routinely and it works well. There might be a small advantage in terms of the sigma rejection algorithms in restarting with 20 subs rather than stacking a pair of ten-sub stacks together, but from experience it doesn't seem to be much. If I were only stacking a small number of subs I'd want to do the whole lot in one pass, but not if I had a huge number. With the number of subs in the stacks you propose to take, I very much doubt that there would be any significant difference between, say, a 400-sub stack from scratch and a 200-sub stack combined with a 200-sub stack. I'm not a mathematician, but I'd have thought that the sigma clipping would bring diminishing returns. If this is mathematically incorrect I'll happily stand corrected.

There are various ways to weight linear stacks of unequal sub count. Registar, for instance, allows you to choose a weighting for any images you combine in it, so if stack 'a' had ten subs and stack 'b' had twenty, you'd weight them two to one in favour of 'b'.
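In case it helps, the arithmetic behind that weighting is just a weighted mean - something like this rough Python sketch (not Registar's or PixInsight's actual implementation; the filenames and sub counts are made up):

# Minimal sketch: combine two average-stacked linear frames, weighting by the
# number of subs in each. Filenames and sub counts here are made up.
import numpy as np
from astropy.io import fits

stack_a = fits.getdata("stack_a.fits").astype(np.float64)   # say a 10-sub average
stack_b = fits.getdata("stack_b.fits").astype(np.float64)   # say a 20-sub average
n_a, n_b = 10, 20

# Weighted mean - for plain average stacks this is equivalent to averaging all
# 30 subs in one pass (assuming both stacks are registered and on the same scale).
combined = (n_a * stack_a + n_b * stack_b) / (n_a + n_b)

fits.writeto("combined.fits", combined.astype(np.float32), overwrite=True)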

Because I often find myself, as an imaging provider, with several datasets on the same target, I sometimes take the shortcut of combining processed images. Clearly this isn't ideal but, since most such sets have a good depth of data anyway, it lets me make a super-clean set very painlessly. I do this in Ps by putting one on top of the other in Layers and zooming in close to look at the noise while varying the opacity of the top layer. I just judge the point of lowest noise by eye, flatten, and then stretch the faint stuff a bit harder.

You could adapt this method for linear stacks quite quickly, too. Make copies of each stack, then give the first one a hefty log stretch in Curves and bring in the black point, noting the values of both actions. The stretch needs to get you into the visible noise. Apply exactly the same stretch and black point reset to the second one. Paste one onto the other in Layers and adjust the opacity until, observing very closely - say at 200% zoom - you get the lowest noise. Whatever the opacity of the top layer is at that point should be its weighting when you stack the linear stacks together. You can now discard the stretched copies, noting just the optimal weighting. Maybe PixInsight can do this automatically when stacking stacks; I don't use it for stacking so can't be sure.
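If you wanted to take the eyeballing out of it, the same idea could be done numerically: measure the noise in a patch of blank background as you vary the blend weight and keep the weight that gives the lowest figure. A rough Python sketch of the principle only (not a workflow I've tested; the filenames and patch coordinates are made up):

# Numerical version of the "vary the opacity, watch the noise" trick.
# The background patch should cover blank, star-free sky in both
# (already registered) linear stacks.
import numpy as np
from astropy.io import fits

stack_a = fits.getdata("stack_a.fits").astype(np.float64)
stack_b = fits.getdata("stack_b.fits").astype(np.float64)
patch = (slice(100, 300), slice(100, 300))   # made-up coordinates

best_w, best_noise = 0.5, np.inf
for w in np.linspace(0.0, 1.0, 101):
    blend = w * stack_a + (1.0 - w) * stack_b
    noise = np.std(blend[patch])
    if noise < best_noise:
        best_w, best_noise = w, noise

print(f"Lowest background noise at weight {best_w:.2f} for stack A "
      f"(i.e. {1 - best_w:.2f} for stack B)")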

By the way, AstroArt is the fastest stacking programme of which I'm aware. This might be of interest to you on your project. You can download a free 'no-save' trial version to see if it helps.

Olly


  • 2 weeks later...

Be careful with extremely large stacks. I think once you get to a certain number, fixed pattern noise starts to rear its ugly head, even with regular and aggressive dithering. From memory I think someone did some analysis of this over on CN. I can't recall the number of subs where it became apparent (I think it was in the hundreds though), but certainly if you were going down the route of using extremely short subs such as 1 or 5s, then I suspect you would run into this problem.

