
ATIK OSX Drivers R1.00 & Example App


NickK

Looks good James :D

At the moment I'm 200% at work (covering for someone) and DIY.. so zero progress with anything astro related! At some point I may have a chance to get out and do something with the scope...


I have a 'free' weekend this weekend, so apart from building the garden pier former.. I will be working on the drivers and ExampleApp :D

I've had a report, via Craig Stark, that there are issues with the FTDI ChipID startup of the 16IC Colour.. I'm currently working directly with the guy to get a solution.

I have a couple of problem reports with TheSkyX - long story short, around the use of the ATIK One filter wheel - that will get progressed.

I also want to get the pipeline stacking GPU-based 'alpha' Example App out the door - it's not perfect but I've had a forced break in progress (lots of reasons) and am now feeling a little refreshed.. I also have a couple of days of vacation coming up that (apart from the concrete) should see some additional coding..

Now is definitely the time if you have any issues to highlight them!

Edited by NickK

Following up on the FTDI ChipID startup issue with the 16IC Colour: it seems that the serial number printed on the back label differs from the serial number reported by the camera.. hence the FTDIChipId needs to be registered against the camera-reported serial number. This can be found in the brackets in the ExampleApp Live View HID window title.

So if you plug the camera in, look at the window, then quit, disconnect the camera and set up the preferences with the camera-reported serial number, for example "A3001QH8" (the label serial number is all digits), and your FTDIChipID. Close the window and restart the ExampleApp - then it will find a match..

Hands up who has multiple legacy cameras of the same serial number?

Edited by NickK

Hi Nick,

First of all, thanks for all of the effort you're putting into making Atiks available to us Mac fanboys!

I tried using the latest example app, but it crashes immediately after executing it. I'm using OS X 10.9.3 with an Atik 314L+ on a Mid-2012 15" Macbook Pro Retina. If you need any further details I'm happy to provide them.

Lee - I haven't received an email or exception report, if you could provide that I can look into your issue. I know there were some issues with the OpenGL/CL implementation in the early 10.9.x releases.

The 383L seems to be functioning with R1.30.. and with the latest (unreleased) ExampleApp on 10.9.5. I've had no indications of issues from the drivers themselves from Nebulosity (with the new drivers in) and TheSkyX testing for 3xx or 4xx. The only issues are to do with the old legacy ART 289 and the ATIK One's filter wheel in TheSkyX, which will be resolved shortly (EFW2 works fine)..

Edited by NickK

Updating to Yosemite.. I hope that Xcode will take the SDK for 10.8 as it appears to only have 10.9 & 10.10.. 

I'm hoping for a GPU driver update and some fixes..

Edited by NickK

Michael has created a Richardson-Lucy iterative sharpening routine using Gaussian distributions: http://stargazerslounge.com/topic/228050-home-brew-software-for-lr-deconvolution/

The fun thing is that RL is not far from the system I have (mine uses a generated Airy disk rather than a Gaussian as the PSF parameter) so I've decided to add a quick RL into the mix. It's a simple process, works well with GPUs and is a matter of a small amount of coding.. I may have this done by tonight :D
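For readers who have not met Richardson-Lucy before, here is a minimal 1D sketch of the iteration in plain Python. It is not the pipeline's GPU code: the function names (`convolve_same`, `richardson_lucy`), the three-tap PSF and the flat starting estimate are all assumptions made for illustration. Each pass blurs the current estimate with the PSF, divides the observed data by that blur, and multiplies the estimate by the ratio convolved with the flipped PSF.

```python
def convolve_same(signal, kernel):
    """Linear convolution with zero padding; output has the signal's length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + half - k
            if 0 <= j < len(signal):
                acc += signal[j] * weight
        out.append(acc)
    return out


def richardson_lucy(observed, psf, iterations=50, eps=1e-12):
    """Plain Richardson-Lucy update; psf is assumed normalised to sum to 1."""
    psf_flipped = psf[::-1]
    flat = max(sum(observed) / len(observed), eps)
    estimate = [flat] * len(observed)          # flat starting guess
    for _ in range(iterations):
        blurred = convolve_same(estimate, psf)
        ratio = [o / max(b, eps) for o, b in zip(observed, blurred)]
        correction = convolve_same(ratio, psf_flipped)
        estimate = [e * c for e, c in zip(estimate, correction)]
    return estimate


# Hypothetical test data: a single star blurred by a 3-tap PSF.
psf = [0.25, 0.5, 0.25]
truth = [0.0] * 21
truth[10] = 1.0
observed = convolve_same(truth, psf)
restored = richardson_lucy(observed, psf, iterations=50)
```

In this toy case the restored peak ends up higher and narrower than the blurred one while the total flux stays the same, which is the behaviour you want from a sharpening step.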


The initial Lucy-Richardson implementation on the GPU seems to return a 17.1MB image in 0.4 seconds for 50 iterations using a global memory lookup. Faster optimisations would be (a) local workgroup memory and (b) performing the kernel looping on the GPU rather than looping in CPU space. Now I have to dash to get the rebar for my astro mount...

Edited by NickK

Now the problem is that double is supported by OpenCL.. but not by AMD GPUs before late Dec 2011 (i.e. 69xxx series).. so only the newest Mac laptops are likely to support it.. LR uses double for error estimation; single precision can be used instead.. but it results in a coarser error correction.
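To make the "coarser error correction" point concrete, here is a small stdlib-only sketch. It assumes nothing about the pipeline itself: `as_float32` is a helper name made up here, and the 1e-8 correction is an arbitrary illustrative value. Near convergence the LR ratio is 1 plus a tiny amount, and single precision cannot represent that tiny amount, so the update stalls earlier than it would in double.

```python
import struct


def as_float32(value):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("f", struct.pack("f", value))[0]


# A correction factor very close to 1, as seen late in an iterative fit.
correction_double = 1.0 + 1e-8
correction_single = as_float32(correction_double)

residual_double = correction_double - 1.0   # small but non-zero in double
residual_single = correction_single - 1.0   # exactly 0.0 in single
```

Single precision has roughly 7 significant decimal digits, so any relative correction much below about 1e-7 is rounded away.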

So the approach I've taken is to do a small adjustment (as I've done previously with the PSF deconv).

edit: I've also found my OpenCL compilation was set to 's' (smallest) rather than fast optimisation..

Edited by NickK

  • 2 weeks later...

R1.35 Drivers OSX 10.10

Modern driver framework: ATIKOSXDrivers.framework.zip

Legacy driver framework (requires modern): ATIKOSXLegacyDrivers.framework.zip

Release notes over the last few changes.. main one is for the legacy driver and OS X 10.10 build. These are working with the 383L, Titan etc on the ExampleApp.

// 1.31 Richard - Bugfix for filterwheel - identity read from class, FW status update requested on position() and status() calls. State sent through
// 1.32 Richard - erroneous NSLog in console
// 1.33 OSX 10.9.4 build & release
// 1.34 Legacy drivers - if ftdi chipid fails then the driver will bypass the download loop (as the camera will not provide an image). Also enhanced the log file
//      to warn of the issue if the FTDI ChipID fails.
//      Added 16IC-C colour image for icons in the ExampleApp.
//      ExampleApp deallocation check added for textures (stops crash on startup).
// 1.35 Driver fix IC24 honours ExampleApp's request for FULL precharge, driver should ignore as IPCS

I've had to nuke my OSX 10.6.8 build after having to rebuild the mini with a MS Office bootcamp (for job hunting). It has an OS X 10.10 install but I don't think I can partition again, so I'll need to make a bootable external drive for 10.6.8 support..


The good news also is that I have my Lucy-Richardson working nicely.. a little slower than previously estimated (it's using FFT, so the convolution is circular rather than linear). Although it's not using double.. it's still returning a reasonable result.
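The circular-versus-linear remark can be shown with a tiny sketch. This uses a naive DFT in plain Python rather than a real FFT library, and the names (`dft`, `spectral_convolve`, `pad`) and the 4-sample data are made up for the example. Multiplying spectra convolves the signals as if they wrapped around at the edges; zero-padding both inputs first is the usual way to get the linear result back.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(values)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = sum(values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                    for t in range(n))
        result.append(total / n if inverse else total)
    return result


def spectral_convolve(a, b):
    """Multiply spectra and transform back: a circular convolution."""
    spectrum = [x * y for x, y in zip(dft(a), dft(b))]
    return [v.real for v in dft(spectrum, inverse=True)]


def pad(values, length):
    """Zero-pad a sequence to the requested length."""
    return list(values) + [0.0] * (length - len(values))


signal = [0.0, 0.0, 0.0, 1.0]    # bright pixel on the right-hand edge
psf = [0.5, 0.5, 0.0, 0.0]       # smears each pixel into its right neighbour

circular = spectral_convolve(signal, psf)                # wraps to index 0
linear = spectral_convolve(pad(signal, 8), pad(psf, 8))  # no wrap-around
```

In the unpadded case half of the edge pixel's light reappears at index 0, which is the wrap-around artefact that circular convolution produces at image borders.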

What I may do is make a LR mode that is: 1. take x images, align, stack, then 2. LR and display. It should be possible to switch between the non-LR and LR from the last processing. Optimisations are (a) remove all the texture clears that are done to provide an air of stability with the FFT library.. (b) optionally use a 1D PSF for a fully circular PSF then store in local memory..

Using the titan size image:

2014-11-06 18:45:32.088 ExampleApplication[3948:47690] lr start
2014-11-06 18:45:32.088 ExampleApplication[3948:47690] LR iteration #0..
2014-11-06 18:45:33.152 ExampleApplication[3948:47690] LR iteration #99
2014-11-06 18:45:33.163 ExampleApplication[3948:47690] lr end

So.. I think that could be sped up relatively easy.. especially as it's not particularly optimal atm.
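As a quick check on those log timestamps, the arithmetic can be done like this. The only assumption is that iterations #0 to #99 mean 100 iterations ran between "lr start" and "lr end".

```python
from datetime import datetime

LOG_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Timestamps copied from the "lr start" and "lr end" log lines.
start = datetime.strptime("2014-11-06 18:45:32.088", LOG_FORMAT)
end = datetime.strptime("2014-11-06 18:45:33.163", LOG_FORMAT)

iterations = 100                                   # assumed: #0 through #99
elapsed_seconds = (end - start).total_seconds()    # 1.075 s in total
ms_per_iteration = 1000.0 * elapsed_seconds / iterations
```

That works out at a little over a second in total, or roughly 10.75 ms per iteration on the Titan-sized image.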

Edited by NickK

Update, two things are going on.. in advance of my week of holiday post job redundancy (final day is monday)..

1. I have started an initial IIR deconvolution using a PSF-based routine - the benefit of this form is speed, although there will be a little time for pre-processing the PSF.. but during imaging it should reach realtime speeds as there are no FFT operations in the iterations as there are now.

2. I have also started to move the pipeline into a more manageable portable C++/OpenCL form as Apple's own GCD 'help the developer' mechanisms are causing more problems than they solve. I'll also be making it easier still to add and manage functionality to the pipeline.

I hope to get these complete next week..

Edited by NickK

Have an evening in tonight - I hope to get the first step of the OpenCL/C++ migration and testing complete.. it's annoying that OpenCL is quite rudimentary in its host code integration.. GCD is easier but has issues interworking with OpenCL.. getting there. Once that testing is done I'll continue the migration. It will make a nice portable library :D

Managed to finish the major part of my telescope pier, so I hope that having easier access (after a bit more work) should make it easier (and faster) to set up after work. Setting up the tripod and aligning kills 30 minutes..

post-9952-0-33085600-1416420513_thumb.jp
Edited by NickK

  • 2 weeks later...

Well it appears I've found a bug in AMD's OS X runtime, and they can recreate it.. after almost 4 months from originally reporting a problem  :D

So fingers crossed it gets fixed.. and then Apple update their drivers.. their suggestion is to move to windows/linux as it's likely to be faster..

In the meantime I can continue with the OpenCL migration :D

Edited by NickK

So.. the pure OpenCL pipeline lives.. still needs more work to add some of the other features and then setup some HID control..

post-9952-0-15100300-1417887587_thumb.pn

It seems nice and fast :) with the added advantage this will work on any platform - windows/linux or OS X..


I got the LR FFT implementation migrated over last night... this is the slower version of LR (IIR being faster) - taking about 1 second for 100 iterations previously.

I'm now starting on the IIR version of the LR. This will be a little more complex as it will work out the coefficients by attempting to fit to a given PSF using an iterative brute force method at the start - using the horsepower of the GPU. Once that's done it should be obscenely quick in applying the filter iterations. Given the LR FFT performance, this shouldn't take too much time..

The beauty about applying IIR LR is that you can use local GPU memory to cache the values, so that's one global fetch and store for each pixel. Most GPUs have 64K local memory for the workgroup.

The new open pipeline is easy to program - although currently the new texture system is showing I'm using 53 textures ( :shocked:). Although it can cache, all the textures are set global because I've not added a temporary texture cache.. once that's done it will be able to reassign textures. The idea here is that a pattern will allow the pipeline to discover the texture use with an offline pass first, which will allow the system to schedule texture assignments to minimise the memory used - just like a compiler and CPU registers.
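The compiler analogy can be sketched as a linear-scan allocation over texture lifetimes. This is not the pipeline's scheduler, just one simple way to do it: the function name `assign_textures`, the stage names and the step numbers are all hypothetical. Each logical texture has a first and last pipeline step where it is needed, and a physical slot is reused as soon as its previous owner is dead.

```python
def assign_textures(lifetimes):
    """Map each logical texture to a reusable slot.

    lifetimes: dict of name -> (first_step, last_step), inclusive.
    Returns a dict of name -> slot index.
    """
    free_slots = []      # slots whose previous texture is no longer needed
    active = []          # (last_step, slot) for textures still in use
    assignment = {}
    next_slot = 0
    for name, (first, last) in sorted(lifetimes.items(),
                                      key=lambda item: item[1][0]):
        still_active = []
        for end, slot in active:
            if end < first:
                free_slots.append(slot)     # lifetime over: slot is reusable
            else:
                still_active.append((end, slot))
        active = still_active
        if free_slots:
            slot = free_slots.pop()
        else:
            slot = next_slot                # nothing free: allocate a new one
            next_slot += 1
        assignment[name] = slot
        active.append((last, slot))
    return assignment


# Hypothetical pipeline: each stage reads the previous stage's output.
lifetimes = {
    "raw": (0, 1),
    "dark_subtracted": (1, 2),
    "aligned": (2, 3),
    "stacked": (3, 4),
    "lr_output": (4, 5),
}
slots = assign_textures(lifetimes)
slots_used = len(set(slots.values()))
```

Here five logical textures fit into two physical slots, because at any step only a stage's input and output are alive at the same time.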

Other ideas I'm considering (but some will not be realtime!):

* motion filtering - using optic flow to effectively give a localised quality filter - where large shifts are masked out. 

* image motion blur detection and deconvolution - effectively rebuilding the image based on detected whole screen movement.

* 3D stack analysis - hehehe ;)

* mesh based interpolation - allowing the system to remove curvature.

* attempt to get a Bessel function coded on the GPU..

Also I've decided to open source it ... probably on GitHub :)

Edited by NickK

Just an example of the 'slow' part of the IIR algorithm :D I'm just attempting a simple 1D version atm.

IIR requires identifying the coefficients for a filter. This is done once and afterwards the image is processed using the coefficients and is faster than identifying the coefficients.

To create a custom 1D filter with 8 poles (i.e. eight a and eight b coefficients) with 100 iterations and a large 2000+ point FFT takes... unoptimised.. 2.048 seconds ... how I love GPUs :D For now I'm just cheating for processing the image by applying the 1D in X, then Y and then combining.. the step after will be a proper 2D affair that allows non-symmetric PSFs.
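For anyone unfamiliar with how a and b coefficients are applied, here is a minimal sketch of the standard IIR difference equation, run along the rows and then the columns of an image. It is an illustration only: it uses a one-pole filter rather than eight poles, the names `iir_1d` and `iir_rows_then_columns` are made up, and cascading X then Y is just one possible reading of "combining".

```python
def iir_1d(samples, b, a):
    """Apply y[n] = (sum b[k]*x[n-k] - sum a[k]*y[n-k], k >= 1) / a[0]."""
    output = []
    for n in range(len(samples)):
        acc = sum(b[k] * samples[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * output[n - k] for k in range(1, len(a)) if n - k >= 0)
        output.append(acc / a[0])
    return output


def iir_rows_then_columns(image, b, a):
    """The 2x1D 'cheat': filter every row, then every column of the result."""
    rows = [iir_1d(row, b, a) for row in image]
    columns = [iir_1d(list(col), b, a) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


# Illustrative one-pole low-pass coefficients (not the fitted 8-pole filter).
b = [0.5]
a = [1.0, -0.5]

impulse_response = iir_1d([1.0, 0.0, 0.0, 0.0], b, a)

image = [[1.0, 0.0, 0.0],
         [0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]
filtered = iir_rows_then_columns(image, b, a)
```

Each output sample needs only a handful of previous inputs and outputs, with no kernel lookup, which is why applying the filter is so much cheaper than designing it.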

In the past it used to take minutes to fit a custom filter using pentium CPUs. How times have changed!

Now LR in 2D with a 2D PSF should mean that you need to perform a 2D fitting.. the system is similar but takes longer due to the size (I'm guessing). However things should be as rapid in applying too..

Edit - adding the timings for 100 iterations on an image.

So for 2x1D combine 'cheat' .. 100 iterations takes 0.2552 seconds each frame without any optimisation (code or even the OpenCL is set to debug).. did I say I love GPUs :D


I: 5485db36.eecc6 : IIR design start

I: 5485db39.4802 : IIR design end
I: 5485db39.4a2a6 : IIR start apply
I: 5485db39.4b08a : IIR end apply
I: 5485db49.204e6 : IIR start apply
I: 5485db49.20c7f : IIR end apply
I: 5485db58.ee333 : IIR start apply
I: 5485db58.eeff0 : IIR end apply
I: 5485db68.c4c38 : IIR start apply
I: 5485db68.c5759 : IIR end apply
I: 5485db78.a086c : IIR start apply
I: 5485db78.a1099 : IIR end apply
I: 5485db88.723be : IIR start apply
I: 5485db88.72c6d : IIR end apply
I: 5485db98.4c86c : IIR start apply
I: 5485db98.4d264 : IIR end apply
Edited by NickK

So.. update on the IIR deconvolution work - the FFT based LR you've seen above...

I've now got an 8 pole IIR filter fitting its curve against an Airy disc point spread function (set larger than it should be so it's not a couple of pixels!). I've had to switch to CPU to get this working.. but it takes a few seconds for 400 iterations to get a fit.

I've been looking at a 'cheat' IIR apply using two 1D filters applied to a 2D image. It just so happens I've managed to make an interesting edge detection routine using the standard solar image:

post-9952-0-24108100-1418249218_thumb.pn

There's a 'real' 2D image way of doing this.. and I'll probably look at that shortly - effectively having a set of coefficients for the psf 2D image.. rather than using 1D.

Still a way to go but that's a start.

That filter would make an excellent solar guider!

Edited by NickK

Hmm.. time for bed I think..

However I think (not that I'm 100% sure at the moment) that the 1D cheat is actually returning some detail. Here are two images - the original sub and the pipeline LR output atm after 30 iterations:

post-9952-0-70252400-1418253137.png

The one on the front (left) is the pipeline LR output.. the one on the right is the original sub. I'm not convinced yet.. as it may be down to brightness.. but I am using a PSF that's a few pixels wide.


Hmm.. will code a proper 2D version and see.. but this is output from 20 iterations. Original sub left, pipeline IIR LR right.

post-9952-0-42336600-1418255801_thumb.pn

I think there is some additional push of detail.. although there's obvious instability ringing in the way I'm applying it.. thinking about it I think I know what the problem is.. will have to investigate further tomorrow. Also this is using a 653nm psf.. but it's setup for my pentax and not ally's lunt 60 thinking about it more..

Edited by NickK

So I've spent the day researching 2D IIR ... or should that be 2D OMG. I'm just starting to get to grips with z-transforms (i.e. appreciating there's a level of understanding required!)..

1D IIR is fairly straightforward. With 1D you have a couple of options:

option a. do a symmetrical PSF using interpolation/extrapolation of the kernel around the pixel based on radius of the gather pixel from the target pixel.

option b. use the fact that convolution is associative (A*B*C etc), so if you convolve with the 1D horizontal and then the 1D vertical you get a rough 'cheat' 2D implementation - the limitation here is that 1D means you're really being quite basic as you're only looking at top, bottom, left and right.
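Option b can be checked numerically with a small sketch. It uses plain FIR convolution rather than IIR, and the kernel, image and function names are invented for the example. A row pass followed by a column pass with the same 1D kernel gives exactly the same result as one 2D pass with the outer product of that kernel with itself.

```python
def conv1d_same(signal, kernel):
    """Zero-padded linear convolution, output the same length as the input."""
    half = len(kernel) // 2
    n = len(signal)
    return [sum(w * signal[i + half - k]
                for k, w in enumerate(kernel) if 0 <= i + half - k < n)
            for i in range(n)]


def conv_rows_then_columns(image, kernel):
    """Two 1D passes: along each row, then along each column."""
    rows = [conv1d_same(row, kernel) for row in image]
    columns = [conv1d_same(list(col), kernel) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


def conv2d_same(image, kernel2d):
    """Direct zero-padded 2D convolution with a square kernel."""
    height, width = len(image), len(image[0])
    half = len(kernel2d) // 2
    out = [[0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = 0
            for k, kernel_row in enumerate(kernel2d):
                for l, weight in enumerate(kernel_row):
                    y, x = i + half - k, j + half - l
                    if 0 <= y < height and 0 <= x < width:
                        acc += weight * image[y][x]
            out[i][j] = acc
    return out


kernel = [1, 2, 1]                                     # illustrative 1D kernel
kernel2d = [[a * b for b in kernel] for a in kernel]   # its outer product
image = [[0, 0, 0, 0],
         [0, 5, 1, 0],
         [0, 2, 7, 0],
         [0, 0, 0, 3]]

two_pass = conv_rows_then_columns(image, kernel)
full_2d = conv2d_same(image, kernel2d)
```

The equality is exact only when the 2D kernel really is an outer product of 1D kernels, as a Gaussian is. For a PSF that is not exactly separable, the two-pass form is an approximation, which fits the description of it as a "cheat".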

Fitting a 1D curve is easier.  Simply use the difference between the FFTs (FFT deconvolution theory) to find the error - then brute force shaping fits a curve in a few seconds using 400 fitting iterations.
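A rough sketch of what such a brute force fit could look like follows. It is only one guess at the approach, not the actual fitting code: it fits a one-pole filter instead of eight poles, uses random perturbation with a fixed seed as the "brute force", and all names and the target coefficients are invented. The error is the squared difference between the candidate filter's magnitude spectrum and the target spectrum.

```python
import cmath
import random


def freq_response(b, a, n_points=32):
    """Magnitude of H(e^jw) = B/A sampled at n_points frequencies in [0, pi)."""
    magnitudes = []
    for i in range(n_points):
        z_inv = cmath.exp(-1j * cmath.pi * i / n_points)
        numerator = sum(bk * z_inv ** k for k, bk in enumerate(b))
        denominator = sum(ak * z_inv ** k for k, ak in enumerate(a))
        magnitudes.append(abs(numerator / denominator))
    return magnitudes


def spectral_error(b, a, target):
    """Sum of squared differences between candidate and target spectra."""
    candidate = freq_response(b, a, len(target))
    return sum((h - t) ** 2 for h, t in zip(candidate, target))


def brute_force_fit(target, iterations=400, step=0.05, seed=0):
    """Randomly perturb [b0, a1] and keep any change that lowers the error."""
    rng = random.Random(seed)
    best = [1.0, 0.0]
    best_error = spectral_error([best[0]], [1.0, best[1]], target)
    for _ in range(iterations):
        trial = [p + rng.gauss(0.0, step) for p in best]
        if abs(trial[1]) >= 0.99:
            continue                      # keep the pole inside the unit circle
        error = spectral_error([trial[0]], [1.0, trial[1]], target)
        if error < best_error:
            best, best_error = trial, error
    return best, best_error


# Hypothetical target: the spectrum of a known one-pole filter.
target = freq_response([0.3], [1.0, -0.7])
start_error = spectral_error([1.0], [1.0, 0.0], target)
fitted, fitted_error = brute_force_fit(target)
```

Every trial is independent of the others, so the error evaluations are the kind of work that maps well onto a GPU when many candidates are tested per step.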

The beauty with the above methods is that you don't need to access a kernel, effectively removing memory reads for that. Using local memory in the GPU means that referencing the output and input can be cached - resulting in a faster algorithm.  The local memory becomes a sliding window..

2D IIR with non-symmetrical deconvolution is a challenge (understatement) - from estimating through to applying.

1. Estimating - I'm looking at a FIR/IIR approach to estimate a 2D deconvolution kernel with Schur decomposition to break things down into 1D FIR.. then approximate each FIR with an IIR..

2. Applying - the resulting application for IIR would be essentially the same as the 1D - applying X then Y convolutions.

This was the state of the 1D 'cheat' (i.e. using just two directions):

post-9952-0-37568200-1418317799_thumb.pn

It shows some promise - I know the PSF is not quite correct as it should be centred around (0,0). Still needs some polish.. (this is just a single, large delta, iteration).

So tempted to implement the radius interpolation form too - this is likely to be a little slower but at the same time probably return a better image.

Edited by NickK

So option b (using two 1D) is possible as it's a separable projection of the 2D symmetrical PSF, allowing a 1D PSF kernel to be used for both dimensions and then combined:

post-9952-0-27809300-1418330722.png

Single frame, with a bit of a stretch. This looks more like it's working now :)

So still considering the radial and researching the 2D non-symmetric form.

Edited by NickK

Ok (sorry if anyone takes offence here) - this is my LR IIR 2x1D IIR using a 'known picture':

post-9952-0-48582200-1418565420_thumb.pn

From left to right:

L = deconvolution using an airy disc

M = original image

R = convolution

So this looks like it's working.. now to refit the mechanism back to GPU - which shouldn't take long..

Edited by NickK

A slow implementation (no optimisation) on the GPU of a two 1D projection (same application) - the IIR pass for the Lenna picture takes 47 microseconds (i.e. 47x10^-6 = 0.000047 seconds).

Edited by NickK