1 : 
jhr 
233 
%!TEX root = paper.tex

2 : 


%

3 : 



4 : 


\section{A DSL for image analysis}

5 : 


\label{sec:design}

6 : 



7 : 


One of the major contributions of this research will be the design of a domainspecific

8 : 


language (DSL), which we call \lang{}, for imageanalysis algorithms.

9 : 


The goals of this design are as follows:

10 : 


\begin{enumerate}

11 : 


\item

12 : 


Abstract away from the discrete nature of the image data and present the programmer

13 : 


with a continuous model of the image.

14 : 


\item

15 : 


Support the programming of imageanalysis algorithms using the mathematical

16 : 


properties of the continuous data and its derived attributes.

17 : 


\item

18 : 


Expose the inherent parallelism of the algorithms, while abstracting away from the

19 : 


details of how computations are mapped onto parallel hardware.

20 : 


\end{enumerate}%

21 : 



22 : 


In the remainder of this section, we describe our preliminary thoughts about what the programming

23 : 


model of this language will be.

24 : 


It is important to understand that this discussion represents very preliminary ideas

25 : 


about the design space.

26 : 


In particular, there is a tradeoff between the flexibility provided by lowerlevel

27 : 


generalpurpose language constructs and the expressiveness advantages of more abstract

28 : 


higherlevel mechanisms.

29 : 


In the discussion below, we have biased the design toward more generalpurpose

30 : 


constructs, but with the expectation that the design will get higher level as

31 : 


it evolves.

32 : 



33 : 


\subsection{Design sketch}

34 : 


In this section we sketch a preliminary design for \lang{} using the examples from

35 : 


the previous section as motivation.

36 : 



37 : 


A \lang{} program has three logical parts.

38 : 


The first are global declarations of the datasets and fields that the

39 : 


analysis algorithm computes over, as well as other global variables that

40 : 


control the analysis.

41 : 


The datasets and global variables (but not the derived fields) are the

42 : 


inputs to the program.

43 : 


The second part defines \emph{actors}, which are independent computational

44 : 


agents that define the algorithm.

45 : 


The third part specifies the coordination of the actors  how they interact and

46 : 


what the global termination conditions are.

47 : 



48 : 


\subsubsection{Types}

49 : 


\lang{} is a staticallytyped language.

50 : 


It has the usual scalar types, including booleans, integer types of various sizes,

51 : 


and single and doubleprecision floatingpoint values.

52 : 


Similar to graphicsshader languages~\cite{openglshading:3ed} and

53 : 


OpenCL~\cite{openclspecv1}, it also includes smalldimension vector and matrix types.

54 : 


In addition to these basic computational types, \lang{} also includes \emph{datasets},

55 : 


\emph{fields}, and \emph{actor} types, which are discussed below.

56 : 


Our initial design does not include userdefined data structures, but we expect to

57 : 


add them at some point in the language evolution.

58 : 



59 : 


\subsubsection{Datasets}

60 : 


An imageanalysis program begins with the image data that is to be analyzed.

61 : 


In \lang{}, the image data is represented by a dataset, which is a

62 : 


multidimensional array of scalars, vectors, or tensors.

63 : 


These data sets come from scientific or medical imaging instruments.

64 : 


We provide a simple form for declaring a data set.

65 : 


For example, the following declares \texttt{data} to be a threedimensional

66 : 


array of floatingpoint values.

67 : 


\begin{quote}\begin{lstlisting}

68 : 


dataset float[][][] data;

69 : 


\end{lstlisting}\end{quote}%

70 : 


Datasets also come with their own coordinate system, which defines how

71 : 


integer indices map to world coordinates.

72 : 



73 : 


Datasets are not limited to arrays of scalars; they may also have vector or

74 : 


tensor elements.

75 : 


On disk, datasets are stored using the NRRD (Nearly Raw Raster Data) format~\cite{nrrd},

76 : 


which supports multidimensional arrays of multidimensional data, such as those produced

77 : 


by modern imaging instruments.

78 : 



79 : 


\subsubsection{Fields and kernels}

80 : 


As discussed in \secref{sec:image}, we are interested in image analysis algorithms that

81 : 


work with the continuous field that is reconstructed from discrete data.

82 : 


Our DSL supports these algorithms by including continuous fields as a primitive notion.

83 : 


Fields can be defined by applying a convolution kernel to a dataset.

84 : 


For example, here we define a threedimensional field generated from the dataset \text{data}

85 : 


using a convolution kernel named ``\texttt{cubic}''

86 : 


\begin{quote}\begin{lstlisting}

87 : 


field F = cubic (data);

88 : 


\end{lstlisting}\end{quote}%

89 : 


The dimension of the field is inferred from the dimension of the dataset.

90 : 


In our initial design, we expect to provide a predefined set of convolution kernels

91 : 


(essentially those already supported by Teem~\cite{teem}).

92 : 



93 : 


The main operation of fields is the \emph{probe}, which extracts the value at some

94 : 


location.

95 : 


This operation is similar to indexing an array, except that the indices are

96 : 


floatingpoint values.

97 : 


For example,

98 : 


\begin{quote}\begin{lstlisting}

99 : 


F@(x, y, z)

100 : 


\end{lstlisting}\end{quote}%

101 : 


evaluates to the scalar value of the field \texttt{F} at the coordinate $(\mathtt{x},\mathtt{y},\mathtt{z})$.

102 : 


We can also define new fields by applying operators to them.

103 : 


For example,

104 : 


\begin{quote}\begin{lstlisting}

105 : 


(D (D F)) @ v

106 : 


\end{lstlisting}\end{quote}%

107 : 


probes the Hessian field derived by taking the second derivative of \texttt{F} at the

108 : 


location specified by the vector \texttt{v}.

109 : 


An important feature of the design is that operations, such as $\mathrm{FA}$

110 : 


(fractional anisotropy of a tensor), are treated as

111 : 


higherorder operators that map fields to fields (\ie{}, a field of tensors to a field of

112 : 


scalars).

113 : 


% implies other fields:

114 : 


% D F  gradient of F (has type float[3])

115 : 


% D (D F)  Hessian of F (has type float[3][3])

116 : 


%

117 : 


% QUESTION: is the domain of a field always in the interval [0..1] (for each dimension)

118 : 


% or should we allow the programmer to specify the domain?

119 : 


% ANSWER: domain is part of nrrd file format.

120 : 



121 : 


\subsection{Other global definitions}

122 : 


In addition to datasets and fields, a \lang{} program may have other global

123 : 


definitions.

124 : 


These include auxiliary functions, such as the color composition and transfer

125 : 


functions for directvolume rendering (\secref{sec:dvr}),

126 : 


and program inputs, which are denoted using the \kw{in} keyword:

127 : 


\begin{quote}\begin{lstlisting}

128 : 


in float timeStep; // simulation timestep

129 : 


\end{lstlisting}\end{quote}%

130 : 


Other examples of globals include constants, such as the matrix that maps

131 : 


pixel positions to imageplane positions in directvolume rendering.

132 : 


Note that global variables are immutable; \lang{} does not have an explicit notion

133 : 


of global state.

134 : 



135 : 


\subsubsection{Actors}

136 : 


Actors are an abstraction of the computations that the image analysis performs.

137 : 


Examples include the rays in directvolume rendering (\secref{sec:dvr}),

138 : 


the fibers in fiber tractography (\secref{sec:fibers}), and the

139 : 


particles in a particle system (\secref{sec:particles}).

140 : 


A given program may have multiple types of actors, but we expect that most algorithms

141 : 


will involve only a single actor definition.

142 : 



143 : 


An actor definition is similar to a class declaration in an objectoriented

144 : 


language, and includes declarations of its local state, initialization,

145 : 


and an \kw{update} method.

146 : 


The \kw{update} methods of actors are written using a simple Cstyle language that is

147 : 


similar to the graphics shader language GLSL~\cite{openglshading:3ed}.

148 : 


For example, the direct volume rendering algorithm (Algorithm~\ref{alg:volrend})

149 : 


is implemented as the following actor definition:

150 : 


\begin{quote}\begin{lstlisting}

151 : 


actor Ray (vec2i p)

152 : 


{

153 : 


// map pixel location to 3D position in image plane

154 : 


vec3f pos = ToImagePlane * ((float)p.x, (float)p.y, 1.0);

155 : 


// ray direction

156 : 


vec3f dir = normalize(pos  eyePosition);

157 : 


float t = 0.0;

158 : 


vec4f color = initialColor;

159 : 



160 : 


update () {

161 : 


vec3f newPos = pos + t*dir;

162 : 


if (newPos.z < farClip) {

163 : 


color = compose(color, transferFn(F@newPos));

164 : 


if (color.a == 1.0)

165 : 


stable; // color is opaque, so we are done

166 : 


else

167 : 


t += timeStep;

168 : 


}

169 : 


else

170 : 


stable;

171 : 


}

172 : 


}

173 : 


\end{lstlisting}\end{quote}%

174 : 


In this example, the \emph{instance variables} \texttt{pos}, \texttt{dir},

175 : 


\texttt{t}, and \texttt{color} comprise the local state of an actor.

176 : 


The \kw{update} method evolves this state until the actor reaches an

177 : 


\emph{stable} state (denoted by executing the \kw{stable} statement).

178 : 


An actor may also declare itself dead using the \kw{die} statement and actors may

179 : 


create new actors using the \kw{new} statement.

180 : 



181 : 


For some algorithms, such as those described in \secref{sec:particles}, it is

182 : 


necessary for actors to examine the state of nearby actors.

183 : 


This behavior has two aspects: the definition of an actor's \emph{neighborhood},

184 : 


which is defined declaratively in the globalcoordination part of the program (see below),

185 : 


and the computation on neighbors.

186 : 


An actor's neighborhood can be represented as array of actors that are passed

187 : 


as a parameter to the \kw{update} method.

188 : 


The \kw{update} method can query, but not modify, the instance variables of these actors.

189 : 


% actor states: active, stable, dead

190 : 


% actors can create new actors

191 : 



192 : 


While the actor \kw{update} methods have some similarity to shaderlanguage programs

193 : 


and OpenCL kernels, there are some important differences.

194 : 


First, actors compute over the continuous fields defined in the program\footnote{

195 : 


Shader programs treat texture data as continuous, but the interpolation mechanisms

196 : 


used in graphics applications are not suitable for imageanalysis problems.

197 : 


} instead of discrete data.

198 : 


Second, actors can interact with other actors.

199 : 


Third, actors can die or become stable.

200 : 


Lastly, actors can create new actors.

201 : 



202 : 


% note: system could have multiple kinds of actors?

203 : 



204 : 


% QUESTION: how are actor interactions specified?

205 : 



206 : 


\subsubsection{Global coordination}

207 : 


The last part of a \lang{} specification is the definition of the global properties of the

208 : 


algorithm.

209 : 


These include the initial set of actors, which for the direct volume rendering

210 : 


application might be defined using a setcomprehension notation

211 : 


\begin{quote}\begin{lstlisting}

212 : 


initially { { Ray (vec2i(i, j))  i in 0..1023 }  j in 0..1023 };

213 : 


\end{lstlisting}\end{quote}%

214 : 


The global properties also include global termination conditions,

215 : 


such a limit on the number of iterations or a predicate defined on a

216 : 


global property, such as global energy, and a mechanism for reading out

217 : 


the result of the computation from the local actor states.

218 : 


Lastly, global coordination includes properties, such as the neighbors

219 : 


of a particle, that define interagent interactions.

220 : 



221 : 


\subsubsection{Semantics}

222 : 



223 : 


The semantics of a \lang{} program is based on an iterative model that is similar

224 : 


to bulksynchronous parallelism.

225 : 


Informally, the state of an actor is represented by a triple: $\langle{}a, \sigma, S\rangle{}$,

226 : 


where $a$ is a unique actor name, $\sigma$ is a mapping from instance variables to values,

227 : 


and $S$ is the actor's status, which is either $\mathbf{ACTIVE}$ or $\mathbf{STABLE}$.

228 : 


The state of a program $\mathcal{P}$ is then defined to be a set of actor states.

229 : 


The \kw{update} method is then modeled as a curried function that takes the program state

230 : 


and an actor name and produces a set of actor states.

231 : 


The \kw{update} method returns a set, since the actor may die, in which case the result is the

232 : 


empty set, or may create new actors, which will also be included in the resulting set.

233 : 


A single iteration of the program can then be defined by

234 : 


\begin{displaymath}

235 : 


\mathcal{P}_{i+1}

236 : 


= \mathrm{Stable}(\mathcal{P}_i) \cup \left( \bigcup_{\langle{}a,\sigma,S\rangle{} \in \mathrm{Active}(\mathcal{P}_i)}{U\,\mathcal{P}_i\,a} \right)

237 : 


\end{displaymath}%

238 : 


where $U$ is the function denoted by the \kw{update} method and

239 : 


\begin{eqnarray*}

240 : 


\mathrm{Active}(\mathcal{P}) & = &

241 : 


\{ \langle{}a,\sigma, \mathbf{ACTIVE}\rangle{}  \langle{}a, \sigma, \mathbf{ACTIVE}\rangle \in \mathcal{P} \} \\

242 : 


\mathrm{Stable}(\mathcal{P}) & = &

243 : 


\{ \langle{}a,\sigma, \mathbf{STABLE}\rangle{}  \langle{}a, \sigma, \mathbf{STABLE}\rangle \in \mathcal{P} \}

244 : 


\end{eqnarray*}%

245 : 


Note that this rule has the effect of discarding the dead actors.

246 : 


A program has reached a terminal state when there are no active actors left.

247 : 



248 : 


Global coordination is modeled by a program state to program state function

249 : 


that is applied between iterations.

250 : 


At this time, we also check for global termination conditions.

251 : 



252 : 


Because the \kw{update} method is given the program state $\mathcal{P}_i$ to

253 : 


compute the actor's $i+1$ state, each actor's update is independent of the others (even if

254 : 


they depend on a neighboring actor's state).

255 : 


This property makes the program amenable for efficient parallel execution.
