# comp.graphics.algorithms

## Subject: Re: 2D convolution performance expectation

François wrote:
> Hello and thanks for the answer,
>
>> Why are you using floating point ?
>
> In fact I have to read the image in float format. But I don't know if it
> is more efficient to convert it to integers and back to float (values
> normalized between 0.0 and 1.0) to write the output to file, since I will
> have to add some division operations for the conversion.
>
>> You also did not say whether the image is color or grayscale
>
> They are grayscale images.
Good, a grayscale image is easier to process.
If the original image is in float format, then you are using a
previously manipulated or processed image, and you should try very hard
to get your hands on the original image from which this floating-point
version was created.
There are all sorts of numerical issues with working on images that have
been previously processed, such as numerical round-off, interpolation
causing slopes to be detected where there were none in the original
image, and other problems. Do yourself a favor and spend at least a day
trying to find the original image from which this floating-point image
was created.

>
>> The usual manner for a kernel convolution is to use integers.
>> For instance, with a 7 x 7 kernel there would be 49 multiplies and
>> additions and 1 divide for nearly every point in the 1000 x 1000 image
>> array.
>> If you neglect the edges then the image convolution takes place over
>> 994 x 994 points;
>> this is 49 x 994 x 994 = 48413764 multiplies and adds.
>> On a modern Pentium 4 with 4 MB LII cache this can all take place in
>> the cache for a grayscale image which can take 2 cycles for each
>> multiply and 1 cycle for each add and I think it is 10 cycles for a
>> divide
>
> I don't understand what you meant by "all take place in the cache".
> Do you mean load the entire image in the memory?
> The problem is, for the moment, I can work with 1000*1000 images but the
> final goal is to process very large images (around 20000*20000) so
> performance is crucial.
A general-purpose computer has several levels of memory, from fastest to
slowest:
- L1 cache is the memory directly tied to the processor. It is the
  smallest and fastest level, typically tens of KBytes.
- LII cache is also used by the processor, less directly, and is slightly
  slower. It is mostly 512 KByte; some processors have 1 MByte.
- Mainboard RAM is usually in the 1 - 5 GByte range nowadays.
- Virtual memory, which is actually a swap file on the computer's hard
  drive and is by far the slowest.
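
To make the hierarchy concrete, here is a small sketch that computes an image's memory footprint and reports the fastest level it would fit in. The capacities are illustrative assumptions: 1 MByte LII and 1 GByte mainboard RAM come from the rough figures in this post, the 32 KByte L1 is a guess, and the function names are made up for the example.

```python
# Rough capacity of each level, in bytes. Illustrative assumptions only,
# not measurements of any particular machine.
LEVELS = [
    ("L1 cache", 32 * 1024),        # assumed; varies by processor
    ("LII cache", 1024 * 1024),     # upper figure mentioned in this post
    ("mainboard RAM", 1024 ** 3),   # lower figure mentioned in this post
]


def footprint_bytes(width, height, bytes_per_pixel):
    """Memory needed to hold one grayscale image."""
    return width * height * bytes_per_pixel


def smallest_level_that_fits(nbytes):
    """Name of the first (fastest) level large enough for nbytes."""
    for name, capacity in LEVELS:
        if nbytes <= capacity:
            return name
    return "swap file on the hard drive"


if __name__ == "__main__":
    cases = [
        (1000, 1000, 1, "1000x1000, 8-bit"),
        (1000, 1000, 4, "1000x1000, 32-bit float"),
        (20000, 20000, 1, "20000x20000, 8-bit"),
        (20000, 20000, 4, "20000x20000, 32-bit float"),
    ]
    for w, h, bpp, label in cases:
        n = footprint_bytes(w, h, bpp)
        print(f"{label}: {n / 2**20:.1f} MiB -> {smallest_level_that_fits(n)}")
```

This counts the input image only; the output buffer and everything else running on the machine need room as well, so the real picture is less favorable.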
>
>>
>> which works out to roughly 155 million clock cycles (157 per pixel), which
>> on a 2 GHz machine should only take about 78 milliseconds. For color the
>> math is very similar.
>
> I will try to use integers to see if there is a performance increase.
>
>
> Thanks
>
> François
>

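The cycle estimate quoted above can be worked through from its stated assumptions (7 x 7 kernel, 1000 x 1000 image, edges neglected, 2 cycles per multiply, 1 per add, 10 per divide, 2 GHz clock). The sketch below does that arithmetic; the function name and parameters are made up for illustration, and a real processor overlaps these operations, so treat the result as an order-of-magnitude figure only.

```python
def convolution_cost(width, height, k, mul_cycles=2, add_cycles=1,
                     div_cycles=10, clock_hz=2.0e9):
    """Estimate the work for a k x k integer convolution, skipping edges.

    Returns (valid_pixels, multiply_adds, cycles, seconds).
    """
    # Pixels where the whole kernel fits inside the image.
    valid = (width - k + 1) * (height - k + 1)
    # One multiply and one add per kernel tap.
    macs = valid * k * k
    # Per pixel: k*k multiplies, k*k adds, and one normalizing divide.
    cycles = valid * (k * k * (mul_cycles + add_cycles) + div_cycles)
    return valid, macs, cycles, cycles / clock_hz


if __name__ == "__main__":
    valid, macs, cycles, seconds = convolution_cost(1000, 1000, 7)
    print(f"{valid} pixels, {macs} multiply-adds, "
          f"{cycles} cycles, {seconds * 1000:.1f} ms")
```

For the 7 x 7 case this gives 994 x 994 = 988036 output pixels, about 48.4 million multiply-adds, and roughly 78 ms at 2 GHz, assuming the data is always ready in cache.
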
With an image size of 382 MByte (20000 x 20000 pixels at 1 byte per
pixel) the best you can hope for is to keep the image in mainboard RAM.
If you are using a virtual-memory operating system like Linux or
Windows 2000/XP, then prepare for some headaches, since it will try to
swap the image out to the swap file on disk if you don't program the
code in a very specific fashion.
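
One way to keep the working set small, sketched here as an assumption about what could work rather than a recipe, is to stream the image row by row and hold only as many rows as the kernel is tall. The function name and the list-of-rows image layout are made up for the example; the kernel is applied without flipping, which gives the same result for symmetric kernels.

```python
from collections import deque


def convolve_rows_streaming(rows, kernel, divisor):
    """Yield output rows of a k x k integer convolution, edges skipped.

    rows    -- any iterable of equal-length lists of ints (for example
               rows read from disk one at a time), so the full image
               never has to sit in RAM
    kernel  -- k x k list of integer weights
    divisor -- normalizing divisor, applied once per output pixel
    """
    k = len(kernel)
    window = deque(maxlen=k)  # holds only the k most recent rows
    for row in rows:
        window.append(row)
        if len(window) < k:
            continue  # not enough rows buffered yet
        out = []
        for x in range(len(row) - k + 1):
            acc = 0
            for j in range(k):
                src = window[j]
                for i in range(k):
                    acc += kernel[j][i] * src[x + i]
            out.append(acc // divisor)
        yield out


if __name__ == "__main__":
    # Rows are generated lazily, standing in for a file reader.
    image = ([x + y for x in range(6)] for y in range(5))
    box = [[1] * 3 for _ in range(3)]
    for out_row in convolve_rows_streaming(image, box, 9):
        print(out_row)
```

With a 7 x 7 kernel and 20000-pixel rows of 1 byte each, the buffered rows amount to about 140 KByte, which is far less than the whole image.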

If you can afford a newer computer, swap out the motherboard for a
dual-core AMD system; they run circles around single-processor units.
But to get the full effect you will need to write your code in a
multi-threaded fashion, so that each thread works on 1/2 or 1/4 of the
image at a time.
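
A minimal sketch of that band splitting follows. The names are hypothetical, and the image is assumed to be a list of rows of ints. Note that in CPython the interpreter lock keeps pure-Python loops like these from actually running in parallel, so this only shows the structure; the same split written in C with OS threads is where the speedup would come from.

```python
from concurrent.futures import ThreadPoolExecutor


def convolve_band(image, kernel, divisor, y_start, y_stop):
    """k x k convolution (edges skipped) for output rows y_start..y_stop-1."""
    k = len(kernel)
    out_width = len(image[0]) - k + 1
    band = []
    for y in range(y_start, y_stop):
        row = []
        for x in range(out_width):
            acc = 0
            for j in range(k):
                for i in range(k):
                    acc += kernel[j][i] * image[y + j][x + i]
            row.append(acc // divisor)
        band.append(row)
    return band


def convolve_parallel(image, kernel, divisor, workers=2):
    """Split the output rows into `workers` bands, one thread per band."""
    k = len(kernel)
    out_height = len(image) - k + 1
    # Band boundaries in output-row coordinates. Each band reads k - 1
    # extra input rows past its last output row, so neighboring bands
    # overlap on input only and never write the same output row.
    bounds = [out_height * n // workers for n in range(workers + 1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(convolve_band, image, kernel, divisor, a, b)
            for a, b in zip(bounds, bounds[1:])
        ]
        result = []
        for future in futures:
            result.extend(future.result())  # bands arrive top to bottom
    return result


if __name__ == "__main__":
    image = [[(3 * x + 5 * y) % 7 for x in range(10)] for y in range(9)]
    box = [[1] * 3 for _ in range(3)]
    print(convolve_parallel(image, box, 9, workers=2))
```

Because the bands only share read-only input rows, no locking is needed between the threads.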

Another way to speed things up is to realize that certain kernels
can be realized by applying a smaller kernel multiple times. Each extra
3x3 pass widens the effective kernel by 2 in each direction, so two passes
act like a 5x5 kernel, three like a 7x7, and four like a 9x9. This only
works when the large kernel really is the small one convolved with itself
(smoothing kernels often are); an arbitrary 7x7 or 9x9 kernel cannot be
split this way. Consider this carefully, since a 1000x1000 image using a
9x9 kernel requires 81 multiplies and adds with 1 division for each pixel,
which works out to 81000000 multiplies and adds,
while a 3x3 requires 9 multiplies and adds and 1 division per pixel per
pass, so the same convolution requires only 9000000 * 4 = 36000000 multiplies and