Image processing
From HP-SEE Wiki
EagleEye
Section contributed by UPB
So far, I/O testing with the EagleEye framework has been done on six-core AMD Opteron blades @ 2.6 GHz connected through InfiniBand. A total of 32 images were used for performance testing. Each image has a resolution of 5000 x 5000 pixels, and image size varies between 3.6 and 7.7 MB (in JPEG format). The images were read into memory from the Lustre file system.
In terms of speedup, results are good for up to 4 total slaves. With 2 slaves in total we get speedups of 1.81 and 1.83, while with 4 slaves we get 3.29, 3.28 and 3.38. The worst efficiency here is 82.11%, for the configuration with 2 masters and 2 slaves per master.
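The efficiency figures can be re-derived from the quoted speedups. The sketch below assumes the usual definition, efficiency = speedup / total number of slaves; the function name `efficiency` and the `reported` dictionary are illustrative only and are not part of EagleEye. Because the speedups in the text are rounded to two decimals, the recomputed values can differ slightly from the quoted 82.11%.

```python
def efficiency(speedup, slaves):
    """Parallel efficiency in percent, assuming efficiency = speedup / slaves."""
    return 100.0 * speedup / slaves


# Speedups quoted in the text, keyed by the total number of slaves.
reported = {
    2: [1.81, 1.83],
    4: [3.29, 3.28, 3.38],
}

# Efficiency for every quoted speedup, rounded to two decimals.
efficiencies = {
    slaves: [round(efficiency(s, slaves), 2) for s in speedups]
    for slaves, speedups in reported.items()
}

for slaves, values in efficiencies.items():
    print(f"{slaves} slaves: efficiencies {values}, worst {min(values)}%")
```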
When going up to 8 total slaves we see an obvious decrease in performance. We get speedup values of 5.19 (2 masters), 4.73 (4 masters) and 4.53 (8 masters). Thus, the best efficiency here is only 65%. This is partially explained by the low number of input images. With only 16 input images and 8 slaves, each slave should get 2 images for the run to be balanced. However, because the masters always try to keep the command queues full, the system ends up unbalanced: some slaves receive 3 images and others only one. This is an unfortunate consequence of our attempt to keep the slaves always working rather than waiting for work from the master. Our guess is that for large values of the images/slaves ratio the imbalance between work queues would most likely become negligible. Given that our image database has around 16000 images and the maximum number of slaves will most likely be in the dozens or hundreds, it is unlikely that this problem will appear in practice. We decided to test this theory by keeping the same configurations and increasing the number of images to 32.
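The effect of the imbalance can be estimated with a back-of-the-envelope bound. The sketch below assumes that every image takes the same time to process and ignores I/O and master overhead, so the run lasts as long as the most heavily loaded slave. The exact per-slave distributions are not given in the text; the lists used here are hypothetical, chosen only so that some slaves get 3 images and others 1 (or, for the large case, one slave gets a single extra image).

```python
def speedup_bound(images_per_slave):
    """Upper bound on speedup when all images cost the same:
    total work divided by the work of the busiest slave."""
    return sum(images_per_slave) / max(images_per_slave)


# Balanced case from the text: 16 images, 8 slaves, 2 images each.
balanced = [2] * 8

# Hypothetical unbalanced split of 16 images: some slaves get 3, others 1.
unbalanced = [3, 3, 3, 1, 1, 1, 2, 2]

# Hypothetical large run: 16000 images on 100 slaves, one slave gets one extra.
large = [161, 159] + [160] * 98

for name, queues in [("balanced", balanced),
                     ("unbalanced", unbalanced),
                     ("large", large)]:
    bound = speedup_bound(queues)
    print(f"{name}: bound {bound:.2f}, "
          f"efficiency {100 * bound / len(queues):.1f}%")
```

Under these assumptions a slave holding 3 of the 16 images caps the speedup at 16/3, which is close to the measured 5.19, while a one-image deviation on a large run barely changes the bound.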
The differences between configurations with the same total number of slaves seem negligible; there is no clear favorite. A trend can be noticed in the second set of tests, where configurations with more masters and fewer slaves per master seem to fare better, but the differences in efficiency are at most 9%. One explanation might be that with fewer slaves per master, less I/O is done simultaneously from one machine: when processing the first batch of images, the slave threads all read different files from disk at the same time, which might cause performance issues if the number of slaves is large. However, without further profiling of the program, or at least measuring the time it takes to read a file, we cannot offer a clear conclusion at this point.
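One way to check the I/O hypothesis would be to time file reads under different numbers of concurrent readers. The following is a minimal sketch and not EagleEye code: `timed_read` and `measure_concurrent_reads` are hypothetical helpers, and the threads merely stand in for slave threads on one machine. Repeated runs over the same files may be served from the operating system's cache, so results are only indicative.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def timed_read(path):
    """Read one file fully and return (size in bytes, elapsed seconds)."""
    start = time.perf_counter()
    size = len(Path(path).read_bytes())
    return size, time.perf_counter() - start


def measure_concurrent_reads(paths, readers):
    """Read all files using `readers` threads at once and report timings."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=readers) as pool:
        results = list(pool.map(timed_read, paths))
    wall = time.perf_counter() - start
    return {
        "readers": readers,
        "bytes": sum(size for size, _ in results),
        "wall_s": wall,  # time to read the whole batch
        "mean_read_s": sum(t for _, t in results) / len(results),
    }
```

Calling `measure_concurrent_reads(paths, readers)` on the same list of image paths for, say, 1, 2, 4 and 8 readers and comparing `mean_read_s` would show whether the per-file read time grows with the number of simultaneous readers.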