FuzzyCmeans

From HP-SEE Wiki

Revision as of 16:31, 13 February 2012

General Information

  • Application's name: Parallel Fuzzy C-Means, in the classification/feature detection category
  • Virtual Research Community: EO-Science
  • Scientific contact: Dana Petcu, petcu@info.uvt.ro
  • Technical contact: Silviu Panica, silviu@info.uvt.ro
  • Developers: Silviu Panica, Daniela Zaharie, West University of Timisoara, Romania ({silviu,dzaharie}@info.uvt.ro)
  • Web site: http://research.info.uvt.ro/

Short Description

Fuzzy clustering algorithms identify spatially continuous regions of pixels characterized by similar feature values, while taking into account that a single pixel in a satellite image may contain spectral information corresponding to several ground components. Because satellite images are usually large, efficient implementations of fuzzy clustering algorithms have attracted the interest of researchers. Parallel variants of the traditional Fuzzy C-Means (FCM) algorithm already exist, but their extension to algorithms that involve spatial information has not yet been investigated.
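For reference, the traditional (non-spatial) FCM algorithm is usually stated as below. This is the common textbook formulation, not taken from this page; the symbols are notation assumed here: N pixels with feature vectors x_i, K cluster centres c_j, memberships u_ij, and a fuzzifier m > 1.

```latex
% Objective minimized by FCM, with memberships of each pixel summing to 1
J_m(U, C) = \sum_{i=1}^{N} \sum_{j=1}^{K} u_{ij}^{m} \,\lVert x_i - c_j \rVert^{2},
\qquad \sum_{j=1}^{K} u_{ij} = 1 \quad \text{for every } i

% Alternating updates applied until the memberships stop changing
u_{ij} = \left[ \sum_{k=1}^{K}
  \left( \frac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}
\right]^{-1},
\qquad
c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}
```

The centre update is a ratio of two sums over all pixels, which is why it lends itself to per-slice partial sums followed by a global reduction.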

This research work aims to extend the existing parallelization of FCM to include some spatial variants (e.g. FCM with spatial information and Gaussian kernel-based FCM). The proposed parallelization is based on three basic ideas: spatial slicing of images, exploiting collective computations as much as possible, and reducing the communication between processors. Several slicing strategies were analyzed with respect to their ability to balance the load across processors. Parallel variants were also proposed for the computation of cluster validity indices, which are useful for the semi-automatic identification of the number of classes.
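A minimal sketch of the slicing and collective-computation ideas, under stated assumptions: it is not the project's code, all function names are hypothetical, pixels are treated as a flat array of single-band values, and the fuzzifier is fixed at m = 2. The processes are simulated with a plain loop so the sketch runs without MPI; in an MPI program each rank would compute its own partial sums and a sum-reduction such as MPI_Allreduce with MPI_SUM would combine them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: balanced slicing of `items` (e.g. pixels or image
   rows) over `nprocs` processes. Slice sizes differ by at most one. */
void slice_bounds(size_t items, size_t nprocs, size_t rank,
                  size_t *first, size_t *count)
{
    assert(nprocs > 0 && rank < nprocs);
    size_t base = items / nprocs;
    size_t extra = items % nprocs;   /* the first `extra` ranks get one more */
    *count = base + (rank < extra ? 1 : 0);
    *first = rank * base + (rank < extra ? rank : extra);
}

/* Partial sums for one cluster centre over one slice, assuming m = 2:
   num = sum of u^2 * x, den = sum of u^2. */
void partial_center_sums(const double *x, const double *u,
                         size_t first, size_t count,
                         double *num, double *den)
{
    *num = 0.0;
    *den = 0.0;
    for (size_t i = first; i < first + count; i++) {
        double w = u[i] * u[i];
        *num += w * x[i];
        *den += w;
    }
}

/* Simulated collective step: loops over the ranks and adds their partial
   sums, standing in for a global sum-reduction. The caller must make sure
   that at least one membership is non-zero. */
double reduced_center(const double *x, const double *u,
                      size_t n, size_t nprocs)
{
    double num = 0.0, den = 0.0;
    for (size_t r = 0; r < nprocs; r++) {
        size_t first, count;
        double pn, pd;
        slice_bounds(n, nprocs, r, &first, &count);
        partial_center_sums(x, u, first, count, &pn, &pd);
        num += pn;
        den += pd;
    }
    return num / den;
}
```

Only two numbers per cluster centre leave each slice, so the communication volume depends on the number of clusters rather than on the image size.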

Problems Solved

Fuzzy C-Means solves the problem of object clustering in remote sensing images. The algorithm tries to identify spatially continuous regions of pixels characterized by similar feature values, which most likely correspond to similar ground cover types; for example, it can be used to generate vegetation maps of an area of interest.
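To make the notion of a fuzzy assignment concrete, here is a small serial sketch of the standard FCM membership computation for a single pixel. It is an illustration under assumptions, not the application's implementation: the function names are hypothetical, the fuzzifier is fixed at m = 2 (so the exponent 2/(m-1) becomes a ratio of squared distances), and a pixel that coincides with a centre is given full membership in that centre.

```c
#include <assert.h>
#include <stddef.h>

/* Squared Euclidean distance between two feature vectors of length dim. */
static double sq_dist(const double *a, const double *b, size_t dim)
{
    double s = 0.0;
    for (size_t d = 0; d < dim; d++) {
        double diff = a[d] - b[d];
        s += diff * diff;
    }
    return s;
}

/* Memberships of one pixel in each of k clusters, assuming m = 2:
   u[j] = 1 / sum over l of (d2_j / d2_l), with d2 the squared distance.
   `centers` holds k feature vectors of length dim, stored row by row. */
void fcm_memberships(const double *pixel, const double *centers,
                     size_t k, size_t dim, double *u)
{
    assert(k > 0 && dim > 0);

    /* Special case: the pixel lies exactly on a centre. */
    for (size_t j = 0; j < k; j++) {
        if (sq_dist(pixel, centers + j * dim, dim) == 0.0) {
            for (size_t l = 0; l < k; l++)
                u[l] = (l == j) ? 1.0 : 0.0;
            return;
        }
    }

    /* General case: closer centres receive larger memberships. */
    for (size_t j = 0; j < k; j++) {
        double dj = sq_dist(pixel, centers + j * dim, dim);
        double sum = 0.0;
        for (size_t l = 0; l < k; l++)
            sum += dj / sq_dist(pixel, centers + l * dim, dim);
        u[j] = 1.0 / sum;
    }
}
```

For a single-band pixel with value 1.0 and two centres at 0.0 and 4.0, the memberships come out at roughly 0.9 and 0.1: the pixel belongs mostly, but not exclusively, to the nearer cluster, which is how a mixed pixel is represented.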

Scientific and Social Impact

...

Collaborations

This work was done in collaboration with:

  • IBM Center of Advanced Studies, Egypt
    • Ahmed Sayed, asayed72@yahoo.com
    • Hisham El-Shishiny, shishiny@eg.ibm.com
  • Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt.
    • Ashraf S. Hussein, ashrafh@acm.org

Beneficiaries

  • Researchers from the Earth Observation field

Number of users

6

Development plan

  • Concept: Before the project has started.
  • Start of alpha stage: After the project has started.
  • Start of beta stage: M8
  • Start of testing stage: M10
  • Start of deployment stage: M18
  • Start of production stage: M19


Resource Requirements

  • Number of cores required for a single run: From 1 up to 1024
  • Minimum RAM/core required: 1 GB (up to 4GB)
  • Storage space during a single run: 850 MB (average storage; it depends on the input datasets type and size)
  • Long-term data storage: 2 TB
  • Total core hours required: 300 000

Technical Features and HP-SEE Implementation

  • Primary programming language: C
  • Parallel programming paradigm: MPI/OpenMPI/MPIX (BlueGene/P)
  • Main parallel code: MPI
  • Pre/post processing code: local developer
  • Application tools and libraries: libtiff

Usage Example

Will be added when the tool goes into production

Infrastructure Usage

  • Home system: UVT InfraGRID
    • Applied for access on: 01.2011
    • Access granted on: 01.2011
    • Achieved scalability: 128 cores
  • Accessed production systems:
  1. UVT BG/P
    • Applied for access on: 03.2011
    • Access granted on: 03.2011
    • Achieved scalability: 256 cores
  • Porting activities: The application has been ported from an x86_64 cluster with an InfiniBand interconnect to the IBM BlueGene/P supercomputer. Parts of the code had to be adapted, and the compilation process had to be rewritten to make use of the acceleration options of the BG/P compilers.
  • Scalability studies: Tests on 512, 1024, 2048 and 4096 cores on the IBM BlueGene/P.

Running on Several HP-SEE Centres

  • Benchmarking activities and results: The tests compared the InfraGRID x86_64 HPC cluster with the IBM BlueGene/P supercomputer. The maximum scalability was obtained at 256 cores on the InfraGRID cluster and near 512 cores on the IBM BlueGene/P supercomputer.
  • Other issues: Code corrections to optimise the communication between the nodes, especially when using a higher number of cores.
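
Scalability figures such as those above are commonly derived from speedup and parallel efficiency. The sketch below shows the usual definitions; the function names are hypothetical, and this page does not state which metric or threshold was used in the benchmarks.

```c
#include <assert.h>

/* Speedup: S = T(reference) / T(parallel), both wall-clock times. */
double speedup(double t_reference, double t_parallel)
{
    assert(t_reference > 0.0 && t_parallel > 0.0);
    return t_reference / t_parallel;
}

/* Parallel efficiency: E = S / cores. Values close to 1 indicate good
   scaling; a sharp drop marks the core count where scaling stops paying. */
double efficiency(double t_reference, double t_parallel, unsigned cores)
{
    assert(cores > 0);
    return speedup(t_reference, t_parallel) / (double)cores;
}
```

With made-up timings of 100 s on one core and 25 s on 8 cores, the speedup is 4 and the efficiency is 0.5. These numbers are purely illustrative and are not measurements of this application.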

Achieved Results

To be filled in

Publications

  • D. Petcu, D. Zaharie, S. Panica, A.S. Hussein, A. Sayed, H. El-Shishiny, Fuzzy Clustering of Large Satellite Images using High Performance Computing, accepted at SPIE Remote Sensing Conference: High-Performance Computing in Remote Sensing, 19-22 September 2011, Prague.


Foreseen Activities

Expanding the current code to support larger satellite images by improving the I/O operations and by adding better memory-mapping support.
