FuzzyCmeans

From HP-SEE Wiki

General Information

  • Application's name: Parallel Fuzzy C-Means (classification/feature detection category)
  • Virtual Research Community: Computational Physics, EO-Science
  • Scientific contact: Dana Petcu, petcu@info.uvt.ro
  • Technical contact: Silviu Panica, silviu@info.uvt.ro
  • Developers: Silviu Panica, Daniela Zaharie, West University of Timisoara, Romania ({silviu,dzaharie}@info.uvt.ro)
  • Web site: http://research.info.uvt.ro/

Short Description

Fuzzy clustering algorithms allow the identification of spatially continuous regions of pixels characterized by similar feature values, taking into account the fact that a pixel in a satellite image may contain spectral information corresponding to different ground components. Since satellite images are usually large, designing efficient implementations of fuzzy clustering algorithms has attracted the interest of researchers. Parallel variants of the traditional Fuzzy C-Means (FCM) algorithm already exist, but their extension to algorithms involving spatial information has not yet been investigated.

This research work aims to extend the existing parallelization of FCM to include some spatial variants (e.g. FCM with spatial information and Gaussian-kernel-based FCM). The proposed parallelization is based on three basic ideas: spatial slicing of the images, exploiting collective computations as much as possible, and reducing the communication between processors. Several slicing strategies were analyzed with respect to their ability to ensure a balanced processor load. Parallel variants were also proposed for the computation of cluster validity indices, useful in the context of semi-automatic identification of the number of classes.

Problems Solved

Fuzzy C-Means solves the problem of object clustering for remote sensing images. The algorithm tries to identify spatially continuous regions of pixels characterized by similar feature values, which most likely correspond to similar ground cover types, e.g. to generate vegetation maps of an area of interest.
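For reference, the standard (non-spatial) FCM formulation, which the spatial variants extend, minimizes the following objective over memberships u_ij and centers c_j (x_i are the pixel feature vectors, m > 1 the fuzzifier):

```latex
J_m = \sum_{i=1}^{N}\sum_{j=1}^{C} u_{ij}^{\,m}\,\lVert x_i - c_j\rVert^2,
\qquad \text{subject to } \sum_{j=1}^{C} u_{ij} = 1,
```

by alternating the two update steps

```latex
u_{ij} = \left(\sum_{k=1}^{C}
  \left(\frac{\lVert x_i - c_j\rVert}{\lVert x_i - c_k\rVert}\right)^{\frac{2}{m-1}}
\right)^{-1},
\qquad
c_j = \frac{\sum_{i=1}^{N} u_{ij}^{\,m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{\,m}}.
```

The spatial variants mentioned above modify the membership update with information from neighbouring pixels (or a Gaussian kernel distance), but keep the same alternating structure.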

Scientific and Social Impact

FuzzyCmeans provides improved means for image classification and feature detection. It has social impact through applications that can be built on top of it, e.g. for crisis management.

Collaborations

This work was done in collaboration with:

  • IBM Center of Advanced Studies, Egypt
    • Ahmed Sayed, asayed72@yahoo.com
    • Hisham El-Shishiny, shishiny@eg.ibm.com
  • Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt.
    • Ashraf S. Hussein, ashrafh@acm.org

Beneficiaries

  • Researchers from Earth Observation field

Number of users

6

Development plan

  • Concept: Before the project has started.
  • Start of alpha stage: After the project has started.
  • Start of beta stage: M8
  • Start of testing stage: M10
  • Start of deployment stage: M18
  • Start of production stage: M19


Resource Requirements

  • Number of cores required for a single run: From 1 up to 1024
  • Minimum RAM/core required: 1 GB (up to 4GB)
  • Storage space during a single run: 850 MB (average storage; it depends on the input datasets type and size)
  • Long-term data storage: 2 TB
  • Total core hours required: 300 000

Technical Features and HP-SEE Implementation

  • Primary programming language: C
  • Parallel programming paradigm: MPICH2/OpenMPI (BlueGene/P)
  • Main parallel code: MPICH2
  • Pre/post processing code: C
  • Application tools and libraries: libtiff

Usage Example

In order to use the FuzzyCmeans application you must obtain an access account on the UVT BG/P Supercomputer. The application can be launched using:

  • mpirun directly:
    • mpirun -partition BG_PARTITION_NAME -mode BG_EXECUTION_MODE -cwd `pwd` -np CPU_NO -args "config_file M N" -exe /path/to/fuzzyCmeansBinary
  • loadleveler scheduler:
    • a job description file must be defined for the chosen simulation:
#!/bin/sh
# @ job_name = sfcmGridFragm_2iunie
# @ job_type = bluegene
# @ requirements = (Machine == "$(host)")
# @ error = $(job_name)_$(jobid).err
# @ output = $(job_name)_$(jobid).out
# @ environment = COPY_ALL;
# @ notification = always
# @ notify_user = silviu@info.uvt.ro
# @ wall_clock_limit = 3:59:00
# @ class = parallel
# @ bg_size = 128
# @ queue
/bgsys/drivers/ppcfloor/bin/mpirun -mode BG_EXECUTION_MODE -cwd `pwd` -np CPU_NO -args "config_file M N" -exe /path/to/fuzzyCmeansBinary
    • submit the job using:
      • llsubmit job_descriptor

For more details on how to use the BG/P Supercomputer please read the dedicated wiki: http://hpc.uvt.ro/wiki/BlueGene

Infrastructure Usage

  • Home system: UVT InfraGRID
    • Applied for access on: 01.2011
    • Access granted on: 01.2011
    • Achieved scalability: 128 cores
  • Accessed production systems:
  1. UVT BG/P
    • Applied for access on: 03.2011
    • Access granted on: 03.2011
    • Achieved scalability: 256 cores
  • Porting activities: The application was ported from an x86_64 cluster with Infiniband interconnect to the IBM BlueGene/P supercomputer. Parts of the code needed some attention, and the compilation process had to be rewritten to make use of the BG/P compilers' acceleration options.
  • Scalability studies: Tests on 512, 1024, 2048 and 4096 cores on the IBM BlueGene/P.

Running on Several HP-SEE Centres

  • Benchmarking activities and results: The tests compared the InfraGRID x86_64 HPC cluster with the IBM BlueGene/P supercomputer. The maximum scalability was obtained at 256 cores on the InfraGRID cluster and near 512 cores on the IBM BlueGene/P supercomputer.
  • Other issues: Code corrections to optimise the communication between the nodes, especially when using higher numbers of cores.

Achieved Results

Please check the published paper under "Publications" for detailed information about application implementation and comparative results between InfraGRID Cluster (common x86_64 architecture) and the BG/P Supercomputer.

Publications

  • D.Petcu, D. Zaharie, S.Panica, A.S. Hussein, A. Sayed, H. El-Shishiny, Fuzzy Clustering of Large Satellite Images using High Performance Computing, accepted at SPIE Remote Sensing Conference: High-Performance Computing in Remote Sensing, 19-22 September 2011, Prague.

Foreseen Activities

Expanding the current code to support larger satellite images by improving the I/O operations and by including better memory-mapping support.
