4b. Appendix II: Description of image processing and data flow for the HabCam vehicle

1) Image acquisition [Howland, Vine, Gallager, Taylor]: Stereo image pairs (1280x1024 pixels each) are acquired at 5 to 15 frames per second and merged with georeferenced and environmental data. Raw 16-bit images are written to disk in real time; the processing described below then runs as an asynchronous operation.
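A minimal sketch of this acquire-and-queue pattern, assuming Python with NumPy; grab_stereo_pair() and the metadata values are hypothetical stand-ins for the real camera driver and navigation feed:

```python
import json, os, queue, threading
import numpy as np

os.makedirs("raw", exist_ok=True)
write_queue = queue.Queue(maxsize=64)

def grab_stereo_pair():
    # Stand-in for the camera driver: one 16-bit 1280x1024 stereo pair.
    frame = lambda: np.random.randint(0, 2**16, (1024, 1280), dtype=np.uint16)
    return frame(), frame()

def writer():
    # Drain the queue at whatever rate the disk allows; acquisition never
    # blocks on I/O as long as the queue has room.
    while True:
        frame_id, left, right, meta = write_queue.get()
        np.save(f"raw/{frame_id:06d}_L.npy", left)
        np.save(f"raw/{frame_id:06d}_R.npy", right)
        with open(f"raw/{frame_id:06d}.json", "w") as f:
            json.dump(meta, f)                    # georeference + environment
        write_queue.task_done()

threading.Thread(target=writer, daemon=True).start()

for frame_id in range(10):                        # acquisition loop, 5-15 fps
    left, right = grab_stereo_pair()
    meta = {"lat": 41.5, "lon": -70.7, "depth_m": 60.0}   # hypothetical values
    write_queue.put((frame_id, left, right, meta))
write_queue.join()
```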
2) Illumination and color correction [Vine, Gallager] (Fig. 10): The first step corrects for illumination non-uniformity using homomorphic filtering (a low-pass filter applied in the log domain), followed by white-balance correction to approach 'true' color. Although we have settled on an algorithm that compensates for light absorption at specific wavelengths, correcting to 'true' color, as if the water were removed from the picture, requires in situ hyperspectral measurement of local water characteristics, since water and particulate load are dynamic. We are currently developing an in situ reflectance hyperspectrometer specifically for real-time image color correction.
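A minimal sketch of the two corrections, assuming a Gaussian low-pass illumination estimate in the log domain and a simple gray-world white balance; the production algorithm's filter and color model may differ:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def correct(rgb16):
    """rgb16: HxWx3 uint16 image -> corrected float image in [0, 1]."""
    img = rgb16.astype(np.float64) / 65535.0
    out = np.empty_like(img)
    for c in range(3):
        log_ch = np.log1p(img[..., c])
        illum = gaussian_filter(log_ch, sigma=50)  # low-pass = illumination field
        # Flatten the lighting while preserving overall brightness.
        out[..., c] = np.expm1(log_ch - illum + illum.mean())
    # Gray-world white balance: force all channel means to the overall mean.
    means = out.reshape(-1, 3).mean(axis=0)
    out *= means.mean() / means
    return np.clip(out, 0.0, 1.0)
```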
3) Segmentation [Prasad, Vine, Gallager] (Fig. 11): By far the most challenging task is segmentation of the image: breaking the scene into regions of similar texture and color. We have explored several new methods of segmenting benthic images into homogeneous textures and identifying the boundaries that separate the different regions. This work is being carried out in collaboration with the Los Alamos National Laboratory (LANL), which brings to bear image analysis expertise developed in the context of exploiting satellite remote-sensing data. In particular, LANL has developed a new and unique framework for efficient structural feature extraction and characterization of complex imagery via polygonal representations, using computational geometry and graph theory. By using image-adaptive meshing (~10³ elements) instead of a regular pixel grid (~10⁶ elements), this approach achieves high data reduction and information enrichment compared to the state of the art. Recently, LANL succeeded in building prototype software for automatically detecting scallops in our HabCam image dataset, demonstrating the potential of the underlying technology on this dataset: for a selected sample of the image data, the software delivered scallop detection and counting accuracy of 80% at a processing rate of 1.5 frames per second on a Pentium laptop computer, roughly two orders of magnitude faster than the current state-of-the-art software we use for comparable tasks (more detail is available in the LANL Statement of Work, Appendix).
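LANL's polygonal-mesh software is not reproduced here; purely as an illustration of the same pixel-grid-to-region data reduction (~10⁶ pixels to ~10³ elements), the sketch below uses SLIC superpixels from scikit-image as a stand-in:

```python
from skimage.segmentation import slic
from skimage.measure import regionprops

def segment(rgb):
    """rgb: HxWx3 float image -> label map of ~1000 texture/color regions."""
    labels = slic(rgb, n_segments=1000, compactness=10, start_label=1)
    regions = regionprops(labels)
    print(f"{rgb.shape[0] * rgb.shape[1]} pixels -> {len(regions)} regions")
    return labels
```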
4) Region of Interest (ROI) extraction [Gallager]: Once the image is segmented, each target area is extracted as a rectangular ROI whose dimensions are the maximum height and width of the segmented area, and stored on disk.
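A minimal sketch of the extraction, assuming a label map like the one produced by the segmentation step: each labeled region is cropped at its bounding box.

```python
from skimage.measure import regionprops

def extract_rois(rgb, labels):
    """Yield (label, rectangular crop) for each segmented target."""
    for region in regionprops(labels):
        r0, c0, r1, c1 = region.bbox   # max height/width of the segmented area
        yield region.label, rgb[r0:r1, c0:c1]
```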
5) Feature extraction [Gallager]: Wavelet coefficients are stored along with the ROIs and are used as multi-scale, moment-invariant texture features (Tiwari and Gallager, 2004). Additional features arise from morphological analysis, such as length, width, circularity, and ellipticity.
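A sketch of the combined feature vector, assuming PyWavelets for the wavelet decomposition; subband energies stand in here for the moment-invariant features of Tiwari and Gallager (2004), and eccentricity is used as a proxy for ellipticity:

```python
import numpy as np
import pywt

def roi_features(gray_roi, region):
    """gray_roi: 2D ROI; region: its skimage.measure regionprops entry."""
    coeffs = pywt.wavedec2(gray_roi, "db4", level=3)
    # Mean energy of each subband at each scale as texture features.
    texture = [np.mean(np.square(coeffs[0]))]
    texture += [np.mean(np.square(c)) for level in coeffs[1:] for c in level]
    shape = [region.major_axis_length,                       # length
             region.minor_axis_length,                       # width
             4 * np.pi * region.area / region.perimeter**2,  # circularity
             region.eccentricity]                            # ellipticity proxy
    return np.array(texture + shape)
```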
6) Unsupervised and supervised target classification [Gallager, Prasad] (Fig. 12): The extracted feature set is used to generate an unsupervised classification using Principal Component Analysis (PCA) and clustering. A combination of these approaches will also be used to determine when the camera has encountered new habitats not yet defined in the supervised training set. The results of the unsupervised classification will dictate how many classes of targets exist in a data set, allowing a supervised classifier such as a Support Vector Machine (SVM) to be trained to recognize elements of each class. A manually derived training set is used to establish boundaries for each target class.
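A sketch of the two-stage scheme, assuming scikit-learn: PCA plus k-means provide the unsupervised pass, and an SVM is then trained on the manually labeled subset (feature vectors here are assumed to have at least 10 dimensions):

```python
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def classify(features, labeled_features, labels, n_classes):
    # Unsupervised pass: reduce dimensionality, then cluster to see how many
    # natural target classes the data set contains.
    reduced = PCA(n_components=10).fit_transform(features)
    clusters = KMeans(n_clusters=n_classes, n_init=10).fit_predict(reduced)
    # Supervised pass: SVM trained on the manually derived training set.
    svm = SVC(kernel="rbf", probability=True).fit(labeled_features, labels)
    return clusters, svm.predict(features)
```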
7) Texture energy analysis for background habitat information [Gallager, Prasad, Auster, Valentine] (Fig. 13; Table 2): When an image ribbon is created from many sub-images, we can begin to see large-scale textures that are not readily identifiable in individual sub-images. Areas of the composite image abundant in one kind of "texture element" (whether an inanimate background object or an organism) will present a different large-scale texture pattern than a part of the image abundant in another texture element. Thus, mud, sand, small gravel, shell aggregations, reefs, and aggregations of cobble/boulder, scallop, or sand dollar all present different large-scale homogeneous texture patterns.
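One way to compute such large-scale texture energy is with Laws' classic texture energy measures; the sketch below assumes that filter bank, which may differ from the method ultimately adopted:

```python
import numpy as np
from scipy.ndimage import convolve, uniform_filter

L5 = np.array([1, 4, 6, 4, 1], float)    # level kernel
E5 = np.array([-1, -2, 0, 2, 1], float)  # edge kernel
S5 = np.array([-1, 0, 2, 0, -1], float)  # spot kernel

def texture_energy(gray):
    """Return one local-energy map per Laws kernel pair for a gray mosaic."""
    maps = {}
    for name, (a, b) in {"LE": (L5, E5), "ES": (E5, S5), "LS": (L5, S5)}.items():
        response = convolve(gray, np.outer(a, b), mode="reflect")
        maps[name] = uniform_filter(np.abs(response), size=15)  # local energy
    return maps
```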
For the purposes of habitat characterization, we use a modification of the general scheme proposed by Valentine et al. (2005) with the following categories:
-mud/sand without emergent biological structure
-mud/sand with emergent biological structure
-small gravel (< 2cm) without emergent/attached biological structure
-small gravel (< 2cm) with emergent/attached biological structure
-shell aggregations and/or reefs without emergent/attached biological structure
-shell aggregations and/or reefs with emergent/attached biological structure
-cobble/boulder without emergent/attached biological structure
-cobble/boulder with emergent/attached biological structure

While realizing that substrate is a continuum, we assert that there are qualitative differences between habitats of mud and sand and the species that live there. Accordingly, we seek to differentiate these bottom types and the associated biota. The categories listed here will be segmented using the approach described above and displayed by pseudo-coloring the original mosaic as a function of texture type. Associations between specific targets (e.g., larval and juvenile fish, echinoderms, hydrozoans) and texture categories will be made through discriminant analysis, as sketched below. Inclusion of diverse habitats in the analysis will allow for more robust training of the SVMs, providing information about the greatest number of environments immediately useful to fishery managers and for future study by benthic habitat researchers.
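A minimal sketch of the association step, assuming scikit-learn's linear discriminant analysis over target feature vectors labeled with the texture category in which each target occurs:

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def associate(target_features, habitat_labels):
    """Fit an LDA separating targets by habitat texture category."""
    lda = LinearDiscriminantAnalysis().fit(target_features, habitat_labels)
    return lda   # lda.predict(...) assigns new targets to a habitat class
```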
8) Statistical error analysis of classification results [Gallager]: Each target classification is tagged with a probability of error derived from a cross-validation matrix, so the investigator has good estimates of how well the classifier is performing.
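A sketch of the per-class error estimate, assuming scikit-learn: a confusion matrix built from cross-validated predictions yields the probability that a label of a given class is wrong.

```python
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC

def class_error_rates(features, labels):
    predicted = cross_val_predict(SVC(), features, labels, cv=5)
    cm = confusion_matrix(labels, predicted)
    # Off-diagonal mass per row = probability of misclassifying that class.
    return 1.0 - np.diag(cm) / cm.sum(axis=1)
```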
9) Morphological analysis of classified targets [Gallager, Vine]: Once boundaries have been found, targets segmented from the background, and each target classified, automated measurement of major and minor axes in the stereo pairs becomes nearly trivial. Morphological measurements of extracted targets will be made to the nearest mm in a 1 m² field of view, given sufficient resolution in the original image.
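A sketch of the pixel-to-millimeter conversion, assuming a 1 m wide field of view imaged across 1280 pixels; in practice, altitude recovered from the stereo pair would refine this scale per frame.

```python
from skimage.measure import label, regionprops

def axes_mm(mask, fov_width_m=1.0, image_width_px=1280):
    """mask: binary target mask -> (major, minor) axis lengths in mm."""
    mm_per_px = fov_width_m * 1000.0 / image_width_px
    for region in regionprops(label(mask)):
        yield (round(region.major_axis_length * mm_per_px),   # major axis, mm
               round(region.minor_axis_length * mm_per_px))   # minor axis, mm
```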
10) Fusion of images with GIS information [Mayer, Vine, Lerner]: Rigorous sensor models for the towed camera platform and the multi-beam acoustics will be integrated into an evolving oceanographic processing and storage system that will provide fusion services for the collected data sets. We will coordinate with the Gulf of Maine Ocean Data Partnership (http://www.gomodp.org/) to make the regional data sets discoverable, accessible, and interoperable in accord with IOOS DMAC-endorsed protocols (WHOI is a partner in the development of interoperability and metadata standards). These capabilities and tools will demonstrate the OOSTethys.org framework of OGC-compliant Sensor Observation Services for the integration of regional ocean observing systems.
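As a small illustration of the service interface, the standard OGC entry point is a GetCapabilities request; the endpoint URL below is hypothetical:

```python
import requests

SOS_URL = "http://example.org/sos"   # hypothetical OOSTethys-style endpoint
params = {"service": "SOS", "request": "GetCapabilities"}
response = requests.get(SOS_URL, params=params, timeout=30)
print(response.text[:500])           # XML capabilities document
```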
11) Storage of raw and processed information in a relational database [Mayer, Vine, Lerner]: The oceanographic server will be derived and evolved from existing open-source GeoServers that comply with OGC standards, providing the ability to share and leverage remote geospatial data sets through web services.
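A sketch of one possible relational layout for the processed targets, using SQLite purely for illustration; the production system is a GeoServer-backed stack exposing the data through OGC web services, and the table and column names here are hypothetical:

```python
import sqlite3

conn = sqlite3.connect("habcam.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS roi (
        roi_id     INTEGER PRIMARY KEY,
        image_file TEXT NOT NULL,         -- raw frame on disk
        acquired   TEXT NOT NULL,         -- UTC timestamp
        lat REAL, lon REAL, depth_m REAL, -- georeference from the tow vehicle
        class_name TEXT,                  -- e.g. 'scallop'
        error_prob REAL                   -- from the cross-validation step
    )""")
conn.commit()
```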