Query Specific Fusion for Large-Scale Image Retrieval


Recent image retrieval algorithms based on local features indexed by a vocabulary tree, and on holistic features indexed by compact hashing codes, both demonstrate excellent scalability. However, their retrieval precision may vary dramatically among queries. This motivates us to investigate how to fuse the ordered retrieval sets given by multiple retrieval methods to further enhance retrieval precision. We propose a graph-based query-specific fusion approach in which multiple retrieval sets are merged and reranked by conducting a link analysis on a fused graph. The retrieval quality of an individual method is measured by the consistency of the top candidates' nearest neighborhoods. Hence, the proposed method can adaptively integrate the strengths of retrieval methods using local or holistic features for different query images without any supervision. Extensive experiments demonstrate very competitive performance on four public datasets: UKbench, Corel-5K, Holidays, and San Francisco Landmarks.
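The graph-fusion idea can be sketched as follows (a toy illustration rather than the paper's exact algorithm; `jaccard`, `fuse_and_rerank`, and the data layout are hypothetical): each retrieval method contributes edges from the query to its top candidates, weighted by the Jaccard consistency between the query's top-k list and each candidate's own nearest-neighbor list; summing the weights across methods and sorting gives the fused ranking.

```python
def jaccard(a, b):
    """Jaccard similarity of two neighbour lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def fuse_and_rerank(ranked_lists, neighbour_maps, k=2):
    """Fuse several ranked retrieval sets into one reranked list.

    ranked_lists:   one ranked candidate list per retrieval method
    neighbour_maps: per method, a dict: image id -> its own k-NN list
    Edge weight = Jaccard consistency between the query's top-k and the
    candidate's neighbourhood; weights are summed over the fused graph.
    """
    scores = {}
    for ranked, neighbours in zip(ranked_lists, neighbour_maps):
        top = ranked[:k]
        for cand in top:
            w = jaccard(top, neighbours.get(cand, [])[:k])
            scores[cand] = scores.get(cand, 0.0) + w
    return sorted(scores, key=scores.get, reverse=True)
```

A method whose top candidates mutually agree contributes strong edges, so it dominates the fused ranking for that query; a method returning inconsistent neighborhoods contributes little, which is the query-adaptive behavior described above.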

Project page: http://www.research.rutgers.edu/~shaoting/research/nec2011/project.htm


  • Shaoting Zhang, Ming Yang, Timothee Cour, Kai Yu and Dimitris Metaxas: Query Specific Fusion for Image Retrieval, The 12th European Conference on Computer Vision (ECCV), 2012.



    Dynamic Scene Registration

    Figure. Images and their gradient distributions: (a) before registration, (b) in the middle of registration, (c) after registration.

    These works address the problem of registering a sequence of images in a moving dynamic texture video. This involves optimization with respect to the camera motion, the average image, and the dynamic texture model. The problem is highly ill-posed, and good solutions are almost impossible to obtain without priors. In this paper, we introduce powerful priors for this problem, based on two simple observations: 1) registration should simplify the dynamic texture model while preserving all useful information; this motivates us to compute a prior for the dynamic texture by marginalizing over specific dynamics in the space of all stable auto-regressive sequences; 2) the statistics of derivative filter responses in the average image can be significantly changed by registration, and better registration should lead to a sharper average image; this yields the prior that the derivative distribution of the estimated average image should be close to that learned from the input image sequence. With these priors, a new registration approach is proposed that marginalizes over the “nuisance” variables under a Bayesian framework. Superior motion estimation results are obtained by jointly optimizing over the registration parameters, the average image, and the dynamic texture model. Experimental results on real video sequences of moving dynamic textures show convincing performance of the proposed approach.
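The second prior, that better registration yields a sharper average image, can be illustrated numerically (a minimal sketch with synthetic integer shifts; the function names and the gradient-energy sharpness measure are illustrative choices, not the paper's exact formulation):

```python
import numpy as np

def average_image(frames, shifts):
    """Undo known integer (dy, dx) shifts, then average the frames."""
    return np.mean([np.roll(f, (-dy, -dx), axis=(0, 1))
                    for f, (dy, dx) in zip(frames, shifts)], axis=0)

def gradient_energy(img):
    """Mean squared derivative-filter response: larger means sharper."""
    gy, gx = np.gradient(img)
    return float(np.mean(gx ** 2 + gy ** 2))

# Synthetic sequence: one sharp square, translated from frame to frame.
base = np.zeros((24, 24))
base[8:14, 8:14] = 1.0
shifts = [(0, 0), (0, 1), (0, 2), (1, 1)]
frames = [np.roll(base, s, axis=(0, 1)) for s in shifts]

blurry = average_image(frames, [(0, 0)] * len(frames))  # no registration
sharp = average_image(frames, shifts)                    # correct registration
```

Averaging misaligned frames smears the edges, so `gradient_energy(sharp)` exceeds `gradient_energy(blurry)`; the prior exploits exactly this by pushing the average image's derivative statistics toward those of a sharp image.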


    • Junzhou Huang, Xiaolei Huang, Dimitris Metaxas, "Optimization and Learning for Registration of Moving Dynamic Textures", In Proc. of IEEE Int'l Conf. on Computer Vision (ICCV'07), pp. 1-8, 2007.
    Project Homepage

        Motion Saliency Detection

        This project addresses the issue of motion saliency detection in video sequences. Saliency detection has attracted a lot of attention in recent years; it aims at locating semantic regions in videos for further video understanding. This project focuses on motion saliency detection for video content analysis. A new method, Temporal Spectral Residual, is proposed to capture salient objects from video sequences. Based on the analysis of temporal slices, it can automatically separate foreground moving objects from the background. It also includes an effective strategy for adaptive threshold selection and noise removal. Unlike conventional background modeling methods built on complex mathematical models, this method is based on Fourier spectrum analysis, which makes it simple and fast.
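The spectral-residual step on a single temporal slice can be sketched as follows (a simplified one-slice version; the box-filter size, the squared-magnitude saliency map, and all function names are illustrative assumptions, not the exact published pipeline):

```python
import numpy as np

def box_filter(a, r=1):
    """Local average over a (2r+1)x(2r+1) window (circular borders)."""
    acc = np.zeros_like(a)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            acc += np.roll(a, (dy, dx), axis=(0, 1))
    return acc / (2 * r + 1) ** 2

def spectral_residual_saliency(slice_2d):
    """Saliency of a temporal slice: keep only the part of the log
    amplitude spectrum that deviates from its local average, then
    invert the Fourier transform with the original phase."""
    f = np.fft.fft2(slice_2d)
    log_amp = np.log(np.abs(f) + 1e-8)
    residual = log_amp - box_filter(log_amp)
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * np.angle(f)))) ** 2
    return sal / sal.max()
```

On a slice where the static background is uniform and a moving object traces a slanted streak, the saliency map concentrates on the streak; an adaptive threshold on this map then yields the foreground mask.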


        • Xinyi Cui, Qingshan Liu, Dimitris N. Metaxas. "Temporal Spectral Residual: Fast Motion Saliency Detection." Proceedings of the 17th ACM international conference on Multimedia, pp.617-620. 2009.



          Transformation Invariant Sparse Representation

          Figure. Sparse representation results given unaligned test images: (a) training images, (b) test images, (c) results by Randomfaces, (d) results by the proposed approach.

          Sparse representation in compressive sensing is gaining increasing attention due to its success in various applications. As we demonstrate in this paper, however, image sparse representation is sensitive to image plane transformations, so existing approaches cannot reconstruct the sparse representation of a geometrically transformed image. We introduce a simple technique for obtaining transformation-invariant image sparse representation. It is rooted in two observations: 1) if the aligned model images of an object span a linear subspace, their transformed versions with respect to some group of transformations still span a linear subspace in a higher dimension; 2) if a target (or test) image, aligned with the model images, lives in the above subspace, its pre-alignment versions get closer to the subspace after applying estimated transformations with increasingly accurate parameters. These observations motivate us to project a potentially unaligned target image onto random projection manifolds defined by the model images and the transformation model. Each projection is then separated into the aligned projection target and a residue due to misalignment. The desired aligned projection target is iteratively optimized by gradually diminishing the residue. In this framework, we can simultaneously recover the sparse representation of a target image and the image plane transformation between the target and the model images. We have applied the proposed methodology to two applications: face recognition and dynamic texture registration. The improved performance over previous methods demonstrates the effectiveness of the proposed approach.
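The interplay between transformation estimation and sparse recovery can be sketched in one dimension (a toy version: the transformation group is reduced to integer shifts searched exhaustively, the sparse solver is a minimal matching pursuit, and all names are hypothetical):

```python
import numpy as np

def matching_pursuit(D, y, k=1):
    """Greedy sparse coding: pick k atoms, least-squares fit, return residual."""
    residual, idx = y.copy(), []
    for _ in range(k):
        idx.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, idx], y, rcond=None)
        residual = y - D[:, idx] @ coef
    return idx, coef, residual

def align_and_code(D, y, candidate_shifts, k=1):
    """Jointly estimate the shift and the sparse code by picking the
    shift whose un-shifted signal is best explained by the dictionary."""
    best = None
    for s in candidate_shifts:
        idx, coef, r = matching_pursuit(D, np.roll(y, -s), k)
        err = float(np.linalg.norm(r))
        if best is None or err < best[0]:
            best = (err, s, idx, coef)
    return best  # (residual norm, estimated shift, atom ids, coefficients)
```

As in the observation above, the residual shrinks as the candidate transformation approaches the true one, so minimizing it recovers the shift and the sparse code simultaneously; the paper replaces this brute-force search with iterative optimization over continuous transformation parameters.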


          • Junzhou Huang, Xiaolei Huang, Dimitris Metaxas, "Simultaneous Image Transformation and Sparse Representation Recovery", IEEE Computer Society Conf. on Computer Vision and Pattern Recognition (CVPR), Anchorage, Alaska, USA, 2008.
          Project Homepage

              Image Annotation

              Automatically assigning relevant text keywords to images is an important problem. Many algorithms have been proposed in the past decade and have achieved good performance. Efforts have focused on model representations of keywords, but the properties of features have not been well investigated. In most cases, a group of features is preselected, yet important feature properties are not exploited to guide the selection. In this paper, we introduce a regularization-based feature selection algorithm that leverages both the sparsity and clustering properties of features, and incorporate it into the image annotation task. A novel approach is also proposed to iteratively obtain similar and dissimilar pairs from both keyword similarity and relevance feedback, so that keyword similarity is modeled within the annotation framework. Numerous experiments compare the performance of individual features, feature combinations, and regularization-based feature selection methods on the image annotation task, giving insight into the properties of features in this setting. The experimental results demonstrate that the group sparsity based method is more accurate and stable than the others.
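The group-sparsity regularization can be sketched with a proximal-gradient loop (a generic group-lasso toy, not the paper's exact solver; the step size, penalty weight, and synthetic data are illustrative assumptions):

```python
import numpy as np

def group_lasso(X, y, groups, lam=0.05, lr=0.01, iters=500):
    """Least squares with a group-lasso penalty, solved by proximal
    gradient descent; each prox step soft-thresholds whole groups."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w -= lr * (X.T @ (X @ w - y)) / len(y)   # gradient step
        for g in groups:                          # group soft-thresholding
            norm = np.linalg.norm(w[g])
            w[g] = 0.0 if norm <= lr * lam else w[g] * (1 - lr * lam / norm)
    return w
```

Groups of features that carry no signal are driven to zero as a block, which is the behavior an annotation pipeline can rely on to discard whole feature types (e.g., an uninformative descriptor) rather than individual dimensions.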


              • Shaoting Zhang, Junzhou Huang, Yuchi Huang, Yang Yu, Hongsheng Li, Dimitris N. Metaxas, "Automatic image annotation using group sparsity", CVPR 2010.