Thursday, June 7, 2012

Nearest Neighbor Search Using Data Projections and Hashing

1. Introduction


Random data projection with hash-based index structures has come to be the state-of-the-art method for both an efficient near neighbor search and a scalable cluster analysis for very high-dimensional data.  Data of very high dimensions pose serious computational challenge because of curse-of-dimensionality [Slide]. Random projections are also commonly used for dimension reduction.  An instance of random projection, Locality sensitive hashing, has widely attracted the attention of research community and is an active area of further research. This article gives resources for detailed reading on projection method.


2. Near Neighbor Search  


    a)  Types of queries

       1. Nearest neighbor queries
       2. Top-k nearest neighbors queries
       3. Range queries
       4. r-Near neighbor queries or ball-queries
       6. All nearest-neighbor queries  or Group queries or multi-object queries
       7. Reverse nearest neighbor queries
       8. List of other queries
       

   b) Hashing functions for different distance metrics

      6.

   c) References 

   
      2012
              1. SIMP: Accurate and Efficient Near Neighbor Search in High Dimensional Spaces ( Note: Method shown to yield very efficient search result on more than 100 million 128-dimensional vectors.  This is one of the largest dataset used for measuring the performance of a high-dimensional search algorithm). 
      2009

      2008
              2.  Modeling LSH for Performance Tuning
              3.  Lecture Notes

      2007

      2005
          
     2003
     2002
            1Similarity Estimation Techniques from Rounding Algorithms

     1998
           1.  Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality
           2.  Min-Wise Independent Permutations

     1997
            1. Two algorithms for nearest-neighbor search in high dimensions

     1984
            1.  W.B. Johnson and J. Lindenstrauss. Extensions of Lipshitz mapping into Hilbert space

  

  d) Implementations

          1.  Implementation of p-Stable LSH
          2.  LSHKit
          3.  SIMP
          4.  LSB


4. Scalable Cluster Analysis using Min-Hash




No comments:

Post a Comment