A graph-based cache for large-scale similarity search engines

V. Gil-Costa, M. Marin, C. Bonacic, R. Solar

Research output: Contribution to journal, Article

Abstract

Large-scale similarity search engines are complex systems devised to process unstructured data such as images and videos. These systems are deployed on clusters of distributed processors that communicate through high-speed networks. To process a new query, a distance function is evaluated between the query and the objects stored in the database. This process relies on a metric space index distributed among the processors. In this paper, we propose a cache-based strategy devised to reduce the number of computations required to retrieve the top-k object results for user queries by using pre-computed information. Our proposal executes an approximate similarity search algorithm that takes advantage of the links between objects stored in the cache memory. Those links form a graph of similarity among pre-computed queries. Compared to previous methods in the literature, the proposed approach reduces the number of distance evaluations by up to 60%. © 2017, Springer Science+Business Media, LLC, part of Springer Nature.
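The abstract gives no algorithmic detail, so the following is only an illustrative sketch of the general idea of a cache whose entries are linked into a similarity graph; it is not the authors' algorithm. The class name `GraphCache`, the `links` parameter, the greedy walk, and the Euclidean distance are all assumptions made for this example.

```python
import math
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance, used here as a stand-in metric (an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class GraphCache:
    """Hypothetical cache of answered queries linked to their nearest cached queries."""

    def __init__(self, links: int = 2) -> None:
        self.links = links                      # links created per new cached query
        self.queries: List[Vector] = []         # cached query objects
        self.results: List[List[Vector]] = []   # pre-computed top-k objects per query
        self.graph: Dict[int, List[int]] = {}   # undirected similarity links
        self.distance_evals = 0                 # distance evaluations spent in search()

    def _dist(self, a: Vector, b: Vector) -> float:
        self.distance_evals += 1
        return euclidean(a, b)

    def add(self, query: Vector, top_k: List[Vector]) -> int:
        """Store an answered query and link it to its closest cached queries."""
        new_id = len(self.queries)
        nearest = sorted(
            range(new_id), key=lambda i: euclidean(query, self.queries[i])
        )[: self.links]
        self.queries.append(query)
        self.results.append(list(top_k))
        self.graph[new_id] = list(nearest)
        for i in nearest:
            self.graph[i].append(new_id)
        return new_id

    def search(self, query: Vector, k: int, start: int = 0) -> List[Vector]:
        """Greedy walk toward the closest cached query, then rank cached objects."""
        if not self.queries:
            return []
        current = start
        best = self._dist(query, self.queries[current])
        visited = {current}
        while True:
            nxt = None
            for n in self.graph[current]:
                if n in visited:
                    continue
                visited.add(n)
                d = self._dist(query, self.queries[n])
                if d < best:
                    best, nxt = d, n
            if nxt is None:
                break                           # no neighbour is closer: local minimum
            current = nxt
        # Candidate objects: cached results of the final node and its neighbours.
        pool = {
            tuple(obj)
            for node in [current, *self.graph[current]]
            for obj in self.results[node]
        }
        return sorted(pool, key=lambda obj: self._dist(query, obj))[:k]
```

In this sketch the answer is approximate because only objects already stored with nearby cached queries are ranked; an object that no cached query returned cannot appear in the result. The `distance_evals` counter makes it possible to compare the cost of a cached search against a full scan of the database.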
Language: English
Pages: 2006-2034
Number of pages: 29
Journal: Journal of Supercomputing
Volume: 74
Issue number: 5
DOI: 10.1007/s11227-017-2207-3
Publication status: Published - 2018

Fingerprint

Cache memory
Similarity Search
High speed networks
Search engines
Search Engine
Cache
Large scale systems
Query
Graph in graph theory
Industry
High-speed Networks
Distance Function
Relay
Search Algorithm
Metric space
Complex Systems
Evaluation
Object

Keywords

  • Approximate similarity search
  • Distributed large-scale search engines
  • Metric space cache
  • Cache memory
  • Graphic methods
  • High speed networks
  • Query processing
  • Set theory
  • Topology
  • Distance functions
  • Graph-based
  • Metric spaces
  • Scale similarity
  • Similarity search
  • Unstructured data
  • User query
  • Search engines

Cite this

A graph-based cache for large-scale similarity search engines. / Gil-Costa, V.; Marin, M.; Bonacic, C.; Solar, R.

In: Journal of Supercomputing, Vol. 74, No. 5, 2018, p. 2006-2034.

@article{e1418523bb7a460281f6b7e1cdf1a07e,
title = "A graph-based cache for large-scale similarity search engines",
abstract = "Large-scale similarity search engines are complex systems devised to process unstructured data such as images and videos. These systems are deployed on clusters of distributed processors that communicate through high-speed networks. To process a new query, a distance function is evaluated between the query and the objects stored in the database. This process relies on a metric space index distributed among the processors. In this paper, we propose a cache-based strategy devised to reduce the number of computations required to retrieve the top-k object results for user queries by using pre-computed information. Our proposal executes an approximate similarity search algorithm that takes advantage of the links between objects stored in the cache memory. Those links form a graph of similarity among pre-computed queries. Compared to previous methods in the literature, the proposed approach reduces the number of distance evaluations by up to 60{\%}. {\textcopyright} 2017, Springer Science+Business Media, LLC, part of Springer Nature.",
keywords = "Approximate similarity search, Distributed large-scale search engines, Metric space cache, Cache memory, Graphic methods, High speed networks, Query processing, Set theory, Topology, Distance functions, Graph-based, Metric spaces, Scale similarity, Similarity search, Unstructured data, User query, Search engines",
author = "V. Gil-Costa and M. Marin and C. Bonacic and R. Solar",
note = "Export Date: 7 June 2018 CODEN: JOSUE Correspondence Address: Gil-Costa, V.; Universidad Nacional de San Luis, Argentina; email: gvcosta@unsl.edu.ar Funding details: ID15I10560 Funding details: PICT 2014 N 2014-01146 Funding details: FB0001 Funding text: Acknowledgements This research was supported by the supercomputing infrastructure of the NLHPC Chile, partially funded by CONICYT Basal Funds FB0001, Fondef ID15I10560, and partially funded by PICT 2014 N 2014-01146.",
year = "2018",
doi = "10.1007/s11227-017-2207-3",
language = "English",
volume = "74",
pages = "2006--2034",
journal = "Journal of Supercomputing",
issn = "0920-8542",
publisher = "Springer New York LLC",
number = "5",

}

TY - JOUR

T1 - A graph-based cache for large-scale similarity search engines

AU - Gil-Costa, V.

AU - Marin, M.

AU - Bonacic, C.

AU - Solar, R.

N1 - Export Date: 7 June 2018 CODEN: JOSUE Correspondence Address: Gil-Costa, V.; Universidad Nacional de San Luis, Argentina; email: gvcosta@unsl.edu.ar Funding details: ID15I10560 Funding details: PICT 2014 N 2014-01146 Funding details: FB0001 Funding text: Acknowledgements This research was supported by the supercomputing infrastructure of the NLHPC Chile, partially funded by CONICYT Basal Funds FB0001, Fondef ID15I10560, and partially funded by PICT 2014 N 2014-01146.

PY - 2018

Y1 - 2018

N2 - Large-scale similarity search engines are complex systems devised to process unstructured data such as images and videos. These systems are deployed on clusters of distributed processors that communicate through high-speed networks. To process a new query, a distance function is evaluated between the query and the objects stored in the database. This process relies on a metric space index distributed among the processors. In this paper, we propose a cache-based strategy devised to reduce the number of computations required to retrieve the top-k object results for user queries by using pre-computed information. Our proposal executes an approximate similarity search algorithm that takes advantage of the links between objects stored in the cache memory. Those links form a graph of similarity among pre-computed queries. Compared to previous methods in the literature, the proposed approach reduces the number of distance evaluations by up to 60%. © 2017, Springer Science+Business Media, LLC, part of Springer Nature.

KW - Approximate similarity search

KW - Distributed large-scale search engines

KW - Metric space cache

KW - Cache memory

KW - Graphic methods

KW - High speed networks

KW - Query processing

KW - Set theory

KW - Topology

KW - Distance functions

KW - Graph-based

KW - Metric spaces

KW - Scale similarity

KW - Similarity search

KW - Unstructured data

KW - User query

KW - Search engines

U2 - 10.1007/s11227-017-2207-3

DO - 10.1007/s11227-017-2207-3

M3 - Article

VL - 74

SP - 2006

EP - 2034

JO - Journal of Supercomputing

T2 - Journal of Supercomputing

JF - Journal of Supercomputing

SN - 0920-8542

IS - 5

ER -