International Journal of Scientific & Technology Research

Scopus coverage:
Nov 2018 to May 2020


IJSTR >> Volume 4 - Issue 12, December 2015 Edition


Website: http://www.ijstr.org

ISSN 2277-8616

Exploring The Use Of Hybrid Similarity Measure For Author Name Disambiguation




Tasleem Arif



Index Terms: Name disambiguation, token-based, string-based, hybrid similarity, digital libraries, publications, metadata.



Abstract: Name disambiguation has become one of the harder problems in online environments. With each passing day, more entities with identical features appear online, making them difficult to distinguish. Digital libraries face a similar problem in telling apart the publications of authors with similar names. This leads to incorrect attribution of publications, which undermines the effort of indexing publications by individual author. This paper proposes a two-stage hybrid similarity computation mechanism that combines the strengths of token-based and character-based measures. The proposed method uses a token-based similarity score in the first stage of comparison and, based on the results of the first stage, uses a character-based similarity score in the second stage. Experimental results obtained on standard datasets indicate that the proposed technique improves on existing methods.
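
The two-stage comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not say which token-based or character-based measures the paper uses, nor how the first stage decides whether to invoke the second, so everything specific here is an assumption made for illustration: Jaccard similarity over name tokens for stage one, normalized Levenshtein similarity for stage two, the function names, and the threshold values `accept`, `reject`, and `char_threshold`.

```python
def tokens(name: str) -> set:
    """Lowercase a name and split it into word tokens, dropping punctuation."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return set(cleaned.split())


def jaccard(a: str, b: str) -> float:
    """Token-based score (assumed): shared tokens divided by all distinct tokens."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def char_similarity(a: str, b: str) -> float:
    """Character-based score (assumed): 1 minus the length-normalized edit distance."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def hybrid_match(a: str, b: str, accept: float = 0.8, reject: float = 0.2,
                 char_threshold: float = 0.85) -> bool:
    """Stage 1 settles clear cases; stage 2 handles the ambiguous middle band.

    All three thresholds are hypothetical values, not taken from the paper.
    """
    token_score = jaccard(a, b)
    if token_score >= accept:   # token sets nearly identical: treat as a match
        return True
    if token_score <= reject:   # almost no shared tokens: treat as a non-match
        return False
    # Ambiguous token overlap: fall back to the character-level comparison.
    return char_similarity(a, b) >= char_threshold
```

Under these assumptions, `hybrid_match("Arif, Tasleem", "Tasleem Arif")` is settled in the first stage because the token sets are identical, while `hybrid_match("Tasleem Arif", "Tasleem Arief")` shares only one of three distinct tokens and is resolved in the second stage, where a single-character difference keeps the character-level score high.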


