Asian Journal of Engineering, Sciences & Technology

A Systematic Approach to Retrieve Semantically Similar Documents

Research Article 1
Asian Journal of Engineering, Sciences and Technology - Volume 5, Issue 1 2015
By Tariq Hussain Laghari, Syed Sajjad Hussain, Manzoor Hashmani

Finding the required information on a particular topic gives rise to information retrieval (IR) problems. Information retrieval techniques address these problems, but they cannot extract opinions from unstructured text; opinion mining techniques are used for that purpose. Opinion mining extracts opinions from unstructured data sources efficiently, but it only labels each opinion with a polarity, either negative or positive. When several opinions about a single topic or product express the same meaning, opinion mining cannot detect the semantic similarity between them. Other tools (Turnitin, Viper, PlagScan) retrieve text or documents on the basis of text syntax. A system is therefore required that retrieves text according to its semantics, regardless of its syntax. In this study we propose a system that retrieves text from a collection of texts (the entire corpus) on the basis of semantic similarity among texts or opinions. The system uses Latent Semantic Indexing to find semantic similarity in the Recognizing Textual Entailment (RTE) dataset. The features of the RTE text dataset are term frequencies obtained from the term-document matrix. An implementation of the proposed system produced convincing results, retrieving semantically similar documents from the RTE-3 dataset. We supplied multiple queries, and the system retrieved the documents similar to each query. In addition, we analysed the input parameters that affect the output; their influence can be seen in the variation of the evaluation metrics, Precision and Recall.
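The pipeline described in the abstract (term frequencies in a term-document matrix, Latent Semantic Indexing, query-based retrieval, Precision and Recall) can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the function names, the number of latent dimensions `k`, the cosine-similarity `threshold`, and the toy corpus are all assumptions made for illustration, and the abstract does not state which input parameters the study actually varied.

```python
import numpy as np

def build_term_document_matrix(docs):
    """Rows are terms, columns are documents, entries are raw term frequencies."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1.0
    return A, index

def lsi_fit(A, k):
    """Truncated SVD: A is approximated by U_k diag(s_k) V_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    rank = int(np.sum(s > 1e-12))
    k = max(1, min(k, rank))          # never keep zero singular values
    return U[:, :k], s[:k], Vt[:k, :].T   # rows of V_k are document vectors

def fold_in(query, index, U_k, s_k):
    """Project a query into the latent space: q^T U_k diag(s_k)^-1."""
    q = np.zeros(U_k.shape[0])
    for w in query.lower().split():
        if w in index:                # words outside the vocabulary are ignored
            q[index[w]] += 1.0
    return (q @ U_k) / s_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def retrieve(query, docs, k=2, threshold=0.5):
    """Return indices of documents whose latent-space cosine similarity
    to the query is at least `threshold`, most similar first."""
    A, index = build_term_document_matrix(docs)
    U_k, s_k, V_k = lsi_fit(A, k)
    q_hat = fold_in(query, index, U_k, s_k)
    scores = np.array([cosine(q_hat, v) for v in V_k])
    order = np.argsort(-scores)
    return [int(j) for j in order if scores[j] >= threshold]

def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, Recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

if __name__ == "__main__":
    # Toy corpus (hypothetical, not taken from RTE-3).
    corpus = [
        "the cat sat on the mat",
        "a cat rested on a rug",
        "stock markets fell sharply today",
        "shares dropped as markets fell",
    ]
    found = retrieve("cat on a mat", corpus, k=2, threshold=0.5)
    print(found, precision_recall(found, relevant=[0, 1]))
```

In a sketch like this, `k` and `threshold` are the natural knobs to vary: a smaller `k` merges more documents into the same latent topic, which tends to raise Recall at the cost of Precision, while a higher `threshold` does the opposite.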
