
Discourse analysis based segregation of relevant document segments for knowledge acquisition

Madhusudanan, N and Chakrabarti, Amaresh and Gurumoorthy, B (2016) Discourse analysis based segregation of relevant document segments for knowledge acquisition. In: AI EDAM: Artificial Intelligence for Engineering Design, Analysis and Manufacturing, 30 (4, SI). pp. 446-465.

Official URL: http://dx.doi.org/10.1017/S0890060416000408


Documents are a useful source of expert knowledge in organizations. They can be used, at an early stage of a product's life cycle, to foresee issues that might occur in later stages, along with their solutions. In this research, the early and later stages are design and assembly, respectively. Even when these documents are available online, it is difficult for users to access the knowledge they contain. It is therefore desirable to extract that knowledge automatically and store it in a computer-accessible or manipulable form. This paper describes an approach to the first step in this acquisition process: automatically identifying segments of documents that are relevant to aircraft assembly, so that they can be further processed for acquiring expert knowledge. Identifying relevant segments avoids processing unrelated information, which is costly and may dilute domain relevance. The approach to extracting relevant segments has two steps. The first step identifies sentences that form a coherent segment of text, within which the topic does not shift. The second step classifies segments according to whether they fall within the topics of interest for knowledge acquisition, which in this instance is aircraft assembly. Together, these steps filter out unrelated segments, which need not be processed for subsequent knowledge acquisition. Both steps are implemented by understanding the contents of the documents. Methods of discourse analysis, in particular discourse representation theory, are used to obtain a list of discourse entities. The difference in discourse entities between sentences is used to distinguish between segments, and the list of discourse entities in a segment is compared against a domain ontology for classification. The implementation of these steps and the results of validation on sample texts are described.
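The two steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the paper derives discourse entities through discourse representation theory, whereas this sketch substitutes a much cruder stand-in (lowercased non-stopword tokens). The function names, the stopword list, the thresholds `min_overlap` and `min_hits`, and the tiny ontology term set are all hypothetical and chosen only for illustration.

```python
from typing import List, Set

# Hypothetical stopword list; a real system would use proper linguistic analysis.
STOPWORDS = {"the", "a", "an", "is", "are", "was", "were", "to", "of",
             "and", "in", "on", "with", "it", "for"}


def entities(sentence: str) -> Set[str]:
    """Crude stand-in for discourse entities: lowercased non-stopword tokens."""
    words = (w.strip(".,;:!?()").lower() for w in sentence.split())
    return {w for w in words if w and w not in STOPWORDS}


def segment(sentences: List[str], min_overlap: int = 1) -> List[List[str]]:
    """Step 1: group consecutive sentences into segments.

    A sentence joins the current segment if it shares at least
    `min_overlap` entities with that segment; otherwise a new one starts.
    """
    segments: List[List[str]] = []
    seen: Set[str] = set()
    for s in sentences:
        ents = entities(s)
        if segments and len(ents & seen) >= min_overlap:
            segments[-1].append(s)
            seen |= ents
        else:
            segments.append([s])
            seen = set(ents)
    return segments


def is_relevant(seg: List[str], ontology_terms: Set[str], min_hits: int = 2) -> bool:
    """Step 2: a segment is relevant if enough of its entities appear
    among the (assumed) domain ontology terms."""
    ents = set().union(*(entities(s) for s in seg))
    return len(ents & ontology_terms) >= min_hits


# Illustrative data only; these sentences and terms are made up.
SENTENCES = [
    "The rivet is inserted into the wing panel.",
    "The panel is then clamped to the fixture.",
    "Lunch was served at noon.",
]
ONTOLOGY = {"rivet", "panel", "fixture", "wing", "fuselage"}

SEGMENTS = segment(SENTENCES)
RELEVANT = [seg for seg in SEGMENTS if is_relevant(seg, ONTOLOGY)]
```

With this sample input, the first two sentences share the entity "panel" and fall into one segment, while the third starts a new segment that matches no ontology term and is filtered out. Entity overlap is used here as a simple proxy for "the topic does not shift"; the paper's own criterion, based on differences in discourse entities obtained from discourse representation structures, is richer than this.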

Item Type: Journal Article
Additional Information: Copyright for this article belongs to Cambridge University Press, 32 Avenue of the Americas, New York, NY 10013-2473, USA
Department/Centre: Division of Mechanical Sciences > Centre for Product Design & Manufacturing
Date Deposited: 30 Dec 2016 05:54
Last Modified: 30 Dec 2016 05:54
URI: http://eprints.iisc.ac.in/id/eprint/55536
