Introduction

As the amount of electronically available data increases, the need to create human-readable presentations of that data for various purposes will increase as well. One way to meet this need is Natural Language Generation (NLG), which is the activity of generating text from some kind of source.

Within the MedView project, NLG has been used for several years to generate literally thousands of documents from formalised electronic medical data. Documents are created from templates that the end-users can create and modify themselves without having any knowledge of linguistics or programming. We call the text generation system developed for this purpose mGen.

mGen is a Java-based framework for generating documents from data. Since the most important goal during development has been to build a system that is easy for users to understand, a rather basic approach to NLG was chosen. mGen is close to a simple mail-merge system and can be classified as a slot-and-filler system, or as a canned-text system with knowledge base references.

There are two motivations for open-sourcing mGen: first, to let others use and adapt the framework, and second, to make it easier for others to use it as a basis for developing more advanced language generation systems. Since mGen is implemented in Java, it runs on any platform supporting Java 1.4.1 or later.

To create a document using mGen, three things are needed: (i) a value container providing the data on which the document should be based, (ii) a template, which can be seen as a document with slots to be filled in, and (iii) a translator that maps data into words or phrases to insert into the template. The template and translator can be created programmatically or with a special editor. mGen also provides methods to save and load templates and translators as XML files.
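
The interplay of value container, template and translator can be illustrated with a minimal, self-contained sketch. It does not use the mGen API: the class name SlotFillerSketch, the ${...} slot syntax, the "term=value" translator keys and the sample data are all assumptions made for illustration, and the code is written for a current Java release rather than Java 1.4.1.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of slot-and-filler generation; NOT the mGen API.
final class SlotFillerSketch {

    // Assumed slot syntax for this sketch: ${term}. mGen's real syntax may differ.
    private static final Pattern SLOT = Pattern.compile("\\$\\{(\\w+)\\}");

    /**
     * Fills every slot in the template.
     *
     * @param template   text containing ${term} slots
     * @param values     the "value container": term -> raw data value
     * @param translator maps "term=value" to the phrase to insert
     */
    static String generate(String template,
                           Map<String, String> values,
                           Map<String, String> translator) {
        Matcher m = SLOT.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String term = m.group(1);
            String raw = values.get(term);
            String key = term + "=" + raw;
            String phrase;
            if (translator.containsKey(key)) {
                phrase = translator.get(key);   // translated phrase
            } else if (raw != null) {
                phrase = raw;                   // no translation: insert raw value
            } else {
                phrase = "";                    // no data: leave the slot empty
            }
            // quoteReplacement keeps '$' and '\' in the phrase literal.
            m.appendReplacement(out, Matcher.quoteReplacement(phrase));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // (i) value container with made-up sample data
        Map<String, String> values = new HashMap<>();
        values.put("age", "54");
        values.put("smoker", "yes");

        // (iii) translator from data values to phrases
        Map<String, String> translator = new HashMap<>();
        translator.put("smoker=yes", "smokes");
        translator.put("smoker=no", "does not smoke");

        // (ii) template with slots
        String template = "The patient is ${age} years old and ${smoker}.";

        System.out.println(generate(template, values, translator));
    }
}
```

In this sketch a slot without a matching translation falls back to the raw value, and a slot without data is left empty. Whether mGen behaves the same way in these cases is not stated here, so treat both choices as assumptions.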
