Summaries of text (summary of info received)!

From: Marcos E Casa
Subject: Summaries of text (summary of info received)!
Date: Tue, 27 Feb 1996 00:00:00 +0000
Message-ID: <4guqeg$63p@percy.cs.bham.ac.uk>

Hello again,

Recently I posted a request for information on "automatic summarization"
of natural language text. These are the contributions I've received.

Thank you all!

-----------------------
Marti Hearst gave a talk at CMU about this topic.  Try a Lycos search on
this topic and her name. --Greg Aist
·····@andrew.cmu.edu

-----------------------
British Telecom have a text summariser, but give no technical details.
You can find a demo-able version at:
    http://www.labs.bt.com/innovate/informat/netsumm/index.htm
    -- Tim
····@festival.ed.ac.uk

-----------------------
Our group has been working on automatic text summarization for the last
several years. Here are a few papers:

Automatic Text Decomposition Using Text Segments and Text
Themes. Gerard Salton, Amit Singhal, Chris Buckley, and Mandar Mitra,
Hypertext '96 (to appear). Also Technical Report TR95-1555, Department
of Computer Science, Cornell University. (Available from
http://www.cs.cornell.edu)

Automatic Text Decomposition and Structuring. Gerard Salton, James
Allan, and Amit Singhal, Information Processing and Management, 32(2),
127-138, 1996.

Automatic Analysis, Theme Generation, and Summarization of Machine
Readable Texts. Gerard Salton, James Allan, Chris Buckley, and Amit
Singhal, Science 264 (3 June, 1994), 1421-1426.

Best Wishes,
- Amit
·······@CS.Cornell.EDU

-----------------------
See http://www.vyp.com/miti for a text extraction system that can
function as a summarization system.

Ken Ewell
·······@imsworld.com (Ken Ewell)

-----------------------
See
   http://www.labs.bt.com/netsumm/index.htm

for an example of a summarizer.  There's an article in there, too.

There has been some work out of Cornell, too, from Salton's lab (from
when I was there).  Salton died this past summer, so not a lot of new
work is coming out of there, but these papers give a flavor of the
approaches that were being taken.

    G. Salton and J. Allan.  "Automatic Text Decomposition and
    Structuring".  RIAO '94.

    G. Salton and J. Allan.  "Selective Text Utilization and Text
    Traversal".  Proceedings of the Fifth Annual ACM Conference on
    Hypertext, November 1993, pp. 131-144.  Also Cornell Computer Science
    Technical Report 93-1366.

    G. Salton and J. Allan.  "Selective Text Utilization and Text
    Traversal".  International Journal of Human-Computer
    Studies, v. 43, pp. 483-497, 1995.

    G. Salton, C. Buckley, and J. Allan.  "Automatic Structuring and
    Retrieval of Large Text Files".  Communications of
    the ACM, February 1994.  Also Cornell Computer Science Technical
    Report 92-1286.

Good luck.
James Allan <·····@cs.umass.edu>

-----------------------
Here are some references to get you going, in pseudo-BibTeX
format.

A good survey paper is:

  author="C. D. Paice",
  title="Constructing Literature Abstracts by Computer:
         Techniques and Prospects",
  journal="Information Processing and Management",
  year="1990",
  volume="26",
  number="1",
  pages="171--186"

This will suggest many other references. For a more theoretical
discussion, see:

  author="Sparck Jones, K.",
  title="What might be in a Summary?",
  booktitle="Proceedings of the German Information Retrieval
             Conference",
  year="1993"

Particular summarising methods:

word-frequency methods:

  author="H. P. Luhn",
  title="The Automatic Creation of Literature Abstracts",
  editor="Schultz",
  booktitle="H. P. Luhn: Pioneer of Information Science",
  publisher="Spartan",
  year="1968"

  author="S. Williams and K. Preston",
  title="Managing the Information Overload",
  journal="Physics in Business",
  publisher="Institute of Physics",
  year="1994"

  (this system -- BT's NetSumm -- can be tried at
     http://www.labs.bt.com/innovate/informat/netsumm/index.htm)

  author="E. F. Skorochod'ko",
  title="Adaptive Method of Automatic Abstracting and Indexing",
  booktitle="Information Processing 71",
  year="1971",
  pages="1179-1182"

  @techreport
  author="M. Benbrahim and K. Ahmad",
  title="Computer-aided Lexical Cohesion Analysis and Text Abridgement",
  series="Knowledge Processing",
  number="18",
  institution="University of Surrey",
  year="1994"


c(l)ue phrase methods:

  author="J. E. Rush and R. Salvador and A. Zamora",
  title="Automatic Abstracting and Indexing. {II}. {P}roduction of Indicative
         Abstracts by Application of Contextual Inference and Syntactic
         Coherence Criteria",
  journal="Journal of the American Society for Information Science",
  year="1971",
  month="July",
  pages="260--274"

  author="C. D. Paice",
  title="The Automatic Generation of Literature Abstracts: an Approach
         based on the Identification of Self-indicating Phrases",
  booktitle="Information Retrieval Research",
  editor="R. N. Oddy and S. E. Robertson and C. J. van Rijsbergen
          and P. W. Williams",
  year="1981",
  publisher="Butterworths",
  pages="172--191"

methods which use domain-knowledge:

  author="G. F. DeJong",
  title="An overview of the FRUMP system",
  editor="Lehnert and Ringle",
  booktitle="Strategies for Natural Language Processing",
  publisher="Erlbaum",
  address="Hillsdale, NJ",
  year="1982"

  @techreport
  author="J. I. Tait",
  title="Automatic Summarizing of {E}nglish Texts",
  number="47",
  note="PhD thesis",
  institution="University of Cambridge Computer Laboratory",
  year="1983"

As far as I am aware, there are very few comparative studies.
Here is one:

  @techreport
  author="P. Gladwin and S. Pulman and Sparck Jones, K.",
  title="Shallow Processing and Automatic Summarising: a First Study",
  number="223",
  institution="University of Cambridge Computer Laboratory",
  year="1991"

You may also be interested in theories of how people summarise
text as they read it, in which case take a look at:

  author="T. A. van Dijk and W. Kintsch",
  title="Strategies of Discourse Comprehension",
  publisher="Academic Press",
  address="New York",
  year="1983"

  author="W. Kintsch and T. A. van Dijk",
  title="Toward a Model of Text Comprehension and Production",
  journal="Psychological Review",
  year="1978",
  volume="85",
  number="5",
  pages="363--394"

For information about how professional abstracters work, there is
lots of good work by Liddy, for example:

  author="E. D. Liddy",
  title="The Discourse-level Structure of Empirical Abstracts: an
         Exploratory Study",
  journal="Information Processing and Management",
  year="1991",
  volume="27",
  number="1",
  pages="55--81"

  author="E. D. Liddy and S. Bonzi and J. Katzer and E. Oddy",
  title="A Study of Discourse Anaphora in Scientific Abstracts",
  journal="Journal of the American Society for Information Science",
  year="1987",
  volume="38",
  number="4",
  pages="255--261"

Richard/
Richard Tucker <··············@cl.cam.ac.uk>