Hello again,
Recently I posted a request for information on "automatic summarization"
of natural language text. These are the contributions I've received.
Thank you all!
-----------------------
Marti Hearst gave a talk at CMU about this topic. Try a Lycos search on
this topic and her name. --Greg Aist
·····@andrew.cmu.edu
-----------------------
British Telecom have a text summariser, but give no technical details.
You can find a demo-able version at:
http://www.labs.bt.com/innovate/informat/netsumm/index.htm
-- Tim
····@festival.ed.ac.uk
-----------------------
Our group has been working on automatic text summarization for the last
several years. Here are a few papers:
Automatic Text Decomposition Using Text Segments and Text
Themes. Gerard Salton, Amit Singhal, Chris Buckley, and Mandar Mitra,
Hypertext '96 (to appear). Also Technical Report TR95-1555, Department
of Computer Science, Cornell University. (Available from
http://www.cs.cornell.edu)
Automatic Text Decomposition and Structuring. Gerard Salton, James
Allan, and Amit Singhal, Information Processing and Management, 32(2),
127-138, 1996.
Automatic Analysis, Theme Generation, and Summarization of Machine
Readable Texts. Gerard Salton, James Allan, Chris Buckley, and Amit
Singhal, Science 264 (3 June, 1994), 1421-1426.
Best Wishes,
- Amit
·······@CS.Cornell.EDU
-----------------------
See http://www.vyp.com/miti for a text extraction system that can
function as a summarization system.
Ken Ewell
·······@imsworld.com (Ken Ewell)
-----------------------
See
http://www.labs.bt.com/netsumm/index.htm
for an example of a summarizer. There's an article in there, too.
There has been some work out of Cornell, too, from Salton's lab (from
when I was there). Salton died this past summer, so not a lot of new
work is coming out of there, but you can get a flavor of the
approaches that were being taken from these.
G. Salton and J. Allan. "Automatic Text Decomposition and
Structuring". RIAO '94.
G. Salton and J. Allan. "Selective Text Utilization and Text
Traversal". Proceedings of the Fifth Annual ACM Conference on
Hypertext, November 1993, pp. 131-144. Also Cornell Computer Science
Technical Report 93-1366.
G. Salton and J. Allan. "Selective Text Utilization and Text
Traversal". International Journal of Human-Computer
Studies, v. 43, pp. 483-497, 1995.
G. Salton, C. Buckley, and J. Allan. "Automatic Structuring and
Retrieval of Large Text Files". Communications of
the ACM, February, 1994. Also Cornell Computer Science Technical
Report 92-1286.
Good luck.
James Allan <·····@cs.umass.edu>
-----------------------
Here are some references to get you going, in pseudo-BibTeX
format.
A good survey paper is:
author="C. D. Paice",
title="Constructing Literature Abstracts by Computer:
Techniques and Prospects",
journal="Information Processing and Management",
year="1990",
volume="26",
number="1",
pages="171--186"
This will suggest many other references. For a more theoretical
discussion, see:
author="Sparck Jones, K.",
title="What might be in a Summary?",
booktitle="Proceedings of the German Information Retrieval
Conference",
year="1993"
Particular summarising methods:
word-frequency methods:
author="H. P. Luhn",
title="The Automatic Creation of Literature Abstracts",
editor="Schultz",
booktitle="H. P. Luhn: Pioneer of Information Science",
publisher="Spartan",
year="1968"
author="S. Williams and K. Preston",
title="Managing the Information Overload",
journal="Physics in Business",
publisher="Institute of Physics",
year="1994"
(this system -- BT's NetSumm -- can be tried at
http://www.labs.bt.com/innovate/informat/netsumm/index.htm)
author="E. F. Skorochod'ko",
title="Adaptive Method of Automatic Abstracting and Indexing",
booktitle="Information Processing 71",
year="1971",
pages="1179--1182"
@techreport
author="M. Benbrahim and K. Ahmad",
title="Computer-aided Lexical Cohesion Analysis and Text Abridgement",
series="Knowledge Processing",
number="18",
institution="University of Surrey",
year="1994"
c(l)ue phrase methods:
author="J. E. Rush and R. Salvador and A. Zamora",
title="Automatic Abstracting and Indexing. {II}. {P}roduction of Indicative
Abstracts by Application of Contextual Inference and Syntactic
Coherence Criteria",
journal="Journal of the American Society for Information Science",
year="1971",
month="July",
pages="260--274"
author="C. D. Paice",
title="The Automatic Generation of Literature Abstracts: an Approach
based on the Identification of Self-indicating Phrases",
booktitle="Information Retrieval Research",
editor="R. N. Oddy and S. E. Robertson and C. J. van Rijsbergen
and P. W. Williams",
year="1981",
publisher="Butterworths",
pages="172--191"
methods which use domain-knowledge:
author="G. F. DeJong",
title="An overview of the FRUMP system",
editor="Lehnert and Ringle",
booktitle="Strategies for Natural Language Processing",
publisher="Erlbaum",
address="Hillsdale, NJ",
year="1982"
@techreport
author="J. I. Tait",
title="Automatic Summarizing of {E}nglish Texts",
number="47",
note="PhD thesis",
institution="University of Cambridge Computer Laboratory",
year="1983"
As far as I am aware, there are very few comparative studies.
Here is one:
@techreport
author="P. Gladwin and S. Pulman and Sparck Jones, K.",
title="Shallow Processing and Automatic Summarising: a First Study",
number="223",
institution="University of Cambridge Computer Laboratory",
year="1991"
You may also be interested in theories of how people summarise
text as they read it, in which case take a look at:
author="T. A. van Dijk and W. Kintsch",
title="Strategies of Discourse Comprehension",
publisher="Academic Press",
address="New York",
year="1983"
author="W. Kintsch and T. A. van Dijk",
title="Toward a Model of Text Comprehension and Production",
journal="Psychological Review",
year="1978",
volume="85",
number="5",
pages="363--394"
For information about how professional abstracters work, there is
lots of good work by Liddy, for example:
author="E. D. Liddy",
title="The Discourse-level Structure of Empirical Abstracts: an
Exploratory Study",
journal="Information Processing and Management",
year="1991",
volume="27",
number="1",
pages="55--81"
author="E. D. Liddy and S. Bonzi and J. Katzer and E. Oddy",
title="A Study of Discourse Anaphora in Scientific Abstracts",
journal="Journal of the American Society for Information Science",
year="1987",
volume="38",
number="4",
pages="255--261"
Richard/
Richard Tucker <··············@cl.cam.ac.uk>