Algorithm can sum up texts in any language

Ben-Gurion University researchers say new software works automatically, in a variety of languages, for quick processing by search engines and readers

Illustrative image of a robot reading a book in library (PhonlamaiPhoto; iStock by Getty Images)

Researchers at the Ben-Gurion University of the Negev said they have developed software that can automatically summarize texts, in a variety of languages, to help readers go through articles, magazines, databases and academic research faster and more efficiently.

A huge increase in online textual data, combined with the fact that people are always short of time, has created the need for an automated way to extract key points from texts such as articles or interviews for further processing.

Most solutions available today are language-dependent and require training the algorithms on large volumes of text, according to the university's statement.

The new software, developed by Prof. Mark Last, Dr. Marina Litvak, and Dr. Menahem Friedman of the Department of Software and Information Systems Engineering at Ben-Gurion University, produces language-independent summaries of texts using an optimization algorithm modeled on the process of natural selection, a so-called “genetic algorithm.”

The software ranks the sentences of a source text by a relevance score that does not depend on the language, and extracts the top-ranking sentences to form a summary, said Prof. Last in the statement issued by BGN Technologies, the technology transfer arm of the university.
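The statement does not describe how MUSE actually scores sentences. As a rough illustration only, the Python sketch below assumes that each sentence's relevance score is a weighted sum of a few language-independent features (relative position, relative length, and how many of the document's frequent words it contains), and that the top-ranking sentences are then kept in their original order. The feature set, the function names and the naive regex sentence splitter are hypothetical choices, not the researchers' code.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    # Lower-cased word tokens; in Python 3, \w matches letters of any script.
    return re.findall(r"\w+", sentence.lower())


def sentence_features(sentences):
    """Hypothetical language-independent features, each scaled to [0, 1]:
    relative position, relative length, and coverage of frequent words."""
    tokens = [tokenize(s) for s in sentences]
    freq = Counter(w for toks in tokens for w in toks)
    n = len(sentences)
    max_len = max((len(t) for t in tokens), default=0) or 1
    coverage = [sum(freq[w] for w in set(t)) for t in tokens]
    max_cov = max(coverage, default=0) or 1
    return [
        (1 - i / n, len(t) / max_len, c / max_cov)
        for i, (t, c) in enumerate(zip(tokens, coverage))
    ]


def summarize(text, weights, k=2):
    """Rank sentences by a weighted feature sum and return the top k
    in their original order."""
    sentences = split_sentences(text)
    feats = sentence_features(sentences)
    scores = [sum(w * f for w, f in zip(weights, fv)) for fv in feats]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


text = ("Search engines index billions of pages. Readers skim. "
        "Summaries help search engines and readers alike.")
print(summarize(text, weights=(1.0, 1.0, 1.0), k=2))
# ['Search engines index billions of pages.', 'Summaries help search engines and readers alike.']
```

Because these features count words and positions rather than relying on a dictionary or grammar, the same code runs unchanged on, say, Hebrew or Russian text; languages written without spaces between words, such as Chinese, would need a different tokenizer.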

The ability to quickly summarize large quantities of text in a language-independent manner “is crucial” for search engines as well as other end-users, such as researchers, libraries and the media, he said.

The method, called MUSE – Multilingual Sentence Extractor, was tested on nine languages: English, Hebrew, Arabic, Persian, Russian, Chinese, German, French, and Spanish. The results showed a high level of similarity to human-generated summaries, the statement said.
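The statement does not say how similarity to the human-generated summaries was measured. A widely used family of measures for this kind of comparison is ROUGE; the sketch below computes a simplified ROUGE-1 recall (the share of a reference summary's words that also appear in the machine summary, with repeated words clipped), averaged over several human references. It is an assumption for illustration, not the evaluation the researchers reported, and the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Lower-cased word tokens in any script.
    return re.findall(r"\w+", text.lower())


def rouge1_recall(candidate, references):
    """Simplified ROUGE-1 recall: clipped unigram overlap divided by the
    reference length, averaged over all non-empty references."""
    cand = Counter(tokenize(candidate))
    scores = []
    for ref in references:
        ref_counts = Counter(tokenize(ref))
        total = sum(ref_counts.values())
        if total == 0:
            continue  # skip empty references
        # Each reference word counts at most as often as it occurs in the candidate.
        overlap = sum(min(c, cand[w]) for w, c in ref_counts.items())
        scores.append(overlap / total)
    return sum(scores) / len(scores) if scores else 0.0


print(rouge1_recall("the cat sat", ["the cat sat on the mat", "a dog ran"]))  # 0.25
```

Averaging over several references reflects the setup the article describes, where each test document has more than one human summary and people rarely agree on a single "correct" one.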

The scientists trained the algorithms on a set of documents, each of which came with several human-generated summaries. After training, the researchers found that the software did not need to be retrained on summarized documents for each new language: the same sentence-ranking model they developed could be used across several languages.
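One way to picture this training step is as a search for the weights of a sentence-ranking model: a genetic algorithm evolves a population of candidate weight vectors, keeps the ones whose summaries best match the human summaries, and recombines and mutates them. The self-contained sketch below assumes that setup, with two hypothetical features (position and length), a simplified unigram-recall fitness, truncation selection, uniform crossover and Gaussian mutation, all run on a toy corpus. The actual MUSE features, fitness function and genetic-algorithm settings are not given in the statement.

```python
import random
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def features(sentences):
    # Hypothetical language-independent features: relative position and length.
    lengths = [len(tokenize(s)) for s in sentences]
    max_len = max(lengths, default=0) or 1
    n = len(sentences)
    return [(1 - i / n, l / max_len) for i, l in enumerate(lengths)]


def extract(sentences, weights, k):
    # Keep the k highest-scoring sentences, in original order.
    scores = [sum(w * f for w, f in zip(weights, fv)) for fv in features(sentences)]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))


def recall(candidate, reference):
    # Simplified unigram recall against one human summary.
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    total = sum(ref.values())
    return sum(min(c, cand[w]) for w, c in ref.items()) / total if total else 0.0


def fitness(weights, corpus, k):
    # Mean recall against every human summary of every training document.
    scores = [recall(extract(sents, weights, k), ref)
              for sents, refs in corpus for ref in refs]
    return sum(scores) / len(scores)


def train_weights(corpus, n_features=2, k=1, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, corpus, k), reverse=True)
        parents = pop[: pop_size // 2]  # selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(genes) for genes in zip(a, b)]  # uniform crossover
            i = rng.randrange(n_features)
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, corpus, k))


# Toy training data (hypothetical): each document with its human summaries.
corpus = [
    (["Apples are red.", "Bananas are long yellow fruits grown in warm places."],
     ["Apples are red."]),
    (["Cats purr.", "Dogs bark at strangers walking past the house."],
     ["Cats purr."]),
]
best = train_weights(corpus)
print(best, fitness(best, corpus, k=1))
```

Because the learned weights multiply features that ignore vocabulary and grammar, a weight vector trained on one language can be applied as-is to documents in another, which mirrors the cross-language reuse the researchers describe. Keeping the fitter half unchanged each generation also guarantees that the best fitness found never decreases.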

Zafrir Levy, senior vice president of business development at BGN Technologies, said the tool would be “a valuable addition to our ability to benefit from the vast amounts of text available online.”

BGN Technologies has filed a patent application to protect the technology, and is looking for potential partners for further development and commercialization.
