The goal of GALATEAS is to offer providers of digital information an innovative way to better understand the behavior of their users by analyzing the textual information found in transaction logs. This information allows GALATEAS users to improve navigation as well as multilingual access to content on their web sites.
Objectives of GALATEAS
Analyze query logs: Analyze logs containing the queries submitted to a content provider's search engine in order to produce reports on the users who access that particular aggregation. The analysis is based on linguistic and statistical information.
Translate queries: Translate queries coming from an external search engine into several target languages. The engine uses these translations to return results in languages other than that of the initial query. The languages chosen for GALATEAS are: Italian, English, German, Dutch, Modern Arabic and Polish.
Innovation in GALATEAS
The main goal of GALATEAS is to assemble a set of innovative technologies in order to derive a simple and cost-effective solution to the challenges raised by multilingual query log analysis and query translation.
To achieve this, GALATEAS proposes the development of a system based on the following three building blocks:
The log analysis subsystem: this implements the LangLog service by providing language based log analysis;
The MT training subsystem: this performs machine translation training based on received query logs;
The query translation subsystem: this implements the QueryTrans service and translates queries into several languages by using the appropriately trained MT system.
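The data flow between the three building blocks can be sketched as follows. This is only an illustrative toy, not the GALATEAS implementation: the function names, the `QueryLogEntry` type and the glossary-based "training" are all assumptions made for this example, and a word-for-word lookup table stands in for a real statistical MT system.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryLogEntry:
    """One logged search query (hypothetical record layout)."""
    query: str
    language: str  # e.g. "en", "it"


def analyze_logs(entries):
    """Stand-in for the log analysis subsystem: count queries per language."""
    return Counter(entry.language for entry in entries)


def train_translation_model(entries, glossary):
    """Stand-in for the MT training subsystem.

    Keeps only the glossary entries whose source word actually occurs in
    the logged queries, so the 'model' is adapted to the query logs.
    """
    seen = {word for entry in entries for word in entry.query.lower().split()}
    return {
        pair: {word: target for word, target in table.items() if word in seen}
        for pair, table in glossary.items()
    }


def translate_query(query, source, target, model):
    """Stand-in for the query translation subsystem: word-by-word lookup.

    Words missing from the model are passed through unchanged.
    """
    table = model.get((source, target), {})
    return " ".join(table.get(word, word) for word in query.lower().split())
```

In this sketch the same log entries feed both the analysis and the training step, and the translation step only consumes the trained model, which mirrors the dependency between the three subsystems described above.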
Innovation in Query Log Analysis
Contrary to mainstream offers in the field, GALATEAS services will not consider the standard structured information of web logs (e.g. click rate, visited pages, users' paths inside the document tree) but the information contained in queries from the point of view of language interpretation.
Making sense of short queries and translating them into conceptual units will enable administrators and managers to answer questions such as: "What are the topics that are most commonly searched in my collection, in a given language?"; "How do these topics relate to my catalogue?"; "Which topics (people, places) are most popular among my users?".
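As a rough illustration of the first question, the sketch below counts the most frequent topics per query language. It assumes a hypothetical keyword-to-topic lexicon (`TOPIC_LEXICON`) and simple whitespace tokenization; the actual GALATEAS analysis relies on linguistic and statistical processing that is not reproduced here.

```python
from collections import Counter, defaultdict

# Hypothetical lexicon mapping query keywords to conceptual units.
# In a real system this mapping would come from linguistic analysis.
TOPIC_LEXICON = {
    "monet": "person:Claude Monet",
    "venice": "place:Venice",
    "venezia": "place:Venice",
    "venedig": "place:Venice",
}


def top_topics_by_language(queries, lexicon, n=3):
    """Return the n most frequent topics for each query language.

    `queries` is an iterable of (language, query_text) pairs.
    Each topic is counted at most once per query.
    """
    counts = defaultdict(Counter)
    for language, text in queries:
        topics = {lexicon[word] for word in text.lower().split() if word in lexicon}
        counts[language].update(topics)
    return {language: counter.most_common(n) for language, counter in counts.items()}
```

Because several surface forms ("venice", "venezia", "venedig") map to the same conceptual unit, the same topic can be compared across languages, which is the kind of report a collection manager would need.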
Innovation in Query Translation
From a machine translation point of view, GALATEAS will investigate statistical machine translation technologies with the objective of providing meaningful results for short, decontextualized texts with little sentence structure, as is the case of search engine queries.
Tight integration between the GALATEAS infrastructure and digital systems is achieved by combining symbolic and statistical natural language processing technologies with information extraction systems provided in the form of web services.
Indirect "users" of GALATEAS services are information seekers, who would benefit from improved, multilingual search services. However, GALATEAS services are not provided directly to these users but to administrators and managers of digital content federations and search engines. Thus, GALATEAS targets a high-end B2B market where customers are mostly organizations managing medium to large content collections.
The project will serve the following needs:
- Need for group managers to understand what users are looking for, regardless of the content accessed.
- Need for content providers to understand how a collection should be extended.
- Need for library administrators to understand which categories in the catalogues match users' interests more or less closely.
- Need for library managers to understand user behavior.
- Need for all of the above roles to provide multilingual information without changing the way in which documents are indexed and managed.
VISEO's role in GALATEAS
At the heart of GALATEAS, VISEO is principally involved in the analysis of transaction logs, information retrieval and integration activities. VISEO is also strongly involved in business intelligence, commercial activities and the exploitation of product tools.
The eight partners of the GALATEAS project come from five countries: France, Germany, the Netherlands, Italy and the United Kingdom.
- Xerox Research Centre Europe, France (Coordinator)
- CELI - Language and Information Technology, Italy
- VISEO, France
- Bridgeman Art Library - Digital Images of Art, UK
- Gonetwork srl, Italy
- University of Amsterdam, Netherlands
- Berlin School of Library and Information Science Humboldt-Universität zu Berlin, Germany
For more information, see http://www.galateas.eu/fra/index-fr.html
Before joining the VISEO group, Frédérique SEGOND was Principal Scientist and manager of the Parsing and Semantics area (ParSem) at the Xerox Research Centre Europe. She joined Xerox as a research scientist in 1993 and worked on LOCOLEX, an intelligent dictionary lookup system (European project COMPASS). Afterwards, within the PARGRAM (Parallel Grammars) project, she was responsible for the design and implementation of a French Lexical Functional Grammar (LFG). She then led the Lexical Sense Disambiguation (LSD) project, where she worked on the EAGLES, ROMANSEVAL and EuroWordNet projects.
Throughout her research career she has developed, worked on and coordinated about twenty collaborative research projects such as ALADIN, Europeana, CACAO and GALATEAS.
Frédérique earned a PhD in Applied Mathematics from the École des Hautes Études en Sciences Sociales in Paris, during which she implemented a French categorial grammar at IBM-France. After a one-year post-doc at IBM-Watson in Yorktown, working on the links between syntax and semantics, she was put in charge of starting a research and teaching activity at a French telecommunications school.
Frédérique is co-author of six books, over 50 scientific papers and 5 patents. She belongs to several scientific committees. She is a member of the CONTINT steering committee of the French National Agency for Research (ANR), President of the Association for Computational Linguistics (ATALA), a member of the board of the European Language Resources Association (ELRA) and a member of the board of Stendhal University. She also provides scientific expertise to the European Commission and is on the “qualification list” of Professors of the University of Paris 7.