Text Analytics, Applied. In his keynote presentation on Text Analytics Applied, Seth Grimes
introduced approaches to and usage scenarios for text analytics. Masses of unstructured content in
many different forms (Web pages, emails, multimedia content, social media, etc.) contain named
entities, facts, relationships, sentiment information, etc. Text analytics turns these into structured
information for various usage scenarios. The presentation and the related report on text analytics
provide information on application areas, the types of information actually being analysed in industry,
and current language coverage. Important points for LIDER roadmapping were:
- Text analytics generates the structured information that bridges search, business intelligence
  and applications.
- Key real-life application areas are: business intelligence, life sciences, media & publishing,
  voice of the customer, marketing, and public administration and policy making.
- Users of text analytics tools need: adaptability to their content domain, customization (e.g.
  import of taxonomies), and flexible input & output formats and processing mechanisms (API,
  offline, ...).
- Sentiment resolution is an important functionality for many usage scenarios.
- When deciding on solutions, users take capacity (volume, performance, latency) and cost into
  account.
- Multilingual text analytics is an area that has so far seen little activity in industry.
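The keynote's core claim, that text analytics turns unstructured content into structured information such as named entities and sentiment, can be illustrated with a minimal sketch. The lexicons, the regex-based entity heuristic and the function name below are toy assumptions for illustration only, not a real text-analytics pipeline:

```python
import re

# Toy lexicons; a real system would use trained models or large resources.
POSITIVE = {"great", "excellent", "love"}
NEGATIVE = {"poor", "terrible", "hate"}

def analyze(text: str) -> dict:
    """Turn free text into a small structured record (illustrative only)."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    # Runs of capitalized words serve as a crude stand-in for named entities.
    entities = [e.strip() for e in re.findall(r"\b(?:[A-Z][a-z]+\s?)+\b", text)]
    return {
        "sentiment": "positive" if score > 0 else "negative" if score < 0 else "neutral",
        "entities": entities,
    }
```

Records like this are what would then feed search indexes or business-intelligence dashboards, as the keynote describes.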
Using Wikipedia for multilingual web content analytics. In this panel session, Pau Giner, Amir
Aharoni and Alolita Sharma provided details about the Wikipedia translation infrastructure. So far,
users only have indicators, but no explicit provenance information about which articles are direct
translations from other languages. There is also no strict translation workflow. This is due to the
nature of the Wikipedia community, which includes Wikipedia editors, translators and new users who
create content via translation. Significant points of the session are summarized below.
- For its content translation tool, Wikipedia is looking into automatic translation tools that allow
  the user to translate per paragraph and revise the result.
- Handling feedback from users for the tool development is a challenge, since about 200
  communities have to be taken into account.
- Wikipedia-based machine translation tooling could help to quickly create domain-specific MT.
- The multilingual infrastructure of Wikipedia could be the basis for new research topics, such as
  comparing content across cultures and, in this way, cultures themselves.
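The per-paragraph translate-and-revise workflow and the missing provenance information discussed in this session can be sketched as a small data model. The class and field names below are hypothetical illustrations, not Wikipedia's actual schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ParagraphTranslation:
    """One paragraph: MT draft plus an optional human revision."""
    source_text: str
    mt_draft: str                 # output of an MT system (stubbed here)
    revision: Optional[str] = None

    @property
    def final_text(self) -> str:
        # The revision, if any, supersedes the machine draft.
        return self.revision if self.revision is not None else self.mt_draft

@dataclass
class ArticleTranslation:
    """Explicit provenance: which article, from and into which language."""
    source_article: str
    source_lang: str
    target_lang: str
    paragraphs: list = field(default_factory=list)

    def revised_ratio(self) -> float:
        """Fraction of paragraphs a human has already revised."""
        if not self.paragraphs:
            return 0.0
        return sum(p.revision is not None for p in self.paragraphs) / len(self.paragraphs)
```

Recording provenance explicitly in this way would replace the mere indicators the panel mentioned, and would make the cross-language comparisons proposed as research topics tractable.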
Growing Wikipedia editing with intelligent multi-language suggestion lists for article
translation as well as other techniques and tools. This session started with a panel featuring
Runa Bhattacharjee, Pau Giner and Santhosh Thottingal. It was highly interactive, which is
reflected in the summary of key points below.
- Information on translation quality could help translators in Wikipedia. Such information is being
  defined in the QTLaunchPad project; see the presentation by Arle Lommel at the MultilingualWeb
  workshop.
- Various types of background information can help translators in several ways: providing
  translation suggestions, disambiguating terms, autocompletion, etc.
- Data models for structured information in Wikipedia, e.g. Wikidata, do not rely on the linked