Project | Paderborn University

Overview

Voice assistants are a core technology for human-machine interaction and provide access to product offerings and services via natural language. So far, companies in the US and Asia have dominated the market for voice assistance technology. However, the demand for voice assistant solutions in German production and retail industries is enormous, especially with regard to data sovereignty, as there is a need for better protection and the secure exchange of personal data. A German-made voice assistant solution would make this possible by implementing European standards of data security. At the same time, a new level of quality in human-machine communication, going far beyond the semantic capabilities of current systems, would allow for much more user-friendly systems.

To this end, experts from the fields of speech signal processing, natural language understanding, artificial intelligence and software engineering have joined forces at Fraunhofer IIS and Fraunhofer IAIS. Fraunhofer IIS already holds a world-leading position in the field of acoustic signal processing technology, which forms the basis for the high reliability and robustness of speech processing. Fraunhofer IAIS has developed leading algorithms in the field of automatic speech recognition and question answering. The goal is to further expand this technological leadership and integrate it into a scalable, multilingual and open voice assistant platform. Fraunhofer technology can then be adapted to specific company requirements and support the data sovereignty required in the production and retail industry.

As part of the »Artificial intelligence as a driver for economically relevant ecosystems« innovation competition, Fraunhofer is working on a concept for SPEAKER, a large-scale research and development project supported by funding from the German Federal Ministry for Economic Affairs and Climate Action.

The SPEAKER project seeks to develop a leading German-made voice assistant platform for business-to-business (B2B) applications. This platform should be open, modular and scalable and provide technologies, services and data via service interfaces. The SPEAKER platform will be embedded in a comprehensive ecosystem made up of large industrial companies, SMEs, start-ups and research partners, who together ensure a high capacity for innovation. The Fraunhofer Institutes for Intelligent Analysis and Information Systems IAIS and for Integrated Circuits IIS will ensure the development of the platform and the ecosystem. Both already possess the relevant technologies and experience in the field of voice assistant technologies, platforms (e.g. AI4EU – European AI on-demand platform) and global marketing strategies for voice and audio technologies (e.g. MP3).

The two Fraunhofer Institutes IIS and IAIS have conducted workshops with numerous companies to establish requirements, determine obstacles and recommend actions that will serve as a basis for platform design and development. Key arguments for a German-made voice assistant platform include data protection, security, privacy and trust. The lack of these has become particularly evident from recently reported incidents of non-GDPR-compliant speech analysis by Google, Alexa and Siri. This applies all the more in the B2B environment, where internal company data needs to be protected. The SPEAKER platform therefore addresses the issues of data and technology sovereignty in this important emerging field of human-machine communication. Requirements were also identified with respect to domain-specific customizability, flexibility in the choice and use of modules, open interfaces to databases and applications, multilinguality, paralinguistics (e.g. recognizing emotions in voices), and participation in and development of a user community. In parallel with the survey of requirements, current market research predicting strong growth in the voice assistant market was evaluated. On average, a 25 percent annual increase in devices with voice assistant functions is expected in the next four years.
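As a rough illustration of what this forecast implies, the sketch below compounds a 25 percent annual increase over four years. Treating the figure as a constant compound rate is an assumption; the market research referred to above may define the average differently. The function name compound_growth is illustrative only.

```python
def compound_growth(annual_rate: float, years: int) -> float:
    """Return the cumulative growth factor after `years` of compounding.

    Assumption: the same rate applies every year and growth compounds.
    """
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    factor = compound_growth(0.25, 4)
    # 1.25 ** 4 = 2.44140625, i.e. roughly 2.4 times today's device count
    print(f"Growth factor after 4 years: {factor:.2f}")
```

Under this assumption, the number of devices with voice assistant functions would be roughly 2.4 times today's figure after four years.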

The aim of the SPEAKER platform is to provide open, transparent and secure voice assistant applications. To achieve this, leading technologies for audio preprocessing, speech recognition, natural-language understanding (NLU), question answering (QA), dialogue management and speech synthesis by means of artificial intelligence (AI) and machine learning must be made available for simple, uncomplicated use. These key modules will be used to develop industrial voice assistant applications that, in turn, can be made available to other market players via the platform in the form of ready-made skills.
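The modular structure described above can be pictured as a chain of interchangeable stages. The following is a minimal sketch under stated assumptions, not the SPEAKER implementation: all class, function and key names are hypothetical, and every stage is a stub (plain text stands in for audio) so that the example runs without any speech libraries.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Each stage reads the shared context and returns an updated copy.
Context = Dict[str, str]
Stage = Callable[[Context], Context]

# Hypothetical answer store standing in for a question answering backend.
ANSWERS = {"query_status": "Machine 3 is running."}


def preprocess_audio(ctx: Context) -> Context:
    # Stub: trimming whitespace stands in for noise reduction.
    return {**ctx, "clean_audio": ctx["audio"].strip()}


def recognize_speech(ctx: Context) -> Context:
    # Stub: the "audio" is already text, so only normalise its case.
    return {**ctx, "transcript": ctx["clean_audio"].lower()}


def understand(ctx: Context) -> Context:
    # Stub NLU: a keyword match decides the intent.
    intent = "query_status" if "status" in ctx["transcript"] else "unknown"
    return {**ctx, "intent": intent}


def answer_question(ctx: Context) -> Context:
    # Stub QA: look the intent up in the hypothetical answer store.
    return {**ctx, "answer": ANSWERS.get(ctx["intent"], "")}


def manage_dialogue(ctx: Context) -> Context:
    # Fall back to a clarification prompt when no answer was found.
    response = ctx["answer"] or "Sorry, I did not understand that."
    return {**ctx, "response": response}


def synthesize_speech(ctx: Context) -> Context:
    # Stub: wrap the text in a tag instead of producing audio.
    return {**ctx, "speech": f"<speech>{ctx['response']}</speech>"}


@dataclass
class Pipeline:
    stages: List[Stage] = field(default_factory=list)

    def add(self, stage: Stage) -> "Pipeline":
        self.stages.append(stage)
        return self

    def run(self, ctx: Context) -> Context:
        for stage in self.stages:
            ctx = stage(ctx)
        return ctx


def build_pipeline() -> Pipeline:
    # Stages follow the order of the modules named in the text.
    return (Pipeline()
            .add(preprocess_audio).add(recognize_speech).add(understand)
            .add(answer_question).add(manage_dialogue).add(synthesize_speech))


if __name__ == "__main__":
    result = build_pipeline().run({"audio": "  What is the STATUS of machine 3?  "})
    print(result["speech"])
```

Because every stage shares the same signature, one module can be swapped for another (for example, a different speech recognizer) without touching the rest of the chain, which is the kind of flexibility in the choice and use of modules that the requirements call for.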

Compared with existing voice assistant environments (Alexa, Google Assistant), the following key characteristics are guaranteed and highlighted: modularity, data protection and privacy, openness with respect to technologies, connectivity and dissemination through an open ecosystem, and innovation capability. In addition, data diversity for B2B applications will be made possible by providing a data platform and integrating data and application partners. The infrastructure of the SPEAKER platform will enable data exchange (community approach), with international networks (MetaNet, European Language Grid) providing access to numerous language corpora. The SPEAKER platform will use industrial scaling mechanisms (e.g. Docker, Kubernetes, Redis). To this end, SPEAKER is working with the German company iNNOVO Cloud. This cooperation enables us to guarantee not only scalability, but also data protection based on GDPR principles. After the platform is transferred to the operating company, the public launch of the platform will help it become established quickly, setting up SPEAKER for a sustainable future. SPEAKER will be offered at a similar cost to established platforms and will focus primarily on B2B applications.