Sunny start to the new semester (April 2023).

Photo: Paderborn University, Besim Mazhiqi

Joschka Kersting, M.Sc.

Contact

Sonderforschungsbereich 901 (Collaborative Research Centre 901)

Research Associate

Phone:
+49 5251 60-5669
Office:
ZM2.A.01.05.1
Visitor address:
Zukunftsmeile 2
33102 Paderborn
Biography
Miscellaneous
Since 04/2020

Research Assistant in the Collaborative Research Centre 901 (CRC 901): On-The-Fly Computing

Paderborn University

04/2018 - 03/2020

Research Assistant at the Chair for Digital Humanities

Paderborn University

12/2017 - 03/2018

Student Assistant at the Chair for Digital Humanities

Paderborn University

10/2015 - 03/2018

Master's Studies in Management Information Systems

Paderborn University

04/2016 - 11/2017

Student Assistant at the Chair for Business Computing, especially Semantic Information Processing

Heinz Nixdorf Institute
Paderborn University

10/2016 - 12/2016

Research Stay at KISTI

Korea Institute of Science and Technology Information (KISTI)
Daejeon, Republic of Korea (South Korea)

10/2012 - 09/2015

Dual Bachelor Degree Program in International Business

Fachhochschule der Wirtschaft (FHDW), state-approved private university, Paderborn
Practical phases at, among others, the CRM IT department of arvato (Bertelsmann)

01/2014 - 02/2014

Internship abroad in Moscow

Internship at an international fashion company in Moscow, Russia


Publications

2023

Identifizierung quantifizierbarer Bewertungsinhalte und -kategorien mittels Text Mining (Identification of Quantifiable Rating Content and Categories Using Text Mining)

J. Kersting, 2023

Reading between the lines has so far been reserved for humans. The present dissertation addresses this research gap using machine learning methods. Computers cannot comprehend implicit expressions or localize them in the text. However, many texts deal with interpersonal topics that, unlike commercial reviews, often imply information only by means of longer phrases. Examples are the kindness and the attentiveness of a doctor, which are only paraphrased (“he didn’t even look me in the eye”). The analysis of such data, especially the identification and localization of implicit statements, is a research gap (1). This work uses so-called aspect-based sentiment analysis as a method for this purpose. It remains open how the aspect categories to be extracted can be discovered and thematically delineated based on the data (2). Furthermore, it has not yet been explored what a collection of tools should look like with which implicit phrases can be identified and thus made explicit (3). Finally, it is an open question how to correlate the identified phrases from the text data with other data, including the relationship between quantitative scores (e.g., school grades) and the thematically related text (4). Based on these research gaps, the research question is posed as follows: Using text mining methods, how can implicit rating content be properly interpreted and thus made explicit before it is automatically categorized and quantified? The novelty of this dissertation lies in the automated recognition of implicit linguistic statements alongside explicit ones. These are identified in unstructured text data so that features expressed only in the text can later be compared across data sources, even though they were not included in rating categories such as stars or school grades. German-language physician ratings from websites in three countries serve as the sample domain.
The solution approach consists of data creation, a text processing pipeline, and analyses based on this pipeline. In the data creation step, aspect classes are identified and delineated across platforms and annotated in text data. This results in six datasets with over 70,000 annotated sentences and detailed guidelines. The models trained on these data extract and categorize the aspects. In addition, the sentiment polarity and the evaluation weight, i.e., the importance, of each phrase are determined. The models, combined in a pipeline, are used in a prototype in the form of a web application. The analyses built on the pipeline quantify the rating content by linking the extracted information with further data, thus allowing new insights. As a result, a toolbox is provided to identify quantifiable rating content and categories using text mining for a sample domain. This toolbox is used to evaluate the approach, which in principle can also be adapted to any other domain.
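Research gap (4) and the analyses described above link phrase-level results to quantitative scores. The Python sketch below shows one simple way such a link could be measured, assuming each review has already been reduced to (polarity, weight) pairs for one aspect: it computes an importance-weighted mean polarity per review and then Pearson's correlation with the grade given for that aspect. The function names, the data, and the choice of Pearson's r are illustrative assumptions, not the dissertation's actual analysis.

```python
from math import sqrt


def weighted_polarity(phrases):
    """Importance-weighted mean polarity of (polarity, weight) pairs."""
    total = sum(weight for _, weight in phrases)
    if total == 0:
        return 0.0  # no (weighted) phrases found for this aspect
    return sum(polarity * weight for polarity, weight in phrases) / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: per review, the (polarity, weight) pairs of all phrases
# tagged with one aspect, e.g. friendliness (polarity in [-1, 1]) ...
reviews = [
    [(1.0, 0.9), (0.5, 0.3)],
    [(1.0, 0.4)],
    [(-1.0, 0.8), (0.5, 0.2)],
    [(0.5, 0.5), (-0.5, 0.5)],
    [(-1.0, 0.6)],
]
# ... and the grade each reviewer gave for that aspect (1 = best, 6 = worst).
grades = [1, 2, 5, 3, 6]

text_scores = [weighted_polarity(phrases) for phrases in reviews]
r = pearson_r(text_scores, grades)
print(f"Pearson r between text sentiment and grades: {r:.2f}")
```

Because German school grades run from 1 (best) to 6 (worst), a strongly negative r indicates that the sentiment found in the text and the grades agree.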


2022

Chatbot-Enhanced Requirements Resolution for Automated Service Compositions

J. Kersting, M. Ahmed, M. Geierhos, in: HCI International 2022 Posters, Springer International Publishing, 2022, pp. 419-426

This work addresses the automatic resolution of software requirements. In the vision of On-The-Fly Computing, software services should be composed on demand, based solely on natural language input from human users. To enable this, we build a chatbot solution that works with human-in-the-loop support to receive, analyze, correct, and complete their software requirements. The chatbot is equipped with a natural language processing pipeline and a large knowledge base, as well as sophisticated dialogue management skills to enhance the user experience. Previous solutions have focused on analyzing software requirements to point out errors such as vagueness, ambiguity, or incompleteness. Our work shows how apps can collaborate with users to efficiently produce correct requirements. We developed and compared three different chatbot apps that can work with built-in knowledge. We rely on ChatterBot, DialoGPT and Rasa for this purpose. While DialoGPT provides its own knowledge base, Rasa is the best system to combine the text mining and knowledge solutions at our disposal. The evaluation shows that users accept 73% of the suggested answers from Rasa, while they accept only 63% from DialoGPT or even 36% from ChatterBot.


Implicit Statements in Healthcare Reviews: A Challenge for Sentiment Analysis

J. Kersting, F.S. Bäumer, in: Proceedings of the Fourteenth International Conference on Pervasive Patterns and Applications (PATTERNS 2022): Special Track AI-DRSWA: Maturing Artificial Intelligence - Data Science for Real-World Applications, IARIA, 2022, pp. 5-9

This paper discusses past limitations of sentiment analysis research regarding explicit and implicit mentions of opinions. Previous studies have regularly neglected this question in favor of methodical research on standard datasets. Furthermore, they were limited to linguistically less diverse domains, such as commercial product reviews. We address this issue by annotating a German-language physician review dataset that contains numerous implicit, long, and complex statements that indicate aspect ratings, such as the physician’s friendliness. We discuss the nature of implicit statements and present various examples to illustrate the challenge described.


2021

In Other Words: A Naive Approach to Text Spinning

F.S. Bäumer, J. Kersting, S. Denisov, M. Geierhos, in: Proceedings of the International Conferences on WWW/Internet 2021 and Applied Computing 2021, IADIS, 2021, pp. 221-225

Content is the new oil. Users consume billions of terabytes a day while surfing news sites or blogs, posting on social media sites, and sending chat messages around the globe. While content is heterogeneous, the dominant form of web content is text. There are situations where more diversity needs to be introduced into text content, for example, to reuse it on websites or to allow a chatbot to base its models on the information conveyed rather than on the language used. To achieve this, paraphrasing techniques have been developed: one example is text spinning, a technique that automatically paraphrases text while leaving the intent intact. This makes it easier to reuse content or to make the language generated by a chatbot sound more human. One method for modifying texts is a combination of translation and back-translation. This paper presents NATTS, a naive approach that uses transformer-based translation models to create diversified text, combining translation steps in one model. An advantage of this approach is that it can be fine-tuned and can handle technical language.


Towards Aspect Extraction and Classification for Opinion Mining with Deep Sequence Networks

J. Kersting, M. Geierhos, in: Natural Language Processing in Artificial Intelligence – NLPinAI 2020, Springer, 2021, pp. 163-189

This chapter concentrates on aspect-based sentiment analysis, a form of opinion mining where algorithms detect sentiments expressed about features of products, services, etc. We especially focus on novel approaches for aspect phrase extraction and classification trained on feature-rich datasets. Here, we present two new datasets, which we gathered from the linguistically rich domain of physician reviews, as other investigations have mainly concentrated on commercial reviews and social media reviews so far. To give readers a better understanding of the underlying datasets, we describe the annotation process and inter-annotator agreement in detail. In our research, we automatically assess implicit mentions or indications of specific aspects. To do this, we propose and utilize neural network models that perform the here-defined aspect phrase extraction and classification task, achieving F1-score values of about 80% and accuracy values of more than 90%. As we apply our models to a comparatively complex domain, we obtain promising results.


Well-being in Plastic Surgery: Deep Learning Reveals Patients' Evaluations

J. Kersting, M. Geierhos, in: Proceedings of the 10th International Conference on Data Science, Technology and Applications (DATA 2021), SCITEPRESS, 2021, pp. 275-284


Human Language Comprehension in Aspect Phrase Extraction with Importance Weighting

J. Kersting, M. Geierhos, in: Natural Language Processing and Information Systems, Springer, 2021, pp. 231-242

In this study, we describe a text processing pipeline that transforms user-generated text into structured data. To do this, we train neural and transformer-based models for aspect-based sentiment analysis. As most research deals with explicit aspects from product or service data, we extract and classify implicit and explicit aspect phrases from German-language physician review texts. Patients often rate on the basis of perceived friendliness or competence. The vocabulary is difficult, the topic sensitive, and the data user-generated. The aspect phrases come with various wordings using insertions and are not noun-based, which makes the presented case equally relevant and reality-based. To find complex, indirect aspect phrases, up-to-date deep learning approaches must be combined with supervised training data. We describe three aspect phrase datasets, one of them new, as well as a newly annotated aspect polarity dataset. Alongside this, we build an algorithm to rate the aspect phrase importance. All in all, we train eight transformers on the new raw data domain, compare 54 neural aspect extraction models and, based on this, create eight aspect polarity models for our pipeline. These models are evaluated by using Precision, Recall, and F-Score measures. Finally, we evaluate our aspect phrase importance measure algorithm.
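The abstract names precision, recall and F-score as evaluation measures. Below is a minimal Python sketch of these measures for extracted aspect phrases, assuming that a predicted phrase counts as correct only if its boundaries and its aspect class both match a gold phrase exactly; the paper may use a different matching criterion, and the span offsets and labels are hypothetical.

```python
def span_prf(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)  # same boundaries AND same label
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical token-offset spans (end exclusive) with hypothetical labels.
gold = [(0, 4, "FRIENDLINESS"), (10, 15, "COMPETENCE"), (20, 23, "TIME")]
predicted = [
    (0, 4, "FRIENDLINESS"),    # correct
    (10, 15, "FRIENDLINESS"),  # right boundaries, wrong class
    (20, 23, "TIME"),          # correct
    (30, 33, "COMPETENCE"),    # spurious
]

p, r, f = span_prf(gold, predicted)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

With two of four predictions correct and two of three gold phrases found, this prints `P=0.500 R=0.667 F1=0.571`. Looser criteria (for example, token overlap instead of exact boundaries) would score the second prediction differently.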


2020

Tag Me If You Can: Insights into the Challenges of Supporting Unrestricted P2P News Tagging

F.S. Bäumer, J. Kersting, B. Buff, M. Geierhos, in: Information and Software Technologies, Springer, 2020, pp. 368-382

Peer-to-Peer news portals allow Internet users to write news articles and make them available online to interested readers. Although authors are free to choose their topics, an article must meet a number of quality criteria before it is published. In addition to meaningful titles, comprehensibly written texts, and meaningful images, relevant tags are an important quality criterion for such news. In this case study, we discuss the challenges and common mistakes that Peer-to-Peer reporters face when tagging news and how incorrect information can be corrected by orchestrating existing Natural Language Processing services. Lastly, we use this illustrative example to give insight into the challenges of dealing with bottom-up taxonomies.


Semantic Tagging of Requirement Descriptions: A Transformer-Based Approach

J. Kersting, F.S. Bäumer, in: Proceedings of the International Conference on Applied Computing 2020, IADIS, 2020, pp. 119-123


Aspect Phrase Extraction in Sentiment Analysis with Deep Learning

J. Kersting, M. Geierhos, in: Proceedings of the 12th International Conference on Agents and Artificial Intelligence (ICAART 2020) – Special Session on Natural Language Processing in Artificial Intelligence (NLPinAI 2020), SCITEPRESS, 2020, pp. 391-400

This paper deals with aspect phrase extraction and classification in sentiment analysis. We summarize current approaches and datasets from the domain of aspect-based sentiment analysis, which detects sentiments expressed toward individual aspects in unstructured text data. So far, mainly commercial user reviews of products or services such as restaurants have been investigated. Here, we present our dataset consisting of German physician reviews, a sensitive and linguistically complex field. Furthermore, we describe the annotation process of a dataset for supervised learning with neural networks. Moreover, we introduce our model for extracting and classifying aspect phrases in one step, which obtains an F1-score of 80%. By applying it to a more complex domain, our approach and results outperform previous approaches.
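Extracting and classifying aspect phrases in one step is commonly framed as sequence labelling with class-specific BIO tags (for example `B-FRIENDLINESS`, `I-FRIENDLINESS`, `O`). The Python sketch below assumes such a tagging scheme, which the abstract does not spell out, and shows how a model's predicted tags can be decoded back into labelled phrases. The English sentence, partly reused from the dissertation abstract, stands in for German review text, and the labels are hypothetical.

```python
def bio_to_spans(tags):
    """Decode class-specific BIO tags into (start, end, label) spans.

    'end' is exclusive. An I- tag that does not continue a span of the same
    label is treated as the start of a new span (a lenient convention).
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if start is not None:  # close the currently open span
                spans.append((start, i, label))
                start, label = None, None
            if tag != "O":  # B-X or a stray I-X opens a new span
                start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


tokens = "he didn't even look me in the eye , but treatment was quick".split()
tags = (["O"] + ["B-FRIENDLINESS"] + ["I-FRIENDLINESS"] * 6
        + ["O", "O", "B-TIME", "I-TIME", "I-TIME"])

for start, end, label in bio_to_spans(tags):
    print(label, "->", " ".join(tokens[start:end]))
```

Because one tag carries both the phrase boundary and the aspect class, a single sequence model covers extraction and classification at once; the decoded `(start, end, label)` triples are also the form that span-level precision and recall are usually computed on.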


Detection of Privacy Disclosure in the Medical Domain: A Survey

B. Buff, J. Kersting, M. Geierhos, in: Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2020), SCITEPRESS, 2020, pp. 630-637

When it comes to increased digitization in the health care domain, privacy is a relevant topic nowadays. This relates to patient data, electronic health records or physician reviews published online, for instance. There exist different approaches to the protection of individuals’ privacy, which focus on the anonymization and masking of personal information subsequent to their mining. In the medical domain in particular, measures to protect the privacy of patients are of high importance due to the amount of sensitive data that is involved (e.g. age, gender, illnesses, medication). While privacy breaches in structured data can be detected more easily, disclosure in written texts is more difficult to find automatically due to the unstructured nature of natural language. Therefore, we take a detailed look at existing research on areas related to privacy protection. Likewise, we review approaches to the automatic detection of privacy disclosure in different types of medical data. We provide a survey of several studies concerned with privacy breaches in the medical domain with a focus on Physician Review Websites (PRWs). Finally, we briefly develop implications and directions for further research.


Neural Learning for Aspect Phrase Extraction and Classification in Sentiment Analysis

J. Kersting, M. Geierhos, in: Proceedings of the 33rd International Florida Artificial Intelligence Research Symposium (FLAIRS) Conference, AAAI, 2020, pp. 282-285


What Reviews in Local Online Labour Markets Reveal about the Performance of Multi-Service Providers

J. Kersting, M. Geierhos, in: Proceedings of the 9th International Conference on Pattern Recognition Applications and Methods, SCITEPRESS, 2020, pp. 263-272

This paper deals with online customer reviews of local multi-service providers. While many studies investigate product reviews and online labour markets with service providers delivering intangible products “over the wire”, we focus on websites where providers offer multiple distinct services that can be booked, paid for, and reviewed online but are performed locally offline. This type of service provider has so far been neglected in the literature. This paper analyses reviews and applies sentiment analysis in order to gain new insights into local multi-service providers’ performance. A broad range of literature on the topics addressed is presented. The results show, among other things, that providers with good ratings continue to perform well over time. We find that many positive reviews seem to encourage sales. On average, quantitative star ratings and qualitative ratings in the form of review texts match. The study also reports further results.


2019

Natural Language Processing in OTF Computing: Challenges and the Need for Interactive Approaches

F.S. Bäumer, J. Kersting, M. Geierhos, Computers (2019), 8(1), 22

The vision of On-the-Fly (OTF) Computing is to compose and provide software services ad hoc, based on requirement descriptions in natural language. Since non-technical users write their software requirements themselves and in unrestricted natural language, deficits such as inaccuracy and incompleteness occur. These deficits are usually addressed by natural language processing methods, which face special challenges in OTF Computing because maximum automation is the goal. In this paper, we present current automatic approaches for resolving inaccuracy and incompleteness in natural language requirement descriptions and elaborate on open challenges. In particular, we discuss the necessity of domain-specific resources and show why, despite far-reaching automation, an intelligent and guided integration of end users into the compensation process is required. In this context, we present our idea of a chatbot that integrates users into the compensation process depending on the given circumstances.


In Reviews We Trust: But Should We? Experiences with Physician Review Websites

J. Kersting, F.S. Bäumer, M. Geierhos, in: Proceedings of the 4th International Conference on Internet of Things, Big Data and Security, SCITEPRESS, 2019, pp. 147-155

The ability to openly evaluate products, locations and services is an achievement of the Web 2.0. It has never been easier to inform oneself about the quality of products or services and possible alternatives. Forming one’s own opinion based on the impressions of other people can lead to better experiences. However, this presupposes trust in one’s fellows as well as in the quality of the review platforms. In previous work on physician reviews and the corresponding websites, it was observed that some reviewers behaved improperly and that there were noteworthy differences in the technical implementation of the portals and in the efforts of site operators to maintain high-quality reviews. These experiences raise new questions regarding what trust means on review platforms, how trust arises and how easily it can be destroyed.


2018

Rate Your Physician: Findings from a Lithuanian Physician Rating Website

F.S. Bäumer, J. Kersting, V. Kuršelis, M. Geierhos, in: Communications in Computer and Information Science, Springer, 2018, pp. 43-58

Physician review websites are known around the world. Patients review the subjectively experienced quality of medical services supplied to them and publish an overall rating on the Internet, where quantitative grades and qualitative texts come together. On the one hand, these new possibilities reduce the imbalance of power between health care providers and patients, but on the other hand, they can also damage the usually very intimate relationship between health care providers and patients. Review websites must meet these requirements with a high level of responsibility and service quality. In this paper, we look at the situation in Lithuania: in particular, we are interested in the available options for evaluation and interaction, and in the quality of a particular review website measured against the available data. We thereby identify quality weaknesses and lay the foundation for future research.


Towards a Multi-Stage Approach to Detect Privacy Breaches in Physician Reviews

F.S. Bäumer, J. Kersting, M. Orlikowski, M. Geierhos, in: Proceedings of the Posters and Demos Track of the 14th International Conference on Semantic Systems co-located with the 14th International Conference on Semantic Systems (SEMANTiCS 2018), CEUR-WS.org, 2018

Physician Review Websites allow users to evaluate their experiences with health services. As these evaluations are regularly contextualized with facts from users’ private lives, they often accidentally disclose personal information on the Web. This poses a serious threat to users’ privacy. In this paper, we report on early work in progress on “Text Broom”, a tool to detect privacy breaches in user-generated texts. For this purpose, we conceptualize a pipeline that combines Natural Language Processing methods such as Named Entity Recognition, linguistic patterns, and domain-specific Machine Learning approaches that have the potential to recognize privacy violations with wide coverage. A prototypical web application is openly accessible.



2017

Using Sentiment Analysis on Local Up-to-the-Minute News: An Integrated Approach

J. Kersting, M. Geierhos, in: Proceedings of the 23rd International Conference on Information and Software Technologies, Communications in Computer and Information Science, Springer International Publishing, 2017, pp. 528-538


Internet of Things Architecture for Handling Stream Air Pollution Data

J. Kersting, M. Geierhos, H. Jung, T. Kim, in: Proceedings of the 2nd International Conference on Internet of Things, Big Data and Security, SCITEPRESS, 2017, pp. 117-124

In this paper, we present an IoT architecture that handles streaming sensor data on air pollution. Particle pollution is known to be a serious threat to human health. Building on developments in the use of wireless sensors and the IoT, we propose an architecture that flexibly measures and processes stream data collected in real time by movable, low-cost IoT sensors. It thus enables a widespread network of wireless sensors that can follow changes in human behavior. Apart from stating the reasons for such a development and its requirements, we provide both a conceptual design and a technological design of the architecture. The technological design consists of Kaa and Apache Storm, which can collect air pollution information in real time and solve various data processing problems such as missing data and synchronization. This allows us to add a simulation of issues that might arise when the architecture is in use. Along with these issues, we state our reasons for choosing specific modules among the candidates. Our architecture combines, among other components, wireless sensors with the Kaa IoT framework, an Apache Kafka pipeline, and an Apache Storm data stream management system. We also provide freely available open-government datasets.
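The abstract lists missing data and synchronization among the problems of processing sensor streams. The following plain-Python sketch illustrates one generic strategy for both, not the Kaa or Apache Storm implementation: readings from several sensors are aligned to shared fixed time windows, averaged within each window, and windows without readings are forward-filled. Sensor IDs, timestamps, values and the window length are made up for illustration.

```python
from collections import defaultdict


def align_to_windows(readings, window_s, start, end):
    """Align (sensor_id, timestamp_s, value) readings to fixed time windows.

    Values inside one window are averaged; a window in which a sensor sent
    nothing repeats that sensor's previous value (forward fill), or None if
    the sensor has not reported yet. Assumes (end - start) is a multiple of
    window_s. Returns {sensor_id: [value per window]}.
    """
    n_windows = (end - start) // window_s
    buckets = defaultdict(lambda: defaultdict(list))
    for sensor, ts, value in readings:
        if start <= ts < end:  # readings outside [start, end) are dropped
            buckets[sensor][(ts - start) // window_s].append(value)
    aligned = {}
    for sensor, by_window in buckets.items():
        series, last = [], None
        for w in range(n_windows):
            values = by_window.get(w)  # .get() does not create empty entries
            if values:
                last = sum(values) / len(values)
            series.append(last)
        aligned[sensor] = series
    return aligned


# Hypothetical particulate-matter readings (timestamps in seconds), out of order.
readings = [
    ("s1", 0, 10.0), ("s1", 30, 14.0), ("s2", 5, 20.0),
    ("s1", 130, 18.0), ("s2", 70, 22.0),
]
aligned = align_to_windows(readings, window_s=60, start=0, end=180)
print(aligned)
```

Here `s1` is silent in the second minute and `s2` in the third, so both series are padded with their last known value, giving `{'s1': [12.0, 12.0, 18.0], 's2': [20.0, 22.0, 22.0]}`. Forward filling is only one option; interpolation or explicit gap markers may suit other analyses better.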


Using Sentiment Analysis on Local Up-to-the-Minute News: An Integrated Approach

J. Kersting, M. Geierhos, in: Information and Software Technologies: 23rd International Conference, ICIST 2017, Druskininkai, Lithuania, October 12–14, 2017, Proceedings, Springer, 2017, pp. 528-538

In this paper, we present a search solution that makes local news information easily accessible. In the era of fake news, we provide an approach for accessing news information through opinion mining. This enables users to view news on the same topics from different web sources. By applying sentiment analysis on social media posts, users can better understand how issues are captured and see people’s reactions. Therefore, we provide a local search service that first localizes news articles, then visualizes their occurrence according to the frequency of mentioned topics on a heatmap and even shows the sentiment score for each text.


Privacy Matters: Detecting Nocuous Patient Data Exposure in Online Physician Reviews

F.S. Bäumer, N. Grote, J. Kersting, M. Geierhos, in: Information and Software Technologies: 23rd International Conference, ICIST 2017, Druskininkai, Lithuania, October 12–14, 2017, Proceedings, Springer, 2017, pp. 77-89

Consulting a physician was long regarded as an intimate and private matter. The physician-patient relationship was perceived as sensitive and based on trust. Nowadays, this is changing, as medical procedures and physician consultations are reviewed on the Internet like other services. To allay users’ privacy concerns, physician review websites assure anonymity and the protection of private data. However, hundreds of reviews reveal private information and hence enable physicians or the public to identify patients. We therefore draw attention to the cases in which de-anonymization is possible and introduce an approach that highlights private information in physician reviews so that users can avoid accidental disclosure. To this end, we combine established natural language processing techniques such as named entity recognition with handcrafted patterns to achieve high detection accuracy. In this way, we can help websites increase privacy protection by recognizing and uncovering apparently uncritical information in user-generated texts.
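As a rough illustration of the handcrafted-pattern part of such an approach, the Python sketch below highlights a few kinds of potentially private information with regular expressions. It omits named entity recognition entirely, uses English example text although the reviews studied are German, and the three patterns (age, date, phone number) are hypothetical, not the ones used in the paper.

```python
import re

# Hypothetical handcrafted patterns for a few kinds of private information.
# A fuller system would add many more and a named entity recognizer.
PATTERNS = {
    "AGE": re.compile(r"\b(?:I am|I'm) \d{1,3} years old\b", re.IGNORECASE),
    "DATE": re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{2,4}\b"),
    "PHONE": re.compile(r"\+?\d[\d /-]{6,}\d"),
}


def find_disclosures(text):
    """Return (start, end, category, match) for every pattern hit, by position."""
    hits = []
    for category, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((m.start(), m.end(), category, m.group()))
    return sorted(hits)


def highlight(text):
    """Wrap each hit in [[CATEGORY: ...]] markers, skipping overlapping hits."""
    parts, pos = [], 0
    for start, end, category, matched in find_disclosures(text):
        if start < pos:
            continue  # overlaps a hit that was already highlighted
        parts.append(text[pos:start])
        parts.append(f"[[{category}: {matched}]]")
        pos = end
    parts.append(text[pos:])
    return "".join(parts)


review = "I'm 34 years old and had surgery on 12.03.2017. Call me at 0521 123456."
print(highlight(review))
```

This prints `[[AGE: I'm 34 years old]] and had surgery on [[DATE: 12.03.2017]]. Call me at [[PHONE: 0521 123456]].`, i.e. the kind of markup a review form could show before a user submits a text. Patterns like these are precise but narrow, which is why the abstract pairs them with named entity recognition.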

