

Dr. Stefan Heindorf

Data Science (JRG)

Head - PostDoc - Junior Research Group Leader Data Science

Technologiepark 6
33100 Paderborn
10/2022 - today

Junior Research Group Leader

Paderborn University

12/2019 - 09/2022


Paderborn University

10/2013 - 12/2019


Paderborn University

04/2011 - 09/2013

Master of Science

Paderborn University

10/2007 - 03/2011

Bachelor of Science in Computer Science

Paderborn University


CausalQA: A Benchmark for Causal Question Answering

A. Bondarenko, M. Wolska, S. Heindorf, L. Blübaum, A. Ngonga Ngomo, B. Stein, P. Braslavski, M. Hagen, M. Potthast, in: Proceedings of the 29th International Conference on Computational Linguistics, International Committee on Computational Linguistics, 2022, pp. 3296–3308

At least 5% of questions submitted to search engines ask about cause-effect relationships in some way. To support the development of tailored approaches that can answer such questions, we construct Webis-CausalQA-22, a benchmark corpus of 1.1 million causal questions with answers. We distinguish different types of causal questions using a novel typology derived from a data-driven, manual analysis of questions from ten large question answering (QA) datasets. Using high-precision lexical rules, we extract causal questions of each type from these datasets to create our corpus. As an initial baseline, the state-of-the-art QA model UnifiedQA achieves a ROUGE-L F1 score of 0.48 on our new benchmark.
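As an illustration of the metric quoted in this abstract, the following is a minimal sketch of a ROUGE-L F1 score based on the longest common subsequence (LCS) of tokens. Lowercasing, whitespace tokenization, and the balanced F1 weighting are assumptions made here; the benchmark's actual evaluation script may differ, and the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    # Dynamic program that keeps only the previous row of the LCS table.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Balanced ROUGE-L F1 between a candidate answer and a reference answer."""
    # Assumption: simple lowercasing and whitespace tokenization.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    return 2 * precision * recall / (precision + recall)
```

For example, `rouge_l_f1("smoking causes cancer", "smoking often causes lung cancer")` has an LCS of three tokens, giving a precision of 1.0 and a recall of 0.6.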

Tab2Onto: Unsupervised Semantification with Knowledge Graph Embeddings

H.M.A. Zahera, S. Heindorf, S. Balke, J. Haupt, M. Voigt, C. Walter, F. Witter, A. Ngonga Ngomo, in: The Semantic Web: ESWC 2022 Satellite Events, Springer International Publishing, 2022

EvoLearner: Learning Description Logics with Evolutionary Algorithms

S. Heindorf, L. Blübaum, N. Düsterhus, T. Werner, V.N. Golani, C. Demir, A. Ngonga Ngomo, in: WWW, ACM, 2022, pp. 818-828

Classifying nodes in knowledge graphs is an important task, e.g., predicting missing types of entities, predicting which molecules cause cancer, or predicting which drugs are promising treatment candidates. While black-box models often achieve high predictive performance, they are only post-hoc and locally explainable and do not allow the learned model to be easily enriched with domain knowledge. Towards this end, learning description logic concepts from positive and negative examples has been proposed. However, learning such concepts often takes a long time and state-of-the-art approaches provide limited support for literal data values, although they are crucial for many applications. In this paper, we propose EvoLearner - an evolutionary approach to learn ALCQ(D), which is the attributive language with complement (ALC) paired with qualified cardinality restrictions (Q) and data properties (D). We contribute a novel initialization method for the initial population: starting from positive examples (nodes in the knowledge graph), we perform biased random walks and translate them to description logic concepts. Moreover, we improve support for data properties by maximizing information gain when deciding where to split the data. We show that our approach significantly outperforms the state of the art on the benchmarking framework SML-Bench for structured machine learning. Our ablation study confirms that this is due to our novel initialization method and support for data properties.
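The idea of choosing a split point for a numeric data property by maximizing information gain can be sketched as follows. This is a generic illustration, not EvoLearner's implementation: the function names, the binary 0/1 labels, and the use of midpoints between neighbouring values as candidate thresholds are assumptions made for this example.

```python
from math import log2


def entropy(labels):
    """Binary entropy (in bits) of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return -sum(q * log2(q) for q in (p, 1 - p) if q > 0)


def best_split(values, labels):
    """Return (threshold, gain) maximizing information gain for `value <= threshold`.

    Returns (None, 0.0) if no split improves on the unsplit data.
    """
    pairs = sorted(zip(values, labels))
    base = entropy([label for _, label in pairs])
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        # Identical neighbouring values cannot be separated by a threshold.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        # Weighted entropy remaining after the split.
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best[1]:
            best = (threshold, gain)
    return best
```

With values `[1, 2, 8, 9]` and labels `[0, 0, 1, 1]`, the midpoint between 2 and 8 separates the two classes perfectly, so it is the split selected.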

COVIDPUBGRAPH: A FAIR Knowledge Graph of COVID-19 Publications

S. Pestryakova, D. Vollmers, M. Sherif, S. Heindorf, M. Saleem, D. Moussallem, A. Ngonga Ngomo, Scientific Data (2022)


ASSET: A Semi-supervised Approach for Entity Typing in Knowledge Graphs

H.M.A. Zahera, S. Heindorf, A. Ngonga Ngomo, in: Proceedings of the 11th on Knowledge Capture Conference, ACM, 2021

Neural Class Expression Synthesis

N.J. Kouagou, S. Heindorf, C. Demir, A. Ngonga Ngomo, in: arXiv:2111.08486, 2021

Class expression learning is a branch of explainable supervised machine learning of increasing importance. Most existing approaches for class expression learning in description logics are search algorithms or hard-rule-based. In particular, approaches based on refinement operators suffer from scalability issues as they rely on heuristic functions to explore a large search space for each learning problem. We propose a new family of approaches, which we dub synthesis approaches. Instances of this family compute class expressions directly from the examples provided. Consequently, they are not subject to the runtime limitations of search-based approaches nor the lack of flexibility of hard-rule-based approaches. We study three instances of this novel family of approaches that use lightweight neural network architectures to synthesize class expressions from sets of positive examples. The results of their evaluation on four benchmark datasets suggest that they can effectively synthesize high-quality class expressions with respect to the input examples in under a second on average. Moreover, a comparison with the state-of-the-art approaches CELOE and ELTL suggests that we achieve significantly better F-measures on large ontologies. For reproducibility purposes, we provide our implementation as well as pre-trained models in a public GitHub repository.

Drift Detection in Text Data with Document Embeddings

R. Feldhans, A. Wilke, S. Heindorf, M.H. Shaker, B. Hammer, A. Ngonga Ngomo, E. Hüllermeier, in: Intelligent Data Engineering and Automated Learning – IDEAL 2021, Springer International Publishing, 2021

Convolutional Hypercomplex Embeddings for Link Prediction

C. Demir, D. Moussallem, S. Heindorf, A. Ngonga Ngomo, in: The 13th Asian Conference on Machine Learning, ACML 2021, 2021

Knowledge graph embedding research has mainly focused on the two smallest normed division algebras, $\mathbb{R}$ and $\mathbb{C}$. Recent results suggest that trilinear products of quaternion-valued embeddings can be a more effective means to tackle link prediction. In addition, models based on convolutions on real-valued embeddings often yield state-of-the-art results for link prediction. In this paper, we investigate a composition of convolution operations with hypercomplex multiplications. We propose the four approaches QMult, OMult, ConvQ and ConvO to tackle the link prediction problem. QMult and OMult can be considered as quaternion and octonion extensions of previous state-of-the-art approaches, including DistMult and ComplEx. ConvQ and ConvO build upon QMult and OMult by including convolution operations in a way inspired by the residual learning framework. We evaluated our approaches on seven link prediction datasets including WN18RR, FB15K-237 and YAGO3-10. Experimental results suggest that the benefits of learning hypercomplex-valued vector representations become more apparent as the size and complexity of the knowledge graph grow. ConvO outperforms state-of-the-art approaches on FB15K-237 in MRR, Hit@1 and Hit@3, while QMult, OMult, ConvQ and ConvO outperform state-of-the-art approaches on YAGO3-10 in all metrics. Results also suggest that link prediction performances can be further improved via prediction averaging. To foster reproducible research, we provide an open-source implementation of approaches, including training and evaluation scripts as well as pretrained models.
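To make the notion of a quaternion-valued trilinear score concrete, here is a minimal sketch that multiplies head and relation embeddings with the Hamilton product and takes the inner product with the tail embedding. The scoring form and all names below are assumptions for illustration; the abstract does not spell out the exact QMult formula, and the authors' open-source implementation should be consulted for the actual model.

```python
def hamilton(p, q):
    """Hamilton product of two quaternions given as (real, i, j, k) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def quaternion_score(head, relation, tail):
    """Hypothetical trilinear score: sum over dimensions of <head * relation, tail>.

    Each argument is a list of quaternions, one per embedding dimension.
    """
    total = 0.0
    for h, r, t in zip(head, relation, tail):
        hr = hamilton(h, r)
        # Inner product of two quaternions viewed as vectors in R^4.
        total += sum(x * y for x, y in zip(hr, t))
    return total
```

Because the Hamilton product is not commutative (i times j is k, but j times i is -k), swapping head and relation can change the score, which is one reason such models can represent asymmetric relations.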

Automatically generating instructions from tutorials for search and user navigation

S. Heindorf. Automatically generating instructions from tutorials for search and user navigation, Patent 10936684. 2021.


CauseNet: Towards a Causality Graph Extracted from the Web

S. Heindorf, Y. Scholten, H. Wachsmuth, A. Ngonga Ngomo, M. Potthast, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM 2020), 2020, pp. 3023-3030


Vandalism Detection in Crowdsourced Knowledge Bases

S. Heindorf, Universität Paderborn, 2019


Semantic Data Mediator: Linking Services to Websites

D. Wolters, S. Heindorf, J. Kirchhoff, G. Engels, in: Service-Oriented Computing – ICSOC 2017 Workshops, Springer International Publishing, 2018, pp. 388-392

Many websites offer links to social media sites for convenient content sharing. Unfortunately, those sharing capabilities are quite restricted and it is seldom possible to share content with other services, like those provided by a user's favorite applications or smart devices. In this paper, we present Semantic Data Mediator (SDM), a flexible middleware linking a vast number of services to millions of websites. Based on reusable repositories of service descriptions defined by the crowd, users can easily fill a personal registry with their favorite services, which can then be linked to websites by SDM. For this, SDM leverages semantic data, which is already available on millions of websites due to search engine optimization. Further support for our approach from website or service developers is not required. To enable the use of a broad range of services, data conversion services are automatically composed by SDM to transform data according to the needs of the different services. In addition to linking web services, various service adapters allow services of applications and smart devices to be linked as well. We have fully implemented our approach and present a real-world case study demonstrating its feasibility and usefulness.


Linking Services to Websites by Leveraging Semantic Data

D. Wolters, S. Heindorf, J. Kirchhoff, G. Engels, in: 2017 IEEE International Conference on Web Services (ICWS), IEEE, 2017

Websites increasingly embed semantic data for search engine optimization. The most common ontology for semantic data is supported by all major search engines and describes over 500 data types, including calendar events, recipes, products, and TV shows. As of today, users wishing to pass this data to their favorite applications, e.g., their calendars, cookbooks, price comparison applications or even smart devices such as TV receivers, rely on cumbersome and error-prone workarounds such as reentering the data or a series of copy and paste operations. In this paper, we present Semantic Data Mediator (SDM), an approach that allows the easy transfer of semantic data to a multitude of services, ranging from web services to applications installed on different devices. SDM extracts semantic data from the currently displayed web page on the client-side, offers suitable services to the user, and at the press of a button, forwards this data to the desired service while doing all the necessary data conversion and service interface adaptation in between. To realize this, we built a reusable repository of service descriptions, data converters, and service adapters, which can be extended by the crowd. Our approach for linking services to websites relies solely on semantic data and does not require any additional support by either website or service developers. We have fully implemented our approach and present a real-world case study demonstrating its feasibility and usefulness.

Overview of the Wikidata Vandalism Detection Task at WSDM Cup 2017

S. Heindorf, M. Potthast, G. Engels, B. Stein, in: WSDM Cup 2017 Notebook Papers, 2017

We report on the Wikidata vandalism detection task at the WSDM Cup 2017. The task received five submissions; this paper describes their evaluation and compares them to state-of-the-art baselines. Unlike previous work, we recast Wikidata vandalism detection as an online learning problem, requiring participant software to predict vandalism in near real-time. The best-performing approach achieves a ROC-AUC of 0.947 at a PR-AUC of 0.458. In particular, this task was organized as a software submission task: to maximize reproducibility as well as to foster future research and development on this task, the participants were asked to submit their working software to the TIRA experimentation platform along with the source code for open source release.

Proceedings of the WSDM Cup 2017: Vandalism Detection and Triple Scoring

M. Potthast, S. Heindorf, H. Bast, in: arXiv:1712.09528, 2017

The WSDM Cup 2017 was a data mining challenge held in conjunction with the 10th International Conference on Web Search and Data Mining (WSDM). It addressed key challenges of knowledge bases today: quality assurance and entity search. For quality assurance, we tackle the task of vandalism detection, based on a dataset of more than 82 million user-contributed revisions of the Wikidata knowledge base, all of which are annotated with regard to whether or not they are vandalism. For entity search, we tackle the task of triple scoring, using a dataset that comprises relevance scores for triples from type-like relations including occupation and country of citizenship, based on about 10,000 human relevance judgements. For the sake of reproducibility, participants were asked to submit their software on TIRA, a cloud-based evaluation platform, and they were incentivized to share their approaches open source.


Vandalism Detection in Wikidata

S. Heindorf, M. Potthast, B. Stein, G. Engels, in: Proceedings of the 25th International Conference on Information and Knowledge Management (CIKM 2016), 2016, pp. 327–336

Wikidata is the new, large-scale knowledge base of the Wikimedia Foundation. Its knowledge is increasingly used within Wikipedia itself and various other kinds of information systems, imposing high demands on its integrity. Wikidata can be edited by anyone and, unfortunately, it frequently gets vandalized, exposing all information systems using it to the risk of spreading vandalized and falsified information. In this paper, we present a new machine learning-based approach to detect vandalism in Wikidata. We propose a set of 47 features that exploit both content and context information, and we report on 4 classifiers of increasing effectiveness tailored to this learning task. Our approach is evaluated on the recently published Wikidata Vandalism Corpus WDVC-2015 and it achieves an area under curve value of the receiver operating characteristic, ROC-AUC, of 0.991. It significantly outperforms the state of the art represented by the rule-based Wikidata Abuse Filter (0.865 ROC-AUC) and a prototypical vandalism detector recently introduced by Wikimedia within the Objective Revision Evaluation Service (0.859 ROC-AUC).
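For readers unfamiliar with the ROC-AUC figures quoted above, the metric can be read as the probability that a randomly chosen vandalism revision receives a higher score than a randomly chosen regular revision. The sketch below computes it by direct pairwise comparison; the function name and the toy data are hypothetical, and a quadratic-time loop like this would not scale to a corpus of millions of revisions.

```python
def roc_auc(scores, labels):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    `labels` contains 1 for positives (e.g. vandalism) and 0 for negatives.
    Tied scores count as half a correctly ranked pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A classifier that scores every positive above every negative reaches 1.0, while uninformative constant scores yield 0.5.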



Optimized XPath evaluation for Schema-compressed XML data

S. Böttcher, R. Hartel, S. Heindorf, in: ADC, Australian Computer Society, 2012, pp. 137-144
