Matthew Caron, M.Sc.

Business Information Systems, esp. Data Analytics

Research Associate

+49 5251 60-5102
+49 5251 60-3542

Mondays, 08:00-12:00

(by prior appointment)

Zukunftsmeile 2
33102 Paderborn

01/2019 - present

Research Associate at the Chair of Business Information Systems, esp. Data Analytics

Universität Paderborn
Paderborn, Germany


DeepLearn 2019 - 3rd International Summer School on Deep Learning

Institute for Research Development, Training and Advice (IRDTA)
Warsaw, Poland


Award of the Universitätsgesellschaft (University Society) for outstanding theses, 2017/2018

Universität Paderborn
Paderborn, Germany

08/2018 - 12/2018

Research Associate in the Semantic Information Processing working group (Digital Cultural Studies)

Universität Paderborn
Paderborn, Germany

10/2016 - 08/2018

Master's degree in Management Information Systems (M.Sc.)

Universität Paderborn
Paderborn, Germany

10/2017 - 06/2018

Student Research Assistant at the Junior Professorship of Business Information Systems, esp. Semantic Information Processing

Universität Paderborn | Heinz Nixdorf Institut
Paderborn, Germany


Special scholarship of the District of Paderborn

Stiftung Studienfonds OWL


International Summer School on Data Science

University of Zagreb | Research Unit for Data Science
Split, Croatia

03/2017 - 09/2017

Student Research Assistant at the Chair of Business Information Systems, esp. Enterprise Information Systems

Universität Paderborn
Paderborn, Germany

04/2016 - 01/2017

Working Student - Global Customer Care Center

Diebold Nixdorf
Paderborn, Germany

04/2015 - 02/2016

Working Student - Business Suite 4 SAP HANA (Go-To-Market)

Walldorf, Germany

2012 - 2016

Bachelor's degree in International Management (B.A.)

Hochschule Worms
Worms, Germany

08/2013 - 06/2014

Exchange program (International Business)

Umeå University | School of Business & Economics
Umeå, Sweden


Shortcut Learning in Financial Text Mining: Exposing the Overly Optimistic Performance Estimates of Text Classification Models under Distribution Shift

M. Caron, in: 2022 IEEE International Conference on Big Data (IEEE BigData 2022), IEEE, 2022

In recent years, many cases of deep neural networks failing dramatically when faced with adversarial or real-world examples have been reported. Such failures, which are quite hard to detect, are often related to a generalization problem known as shortcut learning. Yet, with state-of-the-art transformer models now being ubiquitous in financial text mining, one cannot help but wonder how reliable the results conveyed in the ever-growing literature genuinely are. Against this background, we expose, in this work, how vulnerable contemporary financial text mining approaches are to shortcut learning. Focussing on the common learning task of financial sentiment classification, we assess, using two entity-based sampling strategies and our publicly available dataset, the discrepancies between i.i.d. and o.o.d. performance estimates of four transformer models. Our results reveal that o.o.d. performance estimates are consistently weaker than those of their i.i.d. counterparts, with the error rate increasing by as much as 29.7%, thus demonstrating how this issue can, when overlooked, lead to misleading evaluations. Moreover, we show how additional preprocessing steps, such as entity removal and vocabulary filtering, can help reduce the effects of shortcut learning by filtering out entity-related linguistic cues.

Towards Automated Moderation: Enabling Toxic Language Detection with Transfer Learning and Attention-Based Models

M. Caron, F.S. Bäumer, O. Müller, in: 55th Annual Hawaii International Conference on System Sciences (HICSS 2022), 2022

Our world is more connected than ever before. Sadly, however, this highly connected world has made it easier to bully, insult, and propagate hate speech in cyberspace. Even though researchers and companies alike have started investigating this real-world problem, the question remains as to why users are increasingly being exposed to hate and discrimination online. In fact, the noticeable and persistent increase in harmful language on social media platforms indicates that the situation is, actually, only getting worse. Hence, in this work, we show that contemporary ML methods can help tackle this challenge in an accurate and cost-effective manner. Our experiments demonstrate that a universal approach combining transfer learning methods and state-of-the-art Transformer architectures can trigger the efficient development of toxic language detection models. Consequently, with this universal approach, we provide platform providers with a simplistic approach capable of enabling the automated moderation of user-generated content, and as a result, hope to contribute to making the web a safer place.

Towards a Reliable & Transparent Approach to Data-Driven Brand Valuation

M. Caron, C. Bartelheimer, O. Müller, in: Proceedings of the 28th Americas Conference on Information Systems (AMCIS), 2022

Now accounting for more than 80% of a firm's worth, brands have become essential assets for modern organizations. However, methods and techniques for the monetary valuation of brands are still under-researched. Hence, the objective of this study is to evaluate the utility of explanatory statistical models and machine learning approaches for explaining and predicting brand value. Drawing upon the case of the most valuable English football brands during the 2016/17 to 2020/21 seasons, we demonstrate how to operationalize Aaker's (1991) theoretical brand equity framework to collect meaningful qualitative and quantitative feature sets. Our explanatory models can explain up to 77% of the variation in brand valuations across all clubs and seasons, while our predictive approach can predict out-of-sample observations with a mean absolute percentage error (MAPE) of 14%. Future research can build upon our results to develop domain-specific brand valuation methods while enabling managers to make better-informed investment decisions.


PIVOT: A Parsimonious End-to-End Learning Framework for Valuing Player Actions in Handball using Tracking Data

O. Müller, M. Caron, M. Döring, T. Heuwinkel, J. Baumeister, in: 8th Workshop on Machine Learning and Data Mining for Sports Analytics (ECML PKDD 2021), 2021

Over the last years, several approaches for the data-driven estimation of expected possession value (EPV) in basketball and association football (soccer) have been proposed. In this paper, we develop and evaluate PIVOT: the first such framework for team handball. Accounting for the fast-paced, dynamic nature and relative data scarcity of handball, we propose a parsimonious end-to-end deep learning architecture that relies solely on tracking data. This efficient approach is capable of predicting the probability that a team will score within the near future given the fine-grained spatio-temporal distribution of all players and the ball over the last seconds of the game. Our experiments indicate that PIVOT is able to produce accurate and calibrated probability estimates, even when trained on a relatively small dataset. We also showcase two interactive applications of PIVOT for valuing actual and counterfactual player decisions and actions in real-time.

To the Moon! Analyzing the Community of “Degenerates” Engaged in the Surge of the GME Stock

M. Caron, M. Gulenko, O. Müller, in: 42nd International Conference on Information Systems (ICIS 2021), 2021

In early 2021, the finance world was taken by storm by the dramatic price surge of the GameStop Corp. stock. This rise is being, at least in part, attributed to a group of Redditors belonging to the now-famous r/wallstreetbets (WSB) subreddit group. In this work, we set out to address if user activity on the WSB subreddit is associated with the trading volume of the GME stock. Leveraging a unique dataset containing more than 4.9 million WSB posts and comments, we assert that user activity is associated with the trading volume of the GameStop stock. We further show that posts have a significantly higher predictive power than comments and are especially helpful for predicting unusually high trading volume. Lastly, as recent events have shown, we believe that these findings have implications for retail and institutional investors, trading platforms, and policymakers, as these can have disruptive potential.


Hardening Soft Information: A Transformer-Based Approach to Forecasting Stock Return Volatility

M. Caron, O. Müller, in: 2020 IEEE International Conference on Big Data (IEEE BigData 2020), 2020, pp. 4383-4391

Historically, the field of financial forecasting almost exclusively relied on so-called hard information – i.e., numerical data with well-defined and unambiguous meaning. Over the last few decades, however, researchers and practitioners alike have, following the advances in natural language understanding, started recognizing the benefits of integrating soft information into financial modelling. In line with the above, this paper examines whether contemporary attention-based sequence-to-sequence models, known as Transformers, can help improve stock return volatility prediction when applied to corporate annual reports. Using a publicly available benchmark dataset, we show, in an empirical analysis, that out-of-the-box Transformer models have the ability to outmatch current state-of-the-art results and, more importantly, that our proposed feature-based Transformer approach can outperform a robust numerical baseline. To the best of our knowledge, this is the first empirical study focusing on stock return volatility prediction (1) to ever experiment with state-of-the-art Transformer architectures and (2) to demonstrate that a model based solely on soft information can surpass its numerical counterpart. Furthermore, we show that by including an additional numerical feature into our best text-only model, we can push the performance of our model even further, suggesting that soft and hard information contain different predictive signals.
