Academic Work
Learning from Implicit User Feedback, Emotions and Demographic Information in Task-Oriented and Document-Grounded Dialogues
Dominic Petrak, Thy Thy Tran, Iryna Gurevych. 2024. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 4573--4603, Miami, Florida, USA. Association for Computational Linguistics.
Implicit user feedback, user emotions, and demographic information have been shown to be promising sources for improving the accuracy and user engagement of responses generated by dialogue systems. However, the influence of such information on task completion and factual consistency, two important criteria for task-oriented and document-grounded dialogues, is not yet known. To address this, we introduce FEDI, the first English task-oriented and document-grounded dialogue dataset annotated with this information. Our experiments with Flan-T5, GPT-2, and Llama 2 show a particularly positive impact on task completion and factual consistency. Participants in our human evaluation reported that the responses generated by the feedback-trained models were more informative (Flan-T5 and GPT-2), and more relevant and factually consistent (Llama 2).
Systematic analysis of requirements for socially acceptable service robots
Andrea Ruo, Simone Arreghini, Luca Capra, Rosario De Chiara, Valeria Di Pasquale, Alessandro Giusti, Cristina Iani, Antonio Paolillo, Dominic Petrak, Alexander Plaum, Megha Quamara, Lorenzo Sabattini, Viktor Schmuck, Paolo Servillo, Francesco Zurolo, Valeria Villani. 2024. Preprint, arXiv:2409.08677
In modern society, service robots are increasingly recognized for their wide range of practical applications. In large and crowded social spaces, such as museums and hospitals, these robots are required to move safely through the environment while exhibiting user-friendly behavior. Ensuring the safe and socially acceptable operation of robots in such settings presents several challenges. To enhance social acceptance in the design process of service robots, we present a systematic analysis of requirements, categorized into functional and non-functional. These requirements are further classified into different categories, and a single requirement may belong to multiple categories. Finally, considering the specific case of a receptionist robotic agent, we discuss the requirements it should possess to ensure social acceptance.
Learning From Free-Text Human Feedback – Collect New Datasets Or Extend Existing Ones?
Dominic Petrak, Nafise Moosavi, Ye Tian, Nikolai Rozanov, Iryna Gurevych. 2023. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 16259–16279, Singapore. Association for Computational Linguistics.
Continuous learning from free-text human feedback, such as error corrections, new knowledge, or alternative responses, is essential for today’s chatbots and virtual assistants to stay up-to-date, engaging, and socially acceptable. However, annotated data for research on methods for learning from such feedback is scarce. To address this, we examine the error and user response types of six popular dialogue datasets of various types, including MultiWoZ, PersonaChat, Wizards-of-Wikipedia, and others, to assess their extendibility with the needed annotations. For this corpus study, we manually annotate a subset of each dataset with error and user response types, using an improved version of the Integrated Error Taxonomy and a newly proposed user response type taxonomy. We provide the resulting dataset (EURTAD) to the community. Our findings provide new insights into dataset composition, including error types, user response types, and the relations between them.
Lessons Learned from a Citizen Science Project for Natural Language Processing
Jan-Christoph Klie, Ji-Ung Lee, Kevin Stowe, Gözde Şahin, Nafise Sadat Moosavi, Luke Bates, Dominic Petrak, Richard Eckart De Castilho, Iryna Gurevych. 2023. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 3594–3608, Dubrovnik, Croatia. Association for Computational Linguistics.
Many Natural Language Processing (NLP) systems use annotated corpora for training and evaluation. However, labeled data is often costly to obtain and scaling annotation projects is difficult, which is why annotation tasks are often outsourced to paid crowdworkers. Citizen Science is an alternative to crowdsourcing that is relatively unexplored in the context of NLP. To investigate whether and how well Citizen Science can be applied in this setting, we conduct an exploratory study into engaging different groups of volunteers in Citizen Science for NLP by re-annotating parts of a pre-existing crowdsourced dataset. Our results show that this can yield high-quality annotations and attract motivated volunteers, but also requires considering factors such as scalability, participation over time, and legal and ethical issues. We summarize lessons learned in the form of guidelines and provide our code and data to aid future work on Citizen Science.
Arithmetic-Based Pretraining - Improving Numeracy of Pretrained Language Models
Dominic Petrak, Nafise Sadat Moosavi, Iryna Gurevych. 2023. In Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023), pages 477–493, Toronto, Canada. Association for Computational Linguistics.
State-of-the-art pretrained language models tend to perform below their capabilities when applied out-of-the-box to tasks that require understanding and working with numbers (usually referred to as numeracy). Recent work suggests two main reasons for this: (1) popular tokenisation algorithms have limited expressiveness for numbers, and (2) common pretraining objectives do not target numeracy. Approaches that address these shortcomings usually require architectural changes or pretraining from scratch. In this paper, we propose a new extended pretraining approach, called Arithmetic-Based Pretraining, that jointly addresses both shortcomings in a single extended pretraining step without requiring architectural changes or pretraining from scratch. Arithmetic-Based Pretraining combines contrastive learning, to improve number representations, with a novel extended pretraining objective, the Inferable Number Prediction Task, to improve numeracy. Our experiments show the effectiveness of Arithmetic-Based Pretraining on three different tasks that require improved numeracy, i.e., reading comprehension in the DROP dataset, inference-on-tables in the InfoTabs dataset, and table-to-text generation in the WikiBio and SciGen datasets.
Relations Extraction using Indicators (Master Thesis)
Dominic Petrak. 2021. RheinMain University of Applied Sciences, Wiesbaden, Germany.
Relations between entities are key to understanding the semantic context in natural language and, therefore, to popular tasks like question answering, information extraction, and knowledge graph generation. State-of-the-art approaches for machine learning-based relation classification encode the entire sentence using pretrained transformer models, without further consideration of syntactic indicators, such as certain phrases, words, or prepositions, that are more informative than other words and may be beneficial for identifying semantic relations. In this thesis, the effect of additionally using such indicators for relation extraction is investigated.
Semantic Code Search with Neural Bag-of-Words and Graph Convolutional Networks
Anna Abad Sieper, Omar Amarkhel, Savina Diez, Dominic Petrak. 2020. SKILL Student Conference @ INFORMATIK 2020 (awarded Best Paper).
An approach to semantic code search. We investigated two ideas for retrieving the code that best matches a natural language query. The first was to extend a neural Bag-of-Words encoder with TF-IDF weighting. The second was to additionally exploit call hierarchies, using a Graph Convolutional Network trained on the corresponding caller graphs. The Java and Python datasets from GitHub's CodeSearchNet challenge served as the data basis; call hierarchies were added to the Java dataset.
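The TF-IDF weighting idea can be illustrated with a minimal sketch (assumed details, not the paper's implementation): each word vector in the bag-of-words pooling is scaled by its TF-IDF score, so corpus-wide frequent words contribute less to the final query or code representation. Toy deterministic "embeddings" stand in for learned ones here.

```python
import math
from collections import Counter

# Toy corpus of tokenized natural language queries (illustrative only).
corpus = [
    "sort a list of integers".split(),
    "read a file into a string".split(),
    "sort strings by length".split(),
]

# Document frequency of each word across the corpus.
df = Counter(w for doc in corpus for w in set(doc))
n_docs = len(corpus)

def tfidf(doc):
    """TF-IDF weight per word in one document."""
    tf = Counter(doc)
    return {w: (tf[w] / len(doc)) * math.log(n_docs / df[w]) for w in tf}

def embed(word, dim=4):
    """Stand-in for a learned word embedding: deterministic toy vector."""
    s = sum(map(ord, word))
    return [((s * (i + 1)) % 7) / 7.0 for i in range(dim)]

def encode(doc):
    """TF-IDF-weighted bag-of-words pooling of the word vectors."""
    wts = tfidf(doc)
    vec = [0.0] * 4
    for w in doc:
        for i, x in enumerate(embed(w)):
            vec[i] += wts[w] * x
    return vec

q = tfidf(corpus[0])
# "sort" occurs in two of the three documents, so it is down-weighted
# relative to the rarer, more discriminative word "integers".
assert q["integers"] > q["sort"]
```

In a retrieval setting, queries and code snippets would both be encoded this way and ranked by vector similarity; the weighting simply biases the pooled representation toward distinctive terms.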
Bug Localization (Bachelor Thesis)
Dominic Petrak. 2019. RheinMain University of Applied Sciences, Wiesbaden, Germany.
Localizing faulty source code based on human-written bug reports (static bug localization) has long been a subject of research in information retrieval and machine learning. In 2018, Bench4BL, a benchmark dataset and evaluation framework for this task, was proposed, and its authors used it to compare state-of-the-art approaches. In this thesis, those approaches were studied and their most promising ideas combined. The resulting approach was trained and evaluated using Bench4BL, and the results were compared against the original approaches.
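A common information-retrieval baseline for static bug localization, sketched below under assumed details (this is not the thesis's combined approach), ranks source files by the cosine similarity of their TF-IDF vectors against the bug report:

```python
import math
from collections import Counter

# Toy "source files", each reduced to a bag of identifier/comment words.
files = {
    "Parser.java": "parse token stream syntax error recover",
    "Cache.java": "cache eviction least recently used entry",
    "Net.java": "socket connect timeout retry network error",
}
report = "crash with syntax error while parsing token stream"

docs = {name: text.split() for name, text in files.items()}
df = Counter(w for words in docs.values() for w in set(words))
n = len(docs)

def tfidf(words):
    """Smoothed TF-IDF vector (word -> weight) for one bag of words."""
    tf = Counter(words)
    return {w: tf[w] * math.log((1 + n) / (1 + df.get(w, 0))) for w in tf}

def cosine(a, b):
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

query = tfidf(report.split())
ranking = sorted(docs, key=lambda f: cosine(query, tfidf(docs[f])),
                 reverse=True)
# Parser.java shares "syntax", "error", "token", and "stream" with the
# report, so it is ranked first as the most likely faulty file.
```

Learning-based approaches typically start from a lexical ranker like this and add features such as code structure, version history, or previously fixed bugs.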