Accepted Papers

Accepted Papers and Short Papers

Session 1: Business Process Modelling

Daniel Schuster, Sebastiaan J. van Zelst and Wil M.P. van der Aalst
Process discovery aims to learn a process model from observed process behavior. From a user’s perspective, most discovery algorithms work like a black box. Besides parameter tuning, there is no interaction between the user and the algorithm. Interactive process discovery allows the user to exploit domain knowledge and to guide the discovery process. Previously, an incremental discovery approach has been introduced where a model, considered to be “under construction”, gets incrementally extended by user-selected process behavior. This paper introduces a novel approach that additionally allows the user to freeze model parts within the model under construction. Frozen sub-models are not altered by the incremental approach when new behavior is added to the model. The user can thus steer the discovery algorithm. Our experiments show that freezing sub-models can lead to higher quality models.
Cinzia Cappiello, Barbara Pernici and Monica Vitali
Information from social media can be leveraged by social scientists to support effective decision making. However, such data sources are often characterised by high volumes and noisy information, therefore data analysis should always be preceded by a data preparation phase. Designing and testing data preparation pipelines requires considering requirements on cost, time, and quality of data extraction. In this work, we propose a methodology for modeling crowd-enhanced data analysis pipelines using a goal-oriented approach, including both automatic and human-related tasks, by suggesting the kind of components to include, their order, and their parameters, while balancing the trade-off between cost, time, and quality of the results.
Johannes De Smedt, Anton Yeshchenko, Artem Polyvyanyy, Jochen De Weerdt and Jan Mendling
Process analytics is an umbrella of data-driven techniques which includes making predictions for individual process instances or overall process models. At the instance level, various novel techniques have been recently devised, tackling next activity, remaining time, and outcome prediction. At the model level, there is a notable void. It is the ambition of this paper to fill this gap. To this end, we develop a technique to forecast the entire process model from historical event data. A forecasted model is a will-be process model representing a probable future state of the overall process. Such a forecast helps to investigate the consequences of drift and emerging bottlenecks. Our technique builds on a representation of event data as multiple time series, each capturing the evolution of a behavioural aspect of the process model, such that corresponding forecasting techniques can be applied. Our implementation demonstrates the accuracy of our technique on real-world event log data.
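The core idea — representing event data as one time series per behavioural aspect of the process model and applying a forecaster to each — can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the directly-follows counting, the window indexing, and the moving-average forecaster are all simplifying assumptions.

```python
from collections import defaultdict

def directly_follows_series(log, num_windows):
    """Count each directly-follows pair of activities per time window.

    `log` is a list of (window_index, trace) pairs, where a trace is a
    list of activity names. Returns {pair: [count per window]}.
    """
    series = defaultdict(lambda: [0] * num_windows)
    for window, trace in log:
        for a, b in zip(trace, trace[1:]):
            series[(a, b)][window] += 1
    return dict(series)

def forecast_next(series, span=2):
    """Naive forecast: mean of the last `span` windows per pair."""
    return {pair: sum(counts[-span:]) / span for pair, counts in series.items()}

log = [
    (0, ["register", "check", "pay"]),
    (0, ["register", "pay"]),
    (1, ["register", "check", "pay"]),
    (1, ["register", "check", "pay"]),
]
series = directly_follows_series(log, num_windows=2)
forecast = forecast_next(series)
```

A forecasted process model would then be assembled from the pairs whose forecast exceeds some threshold; real forecasters (e.g. exponential smoothing) would replace the moving average.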
Anna Kalenkova, Artem Polyvyanyy and Marcello La Rosa
Process models automatically discovered from event logs represent business process behavior in a compact graphical way. To compare process variants, e.g., to explore how the system’s behavior changes over time or between customer segments, analysts tend to visually compare conceptual process models discovered from different “slices” of the event log, solely relying on the structure of these models. However, the structural distance between two process models does not always reflect the behavioral distance between the underlying event logs and thus structural comparison should be applied with care. This paper aims to investigate relations between structural and behavioral process distances and explain when structural distance between two discovered process models can be used to assess the behavioral distance between the corresponding event logs.
Pavani Vemuri, Yves Wautelet, Stephan Poelmans, Simon Verwimp and Samedi Heng
Business process modeling (BPMo) is of primary importance for assessing the current state of an organization's practices to discover inefficiencies, redesign business processes, and build software solutions. High-quality representations best capture the true nature of the organization. This paper investigates the hypothesis that Business Process Modeling Notation (BPMN) Business Process Diagrams (BPDs) created through a Top-Down Modeling Approach (TDMA) are of higher quality than those made from an operational perspective only. An experiment was conducted in which novice modelers modeled a case based on a textual description. The test group used the TDMA by first modeling strategic and tactical aspects using a Business Use-Case Model (BUCM) before the operational realization with BPMN BPDs. In contrast, the control group did not use the BUCM. Representations were then evaluated for overall semantic and syntactic quality by extracting metrics from known literature. Both groups have similar syntactic quality at a granular level. Nevertheless, BPMN BPDs created using TDMA are more complete: tasks required for process execution are significantly more often present. An increase in completeness can be beneficial in understanding complex organizations and can facilitate modular software development. However, the diagrams were also significantly more complex, with more linearly independent paths within workflows than needed.

Session 2: Data Modelling 1

Pol Benats, Maxime Gobert, Loup Meurice, Csaba Nagy and Anthony Cleve
Managing data-intensive systems has long been recognized as an expensive and error-prone process. This is mainly due to the often implicit consistency relationships that hold between applications and their database. As new technologies emerged for specialized purposes (e.g., graph databases, document stores), the joint use of database models has also become popular. There are undeniable benefits of such multi-database models where developers combine various technologies. However, the side effects on design, querying, and maintenance are not well-known yet. In this paper, we study multi-database models in software systems by mining major open-source repositories. We consider four years of history, from 2017 to 2020, of a total number of 40,609 projects with databases. Our results confirm the emergence of hybrid data-intensive systems as we found (multi-)database models (e.g., relational and non-relational) used together in 16% of all database-dependent projects. One percent of the systems added, deleted, or changed a database during the four years. The majority (62%) of these systems had a single database before becoming hybrid, and another significant part (19%) became “mono-database” after initially using multiple databases. We examine the evolution of these systems to understand the rationale of the design choices of the developers. Our study aims to guide future research towards new challenges posed by those emerging data management architectures.
Mohamed-Amine Bazizi, Dario Colazzo, Giorgio Ghelli, Carlo Sartiani and Stefanie Scherzinger
We study the usage of negation in JSON Schema data modeling. Negation is a logical operator rarely present in type systems and schema description languages, since it complicates decision problems: many software tools, but also formal frameworks for working with JSON Schema, do not fully support negation. This motivates us to study whether negation is actually used in practice, for which purposes, and whether it could — in principle — be replaced by simpler operators. We have collected a large corpus of 80k open source JSON Schema documents from GitHub. We perform a systematic analysis, quantify usage patterns of negation, and also qualitatively analyze schemas. We show that negation is indeed used, albeit infrequently, following a stable set of patterns.
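One of the questions the study raises — whether negation could be replaced by simpler operators — is easy to illustrate. The schema and helper below are invented for illustration and are not drawn from the paper's corpus; they show a `not`/`required` pattern that forbids a property, a use of negation that is in principle expressible without `not` ("these properties must be absent").

```python
# A schema using "not" to forbid a property (illustrative example).
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "not": {"required": ["deprecated_field"]},
}

def violates_not_required(doc, schema):
    """Check only the `"not": {"required": [...]}` pattern above.

    A document matches the negated subschema — and thus fails overall
    validation — exactly when it contains any forbidden property.
    """
    forbidden = schema.get("not", {}).get("required", [])
    return any(key in doc for key in forbidden)

ok = {"name": "sensor-config"}
bad = {"name": "sensor-config", "deprecated_field": True}
```

A full validator (e.g. the `jsonschema` package) would handle arbitrary nested negation; this helper covers only the single pattern shown.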
Maxime Gobert, Loup Meurice and Anthony Cleve
An increasing number of organisations rely on NoSQL technologies to manage their mission-critical data. However, those technologies were not intended to replace relational database management systems, but rather to complement them. Hence the recent emergence of heterogeneous database architectures, commonly called hybrid polystores, that rely on a combination of several, possibly overlapping relational and NoSQL databases. Unfortunately, there is still a lack of models, methods and tools for data modeling and manipulation in such architectures. With the aim to fill this gap, we introduce HyDRa, a conceptual framework to design and manipulate hybrid polystores. We present the HyDRa textual modeling language allowing one to specify (1) the conceptual schema of a polystore, (2) the physical schemas of each of its databases, and (3) a set of mapping rules to express possibly complex correspondences between the conceptual schema elements and the physical databases.
Ronaldo dos Santos Mello, Geomar Andre Schreiner, Cristian Alexandre Alchini, Gustavo Gonçalves dos Santos, Vania Bogorny and Chiara Renso
Trajectories of moving objects are usually modeled as sequences of space-time points or, in the case of semantic trajectories, as labelled stops and moves. Data analytics methods on these kinds of trajectories tend to discover geometrical and temporal patterns, or simple semantic patterns based on the labels of stops and moves. A recent extension of semantic trajectories is called multiple aspects trajectory, i.e., a trajectory associated with different semantic dimensions called aspects. This kind of trajectory greatly increases the number of discovered patterns. This paper introduces the concept of dependency rule to represent patterns discovered from the analysis of trajectories with multiple aspects. They include patterns related to a trajectory, trajectory points, or the moving object. These rules are conceptually represented as an extension of a conceptual model for multiple aspects trajectories. A case study shows that our proposal is relevant as it represents the discovered rules with a concise but expressive conceptual model. Additionally, a performance evaluation shows the feasibility of our conceptual model designed over relational-based database management technologies.

Session 3: Data Modelling 2

Gustavo L. Guidoni, Joao Paulo A. Almeida and Giancarlo Guizzardi
Forward engineering relational schemas based on conceptual models is an established practice, with a number of commercial tools available and widely used in production settings. These tools employ automated transformation strategies to address the gap between the primitives offered by conceptual modeling languages (such as ER and UML) and the relational model. Despite the various benefits of automated transformation, once a database schema is obtained, data access is usually undertaken by relying on the resulting schema, at a level of abstraction lower than that of the source conceptual model. Data access then requires both domain knowledge and comprehension of the (non-trivial) technical choices embodied in the resulting schema. We address this problem by forward engineering not only a relational schema, but also creating an ontology-based data access mapping for the resulting schema. This mapping is used to expose data in terms of the original conceptual model, and hence queries can be written at a high level of abstraction, independently of the transformation strategy selected.
Andrea Hillenbrand, Stefanie Scherzinger and Uta Störl
During the development of NoSQL-backed software, the database schema evolves naturally alongside the application code. Especially in agile development, new application releases are deployed frequently. Eventually, decisions have to be made regarding the migration of versioned legacy data which is persisted in the cloud-hosted production database. We address this schema evolution problem and present results by means of which software project stakeholders can manage the operative costs for schema evolution and adapt their software release strategy accordingly in order to comply with service-level agreements regarding the competing metrics of migration costs and latency. We clarify conclusively how schema evolution in NoSQL databases impacts these metrics while taking all relevant characteristics of migration scenarios into account. As calculating all combinatorics in the search space of migration scenarios by far exceeds computational means, we use a probabilistic Monte Carlo method of repeated sampling, serving as a well-established method to bring the complexity of schema evolution under control.
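The Monte Carlo idea — repeatedly sampling migration scenarios instead of enumerating all combinatorics — can be sketched as follows. The scenario model (eager vs. lazy migration, a fixed access rate, unit cost per migrated entity) is a deliberately crude assumption for illustration, not the paper's cost model.

```python
import random

def simulate_scenario(rng, num_entities=1000, access_rate=0.3,
                      eager=True, cost_per_migration=1.0):
    """Simulated cost of migrating legacy data after one schema change.

    Eager migration pays for every entity up front; lazy migration pays
    only for entities that are actually accessed (at the price of extra
    latency on first access, not modelled here).
    """
    if eager:
        return num_entities * cost_per_migration
    accessed = sum(1 for _ in range(num_entities) if rng.random() < access_rate)
    return accessed * cost_per_migration

def monte_carlo_cost(eager, runs=2000, seed=42):
    """Average migration cost over repeated sampled scenarios."""
    rng = random.Random(seed)
    total = sum(simulate_scenario(rng, eager=eager) for _ in range(runs))
    return total / runs

eager_cost = monte_carlo_cost(eager=True)   # deterministic: all entities
lazy_cost = monte_carlo_cost(eager=False)   # roughly access_rate * entities
```

A stakeholder could sweep `access_rate` or the release frequency to see where lazy migration stops paying off against its latency penalty.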
Qiao Gao, Mong Li Lee and Tok Wang Ling
Temporal keyword search enables non-expert users to query temporal relational databases with time conditions. However, aggregation and group-by are currently not supported in temporal keyword search, which hinders users from querying statistical information in temporal databases. Simply combining non-temporal keyword search with aggregate, group-by, and temporal aggregate operators may lead to incorrect and meaningless aggregate results as a result of data duplication over time periods. This work develops a framework to process temporal keyword search with aggregate functions, group-by, and time conditions. Our framework utilizes Object-Relationship-Attribute (ORA) semantics to identify duplicate information of objects, relationships, and their attribute values in the intermediate relation, which would otherwise lead to incorrect aggregate results. We also consider the time periods in which temporal attributes occur when computing aggregates, so as to return meaningful results. Experiment results demonstrate the importance of these steps in retrieving correct results for keyword queries over temporal databases.
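The duplication problem the abstract describes can be made concrete with a toy intermediate relation: a join repeats an employee's salary once per time period, so a naive SUM double-counts it. The rows and the deduplication key are illustrative assumptions; in the paper, the object identity used for deduplication is derived from ORA semantics.

```python
# Intermediate relation after joining employee data with a temporal
# relation: e1's salary appears once per time period (illustrative rows).
rows = [
    {"emp": "e1", "salary": 5000, "period": ("2020-01", "2020-06")},
    {"emp": "e1", "salary": 5000, "period": ("2020-07", "2020-12")},
    {"emp": "e2", "salary": 4000, "period": ("2020-01", "2020-12")},
]

# A naive SUM over the intermediate relation counts e1's salary twice.
naive_sum = sum(r["salary"] for r in rows)

# Deduplicating on the object's identity before aggregating gives the
# correct total salary.
by_emp = {r["emp"]: r["salary"] for r in rows}
dedup_sum = sum(by_emp.values())
```

Time-varying attributes (e.g. a salary that changes mid-year) would additionally need the period-aware aggregation the paper describes, which this sketch omits.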
Alberto Hernández Chillón, Diego Sevilla and Jesus Garcia-Molina
The emergence of NoSQL databases and polyglot persistence demands that classical research topics be addressed in the context of new data models and database systems. Schema evolution is a crucial aspect of database management to which limited attention has been paid for NoSQL systems. The definition of a taxonomy of changes is a central issue in the design of any schema evolution approach. Previously proposed taxonomies of changes for NoSQL databases have considered simple data models, which significantly reduces the set of considered schema change operations. In this paper, we present a unified logical data model that includes aggregation and reference relationships, and takes into account the structural variations that can occur in schemaless NoSQL stores. For this data model, we introduce a new taxonomy of changes with operations not considered in the existing taxonomies proposed for NoSQL. We present a schema definition language used to create schemas that conform to the generic data model, and a database-independent language created to implement this taxonomy of changes. We show how this language can be used to automatically generate evolution scripts for a set of NoSQL stores, and validate it on a case study with a real dataset.
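Two kinds of schema change operations such a taxonomy typically covers — renaming a property and aggregating flat properties into an embedded object — might look as follows when applied to schemaless documents with structural variations. Function names and document layout are illustrative assumptions, not the paper's language.

```python
def rename_property(docs, entity, old, new):
    """Schema change operation: rename a property across all documents
    of an entity type, tolerating structural variations."""
    for doc in docs:
        if doc["_type"] == entity and old in doc:
            doc[new] = doc.pop(old)
    return docs

def aggregate_properties(docs, entity, props, into):
    """Schema change operation: group flat properties into an embedded
    (aggregated) object."""
    for doc in docs:
        if doc["_type"] == entity:
            doc[into] = {p: doc.pop(p) for p in props if p in doc}
    return docs

docs = [
    {"_type": "User", "name": "ada", "street": "Main St", "zip": "1010"},
    {"_type": "User", "name": "bob"},  # structural variation: no address data
]
rename_property(docs, "User", "zip", "postcode")
aggregate_properties(docs, "User", ["street", "postcode"], "address")
```

A database-independent evolution language would compile such operations into store-specific scripts (e.g. MongoDB update pipelines); here they run directly on in-memory documents.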

Session 4: Goals and Requirements

Xavier Franch and Marcela Ruiz
Most computer science curricula include a compulsory course on data structures. Students are prone to memorise facts about data structures instead of understanding the essence of underlying concepts. This can be explained by the fact that learning the basics of each data structure, the difference with each other, and the adequacy of each of them to the most appropriate context of use, is far from trivial. This paper explores the idea of providing adequate levels of abstractions to describe data structures from an intentional point of view. Our hypothesis is that adopting a goal-oriented perspective could emphasise the main goals of each data structure, its qualities, and its relationships with the potential context of use. Following this hypothesis, in this paper we present the use of iStar2.0 to teach and understand data structures. We conducted a comparative quasi-experiment with undergraduate students to evaluate the effectiveness of the approach. Significant results show the great potential of goal modeling for teaching technical courses like data structures. We conclude this paper by reflecting on further teaching and conceptual modeling research to be conducted in this field.
Maxim Bragilovski, Yifat Makias, Moran Shimshila, Roni Stern and Arnon Sturm
As knowledge increases tremendously each and every day, there is a need for means to manage and organize it, so as to utilize it when needed. For example, for finding solutions to technical/engineering problems. An alternative for achieving this goal is through knowledge mapping that aims at indexing the knowledge. Nevertheless, searching for knowledge in such maps is still a challenge. In this paper, we propose an algorithm for knowledge searching over maps created by ME-MAP, a mapping approach we developed. The algorithm is a greedy one that aims at maximizing the similarity between a query and existing knowledge encapsulated in ME-maps. We evaluate the efficiency of the algorithm in comparison to an expert judgment. The evaluation indicates that the algorithm achieved high performance within a bounded time. Though additional examination is required, the proposed algorithm can be easily adapted to other modeling languages for searching models.
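A greedy query-over-map matching of the kind described can be sketched as follows. The word-overlap similarity and the node labels are placeholders, not the paper's actual measure over ME-maps.

```python
def similarity(a, b):
    """Toy node similarity: Jaccard overlap of label words (a stand-in
    for whatever similarity measure the search actually uses)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def greedy_search(query_terms, map_nodes):
    """Greedily match each query term to the best still-unused map node,
    locally maximizing similarity rather than solving globally."""
    matches, used = {}, set()
    for term in query_terms:
        best, best_score = None, 0.0
        for node in map_nodes:
            if node in used:
                continue
            score = similarity(term, node)
            if score > best_score:
                best, best_score = node, score
        if best is not None:
            matches[term] = best
            used.add(best)
    return matches

nodes = ["process mining", "schema evolution", "goal modeling"]
result = greedy_search(["goal oriented modeling", "mining"], nodes)
```

Greediness keeps the search within a bounded time at the cost of possibly missing a globally better assignment.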
Glenda Amaral, Renata Guizzardi, Giancarlo Guizzardi and John Mylopoulos
The advent of socio-technical, cyber-physical and artificial intelligence systems has broadened the scope of requirements engineering, which must now deal with new classes of requirements, concerning ethics, privacy and trust. This brings new challenges to Requirements Engineering, in particular regarding the understanding of the non-functional requirements behind these new types of systems. To address this issue, we propose the Ontology-based Requirements Engineering (ObRE) method, which aims to systematize the elicitation and analysis of requirements, by using an ontology to conceptually clarify the meaning of a class of requirements, such as privacy, ethicality and trustworthiness. We illustrate the working of ObRE by applying it to a real case study concerning trustworthiness requirements.
Wilco Engelsman, Jaap Gordijn, Timber Haaker, Marten van Sinderen and Roel Wieringa
For many companies, information and communication technology (ICT) is an essential part of the value proposition. Netflix and Spotify would not have been possible without internet technology. Business model up-scaling often requires a different ICT architecture, because an up-scaled business model imposes different performance requirements. The new architecture needs investments, has different operational expenses than the old architecture, and therefore requires recalculation of the business model. Investment decisions, in turn, are guided by performance requirements. There are currently no methods to align a quantified business value model of a company with performance requirements on the enterprise architecture. In this paper, we first show how to derive performance requirements on an enterprise architecture (EA) specified in ArchiMate from a quantification of a business model specified in e3value. Second, we show how to aggregate investments and expenses from an ArchiMate model and insert them into an e3value model.

Session 5: Modelling the IoT

Fabian Becker, Pascal Bibow, Manuela Dalibor, Aymen Gannouni, Viviane Hahn, Christian Hopmann, Matthias Jarke, István Koren, Moritz Kröger, Johannes Lipp, Judith Maibaum, Judith Michael, Bernhard Rumpe, Patrick Sapel, Niklas Schäfer, Georg J. Schmitz, Günther Schuh and Andreas Wortmann
Smart manufacturing demands that data be processed in domain-specific real-time. Engineering models created for constructing, commissioning, planning, or simulating manufacturing systems can facilitate aggregating and abstracting the wealth of manufacturing data into faster processable data structures for more timely decision making. Current research lacks conceptual foundations for how data and engineering models can be exploited in an integrated way to achieve this. Such research demands expertise from different smart manufacturing domains to harmonize the notion space. We propose a conceptual model to describe digital shadows, data structures tailored to exploit models and data in smart manufacturing, through a metamodel and its notion space. This conceptual model was established through interdisciplinary research in the German excellence cluster "Internet of Production" and evaluated in various real-world manufacturing scenarios. This shared foundation helps to manage complexity, enables automated analyses and syntheses, and, ultimately, facilitates cross-domain collaboration.
Dietrich Steinmetz, Sven Hartmann and Hui Ma
The rapid advancements in autonomous driving enable the formation of vehicle groups with small distances between vehicles, known as platooning. This technology has attracted research interest as it offers great benefits for future transportation, e.g., fuel economy, reduced CO2 emissions, increased road capacity and improved road safety. Previous works lack unified concepts and are therefore incompatible with each other. This work provides a conceptual model and operations for unified planning of platooning routes. Specifically, this work provides concepts for routing, scheduling, and platoon lifecycles.
Marc Vila, Maria-Ribera Sancho, Ernest Teniente and Xavier Vilajosana
There are a large number of Internet of Things (IoT) devices that transmit information over the Internet, each with a different data format to denote the same semantic concept. This often leads to data incompatibilities and makes it difficult to extract the knowledge underlying that data. The only way to close this gap is to establish a common vocabulary in order to achieve interoperability between different sources and semantic data integration. This is the main goal of our proposal: to specify a general semantics for IoT sensing that allows the management of data between gateways, nodes, and sensors in a homogeneous way. Our proposal builds upon the joint Semantic Sensor Network and Sensor-Observation-Sample-Actuator (SSN/SOSA) ontology.
Maximilian Völker and Mathias Weske
Promising automation of repetitive tasks and release of manpower, Robotic Process Automation (RPA) continues to be a fast-growing market in the IT industry. The industry-driven advancement also comes with disadvantages. Each vendor established their own terminology and ecosystem, impeding communication, integration, and comparisons between RPA systems. In consequence, terminology and concepts are heterogeneous and not well understood. As a result, the scientific exchange lacks a consistent vocabulary. This paper proposes a vendor-independent conceptualization of RPA robots, their constituent parts and the relationships between those. It aims at providing a conceptual foundation for the further scientific investigation of RPA robots.

Session 6: Ontologies

Ivars Blums and Hans Weigand
In the last decade, several UFO-grounded economic exchange ontologies have been developed, notably COFRIS, OntoREA, REA2, and ATE. It is time to take the next step in the direction of corporate reporting standard setters, for which an ontological approach is of high potential value. In this paper, we first describe the foundational assumptions for exchange conceptualization and consolidate the latest developments in COFRIS - a core ontology for financial reporting information systems, within the most recent versions of the UFO theories and the OntoUML tool. We then confront COFRIS with the conceptual framework and standards for accounting and financial reporting and compare it with other UFO-grounded exchange ontologies.
Sotirios Liaskos, John Mylopoulos and Shakil M. Khan
Developing and representing conceptualizations is a critical element of conceptual modeling language design. Designers choose a set of suitable concepts for describing a domain and make them the core of the language using suggestive terms that convey their meaning to language users. Additional documentation and training material, such as examples and guides, aims at ensuring that the chosen terms indeed evoke the concepts designers intended. However, there is no guarantee that language designers and users will eventually understand the correspondence between terms and concepts in the same way. This paper proposes a framework for empirically evaluating the vocabulary appropriateness of modeling languages and characterizing its absence in terms of established language design issues. The framework is based on the definition of a set of abstract empirical constructs that can be operationalized into different concrete measures, depending on study requirements and experimental design choices. We offer examples of such measures and demonstrate how they inform language design through a hypothetical language design scenario using a mix of realistic and simulated data.
Giancarlo Guizzardi, Anna Bernasconi, Oscar Pastor and Veda C. Storey
Inspired by the need to understand the genomic aspects of COVID-19, the Viral Conceptual Model captures and represents the sequencing of viruses. Although the model has already been successfully used, it should have a strong ontological foundation to ensure that it can be consistently applied and expanded. We apply an ontological analysis of the Viral Conceptual Model, using OntoUML, to unpack and identify its core components. The analysis illustrates the feasibility of bringing ontological clarity to complex models. The process of revealing the ontological semantics of a data structuring model provides a fundamental type of explanation for symbolic models, including conceptual models.
Atilio A. Dadalto, Joao Paulo A. Almeida, Claudenir M. Fonseca and Giancarlo Guizzardi
The distinction between types and individuals is key to most conceptual modeling techniques. Despite that, there are a number of situations in which modelers navigate this distinction inadequately, leading to problematic models. We show evidence of a large number of modeling mistakes in the Wikidata knowledge graph associated with the failure to employ this distinction. These mistakes can be identified through the incorrect use of instantiation, a relation between an individual and a type, and specialization (or subtyping), a relation between two types.
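A minimal version of such a check — flagging entities that appear both as subject of an instance-of and of a subclass-of statement — can be run over triples. The triples are invented for illustration; note that an entity can legitimately be both (metaclasses exist), so such a check flags candidates for review rather than definite errors.

```python
# Toy triples mimicking Wikidata's instance-of (P31) and subclass-of
# (P279) statements (entities are illustrative, not from Wikidata).
triples = [
    ("Lassie", "instance_of", "Dog"),
    ("Dog", "subclass_of", "Animal"),
    ("Lassie", "subclass_of", "Dog"),   # suspicious: an individual subtyping
]

def suspicious_entities(triples):
    """Flag entities used both as an instance and as a subclass subject —
    a common symptom of the type/individual confusion described above."""
    instances = {s for s, p, o in triples if p == "instance_of"}
    subclasses = {s for s, p, o in triples if p == "subclass_of"}
    return instances & subclasses

flagged = suspicious_entities(triples)
```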

Session 7: Ontologies & Enterprise Modeling

Suryamukhi K, Vivekananda P.D and Manish Singh
Community Question Answer (CQA) sites are a very popular means for knowledge transfer in the form of questions and answers. They rely on tags to connect the askers with the answerers. Since each CQA site contains information about a wide range of topics, it is difficult for users to navigate through the set of available tags and select the best ones for their question annotation. At present, CQA sites present the tags to users using simple orderings, such as order by popularity and lexical order. This paper proposes a novel unsupervised method to mine different types of relationships between tags and then create a forest of ontologies to represent those relationships. Extracting the tag relationships helps users understand the meanings of the tags. Representing them in a forest of ontologies supports better tag navigation, thereby providing users with a clear understanding of tag usage for question annotation. Moreover, our method can also be combined with existing tag recommendation systems to improve them. We evaluate our tag relationship mining algorithms and tag ontology construction algorithm against state-of-the-art baseline methods and three popular knowledge bases, namely DBpedia, ConceptNet, and WebIsAGraph.
Carl Corea, Michael Fellmann and Patrick Delfmann
In theory, ontology-based process modelling (OBPM) bears great potential to extend business process management. Many works have studied OBPM and are clear on its potential amenities, such as eliminating ambiguities or enabling advanced reasoning over company processes. However, despite this approval in academia, widespread industry adoption is still nowhere to be seen. This can mainly be attributed to the fact that it still requires a high amount of manual labour to initially create ontologies and annotations for process models. As long as these problems are not addressed, implementing OBPM seems unfeasible in practice. In this work, we therefore identify the requirements for a successful implementation of OBPM and assess the current state of research w.r.t. these requirements. Our results indicate that research progress on means to facilitate OBPM is still alarmingly low, and urgent work on extending existing approaches is needed.
Muhamed Smajevic and Dominik Bork
A core strength of enterprise architecture (EA) models is their holistic and integrative nature. With ArchiMate, a de-facto industry standard for modeling EAs is available and widely adopted. However, with the growing complexity of enterprise operations and IT infrastructures, EA models grow in complexity. Research has shown that ArchiMate as a language and the supporting EA tools lack advanced visualization and analysis functionality. This paper proposes a generic and extensible framework for transforming EA models into graph structures to enable the automated analysis of even huge EA models. We show how enterprise architects can benefit from the vast number of graph metrics during decision-making. We also describe the implementation of the extensible Graph-based Enterprise Architecture Analysis (eGEAA) Cloud platform that supports the framework. The evaluation of our approach and platform confirms feasibility and interoperability with third-party tools.
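The transformation step — from an EA model's relations to a graph on which metrics are computed — can be sketched as follows. The model fragment and the choice of in-degree as a metric are illustrative assumptions; the actual framework and the eGEAA platform go well beyond this.

```python
from collections import defaultdict

# A fragment of an ArchiMate-like model as (source, relation, target)
# triples (element names are illustrative).
relations = [
    ("Customer Portal", "serving", "Customer"),
    ("Order Service", "serving", "Customer Portal"),
    ("Order DB", "serving", "Order Service"),
    ("CRM Service", "serving", "Customer Portal"),
]

def to_graph(relations):
    """Transform the EA model into an adjacency-list directed graph."""
    graph = defaultdict(set)
    for src, _rel, tgt in relations:
        graph[src].add(tgt)
        graph[tgt]  # touching the key registers targets as nodes too
    return graph

def in_degree(graph):
    """In-degree per node: a simple proxy for how depended-upon an
    element is (one of many graph metrics an architect might inspect)."""
    deg = {node: 0 for node in graph}
    for targets in graph.values():
        for tgt in targets:
            deg[tgt] += 1
    return deg

graph = to_graph(relations)
degrees = in_degree(graph)
```

On a full model one would hand the graph to a library such as NetworkX for centrality, clustering, or path metrics.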
Fathi Jabarin, Alan Hartman, Iris Reinhartz-Berger and Doron Kliger
IT departments in Multi-Business Organizations (MBOs) face challenges when providing services to satisfy business needs. In many cases, the services provided by an IT department do not address all the requirements of the relevant business units and hence are only partially adopted by a subset of units. While existing research on enterprise architecture and service provision focuses on business-IT alignment and optimization of quality or efficiency, our objective is to maximize the number of stakeholders and business units fully adopting the services provided by the IT department. In this paper, we introduce a conceptual model which comprises organizational and IT service-related concepts. With this underlying model, we propose a method for improving the cooperation among IT departments and business units in order to increase the adoption of the most appropriate services taking into account the variation in business unit characteristics and performance indicators. We describe how the analysis and presentation of the information gathered from the stakeholders can support decision makers and advance the adoption goals of the IT department and the whole MBO. We present the results of a case study whose aim is to determine the feasibility of the approach. The case study deals with a business need for scheduling meetings between customers and bankers in a large bank.

Session 8: Social Aspects of Conceptual Modelling

Diogo Albuquerque, Ana Moreira, João Araujo, Catarina Gralha, Miguel Goulão and Isabel Sofia Brito
Sustainability poses key challenges in software development due to its complexity. Our goal is to contribute with a reusable sustainability software requirements catalog. We started by performing a systematic mapping to elicit and extract sustainability-related properties, and synthesized the results in feature models. Next, we used iStar to model a more expressive configurable catalog with the collected data, and implemented a tool with several operations on the sustainability catalog. The sustainability catalog was qualitatively evaluated regarding its readability, interest, utility, and usefulness by 50 participants from the domain. The results were encouraging, showing that, on average, 79% of the respondents found the catalog “Good” or “Very Good” in endorsing the quality criteria evaluated. This paper discusses the social and technical dimensions of the sustainability catalog.
Inês Nunes, Ana Moreira and Joao Araujo
As technologies revolutionize the way we live, overlooking gender perspectives in the development of digital solutions results in gender-biased technology that, instead of advancing gender inclusion, creates new barriers to achieving it. This paper proposes a conceptual model for gender inclusion in software development. We started by performing a systematic mapping study to gather the relevant concepts from the existing body of knowledge. This served as groundwork for the definition of a conceptual model of gender-inclusive requirements.
Solomon Antony and Dharmendar Salian
'Open data' is the term used to describe the concept that data is made freely available for anyone to use for research and publication. Most governments in the world have taken steps to publish data under the 'Open data' umbrella. While this is a welcome step that can enable data analysis by citizen scientists, the quality and accessibility of open data datasets make the process challenging for those not trained in data management. Employing the concepts of data normalization, an analysis of a random sample of datasets was conducted. In particular, a measure for the usability of datasets is proposed. The measure is based on the ease of importing datasets into popular data management and data analysis tools. Based on the findings, a list of data preparation steps is recommended for publishers and users of datasets.
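The paper's usability measure is not reproduced here; as a rough illustration of the idea of scoring a dataset by its ease of import, the toy probe below uses Python's standard csv module. The scoring criteria (parseability, a header row, a consistent column count) are illustrative assumptions, not the authors' proposed measure.

```python
import csv
import io

def usability_score(raw_csv):
    """Toy usability probe: rewards a parseable file, a non-empty header row,
    and a consistent column count across rows (0..3 points)."""
    score = 0
    try:
        rows = list(csv.reader(io.StringIO(raw_csv)))
    except csv.Error:
        return score
    if not rows:
        return score
    score += 1  # the file parses at all
    header = rows[0]
    if all(cell.strip() for cell in header):
        score += 1  # every header cell is non-empty
    if all(len(r) == len(header) for r in rows[1:]):
        score += 1  # rectangular: each data row matches the header width
    return score

tidy = "city,population\nSt. John's,110000\nCorner Brook,19000\n"
ragged = "city,population\nSt. John's\nCorner Brook,19000,extra\n"
print(usability_score(tidy))    # 3
print(usability_score(ragged))  # 2
```

A real measure would also probe type consistency, missing values, and behaviour on import into specific tools, but the same load-and-score pattern applies.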


Accepted Posters and Demos

Manfred Jeusfeld
Multilevel modeling aims at improving the expressiveness and conciseness of conceptual modeling languages by allowing domain knowledge to be expressed at higher abstraction levels. In this demonstration, we go through two variants of multilevel extensions for the ConceptBase system, which had originally been used mainly for the design of domain-specific conceptual modeling languages. The demonstration highlights the partial evaluation feature of the deductive rule engine of ConceptBase. It also shows how multilevel modeling is essentially about a better understanding of how instantiation, specialization, and attribution relate to each other in conceptual modeling.
Gongsheng Yuan and Jiaheng Lu
Considering that relational databases have powerful capabilities in handling security, user authentication, query optimization, etc., several commercial and academic frameworks reuse relational databases to store and query semi-structured data (e.g., XML, JSON) or graph data (e.g., RDF, property graph). However, each of these works concentrates on managing only one of the above data models with RDBMSs; that is, no existing tool automatically generates a relational schema for storing multi-model data. In this demonstration, we present a novel reinforcement learning-based tool called MORTAL. Specifically, given multi-model data containing different data models and a set of queries, it can automatically design a relational schema to store these data while achieving good query performance. To demonstrate this clearly, we center the demonstration around the following modules: generating an initial state based on the loaded multi-model data, influencing the learning process by setting parameters, controlling the generated relational schema through semantic constraints, improving the query performance of the relational schema by specifying queries, and a highly interactive interface for showing query performance and storage consumption when users adjust the generated relational schema.
Evelina Rakhmetova, Carlo Combi and Andrea Fruggi
In this paper, we demonstrate an application of a recently proposed comprehensive UML-based approach to the conceptual modelling of log files. Using a real example, we built an ad hoc UML-based (class) diagram representing the main features of the logs' nested structure and generated an artifact (a template in JSON) based on ECS. We also describe plans for the design of a specialized tool that combines the already developed artifacts. The presented work is part of a wider study on the proposed initiative for a general concept of log file standardization. A clear structure of log data would allow for more systematic development and more straightforward implementation and deployment of new information systems, minimizing anomalies, errors, and time delays.
Lyes Attouche, Mohamed-Amine Baazizi, Dario Colazzo, Yunchen Ding, Michael Fruth, Giorgio Ghelli, Carlo Sartiani and Stefanie Scherzinger
JSON is a very popular data exchange format, and JSON Schema is an increasingly popular schema language for JSON. Evidently, schemas play an important role in implementing conceptual models. For JSON Schema, there is a first generation of tools for checking whether one schema is contained in another. Testing whether tool implementations are correct is difficult, since writing test cases requires a deep understanding of the JSON Schema language and of the conceptual models described. In this demo, we present the first systematically generated test suite for JSON Schema containment. This test suite consists of pairs of schemas where the containment relationship is known by construction. Our test suite aims at coverage of all language features of JSON Schema. Using our test suite on existing containment checkers (including our own implementation), we discovered implementation bugs not previously known to us. In this paper, we present our test suite to the research community as well as to tool developers, hoping to foster the development of JSON Schema containment checkers.
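To illustrate what "containment known by construction" means, consider a schema pair where one schema strictly tightens the other. The sketch below is not the authors' generator: it uses a hand-rolled validator covering only the two keywords involved, and the sample-based check is only a necessary condition for containment, not a proof.

```python
# A schema pair where containment holds by construction:
# schema_a ("integer >= 10") only tightens schema_b ("any number"),
# so every instance valid under schema_a is also valid under schema_b.
schema_a = {"type": "integer", "minimum": 10}
schema_b = {"type": "number"}

def is_valid(instance, schema):
    """Minimal validator for just the keywords used above
    (not a general JSON Schema implementation)."""
    is_int = isinstance(instance, int) and not isinstance(instance, bool)
    is_num = isinstance(instance, (int, float)) and not isinstance(instance, bool)
    if schema.get("type") == "integer" and not is_int:
        return False
    if schema.get("type") == "number" and not is_num:
        return False
    if "minimum" in schema and is_num and instance < schema["minimum"]:
        return False
    return True

def contained_on(samples, sub, sup):
    """Necessary condition only: checks that every sample valid
    under `sub` is also valid under `sup`."""
    return all(is_valid(x, sup) for x in samples if is_valid(x, sub))

samples = [5, 10, 42, 3.14, "ten", None, [10]]
print(contained_on(samples, schema_a, schema_b))  # True
print(contained_on(samples, schema_b, schema_a))  # False (3.14 is a number but not an integer >= 10)
```

A containment checker under test would be expected to answer "contained" for the pair (schema_a, schema_b) and "not contained" for the reverse; a test suite built this way knows the expected answer without running any checker.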
Benjamin Ternes, Kristina Rosenthal and Stefan Strecker
Identifiers of model elements convey semantics of conceptual models essential to interpretation by human viewers. Devising meaningful identifiers for model elements has repeatedly been shown to challenge data modelers from early learning stages to advanced levels of modeling expertise, constituting one of the most common difficulties data modelers face. We demonstrate the Automated Assistant, an integrated modeling tool support combining natural language processing techniques and data modeling heuristics to provide data modelers with modeling-time feedback on identifying and signifying entity types, relationship types, and attributes with meaningful and expedient identifiers. Different from other approaches to automating assistance for data modelers, the Automated Assistant implementation does not rely on fixed reference solutions for modeling tasks and processes (m)any natural language descriptions of modeling tasks. We report on the current state of prototype development, discuss the Automated Assistant implementation and its evaluation in typical application scenarios, and outline future work.
Simon Curty, Felix Härer and Hans-Georg Fill
The design of blockchain-based applications today requires in-depth technical knowledge of the underlying technologies and software frameworks. In order to investigate how enterprise modeling approaches can aid in designing such applications and aligning their structure and behavior with business needs, we conduct a comparison of two types of blockchain platforms using the ArchiMate modeling language. Based on a use case of Non-fungible Tokens for digital image licensing, we derive models for a software application using public and permissioned blockchain platforms. This permits us to gain first insights into the adequacy of ArchiMate for representing blockchain-based applications and for highlighting the architectural differences between public and permissioned blockchain approaches from a conceptual modeling perspective.
André Conrad, Sebastian Gärtner and Uta Störl
Non-relational systems are essential for managing large amounts of semi- or unstructured data. To use the optimal data storage at a given time, it may be necessary to change the data model during the lifetime of an application. This paper offers a visionary approach that provides automated schema migration and optimization between different NoSQL data stores. By means of data and query analysis, optimizations for all existing cardinalities can be achieved with respect to good query performance with minimal redundancy. Initial performance measurements confirm the increase in performance.
Damianos Chatziantoniou and Verena Kantere
DataMingler is a prototype tool that implements a novel conceptual model, the Data Virtual Machine (DVM), and can be used for agile just-in-time modeling of data from diverse sources. The DVM provides easy-to-understand semantics and fast and flexible schema manipulations. An important and useful class of queries in analytics environments, dataframes, is defined in the context of DVMs. These queries can be expressed either visually or through a novel query language, DVM-QL. We demonstrate DataMingler's capabilities to map relational sources, and queries over them, onto a DVM schema and to augment it with information from semi-structured and unstructured sources. We also show how to easily express on the DVM complex relational queries, as well as queries combining structured, semi-structured, and unstructured sources.
Fabian Muff and Hans-Georg Fill
One current challenge in enterprise modeling is to establish it as a common practice in everyday work instead of its traditional role as an expert discipline. In this paper, we present first steps in this direction through augmented and virtual reality-based conceptual modeling. For this purpose, we developed a novel meta-metamodeling framework for augmented and virtual reality-based conceptual modeling and implemented it in a prototypical tool. This permits us to derive further requirements for the representation and processing of enterprise models in such environments.


Accepted Tutorials

Heinrich Mayr and Bernhard Thalheim
Models are the fundamental human tools for managing complexity and understanding. As such, they play a key role in all scientific and engineering disciplines as well as in everyday life. Many modelling paradigms have evolved over time in various disciplines, a central one in computer science being that of conceptual modelling. This is especially indispensable in the development of database and information systems. However, as a universal tool, conceptual models also play an important role in other areas of computer science, although they are not often referred to as such.
Despite the widespread usage of conceptual modeling, the community is still tinkering with foundations and theories, and also with the notion itself: more than 60 definitions can be found in the database field alone, with the somewhat sobering result that in concrete situations one often has difficulty deciding whether a conceptual model is present or not.
On the other hand, conceptual modeling has a history of at least 4,000 years, with many success stories in a wide variety of fields. Therefore, we should strive to develop a common understanding of the notion and essence of modeling, and conceptual modeling in particular, as well as a (mathematical) toolbox for modeling activities.
With this tutorial we want to show some puzzle pieces to answer these research challenges and to stimulate the discussion. In particular, we address the "anatomy" of conceptual models and show how they can be characterized by a signature and thus distinguished from other types of models.
We combine this with a more transparent explanation of the nature of conceptual models, seeing them as a link between the dimension of linguistic terms and the encyclopedic or ontological dimension of notions. As a paradigm, we use the triptych, whose central tableau represents the model dimension. We base our explanation on several examples of conceptual modeling from very different subject areas and with recourse to different modeling languages.
Wil van der Aalst
Process mining is the missing link between model-based process analysis and data-oriented analysis techniques. Through concrete data sets and easy-to-use software, the tutorial provides concepts and tools that can be applied directly to analyze and improve processes in a variety of domains.
The course explains the key analysis techniques in process mining. Participants will learn about process discovery techniques. These can be used to automatically learn process models from raw event data. Various other process analysis techniques that use event data will be presented, including conformance checking. Moreover, the course will provide access to easy-to-use software, real-life data sets, and practical skills to directly apply the theory in a variety of application domains.
The course is relevant for ER participants because there is a direct connection to conceptual modeling. A conceptual model is a representation of a system or process, made of a composition of concepts which are used to help people better know, understand, simulate, and improve the system or process the model represents. In the context of the larger Business Process Management space, conceptual models play an important role. See, for example, the Business Process Modelling Notation (BPMN) and the process-oriented diagrams in UML (activity diagrams, statecharts, sequence diagrams, class diagrams, etc.). It is not just about the modeling of processes but also the data.
Process mining provides a range of techniques to relate data in information systems to conceptual models. In recent years, we could witness a spectacular uptake in process mining. There used to be a gap between process science (i.e., tools and techniques to improve operational processes) and data science (i.e., tools and techniques to extract value from data). Mainstream machine learning and data mining techniques do not consider operational processes. Business Process Management (BPM) and Operations Research (OR) tend to start from models rather than data. Process mining bridges this gap. Currently, there are over 35 commercial process mining vendors (ABBYY Timeline, ARIS Process Mining, BusinessOptix, Celonis Process Mining, Disco/Fluxicon, Everflow, Lana, Mavim, MPM, Minit, PAFnow, QPR, etc.) and process mining is applied in most of the larger organizations. Example application domains include: finance (Rabobank, Hypovereinsbank, etc.), telecom (Deutsche Telekom, Vodafone, etc.), logistics (Vanderlande, etc.), production (BMW, Siemens, Fiat, Bosch, etc.), food (Edeka, etc.), fashion (Zalando, etc.), energy (E.ON, etc.), transport (Uber, DB, Lufthansa, etc.), healthcare (AstraZeneca, Medtronic, etc.), consulting (Deloitte, EY, KPMG, etc.), and IT systems (Dell, IBM, ServiceNow, etc.).
Process mining provides not only a bridge between data mining and business process management; it also helps to address the classical divide between "business" and "IT". Evidence-based business process management based on process mining helps to create a common ground for business process improvement and information systems development. Therefore, the tutorial is interesting for participants that are interested in both technical aspects and applications.
The tutorial will be at an introductory level. Participants should have a basic understanding of process and data modeling without knowing a specific notation or language.
Anna Bernasconi and Pietro Pinoli
Genomics is an extremely complex domain, in terms of concepts, their relations, and their representations in data. The proposed tutorial will introduce the use of ER models in the context of genomic systems: conceptual models are of great help for simplifying this domain and making it actionable. We will carry out a review of successful models presented in the literature for representing biologically relevant entities and grounding them in databases. We will draw a distinction between conceptual models that aim to explain the domain and conceptual models that aim to support database design and heterogeneous data integration. Genomic experiments and/or sequences are described by several metadata, specifying information on the sampled organism, on the used technology, and on the organizational process behind the experiment. Instead, we call data the actual regions of the genome that have been read by sequencing technologies and encoded into a machine-readable representation. First, we show how data and metadata can be modeled; then we exploit the proposed models for designing search systems, visualizers, and analysis environments. Both the domains of viral genomics and human genomics are addressed, surveying several use cases and applications of broader public interest. The tutorial is relevant to the ER community because it demonstrates the usefulness of conceptual models’ principles within a very relevant and current domain; in addition, it offers a concrete example of conceptual models’ use, setting the premises for interdisciplinary collaboration with a greater public (possibly including life science researchers).
Victoria Lemieux
A blockchain is a ‘distributed ledger with confirmed blocks organized in an append-only sequential chain using cryptographic links’ (International Organization for Standardization (ISO), 2020a, s. 3.6), with distributed ledgers being defined as a ‘ledger that is shared across a set of [distributed ledger technology (DLT)] nodes and synchronized between the DLT nodes using a consensus mechanism’ (ISO, 2020a, s. 3.22). This technology, which emerged in 2009 in parallel with the bitcoin cryptocurrency out of an assemblage of pre-existing technologies, is now transforming industries through the unique means by which it fills trust gaps or replaces traditional trust anchors between transacting parties to enable complex forms of inter-personal and organizational trust, coordination and collaboration that are driving business, economic and social value. Despite the rapid growth in blockchain and distributed ledger systems, work on developing a suitable framework for designing such systems is still in nascent form (see, for example, Porru, et al, 2017; Kannengiesser, 2019; Udowku and Norta, 2021). A key challenge for blockchain and distributed ledger systems designers, as with many enterprise systems, has been how to achieve conceptual coordination among experts contributing to the design of an information system (i.e., the coordination problem). This problem is particularly acute in the design of blockchain and distributed ledger systems wherein the knowledge of a wide variety of experts, each with their own theories, principles, methods, and terminology, is necessary for the building of a successful system that will also avoid harmful unintended consequences. To address this problem, this tutorial will present and discuss the application of a novel high-level conceptual model (“The Three-Layer Model”) and accompanying question-led framework for the design of blockchain and distributed ledger systems (Lemieux and Cheng, 2021).
The model, which draws upon an integration of social, archival, and natural world ontologies and general and complex systems theory, was developed by a multidisciplinary group of international experts using a special adaptation of the strategic design methodology (Lemieux and Bravo, 2021). It represents a theory of blockchain and distributed ledger systems that can be used to create a common conceptual framework and language among design experts, articulate system requirements, predict missing or misaligned elements of proposed systems, and support the evaluation of existing design artefacts or systems.
Bing Li, Yaoshu Wang and Wei Wang
Entity resolution (ER) (a.k.a. entity matching, record linkage, and duplicate record detection) aims at identifying entity records that refer to the same real-world entity across different data sources. As a fundamental building block of data cleaning and data integration, entity resolution has been widely applied in knowledge graph construction, e-commerce, etc. Since its inception, it has been extensively studied by means of various methodologies such as declarative rules, crowdsourcing, and machine learning. Over the past few years, deep learning (DL) has driven fast-paced advances in many established fields such as CV and NLP, as well as in data management. This trend brings new opportunities and challenges to ER tasks. Many DL-based ER models have emerged to tackle this long-standing problem.
This tutorial aims to provide a comprehensive review of recent advances in entity resolution, especially DL-based ER solutions. As a typical AI-for-DB problem, the study of DL-based ER solutions has attracted wide attention from both the AI and DB communities. The core of the interplay is how to model the table schema and the training pipeline. In this sense, it is interesting to see not only how DL techniques are applied to a typical DB problem, but also how they in turn change the way DB, as the field of origin, deals with the problem. We first discuss the importance of entity resolution in data cleaning and data integration, and then review recent DL-based entity matching models with respect to schema modeling, i.e., schema-agnostic, hard-schema, and soft-schema approaches. We analyze their strengths and weaknesses. Moreover, we show two types of ER pipeline: the blocker-and-matcher pipeline and the joint-learning paradigm. We hope this tutorial will be an impetus towards more AI-for-DB applications.

PhD Symposium

Session 1

Rosa Velasquez Universitat Politecnica de Valencia, Spain

Context. Understandability is one of the most important quality criteria in business process models (BPMs). While the experimental study of the factors that affect understandability is ongoing research, current initiatives focus on a limited set of factors. An open challenge is to explore the relationships among several of these factors using automated statistical techniques. Machine Learning (ML) has been applied to generate statistical models, based on the combination of multiple factors, and to find relationships to predict indicator values. Objective. This thesis addresses the design of a method to assess the understandability of BPMs based on ML, in order to predict whether a model will be understandable. This method will be implemented in an assisted modelling tool. Method. Using the design science methodology, the research aims to identify the factors that influence understandability, their relationships, and how to measure them. This way we can correlate these factors and learn which of them most affect the comprehensibility of BPMs. Our final target is to provide an automatic evaluation of understandability. Results. The expected contributions are 1. the design of an automatic understandability evaluation model and 2. an assisted modelling tool that incorporates the evaluation model to provide real-time guidance towards more understandable models. Conclusion. We aim to demonstrate that ML techniques can be used to predict BPM understandability automatically.

Discussant: Pnina Soffer, University of Haifa

Darien Craft University of Connecticut, USA

For predictive and experimental methods alike, discovering the structure and biological mechanisms of proteins is vital to our fundamental understanding of life. Driven by the vast number of protein structures solved through X-ray crystallography and Nuclear Magnetic Resonance (NMR), as well as advances in machine learning and neural networks that enable us to predict a protein's structure based solely on its amino acid sequence, this project lays the groundwork for predicting the observed experimental NMR data of a protein based on its structure. We outline our ongoing conceptual model, implemented as relational databases, used in our workflow-based approach to solving the forward modeling problem of NMR. This approach will support ongoing machine learning approaches to predicting protein-ligand binding mechanisms and other kinetic studies.

Discussant: Oscar Pastor, Universitat Politècnica de València

Session 2

Sanaz Nabavian Memorial University of Newfoundland, Canada

A quintillion bytes of data are created every day. Reusing already collected data for different purposes is, in many cases, a better option than gathering new data. However, preparing existing data to match the requirements of new uses can be difficult. This research aims to provide guidelines for designing datasets that are more repurposable. As conceptual modeling is at the heart of designing an information system, I will focus on how defining self-defining concepts could help datasets be reused in other contexts and improve their ability to connect to other datasets.

Discussant: Wolfgang Maass, Saarland University

  • Matthias Jarke, RWTH Aachen, Germany
  • Sudha Ram, University of Arizona, USA
  • Yair Wand, University of British Columbia, Canada


Main Conference Panel

Erik Proper  Luxembourg Institute of Science and Technology and University of Luxembourg

Over the past decades, the fields of enterprise modelling, enterprise engineering, and enterprise architecture (EMEA) have provided interesting fields of application for conceptual modelling. Enterprise engineering and architecting rely on the use of (conceptual) enterprise models to represent different aspects of the existing/desired design of an enterprise (including companies, government agencies, smart cities, etc). The models used typically range (at least) across the entire value-proposition-to-business-services-to-business-processes-to-information-systems-to-IT stack of an enterprise.

Meanwhile, the EMEA fields seem to be "under pressure". New enabling information technologies such as AI, Digital Twins, Blockchain, IoT, etc., appear to attract stronger interest and appreciation from funding sources. At the same time, there is evidence from industrial practice that the coherence-, coordination- and integration-oriented perspectives of EMEA (captured in conceptual models) are direly needed, now more than ever. The goal of this panel discussion is to explore the future role of EMEA as a field of research, and the role of conceptual modelling with(in) it.

This panel discussion is actually part of an ongoing discussion on the future of EMEA. The kick-off of this discussion took place during this year's IEEE CBI conference. At ER 2022, we plan to focus the discussion on the role of (conceptual) enterprise models. The broader discussion on the future of EMEA will be continued during working sessions at the IEEE EDOC conference and the IFIP 8.1 PoEM working conference later this year.

If you are active in the EMEA field and care about its future, and the role of conceptual modelling within it, then please join the discussion.

  • Ulrich Frank, University of Duisburg-Essen, Germany
  • Asif Gill, University of Technology Sydney, Australia
  • Giancarlo Guizzardi, University of Twente, Netherlands and University of Bolzano, Italy
  • Robert Winter, University of St. Gallen, Switzerland

Signal Hill Campus
