Generic Solution Architecture Design of Regulatory Technology (RegTech)

ABSTRACT


Background
The banking world experienced significant changes after the financial crisis in 2008 [1], [2]. Almost all countries in the world strengthened their economic and regulatory policies significantly to prevent post-crisis risks from occurring [1]. Von Solms [1] stated that the financial industry faces challenges due to the volume and complexity of regulations that must be followed. At the same time, digital transformation has also occurred in the financial industry, including banking, with the emergence of Financial Technology (FinTech) companies and applications [2]. The rapid evolution of FinTech is driving increased operational data and the emergence of new, more challenging risks such as fraud and cybersecurity [2]. Butler and Brien [3] stated that more than 50,000 regulations were issued between 2009 and 2012 across the G20. One of the major impacts of these challenges is the cost incurred to achieve regulatory compliance [1], [3]. Von Solms [1] also stated that, based on research results from "The Trade", banks spent over 100 billion dollars on regulatory compliance in 2016. These costs will continue to increase because the process of complying with regulations is still manual, such

Research Problem
RegTech solutions have the potential to be applied in a variety of regulated industries or entities beyond the financial industry that require strict monitoring and reporting, such as health [15], pharmaceuticals [16], real estate [13], and charitable organizations [14]. In addition, RegTech solutions can also be applied to non-profits or government agencies. This is supported by Johansson et al. [12], who state that one characteristic of RegTech is that it provides intensive services to various industries in line with the regulations that govern them, not only in the financial sector.
Currently, research on RegTech is still dominated by the financial sector [9], such as the use of semantic technology in RegTech for the compliance reporting process in the banking industry [3], the application of RegTech to prevent money laundering [17], the application of RegTech in online financial services [2], and the application of RegTech for regulatory management to support treasury activities and the reporting demanded for compliance in the banking industry [1]. Therefore, to support and realize the application of RegTech in various industrial environments, it is necessary to analyze the specificity of RegTech in the financial industry, what industry needs in general, and a more generic solution architecture.
Based on these problems and challenges, and on the studies carried out to date, no generic RegTech solution architecture has yet been proposed [9], [10]. This research was conducted to fill this gap. The problem discussed in this study is therefore how to design a generic RegTech solution architecture that can be applied across various industries or regulated entities to achieve regulatory compliance more efficiently.

Objective
To answer the research problem defined above, this study aims to design a generic RegTech solution architecture. Solution architecture deals with the design and definition of IT solutions so that these solutions can be implemented, used, operated, and supported safely and efficiently [18]. The proposed architecture was designed by analyzing the specificity of RegTech in the financial industry and the needs of industry in general. In addition, the architecture is not specific to certain industries and is designed at a high level. The proposed architectural design is expected to contribute to the application of RegTech in various industrial fields or regulated entities so that regulatory compliance can be achieved more efficiently.

Methodology
This research principally designs an artefact, namely the solution architecture. The research method follows the Design Science Research Methodology (DSRM) proposed by Peffers et al. [19]. Figure 1 shows the stages of the activities carried out in this study.
Figure 1. DSRM process model [19], [20]
In general, the DSRM consists of six stages: (1) problem identification and motivation; (2) definition of the objectives of a solution; (3) design and development; (4) demonstration; (5) evaluation; and (6) communication. The first section of this paper, the introduction, which comprises the background, research problem, and objective, covers the first and second stages. The third stage entails creating the artefacts, namely the RegTech generic solution architecture. The fourth stage demonstrates the suitability of the proposed artefacts by applying them in various case simulations [21]. In the fifth stage, the artefacts are evaluated by measuring time efficiency [20]. The last stage communicates the research findings through this article.

Design Method
This section describes in more detail the third stage of DSRM: creating the generic RegTech solution architecture artefacts. The architectural design adopts the enterprise architecture (EA) layer division model in the TOGAF framework. These layers are business, data, application, technology or infrastructure, and security, as shown in Figure 2. In general, there are two main stages in designing the architectural artefacts of this solution, namely requirements analysis and architectural modelling. In the first stage, a needs analysis is carried out for each layer, covering the solution components: people, organization, process, information or data, and technology [22]. Next, the solution architecture at each layer is modelled based on the needs identified in the analysis. The adopted EA artefacts can take the form of catalogues (lists of things), matrices (showing relationships between things), and diagrams (pictures of things) [23]. Lovatt [22] states that the solution components intersect with the four EA domains, as shown in Figure 3. Therefore, these components are also used to form the solution architecture in each corresponding domain. The difference is that the scope of the EA domain is the enterprise, while the scope of the solution architecture domain is a specific solution, which in this study is the RegTech solution. Solution architecture is part of EA and must remain coordinated with EA in its components to stay consistent. The details of the generic RegTech solution architecture design stage in each layer follow.
1. In the business layer, a generic regulatory compliance business process analysis is carried out to determine what is needed, including processes, activities, actors, and organizations. Based on the needs analysis results, business architecture modeling is carried out.
2. In the data layer, a needs analysis is carried out by identifying the data entities and data storage needed to support the generic business processes of the RegTech solution identified previously. Data architecture modeling is then carried out.
3. In the application layer, a needs analysis is carried out by identifying applications that support the business processes and manage the data previously identified in the business and data layers. Based on these needs, application architecture modeling is carried out.
4. In the infrastructure layer, a needs analysis is conducted by identifying a catalog of the basic and new IT infrastructure needed to support applications and data management. Infrastructure architecture modeling is then completed.
5. Finally, in the security layer, the security requirements are analyzed based on the infrastructure architecture to support the security of applications and data and the running of business processes in the generic RegTech solution. High-level security architecture modeling is performed based on the needs analysis results.

Evaluation
This section describes stages four and five of the DSRM: demonstration and evaluation. The demonstration was carried out by using the artifacts to illustrate logical scenarios in a non-financial industry. Peffers et al. [21] stated that architectural artifacts could be evaluated using scenario illustrations. An illustration of the scenario can be seen in Table 1. From the scenario illustration, a simulation and evaluation of the business processes are then carried out using the time analysis feature of the Bizagi Modeler application. The aim is to compare the time efficiency of the regulatory compliance process before and after implementing the RegTech solution. Sonnenberg et al. [20] stated that the efficiency variable could be used to carry out a measurable evaluation of architectural artifacts.
ABC Ltd. is one of the cosmetic manufacturing companies in Indonesia and holds several leading brands. In carrying out its operations, ABC Ltd. is supervised by the Food and Drug Supervisory Agency (known in Indonesia as BPOM). BPOM has a role in ensuring that cosmetic products sold in Indonesia are safe and comply with applicable regulations.
ABC Ltd. wants to develop and implement RegTech solutions to ensure operations related to the manufacture and distribution of cosmetics comply with regulations issued by BPOM. With the RegTech solution, it is hoped that non-compliance with these operations can be detected earlier and in real-time so that preventive decisions can be taken immediately. In addition, the RegTech solution is also expected to help automate the creation of compliance reports.

RESULTS AND DISCUSSION
This section covers the architecture of the generic RegTech solution, alongside the results of evaluating the artefacts proposed in this study. It also discusses related perspectives for future research.

Business Architecture
At this stage, the focus is on the three main components of the solution architecture design, namely, people, organization, and business processes. Business process analysis is carried out based on literature studies of regulatory compliance business processes that apply RegTech to the financial industry, as described by [1], [3], [25], [26]. The business process flow is then adopted and adjusted to become a generic regulatory compliance business process for regulated entities. Generic business processes are general and do not refer to a particular industry or entity. Based on the literature study, it was concluded that the regulatory compliance business process of regulated entities consists of four processes: data collection, data processing, data analysis, and data presentation [10]. Furthermore, the needs of each process are identified to support generic regulatory compliance by implementing RegTech solutions. The results of identifying these needs can be seen in Table 2.

Data Collection
Automation of regulatory (real-time) and organizational operational (real-time and batch) data collection.

Data Processing
Automation of regulatory (real-time) and organizational operational (real-time and batch) data extraction and transformation.

Data Analytics
Automation of regulatory (real-time) and compliance (real-time and batch) data analytics using AI/ML.

Data Presentation
Automation of the presentation of regulatory data analytics results, compliance analytics results, and compliance reporting. This activity can be displayed in real-time and batch.
Based on the needs analysis in Table 2, a business architecture model is produced in the form of a diagram covering the components of business processes, people, and organizations. The division of business process levels adopts the hierarchical process model made by Mahal [27]. The results of business architecture modelling can be seen in Figure 4. There are two levels in the generic business architecture of RegTech solutions (Figure 4). First, level 1 business processes comprise data collection, processing, analytics, and presentation. Each of these processes is supported by a data store. Second, level 2 business processes include all sub-processes in each level 1 business process. Altogether, there are nine sub-processes. Each sub-process has several interrelated activities to fulfil regulatory compliance using RegTech.
Actors involved in the generic business processes of RegTech solutions are categorized into external and internal (Figure 4). External actors are regulators/supervisors, who issue regulations and receive the compliance reports submitted by regulated entities. Internal actors are regulated entities or organizations whose business activities are governed by regulation. Ideally, such an entity has a compliance unit that is focused on regulatory compliance issues. The roles within this actor include regulatory/legal specialist, data engineer, database administrator, data analyst, and data scientist.

Data Architecture
At this stage, the focus is on analysing data components, including data entities and storage. Based on the previous business architecture design results, data entities were first identified at a high level for each generic business process of the RegTech solution. Next, the data storage needs of each business process were identified. The resulting data entities and data storage requirements of the generic business processes of RegTech solutions can be seen in Table 2.

Data Collection
Input: Regulation (regulatory data published by regulator/supervisor), organizational operational (organizational operational data contained in various internal sources of the organization), and metadata.
Output: Regulation (regulatory data that has been collected and stored in storage), organizational operational (organizational operational data that has been collected and stored in storage), and metadata.
• Storage of regulatory data published by regulators/supervisors (raw data) and metadata.
• Storage of organizational operational data collected from various internal sources of the organization (raw data) and metadata.

Data Processing
Input: Regulation (regulation data already collected), organizational operational (operational data of the organization that has been collected), and metadata.
Output: Regulation (regulation data that has been transformed), organizational operational (transformed organizational operational data), and metadata.
• Storage of transformed regulatory data (valid or clean data) and metadata.
• Storage of organizational operational data, formatted and standardized for compliance analytics, and metadata.

Data Analytics
Input: Regulation (transformed regulatory data), organizational operational (transformed organizational operational data that conforms to formats and standards for compliance analytics), model (AI/ML model for compliance analytics), and metadata.
Output: Regulation (regulatory data analysis results), compliance analytics result, compliance report, and metadata.
• Storage of AI/ML models for compliance analytics.
• Data storage of compliance analysis results.
• Metadata storage.

Data Presentation
Input: Regulation (regulatory data analysis results), compliance analytics result, compliance report, and metadata.
Output: Metadata.
• Storage of metadata related to data presentation.
Based on the analysis of high-level identification of data entities and data storage needs, the following are the general requirements that must be met by the generic data architecture of the RegTech solution that will be designed.
1. Generally, the data entities that must be available are regulation, organizational operational, model, compliance analytics result, compliance report, and metadata.
2. The data storage system must have the following general capabilities: (i) supports storage and handling of data in structured, semi-structured, and unstructured formats in large volumes; (ii) supports data processing in real-time and batch; (iii) supports data/metadata catalog storage; (iv) supports on-demand scalability so that capacity is easily added according to various needs; and (v) supports access by systems that require the stored data, such as data science, visualization, and reporting tools.
Furthermore, high-level relationship modeling between entities and data architecture modeling of RegTech generic solutions are carried out based on the needs analysis results. Figure 5 shows the relationships between the data entities identified previously. The relationships are modeled conceptually using entity relationship (ER) diagrams. Meanwhile, the RegTech generic data architecture model can be seen in Figure 6. The data storage system adopts the data lakehouse architecture. A data lakehouse can be defined as "a data management system based on low-cost, directly accessible storage that also provides database management system performance and management features for traditional analytics such as ACID (atomicity, consistency, isolation, durability) transactions, data versioning, auditing, indexing, caching, and query optimization" [28]. A lakehouse combines the main benefits of data lakes and data warehouses, namely low-cost storage in an open format that various systems can access, from the data lake, and powerful management and optimization features, from the data warehouse [28]. Data lakehouses can store data in structured, semi-structured and unstructured formats [29].
The data sources that RegTech inputs are regulatory data published by regulators/supervisors and various organizational operational data. The process of ingestion or collecting data can be batch and real-time. Regulatory data is collected in real-time so that every time there is a change in regulations, the data will be processed immediately. Organizational operational data are identified according to the needs related to the collection in real-time or batch.
The data lakehouse divides data storage into three zones: bronze, silver, and gold [29]. All raw data collected is stored in the bronze zone. Data from the bronze zone that has been transformed through the ELT (Extract, Load, Transform) or ETL (Extract, Transform, Load) process is stored in the silver zone. The gold zone stores the results of transforming silver-zone data through the ELT/ETL process; data in the gold zone has been aggregated or given a specific schema for business needs. The catalogue layer provides data management and governance, such as metadata management, so that data can be managed and organized as in a data warehouse. This is what distinguishes data lakehouses from data lakes. This layer supports storing all data formats and incorporates best practices from database management features to support data management, governance, and security. The data analytics layer performs analytical processing for regulation and compliance in real-time and batch. This process can use AI/ML models for real-time compliance monitoring or for certain query schemes. Analytical result data is displayed in the presentation or consumption layer, with an application programming interface (API) used for data exchange.
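The flow of data through the three zones can be illustrated with a minimal in-memory sketch. It is not part of the proposed architecture: the `Lakehouse` class, the function names, and the record fields `batch` and `measurement` are hypothetical, and a real lakehouse would use a storage and catalogue platform rather than Python lists.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Lakehouse:
    """Toy in-memory stand-in for the three storage zones (hypothetical)."""
    bronze: list = field(default_factory=list)  # raw records, stored as ingested
    silver: list = field(default_factory=list)  # cleaned and standardized records
    gold: dict = field(default_factory=dict)    # aggregates shaped for business use


def ingest(lake, raw_records):
    """Store raw records unchanged in the bronze zone."""
    lake.bronze.extend(raw_records)


def refine(lake):
    """Bronze -> silver: drop incomplete rows and standardize field formats."""
    lake.silver = [
        {"batch": r["batch"].strip().upper(), "measurement": float(r["measurement"])}
        for r in lake.bronze
        if r.get("batch") and r.get("measurement") not in (None, "")
    ]


def aggregate(lake):
    """Silver -> gold: average measurement per batch."""
    grouped = {}
    for r in lake.silver:
        grouped.setdefault(r["batch"], []).append(r["measurement"])
    lake.gold = {batch: mean(values) for batch, values in grouped.items()}
```

The bronze zone keeps every raw record, including incomplete ones, so that the silver and gold zones can be rebuilt if the transformation rules change.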

Application Architecture
At this stage, an analysis of how software applications support business processes (business architecture) and help manage data (data architecture) in RegTech generic solutions is carried out. The first thing to do is to identify applications that support the generic business processes of RegTech solutions that have been defined earlier: data collection, data processing, data analytics, and data presentation. The identification results and the data entities involved can be seen in Table 3 below. Based on the results of the application identification described in Table 3, the following are the general requirements that must be met by the generic application architecture of the RegTech solution to be designed.
• Supports automatic data collection from certain sources in real-time and batch.
• Supports automated data processing for real-time and batch regulatory and compliance analytics.
• Supports automating regulatory and compliance analytics processes in real-time and batch.
• Provides data analytics functionality using AI/ML.
• Provides functionality for AI/ML model development.
• Provides functions to share or exchange data.
• Provides functions for data storage and management.
• Supports the automatic creation of compliance reports and their submission to regulators/supervisors.
• Provides functions to present regulatory and compliance analytics results, such as alerts, real-time monitoring, dashboards, and reporting.
Furthermore, the generic application architecture of the RegTech solution is modelled based on the results of the needs analysis identified in the previous stage. The architectural modelling is done by grouping applications according to their main function requirements (Figure 7) and by drawing diagrams that explain the interactions between these applications (Figure 8).

Infrastructure Architecture
Tables 4 and 5 show the results of identifying the basic and new IT infrastructure catalogues needed to implement the business architecture, data architecture, and application architecture in RegTech generic solutions.

Network
The network allows the various application components of RegTech to communicate with each other, for example when a server accesses compliance analysis results held in storage for visualization or reporting. The network also allows data exchange between devices, such as sending compliance reports to regulators/supervisors.

Compute and Storage
Compute devices are used to process regulatory and compliance analytics from data stored in storage, including regulatory and organizational operational data. In addition, computing devices can be used for other services supporting RegTech, such as web servers. This component must have good scalability to handle growing or increasing workloads.

Container
This virtualization technology packages RegTech applications and all the dependencies they need into a single isolated unit so that these applications can run in various environments without environment-specific configuration.

Data Center
The data center is the physical facility used to house applications and data. The main components of a data centre design include various networks and compute and storage devices. This facility must have a sound disaster recovery plan (DRP) and strategy that is tested before a disaster or disturbance occurs.
Based on the results of the analysis in Table 4, it is concluded that four main components are needed in the basic IT infrastructure that supports the implementation of data and application architectures in RegTech generic solutions, namely networks, scalable compute and storage, containers, and a data center with a disaster recovery plan (DRP).

Big Data Analytics
Big Data Analytics is the main infrastructure that RegTech must adopt. This technology can process and analyse very large and complex organisational operational data into information related to regulatory compliance, such as real-time compliance monitoring and reporting. This technology is used for every RegTech business process, namely data collection, data processing, data analytics, and data presentation. Various applications or big data platforms are components of RegTech applications in supporting business processes and data management.

AI/ML
AI/ML supports advanced analytics in big data, such as diagnostic, predictive, and prescriptive analytics related to regulatory compliance. This technology finds patterns in data that do not comply with regulations, makes predictions, and makes decisions regarding the data. These technologies, such as NLP and text mining, can also extract information from regulatory documents into machine-readable rules.

RPA
RPA helps automate repetitive tasks such as generating and submitting compliance reports to regulators/supervisors.

Edge Computing
This technology can reduce the amount of data traffic on the network. When the data volume is very large, data collection and processing in RegTech solutions can be done close to the data source or on edge computing devices. Only the processing results are then forwarded to the main compute and storage devices for analysis, so that data traffic to these main devices is manageable and bandwidth can be used optimally.

API
The API provides a communication channel allowing applications or systems to communicate and share data. In RegTech solutions, this technology provides data or information on the results of these analytics for visualization systems or applications such as real-time compliance monitoring and reporting.

Cloud Technology
Cloud technology can be used in various forms of services, such as infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS). RegTech can run on top of this technology according to service needs, including application components, data, and various other supporting technologies that support the business processes of RegTech solutions.
Based on the analysis results in Table 5, it is concluded that six main components are needed in the new IT infrastructure that supports data and application architectures, namely big data analytics, AI/ML, RPA, edge computing, API, and cloud technology. Infrastructure architecture modelling is then carried out to support the implementation of the generic data architecture and application architecture of RegTech solutions designed previously. Figure 9 shows the generic infrastructure architecture model for RegTech solutions at a high level, which adopts a three-tier architecture and adds a base layer underneath it. The following describes the layers in the RegTech generic infrastructure architecture (Figure 9).
1. The base layer provides the basic IT infrastructure, including software and hardware, that supports the data, application, and presentation layers. This layer includes network, compute and storage, and data center.
2. The data layer stores all the data required by a RegTech solution, such as regulatory data, organizational operations, and various analytical results, including regulatory and compliance analytics. The data will later be processed at the application layer and forwarded to the presentation layer to be displayed to the user. This layer uses data lakehouse technology to store various data for further use as needed, such as regulatory and compliance analytics.
3. All the RegTech logic functionality resides in the application layer. RegTech implements a big data analytics architecture consisting of four stages: data ingestion, data processing, data analytics, and data presentation. All these stages correspond to the generic business processes of the RegTech solution. Edge computing can be applied if the volume of data processed is very large by carrying out the collection and processing stages close to the data source so that the data sent for the analytical process in the main data center is not too large. RegTech uses an AI/ML model for compliance and regulatory analytics. In addition, RPA also supports the process of automatically creating and sending regulatory reporting to regulators/supervisors. RegTech uses API technology for data exchange, such as sending data from analytics results to the presentation layer.
4. The presentation layer provides the end-user interface for interacting with RegTech. The compliance unit can see various compliance analysis results and notifications of regulatory changes in real-time through the visualization tool on this layer. Data scientists can also interact with various data to create AI/ML models using the data scientist tools at this layer.
The API provides data processed by various logical functions in the application layer and sends it to the presentation layer. Various devices can access the RegTech interface, such as desktops, laptops, and mobiles. Cloud technology also provides internet-based services, such as data storage, applications, and basic IT infrastructure, that can be adopted for implementing RegTech solutions. Cloud services can be implemented for the base, data, and application layers. Cloud services are managed by certain providers so that users only receive various services according to their needs.
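The four application-layer stages described above (ingestion, processing, analytics, presentation) can be sketched as a chain of functions. This is only an illustration under stated assumptions: the function names, the record fields `id` and `value`, and the threshold rule in `RULE` are hypothetical, and a simple rule check stands in for the AI/ML model that the architecture envisages.

```python
import json

# Hypothetical machine-readable rule: a record's "value" must not exceed "max".
RULE = {"field": "value", "max": 10.0}


def ingest(sources):
    """Data ingestion: gather raw records from every source."""
    return [record for source in sources for record in source]


def process(raw):
    """Data processing: keep well-formed records and normalize their types."""
    cleaned = []
    for rec in raw:
        try:
            cleaned.append({"id": str(rec["id"]), "value": float(rec["value"])})
        except (KeyError, TypeError, ValueError):
            continue  # skip records that cannot be standardized
    return cleaned


def analyse(records, rule=RULE):
    """Data analytics: flag rule violations (stand-in for an AI/ML model)."""
    return [{**r, "compliant": r[rule["field"]] <= rule["max"]} for r in records]


def present(results):
    """Data presentation: serialize a summary, as an API response might."""
    violations = [r["id"] for r in results if not r["compliant"]]
    return json.dumps({"checked": len(results), "violations": violations})


def run_pipeline(sources):
    """Run the four stages in order."""
    return present(analyse(process(ingest(sources))))
```

Keeping each stage as a separate function mirrors the layering in the text: the collection and processing steps could run on an edge device while the analytics and presentation steps run in the main data center.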

Security Architecture
Based on the generic infrastructure architecture of the RegTech solution in Figure 9, security covers four layers: base, data, application, and presentation. Security requirements at these layers can be seen in Table 6 below.

Data Center Security
If the organization maintains its own data center, the physical security of the data center is very important. Access to the data center by unauthorized parties must be blocked. Logical access to the servers must be protected by network security, for example by configuring an appropriate firewall. Data center security therefore depends on network security.

Network Security
Network security helps prevent the IT resources that support implementations of RegTech solutions from being exposed to external users. It helps prevent unauthorized system access, exploitation of host vulnerabilities, and port scanning. An Intrusion Detection System (IDS) and an Intrusion Prevention System (IPS) can support firewall-based network security.

Data Security
Data is a core component of the RegTech solution because everything is sourced from data, which is processed and analyzed to produce various information related to regulatory compliance. Here are some solutions for data security:
• Data Classification: Data is classified based on its sensitivity so that it is easy to plan data protection, data encryption, and data access requirements.
• Data Encryption: Data is converted from plaintext to ciphertext using an encryption key, so that only authorized users with access to the decryption key can read it.
• Data Encryption at rest and in transit: Every layer where data is located or temporarily resides must be secured, both during data exchange and when data is in the database. When data is in transit (data-in-transit), it can be secured using Secure Sockets Layer/Transport Layer Security (SSL/TLS) with a security certificate. Meanwhile, when data is not active or is in the database, it can be secured by encryption.

User Access Security
RegTech is accessible to users according to their respective roles in the regulated organization or entity. Therefore, it is very important to manage this access so that it does not interfere with RegTech's operations. Solutions that can be applied are authentication and authorization. Authentication determines who can access the system, and authorization governs what the user may do after logging into the system or application. Solutions that can be used for this security are user access via SSO (Single Sign-On) or Active Directory.
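The distinction between authentication and authorization can be made concrete with a small role-based sketch. The role names follow the roles listed in the business architecture, but the permission names, the `USERS` directory, and all function names are hypothetical. Credential verification is assumed to be delegated to an SSO or directory service and is reduced here to a lookup.

```python
# Hypothetical role-to-permission mapping for a RegTech deployment.
PERMISSIONS = {
    "regulatory_specialist": {"view_regulations", "view_compliance_report"},
    "data_engineer": {"manage_pipelines"},
    "database_administrator": {"manage_storage"},
    "data_analyst": {"view_compliance_report", "view_dashboard"},
    "data_scientist": {"view_dashboard", "train_model"},
}

# Hypothetical user directory; in practice identities would come from SSO
# or a directory service rather than a hard-coded dictionary.
USERS = {"alice": "data_scientist", "bob": "data_analyst"}


def authenticate(username):
    """Authentication: return the role of a known user, or None if unknown."""
    return USERS.get(username)


def authorize(role, action):
    """Authorization: check whether the role is allowed to perform the action."""
    return action in PERMISSIONS.get(role, set())


def can_perform(username, action):
    """Allow an action only for an authenticated user with a permitted role."""
    role = authenticate(username)
    return role is not None and authorize(role, action)
```

For example, `can_perform("alice", "train_model")` is allowed, while the same action is denied for `bob` (authenticated but not authorized) and for any unknown user (not authenticated).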
Based on the security needs identified previously, the generic security architecture of RegTech solutions is modeled at a high level. The architectural model can be seen in Figure 10. The security architecture covers four layers, namely base, data, application, and presentation. Security at the base layer consists of data center and network security, with data center security depending on network security. Security at the base layer supports data security. Data security can be addressed with data classification, data encryption, and encryption at rest and in transit. Next is security at the application layer, which uses a Web Application Firewall (WAF) solution. Lastly, security at the presentation layer covers user access security with authentication and authorization solutions and endpoint security with antivirus, anti-malware, and firewall solutions.
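How data classification drives protection planning can be sketched as a lookup from sensitivity level to required controls. The entity names are the data entities of the generic data architecture, but the sensitivity levels assigned to them and the policy in `protection_requirements` are illustrative assumptions, not requirements stated by this study.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical sensitivity levels, ordered from least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


# Assumed classification of the data entities; a real entity would decide this.
CLASSIFICATION = {
    "regulation": Sensitivity.PUBLIC,
    "organizational_operational": Sensitivity.CONFIDENTIAL,
    "model": Sensitivity.INTERNAL,
    "compliance_analytics_result": Sensitivity.CONFIDENTIAL,
    "compliance_report": Sensitivity.CONFIDENTIAL,
    "metadata": Sensitivity.INTERNAL,
}


def protection_requirements(entity):
    """Derive protection controls from an entity's sensitivity level."""
    # Unclassified entities fall back to the most restrictive level.
    level = CLASSIFICATION.get(entity, Sensitivity.CONFIDENTIAL)
    return {
        "encrypt_at_rest": level >= Sensitivity.INTERNAL,
        "encrypt_in_transit": True,  # assumed for all data exchanged over the network
        "restricted_access": level >= Sensitivity.CONFIDENTIAL,
    }
```

Defaulting unknown entities to the most restrictive level is a deliberate choice in this sketch: a missing classification then results in more protection rather than less.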

Evaluation Results
At this stage, the generic RegTech solution architecture artefacts are applied to the scenario illustration described in Section 2.3. The application of these artefacts covers the business, data, application, infrastructure, and security architectures. The modelling results can be seen in Figures 11 to 14. The security architecture can adopt the generic security architecture of the RegTech solution by considering the security of the data center, network, data, applications, endpoints, and users. Various solutions for each security area can be seen in Figure 10, the generic security architecture diagram of RegTech solutions. Based on Figures 11 to 14, the artefacts from this research can be applied to logical scenarios outside of finance. Next, a time analysis is performed using the Bizagi Modeler application. Business processes from the scenario before (As-Is) and after (To-Be) implementing RegTech were modelled with BPMN (Business Process Model and Notation). The business process simulation results can be seen in Figures 15 to 22 below. The results of evaluating business processes using time analysis with the Bizagi Modeler can be seen in Table 7. Based on Table 7, regulatory compliance business processes that implement RegTech (To-Be) are far more time-efficient than the old method (As-Is). The RegTech solution has the potential to provide a time efficiency of 95.16%. Regulatory interpretation and compliance analytics using AI/ML significantly impact RegTech implementation to achieve regulatory compliance more efficiently.
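The formula behind the time efficiency figure is not stated in this section. One common definition, assumed here, is the relative reduction in total process time between the As-Is and To-Be processes. The function name and the durations below are hypothetical and are not the values from Table 7.

```python
def time_efficiency(as_is_minutes, to_be_minutes):
    """Percentage of process time saved by the To-Be process relative to As-Is.

    Assumed definition: (As-Is - To-Be) / As-Is * 100.
    """
    if as_is_minutes <= 0:
        raise ValueError("As-Is duration must be positive")
    return (as_is_minutes - to_be_minutes) / as_is_minutes * 100


# Hypothetical durations for illustration only.
print(round(time_efficiency(200, 50), 2))  # 75.0
```

Under this definition, a value close to 100% means that the To-Be process takes only a small fraction of the As-Is time.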

CONCLUSION
This research has produced a proposed RegTech solution architecture design by adopting the layer division model and EA artifacts of the TOGAF framework, covering business, data, application, infrastructure, and information security architectures. The evaluation, which applied a case study to an industry other than finance, shows that the proposed RegTech solution architecture is generic. Furthermore, a comparison of regulatory compliance business processes with and without RegTech indicates a potential time efficiency of 95.16%. These results show that RegTech solutions can achieve regulatory compliance more efficiently.
The RegTech generic solution architecture design can be applied as a case study in subsequent research on the development of RegTech in more specific regulated entities, especially outside the financial industry. The proposed business, data, application, infrastructure, and security architectures are still conceptual, high-level designs, so when they are implemented in more specific industries or regulated entities, it is advisable to decompose the business processes, data, applications, infrastructure, and security in more detail according to the needs of the industry or regulated entity. The various technological solutions defined in this architectural proposal may still change as technology trends continue to develop rapidly.