BUSINESS INTELLIGENCE A2
Big Data Analytics and Data Warehousing
Big data analytics refers to the process of examining large data sets ("big data") in order to identify information that is not readily visible. Data analysts may, for example, look for correlations, market trends, customer preferences, or other useful information. A data warehouse is simply a database that a firm uses to capture all of its transaction and customer data in a single location for analysis (Laudon and Traver, 2016, p. 431). With the growth of online activity and online trade, big data analytics has become essential for any business that wants detailed information on its clients' behavior and that behavior's impact on the business (Yue, 2017, p. 232). Through big data analytics, businesses can extract important information stored in data warehouses and use it to formulate sound strategies.
Incorporation of Big Data Analytics in Data Warehouse Architecture
Big data involves more than large volumes of data. Besides volume, it involves the variety, velocity, veracity, value, and variability of the data. Big data analytics must therefore account for these factors when analyzing the data stored in warehouses, and data warehouses must store data in a manner that is both effective and efficient for analysis (Jian-bo & Chong-jun, 2012, p. 109). When developing a data warehouse architecture, a business must consider time, cost, development difficulty, size, prerequisite data sharing, and the sources of the data (Anand & Gunadhi, 1996, p. 380). It must also consider the expected level of usage, the operating system, and the technology to be used. When deciding on the architecture, developers choose the database management system (DBMS), the processing method for the system, the data migration tools used to load the data, and the tools used to retrieve and analyze it. To handle the large volumes of structured and unstructured data that make up big data, many organizations use Hadoop (Laudon and Traver, 2016, p. 435), an open-source software framework from the Apache Software Foundation. Hadoop enables parallel processing of huge volumes of data: it breaks a large problem into sub-problems, processes them in parallel, and combines the results into a smaller data set that is easier to analyze (Laudon and Traver, 2016, p. 436).
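The split, process, and combine idea behind Hadoop can be illustrated with a minimal Python sketch. It is not Hadoop code and uses no Hadoop API: threads from the standard library stand in for cluster nodes, and the meter readings and district names are hypothetical. Each worker totals water consumption per district for its own chunk (the "map" step), and the partial totals are then merged into one small result (the "reduce" step).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable

# Hypothetical input: (district, litres) meter readings from many customers.
Reading = tuple[str, float]


def split(data: list[Reading], n_chunks: int) -> list[list[Reading]]:
    """Break the full data set into roughly equal sub-problems."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def map_chunk(chunk: list[Reading]) -> dict[str, float]:
    """'Map' step: each worker totals consumption per district for its chunk."""
    partial: dict[str, float] = defaultdict(float)
    for district, litres in chunk:
        partial[district] += litres
    return dict(partial)


def reduce_partials(partials: Iterable[dict[str, float]]) -> dict[str, float]:
    """'Reduce' step: merge the partial totals into one small result set."""
    totals: dict[str, float] = defaultdict(float)
    for partial in partials:
        for district, litres in partial.items():
            totals[district] += litres
    return dict(totals)


def total_by_district(data: list[Reading], workers: int = 4) -> dict[str, float]:
    chunks = split(data, workers)
    # Threads stand in for the separate processing nodes of a Hadoop cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_partials(partials)


if __name__ == "__main__":
    readings = [("North", 120.0), ("South", 80.5), ("North", 60.0),
                ("East", 45.0), ("South", 19.5), ("East", 5.0)]
    print(total_by_district(readings, workers=3))
    # {'North': 180.0, 'South': 100.0, 'East': 50.0}
```

Because each chunk is processed independently, adding more workers (or, in Hadoop's case, more nodes) lets the same job scale to larger data sets; only the small partial totals have to be combined at the end.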
Data Warehouse Architecture for a Water Utility
Figure 1.1 Big Data Analytics and Data Warehouse Combined.
ETL & ODS: Extraction, Transformation, and Loading & Operational Data Store
CMMS: Computerized Maintenance Management System
CMS: Condition Monitoring System
EAM: Enterprise Asset Management
ERP: Enterprise Resource Planning
External: Outside Organizational Data
GIS: Geographic Information System
Intranet: Intra-enterprise documents
SCADA: Supervisory Control & Data Acquisition
Description of the Warehouse Architecture
Organizations can use various data warehouse structures depending on the data and information they process. Overall, the architecture a business chooses depends on how well it enables the business to extract relevant information (Madlberger & Matook, 2017, p. 1429). For a small enterprise, a data mart may be enough to store all the information it needs (Gamst & Lawrence, 2015, p. 118). Other organizations may use an operational data store (ODS); the system proposed here uses an ODS for data warehousing. Generally, a data warehouse system has the following layers:
- Data Source Layer
- Data Extraction Layer
- Staging Area
- ETL Layer
- Data Storage Layer
- Data Logic Layer
- Data Presentation Layer
- Metadata Layer
- System Operations Layer (1keydata.com, 2017; Singh & Singh, 2012, p. 229)
The data source layer consists of the data drawn from the various departments of the water company; external sources, such as customers and suppliers, also contribute to this layer. The data extraction layer pulls data from these sources and cleans it so that it is appropriate for storage. In the staging area, the data awaits transfer to the data warehouse; this area makes it easier to integrate data from different sources. In the ETL layer, logical processes transform the data from transactional to analytical form (1keydata.com, 2017). The data storage layer holds the cleansed and transformed data in a store that depends on the data's function; data marts, the data warehouse, and the operational data store are the entities found in this layer. The data logic layer contains the rules that determine how the transformed data is presented in reports. The data presentation layer is the final report issued to users. The metadata layer records information about the data, such as where it is stored in the warehouse. Finally, the system operations layer holds information on how the data warehouse system operates (1keydata.com, 2017).
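As a rough illustration of how data moves through the extraction, staging, ETL, and storage layers, the sketch below uses Python's built-in sqlite3 module as a stand-in for the warehouse. The meter readings, the table name daily_consumption, and the cleaning rule (drop any reading that does not parse as a number) are hypothetical assumptions, not part of the proposed design.

```python
import sqlite3
from collections import defaultdict
from datetime import datetime

# Hypothetical transactional rows as they might arrive from a source system
# such as billing or SCADA: (meter_id, ISO timestamp, litres as text).
RAW_ROWS = [
    ("M-001", "2017-03-01T08:15:00", "12.5"),
    ("M-001", "2017-03-01T19:40:00", "7.5"),
    ("M-002", "2017-03-01T09:00:00", ""),              # missing reading
    ("M-002", "2017-03-02T10:30:00", "30"),
    ("M-001", "2017-03-02T07:05:00", "not-a-number"),  # corrupt reading
]


def extract_and_clean(rows):
    """Extraction layer: keep only rows that parse; the result is staged."""
    staged = []
    for meter_id, ts, litres in rows:
        try:
            staged.append((meter_id, datetime.fromisoformat(ts), float(litres)))
        except ValueError:
            continue  # drop incomplete or corrupt records before staging
    return staged


def transform(staged):
    """ETL layer: turn individual transactions into daily totals per meter."""
    daily = defaultdict(float)
    for meter_id, ts, litres in staged:
        daily[(meter_id, ts.date().isoformat())] += litres
    return sorted((m, d, total) for (m, d), total in daily.items())


def load(conn, facts):
    """Storage layer: write the analytical rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_consumption "
        "(meter_id TEXT, day TEXT, litres REAL, PRIMARY KEY (meter_id, day))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO daily_consumption VALUES (?, ?, ?)", facts
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract_and_clean(RAW_ROWS)))
    for row in conn.execute("SELECT * FROM daily_consumption ORDER BY meter_id, day"):
        print(row)
    # ('M-001', '2017-03-01', 20.0)
    # ('M-002', '2017-03-02', 30.0)
```

The transform step is what changes the data from transactional form (one row per reading) to analytical form (one row per meter per day), which is the shape the data logic and presentation layers would then report on.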
In a water utility company, various departments act as important sources of information. These departments are usually semi-autonomous in the decisions they make, so the proposed structure in Figure 1.1 has a separate data mart for each department. These data marts are subdivisions of the company's overall warehouse data (Hu, Wen, Chua, & Li, 2014, p. 170). For example, data from the finance department is categorized as financials, data from the human resources department as resources, and data from the marketing department as relationship. Similarly, data from the logistics department is categorized as Geographic Information System. Data from all these departments enters the company's main data warehouse from the staging area and passes through all the layers of the data warehouse system, so that the required information can be extracted easily when needed.
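The department-to-mart mapping described above can be sketched as follows. This is a hypothetical illustration only: records are plain Python dictionaries, the department keys are assumed labels, and each data mart is modelled as a subset of the warehouse records rather than as a separate physical database.

```python
from collections import defaultdict

# Department-to-data-mart mapping from the proposed water utility design;
# the department keys themselves are hypothetical labels.
MART_FOR_DEPARTMENT = {
    "finance": "financials",
    "human resources": "resources",
    "marketing": "relationship",
    "logistics": "Geographic Information System",
}


def load_warehouse(staged_records):
    """Every staged record enters the main warehouse, tagged with its mart."""
    warehouse = []
    for record in staged_records:
        mart = MART_FOR_DEPARTMENT.get(record["department"])
        if mart is None:
            raise ValueError(f"no data mart defined for {record['department']!r}")
        warehouse.append({**record, "mart": mart})
    return warehouse


def build_marts(warehouse):
    """Each data mart is a subdivision (subset) of the warehouse data."""
    marts = defaultdict(list)
    for record in warehouse:
        marts[record["mart"]].append(record)
    return dict(marts)


if __name__ == "__main__":
    staged = [
        {"department": "finance", "item": "invoice 1042", "amount": 250.0},
        {"department": "logistics", "item": "pipe repair", "location": "Zone 3"},
        {"department": "marketing", "item": "customer survey", "score": 4},
    ]
    for name, records in build_marts(load_warehouse(staged)).items():
        print(name, len(records))
```

Keeping every record in the central warehouse and deriving the marts from it means each semi-autonomous department gets its own view without the warehouse losing the company-wide picture.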
Key Security, Privacy, and Ethical Concerns for Organizations
With the increasing use of digitalized systems, there is a growing need to secure the private information stored in them. Digitalized systems are now used in all spheres of life, from social media and online banking to e-commerce. With this growth, hacking and the loss of private information have become both more likely and more damaging.
There are various security concerns associated with data being stored in a data warehouse.
Access to Sensitive Information. There is always a risk that unauthorized persons may access sensitive company information. Often, this problem arises from password sharing, which can compromise the entire system and undermine the controls on how the data warehouse is used (Albright & Winston, 2015, p. 118). For example, in a banking system, each user is limited in the information they can access and the activities they can perform. Tellers normally do not have the right to process a loan, but they can process withdrawals for a customer up to the balance in his/her account. If such a system is breached, a teller could issue a loan and pay out the funds to a customer without approval from the credit department.
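The teller example amounts to role-based access control. A minimal Python sketch, assuming a hypothetical permission table with two roles, shows how a system can allow a teller's withdrawal while refusing a loan approval. The check trusts whoever is logged in under a role, which is why shared passwords defeat it.

```python
from dataclasses import dataclass

# Hypothetical role-to-permission table for the banking example.
PERMISSIONS = {
    "teller": {"withdraw"},
    "credit_officer": {"approve_loan"},
}


@dataclass
class Account:
    owner: str
    balance: float


class AccessDenied(Exception):
    """Raised when a role attempts an action it has not been granted."""


def require(role: str, action: str) -> None:
    """Raise unless the role has been granted the action."""
    if action not in PERMISSIONS.get(role, set()):
        raise AccessDenied(f"{role} may not {action}")


def withdraw(role: str, account: Account, amount: float) -> float:
    require(role, "withdraw")
    # A withdrawal may only pay out what the customer actually holds.
    if amount <= 0 or amount > account.balance:
        raise ValueError("amount must be positive and within the balance")
    account.balance -= amount
    return account.balance


def approve_loan(role: str, account: Account, amount: float) -> float:
    require(role, "approve_loan")
    account.balance += amount  # loan funds are credited to the account
    return account.balance


if __name__ == "__main__":
    acct = Account("Jane", 500.0)
    print(withdraw("teller", acct, 200.0))  # 300.0
    try:
        approve_loan("teller", acct, 1000.0)
    except AccessDenied as err:
        print("blocked:", err)  # blocked: teller may not approve_loan
```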
Theft of Sensitive Information. One of the major security concerns regarding data is the theft of sensitive information. With the increase in online activity, there is always a risk that a person or a business will lose private information to hackers. For companies, such a breach may mean losing valuable assets, such as newly developed technologies, or having their strategies disclosed to competitors.
Loss of Control Over Personal Data. Careless disclosure of personal information on social media or to unverified websites can let it fall into the hands of hackers and thieves, who can use it to assume the identity of its owners. As a result, the real owners lose control over how their information is used (Chen, Chiang, & Storey, 2012, p. 1180). In some cases, hackers can make fake credit cards and use them to make withdrawals from their victims' accounts.
Breakdown of Cloud Computing. The use of cloud computing has recently increased because it is affordable and lets individuals access their information wherever they are. Unfortunately, like all machines, these systems are prone to failures and breakdowns. For some institutions, such as banks and networking companies, losing access to important data even for a few minutes can lead to major losses. For example, in 2011, Amazon and Microsoft suffered cloud outages due to a power failure (CRN, 2011).
Ethical and Social Challenges
The following are some of the ethical and social challenges with regards to data warehousing.
Surveillance. One of the major concerns in the collection and use of data is surveillance. Big data is commonly used by security and law enforcement agencies to monitor criminals. Similarly, companies may use such data to decide which advertisements to show or which products to sell. However, excessive surveillance may infringe on people's privacy (Gosain & Arora, 2015, p. 154).
Information Asymmetry. One of the main challenges with data analytics is that it can concentrate information among the few people who have privileged access to the data (Wasserman, 2013, p. 10). Excessive concentration of information may in turn concentrate power in a few individuals (Kimball & Ross, 2011, p. 121). For example, if data analytics reveals to a trader that he/she is the only supplier holding reserves of an essential commodity, he/she may raise the price, leaving customers without negotiating power.
References
Albright, Christian & Winston, Wayne, 2015, Business analytics: Data analysis and decision making, Cengage, New York, NY.
Anand, V.J. & Gunadhi, H., 1996, Data warehouse architecture for DSS applications, Australasian Journal of Information Systems, 4(1), pp. 375-386.
Chen, H., Chiang, R.H. & Storey, V.C., 2012, Business intelligence and analytics: From big data to big impact, MIS Quarterly, 36(4), pp. 1165-1188.
CRN, 2011, Cloud outages: Cloud services downtime and the lasting impact. Available from http://www.crn.com/news/cloud/index/cloud-outages-cloud-services-downtime.htm?itc=refresh
Gamst, Glenn & Lawrence Guarino, 2015, Performing data analysis using IBM SPSS, John Wiley, Hoboken, NJ.
Gosain, Anjan & Arora, Amar, 2015, Security issues in data warehouse: A systematic review, Procedia Computer Science, 48(1), pp. 149-157.
Hu, H., Wen, Y., Chua, T.S. & Li, X., 2014, Toward scalable systems for big data analytics: A technology tutorial, IEEE Access, 2, pp. 652-687.
Jian-bo, Wang & Chong-jun, Fan, 2012, Research on Airport Data Warehouse Architecture, International Journal of Business, Humanities and Technology, 2(4), pp. 107-111.
Kimball, R. & Ross, M., 2011, The data warehouse toolkit: The complete guide to dimensional modeling, John Wiley & Sons.
Laudon, Kenneth & Traver, Carol, 2016, E-commerce 2016: Business, technology, society, global edition, Pearson, New York, NY.
Madlberger, Maria & Matook, Sabine, 2017, Theorizing e-commerce business models: On the impact of partially and fully supported transaction phases on customer satisfaction and loyalty, Australasian Journal of Information Systems, 21(1), pp. 1426-1437.
Singh, Jagir & Singh, Greeshma, 2012, Data warehousing, Business Intelligence Journal, 5(2), pp. 224-235.
Wasserman, Rachel, 2013, Ethical issues and guidelines for conducting data analysis in psychological research, Ethics & Behavior, 23(1), pp. 3-15.
Yue, Huang, 2017, Clustering multi-typed objects in extended star-structured heterogeneous data, Intelligent Data Analysis, 21(2), pp. 225-241.
1keydata.com, 2017, Data warehouse architecture. Available from http://www.1keydata.com/datawarehousing/data-warehouse-architecture.html