ETL (extract, transform, load) processes and tools are essential to the systems that help executives and senior managers make informed decisions. For more than two decades, they have been responsible for extracting data from applications and other source systems, transforming disparate data so that it integrates cleanly, and loading the cleaned, transformed data into data warehouses, where it drives the business intelligence solutions used by decision makers.
Lately, however, these tools and processes have begun to face new challenges, some of which threaten their very existence. Interestingly, the same developments also hold great potential for providing valuable business insights. In this post, we’ll take a look at two of the biggest challenges faced by ETL processes and tools and try to understand the factors that are introducing them into today’s business environment.
1. Emergence of new data sources
Today, data is no longer produced only by the usual suspects. Technological advancements have created new sources of data, including social media sites, web logs, sensors, instant messaging systems, and mobile devices (e.g., smartphones and tablets), to mention a few.
Data from these sources is produced at a rapidly growing rate. If collected, it can fill up traditional storage systems in a short period of time. In addition, much of this data is multi-structured or even unstructured, making it a poor fit for the usual relational database systems.
This new breed of data is collectively known as big data. Big data is widely believed to be valuable: it can potentially provide business insights such as customer buying behaviors, marketing campaign success metrics, and system efficiency metrics, among many others. Hence, big data cannot be ignored.
Unfortunately, traditional ETL systems, like the data warehouses they feed, aren’t built to handle data of such high volume, velocity, and variety.
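As a rough illustration of the variety problem, here is a minimal sketch of a traditional ETL step written in Python against an in-memory SQLite table. Everything in it is hypothetical and chosen only for illustration: the sample records, the `orders` table, and the function names are assumptions, not part of any particular ETL product. The point is simply that a transform step built around a fixed schema loads the records it expects and has to set aside anything shaped differently, such as a social media post.

```python
import json
import sqlite3

# Hypothetical raw input: two records fit the fixed schema,
# the third is a multi-structured social media post.
RAW_RECORDS = [
    '{"order_id": 1, "customer": "alice", "amount": "19.90"}',
    '{"order_id": 2, "customer": "bob", "amount": "5.00"}',
    '{"user": "carol", "post": "love the new phone!", "tags": ["phone", "review"]}',
]

# The only shape this (assumed) pipeline knows how to handle.
EXPECTED_FIELDS = ("order_id", "customer", "amount")


def extract(lines):
    """Extract: parse each raw JSON line into a dict."""
    return [json.loads(line) for line in lines]


def transform(records):
    """Transform: clean rows that match the fixed schema, set the rest aside."""
    rows, rejected = [], []
    for record in records:
        if set(record) == set(EXPECTED_FIELDS):
            rows.append(
                (
                    int(record["order_id"]),
                    record["customer"].title(),
                    float(record["amount"]),
                )
            )
        else:
            rejected.append(record)
    return rows, rejected


def load(conn, rows):
    """Load: write the cleaned rows into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(lines):
    """Run extract, transform, and load; return rows loaded and records rejected."""
    conn = sqlite3.connect(":memory:")
    rows, rejected = transform(extract(lines))
    load(conn, rows)
    loaded = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    conn.close()
    return loaded, rejected


if __name__ == "__main__":
    loaded, rejected = run_pipeline(RAW_RECORDS)
    print(f"loaded {loaded} rows, rejected {len(rejected)} records")
```

In this sketch the rejected post is exactly the kind of record that might carry the insights described above, yet the fixed-schema pipeline has nowhere to put it. Volume and velocity are not modeled here at all.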
2. Emergence of new data platforms
Because relational database systems and data warehouses aren’t designed to efficiently handle big data and because big data is too valuable to ignore, new platforms have emerged to address these issues. Two of these platforms are Hadoop and NoSQL systems.
These platforms can handle unstructured data with relative ease. Hadoop even allows businesses to store and process big data on clusters of commodity hardware. These clusters are highly scalable, so you can start small and grow as needed. Thus, not only are these systems capable of handling big data, they can also do so in a very cost-effective manner.
The arrival of Hadoop has led some observers to predict the impending demise of ETL systems. This kind of prediction may be too rash. Companies aren’t just going to dump a tried-and-tested legacy system, especially if they had to invest a fortune in the infrastructure and skilled staff it requires. Besides, Hadoop is only good for certain tasks, such as batch processing, so it isn’t the best solution for other tasks, such as low-latency, interactive reporting, which data warehouses and ETL tools handle quite well.
At this point, Hadoop and data warehouse systems need to coexist. But to keep delivering value to the business, ETL tools have to be capable of handling big data and of connecting to a range of analytical platforms, including both data warehouses and Hadoop systems.