
System integration: ETL Tools Development and Processes Implementation
Extract Transform and Load Tools play a key role in integrating computer systems. The process of extracting data from system A to load it into an Oracle database usually follows the same steps:
- Determine what data needs to be loaded and from what system it will be coming from as well as a process to follow to load the data into the Oracle tables (How to validate the "candidate data" is coherent? How to "clean up" the "candidate data"? What needs to be loaded first? What missing data should be reported? What defines a successful load?)
- Define a file format to use to transfer the data (will the data be transmitted in a flat file or an XML file? If it is a flat file, will it be following a fixed size convention or use a separator between the fields? Will we use ASCII or Unicode?)
- Write the program(s) that extract the data regularly in the source systems.
- On the Oracle side, define the structure of the staging tables where the raw data will be loaded in bulk.
- Write SQL Loader control files to drive the load of the files generated by the source system.
- Write scripts (for instance UNIX shell scripts) to automate the load of those files into the staging table of the receiving Oracle database.
- Develop PL/SQL procedures to implement the actual clean up of the data to load and to load the production tables.
The actual load is usually constrained by a time window that the business users define. This creates a deadline when the load process must be complete. An ETL process can potentially have to load millions of records within a constrained time frame. Writing high performance code is essential to the success of an ETL project.
Elevata has extensive experience implementing custom built ETL processes with Oracle. Oracle SQL Loader
SQL Loader is a powerful tool that is designed to load data directly into Oracle tables. Its impressive performance makes it suitable to load massive amounts of data from a "flat file" that follows a known pattern. The load is driven by a "control file" that defines the structure of the file (either fixed length or using a separator between fields) and the mapping to table fields. SQL Loader also allows calls to PL/SQL functions thus enabling very sophisticated checks to be implemented during the load.
Elevata Incorporated has used SQL Loader successfully to integrate external systems with an Oracle database.
Load automationWe also have developed our own set of tools that work in conjunction with SQL Loader to automate the process of updating an Oracle database regularly, as well as our own "Control file generator" that simplifies greatly the definition of "Control files".
Combining our experience and the tools we developed over the years allows us to rapidly deliver a set of fast working scripts that are specifically tailored for your environment.
Performance
When the volume of data is large, Elevata Incorporated recommends using sophisticated techniques to improve the throughput to meet the time constraints. Elevata Incorporated has successfully used parallelization techniques to speed up the load.
Data scrubbing and Data validation
In addition to SQL Loader scripts, we can develop and test PL/SQL packages that will load your database from the staging area (where SQL Loader will load the raw data) to your production tables.
As part of the load, we usually recommend adding a "data scrubbing" step that will enforce business rules that data must meet in order to be allowed to be loaded. What happens to the data that doesn't meet the rules is a corporate choice. It can simply be discarded, we can alert someone automatically when exceptions are found (via an automated email for instance) or we can generate reports that can then be used to "fix" the data source.
Elevata Incorporated has extensive experience designing and implementing a data scrubbing step in the general data load process and can help you make sure that your database will not get corrupted because of data that doesn't follow your business rules.
|