The development cycle of a
typical data warehouse or data mart includes various stages such as data
requirement analysis, modeling, extraction, transformation, loading, report development
and testing.
This cycle typically involves a
significant number of challenges such as extraction from different data
sources, missing data, etc., which results in significant loss of time,
developer productivity and other resources. These challenges are well documented
in popular books, article and on websites such as rkimball.com and billinmon.com.
In addition to these challenges,
the development team is left with having to make decisions regarding the purchasing
of a variety of tools such as query tools, distribution tools, process
automation and scheduling tools, ETL tools, etc. from the ever-increasing pool
of softaware vendors.
Prototyping is one of the
best-suggested approaches in mitigating the adverse effects of such unforeseen
problems and in facilitating decision-making.
In this series of articles we
shall restrict our focus to prototyping the ETL stage of the DW life cycle since
data warehouse legend states
that on average, 80% of the efforts of building a data warehouse go into the
ETL related tasks.
In an Oracle DW environment, the
ETL stage can be implemented using the “home grown” approach (PL/SQL, UNIX,
etc.), by third party ETL tools (Informatica, Ab Initio, etc.) or by using
Oracle Warehouse Builder (OWB).
In the series of articles that
follow, we will discuss the advantages of using OWB over the home grown
approach and third party ETL tools. We will develop a prototype ETL process
using OWB and discuss the many features of OWB that have been added over
different versions of OWB.
Oracle launched the first
production version of OWB 2.0.4.7 sometime during the year 2000, followed by
OWB 2.1.1 and OWB 3i versions in year 2001 and OWB 9i releases in Year
2002/2003. In addition, Oracle has added many new features to OWB over the
versions listed above and is continuously adding more and more enhancements in
line with industry requirements. As such, these enhancements make OWB an
appropriate ETL tool in an Oracle data warehouse environment. As we go forward
with developing the prototype, we will explore how OWB provides leverage in a
typical ETL process and how it uses the database Oracle9i ETL toolkit
functionality such as External tables, MERGE statement, etc.
Why OWB?
In a typical Oracle DW environment,
it is not uncommon to have all the processes developed using PL/SQL and UNIX
and managed by the DBA’s and developers. To keep this discussion brief, it can
be stated in a few words that possibly the most important advantage in using an
ETL tool is the automation and management of the complex ETL process(es),
especially in a large environment which leads to faster development cycles and
the ability to produce prototypes in the least possible time.
Some of the factors that
possibly favor using OWB in an Oracle DW environment are:
-
Most
of the development team is familiar with PL/SQL (or Java) and UNIX (this gives
the ability for developers to create user defined transformations in PL/SQL or
Java, in addition to the more than 150 preset transformations provided by OWB) -
OWB’s
tight integration with Oracle 9i server and its ability to optimally utilize
the ETL toolkit functionality that Oracle 9i provides like “Table functions”,
“external tables”, “Multi-table inserts”, Partition Exchange loading etc. -
Seamless
integration with other Oracle BI tools such as Discoverer, Oracle Designer,
Oracle Work Flow, and OEM. -
And
ofcourse, its ability to extract data from disparate source systems such as SAP R/3, DB2 etc. using integrators, transparent
gateways etc. -
Ability to generate code (PL/SQL, ABAP).
For
delving more into OWB features go to Oracle
OWB data sheet.
In
the case of choosing OWB over other ETL tools such as Informatica, the primary
advantage, in addition to the above features, is the option to use the powerful
features of Oracle Application Server (OracleiAS) for publishing web based meta
data reports, integrating with Oracle Work flow Server, etc.
On
the other hand, other factors might possibly influnce the choice of not
using OWB. However, this usually happens in the case of a non-Oracle
target environment, or when multiple non-Oracle source systems and complex
integration and transformations are involved.
In most cases though, if the development team is comprised of Oracle
skills, OWB would be the most appropriate ETL tool.
OWB components
Some of the key OWB components
are:
-
OWB
repository that stores Meta definitions used in the DW and the
transformation library, which contains more than 150, preset transformations.
-
OWB
client that is used to design the DW and the ETL processes. This component
also includes bridges, integrators, and editors and acts as a point of access
to other OWB components.
-
OWB
design browser client to view metadata, run web reports, perform lineage,
and impact analysis on your metadata.
-
OWB
runtime audit browser client where you run reports on the audit and error
information captured when you run jobs to load and refresh your ETL processes.
-
OWB
runtime assistant that helps you to set up a runtime repository and target
schema and maintain the deployment and runtime audit/error information.
Some of the other components that will be discussed
in later articles along with the above are the repository and browser
assistants, file upgrade utility, etc.
Conclusion
In my forth-coming articles, we
will briefly discuss the OWB architecture, followed by a prototype developed
using OWB, some of the most commonly applicable transformations and the various
features of OWB that can be utilized in a typical ETL scenario.