You learned in A Vision for the Agile Data Method that agile methodologies such as eXtreme Programming (XP) (Beck 2000) and DSDM (Stapleton 2003) take an iterative and incremental approach to software development. Application developers on XP and DSDM projects typically forsake big design up front (BDUF) approaches in favor of emergent approaches where the design of a system evolves throughout the life of the project. On an agile development project the final design often isn’t known until the application is ready to be released. This is a very different way to work for many experienced IT professionals to work.
The implication is that the traditional approach of creating a (nearly) complete set of logical and physical data models up front isn’t going to work. The main advantage of the traditional approach is that it makes the job of the process database administrator (DBA) much easier – the data schema is put into place early and that’s what people use. However there are several disadvantages. First, it requires the designers to get it right early, forcing you to identify most requirements even earlier in the project, and therefore forcing your project team into taking a serial approach to development. Second, it doesn’t support change easily. As your project progresses your project stakeholders understanding of what they need will evolve, motivating them to evolve their requirements. The business environment will also change during your project, once again motivating your stakeholders to evolve their requirements. In short the traditional way of working simply doesn’t work well in an agile environment. If Agile DBAs are going to work on and support project teams that are following agile methodologies they need to find techniques that support working iteratively and incrementally. My experience is that one critical technique is database refactoring.
Table of Contents
1.
Refactoring
2.
Database Refactoring
*
Why Database Refactoring?
*
Preserving Semantics
*
What Database Refactorings Aren’t
3.
Why Database Refactoring is Hard
4.
How to Refactor Your Database
*
Step 1: Start in Your Development Sandbox
*
Step 2: Implement In Your Integration Sandbox(es)
*
Step 3: Install Into Production
5.
Adopting Database Refactoring Within Your Organization
6.
Database Refactoring Best Practices
7.
Database Refactoring in the Real World
8.
The Catalog of Database Refactorings (posted as another page)
1. Refactoring
Martin Fowler (1999) describes a programming technique called refactoring, a disciplined way to restructure code. The basic idea is that you make small changes to your code to improve your design, making it easier to understand and to modify. Refactoring enables you to evolve your code slowly over time, to take an iterative and incremental approach to programming. Martin’s refactoring site, www.refactoring.com, is a good online resource.
A critical aspect of a refactoring is that it retains the behavioral semantics of your code, at least from a black box point of view. For example there is a very simple refactoring called Rename Method, perhaps from getPersons() to getPeople(). Although this change looks easy on the surface you need to do more than just make this single change, you must also change every single invocation of this operation throughout all of your application code to invoke the new name. Once you’ve made these changes then you can say you’ve truly refactored your code because it still works again as before.
It is important to understand that you do not add functionality when you are refactoring. When you refactor you improve existing code, when you add functionality you are adding new code. Yes, you may need to refactor your existing code before you can add new functionality. Yes, you may discover later on that you need to refactor the new code that you just added. The point to be made is that refactoring and adding new functionality are two different but complementary tasks.
2. Database Refactoring
In the February 2002 issue of Software Development I described a technique that I called data refactoring. This article described my preliminary experiences at something that should more appropriately have been called database refactoring in hindsight. Hence the new name. From this point forward I’ll use the term code refactoring to refer to traditional refactoring as described by Fowler to distinguish it from database refactoring.
Let’s start with some definitions. A database refactoring is a simple change to a database schema that improves its design while retaining both its behavioral and informational semantics. For the sake of this discussion a database schema includes both structural aspects such as table and view definitions as well as functional aspects such as stored procedures and triggers. An interesting thing to note is that a database refactoring is conceptually more difficult than a code refactoring; code refactorings only need to maintain behavioral semantics while database refactorings also must maintain informational semantics.
There is a database refactoring named Split Column, one of many described in A Catalog of Database Refactorings, where you replace a single table column with two or more other columns. For example you are working on the Person table in your database and discover that the FirstDate column is being used for two distinct purposes – when the person is a customer this column stores their birth date and when the person is an employee it stores their hire date. Your application now needs to support people who can be both a customer and an employee so you’ve got a problem. Before you can implement this new requirement you need to fix your database schema by replacing the FirstDate column with BirthDate and HireDate columns. To maintain the behavioral semantics of your database schema you need to update all source code that accesses the FirstDate column to now work with the two new columns. To maintain the informational semantics you will need to write a migration script that loops through the table, determines the type, then copies the existing date into the appropriate column. Although this sounds easy, and sometimes it is, my experience is that database refactoring is incredibly difficult in practice when cultural issues are taken into account (more on this later).
2.1 Why Database Refactoring?
There are two fundamental reasons why you should be interested in database refactoring:
1.
To safely fix existing legacy databases. The bottom line is that legacy databases aren’t going to fix themselves, and that from a technical point of view database refactoring is a safe, simple way to improve data, and database, quality over time. My July 2006 survey into the current state of data management shows that 33% of organizations either are, or intend to take, an refactoring-based approach to addressing existing production data quality problems.
2.
To support evolutionary development. Modern software development processes, such as RUP, XP, and AUP, all work in an evolutionary if not agile manner. Data professionals need to adopt techniques, including this one, which enable them to work in such a manner.
2.2 Preserving Semantics
Informational semantics refers to the meaning of the information within the database from the point of view of the users of that information. To preserve the informational semantics implies that when you change the values of the data stored in a column the clients of that information shouldn’t be affected by the improvement. Similarly, with respect to behavioral semantics the goal is to keep the black box functionality the same – any source code that works with the changed aspects of your database schema must be reworked to accomplish the same functionality as before.
2.3 What Database Refactorings Aren’t
A small transformation to your schema to extend it, such as the addition of a new column or table, is not a database refactoring because the change extends your design. A large number of small changes simultaneously applied to your database schema, such as the renaming of ten columns, would not be considered a database refactoring because this isn’t a single, small change. Database refactorings are small changes to your database schema that improve its design while preserving the behavioral and informational semantics. That’s it. I have no doubt that you can make those changes to your schema, and you may even follow a similar process, but they’re not database refactorings.
3. Why Database Refactoring is Hard
Coupling. As you learned in Relational Databases 101 coupling is a measure of the degree of dependence between two items – the more highly coupled two things are the greater the chance that a change in one will require a change in another. Coupling is the “root of all evil” when in comes to database refactoring, the more things that your database schema is coupled to the harder it is to refactor. Unfortunately you learned in Relational Databases 101 that relational database schemas are potentially coupled to a wide variety of things:
*
Your application source code
*
Other application source code
*
Data load source code
*
Data extract source code
*
Persistence frameworks/layers
*
Your database schema
*
Data migration scripts
*
Test code
*
Documentation
Related posts:
- Organizing A Process Database ClaimsI claim: 1. A system that comprises a processor, a...
- Virtual Private Database in Oracle Enterprise 11g Oracle Enterprise Database11g has the Virtual Private Database feature to...
- Importing External Access Database Tables Using Visual Basic by Nicholas Brown Sometimes a Microsoft Access database user may find it desirable...
- Using Php to Populate a Drop Down List Box From a Mysql Database Table Quite often when you are developing web sites or applications...
- Process Database Entries BACKGROUND OF THE INVENTION The present invention relates to a...
Related posts brought to you by Yet Another Related Posts Plugin.
Leave a Reply