This is a nuts-and-bolts discussion of some potential issues with nulls.
While BOL has extensive documentation, it doesn’t give you a concise list of
"gotchas." These are some of the things you should watch for when
working with existing objects.
Watch your DML, particularly with inequality
Either provide for nullability in every statement where it is a potential
issue or have a look at the database to see if you should be addressing it. For
instance, if you have a table with 900,000 rows, 899,990 of which already have
the desired value, you might use the following statement:
update t
set col3 = 'Y'
where col3 <> 'Y'
or col3 is null
If col3 isn’t nullable, then the last line is superfluous. If it is nullable,
you’d best include is null
or risk not changing a row or two with a
null in that column. Based on your data, you may be justified omitting a search
condition and changing every row in the table.
There are many ways to determine if a field is nullable: the Enterprise
Manager, sp_columns, information_schema.columns, or go fishing in syscolumns.
Programmatically, you can use ADO’s adFldIsNullable
attribute.
Variables
Variables can be a major problem because they’re often used in multiple
expressions. They may evaluate to null; i.e., during division by zero, or they
may inadvertently never be set.
My employer’s senior management commissioned an Income Statement that
allocates very specific corporate expenses into several components of Cost of
Sales. I keep the sum of these expenses in a variable and use it in successive
queries. It initially looked like this:
Click here for code example 1.
This worked well until our Finance people started running it on just part of
the company. Since there were no corporate expenses, @nonGA was null and so was
every expression that used it. Fortunately, the effect, while disconcerting,
wasn’t subtle and I quickly fixed it.
Click here for code example 2.
ISNULL
replaces a null with a specified value. In this case,
zero. Be careful that ISNULL
does not hide problems with your
queries. They may be returning null because of logic or typing errors. Speaking
of typing errors, did you notice the distinction between the ISNULL
function and the IS NULL
predicate?
Consider the functions available to you
Beside ISNULL
, you have NULLIF
, COALESCE
,
and CASE
at your disposal. CASE
is a
jack-of-all-trades. The others are more specialized.
Suppose Stephen Wynkoop is feeling generous and wants to see the effect of
doubling his columnists’ hits on a specific date. Further, everyone, even
columnists with no hits that day, gets at least 10.
He’ll probably left join his columnist table with his hit table. Because many
rows will return null hits, he can’t just multiply them by two. Same for the
columnists with one to four hits. They would get fewer than 10 when doubled.
Here’s what he might do:
select c.full_name, case when h.hits > 5 then h.hits * 2 else 10 end
from columnists c left join hitstats h on h.columnist_id = c.id
and h.hit_date = 'April 1, 2001' -- arbitrary date for illustration
order by 2 desc
Columnists with no hits, or fewer than 5, are swept up by else
and given 10.
COALESCE
returns the first non-null argument given it. It can be
more concise than CASE
.
NULLIF(arg1, arg2)
returns null if its two arguments are equal,
and arg1’s value if not. It is most useful when combined with other functions.
Say a college sends solicitations to its alumni. Part of the solicitation is
a reminder of how much they contributed last time. Their table doesn’t allow
nulls so years without contributions have the value zero.
select alum, coalesce(nullif(contrib_2000,0),nullif(contrib_1999,0),nullif contrib_1998,0),0)
from contributions
If the alumnus’s 2000 contribution is zero, NULLIF
will convert
it to null and COALESCE
will move on to the next year. Otherwise, NULLIF
will return the year’s contribution and COALESCE
will stop there.
The last zero is a catchall for alumni who have not contributed in those years.
If the last zero is omitted, those alumni will return null.
Aggregates
Aggregates (SUM, AVG, MAX, MIN, STDEV, STDEVP, VAR, and VARP)
are straightforward as they ignore null values. So does COUNT
. The
only issue you should ever have is running the function against a column that
has all null values; i.e., one that was just added to the table and not yet
populated. The server will return null.
COUNT(*)
counts all rows, including rows with nulls. The college
can use this distinction to analyze its 2000 contributions. Remember that their
table doesn’t allow nulls.
select count(*) from contributions -- count of all alumni in the table
select count(nullif(contrib_2000,0)) from contributions -- count of alumni who contributed in 2000
select avg(contrib_2000) from contributions -- average contribution from all alumni
select avg(nullif(contrib_2000,0)) from contributions -- average from alumni who contributed
SELECT notes
- For
GROUP BY
andDISTINCT
, null values are
considered equal. - For
ORDER BY
, nulls are considered the lowest possible value;
they are at the beginning of an ascending list. - ANSI compatibility has considerable effects on the results of evaluating
nulls. For instance, whether "= null" will work. See below.
ANSI compatibility
I have to refer you to your version’s (or versions’) BOL for the gory
details. Some things to keep in mind are:
- Your server’s version. 6.5 and below behave differently than later
versions. - The database’s compatibility level (sp_dbcmptlevel).
- The database options (sp_dboption).
- How you connect to the server. ODBC and OLE DB give different default ANSI
settings than DB-Library. - The settings can be changed session-by-session; they’re not necessarily
uniform or under the control of the dbo or sa. - Depending on the server version, the behavior of stored procedures can
depend upon either what the settings were when the procedure was compiled or
what they are when the procedure is run.