# Miscellaneous > Database Design >  What problem is First Normal Form intended to solve?

## Marc Rochkind

In most database design books and articles, a department table (e.g.) that has columns named employee1, employee2, employee3, etc., is said to not be in 1NF.

However, if you read statements of what 1NF is (e.g., Wikipedia article, Chris Date's paper titled "What First Normal Form Really Means," Codd's articles), you will find that 1NF is concerned ONLY with a single column. Some formulations also say that one must not repeat an attribute, but that's impossible in the relational model and in all of the RDBMSes I'm familiar with.

In fact, Chris Date goes so far as to say that no relational database can have a relation that violates 1NF. It is like the sign on an elevator that says: "Out of order. Do not use."

I have come up with an alternative rule that specifically addresses the employee1, employee2, etc., problem. (What it says isn't relevant to this post.)

My question is this: Is my rule, assuming I've put it together well, an alternative to 1NF as the latter is usually expressed? In order to answer this question, I would need to know what problem 1NF was intended to solve. However, in reading the literature extensively, I have never seen this explained, except that in the Date paper I mentioned above he says that the problem it is intended to solve is the system not being relational. That's not what I was looking for.

I would think that if Codd, Date, etc., were concerned about the employee1, employee2, etc., problem, they would have said so. I suspect they were not, because such a table, however awkward, causes no consistency problems at all, and therefore probably is perfectly OK as far as relational database theory is concerned. (Its flaws are that the table definition needs to be altered whenever another employee is needed, and that the SQL becomes horrendous. I doubt that theoreticians would care about either of those very practical concerns.)

Comments? What problem is 1NF as formulated by the database gurus intended to solve?

----------


## skhanal

1NF deals with repeating values in a column. A column should only contain one value to satisfy 1NF. 

I think your model where Employees are in Dept. table is breaking 2NF as Employees are not functionally dependent on department key, unless you treat employee as an attribute of Department.

----------


## Marc Rochkind

2NF on applies if there is a composite key. It is in 3NF because the employee columns (which mean that the employee is in the department) all depend on the key, not on any non-key column.

I'm interested in your first statement, though. That certainly is what the historical literature says.

----------


## skhanal

If you are not considering Employee as an entity but attributes of a Department, then you satisfy "Key, the whole Key, nothing but the whole key, so help me Codd" rule, but in real world I will say "Help you God"  :Big Grin:

----------


## ErwinSmout

Starting with the beginning ...  In the very first paper [A relational model of data for large shared data banks], Codd mentioned the possibility that the domain of an attribute could be a domain of relations (relation-valued, that is) "Nested tables" if you want the SQL speak.  In that same paper, he also proceeded to call such relations (that include an attribute of what he called a "nonsimple" domain) "unnormalised" :

"A relation whose domains are all simple can be represented in storage by a two-dimensional column-homogeneous array of the kind discussed above. Some more complicated data structure is necessary for a relation with one or more nonsimple domains. For this reason (and others to be cited below) the possibility of eliminating nonsimple domains appears worth investigating! There is, in fact, a very simple elimination procedure, which we shall call normalization."

and the advantage to be had by rigorously applying this "normalization" procedure is described as follows :

"The simplicity of the array representation which becomes feasible when all relations are cast in normal form is not only an advantage for storage purposes but also for communication of bulk data between systems which use widely different representations of the data."

So the problem Codd thought would be solved is a problem of over-complexity of the interface between software components passing relational data onto one another.

But the problem here is that the precise meaning of the word "simple" (and "nonsimple") is not formally definable !!!  Over the following decades, a multitude of attempts has been done to improve on that situation ("no repeating groups" was a very popular one) but they all suffered from the same problem : the terms they came up with as a replacement for "simple / nonsimple" were themselves equally undefinable.  Codd ended up defining the distinction as "no relation-valued attributes", but that does not cover arrays as we know them from programming languages, nor does it cover things like the subclasses of java.util.Set and java.util.List.  So either you stick with "simple vs nonsimple" and leave it up to the [DBMS] user what he thinks is simple and/or nonsimple, or you try to kind of enumerate all the kinds of things that are "nonsimple" but then you are bound to overlook some stuff that you would have included had you considered, so your definition is bound to be incomplete wrt the intended semantics behind that word "nonsimple".

Those are the reasons why Date now holds (and it's relatively widespread current wisdom) that the only way to overcome this problem is to equate "being in 1NF" with "just being relational data".  And data is ***still*** relational if a relation has an attribute that is itself relation-typed.  (More precisely : a relation is a subset of a cartesian product of domains (/types) and if one of those types is itself a relation type then the relation has a relation-valued attribute, and there's nothing non-relational about that.  Things which ***are*** non-relational are duplicate tuples (a relation is a set and a set cannot hold duplicates) and NULLs (NULL is either not a value of the domain/type or if it is, it violates the basic propoerty that any value must have, and that is to be equal to itself).

But all that just deals with attributes that hold collections, or lists, or sets of values.  Your example with employee1,employee2,employee3 as three distinct attributes of the relation does not fit that bill.  There is nothing non-relational about such a structure, provided it is guaranteed that there will always be exactly 3 employees.  In the case of employees, that's rather unlikely, but consider cases like Q1,Q2,Q3,Q4 or jan,feb,mar,apr,may,...,nov,dec .  It's not so difficult to imagine business cases where it can/will be guaranteed that there is always going to be a number to fill in for each quarter or each month of a year, and there is also a relatively comforting certainty that the number of quarters or months in a year is rather unlikely to ever change ...

So while, formally speaking, a design such as your employee1,employee2,employee3 is in and of itself not nonrelational, in many cases it still violates what I call "the spirit of the original 1NF" because the distinct attributes are now used to circumvent the rule and "hide" the repetitive nature of the attributes from the DBMS.  (Whether they do or not, is to be assessed on a case by case basis.  E.g. if there is a semantic distinction between the roles that these attributes play in the relation (e.g. employee1 is the "first-contact" employee, employee2 is the "backup contact" employee and employee3 is the "second backup contact" employee) then there is in fact no repetition going on at all and there cannot be any doubt that such a relation is not suffering the "problem 1NF was intended to solve".  That would only be the case if the intended semantics of those three attributes are such that there is no meaningful difference between, say, 

employee1 : 1   employee2 : 2   employee3 : 3     and     employee1 : 2   employee2 : 3   employee3 : 1

(In which case the rules of the model are in fact such that the relation should hold each and every possible permutation of the three employee_ values concerned, because each distinct permutation of these values still gives rise to an instantiation of the predicate that is true.)  But assessing that requires inspection of the intended semantics, and formal first-order logic has no way to define the nature of such inspection.

----------

