# Miscellaneous > Database Design >  Stock Prices Database Design

## Adam Sneller

I am building a database to hold 10 years of data for all stocks traded on the NYSE.  The size of the project has raised some interesting questions as to exactly how I&#39;m supposed to normalize it.

The first model is to place the ticker symbol, company name, and description in a separate Tickers table, and group date, open, high, low, close, and volume in a joined data table.  The only problem here, it the data table ends up being 250,000,000+ records long.  I&#39;m using mySQL, but even this crashes on the 1st query.

The second model is to create separate Open, High, Low, Close, and Volume tables, with fields of Tickers_ID, Date_01, Date_02, etc.  This is then joined to the Ticker&#39;s table.  This works, but make queries very difficult, as each date has to be named separately in the SELECT statement.

The third model is to give each stock its own table, all joined to the Tickers table.  This works, but since there are over 3000+ stocks on the NYSE, it means I have to name each stock in the SELECT statement for a multi-stock query.

Phew!  I get the feeling I&#39;m missing something obvious.  Any suggestions?

----------


## Popov

Use SQL Server


------------
Adam Sneller at 7/13/01 12:22:50 AM

I am building a database to hold 10 years of data for all stocks traded on the NYSE.  The size of the project has raised some interesting questions as to exactly how I&#39;m supposed to normalize it.

The first model is to place the ticker symbol, company name, and description in a separate Tickers table, and group date, open, high, low, close, and volume in a joined data table.  The only problem here, it the data table ends up being 250,000,000+ records long.  I&#39;m using mySQL, but even this crashes on the 1st query.

The second model is to create separate Open, High, Low, Close, and Volume tables, with fields of Tickers_ID, Date_01, Date_02, etc.  This is then joined to the Ticker&#39;s table.  This works, but make queries very difficult, as each date has to be named separately in the SELECT statement.

The third model is to give each stock its own table, all joined to the Tickers table.  This works, but since there are over 3000+ stocks on the NYSE, it means I have to name each stock in the SELECT statement for a multi-stock query.

Phew!  I get the feeling I&#39;m missing something obvious.  Any suggestions?

----------


## Patrick

You are facing 2 issues.
1) It&#39;s necessary to develop a good design for your Database. In this part, we must temporary &#39;forget&#39; the size issue. You work with the 3rd NF Etc. Regular design...

2° the size issue must be faced. In your case, we could investigate in the &#39;table-slicing&#39; I mean several (a lot) tables containing each a period of time like Month or week (a kind of time unit workable). then each table becomes workable and if queries require data inside a month we look at only one table. However the month size is not a solution, the quarter could be your good value. 

I hope this help you.
Patrick


------------
Popov at 7/22/01 1:06:58 AM

Use SQL Server


------------
Adam Sneller at 7/13/01 12:22:50 AM

I am building a database to hold 10 years of data for all stocks traded on the NYSE.  The size of the project has raised some interesting questions as to exactly how I&#39;m supposed to normalize it.

The first model is to place the ticker symbol, company name, and description in a separate Tickers table, and group date, open, high, low, close, and volume in a joined data table.  The only problem here, it the data table ends up being 250,000,000+ records long.  I&#39;m using mySQL, but even this crashes on the 1st query.

The second model is to create separate Open, High, Low, Close, and Volume tables, with fields of Tickers_ID, Date_01, Date_02, etc.  This is then joined to the Ticker&#39;s table.  This works, but make queries very difficult, as each date has to be named separately in the SELECT statement.

The third model is to give each stock its own table, all joined to the Tickers table.  This works, but since there are over 3000+ stocks on the NYSE, it means I have to name each stock in the SELECT statement for a multi-stock query.

Phew!  I get the feeling I&#39;m missing something obvious.  Any suggestions?

----------


## Wessel

Adam,

It's a long time ago that you posted here, but I am just curious what solution did you implemented? I'm currently hitting the same wall after 2Milion quotes in one table it starts to get slower and I was thinking of the excellent suggestion of Patric. 

Wessel

----------

