Friday 11 February 2011

Admin: how much data do/will you have?

Its important to understand your storage requirements for your database. In many applications i've worked with, the bulk of data is stored in one table and its this table which is most sensitive to data load.

So with that in mind, you need to be able to predict just how large your table(s) is likely to get. If you've "inherited" a database, you can use existing data to make current predictions and harness some of the procedures and DMVs that SQL Server makes available. If you're in the situation of designing a database from scratch, you can make an informed guess by using the alrogithm posted on MSDN.

Existing Database

So how can you guage the size of a database and the objects within it? Lets look at the things at your disposal which give varying degress of accuracy:

1) The last backup size - this will be for the entire database and also include the data in your transaction log
2) The size of the mdf (+ndfs) - this won't tell you exactly how much data you have as the file may have been set to a certain size and not be "full" of data.
3) Right click the database in SSMS (or run DBCC SHOWFILESTATS) - this gives you the total extents used in the database from which you can get a ball park figure of how much space is required (An Extent is 8 x 64KB pages) but you'll need to rememeber that not all pages/extents will be "full" of data
4) EXEC sp_spaceused - this gives can be run for the entire database or just a single table
5) Query sys.partitions and sys.allocation_units to see how many pages an object is using

Empty Database

This article provides a method of estimating the space required in a table:
http://msdn.microsoft.com/en-us/library/ms175991.aspx
The articles give some caveats as to why this is only an estimate and in my analysis, I got a difference of approximately 10% more in actual size of data in a table compared to the estimate.

Example:

CREATE TABLE [dbo].[tblOutput](
  
[Output_ID] INT NULL,
  
[T1] INT NULL,
  
[T2] INT NULL,
  
[Value] VARCHAR(255) NULL,
  
[Run_ID] INT NULL,
  
[Date_Created] DATETIME NULL
)
ON [PRIMARY]
GO
-- 19240000 rows in table
-- ESTIMATED SIZE INFORMATION
-- 995820 kb simple estimate based upon #rows / rowsize in kb (not accurate, assumes pages are full)
-- 1012624 kb based upon MSDN algorithm to estimate data

-- ACTUAL SIZE INFORMATION
DBCC SHOWCONTIG('tblOutput')
-- 19678 Total Extents for the table DBCC SHOWCONTIG
-- 157392 Total Pages for the table DBCC SHOWCONTIG
EXEC sp_spaceused 'tblOutput'
-- 1905488 kb (including indexes)
-- 1259136 kb (just data)
DBCC SHOWFILESTATS
-- 1907648 kb based upon 29807 Extents (DBCC SHOWFILESTATS)

SELECT *
FROM sys.allocation_units u
  
INNER JOIN sys.partitions p
      
ON p.partition_id = u.container_id
WHERE OBJECT_NAME(p.OBJECT_ID) = 'tblOutput'
-- used 157393 pages ~ 1259144 kb

SELECT page_count, record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('tblOutput'), NULL, NULL, 'DETAILED')
-- 157393 pages, 19240000 rows


You can see that although quite minor, depending where you look for your information you get differing results but for me, the differences are such that I wouldn't be too concerned on which method was used when it comes to predicting data growth. There are so many factors that can be introduced (notably indexes) that its questionable just how valuable these figures are in isolation.

As for which one to use, I suppose it makes sense to use sp_spaceused as this is likely to be maintained by MS as the truth on storage - or they will deprecate it and replace it.

No comments:

Post a Comment

/* add this crazy stuff in so i can use syntax highlighter