Recently I had the opportunity to work with a client on a very large Web Application; the database for this application was composed of hundreds of tables, indexes, file-groups, and files; including a dozen partition-functions and as many partition-schemes, spanning both clustered and non-clustered indexes.

The client was requesting to improve performance in their database as they moved this massive application to Azure Services. In order to accomplish this, I had to complete three of the following tasks:

  1. Move all clustered indexes into fewer file-groups, with a minimal number of files in each group. With a current configuration of 150+ file-groups the client was looking for a better management strategy, which aligned the number of file-groups with their Azure hardware.
  2. Clean up the file and file-group naming conventions and make recommendations on best practices. Over the years many developers have created new SQL file-groups, with no clear convention in mind, to the point that it was impossible to associate any file or file-group with its associated application focus.
  3. Develop a bench-mark query set that could be tested on both the current database configuration and the new. What exactly, if any, were the performance gains of having 150+ file-groups; and more importantly, when the file-groups were reduced did performance actually increase?

The Process

  1. To accomplish the first goal, I developed analysis queries that could enumerate the current file and file-group configuration, which in turn were used to build SQL statements that could move both clustered and non-clustered indexes into new file-groups.
    Query to identify file-groups and files…

    select
    fg.name [File Group Name],
    convert(numeric,sdf.size)/128 [File Size (MB)],
    convert(numeric,sdf.size)/128/1024 [File Size (GB)], sdf.size/128/1024 [(GB)],
    sdf.name [File Logical Name],
    right(sdf.physical_name,CHARINDEX(‘\’,reverse(sdf.physical_name))-1) [File Physical Name],
    sdf.physical_name [File Physical Path] from sys.database_files [sdf] join sys.filegroups [fg] on sdf.data_space_id=fg.data_space_id
    order by 1 desc,2 desc

    Query to identify indexes and partitions…

    select
    o.Name [Table Name],
    i.Name [Index Name],
    i.Type, I.type_desc [Index Type],
    ps.name [Partition Schema],
    p.partition_number [Partition Number] from sys.objects o
    join sys.partitions p on p.object_id = o.object_id
    join sys.indexes i on p.object_id = i.object_id and p.index_id = i.index_id
    join sys.data_spaces ds on i.data_space_id = ds.data_space_id
    join sys.partition_schemes ps on ds.data_space_id = ps.data_space_id
    order by 1,2,6

  2. To accomplish the second goal, I developed analysis queries that could find naming patterns in the table schemes and isolate application features within the names of the tables.
    Query to identify new naming conventions…

    select
    fg.name [File Group Name],
        case
    when charindex(‘History’,fg.name)>0 then ‘FG_History’
    when charindex(‘Order’,fg.name)>0 then ‘FG_Orders’
    when charindex(‘Product’,fg.name)>0 then ‘FG_Products’
    when charindex(‘Asset’,fg.name)>0 then ‘FG_Assets’
    else ‘FG_Misc’
    end [New File Group],
    count(sdf.name) [Number of Files],
    from sys.database_files [sdf] join sys.filegroups [fg] on sdf.data_space_id=fg.data_space_id
    group by fg.name
    order by 2,1

  3. Lastly to accomplish the third goal, I set up two data environments; one with the current file and file-group configuration and a second database with the new file and file-group configuration. This allowed me to develop a single set of performance queries across a wide range of tables and run the queries in parallel on both databases; comparing execution plans and times.

After the analysis dust settled, it was clear that 150+ file-groups and 200+ files were actually reducing performance in the database. Moreover, as an intermediate step I actually sent all data and indexes back to the PRIMARY file group and was able to achieve better performance with four files in one file group than all the initial files and groups. There were additional considerations for determining how many file-groups and files were optimal with the new Azure database; so check out my additional references. If you are interested in the complete set of analysis queries for this project or are looking to optimize your SQL server files and file groups, please feel free to contact FMT Consultants below.

Additional References

Performance best practices for SQL Server in Azure
Data file best practices for SQL Server in Azure
The top 10 Best Practices for SQL Server storage systems