CREATE PARTITION FUNCTION

CREATE PARTITION FUNCTION (Transact-SQL)

Creates a function in the current database that maps the rows of a table or index into partitions based on the values of a specified column. Using CREATE PARTITION FUNCTION is the first step in creating a partitioned table or index.

CREATE PARTITION FUNCTION partition_function_name ( input_parameter_type )
AS RANGE [ LEFT | RIGHT ]
FOR VALUES ( [ boundary_value [ ,...n ] ] )
[ ; ]

partition_function_name
Is the name of the partition function. Partition function names must be unique within the database and comply with the rules for identifiers.
input_parameter_type
Is the data type of the column used for partitioning. All data types are valid for use as partitioning columns, except text, ntext, image, xml, timestamp, varchar(max), nvarchar(max), varbinary(max), alias data types, or CLR user-defined data types.The actual column, known as a partitioning column, is specified in the CREATE TABLE or CREATE INDEX statement.
boundary_value
Specifies the boundary values for each partition of a partitioned table or index that uses partition_function_name. If boundary_value is empty, the partition function maps the whole table or index using partition_function_name into a single partition. Only one partitioning column, specified in a CREATE TABLE or CREATE INDEX statement, can be used.boundary_value is a constant expression that can reference variables. This includes user-defined type variables, or functions and user-defined functions. It cannot reference Transact-SQL expressions. boundary_value must either match or be implicitly convertible to the data type supplied in input_parameter_type, and cannot be truncated during implicit conversion in a way that the size and scale of the value does not match that of its corresponding input_parameter_type.

If boundary_value consists of datetime or smalldatetime literals, these literals are evaluated assuming that us_english is the session language. This behavior is deprecated. To make sure the partition function definition behaves as expected for all session languages, we recommend that you use constants that are interpreted the same way for all language settings, such as the yyyymmdd format; or explicitly convert literals to a specific style. For more information, see Writing International Transact-SQL Statements. To determine the language session of your server, run SELECT @@LANGUAGE.

…n
Specifies the number of values supplied by boundary_value, not to exceed 999. The number of partitions created is equal to n + 1. The values do not have to be listed in order. If the values are not in order, the Database Engine sorts them, creates the function, and returns a warning that the values are not provided in order. The Database Engine returns an error if n includes any duplicate values.
LEFT | RIGHT
Specifies to which side of each boundary value interval, left or right, the boundary_value [ ,…n ] belongs, when interval values are sorted by the Database Engine in ascending order from left to right. If not specified, LEFT is the default. For more information, see Examples.
The scope of a partition function is limited to the database that it is created in. Within the database, partition functions reside in a separate namespace from the other functions.Any rows whose partitioning column has null values are placed in the left-most partition, unless NULL is specified as a boundary value and RIGHT is indicated. In this case, the left-most partition is an empty partition, and NULL values are placed in the following partition.
Any one of the following permissions can be used to execute CREATE PARTITION FUNCTION:

  • ALTER ANY DATASPACE permission. This permission defaults to members of the sysadmin fixed server role and the db_owner and db_ddladmin fixed database roles.
  • CONTROL or ALTER permission on the database in which the partition function is being created.
  • CONTROL SERVER or ALTER ANY DATABASE permission on the server of the database in which the partition function is being created.

A. Creating a RANGE LEFT partition function on an int column

The following partition function will partition a table or index into four partitions.

CREATE PARTITION FUNCTION myRangePF1 (int)
AS RANGE LEFT FOR VALUES (1, 100, 1000);

The following table shows how a table that uses this partition function on partitioning column col1 would be partitioned.

Partition 1 2 3 4
Values col1 <= 1 col1 > 1 AND col1 <= 100 col1 > 100 AND col1 <= 1000 col1 > 1000

B. Creating a RANGE RIGHT partition function on an int column

The following partition function uses the same values for boundary_value [ ,…n ] as the previous example, except it specifies RANGE RIGHT.

CREATE PARTITION FUNCTION myRangePF2 (int)
AS RANGE RIGHT FOR VALUES (1, 100, 1000);

The following table shows how a table that uses this partition function on partitioning column col1 would be partitioned.

Partition 1 2 3 4
Values col1 < 1 col1 >= 1 AND col1 < 100 col1 >= 100 AND col1 < 1000 col1 >= 1000

C. Creating a RANGE RIGHT partition function on a datetime column

The following partition function partitions a table or index into 12 partitions, one for each month of a year’s worth of values in a datetime column.

CREATE PARTITION FUNCTION [myDateRangePF1] (datetime)
AS RANGE RIGHT FOR VALUES ('20030201', '20030301', '20030401',
               '20030501', '20030601', '20030701', '20030801',
               '20030901', '20031001', '20031101', '20031201');

The following table shows how a table or index that uses this partition function on partitioning column datecol would be partitioned.

Partition 1 2 11 12
Values datecol < February 1, 2003 datecol >= February 1, 2003 AND datecol < March 1, 2003 datecol >= November 1, 2003 AND col1 < December 1, 2003 col1 >= December 1, 2003

D. Creating a partition function on a char column

The following partition function partitions a table or index into four partitions.

CREATE PARTITION FUNCTION myRangePF3 (char(20))
AS RANGE RIGHT FOR VALUES ('EX', 'RXE', 'XR');

The following table shows how a table that uses this partition function on partitioning column col1 would be partitioned.

Partition 1 2 3 4
Values col1 < EX col1 >= EX AND col1 < RXE col1 >= RXE AND col1 < XR col1 >= XR
Advertisements

Partitioning the Data in a Table

Is it ever good database design practice (for speed sake, etc.) to essentially make copies of tables to hold a certain group of data?

For example, I have come across a database table that stores information for a housing subdivision; ie. lot number, lot size, lot price, etc. And the database to which this table belongs stores this data for many subdivisions. However, instead of having one table that stores the subdivision information for ALL subdivisions (and having some ID that represents the specific subdivision), this database has one table for each subdivision. For example, ‘Clair Ridge Estates Subdivision Info’ and another table ‘Possum Bend Subdivision Info’, etc, with each table having the exact same fields. And, if they needed another subdivision, they would make yet another copy and give it a unique name.”

Yes, there are times this is a good idea. I like this question because it reminded me of one of my favorite features of SQL Server – partitioned views.

What you’re referring to is an optimization method called horizontal partitioning. That is, a table is split up into multiple smaller tables containing the same number of columns, but fewer rows. Compare this to vertical partitioning, in which the table is split into multiple smaller tables with the same number of rows, but fewer columns.

And yes, this design decision is often made to improve performance. Horizontally partitioning a table gives us some advantages:

  • Each partition table will have fewer rows; if you have to (heaven forbid) table-scan the data, it will take less time.
  • Indexes on each partition table will be smaller (=faster seeks) than a corresponding index on the unpartitioned table.
  • If you need to, you can put each partition table on a different filegroup and partition the data among multiple disks/RAID volumes/drive controllers.
  • If you’re trying to whomp Oracle’s TPC-C benchmark, then you may want to consider partitioning the data among multiple federated servers in SQL Server 2000. (Although for storing information about housing subdivisions, this may be a bit over the top.)
  • If you create a partitioned view on the partitioned tables, you can treat the view like it is the whole table, and the QP (query processor) will only touch the tables it needs to fulfill the query. You get the benefits of horizontal partitioning without the query headache.

So, as you can see, horizontal partitioning is all about splitting up the workload – spreading out data access among tables, indexes, disks, and servers.

Why would you want to do this? Well, maybe you have a big table – hundreds of millions of rows, for instance. Or maybe not so many rows, but large rows. Or maybe you have a table in a data warehouse that contains frequently and infrequently accessed rows. All of these situations are candidates for partitioning.

Now, there are two big downsides to all of this:

  • Unless you use a partitioned view to access the data, you’ll have to build logic into your application to access the correct table, and that has a high suck factor. Please, please, use the partitioned view instead.
  • You actually have to partition the data. And maintain it. And create the partitioned view. And balance the amount of data in each partitioned table, if needed. In other words, the dreaded “administrative overhead”.

Okay, since I’m touting the partitioned view, let me quickly explain how to create one. I’ll use the information from this question as an example.First, the tables:


CREATE TABLE Subdiv_ClaireRidgeEstates (SubdivID int, LotID int /*, etc.*/)
CREATE TABLE Subdiv_TibetianYakFarms (SubdivID int, LotID int /*, etc.*/)

You may notice that I included the Subdivision ID in each table. This is important; for the partitioned view to work most effectively, the QP must be able to know that each partition table will only contain a certain type of data. To do this, you need to build CHECK constraints on each table on the ID that you’re partitioning on. Since you’re partitioning the data by subdivision, you will build CHECK constraints on SubdivID:


ALTER TABLE Subdiv_ClaireRidgeEstates ADD CONSTRAINT CK_CRE_SubdivID CHECK (SubdivID = 42)
ALTER TABLE Subdiv_TibetianYakFarms ADD CONSTRAINT CK_TYF_SubdivID CHECK (SubdivID = 9538)

You could just as easily partition by using a surrogate key field and assigning a range of key values to each partition table. Or by partitioning on a date and using a range of dates for each partition value. Regardless, you still need those CHECK constraints in place on each table.

After actually creating the partition tables, distributing the data, and building the CHECK constraints, building the view is pretty easy. You just SELECT * from each partition table and use UNION ALL to combine the results of the query:


CREATE VIEW Subdivision
AS
SELECT * FROM Subdiv_ClaireRidgeEstates
UNION ALL
SELECT * FROM Subdiv_TibetianYakFarms

Now, if you’ve been following along with the example, try inserting some sample rows into each table:


INSERT Subdiv_ClaireRidgeEstates VALUES (42,9999)
INSERT Subdiv_TibetianYakFarms VALUES (9538,1234)

Now, turn on the “Show Execution Plan” option in query analyzer, and run the following queries:

SELECT * FROM Subdivision WHERE SubdivID = 42
SELECT * FROM Subdivision WHERE SubdivID = 9538
SELECT * FROM Subdivision


You’ll notice that for the first two queries, SQL Server only pulls information from the required partition table. Only in the last query, where we don’t filter by SubdivID, does the QP pull data from each partition table.

In SQL Server 7.0, you unfortunately cannot update data in a partitioned view. However, this IS possible in SQL Server 2000. Check out SQL Server Books Online (especially if you’re going to use distributed partitioned views) for the do’s and don’ts of partitioning data.