Exploring SQL Server 2016 Dynamic Data Masking – Part One – Creating a Table that uses Dynamic Data Masking

In this article about exploring SQL Server 2016 features I will start investigating a new feature known as Dynamic Data Masking. By using data masking you can obscure the real value of a data column with a masked value.  In this article I’ll be exploring the different features of how to hide the value of sensitive data by applying Dynamic Data Masking.

What is Dynamic Data Masking?

What exactly is Dynamic Data Masking?  Data masking takes a given column value and converts it into an obfuscated value.   This new feature obscures a table column value as the data is returned to the client.   The real data value is stored in a SQL Server table storage in an un-obfuscated format.  It is only when the data is returned to the client that the value is masked.  This allows you to make sure certain users only see obfuscated values and do not see the real data values. 

With the introduction of Dynamic Data Masking, Microsoft has provided a method to minimize the risk of exposing confidential data by providing a database driven method to mask confidential data.  Now the DBAs can define masking rules for confidential columns and then tie those masking rules to database user rights.  This allows some database users to see the data unobscured, while other can only see the masked value.  This new feature allows you to mask data in the database layer, without modifying any application code.

Type of Obfuscations Supported

To obfuscate a column of data in a table SQL Server 2016 provides four different obfuscation routines:  Default, Email, Custom String and Random.  The “Default” data masking routine will masked the data with a series of “X’s” for data types of: char, nchar, varchar, nvarchar, text, or ntext.   If the column being masked with the “Default” data masking routine is numeric (bigint, bit, decimal, int, money, numeric, smallint, smallmoney, tinyint, float, or real) then actual numeric value is replace with a masked value of zero.   For binary, varbinary and image columns the masked value is a 0 when the “Default” data masking routine is used.  For date columns (date, datetime2, datetime, datatimeoffset, smalldata, and time) that use the “Default” data masking routine then the date value will be represented as 01/01/1900 00:00:00.0000000 when the data is masked.   When the “Email” masking routine is used SQL Server takes the first character of the email address, and the domain suffix and creates an email address that looks something like GXXX@XXXX.com.  With the custom string masking type you can define how many beginning and ending characters of the column value you want to expose in clear text, and what masked value you want to specify for the middle of the column.   For the last masking type of “Random” SQL Server will mask any numeric data type with a random value, where a range can be specified for the different random values.

When a column that has masking rules defined on a column, the data is masked as it is returned to client.  That is unless the database users that is selecting the data has been given access to read the data in unmasked format.  Any user that has been granted UNMASK database permissions is allowed to see data in clear text without it getting obscured by the masking rule.

To better understand Dynamic Data Masking let me go through a couple of examples.   Keep in mind the research and examples used in this article where developed using SQL Server 2016 CTP 3.2.  Therefore information provided here might change before the final bits for SQL Server 2016 are delivered.

 

Creating a Table That Contains Masked Data

It is very easy to create columns in a table that will be dynamically masked.  Here is some code I will use to create and populate a table that contains a number of masked data columns:

USE tempdb;
GO
CREATE TABLE ClientInfo
  (ClientID int IDENTITY,
   FirstName varchar(65),
   LastName varchar(65),
   PhoneNum bigint 
      MASKED WITH (FUNCTION = 'default()'),
   EmailAddr varchar(100) 
      MASKED WITH (FUNCTION = 'email()'),
   CreditCardNum varchar(19) MASKED 
      WITH (FUNCTION = 'partial(0,"XXXX-XXXX-XXXX-",4)'),
   BirthDT date MASKED 
      WITH (FUNCTION = 'default()'));
 
INSERT Clientinfo (FirstName, LastName, PhoneNum, EmailAddr,CreditCardNum,BirthDT) VALUES 
('George', 'Washington', 5555814441, 
'GeorgeW@datanbasejournal.com', '0123-4567-8901-2345','02/22/1732'),
('Thomas', 'Jefferson', 5559841298, 
'ThomasJ@datanbasejournal.com', '9999-9999-9999-9999', '04/13/1743'),
('Abraham', 'Lincoln', 5554070123, 
'AbrahamL@datanbasejournal.com','0000-1111-2222-3333', '02/12/1809');

Here you can see I created four different columns that are masked.  The first column that is masked is PhoneNumber, which has a data type of bigint and uses the “default()” masking type.  The next column is EmailAddr, which has a masking type of “email()”. The CreditCardNum is define using the “partial” masking type.  Lastly I created the BirthDT column with a “default()” masking type.   Once the table is created I populated it with three rows.

To demonstrate how data is masked with a SELECT statement I will need to create a database user and then provide them with SELECT access to the ClientInfo table.  To create my test user with the appropriate rights I will run the following code:

USE tempdb;
GO
CREATE USER AppReader WITHOUT LOGIN;
GRANT SELECT ON ClientInfo TO AppReader;
 

With this code I created the AppReader user and only provided SELECT rights on the ClientInfo table.  To show how Dynamic Data Masking works, let me first run the following code to show how data is returned when the AppReader selects data from the ClientInfo table:

USE tempdb;
GO
EXECUTE AS USER = 'AppReader';
SELECT * FROM ClientInfo;
REVERT;

When I run this code I get the following output:

ClientID FirstName LastName  PhoneNum EmailAddr       CreditCardNum       BirthDT
-------- --------- --------- -------- --------------- ------------------- ---------
1        George   Washington 0        GXXX@XXXX.com   XXXX-XXXX-XXXX-2345 1900-01-01
2        Thomas   Jefferson  0        TXXX@XXXX.com   XXXX-XXXX-XXXX-9999 1900-01-01
3        Abraham  Lincoln    0        AXXX@XXXX.com   XXXX-XXXX-XXXX-3333 1900-01-01

Here you can see that PhoneNum, EmailAddr, CreditCardNum and BirthDT columns all got masked.  The PhoneNum column was a bigint column that was defined with a “default()” mask criteria.  As you can see the “default()” mask setting caused a “0” to be returned for each of the PhoneNum values.  For the EmailAddr column, which is defined as varchar(100) with an edit mask of “email()”. Here you can see the EmailAddr returns the first character of the email address, and the last node of the domain name, with “XXX@XXXX.” in between. For the CreditCardNum column a custom “partial” edit mask that has been defined: (0,”XXXX-XXXX-XXXX-“,4).  By having this masking definition SQL Server only showed the last four characters of the credit card column to the AppReader user.  The last column returned is BirthDT, which also has a “default()” edit mask.  Since the BirthDT column is defined as a date field, the masked value of “1900-00-01” is returned for every BirthDT value.  As you can see my AppReader database user was unable to see the real data values for each of the columns that were masked.

How to Grant Rights so Some Users can See Unmasked Data

In order to see data in an un-masked format for columns that have a masked rule defined a database user needs special permission.  The permission they need is UNMASK rights, which is a database level permission.    The database owner automatically gets these rights, as well as does anyone in the sysadmin role.  But for normal database users you need to grant them UNMASK rights.  In order to show how this works let me first create a database user and give them SELECT rights to SELECT data from the ClientInfo table and UNMASK rights in the database using the following script:

USE tempdb;
GO
CREATE USER AppAdmin WITHOUT LOGIN;
GRANT SELECT ON ClientInfo TO AppAdmin;
GRANT UNMASK TO AppAdmin;  

Now that I have created the AppAdmin user and given them UNMASK and SELECT rights, let’s see what data the AppAdmin user can see by running the following SELECT statement:

USE tempdb;
GO
EXECUTE AS USER = 'AppAdmin';
SELECT * FROM ClientInfo;
REVERT;
 

When I run this command I get the following output:

ClientID FirstName    LastName          PhoneNum      EmailAddr                                                                                              -------- ------------ ----------------- ------------- -------------
1        George       Washington        5555814441    GeorgeW@databasejournal.com                                                                         
2        Thomas       Jefferson         5559841298    ThomasJ@databasejournal.com                                                                         
3        Abraham      Lincoln           5554070123    AbrahamL@databasejournal.com  
 
CreditCardNum           BirthDT
----------------------- ----------------
0123-4567-8901-2345     1732-02-22
9999-9999-9999-9999     1743-04-13
0000-1111-2222-3333     1809-02-12
 

Note: output has been reformatted for readability.

By reviewing the output you can see that you can see all the columns of data in clear text, even those that had masking rules defined.

One thing to be aware of is the UNMASK right applies to all tables in the database.  Therefore there doesn’t seem an easy way to provide rights so a specific user could see unmasked data in one table, and masked data in another table in the same database.

Summary

Data Masking is the process of converting a text value into an alternative value that hides the real underlying data value.  This conversion, or obfuscation is done right in the database engine within SQL Server 2016 and therefore requires no application code to mask a column value.  If you have a need to show obfuscated values to some users and the actual values to others, then you should consider looking into SQL Server 2016 Dynamic Data Masking to see if it will meet your needs for obscuring confidential data. 

Keep in mind obscuring data using Dynamic Data Masking is not really a security method to restrict access to confidential data.  But instead it is only a method to obscure the data values when the data is returned to the client.  The data stored on disks is still in clear text.  If your goal is to secure confidential data values, then you should consider implementing a true encryption process, like Always Encrypted so your data will be encrypted at the rest and in transit.   

In this article I wasn’t able to discuss all the aspects of using Dynamic Data Masking.  In future articles I will explore more information about masking data using the new SQL Server 2016 Dynamic Data Masking feature.

See all articles by Greg Larsen

Gregory Larsen
Gregory Larsen
Gregory A. Larsen is a DBA at Washington State Department of Health (DOH). Greg is responsible for maintaining SQL Server and other database management software. Greg works with customers and developers to design and implement database changes, and solve database/application related problems. Greg builds homegrown solutions to simplify and streamline common database management tasks, such as capacity management.

Get the Free Newsletter!

Subscribe to Cloud Insider for top news, trends & analysis

Latest Articles