Data Masking

Data masking, or data obfuscation, is the process of hiding original data in order to protect it. It is generally used for personally identifiable data (such as name, address, phone number, email, or social security number) or for sensitive personal data such as medical records or phone logs. Under the GDPR ('General Data Protection Regulation'), which became applicable in the EU in 2018, organisations that process the personal data of people in the EU are expected to protect it with appropriate safeguards such as pseudonymisation, which makes data masking relevant at both the application and the analyst level.

When applying data masking, two principles should apply:
  • the data should remain meaningful for the application layer: although the original values can no longer be read, the masked entries should still allow the application to run; e.g., credit card validation must remain possible even though the actual credit card number cannot be seen.
  • the masking operation should not be easy to reverse-engineer: e.g., if a data masking transformation merely reverses the first name, the original first name can still be recovered trivially.
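
Both principles can be illustrated with a minimal Python sketch. It assumes that "credit card validation" means the Luhn checksum used by payment card numbers; the function names `luhn_valid` and `mask_card_number` are hypothetical and chosen only for this illustration. The masked number hides every original digit yet still passes the checksum, while the name-reversal transformation is undone by simply applying it a second time.

```python
import random


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_card_number(number: str, rng: random.Random) -> str:
    """Replace all digits with random ones, then recompute the check digit.

    Assumption for this sketch: the application only needs a number of the
    same length that passes the Luhn check, not the original issuer prefix.
    """
    body = [rng.randrange(10) for _ in range(len(number) - 1)]
    total = 0
    for i, d in enumerate(reversed(body)):
        if i % 2 == 0:  # these positions are doubled once the check digit is appended
            d *= 2
            if d > 9:
                d -= 9
        total += d
    check_digit = (10 - total % 10) % 10
    return "".join(str(d) for d in body) + str(check_digit)


rng = random.Random()
masked = mask_card_number("4111111111111111", rng)
print(masked, luhn_valid(masked))  # the masked number still validates

# A weak transformation: reversing a name is undone by reversing it again.
weakly_masked = "Alice"[::-1]
print(weakly_masked, weakly_masked[::-1])
```

Recomputing the check digit is only one way to keep masked data meaningful; other applications may need the issuer prefix or the last four digits preserved as well.
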
Types of data masking include:
  • (i) number variance (e.g. a random variance is applied to a salary field)
  • (ii) substitution (e.g. every first name is replaced by another first name)
  • (iii) shuffling (e.g. the values of a column are randomly reordered across records)
  • (iv) encryption (i.e. through an encryption key and a secondary table)
  • (v) nulling or deletion of values or
  • (vi) masking out (e.g. as used in credit card formatting such as XXXXXX 2341)
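
Several of these types can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: all function names, parameters, and sample values are hypothetical. Encryption is left out because the Python standard library does not include a symmetric cipher.

```python
import random


def number_variance(value: float, max_pct: float, rng: random.Random) -> float:
    """(i) Apply a random variance of up to +/- max_pct to a numeric field."""
    return round(value * (1 + rng.uniform(-max_pct, max_pct)), 2)


def substitute(value: str, replacements: list, rng: random.Random) -> str:
    """(ii) Replace the value with one drawn from a list of substitutes."""
    return rng.choice(replacements)


def shuffle_column(values: list, rng: random.Random) -> list:
    """(iii) Return the column's values in random order, leaving the input intact."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    return shuffled


def null_out(value):
    """(v) Drop the value entirely."""
    return None


def mask_out(value: str, visible: int = 4, mask_char: str = "X") -> str:
    """(vi) Hide everything except the last `visible` characters."""
    hidden = max(len(value) - max(visible, 0), 0)
    return mask_char * hidden + value[hidden:]


rng = random.Random()
print(number_variance(50000.0, 0.10, rng))           # salary within 10% of the original
print(substitute("Alice", ["Bob", "Carol"], rng))    # some other first name
print(shuffle_column(["Ann", "Ben", "Cy"], rng))     # same names, random order
print(null_out("555-0100"))                          # None
print(mask_out("1234567890122341"))                  # XXXXXXXXXXXX2341
```

Note that the substitution sketch picks a replacement at random, so the same input may map to different outputs; a deterministic mapping would be needed if masked values must stay consistent across tables.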