NashTech Insights

Google Cloud DLP: Creating DLP Jobs and Templates

Agnibhas Chattopadhyay

Data security and privacy are paramount concerns in today’s digital age. With the increasing amount of data being generated and processed, organizations need effective ways to protect sensitive information. Google Cloud’s Data Loss Prevention (DLP) offers a robust solution for identifying and managing sensitive data. In this blog, we’ll delve into how to create DLP jobs and templates using Java and Spring Boot.

NOTE: The Google Cloud DLP service is now known as the Sensitive Data Protection service. However, in this blog, we will continue to use the term “Google Cloud DLP”.

Understanding Google Cloud DLP

Google Cloud DLP is a fully managed service that enables you to discover, classify, and then protect sensitive data inside and outside Google Cloud. It utilizes various methods, including content inspection, contextual inspection, and metadata inspection, to scan and identify sensitive information within your datasets.

The DLP Inspection Service provides users with the capability to conduct a comprehensive scan of a particular resource in order to identify instances of sensitive data. Users have the ability to specify the specific type of information they wish to search for, and then the inspection service will generate a detailed report outlining all matches to the specified data type. This report will include pertinent information such as the precise number of credit card numbers found within a designated Cloud Storage bucket, along with the exact location of each identified instance.

There are two ways to perform an inspection:

  • Create an inspection or hybrid job through the Google Cloud console or through the Cloud Data Loss Prevention API of Sensitive Data Protection (DLP API).
  • Send a content.inspect request to the DLP API.
Inspection through a job

You can configure inspection and hybrid jobs through the Google Cloud console or through the Cloud Data Loss Prevention API. Then, the results of inspection and hybrid jobs are stored in Google Cloud.

You can specify actions that you want Sensitive Data Protection to take when the inspection or hybrid job is complete. For example, you can configure a job to save the findings to a BigQuery table or send a Pub/Sub notification.

Inspection jobs

Sensitive Data Protection has built-in support for select Google Cloud products. You can inspect a BigQuery table, a Cloud Storage bucket or folder, and a Datastore kind. For further information, see Inspect Google Cloud storage and databases for sensitive data.

Hybrid jobs

A hybrid job lets you scan payloads of data sent from any source, and then store the inspection findings in Google Cloud. For further information, see Hybrid jobs and job triggers.

Inspection through a content.inspect request

The content.inspect method of the DLP API lets you send data directly to the DLP API for inspection. The response contains the inspection findings. Use this approach if you require a synchronous operation or if you don’t want to store the findings in Google Cloud.
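As a minimal sketch of this synchronous path (the class name, helper method, and sample values below are our own, not from an official sample), a content.inspect call might look like the following. The request is built separately from the service call so the configuration can be examined without contacting GCP:

```java
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.Finding;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.InspectContentResponse;
import com.google.privacy.dlp.v2.LocationName;

public class InspectStringSample {

  // Builds the inspection request; pure and testable without calling the service.
  static InspectContentRequest buildRequest(String projectId, String text) {
    InspectConfig inspectConfig =
        InspectConfig.newBuilder()
            .addInfoTypes(InfoType.newBuilder().setName("EMAIL_ADDRESS"))
            .setIncludeQuote(true)
            .build();
    return InspectContentRequest.newBuilder()
        .setParent(LocationName.of(projectId, "global").toString())
        .setInspectConfig(inspectConfig)
        .setItem(ContentItem.newBuilder().setValue(text))
        .build();
  }

  public static void main(String[] args) throws Exception {
    String projectId = "your-project-id"; // replace with your project ID
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      // The findings come back in the response; nothing is stored in Google Cloud.
      InspectContentResponse response =
          dlp.inspectContent(buildRequest(projectId, "Contact me at jane.doe@example.com"));
      for (Finding finding : response.getResult().getFindingsList()) {
        System.out.println(
            "Found " + finding.getInfoType().getName()
                + " with likelihood " + finding.getLikelihood());
      }
    }
  }
}
```

Because the response carries the findings directly, this pattern suits request/response flows such as validating user input before persisting it.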

Sensitive data de-identification

The de-identification service offers the capability to obscure sensitive data instances. It provides a range of transformation methods, such as masking, redaction, bucketing, date shifting, and tokenization. For more detailed information on these transformation methods, please refer to the transformation methods documentation.
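To make the transformation methods concrete, here is a hedged sketch of character masking via a synchronous content.deidentify request (the class name, helper method, masking character, and sample text are our own choices for illustration):

```java
import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.CharacterMaskConfig;
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.DeidentifyConfig;
import com.google.privacy.dlp.v2.DeidentifyContentRequest;
import com.google.privacy.dlp.v2.DeidentifyContentResponse;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeTransformations;
import com.google.privacy.dlp.v2.InfoTypeTransformations.InfoTypeTransformation;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrimitiveTransformation;

public class DeidentifyMaskingSample {

  // Builds a request that masks every PHONE_NUMBER match with '#'.
  static DeidentifyContentRequest buildRequest(String projectId, String text) {
    InfoType phoneNumber = InfoType.newBuilder().setName("PHONE_NUMBER").build();
    CharacterMaskConfig maskConfig =
        CharacterMaskConfig.newBuilder().setMaskingCharacter("#").build();
    InfoTypeTransformations transformations =
        InfoTypeTransformations.newBuilder()
            .addTransformations(
                InfoTypeTransformation.newBuilder()
                    .addInfoTypes(phoneNumber)
                    .setPrimitiveTransformation(
                        PrimitiveTransformation.newBuilder()
                            .setCharacterMaskConfig(maskConfig)))
            .build();
    DeidentifyConfig deidentifyConfig =
        DeidentifyConfig.newBuilder().setInfoTypeTransformations(transformations).build();
    return DeidentifyContentRequest.newBuilder()
        .setParent(LocationName.of(projectId, "global").toString())
        .setInspectConfig(InspectConfig.newBuilder().addInfoTypes(phoneNumber))
        .setDeidentifyConfig(deidentifyConfig)
        .setItem(ContentItem.newBuilder().setValue(text))
        .build();
  }

  public static void main(String[] args) throws Exception {
    String projectId = "your-project-id"; // replace with your project ID
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      DeidentifyContentResponse response =
          dlp.deidentifyContent(buildRequest(projectId, "Call me on 555-0100"));
      // The returned item contains the text with phone digits masked by '#'.
      System.out.println(response.getItem().getValue());
    }
  }
}
```

The same request shape applies to the other primitive transformations (redaction, bucketing, date shifting, tokenization); only the PrimitiveTransformation payload changes.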

There are two ways to perform de-identification: send a content.deidentify request to the DLP API for text and structured data, or use the image.redact method to redact sensitive data from images.

Risk analysis

The risk analysis service lets you analyze structured BigQuery data to identify and visualize the risk that sensitive information will be revealed (re-identified).

You can use risk analysis methods before de-identification to help determine an effective de-identification strategy, or after de-identification to monitor for any changes or outliers.

You perform risk analysis by creating a risk analysis job. For more information, see Re-identification risk analysis.
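As a rough sketch of what such a risk analysis job could look like (the helper name and the quasi-identifier column names zip_code and birth_year are hypothetical), a k-anonymity analysis over a BigQuery table is configured with a RiskAnalysisJobConfig instead of an InspectJobConfig:

```java
import com.google.privacy.dlp.v2.BigQueryTable;
import com.google.privacy.dlp.v2.CreateDlpJobRequest;
import com.google.privacy.dlp.v2.FieldId;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.PrivacyMetric;
import com.google.privacy.dlp.v2.PrivacyMetric.KAnonymityConfig;
import com.google.privacy.dlp.v2.RiskAnalysisJobConfig;

public class RiskAnalysisSketch {

  // Builds a k-anonymity risk analysis job request over a BigQuery table.
  static CreateDlpJobRequest buildRequest(
      String projectId, String datasetId, String tableId) {
    BigQueryTable sourceTable =
        BigQueryTable.newBuilder()
            .setProjectId(projectId)
            .setDatasetId(datasetId)
            .setTableId(tableId)
            .build();
    // Quasi-identifiers: columns that in combination could re-identify a person.
    KAnonymityConfig kAnonymityConfig =
        KAnonymityConfig.newBuilder()
            .addQuasiIds(FieldId.newBuilder().setName("zip_code"))
            .addQuasiIds(FieldId.newBuilder().setName("birth_year"))
            .build();
    RiskAnalysisJobConfig riskJob =
        RiskAnalysisJobConfig.newBuilder()
            .setSourceTable(sourceTable)
            .setPrivacyMetric(
                PrivacyMetric.newBuilder().setKAnonymityConfig(kAnonymityConfig))
            .build();
    // Note setRiskJob here, where an inspection job would use setInspectJob.
    return CreateDlpJobRequest.newBuilder()
        .setParent(LocationName.of(projectId, "global").toString())
        .setRiskJob(riskJob)
        .build();
  }
}
```

The request is submitted with DlpServiceClient.createDlpJob, exactly like the inspection job shown later in this post, and the equivalence-class results are read from the completed job's risk details.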

Cloud Data Loss Prevention API

The Cloud Data Loss Prevention API lets you use the Sensitive Data Protection services programmatically. Through the DLP API, you can inspect data from inside and outside Google Cloud and build custom workloads on or off cloud. For more information, see Service method types.

Asynchronous operations

If you want to asynchronously inspect or analyze data at rest, you can use the DLP API to create a DlpJob. Creating a DlpJob is the equivalent of creating an inspection job, hybrid job, or risk analysis job through the Google Cloud console. The results of a DlpJob are stored in Google Cloud.

Synchronous operations

If you want to inspect, de-identify, or re-identify data synchronously, use the inline content methods of the DLP API. To de-identify data in images, you can use the image.redact method. You send the data in an API request and the DLP API responds with the inspection, de-identification, or re-identification results. The results of content methods and the image.redact method aren’t stored in Google Cloud.

Spring Cloud GCP

The Spring framework is one of the most widely used Java frameworks, valued for its versatility and reliability. Spring Cloud GCP, a collaboration between the Spring team and Google Cloud Platform, integrates Spring's programming model with GCP services.

This project lets developers use familiar Spring idioms to access Google Cloud Platform capabilities, making it easier to build robust and scalable applications on Google Cloud.

Spring Cloud GCP comprises multiple libraries that enable seamless integration between Spring Boot applications and various GCP services. With Spring Cloud GCP, developers gain direct access to a wide range of services offered by the Google Cloud Platform. Note that while Spring Cloud GCP supports a large number of services, not all services are currently included. The project continues to evolve, expanding the list of supported services and enhancing the overall functionality.

You can find comprehensive information about Spring Cloud GCP on its official website, which offers precise details regarding its features and supported services, along with documentation to assist you in getting started. While the project covers a wide array of services, our focus for this article will be on Google Cloud DLP.

Getting Started with Google Cloud DLP in Java Spring Boot

To use Google Cloud DLP in a Java Spring Boot application, follow these steps:

Set up your Google Cloud Project
Set up Google Cloud SDK and Authentication
  • Install the Google Cloud SDK: https://cloud.google.com/sdk/docs/install
  • Create a Spring Boot project: use a tool like Spring Initializr (https://start.spring.io/), or create a new Spring Boot project in your preferred IDE.
  • Configure authentication:
    • After installing the Google Cloud SDK, open a terminal in your IDE and run: gcloud auth login
    • A browser window will open asking for permission to access your GCP account. Allow it, then return to your terminal and run: gcloud config set project PROJECT_ID (replace PROJECT_ID with your Google Cloud Project ID).
  • Now we can access all GCP resources from our local machine.
Spring Boot Application
Add the required dependencies:

For Maven:

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>com.google.cloud</groupId>
        <artifactId>libraries-bom</artifactId>
        <version>26.22.0</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <dependencies>
    <dependency>
      <groupId>com.google.cloud</groupId>
      <artifactId>google-cloud-dlp</artifactId>
      <!-- version is managed by the libraries-bom imported above -->
    </dependency>
  </dependencies>

For Gradle:

implementation platform('com.google.cloud:libraries-bom:26.22.0')

implementation 'com.google.cloud:google-cloud-dlp'


We have completed the initial setup for accessing Google Cloud Platform (GCP) resources. Moving forward, we will now examine a sample template and a sample job implemented using Java. These examples will serve as valuable references for your specific implementation requirements.

Creating DLP Template

Now, let’s create a DLP template for inspecting sensitive data:

import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.CreateInspectTemplateRequest;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectTemplate;
import com.google.privacy.dlp.v2.LocationName;
import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TemplatesCreate {

  public static void main(String[] args) throws Exception {
    String projectId = "your-project-id"; // Please enter your project ID here
    createInspectTemplate(projectId);
  }

  public static void createInspectTemplate(String projectId) throws IOException {

    try (DlpServiceClient dlpServiceClient = DlpServiceClient.create()) {
      // Specify the type of info the inspection will look for.
      List<InfoType> infoTypes =
          Stream.of("PHONE_NUMBER", "EMAIL_ADDRESS", "CREDIT_CARD_NUMBER")
              .map(it -> InfoType.newBuilder().setName(it).build())
              .collect(Collectors.toList());

      // Create inspection configuration
      InspectConfig inspectConfig = InspectConfig.newBuilder().addAllInfoTypes(infoTypes).build();


      String displayName = "Config Name";
      String description = "Config Description";

      InspectTemplate inspectTemplate =
          InspectTemplate.newBuilder()
              .setInspectConfig(inspectConfig)
              .setDisplayName(displayName)
              .setDescription(description)
              .build();

      CreateInspectTemplateRequest createInspectTemplateRequest =
          CreateInspectTemplateRequest.newBuilder()
              .setParent(LocationName.of(projectId, "global").toString())
              .setInspectTemplate(inspectTemplate)
              .build();

      InspectTemplate response =
          dlpServiceClient.createInspectTemplate(createInspectTemplateRequest);
      System.out.printf("Template created: %s", response.getName());
    }
  }
}
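Once created, the template can be referenced by the name printed above (response.getName()) instead of repeating the InspectConfig in every request. As a hedged sketch (the class name, helper method, and sample values are our own), here is how a content.inspect request might reference a stored template:

```java
import com.google.privacy.dlp.v2.ContentItem;
import com.google.privacy.dlp.v2.InspectContentRequest;
import com.google.privacy.dlp.v2.LocationName;

public class InspectWithTemplateSample {

  // Builds a content.inspect request that reuses a stored inspect template
  // instead of carrying an inline InspectConfig.
  static InspectContentRequest buildRequest(
      String projectId, String templateName, String text) {
    return InspectContentRequest.newBuilder()
        .setParent(LocationName.of(projectId, "global").toString())
        .setInspectTemplateName(templateName) // full resource name of the template
        .setItem(ContentItem.newBuilder().setValue(text))
        .build();
  }
}
```

Keeping the infoType list in a template means you can update what gets detected in one place, without redeploying the applications that send inspection requests.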
Creating DLP Job

Now let’s check out a sample job that inspects a BigQuery table and stores the result in another table.

import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.Action;
import com.google.privacy.dlp.v2.BigQueryOptions;
import com.google.privacy.dlp.v2.BigQueryOptions.SampleMethod;
import com.google.privacy.dlp.v2.BigQueryTable;
import com.google.privacy.dlp.v2.CreateDlpJobRequest;
import com.google.privacy.dlp.v2.DlpJob;
import com.google.privacy.dlp.v2.FieldId;
import com.google.privacy.dlp.v2.GetDlpJobRequest;
import com.google.privacy.dlp.v2.InfoType;
import com.google.privacy.dlp.v2.InfoTypeStats;
import com.google.privacy.dlp.v2.InspectConfig;
import com.google.privacy.dlp.v2.InspectDataSourceDetails;
import com.google.privacy.dlp.v2.InspectJobConfig;
import com.google.privacy.dlp.v2.LocationName;
import com.google.privacy.dlp.v2.OutputStorageConfig;
import com.google.privacy.dlp.v2.StorageConfig;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

public class JobsCreateBQ {

  public static void main(String[] args) throws Exception {
    String projectId = "your-project-id"; // Please enter your project ID here
    inspectBigQueryTableWithSampling(projectId);
  }
  static Action createSaveFindingsAction(String datasetId, String tableId, String projectId) {
    return Action.newBuilder()
            .setSaveFindings(
                    Action.SaveFindings.newBuilder()
                            .setOutputConfig(
                                    OutputStorageConfig.newBuilder()
                                            .setTable(
                                                    BigQueryTable.newBuilder()
                                                            .setProjectId(projectId)
                                                            .setDatasetId(datasetId)
                                                            .setTableId(tableId))))
            .build();
  }
  // Inspects a BigQuery Table
  public static void inspectBigQueryTableWithSampling(
          String projectId)
          throws ExecutionException, InterruptedException, IOException {
    try (DlpServiceClient dlp = DlpServiceClient.create()) {
      String datasetId = "your-dataset";
      String tableId = "InputTable";
      String outputTableId = "OutputTable";

      BigQueryTable tableReference =
              BigQueryTable.newBuilder()
                      .setProjectId(projectId)
                      .setDatasetId(datasetId)
                      .setTableId(tableId)
                      .build();

      BigQueryOptions bigQueryOptions =
              BigQueryOptions.newBuilder()
                      .setTableReference(tableReference)
                      .setRowsLimit(1000)
                      .setSampleMethod(SampleMethod.TOP)
                      .addIdentifyingFields(FieldId.newBuilder().setName("unique_id_column"))
                      .build();

      StorageConfig storageConfig =
              StorageConfig.newBuilder().setBigQueryOptions(bigQueryOptions).build();


      List<String> infoTypeNames = Arrays.asList(
              "PERSON_NAME", "EMAIL_ADDRESS", "PHONE_NUMBER", "STREET_ADDRESS","DATE_OF_BIRTH","LAST_NAME","FIRST_NAME"
              // ... add more InfoType names here
      );

      List<InfoType> infoTypes = infoTypeNames.stream()
              .map(infoTypeName -> InfoType.newBuilder().setName(infoTypeName).build())
              .collect(Collectors.toList());

      InspectConfig inspectConfig = InspectConfig.newBuilder()
              .addAllInfoTypes(infoTypes)
              .setIncludeQuote(true)
              .build();

      InspectJobConfig inspectJobConfig =
              InspectJobConfig.newBuilder()
                      .setStorageConfig(storageConfig)
                      .setInspectConfig(inspectConfig)
                      .addActions(createSaveFindingsAction(datasetId,outputTableId,projectId))
                      .build();

      CreateDlpJobRequest createDlpJobRequest =
              CreateDlpJobRequest.newBuilder()
                      .setParent(LocationName.of(projectId, "global").toString())
                      .setInspectJob(inspectJobConfig)
                      .build();

      final DlpJob dlpJob = dlp.createDlpJob(createDlpJobRequest);
      System.out.println("Job created: " + dlpJob.getName());

      Thread.sleep(TimeUnit.MINUTES.toMillis(1)); // 1min sleep to wait for the job to complete. Change accordingly.

      // Get the latest state of the job from the service
      GetDlpJobRequest request = GetDlpJobRequest.newBuilder().setName(dlpJob.getName()).build();
      DlpJob completedJob = dlp.getDlpJob(request);

      // Parse the response and process results.
      System.out.println("Job status: " + completedJob.getState());
      System.out.println("Job name: " + dlpJob.getName());
      InspectDataSourceDetails.Result result = completedJob.getInspectDetails().getResult();
      System.out.println("Findings: ");
      for (InfoTypeStats infoTypeStat : result.getInfoTypeStatsList()) {
        System.out.print("\tInfo type: " + infoTypeStat.getInfoType().getName());
        System.out.println("\tCount: " + infoTypeStat.getCount());
      }
    }
  }
}

In this example, we create a DLP job that inspects the first 1,000 rows of a BigQuery table for several infoTypes (names, email addresses, phone numbers, and more), saves the findings to another BigQuery table, then polls the job and prints a count of findings per infoType.

Summary

This guide provides a high-level overview of using Google Cloud DLP to create inspection jobs with Java Spring Boot. Make sure to customize and adapt the code according to your specific use case and requirements. Integrating DLP into your application ensures that you maintain a strong level of data security and compliance with data protection regulations.

Stay tuned for more insightful articles on cutting-edge technologies at Nashtech Blogs, where we continue to explore and share the latest innovations in the tech industry.
