NashTech Blog

Introduction

Databricks Compute refers to the computing resources provided by Databricks. This computing infrastructure supports various tasks such as running interactive notebooks, executing automated jobs, and handling SQL commands.

Types of Databricks Compute

There are several types of compute available in Databricks:

1. All-purpose compute: Used to analyze data interactively in notebooks. You can create, terminate, and restart this compute using the UI and CLI.

2. Job compute: Created to run automated jobs and terminated when the job completes, which helps with cost optimization.

3. SQL warehouse: Used to run Databricks SQL commands. You can create SQL warehouses using the UI and CLI.

4. Instance pools: A set of idle, ready-to-use instances used to reduce cluster start and autoscaling times. You can create this compute using the UI and CLI.

Advantages of Databricks Compute

1. Runs interactive Databricks notebooks.

2. Provides job compute, which helps with cost optimization.

3. Runs SQL commands through a SQL warehouse (endpoint).

How to Setup Your First Databricks Compute

Here we discuss how to configure your first all-purpose compute through the Databricks UI.

1. Log in to your Databricks account. On the Databricks dashboard, choose Compute, or click New and select Compute from the drop-down menu.

2. Navigate to All-purpose compute and click Create compute.

3. Configure compute details including policy, node type, Databricks runtime version, and other specifications.
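The same details can also be supplied programmatically. Below is a minimal sketch of the JSON payload accepted by the Databricks Clusters API (the same shape you would pass to `databricks clusters create --json`); the runtime version and node type shown are illustrative assumptions and vary by cloud and workspace:

```python
# Minimal all-purpose cluster specification, mirroring the UI fields above.
# spark_version and node_type_id are placeholder values -- pick ones
# available in your own workspace.
cluster_spec = {
    "cluster_name": "my-first-all-purpose-compute",
    "spark_version": "14.3.x-scala2.12",  # Databricks Runtime version
    "node_type_id": "Standard_DS3_v2",    # node type (Azure example)
    "num_workers": 2,                     # fixed worker count
    "autotermination_minutes": 30,        # shut down after 30 idle minutes
}

print(sorted(cluster_spec.keys()))
```

Each key above corresponds directly to a field on the UI's compute-creation form.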

4. A policy is a set of rules defined by the admin to restrict permission to create clusters. Select the policy that applies to you.

5. Select the node type. Two options are available: multi node and single node. The two configurations differ in their worker and driver node types.

Select multi node for workloads that need multiple worker nodes to process tasks in parallel. Use single node for simpler, smaller workloads.
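For reference, a single-node cluster can also be expressed in the Clusters API payload as a zero-worker spec whose Spark configuration runs everything on the driver. This is a sketch; the runtime version and node type are placeholder values:

```python
# Single-node cluster: no separate workers; the driver runs all tasks.
single_node_spec = {
    "cluster_name": "single-node-demo",
    "spark_version": "14.3.x-scala2.12",   # placeholder runtime version
    "node_type_id": "Standard_DS3_v2",     # placeholder node type
    "num_workers": 0,  # zero workers marks the cluster as single node
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",  # Spark runs locally on the driver
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}
```

A multi-node spec simply omits the `spark_conf`/`custom_tags` entries above and sets `num_workers` (or an autoscale range) to a positive size.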

6. Next, select the access mode and, for Single User mode, the user who is allowed to attach to the compute.

| Access Mode | Visible to user | Supported languages | Notes |
| --- | --- | --- | --- |
| Single User | Always | Python, SQL, Scala, R | Used by only a single user. |
| Shared | Always (Premium plan required) | Python, SQL | Used by multiple users. |
| No Isolation Shared | Admins can hide this cluster type from the admin settings page. | Python, SQL, Scala, R | There is an account-level setting for this cluster type. |
| Custom | Hidden (for all new clusters) | Python, SQL, Scala, R | Shown only if you have existing clusters without a specified access mode. |
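In the Clusters API, the access modes in the table above correspond to the `data_security_mode` field. The mapping below is a sketch of that correspondence; verify the enum names against the API version you use:

```python
# Assumed mapping of UI access modes to API `data_security_mode` values.
ACCESS_MODES = {
    "Single User": "SINGLE_USER",
    "Shared": "USER_ISOLATION",
    "No Isolation Shared": "NONE",
}

spec = {
    "cluster_name": "demo",
    "data_security_mode": ACCESS_MODES["Single User"],
    # Single-user mode also names the user allowed to attach
    # (hypothetical address for illustration):
    "single_user_name": "someone@example.com",
}
```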

7. Now select the Databricks runtime version. Choose the latest version for all-purpose compute, as it supports the newest optimizations and up-to-date packages.

Depending on your needs, you can enable this option or leave it disabled.

Different Configuration for Multi-Node & Single-Node Compute

8. Multi-node and single-node configurations differ in their worker and driver node types, as elaborated below.

For multi node, you configure two options: the worker type and the driver type.

Select the worker type according to your workload, and set the number of workers (for example, 2 to 5).

For the driver type, select the kind of driver node. The driver node maintains state information for all notebooks attached to the cluster.

Enable autoscaling lets you specify minimum and maximum numbers of worker nodes. If autoscaling is unchecked, a fixed number of worker nodes is used.
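These two sizing choices map onto mutually exclusive fields in the cluster spec: an `autoscale` block with min/max workers, or a fixed `num_workers` count. A small helper sketches this (field names follow the Clusters API; treat the shape as an assumption to verify):

```python
def worker_sizing(fixed=None, min_workers=None, max_workers=None):
    """Return the worker-sizing portion of a cluster spec.

    Pass `fixed` for a fixed-size cluster, or `min_workers` and
    `max_workers` to enable autoscaling. The two are mutually exclusive.
    """
    if fixed is not None:
        return {"num_workers": fixed}
    return {"autoscale": {"min_workers": min_workers,
                          "max_workers": max_workers}}

print(worker_sizing(fixed=4))
print(worker_sizing(min_workers=2, max_workers=8))
```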

In single node, the driver node functions as the sole worker, handling all tasks. To select the node type, click the node-type field and choose the desired type from the drop-down menu.

Enabling autoscaling local storage makes Databricks continuously monitor the free disk space on the Spark workers in your cluster and attach additional storage when it runs low.

To reduce cost, define a termination duration so the compute shuts down automatically after being idle for that period.
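Both cost controls mentioned above have direct counterparts in the cluster spec: `enable_elastic_disk` toggles autoscaling local storage, and `autotermination_minutes` sets the idle-termination window. A sketch, with field names to confirm against your API version:

```python
# Cost-control fields of a cluster spec (assumed Clusters API field names).
cost_controls = {
    "enable_elastic_disk": True,    # autoscaling local storage
    "autotermination_minutes": 60,  # terminate after 60 idle minutes
}

# Merge into a full cluster spec before sending it to the Clusters API.
full_spec = {"cluster_name": "demo", **cost_controls}
print(full_spec)
```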

Compute Dashboard

Here, you can manage the cluster by stopping or restarting the compute, editing permissions, checking event logs, adding libraries, reviewing driver logs, and more.


Conclusion

In conclusion, Databricks Compute offers an infrastructure that supports various tasks, including interactive notebooks, automated jobs, and SQL commands, with options such as all-purpose compute, job compute, SQL warehouses, and instance pools. Setting up your first all-purpose compute through the Databricks UI involves configuring policies, node types, runtime versions, and other specifications, providing flexibility and scalability for diverse workloads. The Compute Dashboard facilitates easy management, making Databricks a powerful platform for data analysis and processing.


Related Article

Databricks Job Workflow

Delta Time Travel

Delta Sharing

Manish Mishra

Manish Mishra is a Software Consultant with a focus on Scala, Apache Spark, and Databricks. His proficiency extends to using the Great Expectations tool to ensure robust data quality, and he is passionate about leveraging cutting-edge technologies to solve complex challenges in the dynamic field of data engineering.
