

A job is a way to run non-interactive code in a Databricks cluster. For example, you can run an extract, transform, and load (ETL) workload interactively or on a schedule. You can also run jobs interactively in the notebook UI.

You can create and run a job using the UI, the CLI, or by invoking the Jobs API. You can repair and re-run a failed or canceled job using the UI or API, and you can monitor job run results using the UI, CLI, API, and notifications (for example, email, webhook destination, or Slack notifications). This article focuses on performing job tasks using the UI. For the other methods, see Jobs CLI and Jobs API 2.1.
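As a minimal sketch of the API path, the example below uses the Jobs API 2.1 REST endpoints (jobs/run-now, jobs/runs/get, and jobs/runs/repair) to trigger a run, poll it until it finishes, and repair it if it fails. The DATABRICKS_HOST and DATABRICKS_TOKEN environment variables and the job ID are placeholder assumptions for illustration, not values from this article.

```python
import os
import time
import requests

# Assumed placeholders: set these for your own workspace.
host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]  # personal access token
headers = {"Authorization": f"Bearer {token}"}
job_id = 123                            # hypothetical job ID

# Trigger an immediate run of the job.
run = requests.post(
    f"{host}/api/2.1/jobs/run-now", headers=headers, json={"job_id": job_id}
).json()
run_id = run["run_id"]

# Poll the run until it reaches a terminal lifecycle state.
while True:
    state = requests.get(
        f"{host}/api/2.1/jobs/runs/get", headers=headers, params={"run_id": run_id}
    ).json()["state"]
    if state["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
        break
    time.sleep(30)

# If the run did not succeed, repair it by re-running only the failed tasks.
if state.get("result_state") != "SUCCESS":
    requests.post(
        f"{host}/api/2.1/jobs/runs/repair",
        headers=headers,
        json={"run_id": run_id, "rerun_all_failed_tasks": True},
    )
```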

Your job can consist of a single task, or it can be a large, multi-task workflow with complex dependencies. Databricks manages the task orchestration, cluster management, monitoring, and error reporting for all of your jobs. You can run your jobs immediately, periodically through an easy-to-use scheduling system, whenever new files arrive in an external location, or continuously to ensure an instance of the job is always running.

You can implement a task in a JAR, a Databricks notebook, a Delta Live Tables pipeline, or an application written in Scala, Java, or Python. Legacy Spark Submit applications are also supported.

You control the execution order of tasks by specifying dependencies between the tasks, and you can configure tasks to run in sequence or in parallel. The following diagram illustrates a workflow that:

1. Ingests raw clickstream data and performs processing to sessionize the records.
2. Ingests order data and joins it with the sessionized clickstream data to create a prepared data set for analysis.
3. Extracts features from the prepared data.
4. Performs tasks in parallel to persist the features and train a machine learning model.

To name the job and its first task in the UI, replace Add a name for your job… with your job name and enter a name for the task in the Task name field.
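To make the dependency and scheduling concepts concrete, here is a hedged sketch of a Jobs API 2.1 jobs/create payload that mirrors the workflow above. The notebook paths, cluster ID, job name, and cron expression are illustrative assumptions, not values from this article.

```python
import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]
cluster_id = "1234-567890-abcde123"  # hypothetical existing cluster ID

def notebook_task(key, path, depends_on=None):
    """Build one task entry for the Jobs API 2.1 create payload."""
    task = {
        "task_key": key,
        "notebook_task": {"notebook_path": path},
        "existing_cluster_id": cluster_id,
    }
    if depends_on:
        task["depends_on"] = [{"task_key": d} for d in depends_on]
    return task

job_spec = {
    "name": "clickstream-pipeline",
    "tasks": [
        notebook_task("sessionize_clickstream", "/Pipelines/sessionize"),
        notebook_task("prepare_orders", "/Pipelines/prepare_orders",
                      depends_on=["sessionize_clickstream"]),
        notebook_task("extract_features", "/Pipelines/extract_features",
                      depends_on=["prepare_orders"]),
        # Both of these depend only on extract_features, so they run in parallel.
        notebook_task("persist_features", "/Pipelines/persist_features",
                      depends_on=["extract_features"]),
        notebook_task("train_model", "/Pipelines/train_model",
                      depends_on=["extract_features"]),
    ],
    # Optional periodic trigger: every day at 06:00 UTC (Quartz cron syntax).
    "schedule": {"quartz_cron_expression": "0 0 6 * * ?", "timezone_id": "UTC"},
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
print(resp.json())  # e.g. {"job_id": 123}
```

In this sketch, a task with no depends_on entry starts as soon as the run begins, while tasks that share the same upstream dependency start in parallel once that task completes.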
