Saturday, December 17, 2022

Batch Apex

In this article, we will take a look at Batch Apex’s place in asynchronous Apex, compare it to the other asynchronous methods, and provide tips and tricks to help you implement it.

Asynchronous Apex

To better understand Batch Apex and its functions, we first need to explore how it compares to its counterparts (shown in the table below). Let’s think of asynchronous jobs in Salesforce as actions running in the background. 

Asynchronous Apex may be used when you have an automated task that processes a large number of records. For example, you might need to integrate with a REST API and send a large number of records to an external system every day. This use case may seem deceptively simple, but you also need to log the responses returned from that system. To do this, we need to know how to:

  1. Schedule
  2. Optimize the number of records processed
  3. Commit the responses

We will get to that in a second. But first, check out this overview of asynchronous Apex methods.

Did you accidentally run an Apex batch job? Thankfully, you can abort running jobs by going to Setup > Environments > Jobs > Apex Jobs and clicking Abort.

If the job is scheduled to run in the future, you can find it under Setup > Environments > Jobs > Scheduled Jobs.
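Jobs can also be aborted from code. The snippet below is a minimal sketch for Anonymous Apex; the class name in the filter is only an example (it reuses SummarizeAccountTotal from later in this article), so adjust it to the job you actually want to stop.

```apex
// Find batch jobs of one class that have not finished yet and abort them.
// 'SummarizeAccountTotal' is an example class name; replace it with your own.
List<AsyncApexJob> openJobs = [
    SELECT Id, Status
    FROM AsyncApexJob
    WHERE JobType = 'BatchApex'
      AND ApexClass.Name = 'SummarizeAccountTotal'
      AND Status IN ('Holding', 'Queued', 'Preparing', 'Processing')
];

for (AsyncApexJob openJob : openJobs) {
    // System.abortJob accepts the Id of an AsyncApexJob (or of a scheduled job).
    System.abortJob(openJob.Id);
}
```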

| Type | Overview | Common Scenarios |
| --- | --- | --- |
| Future Methods | Run in their own thread, and do not start until resources are available. | Web service callout. |
| Batch Apex | Run large jobs that would exceed normal processing limits. | Data cleansing or archiving of records. |
| Queueable Apex | Similar to future methods, but provide additional job chaining and allow more complex data types to be used. | Performing sequential processing operations with external Web services. |
| Scheduled Apex | Schedule Apex to run at a specified time. | Daily or weekly tasks. |

Essentially, you use asynchronous Apex to get higher governor limits and to run work separately from other tasks in your org. You can also schedule Batch Apex by using the Schedulable interface; future methods and queueable Apex cannot be scheduled directly.
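Scheduling a batch job can be sketched as follows. This is a minimal example under stated assumptions: NightlySyncScheduler, the job name, and the 02:00 schedule are all hypothetical, and the batch being launched is the SummarizeAccountTotal class shown later in this article, assumed here to run against a total__c field on Account.

```apex
// Hypothetical scheduler class; the names and the schedule are illustrative only.
public class NightlySyncScheduler implements Schedulable {

    // Called by the platform at the scheduled time.
    public void execute(SchedulableContext sc) {
        // Start the batch job; the second argument is the chunk size (200 is the default).
        Database.executeBatch(
            new SummarizeAccountTotal('SELECT total__c FROM Account'), 200);
    }

    // Registers the schedule and returns the scheduled job's ID.
    // Cron fields: seconds minutes hours day-of-month month day-of-week
    public static String scheduleDaily() {
        return System.schedule('Nightly sync', '0 0 2 * * ?', new NightlySyncScheduler());
    }
}
```

Calling NightlySyncScheduler.scheduleDaily() once from Anonymous Apex registers the job, which then appears under Scheduled Jobs in Setup.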

Note: Future methods are rarely used nowadays; you usually see them in older orgs, where they typically need to be refactored for new requirements. Something else to note: you cannot pass sObjects to a future method as arguments (parameters must be primitive data types or collections of primitives), and no interface needs to be implemented. They also have different governor limits, as explained here.
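For comparison, here is a small sketch of a future method that works around the primitive-only restriction by accepting record Ids and re-querying the records. The class name, the method name, and the External_System named credential are assumptions made for illustration.

```apex
// Hypothetical service class showing the shape of a future method.
public class AccountCalloutService {

    // Future methods must be static, return void, and take only primitives
    // or collections of primitives, so we pass Ids instead of sObjects.
    @future(callout=true)
    public static void sendAccounts(Set<Id> accountIds) {
        List<Account> accounts = [SELECT Id, Name FROM Account WHERE Id IN :accountIds];

        HttpRequest req = new HttpRequest();
        // 'External_System' is an assumed named credential, not a real endpoint.
        req.setEndpoint('callout:External_System/accounts');
        req.setMethod('POST');
        req.setHeader('Content-Type', 'application/json');
        req.setBody(JSON.serialize(accounts));

        HttpResponse res = new Http().send(req);
        System.debug('Response status: ' + res.getStatusCode());
    }
}
```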

What is Batch Apex?

Batch Apex facilitates the asynchronous processing of records in multiple chunks and should be a straightforward implementation for any developer.

Queueable and Batch Apex both belong to asynchronous Apex. The difference is that you would use Batch Apex whenever you need to process a larger number of records than a queueable job can comfortably handle.

Only five Batch Apex jobs can be queued or active at the same time (up to 100 more can wait in the Apex flex queue), whereas a single synchronous transaction can enqueue up to 50 queueable jobs. Queueable jobs can also be chained, calling one another with no limit on chain depth outside of Developer Edition and trial orgs. And, with the recent updates, they have transaction finalizers, similar to Batch Apex’s finish method.
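Chaining and finalizers can be sketched in one small class. SyncStepJob and its three-step chain are hypothetical; the sketch only shows where System.enqueueJob and System.attachFinalizer fit, with the actual work left out.

```apex
// Hypothetical queueable job that chains itself and attaches a finalizer.
public class SyncStepJob implements Queueable, Finalizer {
    private Integer step;

    public SyncStepJob(Integer step) {
        this.step = step;
    }

    // Queueable entry point.
    public void execute(QueueableContext qc) {
        // The finalizer runs after this job ends, whether it succeeds or fails.
        System.attachFinalizer(this);

        // ... do the work for this step here ...

        // Chain the next step; the chain stops after step 3.
        if (step < 3) {
            System.enqueueJob(new SyncStepJob(step + 1));
        }
    }

    // Finalizer entry point, comparable to a batch job's finish method.
    public void execute(FinalizerContext fc) {
        if (fc.getResult() == ParentJobResult.UNHANDLED_EXCEPTION) {
            System.debug('Step ' + step + ' failed: ' + fc.getException().getMessage());
        }
    }
}
```

The chain is started with System.enqueueJob(new SyncStepJob(1));. It is kept to three steps here because Developer Edition and trial orgs cap the chain depth.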

Keeping these in mind, let’s say there are more than 50,000 records you need to process. In this case, you would use Batch Apex. By default it processes records in chunks of 200, and each chunk gets its own governor limits for processing.
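The chunk size is chosen when the job is started. The Anonymous Apex sketch below assumes the SummarizeAccountTotal class from later in this article and a total__c field on Account; the chunk size of 100 is an arbitrary example.

```apex
// Start the batch with an explicit chunk size instead of the default of 200.
SummarizeAccountTotal batch = new SummarizeAccountTotal('SELECT total__c FROM Account');
Id jobId = Database.executeBatch(batch, 100);

// The returned Id identifies the AsyncApexJob record, which tracks progress.
AsyncApexJob jobInfo = [
    SELECT Status, JobItemsProcessed, TotalJobItems, NumberOfErrors
    FROM AsyncApexJob
    WHERE Id = :jobId
];
System.debug(jobInfo.Status + ': ' + jobInfo.JobItemsProcessed
    + ' of ' + jobInfo.TotalJobItems + ' batches processed');
```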

Each batch job must implement three methods: start, execute, and finish. The bulk of the code should look like this:

global Database.QueryLocator start(Database.BatchableContext BC)
{
    // This is where you get the records for processing, or access a list
    // of records that has already been passed to the Batch class.
    // The query below is only a placeholder.
    return Database.getQueryLocator('SELECT Id FROM Account');
}

global void execute(Database.BatchableContext BC, List<sObject> scope)
{
    // This is where you should be processing the records and
    // making callouts to external systems.
}

global void finish(Database.BatchableContext BC)
{
    // The finish method is where you do your final DML and commit your
    // processed information to Salesforce: creating log records for your
    // integration, updating processed records in the database, and
    // chaining another batch class or even scheduling one.
}

The start method is where all the information needed for processing is gathered. What does this mean? Here, you can either return records that were passed to the class (for example, through its constructor) or query the records to be processed.
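The first option, handing records to the class instead of querying them, could look like the sketch below. PassedRecordsBatch is a hypothetical name, and the processing step is reduced to a debug statement.

```apex
// Hypothetical batch class that processes a list passed in through its constructor.
public class PassedRecordsBatch implements Database.Batchable<Account> {
    private List<Account> records;

    public PassedRecordsBatch(List<Account> records) {
        this.records = records;
    }

    // Returning an Iterable (here a plain List) instead of a QueryLocator.
    public Iterable<Account> start(Database.BatchableContext bc) {
        return records;
    }

    // Called once per chunk, each time in its own execution context.
    public void execute(Database.BatchableContext bc, List<Account> scope) {
        System.debug('Processing ' + scope.size() + ' accounts in this chunk');
    }

    public void finish(Database.BatchableContext bc) {
        // Nothing to do in this sketch.
    }
}
```

Keep in mind that an Iterable is built under normal governor limits, so this option suits lists you already hold in memory rather than very large data sets.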

There’s a critical point here though: the start method runs in a single execution context, whereas the execute method gets a separate context for every chunk of records (200 by default). So, you should always use the execute method for heavy processing. Using Batch Apex does not by itself give you higher governor limits everywhere in the class!

We’ll use the API integration use case from earlier. Let’s say we need to commit the responses to the database; if this is done somewhere other than the execute method, all of that work shares a single execution context and a single set of governor limits. In the execute method, you get a separate context for every 200 records.

It’s crucial to note that when building integrations with Batch Apex, you should add Database.AllowsCallouts to the class definition:

public class SearchAndReplace implements Database.Batchable<sObject>, 
   Database.AllowsCallouts{
}

Have you followed this step but are still getting errors? Maybe you’re trying to make an external service callout after performing DML. If so, you will see the following error:

“You have uncommitted work pending. Please commit or rollback before calling out.”

The error occurs if you perform DML and then make a callout in the same transaction. This is why, within a given context, callouts must come before any DML: make the callouts in execute, and commit the responses afterwards, either at the end of that execute call or in the finish method. It’s also possible to make callouts for the chunks that follow the initial 200 records, because every chunk runs in its own context.
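One way to respect that ordering is sketched below: each execute call makes its callout first and only then performs DML. AccountSyncBatch, the External_System named credential, and the use of the Description field as a simple log are all assumptions made for illustration, not part of any real integration.

```apex
// Hypothetical integration batch: callout first, DML second, in every chunk.
public class AccountSyncBatch implements Database.Batchable<sObject>, Database.AllowsCallouts {

    public Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator('SELECT Id, Name, Description FROM Account');
    }

    public void execute(Database.BatchableContext bc, List<sObject> scope) {
        // 1. Make the callout while no DML is pending in this transaction.
        HttpRequest req = new HttpRequest();
        req.setEndpoint('callout:External_System/accounts'); // assumed named credential
        req.setMethod('POST');
        req.setHeader('Content-Type', 'application/json');
        req.setBody(JSON.serialize(scope));
        HttpResponse res = new Http().send(req);

        // 2. Only after the callout, record the outcome and commit it.
        for (sObject record : scope) {
            record.put('Description', 'Last sync status: ' + res.getStatusCode());
        }
        update scope;
    }

    public void finish(Database.BatchableContext bc) {
        // Post-processing (logging, notifications, chaining) would go here.
    }
}
```

Sending one request per chunk keeps each execute call well below the callout limit for a single transaction.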

The finish method is where post-processing is completed. Here, you may update logs and records and save them to the database. Should you ever want to email an admin a summary of the records that were processed successfully and the errors that occurred, you would do it in the finish method.
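A notification email from finish could be sketched like this. NotifyingBatch is a hypothetical class with the processing left out, and sending the mail to the user who started the job is just one possible choice of recipient.

```apex
// Hypothetical batch class whose finish method emails a job summary.
public class NotifyingBatch implements Database.Batchable<sObject> {

    public Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator('SELECT Id FROM Account'); // placeholder query
    }

    public void execute(Database.BatchableContext bc, List<sObject> scope) {
        // Record processing omitted in this sketch.
    }

    public void finish(Database.BatchableContext bc) {
        // The job's own AsyncApexJob record holds the totals and error count.
        AsyncApexJob job = [
            SELECT Status, NumberOfErrors, JobItemsProcessed, TotalJobItems, CreatedBy.Email
            FROM AsyncApexJob
            WHERE Id = :bc.getJobId()
        ];

        Messaging.SingleEmailMessage mail = new Messaging.SingleEmailMessage();
        mail.setToAddresses(new String[] { job.CreatedBy.Email });
        mail.setSubject('Batch job finished with status: ' + job.Status);
        mail.setPlainTextBody('Processed ' + job.JobItemsProcessed + ' of '
            + job.TotalJobItems + ' batches with ' + job.NumberOfErrors + ' failures.');
        Messaging.sendEmail(new Messaging.SingleEmailMessage[] { mail });
    }
}
```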

Using State in Batch Apex

The most important thing to keep in mind is that each chunk of a batch job (whether it contains one record or 200) gets its own execution context and fresh governor limits, which is what lets you process a large number of records in your system.

Sometimes you may need to store information generated by a business process, or keep track of the number of records processed. If each execution brings a new context and a new state, what do you need to do to keep this state? This is where Database.Stateful comes in. Check out this great explanation by legendary user, sfdcfox.

Using the Database.Stateful interface is optional, but it is useful when making calculations and counting the records being processed. One thing to note about Database.Stateful: it reduces performance, because the class is serialized at the end of each execute call to keep its state. This will increase the processing time of your Batch Apex, so use it wisely!

The following example from this documentation sums a custom field, total__c, as the records are processed:

public class SummarizeAccountTotal implements 
    Database.Batchable<sObject>, Database.Stateful{

   public final String Query;
   public integer Summary;
  
   public SummarizeAccountTotal(String q){Query=q;
     Summary = 0;
   }

   public Database.QueryLocator start(Database.BatchableContext BC){
      return Database.getQueryLocator(query);
   }
   
   public void execute(
                Database.BatchableContext BC, 
                List<sObject> scope){
      for(sObject s : scope){
         Summary = Integer.valueOf(s.get('total__c'))+Summary;
      }
   }

   public void finish(Database.BatchableContext BC){
   }
}

If you want to keep data from the previous Execute method, or the Finish method needs any data or information from the previous Execute, you can use Database.Stateful. However, I recommend avoiding this for the reasons explained above, unless used for these specific use cases. 

Testing Batch Apex

When testing Batch Apex, wrap the call to Database.executeBatch in Test.startTest() and Test.stopTest(). The job is queued after startTest and runs synchronously when stopTest is called, so your assertions can safely follow it. Also, make sure that the number of records your test inserts is less than or equal to the batch size (200 by default), because a test method can execute only one batch. In other words, the QueryLocator or Iterable returned by the start method must not return more records than fit in a single batch.

@isTest
private class RunTestBatchTest { // test class name is a placeholder

    @isTest static void test() {

        Test.startTest();

        runTestBatch batch1 = new runTestBatch();
        Id batchId = Database.executeBatch(batch1);

        Test.stopTest();

        // System.assert logic afterwards
    }
}
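A fuller version of such a test, including data setup and an assertion, might look like the sketch below. Both classes are hypothetical: MarkAccountsBatch is a deliberately tiny batch that stamps the standard Site field, chosen only because it is a simple filterable text field, and the test assumes the org has no validation or duplicate rules that block plain Account inserts.

```apex
// File 1: a small, hypothetical batch class to test.
public class MarkAccountsBatch implements Database.Batchable<sObject> {

    public Database.QueryLocator start(Database.BatchableContext bc) {
        return Database.getQueryLocator('SELECT Id, Site FROM Account');
    }

    public void execute(Database.BatchableContext bc, List<sObject> scope) {
        for (sObject record : scope) {
            record.put('Site', 'Processed');
        }
        update scope;
    }

    public void finish(Database.BatchableContext bc) {
    }
}

// File 2: the test class.
@isTest
private class MarkAccountsBatchTest {

    @isTest
    static void processesAllAccountsInOneBatch() {
        // Insert exactly one batch worth of records (not more than the batch size).
        List<Account> accounts = new List<Account>();
        for (Integer i = 0; i < 200; i++) {
            accounts.add(new Account(Name = 'Test ' + i));
        }
        insert accounts;

        Test.startTest();
        Database.executeBatch(new MarkAccountsBatch(), 200);
        Test.stopTest(); // the batch runs to completion here

        Integer processed = [SELECT COUNT() FROM Account WHERE Site = 'Processed'];
        System.assertEquals(200, processed, 'All test accounts should be marked as processed');
    }
}
```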

Best Practices

As with future methods, there are a few things you should keep in mind when using Batch Apex:

  • To ensure fast execution of batch jobs, minimize Web service callout times and tune queries used in your Batch Apex code.
  • The longer the batch job executes, the more likely other queued jobs are delayed.
  • Only use Batch Apex if you have more than one batch of records. If you don’t have enough records to run more than one batch, use queueable Apex.
  • Limit the number of asynchronous requests to minimize the chance of delays.
  • Use extreme care if you are planning to invoke a batch job from a trigger. Check that the trigger won’t exceed the limit for batch jobs.
  • A maximum of 50 million records can be returned in the QueryLocator object. If more than 50 million records are returned, the batch job is immediately terminated and marked as Failed.
  • If the start method of the batch class returns a QueryLocator, the optional scope parameter of executeBatch can have a maximum value of 2,000.
  • If the start method of the batch class returns an iterable, the scope parameter value has no upper limit.
  • The start, execute, and finish methods can each make up to 100 callouts. Implement AllowsCallouts to enable callouts from Batch Apex.
  • Methods marked as future can’t be called from a Batch Apex class.
  • All methods in the class must be defined as global or public.

Optimize the Implementation

  • Batch Apex jobs run faster when the start method returns a QueryLocator object that doesn’t include related records via a subquery. Avoiding relationship subqueries in a QueryLocator allows batch jobs to run using a quicker, chunked implementation.
  • For each 10,000 AsyncApexJob records, Apex creates an AsyncApexJob record of type BatchApexWorker for internal use. When querying for all AsyncApexJob records, we recommend that you filter out records of type BatchApexWorker using the JobType field.
  • For a sharing recalculation, we recommend that the execute method delete and then re-create all Apex managed sharing for the records in the batch. This process ensures that sharing is accurate and complete.
  • Use the Test methods startTest and stopTest around the executeBatch method to ensure that it finishes before continuing your test.
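The JobType filter mentioned in the list above can be sketched as a short Anonymous Apex query. The selected fields and the limit of 50 rows are arbitrary choices for illustration.

```apex
// List recent asynchronous jobs while skipping the internal BatchApexWorker records.
List<AsyncApexJob> jobs = [
    SELECT Id, JobType, Status, ApexClass.Name,
           JobItemsProcessed, TotalJobItems, NumberOfErrors
    FROM AsyncApexJob
    WHERE JobType != 'BatchApexWorker'
    ORDER BY CreatedDate DESC
    LIMIT 50
];

for (AsyncApexJob job : jobs) {
    System.debug(job.ApexClass.Name + ' (' + job.JobType + '): ' + job.Status);
}
```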

Summary

Batch Apex is bread and butter for every Salesforce developer. You are always working within the limits of the Salesforce platform, which is a unique challenge if you need to process millions of records. Luckily, it’s Batch Apex to the rescue! Remember the tips above, and also keep general best practices such as bulkification in mind.

