Spring Boot Batch vs Scheduler Best Practices

In backend engineering, a common anti-pattern is implementing heavy data-processing logic directly inside a scheduled method. While tools like Linux Crontab or Spring's @Scheduled annotation are excellent for time-based triggering, they lack the transactional resilience required for high-volume data operations. This article analyzes the architectural distinction between "Scheduling" and "Batch Processing" and demonstrates how to decouple them effectively in a Spring Boot environment.

1. Architectural Distinction: Trigger vs. Workload

The confusion often arises because both concepts involve "doing something at a specific time." However, from a system design perspective, their responsibilities are orthogonal.

Feature          | Scheduling (@Scheduled)                 | Batch Processing (Spring Batch)
---------------- | --------------------------------------- | -----------------------------------------
Primary Role     | Triggering execution at a point in time | Processing large volumes of data (ETL)
State Management | Stateless (fire and forget)             | Stateful (maintains execution context)
Transaction      | One large transaction (usually)         | Chunk-based micro-transactions
Failure Handling | Manual try-catch blocks                 | Built-in retry, skip, and restartability

Scheduling is strictly about timing. It answers the question, "When should this start?" Batch processing is about execution. It answers, "How do we process 10 million rows without crashing the heap?"

2. The Pitfall of Pure Scheduling

Using @Scheduled for business logic creates critical bottlenecks. Consider a legacy system scenario where a daily task calculates interest for all users.

Anti-Pattern Warning: Do not load entire datasets into memory inside a scheduled task. Doing so invites OutOfMemoryError and long-held database locks.
@Slf4j
@Component
public class LegacyInterestScheduler {

    @Autowired
    private UserRepository userRepository;

    // Bad practice: scheduling and processing mixed in one method
    @Scheduled(cron = "0 0 0 * * *")
    public void calculateInterest() {
        // 1. Loading the entire table into memory risks OutOfMemoryError
        List<User> users = userRepository.findAll();

        for (User user : users) {
            try {
                user.calculateInterest();
                userRepository.save(user);
            } catch (Exception e) {
                // 2. Logging is the only recovery mechanism
                log.error("Failed for user: " + user.getId(), e);
            }
        }
        // 3. If the server restarts mid-process, all progress is lost
    }
}

The code above has three major flaws:

  1. Memory Consumption: findAll() loads every row into the heap. If the user base grows to 1 million, the application crashes with OutOfMemoryError.
  2. Transaction Size: If @Transactional is applied at the method level, a single rollback discards all 1 million updates. If not, partial failures leave the database in an inconsistent state.
  3. Restartability: If the server crashes at record #50,000, restarting the job means reprocessing the first 50,000 records, potentially duplicating financial transactions.

3. Implementing Robust Batch Architecture

Spring Batch introduces the concept of Chunk-Oriented Processing. Instead of reading everything at once, it reads, processes, and writes in configurable chunks (e.g., 1,000 records at a time). This ensures that transactions are committed periodically, keeping memory usage stable.

Job Configuration

The following configuration defines a Job that handles the same logic but with architectural stability.

@Configuration
public class InterestBatchConfig {

    @Bean
    public Job interestJob(JobRepository jobRepository, Step interestStep) {
        return new JobBuilder("interestJob", jobRepository)
                .start(interestStep)
                .build();
    }

    @Bean
    public Step interestStep(JobRepository jobRepository, 
                             PlatformTransactionManager transactionManager,
                             ItemReader<User> reader,
                             ItemProcessor<User, User> processor,
                             ItemWriter<User> writer) {
        return new StepBuilder("interestStep", jobRepository)
                .<User, User>chunk(1000, transactionManager) // Commit every 1000 items
                .reader(reader)
                .processor(processor)
                .writer(writer)
                .faultTolerant()
                .retryLimit(3) // Auto-retry on failure
                .retry(DeadlockLoserDataAccessException.class)
                .build();
    }
}
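
The step above injects reader, processor, and writer beans that the configuration must supply. Below is a minimal sketch of those beans, assuming JPA is in use; the entity name and JPQL string are illustrative. The JpaPagingItemReader fetches one page per query, which is what keeps heap usage flat, and it saves its page position in the ExecutionContext by default, which is what makes restarts possible.

// Additional @Bean methods inside InterestBatchConfig (illustrative sketch)
@Bean
public JpaPagingItemReader<User> reader(EntityManagerFactory entityManagerFactory) {
    return new JpaPagingItemReaderBuilder<User>()
            .name("userReader") // required so the reader can persist its state
            .entityManagerFactory(entityManagerFactory)
            .queryString("select u from User u")
            .pageSize(1000) // one page per query; memory stays flat
            .build();
}

@Bean
public ItemProcessor<User, User> processor() {
    return user -> {
        user.calculateInterest(); // same business logic as the legacy scheduler
        return user;
    };
}

@Bean
public JpaItemWriter<User> writer(EntityManagerFactory entityManagerFactory) {
    return new JpaItemWriterBuilder<User>()
            .entityManagerFactory(entityManagerFactory)
            .build();
}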
Architecture Note: The JobRepository persists the state of the job (Started, Completed, Failed) in the database. This allows manual intervention or automatic restarts from the point of failure.
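
As an example of such manual intervention, here is a minimal sketch (the JobStatusChecker helper is hypothetical, not part of the original design) of reading that persisted state back through Spring Batch's JobExplorer:

@Component
@RequiredArgsConstructor
public class JobStatusChecker {

    private final JobExplorer jobExplorer;

    public BatchStatus lastStatus(String jobName) {
        // getJobInstances returns the most recent instances first
        List<JobInstance> instances = jobExplorer.getJobInstances(jobName, 0, 1);
        if (instances.isEmpty()) {
            return null; // the job has never run
        }
        // Pick the most recent execution of the latest instance
        return jobExplorer.getJobExecutions(instances.get(0)).stream()
                .max(Comparator.comparing(JobExecution::getCreateTime))
                .map(JobExecution::getStatus) // e.g. COMPLETED, FAILED
                .orElse(null);
    }
}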

4. Coupling Scheduler with Batch Job

Finally, we use the Scheduler solely as a trigger. The JobLauncher executes the predefined Batch Job. This separation allows you to run the batch manually via API or CLI without modifying the scheduling logic.

@Slf4j
@Component
@RequiredArgsConstructor
public class BatchScheduler {

    private final JobLauncher jobLauncher;
    private final Job interestJob;

    @Scheduled(cron = "0 0 0 * * *")
    public void runInterestJob() {
        try {
            JobParameters jobParameters = new JobParametersBuilder()
                    .addLong("timestamp", System.currentTimeMillis()) // Unique ID per run
                    .toJobParameters();
            
            jobLauncher.run(interestJob, jobParameters);
        } catch (Exception e) {
// Launch failures land here; step-level failures are recorded in the Spring Batch meta-data tables
            log.error("Job launch failed", e);
        }
    }
}
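
To illustrate the "manually via API" path mentioned above, here is a minimal sketch; the endpoint and class name are hypothetical, not part of the original design:

@RestController
@RequiredArgsConstructor
public class InterestJobController {

    private final JobLauncher jobLauncher;
    private final Job interestJob;

    @PostMapping("/admin/jobs/interest")
    public ResponseEntity<String> launch() throws Exception {
        JobParameters jobParameters = new JobParametersBuilder()
                .addLong("timestamp", System.currentTimeMillis())
                .toJobParameters();
        // Same Job, same parameters convention as the scheduler, different trigger
        JobExecution execution = jobLauncher.run(interestJob, jobParameters);
        return ResponseEntity.ok(execution.getStatus().toString());
    }
}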

Injecting System.currentTimeMillis() as a JobParameter makes every execution a brand-new JobInstance, so the nightly run never collides with a previous one. Resuming a failed job works the other way around: relaunching with the same identifying parameters tells Spring Batch to restart that JobInstance, re-running only the failed step and picking up from the last committed chunk (provided the reader stores its position in the ExecutionContext). For that reason, a business date often makes a better identifying parameter than a raw timestamp.
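
A minimal sketch of that date-based variant, as a drop-in change to the scheduler above (the runDate key is illustrative):

@Scheduled(cron = "0 0 0 * * *")
public void runInterestJob() throws Exception {
    JobParameters jobParameters = new JobParametersBuilder()
            .addString("runDate", LocalDate.now().toString()) // one JobInstance per calendar day
            .toJobParameters();

    // Relaunching with the same runDate after a failure resumes the FAILED
    // instance from its last committed chunk; a COMPLETED instance cannot be
    // re-run (Spring Batch throws JobInstanceAlreadyCompleteException).
    jobLauncher.run(interestJob, jobParameters);
}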

Conclusion and Trade-offs

Adopting Spring Batch adds complexity. You must maintain meta-data tables (BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION, etc.) and understand the framework's lifecycle. However, for enterprise applications handling critical data, the trade-off is justified. Simplicity in code (using only @Scheduled) often leads to complexity in operations (debugging logs, manual data fixes). Use Scheduling strictly for timing, and delegate the heavy lifting to Batch.
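
As a side note, Spring Boot can create those meta-data tables at startup; a minimal sketch of the relevant property, assuming Spring Boot 2.5+ and a JDBC-backed JobRepository:

spring.batch.jdbc.initialize-schema=always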
