Integrating Spark with Spring Boot
Source: https://relishcode.com
Of late, I have been relying more and more on Spring Boot for increased productivity. It is much faster to get the boilerplate out of the way using Spring Boot.
For example:
- Automatic configuration for application dependencies such as Spring REST and JPA
- Starter dependencies that save a lot of time when setting up a project
- The application.properties file, which I leverage to externalize application configuration with default values
- Executable jars with an embedded container out of the box, which save a lot of time for prototypes
For one of my projects, I needed to use Apache Spark, and I started missing Spring Boot from day one. It took me some time to get the two working together, and I felt it was worth capturing in a blog post.
Problem #1
If you simply include Spark and Spring Boot dependencies in the same project as shown below:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.1.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter</artifactId>
</dependency>
You will see the following error:
java.lang.IllegalStateException: Detected both log4j-over-slf4j.jar AND bound slf4j-log4j12.jar on the class path, preempting StackOverflowError
Root Cause
This happens because Spark and Spring Boot each package their own logging bridge: Spark binds SLF4J to Log4j (slf4j-log4j12), while Spring Boot's default logging routes Log4j calls back into SLF4J (log4j-over-slf4j). Having both on the classpath would create an infinite delegation loop, so SLF4J fails fast with this error.
Solution
You need to remove the logging library from one of them. In my case, since I need to use the Spark binaries present on the cluster, I had to remove logging from Spring Boot. Here is my modified Spring Boot dependency:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-logging</artifactId>
        </exclusion>
    </exclusions>
</dependency>
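Conversely, if you are not tied to the cluster's Spark binaries and would rather keep Spring Boot's default Logback setup, the exclusions can go on the Spark side instead. A sketch (the coordinates match the Spark 2.1.0 dependencies above; repeat the exclusions for spark-sql as well):

```xml
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.0</version>
    <scope>provided</scope>
    <exclusions>
        <!-- Drop Spark's SLF4J-to-Log4j binding so it cannot clash
             with Spring Boot's log4j-over-slf4j bridge -->
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```

Either direction works; the point is that only one SLF4J binding may remain on the classpath.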
Problem #2
Now, if you run your application, chances are you won't see any error, but the application still won't initialize. For example, here is my sample code:
import org.apache.spark.sql.SparkSession;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class SpringSampleApplication implements CommandLineRunner {

    public static void main(String[] args) {
        SpringApplication.run(SpringSampleApplication.class, args);
    }

    @Override
    public void run(String... args) throws Exception {
        // Build a local SparkSession once Spring Boot has started
        SparkSession sparkSession = SparkSession
                .builder()
                .appName("SparkWithSpring")
                .master("local")
                .getOrCreate();
        System.out.println("Spark Version: " + sparkSession.version());
    }
}
When I run this, it does not print the Spark version.
Root Cause
Since we removed logging from Spring Boot, we are now relying on Spark's logging, which is not configured yet. There is actually one more problem (see Problem #3 below), but without a working logger we cannot see Spring Boot's error on the console.
Solution
Add a log4j.properties file to src/main/resources, as shown below:
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.Target=System.out
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
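Spark's own INFO output is quite chatty and can easily drown out your application's log lines. If that bothers you, you can raise the level for Spark's packages in the same log4j.properties. A sketch (the logger names match Spark 2.1.x internals, including its shaded Jetty; adjust to taste):

```properties
# Quiet down Spark internals while keeping the root logger at INFO
log4j.logger.org.apache.spark=WARN
log4j.logger.org.spark_project.jetty=WARN
```

Your own packages stay at INFO via the root logger, so application and Spring Boot messages remain visible.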
Problem #3
Now, you should see the following error when running your application:
***************************
APPLICATION FAILED TO START
***************************
Description:
The Bean Validation API is on the classpath but no implementation could be found
Action:
Add an implementation, such as Hibernate Validator, to the classpath
Root Cause
Spark packages the Bean Validation API jar, which Spring Boot detects and tries to auto-configure; however, no implementation of that API is on the classpath.
Solution
Add the validation starter, which brings in Hibernate Validator as the implementation:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-validation</artifactId>
</dependency>
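Alternatively, if your application does not actually use bean validation, you could instead tell Spring Boot to skip that auto-configuration entirely in application.properties. A sketch (the class name is from Spring Boot 1.5.x; verify it against your Boot version):

```properties
# Skip validation auto-configuration instead of adding an implementation
spring.autoconfigure.exclude=org.springframework.boot.autoconfigure.validation.ValidationAutoConfiguration
```

Adding the starter is the simpler fix, but the exclusion avoids pulling an extra dependency you do not need.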
Now, when you run your application, it should initialize Spring Boot and the Spark session together. In my case, it prints the Spark version as expected, along with the Spring Boot banner:
. ____ _ __ _ _
/\\ / ___'_ __ _ _(_)_ __ __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
\\/ ___)| |_)| | | | | || (_| | ) ) ) )
' |____| .__|_| |_|_| |_\__, | / / / /
=========|_|==============|___/=/_/_/_/
:: Spring Boot :: (v0.0.1-SNAPSHOT)
Spark Version: 2.1.0
Programmers typically use Spark with Scala, but if you end up using Java and need to leverage Spring Boot, this article should get you going.
Thanks for reading!