- 1. When packaging the Scala Spark program with IDEA, compilation produced no class files because the JDK version configured in the pom was wrong; it must be set to the JDK version installed on the local machine (e.g., via the maven.compiler.source / maven.compiler.target properties).
- 2. Maven cannot package Scala code directly; the Scala compiler plugin must be declared in the pom:
<build>
    <plugins>
        <plugin>
            <groupId>net.alchim31.maven</groupId>
            <artifactId>scala-maven-plugin</artifactId>
            <version>3.2.2</version>
            <executions>
                <execution>
                    <!-- bind to Maven's compile phase -->
                    <goals>
                        <goal>compile</goal>
                        <goal>testCompile</goal>
                    </goals>
                    <configuration>
                        <args>
                            <arg>-dependencyfile</arg>
                            <arg>${project.build.directory}/.scala_dependencies</arg>
                            <arg>-nobootcp</arg>
                        </args>
                    </configuration>
                </execution>
            </executions>
            <configuration>
                <displayCmd>true</displayCmd>
                <jvmArgs>
                    <jvmArg>-Xss20m</jvmArg>
                </jvmArgs>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.1</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <filters>
                            <filter>
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                <resource>reference.conf</resource>
                            </transformer>
                            <!-- specify the main class -->
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>com.fuda.bigdata.hana2hive.hana2hive_a017</mainClass>
                            </transformer>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
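For the ManifestResourceTransformer above to work, the value of `<mainClass>` must point to a Scala object that exposes a JVM `main` method; otherwise the shaded jar's manifest references a class that does not exist. A minimal sketch of such an object (the body is a placeholder, not this project's actual code; see the fuller sketch at the end of this note):

```scala
package com.fuda.bigdata.hana2hive

// The fully qualified name of this object must match the <mainClass>
// configured in the maven-shade-plugin above.
object hana2hive_a017 {
  def main(args: Array[String]): Unit = {
    // Spark job logic goes here; this placeholder only echoes the arguments.
    println(s"started with args: ${args.mkString(",")}")
  }
}
```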
- 3. The local Scala version must match the Scala version of the target big data cluster; CDH 6.3.2 ships Scala 2.11.12 by default.
- 4. The directory containing the code to be packaged must be marked as a source root.
- 5. Maven 3.6.3 was used (other versions have not been verified).
- 6. In the local Maven installation, add up-to-date repository mirror addresses to settings.xml under E:\maven\apache-maven-3.6.3-bin\apache-maven-3.6.3\conf. Modify settings.xml as follows:
<mirrors>
    <mirror>
        <id>nexus-aliyun</id>
        <mirrorOf>central</mirrorOf>
        <name>Nexus aliyun</name>
        <url>http://maven.aliyun.com/nexus/content/groups/public</url>
    </mirror>
    <mirror>
        <id>repo1</id>
        <mirrorOf>central</mirrorOf>
        <name>Human Readable Name for this Mirror.</name>
        <url>http://repo1.maven.org/maven2/</url>
    </mirror>
    <mirror>
        <id>alimaven</id>
        <mirrorOf>*</mirrorOf>
        <name>Aliyun public repository</name>
        <url>https://maven.aliyun.com/repository/public</url>
    </mirror>
    <mirror>
        <id>maven-default-http-blocker</id>
        <mirrorOf>external:http:*</mirrorOf>
        <name>Pseudo repository to mirror external repositories initially using HTTP.</name>
        <url>http://0.0.0.0/</url>
        <blocked>true</blocked>
    </mirror>
</mirrors>
- 7. Submit command:
- /opt/cloudera/parcels/CDH/bin/spark-submit --class com.fuda.bigdata.hana2hive.hana2hive_a304 --jars hdfs://nameservice1/tmp/spark_jars/ngdbc-2.9.12.jar --keytab /opt/kerberos_keytab/dolphinscheduler.keytab --principal hive/hive@HADOOP.COM --master yarn --deploy-mode cluster /tmp/Zyy/spark_hana2hive-1.0-SNAPSHOT.jar 10
- Notes before running:
- 1. The jar built by Maven must be uploaded to the appropriate directory on the bastion host; by default it is placed under /tmp/Zyy/.
- 2. This job ingests data from an SAP HANA database, so the SAP HANA JDBC driver jar must be downloaded in advance and uploaded to HDFS; here it was uploaded to hdfs://nameservice1/tmp/spark_jars/ngdbc-2.9.12.jar
- To be safe, grant ngdbc-2.9.12.jar 777 permissions.
- Commands are as follows:
- kinit -kt /opt/kerberos_keytab/dolphinscheduler.keytab hdfs/hdfs
- hadoop fs -chmod -R 777 /tmp/spark_jars/ngdbc-2.9.12.jar
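As a reference for the HANA-to-Hive flow itself, a minimal sketch of what the submitted job could look like, assuming the trailing argument (`10` in the command above) controls write parallelism; the HANA host, credentials, and table names are placeholders, not the values actually used in this project:

```scala
package com.fuda.bigdata.hana2hive

import org.apache.spark.sql.{SaveMode, SparkSession}

object hana2hive_a304 {
  def main(args: Array[String]): Unit = {
    // Assumed: the trailing argument ("10" in the submit command) sets the write parallelism.
    val numPartitions = if (args.nonEmpty) args(0).toInt else 10

    val spark = SparkSession.builder()
      .appName("hana2hive_a304")
      .enableHiveSupport()
      .getOrCreate()

    // Read a table from SAP HANA via the ngdbc JDBC driver (placeholder connection values).
    val hanaDf = spark.read
      .format("jdbc")
      .option("driver", "com.sap.db.jdbc.Driver")
      .option("url", "jdbc:sap://<hana-host>:<port>")
      .option("dbtable", "<SCHEMA>.<TABLE>")
      .option("user", "<user>")
      .option("password", "<password>")
      .load()

    // Write into a Hive table (placeholder name), overwriting it on each run.
    hanaDf
      .repartition(numPartitions)
      .write
      .mode(SaveMode.Overwrite)
      .saveAsTable("<hive_db>.<hive_table>")

    spark.stop()
  }
}
```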