Steps for Writing the hana2hive Spark Program

  • 1. Create a Maven project in IDEA
  • 2. Create the file structure
  • 3. Create any file ending in .scala; IDEA will prompt you to set the Scala SDK. Set it to Scala 2.11.12 (the version used in the pom below)
  • 4. Create a new Scala object
  • 5. Edit pom.xml using the template below, part01
    <properties>
      <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
      <scala.version>2.11.12</scala.version>
      <scala.binary.version>2.11</scala.binary.version>
      <maven.compiler.source>1.8</maven.compiler.source>
      <maven.compiler.target>1.8</maven.compiler.target>
    </properties>
    <dependencies>
      <!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.module/jackson-module-scala -->
      <dependency>
        <groupId>com.fasterxml.jackson.module</groupId>
        <artifactId>jackson-module-scala_2.11</artifactId>
        <version>2.12.7</version>
      </dependency>
      <!-- https://mvnrepository.com/artifact/org.json4s/json4s-jackson -->
      <dependency>
        <groupId>org.json4s</groupId>
        <artifactId>json4s-jackson_2.11</artifactId>
        <version>3.5.4</version>
      </dependency>
      <!-- https://mvnrepository.com/artifact/org.scala-lang.modules/scala-parser-combinators -->
      <dependency>
        <groupId>org.scala-lang.modules</groupId>
        <artifactId>scala-parser-combinators_2.11</artifactId>
        <version>1.1.0</version>
      </dependency>
      <!-- https://mvnrepository.com/artifact/org.scala-lang.modules/scala-xml -->
      <dependency>
        <groupId>org.scala-lang.modules</groupId>
        <artifactId>scala-xml_2.11</artifactId>
        <version>1.1.0</version>
      </dependency>
      <!-- https://mvnrepository.com/artifact/org.apache.hadoop.thirdparty/hadoop-shaded-protobuf_3_7 -->
      <dependency>
        <groupId>org.apache.hadoop.thirdparty</groupId>
        <artifactId>hadoop-shaded-protobuf_3_7</artifactId>
        <version>1.1.1</version>
      </dependency>
      <!-- https://mvnrepository.com/artifact/org.apache.orc/orc-core -->
      <dependency>
        <groupId>org.apache.orc</groupId>
        <artifactId>orc-core</artifactId>
        <version>1.7.1</version>
      </dependency>
  • pom.xml template, part02
      <!-- https://mvnrepository.com/artifact/com.sap.cloud.db.jdbc/ngdbc -->
      <dependency>
        <groupId>com.sap.cloud.db.jdbc</groupId>
        <artifactId>ngdbc</artifactId>
        <version>2.9.12</version>
      </dependency>
      <dependency>
        <groupId>org.apache.parquet</groupId>
        <artifactId>parquet-common</artifactId>
        <version>1.8.2</version>
      </dependency>
      <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.11.12</version>
      </dependency>
      <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-compiler</artifactId>
        <version>2.11.12</version>
      </dependency>
      <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-reflect</artifactId>
        <version>2.11.12</version>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.4.0</version>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.4.0</version>
      </dependency>
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-yarn_2.11</artifactId>
        <version>2.4.0</version>
      </dependency>
  • pom.xml template, part03
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.4.0</version>
      </dependency>
      <!-- https://mvnrepository.com/artifact/com.janeluo/ikanalyzer -->
      <dependency>
        <groupId>com.janeluo</groupId>
        <artifactId>ikanalyzer</artifactId>
        <version>2012_u6</version>
      </dependency>
      <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>1.4.6</version>
      </dependency>
      <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-server</artifactId>
        <version>1.4.6</version>
      </dependency>
      <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.15</version>
        <exclusions>
          <exclusion>
            <groupId>javax.jms</groupId>
            <artifactId>jms</artifactId>
          </exclusion>
          <exclusion>
            <groupId>com.sun.jdmk</groupId>
            <artifactId>jmxtools</artifactId>
          </exclusion>
          <exclusion>
            <groupId>com.sun.jmx</groupId>
            <artifactId>jmxri</artifactId>
          </exclusion>
        </exclusions>
      </dependency>
    </dependencies>
  • pom.xml template, part04
    <build>
      <plugins>
        <plugin>
          <groupId>net.alchim31.maven</groupId>
          <artifactId>scala-maven-plugin</artifactId>
          <version>3.2.2</version>
          <executions>
            <execution>
              <!-- bind the plugin goals to Maven's compile phase -->
              <goals>
                <goal>compile</goal>
                <goal>testCompile</goal>
              </goals>
              <configuration>
                <args>
                  <arg>-dependencyfile</arg>
                  <arg>${project.build.directory}/.scala_dependencies</arg>
                  <arg>-nobootcp</arg>
                </args>
              </configuration>
            </execution>
          </executions>
          <configuration>
            <displayCmd>true</displayCmd>
            <jvmArgs>
              <jvmArg>-Xss20m</jvmArg>
            </jvmArgs>
          </configuration>
        </plugin>

  • pom.xml template, part05
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-shade-plugin</artifactId>
          <version>3.2.1</version>
          <executions>
            <execution>
              <phase>package</phase>
              <goals>
                <goal>shade</goal>
              </goals>
              <configuration>
                <filters>
                  <filter>
                    <artifact>*:*</artifact>
                    <excludes>
                      <exclude>META-INF/*.SF</exclude>
                      <exclude>META-INF/*.DSA</exclude>
                      <exclude>META-INF/*.RSA</exclude>
                    </excludes>
                  </filter>
                </filters>
                <transformers>
                  <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                    <resource>reference.conf</resource>
                  </transformer>
                  <!-- specify the main class -->
                  <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                    <mainClass>com.fuda.bigdata.hana2hive.hana2hive_a017</mainClass>
                  </transformer>
                </transformers>
              </configuration>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>
    </project>
  • 6. Build error
    • Fix: downgrade Maven from 3.8.1 to 3.6.3
  • 7. Copy the four configuration files from the cluster into the program's resources folder, along with the two Kerberos authentication files (a quick classpath check is sketched below)
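  • Before attempting the Kerberos login, a quick classpath check can confirm the copied files are actually visible to the program. This is only a minimal sketch, not part of the original steps; the file names follow the sample code below, so adjust them to your own resources:
    // Sanity check (sketch): confirm files copied into src/main/resources
    // are on the classpath before calling loginUserFromKeytab.
    // File names follow the sample code below; adjust to your own resources.
    def assertOnClasspath(name: String): Unit = {
      val url = getClass.getClassLoader.getResource(name)
      require(url != null, s"$name not found on classpath, check src/main/resources")
      println(s"found $name at $url")
    }
    assertOnClasspath("krb5.conf")
    assertOnClasspath("dolphinscheduler.keytab")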
  • Sample code
    package com.fuda.bigdata.hana2hive

    import org.apache.hadoop.security.UserGroupInformation
    import org.apache.spark.sql.SparkSession

    object hana2hive {
      def main(args: Array[String]): Unit = {
        System.setProperty("HADOOP_USER_NAME", "root")
        System.setProperty("java.security.krb5.conf", "E:\\desktop\\Desktop\\spark_hana2hive\\src\\main\\resources\\krb5.conf")
        System.setProperty("user.name", "root")
        // Kerberos login with the principal and the keytab copied into resources
        UserGroupInformation.loginUserFromKeytab("hive/hive@HADOOP.COM", "E:\\desktop\\Desktop\\spark_hana2hive\\src\\main\\resources\\dolphinscheduler.keytab")
        System.out.println(UserGroupInformation.getLoginUser())

        // 1. Create the SparkSQL runtime environment
        val spark = SparkSession.builder()
          .appName("hana2hive")
          .master("local[*]")
          .config("spark.sql.hive.hiveserver2.jdbc.url.principal", "hive/hive@HADOOP.COM")
          .config("spark.hadoop.security.authentication", "kerberos")
          .config("spark.hadoop.kerberos.realm", "HADOOP.COM")
          .config("spark.hadoop.security.authorization", "true")
          .enableHiveSupport()
          .getOrCreate()

        spark.sql("show databases").show()
        spark.close()
      }
    }
  • The local machine passes Kerberos authentication and successfully reads data from the Hive database
  • 8. Error: Exception in thread "main" java.lang.ClassNotFoundException: com.sap.db.jdbc.Driver
    • Fix: add the HANA JDBC jar to the Spark project's pom (the ngdbc dependency shown in part02 above); a quick driver check is sketched below
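  • A one-line check (a sketch; run it anywhere before the JDBC read) that throws the same ClassNotFoundException if the ngdbc jar is still missing:
    // Throws ClassNotFoundException if the HANA driver (ngdbc) is not on the classpath
    Class.forName("com.sap.db.jdbc.Driver")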
  • Sample code bridging SAP HANA and the Hive database
    package com.fuda.bigdata.hana2hive

    import org.apache.hadoop.security.UserGroupInformation
    import org.apache.spark.sql.SparkSession

    object hana2hive {
      def main(args: Array[String]): Unit = {
        System.setProperty("HADOOP_USER_NAME", "root")
        System.setProperty("java.security.krb5.conf", "E:\\desktop\\Desktop\\spark_hana2hive\\src\\main\\resources\\krb5.conf")
        System.setProperty("user.name", "root")
        UserGroupInformation.loginUserFromKeytab("hive/hive@HADOOP.COM", "E:\\desktop\\Desktop\\spark_hana2hive\\src\\main\\resources\\dolphinscheduler.keytab")
        System.out.println(UserGroupInformation.getLoginUser())

        // 1. Create the SparkSQL runtime environment and connect to the Kerberos-secured Hive database
        val spark = SparkSession.builder()
          .appName("hana2hive")
          .master("local[*]")
          .config("spark.sql.hive.hiveserver2.jdbc.url.principal", "hive/hive@HADOOP.COM")
          .config("spark.hadoop.security.authentication", "kerberos")
          .config("spark.hadoop.kerberos.realm", "HADOOP.COM")
          .config("spark.hadoop.security.authorization", "true")
          .enableHiveSupport()
          .getOrCreate()

        spark.sql("show databases").show()

        // 2. Connect to the SAP HANA database over JDBC
        val hanaUrl = "jdbc:sap://192.168.1.50:30015"
        val hanaUser = "Z_FR"
        val hanaPassword = "MsE7dpAEA0k9PbjN"
        val hanaTable = "SAPABAP1.KONP"
        val hanaOptions = Map(
          "url" -> hanaUrl,
          "user" -> hanaUser,
          "password" -> hanaPassword,
          "driver" -> "com.sap.db.jdbc.Driver",
          "fetchsize" -> "10000"
        )
        val hanaDF = spark.read.format("jdbc").options(hanaOptions).option("dbtable", hanaTable).load()
        hanaDF.show()

        spark.close()
      }
    }
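  • For large HANA tables, the single-connection read above can be slow; Spark's JDBC source can split the read across partitions. Below is a sketch reusing hanaOptions and hanaTable from the code above; ID_COLUMN and the bounds are hypothetical placeholders that must be replaced with a real numeric column and its value range:
    // Parallel JDBC read (sketch): numPartitions range scans over partitionColumn.
    // ID_COLUMN, lowerBound, and upperBound are hypothetical placeholders.
    val parallelDF = spark.read.format("jdbc")
      .options(hanaOptions)
      .option("dbtable", hanaTable)
      .option("partitionColumn", "ID_COLUMN")
      .option("lowerBound", "0")
      .option("upperBound", "1000000")
      .option("numPartitions", "8")
      .load()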
  • Follow-up work
    • 1. Create the table in Hive and configure the data load (see the sketch below)
    • 2. Change the Spark run mode in the code to yarn
    • 3. Package and upload to the server, and configure a shell command for scheduled runs
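  • A minimal sketch of items 1 and 2, reusing spark and hanaDF from the example above. The target names ods.ods_konp are hypothetical, and ORC storage is only an assumption matching the orc-core dependency in the pom:
    // 1. Load the HANA DataFrame into Hive (hypothetical target ods.ods_konp)
    spark.sql("CREATE DATABASE IF NOT EXISTS ods")
    hanaDF.write
      .mode("overwrite")
      .format("orc") // assumption: ORC, matching the pom's orc-core dependency
      .saveAsTable("ods.ods_konp")

    // 2. For cluster runs, build the session with yarn instead of local[*]:
    //    SparkSession.builder().master("yarn")...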

Author: admin

Zhang Yanyin, Big Data Development Engineer
