  • 1. First, copy the three configuration files hdfs-site.xml, core-site.xml and hive-site.xml into /mnt/sdg/software/datax/conf/ and /mnt/sdg/software/datax/plugin/reader/hdfsreader/:
    • sudo cp /opt/cloudera/parcels/CDH/lib/hadoop/etc/hadoop/hdfs-site.xml /mnt/sdg/software/datax/conf/
    • sudo cp /opt/cloudera/parcels/CDH/lib/hadoop/etc/hadoop/core-site.xml /mnt/sdg/software/datax/conf/
    • sudo cp /opt/cloudera/parcels/CDH/lib/hive/conf/hive-site.xml /mnt/sdg/software/datax/conf/
    • sudo cp /opt/cloudera/parcels/CDH/lib/hadoop/etc/hadoop/hdfs-site.xml /mnt/sdg/software/datax/plugin/reader/hdfsreader/
    • sudo cp /opt/cloudera/parcels/CDH/lib/hadoop/etc/hadoop/core-site.xml /mnt/sdg/software/datax/plugin/reader/hdfsreader/
    • sudo cp /opt/cloudera/parcels/CDH/lib/hive/conf/hive-site.xml /mnt/sdg/software/datax/plugin/reader/hdfsreader/
    • sudo chmod 755 hdfs-site.xml
    • sudo chmod 755 hive-site.xml
    • sudo chmod 755 core-site.xml
      • (run the chmod commands inside each of the two target directories above)
  • 2. Next, configure the hdfsreader.json job file:
    {
      "job": {
        "setting": {
          "speed": {
            "channel": "1"
          }
        },
        "content": [
          {
            "reader": {
              "name": "hdfsreader",
              "parameter": {
                "defaultFS": "hdfs://nameservice1",
                "path": "/user/hive/warehouse/student/*",
                "fileType": "text",
                "column": ["*"],
                "encoding": "UTF-8",
                "fieldDelimiter": ",",
                "hadoopConfig": {
                  "dfs.nameservices": "nameservice1",
                  "dfs.ha.namenodes.nameservice1": "namenode69,namenode80",
                  "dfs.namenode.rpc-address.nameservice1.namenode69": "hadoopm01:8020",
                  "dfs.namenode.rpc-address.nameservice1.namenode80": "hadoopm02:8020",
                  "dfs.client.failover.proxy.provider.nameservice1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
                  "dfs.client.failover.proxy.provider.nameservice.configured.providers": "namenode69,namenode80",
                  "dfs.client.failover.proxy.provider.namenode69": "org.apache.hadoop.hdfs.server.namenode.ha.KerberosFailoverProxyProvider",
                  "dfs.client.failover.proxy.provider.namenode80": "org.apache.hadoop.hdfs.server.namenode.ha.KerberosFailoverProxyProvider"
                },
                "haveKerberos": "true",
                "kerberosKeytabFilePath": "/opt/kerberos_keytab/dolphinscheduler.keytab",
                "kerberosPrincipal": "hdfs/hdfs@HADOOP.COM"
              }
            },
            "writer": {
              "name": "streamwriter",
              "parameter": {
                "print": true
              }
            }
          }
        ]
      }
    }
  • 3. Running the DataX script directly fails with an error: the HDFS high-availability (HA) configuration is not recognized.
    • python /mnt/sdg/software/datax/bin/datax.py /mnt/sdg/software/datax/job/mysql2hdfs_test.json
  • 4. Change into the hdfsreader directory and run the job from there:
    • cd /mnt/sdg/software/datax/plugin/reader/hdfsreader
    • python /mnt/sdg/software/datax/bin/datax.py /mnt/sdg/software/datax/job/mysql2hdfs_test.json
      • The job only runs successfully when launched from the hdfsreader directory; it fails from every other path.
      • More precisely, it only runs successfully from a directory that contains core-site.xml, hdfs-site.xml and hive-site.xml.
      • Placing the mysql-connector JAR under hdfsreader/lib/ does not resolve the problem.
      • What causes this, and how should it be handled? Will it affect DolphinScheduler's scheduling of DataX collection tasks? (A workaround sketch follows this list.)
  • When collecting from MySQL to HDFS with a SQL statement, the current system time at collection must be obtained and used to drive the ODS partition; see the parameter-passing sketch after this list.
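
Regarding item 4: as a stopgap only (a sketch, not a confirmed root-cause fix), the observation that the job succeeds only when started from a directory containing core-site.xml, hdfs-site.xml and hive-site.xml suggests wrapping the DataX call in a small shell script that first changes into such a directory. The script name run_datax.sh is hypothetical; the paths are the ones used in this article. A DolphinScheduler shell task could invoke this wrapper instead of calling datax.py directly.

    #!/bin/bash
    # run_datax.sh -- hypothetical wrapper around DataX.
    # It changes into the hdfsreader plugin directory (which, after step 1,
    # contains core-site.xml / hdfs-site.xml / hive-site.xml) before starting
    # the job, since the job is only observed to succeed from such a directory.
    # Usage: ./run_datax.sh /mnt/sdg/software/datax/job/mysql2hdfs_test.json
    set -e
    JOB_JSON="$1"
    cd /mnt/sdg/software/datax/plugin/reader/hdfsreader
    python /mnt/sdg/software/datax/bin/datax.py "$JOB_JSON"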
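
For the last item (obtaining the current system time for the ODS partition), one common approach is to let the shell compute the date and pass it into the job via datax.py's -p option, which substitutes -D variables into ${...} placeholders in the job JSON. The following is a sketch under that assumption; the variable name dt and the querySql shown in the comments are examples, not part of the original job file.

    #!/bin/bash
    # Sketch: pass the collection-time system date into the DataX job.
    # Assumes the job JSON references the variable as ${dt}, for example:
    #   "querySql": ["select id, name from student where etl_date = '${dt}'"]
    # or in the writer "path" used as the ODS partition directory.
    dt=$(date +%Y%m%d)          # current system time at the moment of collection
    python /mnt/sdg/software/datax/bin/datax.py \
        -p "-Ddt=${dt}" \
        /mnt/sdg/software/datax/job/mysql2hdfs_test.json

If the job is scheduled from DolphinScheduler, its built-in time parameters (for example $[yyyyMMdd]) can serve the same purpose without calling date in the shell.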

张宴银, Big Data Development Engineer
