- 1. First, copy the three configuration files hdfs-site.xml, core-site.xml, and hive-site.xml into both /mnt/sdg/software/datax/conf/ and /mnt/sdg/software/datax/plugin/reader/hdfsreader/:
- sudo cp /opt/cloudera/parcels/CDH/lib/hadoop/etc/hadoop/hdfs-site.xml /mnt/sdg/software/datax/conf/
- sudo cp /opt/cloudera/parcels/CDH/lib/hadoop/etc/hadoop/core-site.xml /mnt/sdg/software/datax/conf/
- sudo cp /opt/cloudera/parcels/CDH/lib/hive/conf/hive-site.xml /mnt/sdg/software/datax/conf/
- sudo cp /opt/cloudera/parcels/CDH/lib/hadoop/etc/hadoop/hdfs-site.xml /mnt/sdg/software/datax/plugin/reader/hdfsreader/
- sudo cp /opt/cloudera/parcels/CDH/lib/hadoop/etc/hadoop/core-site.xml /mnt/sdg/software/datax/plugin/reader/hdfsreader/
- sudo cp /opt/cloudera/parcels/CDH/lib/hive/conf/hive-site.xml /mnt/sdg/software/datax/plugin/reader/hdfsreader/
- sudo chmod 755 /mnt/sdg/software/datax/conf/*-site.xml
- sudo chmod 755 /mnt/sdg/software/datax/plugin/reader/hdfsreader/*-site.xml
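Step 1 can be sanity-checked before moving on (a quick sketch that only confirms all three XMLs landed, readable, in both target directories):

for d in /mnt/sdg/software/datax/conf /mnt/sdg/software/datax/plugin/reader/hdfsreader; do
    ls -l "$d"/core-site.xml "$d"/hdfs-site.xml "$d"/hive-site.xml
done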
- 2. Next, configure the hdfsreader.json job file:
{
    "job": {
        "setting": {
            "speed": {
                "channel": 1
            }
        },
        "content": [
            {
                "reader": {
                    "name": "hdfsreader",
                    "parameter": {
                        "defaultFS": "hdfs://nameservice1",
                        "path": "/user/hive/warehouse/student/*",
                        "fileType": "text",
                        "column": ["*"],
                        "encoding": "UTF-8",
                        "fieldDelimiter": ",",
                        "hadoopConfig": {
                            "dfs.nameservices": "nameservice1",
                            "dfs.ha.namenodes.nameservice1": "namenode69,namenode80",
                            "dfs.namenode.rpc-address.nameservice1.namenode69": "hadoopm01:8020",
                            "dfs.namenode.rpc-address.nameservice1.namenode80": "hadoopm02:8020",
                            "dfs.client.failover.proxy.provider.nameservice1": "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
                        },
                        "haveKerberos": true,
                        "kerberosKeytabFilePath": "/opt/kerberos_keytab/dolphinscheduler.keytab",
                        "kerberosPrincipal": "hdfs/hdfs@HADOOP.COM"
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": true
                    }
                }
            }
        ]
    }
}
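The Kerberos settings above can be verified from the shell before running the job (a sketch; it assumes the keytab and principal in the JSON are valid on this host):

kinit -kt /opt/kerberos_keytab/dolphinscheduler.keytab hdfs/hdfs@HADOOP.COM
hdfs dfs -ls hdfs://nameservice1/user/hive/warehouse/student/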
- 3. Running the DataX script directly fails with an error: the HDFS HA configuration is not recognized.
- python /mnt/sdg/software/datax/bin/datax.py /mnt/sdg/software/datax/job/mysql2hdfs_test.json

- 4. Run it from inside the hdfsreader directory instead:
- cd /mnt/sdg/software/datax/plugin/reader/hdfsreader
- python /mnt/sdg/software/datax/bin/datax.py /mnt/sdg/software/datax/job/mysql2hdfs_test.json
- The job only runs successfully from the hdfsreader directory; launched from any other path it fails.

- To be precise, it only runs successfully when the working directory contains core-site.xml, hdfs-site.xml, and hive-site.xml.
- Placing the mysql-connector JAR under hdfsreader/lib/ does not resolve the problem.
- What causes this? How should it be handled? And will it affect DolphinScheduler's scheduling of DataX collection tasks?
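A workaround consistent with the observations above is to always launch DataX from a directory that holds the three site XMLs, for example via a small wrapper script (a sketch using the paths from this post; pointing DolphinScheduler's task at this wrapper would give scheduled runs the same working directory):

#!/bin/sh
# Run DataX with the Hadoop client XMLs on the current working directory,
# since the job only succeeds when core-site.xml/hdfs-site.xml/hive-site.xml
# are present there.
cd /mnt/sdg/software/datax/plugin/reader/hdfsreader || exit 1
exec python /mnt/sdg/software/datax/bin/datax.py "$@"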
- The mysql2hdfs collection uses a SQL statement and needs the current system time at collection, which then drives the ODS partition it writes to.
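For that last point, one common approach is to pass the date in from the shell: datax.py accepts -p "-Dkey=value" pairs and substitutes ${key} occurrences in the job JSON, so both the querySql and the target partition path can reference the same variable (a sketch; the ${dt} placeholder and the ods path are assumptions for illustration):

# Capture the collection-time date and hand it to DataX; ${dt} inside
# mysql2hdfs_test.json (e.g. in querySql and in a writer path such as
# /user/hive/warehouse/ods_student/dt=${dt}) is replaced with this value.
dt=$(date +%Y%m%d)
python /mnt/sdg/software/datax/bin/datax.py -p "-Ddt=${dt}" \
    /mnt/sdg/software/datax/job/mysql2hdfs_test.json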