周三. 6 月 18th, 2025

hive 学习笔记

Hive 常见操作

作者张, 宴银

7 月 22, 2024

hive模糊搜索表：show tables like ‘*name*’;
查看表结构信息：desc table_name;
查看分区信息：show partitions table_name;
加载本地文件：load data local inpath ‘/xxx/test.txt’ overwrite into table dm.table_name;
从查询语句给table插入数据：insert overwrite table table_name partition(dt) select * from table_name;
导出数据到本地系统：insert overwrite local directory ‘/tmp/text’ select a.* from table_name a order by 1;
1. 创建表时指定的一些属性：
1. 字段分隔符：row format delimited fields terminated by ‘\t’
1. 行分隔符：row format delimited lines terminated by ‘\n’
1. 文件格式为文本型存储：stored as textfile
1. 命令行操作：hive -e ‘select table_cloum from table’执行一个查询,在终端上显示mapreduce的进度，执行完毕后，最后把查询结果输出到终端上，接着hive进程退出，不会进入交互模式
1. hive -S -e ‘select table_cloum from table’ -S，终端上的输出不会有mapreduce的进度，执行完毕，只会把查询结果输出到终端上。
hive修改表名：alter table old_table_name rename to new_table_name;
hive复制表结构：create table new_table_name like table_name;
hive添加字段：alter table table_name add columns(columns_values bigint comment ‘comm_text’);
hive修改字段：alter table table_name change old_column new_column string comment ‘comm_text’;
删除分区：alter table table_name drop partition(dt=’2021-11-30′);
添加分区：alter table table_name add partition (dt=’2021-11-30′);
删除空数据库：drop database myhive2;
强制删除数据库：drop database myhive2 cascade;
删除表：drop table score5;
清空表：truncate table score6;
向hive表中加载数据
1. 直接向分区表中插入数据：insert into table score partition(month =’202107′) values (‘001′,’002′,’100’);
1. 通过load方式加载数据：load data local inpath ‘/export/servers/hivedatas/score.csv’ overwrite into table score partition(month=’201806′);
1. 通过查询方式加载数据：insert overwrite table score2 partition(month = ‘202106’) select s_id,c_id,s_score from score1;
1. 查询语句中创建表并加载数据：create table score2 as select * from score1;
1. 在创建表是通过location指定加载数据的路径：create external table score6 (s_id string,c_id string,s_score int) row format delimited fields terminated by ‘,’ location ‘/myscore’;
1. export导出与import 导入 hive表数据（内部表操作）：
1. create table techer2 like techer; –依据已有表结构创建表
1. export table techer to ‘/export/techer’;
1. import table techer2 from ‘/export/techer’;
hive表中数据导出
1. insert导出
1. 将查询的结果导出到本地：insert overwrite local directory ‘/export/servers/exporthive’ select * from score;
1. 将查询的结果格式化导出到本地：insert overwrite local directory ‘/export/servers/exporthive’ row format delimited fields terminated by ‘\t’ collection items terminated by ‘#’ select * from student;
1. 将查询的结果导出到HDFS上(没有local)：insert overwrite directory ‘/export/servers/exporthive’ row format delimited fields terminated by ‘\t’ collection items terminated by ‘#’ select * from score;
1. Hadoop命令导出到本地：dfs -get /export/servers/exporthive/000000_0 /export/servers/exporthive/local.txt;
1. hive shell 命令导出
1. 基本语法：（hive -f/-e 执行语句或者脚本 > file）hive -e “select * from myhive.score;” > /export/servers/exporthive/score.txt
1. hive -f export.sh > /export/servers/exporthive/score.txt
1. export导出到HDFS上：export table score to ‘/export/exporthive/score’;

3.1.4 Hive查询语句

GROUP BY 分组：select s_id ,avg(s_score) avgscore from score group by s_id having avgscore > 85; 对分组后的数据进行筛选，使用 having
join 连接：inner join 内连接；left join 左连接；right join 右链接；full join 全外链接。
order by 排序：ASC（ascend）: 升序（默认） DESC（descend）: 降序
sort by 局部排序：每个MapReduce内部进行排序，对全局结果集来说不是排序。
distribute by 分区排序：类似MR中partition，进行分区，结合sort by使用

作者张, 宴银

相关文章

业务知识学习笔记

物流知识讲堂 – 物流业务知识

5 月 13, 2025 admin

学习笔记数仓建模

中级数据建模师

2 月 8, 2025 张, 宴银

学习笔记数仓建模

初级模型设计

1 月 24, 2025 张, 宴银

发表回复取消回复

You missed

业务知识学习笔记

物流知识讲堂 – 物流业务知识

2025 年 5 月 13 日 admin

零基础学习数据治理 – 黎山

2025 年 2 月 27 日张, 宴银

AI大模型工具

2025 年 2 月 25 日张, 宴银

AI 机器学习

机器学习 – Numpy

2025 年 2 月 11 日张, 宴银