Big Data Warehouse Interview Process and Key Interview Questions

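I. Self-introduction

The interviewer reads your resume and assesses how well you communicate

About 2-3 minutes (education, work history, hobbies, strengths)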

II. Projects

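III. Data warehouse

1. Keep the data warehouse at the center of the discussion

Don't open by simply listing the layers (ODS, DWD, DWS, ADS)

2. Differences between normalized (3NF) modeling and dimensional modeling

3. Whether the subject-area division is appropriate

4. Introducing the fact tables and dimension tables

How many tables there are, and which process stages are measured

5. Summary matrix (bus matrix)

6. How to handle changing data: a zipper table (拉链表), or something else?

7. How to handle data that rarely changes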
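The zipper-table option from item 6 can be sketched roughly as follows. This is a minimal, in-memory illustration rather than a production design: `ZipRow`, `apply_snapshot`, the `9999-12-31` open end date, and the half-open `[start_date, end_date)` validity convention are all assumptions made for the example. In a real warehouse the same merge would usually be written in Hive or Spark SQL against partitioned tables.

```python
from dataclasses import dataclass, replace
from datetime import date

# Assumed convention: an open (still current) version ends at 9999-12-31.
OPEN_END = date(9999, 12, 31)


@dataclass(frozen=True)
class ZipRow:
    key: str          # business key, e.g. a user id
    value: str        # the tracked attribute
    start_date: date  # first day this version is valid (inclusive)
    end_date: date    # day this version stops being valid (exclusive)


def apply_snapshot(history, snapshot, load_date):
    """Merge a daily snapshot {key: value} into the zipper-table history."""
    result = []
    open_keys = set()
    for row in history:
        if row.end_date != OPEN_END:
            result.append(row)  # already-closed versions never change
            continue
        open_keys.add(row.key)
        if row.key not in snapshot or snapshot[row.key] == row.value:
            # Unchanged (or missing from the snapshot): keep the version open.
            result.append(row)
        else:
            # Changed: close the old version and open a new one.
            result.append(replace(row, end_date=load_date))
            result.append(ZipRow(row.key, snapshot[row.key], load_date, OPEN_END))
    for key, value in snapshot.items():
        if key not in open_keys:
            # Brand-new key: open its first version.
            result.append(ZipRow(key, value, load_date, OPEN_END))
    return result


history = apply_snapshot([], {"u1": "Beijing"}, date(2024, 1, 1))
history = apply_snapshot(
    history, {"u1": "Shanghai", "u2": "Shenzhen"}, date(2024, 1, 2)
)
for row in history:
    print(row)
```

After the second load, `u1` has one closed version and one open version, and `u2` has a single open version. Querying the state as of a given day means selecting the rows where `start_date <= day < end_date`.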

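IV. Data governance

1. What kinds of source data (metadata) there are

Hive, Spark, and Kafka source data

How many categories it falls into

2. Building and maintaining the metrics system

Metrics for a specific line of business at the company

3. OLAP: ClickHouse

What problems you ran into

How much faster queries became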

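4. Spark, as one part of the data warehouse

Understanding of the internals

Wide and narrow dependencies

Shuffle

Jobs

RDDs

Spark development and data processing (cleaning unstructured data and transforming it into relational data)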

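5. Development

Data skew in Hive

The Hive tuning process

Handling small files and OOM errors (give an example and walk through how you resolved it ☆; covered in an earlier video)

MapReduce (MR)

Locating and fixing problems in multi-table joins

Scenarios for and use of window functions, which test how complete your SQL knowledge is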
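For the window-function item, a small runnable example may help with practice. The sketch below uses Python's built-in `sqlite3` module (window functions require the bundled SQLite to be version 3.25 or newer) only so that it runs without a cluster. The `orders` table and its rows are made up for illustration. The two `OVER` clauses shown are also valid in Hive SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id TEXT, order_date TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("u1", "2024-01-01", 30),
        ("u1", "2024-01-02", 50),
        ("u1", "2024-01-03", 20),
        ("u2", "2024-01-01", 70),
        ("u2", "2024-01-02", 10),
    ],
)

query = """
SELECT user_id,
       order_date,
       amount,
       -- rank each user's orders from largest to smallest amount
       ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY amount DESC) AS rn,
       -- cumulative spend per user in date order
       SUM(amount) OVER (PARTITION BY user_id ORDER BY order_date) AS running_total
FROM orders
ORDER BY user_id, order_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Typical interview scenarios built on the same pattern include top-N per group (filter on `rn`), running totals, and comparing a row with the previous one via `LAG`.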
