SQL29: Calculate the average next-day retention rate of users

Difficulty: Hard · Pass rate: 48.58% · Time limit: 1 s · Memory limit: 256 MB

Description

Problem: The operations team wants to know the average probability that a user who practices questions on a given day will come back and practice again the next day. Write a query that returns this value.

Sample data: question_practice_detail

id  device_id  question_id  result  date
1   2138       111          wrong   2021-05-03
2   3214       112          wrong   2021-05-09
3   3214       113          wrong   2021-06-15
4   6543       111          right   2021-08-13
5   2315       115          right   2021-08-13
6   2315       116          right   2021-08-14
7   2315       117          wrong   2021-08-15
...

Based on the sample, your query should return:

avg_ret
0.3000

Example 1

Input:

drop table if exists `user_profile`;
drop table if exists `question_practice_detail`;
drop table if exists `question_detail`;
CREATE TABLE `user_profile` (
`id` int NOT NULL,
`device_id` int NOT NULL,
`gender` varchar(14) NOT NULL,
`age` int ,
`university` varchar(32) NOT NULL,
`gpa` float,
`active_days_within_30` int ,
`question_cnt` int ,
`answer_cnt` int 
);
CREATE TABLE `question_practice_detail` (
`id` int NOT NULL,
`device_id` int NOT NULL,
`question_id` int NOT NULL,
`result` varchar(32) NOT NULL,
`date` date NOT NULL
);
CREATE TABLE `question_detail` (
`id` int NOT NULL,
`question_id` int NOT NULL,
`difficult_level` varchar(32) NOT NULL
);

INSERT INTO user_profile VALUES(1,2138,'male',21,'北京大学',3.4,7,2,12);
INSERT INTO user_profile VALUES(2,3214,'male',null,'复旦大学',4.0,15,5,25);
INSERT INTO user_profile VALUES(3,6543,'female',20,'北京大学',3.2,12,3,30);
INSERT INTO user_profile VALUES(4,2315,'female',23,'浙江大学',3.6,5,1,2);
INSERT INTO user_profile VALUES(5,5432,'male',25,'山东大学',3.8,20,15,70);
INSERT INTO user_profile VALUES(6,2131,'male',28,'山东大学',3.3,15,7,13);
INSERT INTO user_profile VALUES(7,4321,'male',28,'复旦大学',3.6,9,6,52);
INSERT INTO question_practice_detail VALUES(1,2138,111,'wrong','2021-05-03');
INSERT INTO question_practice_detail VALUES(2,3214,112,'wrong','2021-05-09');
INSERT INTO question_practice_detail VALUES(3,3214,113,'wrong','2021-06-15');
INSERT INTO question_practice_detail VALUES(4,6543,111,'right','2021-08-13');
INSERT INTO question_practice_detail VALUES(5,2315,115,'right','2021-08-13');
INSERT INTO question_practice_detail VALUES(6,2315,116,'right','2021-08-14');
INSERT INTO question_practice_detail VALUES(7,2315,117,'wrong','2021-08-15');
INSERT INTO question_practice_detail VALUES(8,3214,112,'wrong','2021-05-09');
INSERT INTO question_practice_detail VALUES(9,3214,113,'wrong','2021-08-15');
INSERT INTO question_practice_detail VALUES(10,6543,111,'right','2021-08-13');
INSERT INTO question_practice_detail VALUES(11,2315,115,'right','2021-08-13');
INSERT INTO question_practice_detail VALUES(12,2315,116,'right','2021-08-14');
INSERT INTO question_practice_detail VALUES(13,2315,117,'wrong','2021-08-15');
INSERT INTO question_practice_detail VALUES(14,3214,112,'wrong','2021-08-16');
INSERT INTO question_practice_detail VALUES(15,3214,113,'wrong','2021-08-18');
INSERT INTO question_practice_detail VALUES(16,6543,111,'right','2021-08-13');
INSERT INTO question_detail VALUES(1,111,'hard');
INSERT INTO question_detail VALUES(2,112,'medium');
INSERT INTO question_detail VALUES(3,113,'easy');
INSERT INTO question_detail VALUES(4,115,'easy');
INSERT INTO question_detail VALUES(5,116,'medium');
INSERT INTO question_detail VALUES(6,117,'easy');


Output:

0.3000

Solution:

With my weak SQL skills I had no idea where to start on this problem. My database coursework in college was not great either, and some of the techniques used here I never learned at all. So rather than just explaining the answer, this write-up works through the problem in beginner mode, covering each underlying concept in detail.

The official NowCoder solution offers two approaches:

Approach 1: every row in the table can be treated as a "day one" practice record, so we need to construct a "came back on day two" column. We can LEFT JOIN the table to itself to attach the next-day records, using a date condition to select the next day and device_id to ensure it is the same user:

date_add(date1,interval 1 day)=date2

In approach 1, we first treat every row independently as a "day one" record: even if the same person practiced on several consecutive days, each of those rows counts as a day one on its own. We then join the table to itself, matching on date_add(date1, interval 1 day) = date2. What does this mean? It adds one day to date1 and checks whether the result equals date2. Effectively, we shift every date forward by one day and try to match it back against the original practice records. Some shifted dates will find no match in the original data, because the user did not practice on the following day, so there is no next-day record to join to. We also require device_id to be equal, so the match is restricted to the same user. In short, the idea is to check, for each user, which of their practice records has a matching record on the following day.
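To see the logic concretely, here is a small Python sketch (illustrative only, not part of the official solution) that replays approach 1 on the sample data: each (user, day) row "matches" when the same user also has a row on the following day, and the retention rate is matched rows divided by all rows.

```python
from datetime import date, timedelta

# The sample (device_id, date) practice records, deduplicated to one row
# per user per day, as the official subquery's DISTINCT does.
records = {
    (2138, date(2021, 5, 3)),
    (3214, date(2021, 5, 9)),
    (3214, date(2021, 6, 15)),
    (3214, date(2021, 8, 15)),
    (3214, date(2021, 8, 16)),
    (3214, date(2021, 8, 18)),
    (6543, date(2021, 8, 13)),
    (2315, date(2021, 8, 13)),
    (2315, date(2021, 8, 14)),
    (2315, date(2021, 8, 15)),
}

# The left join matches a row when the same user also appears on day + 1,
# mirroring date_add(date1, interval 1 day) = date2.
matched = sum((dev, d + timedelta(days=1)) in records for (dev, d) in records)
avg_ret = matched / len(records)
print(f"{avg_ret:.4f}")  # 0.3000
```

Three of the ten distinct (user, day) rows have a next-day match, giving the expected 0.3000.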

The SQL script can be written as follows:

select count(date2) / count(date1) as avg_ret
from(
	select
    	distinct qpd.device_id,
    	qpd.date as date1,
    	uniq_id_date.date as date2
    from question_practice_detail as qpd
    left join(
    	select distinct device_id, date
        from question_practice_detail
    ) as uniq_id_date
    on qpd.device_id=uniq_id_date.device_id
    	and date_add(qpd.date,interval 1 day)=uniq_id_date.date
) as id_last_next_date

This script uses a nested query. The outermost query computes the ratio of the number of date2 values to the number of date1 values in the inner result. The inner query left-joins the table to itself (with a small DISTINCT subquery first to drop redundant rows). When joining a table to itself and referring to its columns, we add aliases to tell the two copies apart. The join condition is that one side's date plus one day equals the other side's date, with device_id also equal; this produces, for each user, the matches between two consecutive practice days. Note that a left join keeps left-side rows even when nothing matches on the right, so count(date1) counts every practice record in the original table, while count(date2) counts only rows that matched a next-day record, because COUNT skips the NULLs left by unmatched rows. The ratio of the two is exactly the value we want.

A really important question here is: what exactly is the user's average next-day retention rate? How to compute it is the crux. The problem statement describes it as "the average probability that a user who practices on a given day comes back to practice again the next day." What does that mean? When solving a problem we should learn to understand it and simplify it. "Comes back the next day after practicing on some day" means: for each of the user's practice days, there is also a record for that user on the following day. Crucially, this "next day" is a relative notion. If I practice on three consecutive days, say the 10th, 11th, and 12th, then the 11th is the next day relative to the 10th, and the 12th is the next day relative to the 11th; across those three days there are two "next days". We must look at each day independently and find that day's next day, not think of the 12th as "the third day after the 10th". Even though that framing is simply wrong, some people (myself included) still half-carry it around while solving and then struggle to understand the problem. The probability we want is the proportion of records that have a next-day record among all records. In the example above, two of the three days have a next day and one does not, so the probability of returning the next day is 2/3.
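The three-day example can be checked mechanically; a tiny Python sketch (illustrative only):

```python
from datetime import date, timedelta

# One user practicing on three consecutive days: the 10th, 11th, and 12th.
days = [date(2021, 8, 10), date(2021, 8, 11), date(2021, 8, 12)]

# A day counts as retained if the same user also practiced the following day.
has_next_day = [d + timedelta(days=1) in days for d in days]
print(has_next_day)  # [True, True, False]
print(sum(has_next_day) / len(days))  # 2/3
```

The 10th and 11th each have a next day, the 12th does not, so the rate is 2/3.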

Approach 2: use the LEAD window function to pair each of a user's records with the same user's next record. First partition by user, then sort by date ascending, then pair rows two by two (the last row in each partition is paired with NULL), that is:

partition by device_id
order by date
lead(date) over (partition by device_id order by date)

I understood basically nothing about approach 2 at first, because I did not know what the LEAD function means, so I looked it up: LEAD() is a window function that lets us look ahead a number of rows and access that row's data from the current row. It is very useful for computing differences between the current row and subsequent rows within the same result set. Its usage:

LEAD(<expression>[,offset[, default_value]]) OVER (
    PARTITION BY (expr)
    ORDER BY (expr)
) 

Here expression is the value LEAD() returns, taken from the row offset rows after the current row within the ordered partition. offset is the number of rows to look ahead and must be a non-negative integer; if offset is zero, LEAD() evaluates expression on the current row itself, and if offset is omitted it defaults to 1. default_value is what LEAD() returns when there is no subsequent row: for example, with offset 1, the last row in a partition yields default_value. If default_value is not specified, the function returns NULL. The PARTITION BY clause divides the rows of the result set into the partitions to which LEAD() is applied; if PARTITION BY is omitted, the whole result set is treated as a single partition. The ORDER BY clause determines the order of the rows the function sees.
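To pin down those semantics, here is a minimal Python model of LEAD() over a single already-sorted partition (PARTITION BY and ORDER BY are assumed to have been applied beforehand); this is an illustration of the behavior, not MySQL's implementation:

```python
def lead(rows, offset=1, default=None):
    """Model of SQL LEAD(): for each element, return the value `offset`
    positions later in the ordered sequence, or `default` past the end."""
    return [rows[i + offset] if i + offset < len(rows) else default
            for i in range(len(rows))]

dates = ["2021-08-13", "2021-08-14", "2021-08-15"]
print(lead(dates))           # ['2021-08-14', '2021-08-15', None]
print(lead(dates, 2, "-"))   # ['2021-08-15', '-', '-']
```

With the default offset of 1, every row sees the next row's value and the last row falls back to the default (NULL in SQL, None here).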

A MySQL LEAD() example

/*
    For the demo we use the orders and customers tables from the sample database.
    The following statement finds each customer's order date and next order date:
*/
SELECT 
    customerName,                /* output column */
    orderDate,                   /* output column */
    LEAD(orderDate,1) OVER (     /* take orderDate from the next row */
        PARTITION BY customerNumber /* group rows by customer */
        ORDER BY orderDate /* sorted by order date */) nextOrderDate /* aliased nextOrderDate */
FROM 
    orders
INNER JOIN customers USING (customerNumber); 

We get a result like this:

+------------------------------------+------------+---------------+
| customerName                       | orderDate  | nextOrderDate |
+------------------------------------+------------+---------------+
| Atelier graphique                  | 2013-05-20 | 2014-09-27    |
| Atelier graphique                  | 2014-09-27 | 2014-11-25    |
| Atelier graphique                  | 2014-11-25 | NULL          |
| Signal Gift Stores                 | 2013-05-21 | 2014-08-06    |
| Signal Gift Stores                 | 2014-08-06 | 2014-11-29    |
| Signal Gift Stores                 | 2014-11-29 | NULL          |
| Australian Collectors, Co.         | 2013-04-29 | 2013-05-21    |
| Australian Collectors, Co.         | 2013-05-21 | 2014-02-20    |
| Australian Collectors, Co.         | 2014-02-20 | 2014-11-24    |
| Australian Collectors, Co.         | 2014-11-24 | 2014-11-29    |
| Australian Collectors, Co.         | 2014-11-29 | NULL          |
| La Rochelle Gifts                  | 2014-07-23 | 2014-10-29    |
| La Rochelle Gifts                  | 2014-10-29 | 2015-02-03    |
| La Rochelle Gifts                  | 2015-02-03 | 2015-05-31    |
| La Rochelle Gifts                  | 2015-05-31 | NULL          |
| Baane Mini Imports                 | 2013-01-29 | 2013-10-10    |
| Baane Mini Imports                 | 2013-10-10 | 2014-10-15    |
...
/*
In this example the result set is first partitioned by customer number. Each
partition is then sorted by order date. Finally, LEAD() is applied within each
partition to fetch the next order date.

Because LEAD() never reaches across a partition boundary, nextOrderDate is
NULL in the last row of every partition.
*/

Now that we know how LEAD() works (it fetches a column's value from the n-th following row), the solution becomes clear: partition by user, sort by date ascending, and pair each practice day with the same user's next practice day. We can then check the difference between the two dates: a difference of exactly 1 means the user came back and practiced on the following day, while anything else (a larger gap, or NULL for the last row) means they did not. The probability we want can then be computed as an average: map each record to 1 if the date difference is 1 and to 0 otherwise, then take the mean, which equals

records with a date difference of 1 / all records

SQL script:

select avg(if(datediff(date2, date1)=1, 1, 0)) as avg_ret /* 1 if the gap is one day, else 0; average the flags */
from (
    select
        distinct device_id,
        date as date1,
        lead(date) over (partition by device_id order by date) as date2 /* the same user's next practice date */
    from (
        select distinct device_id, date /* deduplicate to one row per (user, day) */
        from question_practice_detail
    ) as uniq_id_date
) as id_last_next_date
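As a cross-check, the same pipeline can be replayed outside MySQL. The Python sketch below (illustrative only, on the sample rows) mimics the partition / order / lead / average steps and reproduces the expected result:

```python
from collections import defaultdict
from datetime import date

# Deduplicated (device_id, date) rows, as produced by the uniq_id_date subquery.
rows = [
    (2138, date(2021, 5, 3)),
    (3214, date(2021, 5, 9)), (3214, date(2021, 6, 15)),
    (3214, date(2021, 8, 15)), (3214, date(2021, 8, 16)), (3214, date(2021, 8, 18)),
    (6543, date(2021, 8, 13)),
    (2315, date(2021, 8, 13)), (2315, date(2021, 8, 14)), (2315, date(2021, 8, 15)),
]

# partition by device_id, order by date
by_user = defaultdict(list)
for dev, d in sorted(rows):
    by_user[dev].append(d)

# lead(date): pair each date with the next one in its partition (None at the
# end), then avg(if(datediff(date2, date1) = 1, 1, 0)).
flags = []
for dates in by_user.values():
    leads = dates[1:] + [None]
    flags += [1 if d2 is not None and (d2 - d1).days == 1 else 0
              for d1, d2 in zip(dates, leads)]
print(sum(flags) / len(flags))  # 0.3
```

Three of the ten (user, day) rows are followed by a next-day record, so the average of the flags is 0.3, matching the expected output.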