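Didi Data Analysis Intern Written Test, 2021.9.9
Author: Internet
I. Test Format
Probably because the position was an internship, the written test was fairly informal. An HR intern added me on WeChat and sent the questions directly as a PDF, asking me to write my answers in a Word document and send them back within 1 to 1.5 hours (no Excel file was provided). The PDF also did not explain the English abbreviations used in the images in the questions, so I was confused the whole time.
A note: these are my own answers and are not necessarily correct. I did verify in MySQL that they run. Discussion is welcome.
II. Test Content
Question 1
Create the corresponding table myself (to verify that the code is correct)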
create table u_log (
day date not null,
login_timestamp timestamp not null,
uid int(2) not null,
gender char(1) not null,
age integer not null
);
Insert data
insert into u_log values ('2020/8/1', '2020/8/1 11:00',1,'F',31),
('2020/8/1', '2020/8/1 14:00',1,'F',31),
('2020/8/1', '2020/8/1 19:00',3,'M',23),
('2020/8/1', '2020/8/1 14:00',1,'F',31),
('2020/8/1', '2020/8/1 20:00',2,'M',19),
('2020/8/1', '2020/8/1 21:00',4,'M',54),
('2020/8/1', '2020/8/1 21:05',5,'M',55);
The resulting table in the database
1. List the uid of female users, showing only the first 200
select distinct uid from u_log where gender='F' order by uid limit 200 ;
2. Count the number of users by age group
(1) Split into <20, 20-40, 41-60, >60
First add an age-group column
select *,
(case
when 0<=age and age < 20 then '<20'
when 20<=age and age <=40 then '20-40'
when 40<age and age <= 60 then '41-60'
else '>60'
end) as age_level
from u_log ;
The query result is
Then group and count to get the result
select age_level,count(distinct uid) as user_count
from (
select *,
(case
when 0<=age and age < 20 then '<20'
when 20<=age and age <=40 then '20-40'
when 40<age and age <= 60 then '41-60'
else '>60'
end) as age_level
from u_log ) a
group by age_level;
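Query result
Question 2
I believe the bikes and users tables are not needed for this question, so I only created three tables: trips, regions, and promotions
Table creation statements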
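Boundary ages are the cases most easily mis-bucketed by a CASE expression: for instance, if one branch tests `age < 40` and the next tests `40 < age`, an age of exactly 40 matches neither and falls through to ELSE. The sketch below is one way to check the buckets. It uses Python's built-in sqlite3 module rather than MySQL, and the rows are hypothetical boundary values, not the data from the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table u_log (uid integer, age integer)")
# Hypothetical users placed exactly on the bucket boundaries; uid 1 logs in twice.
rows = [(1, 19), (1, 19), (2, 20), (3, 40), (4, 41), (5, 60), (6, 61)]
conn.executemany("insert into u_log values (?, ?)", rows)

query = """
select age_level, count(distinct uid) as user_count
from (
    select uid,
           case
               when age < 20 then '<20'
               when age <= 40 then '20-40'
               when age <= 60 then '41-60'
               else '>60'
           end as age_level
    from u_log
) a
group by age_level
"""
# Map each bucket label to its number of distinct users.
result = dict(conn.execute(query).fetchall())
print(result)
```

Counting `distinct uid` keeps a user who logged in several times from being counted more than once, which is what a head count per age group needs.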
create table trips (
id int(11) not null primary key default 0,
user_id int(11) not null default 0,
bike_id int(11) not null default 0,
status varchar(191) not null default '0',
started_at datetime default '0000-00-00',
completed_at datetime default '0000-00-00',
region_id int(11) default 0
);
create table regions (
id int(11) not null primary key default 0,
name varchar(191)
);
create table promotions (
id int(11) not null primary key default 0,
p_name varchar(191),
start_at datetime,
end_at datetime);
Insert data
insert into trips values(1,001,545,'completed','2020-08-01 09:00','2020-08-01 09:05',2),(2,010,589,'completed','2020-08-01 10:00','2020-08-01 09:02',1),
(3,024,245,'failed','2020-08-02 08:00','2020-08-02 08:05',1),(4,001,556,'completed','2020-08-02 09:00','2020-08-01 09:05',2),
(5,054,123,'completed','2020-08-03 20:00','2020-08-03 21:05',1),(6,078,545,'completed','2020-08-10 19:00','2020-08-10 19:05',1),
(7,011,111,'completed','2020-08-11 02:00','2020-08-11 02:05',2),(8,001,545,'started','2020-09-11 09:00','2020-09-11 09:05',2);
insert into regions values(1,'浦东新区'),(2,'城北高新区');
insert into promotions values(1,'8月大促','2020-08-01','2020-08-31');
1. How many users and orders were there during the promotion (when inserting the data I named the promotion 8月大促)
select r.name,count(distinct user_id),count(r.id)
from trips t
join regions r
on t.region_id = r.id
where started_at between
(select start_at from promotions where p_name = '8月大促')
and
(select end_at from promotions where p_name = '8月大促')
group by r.name;
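One thing worth checking with a BETWEEN filter like the one above: an end value of '2020-08-31' stored in a datetime column means midnight at the start of that day, so trips that start later on August 31 fall outside the range. If the promotion is meant to include its whole last day (an assumption, since the question text is not available), a half-open interval is one option. The sketch below illustrates the difference with Python's sqlite3 module and hypothetical rows; in MySQL the upper bound would be written as `end_at + interval 1 day`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table trips (id integer primary key, user_id integer, started_at text)")
conn.execute("create table promotions (p_name text, start_at text, end_at text)")
# Hypothetical trips: id 2 starts on the last promotion day, after midnight.
conn.executemany(
    "insert into trips values (?, ?, ?)",
    [
        (1, 1, "2020-08-01 09:00:00"),
        (2, 2, "2020-08-31 10:00:00"),
        (3, 3, "2020-09-01 08:00:00"),
    ],
)
conn.execute(
    "insert into promotions values ('8月大促', '2020-08-01 00:00:00', '2020-08-31 00:00:00')"
)

# Closed interval: the upper bound is midnight at the start of the last day.
between_sql = """
select count(*) from trips t, promotions p
where p.p_name = '8月大促'
  and t.started_at between p.start_at and p.end_at
"""
# Half-open interval: everything before midnight of the following day.
half_open_sql = """
select count(*) from trips t, promotions p
where p.p_name = '8月大促'
  and t.started_at >= p.start_at
  and t.started_at < datetime(p.end_at, '+1 day')
"""
between_count = conn.execute(between_sql).fetchone()[0]
half_open_count = conn.execute(half_open_sql).fetchone()[0]
print(between_count, half_open_count)
```

SQLite stores these timestamps as ISO-formatted text, so string comparison matches chronological order here.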
2. Share of first-day usage
select concat(
round(sum(case when date(started_at) =
(select date(start_at) from promotions
where p_name = '8月大促') then 1 else 0 end)
/
sum(case when started_at between
(select start_at from promotions
where p_name = '8月大促') and (select end_at from promotions where p_name = '8月大促')
then 1 else 0 end)*100
,2)
,'%') as a
from trips;
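When picking out first-day trips, comparing only the day of the month would also match trips on the 1st of any other month, while comparing full dates does not. The sketch below shows the effect, using Python's sqlite3 module, the started_at values from the insert statement above, and one hypothetical extra trip on 2020-09-01. SQLite's `strftime('%d', ...)` stands in for MySQL's `day()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table trips (id integer primary key, started_at text)")
starts = [
    "2020-08-01 09:00:00", "2020-08-01 10:00:00", "2020-08-02 08:00:00",
    "2020-08-02 09:00:00", "2020-08-03 20:00:00", "2020-08-10 19:00:00",
    "2020-08-11 02:00:00", "2020-09-11 09:00:00",
    "2020-09-01 12:00:00",  # hypothetical extra trip, outside the promotion
]
conn.executemany("insert into trips (started_at) values (?)", [(s,) for s in starts])

promo_start, promo_end = "2020-08-01 00:00:00", "2020-08-31 00:00:00"

def share(first_day_condition):
    """Percentage of promotion-period trips that satisfy first_day_condition."""
    sql = f"""
    select round(100.0 * sum(case when {first_day_condition} then 1 else 0 end)
                 / sum(case when started_at between :s and :e then 1 else 0 end), 2)
    from trips
    """
    return conn.execute(sql, {"s": promo_start, "e": promo_end}).fetchone()[0]

# Day-of-month comparison also counts the trip on September 1.
by_day_of_month = share("strftime('%d', started_at) = strftime('%d', :s)")
# Full-date comparison counts only trips on August 1.
by_full_date = share("date(started_at) = date(:s)")
print(by_day_of_month, by_full_date)
```

With this data, 7 trips fall inside the promotion and 2 of them are on the first day, so the full-date version gives 2/7 while the day-of-month version gives 3/7.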
Question 3
Table creation statement
create table completes_order_info (
city_name varchar(191),
completed_order_nums int(3),
date date
);
Insert data
insert into completes_order_info values ('杭州',207,'2020-7-1'),('成都',178,'2020-7-1'),
('天津',57,'2020-7-01'),('重庆',214,'2020-7-01'),
('杭州',62,'2020-7-02'),('成都',111,'2020-7-02'),
('天津',73,'2020-7-02'),('重庆',60,'2020-7-02'),
('重庆',103,'2020-7-03'),('杭州',63,'2020-7-03');
1. For each city, the highest completed-order count and the date it occurred
select a.city_name,a.completed_order_nums,a.date
from(
select *, row_number()over(partition by city_name order by completed_order_nums desc) as rn
from completes_order_info) a
where rn = 1;
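`row_number()` returns exactly one row per city, so if a city reaches its maximum on two different days only one of them is shown, and the function requires MySQL 8.0 or later. A possible alternative, sketched here with Python's sqlite3 module, joins each row against the per-city maximum and therefore keeps ties. The rows are the ones from the insert statement above with dates normalized, plus one hypothetical tie for 天津 that is not in the original data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table completes_order_info "
    "(city_name text, completed_order_nums integer, date text)"
)
rows = [
    ("杭州", 207, "2020-07-01"), ("成都", 178, "2020-07-01"),
    ("天津", 57, "2020-07-01"), ("重庆", 214, "2020-07-01"),
    ("杭州", 62, "2020-07-02"), ("成都", 111, "2020-07-02"),
    ("天津", 73, "2020-07-02"), ("重庆", 60, "2020-07-02"),
    ("重庆", 103, "2020-07-03"), ("杭州", 63, "2020-07-03"),
    ("天津", 73, "2020-07-03"),  # hypothetical tie, added for illustration
]
conn.executemany("insert into completes_order_info values (?, ?, ?)", rows)

query = """
select c.city_name, c.completed_order_nums, c.date
from completes_order_info c
join (
    -- per-city maximum of completed orders
    select city_name, max(completed_order_nums) as max_nums
    from completes_order_info
    group by city_name
) m on m.city_name = c.city_name
   and m.max_nums = c.completed_order_nums
order by c.city_name, c.date
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Whether ties should be kept depends on what the question intends; with window functions, `rank()` in place of `row_number()` would give the same tie-keeping behavior.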
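Question 4
Find the details of every order with completion status 1 between 2021-01-04 and 2021-01-10, together with the completion times of that user's previous and next orders
Question 5
I built a table myself and saved it as python考察.csv
1. Add a column holding the sum (Total) of the Attack, Defense, Sp.Atk, Sp.Def, Speed, and Generation metrics, then output the ten rows with the highest Total among those whose Type1 is Charmeleon.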
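No answer is given for this question. One possible approach is to use `lag()` and `lead()` partitioned by user, and to apply the date filter only afterwards, so that an order inside the range can still see a previous or next order outside it. The sketch below is a guess at the shape of the solution, not the expected answer: the table name `orders` and its columns (`order_id`, `user_id`, `status`, `finish_time`) are hypothetical, the rows are invented, and it assumes that "previous" and "next" refer to completed orders only. It runs on Python's sqlite3 module (SQLite 3.25 or later for window functions); the same query shape should work in MySQL 8.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the original question's table is not shown in the post.
conn.execute(
    "create table orders "
    "(order_id integer primary key, user_id integer, status integer, finish_time text)"
)
conn.executemany(
    "insert into orders values (?, ?, ?, ?)",
    [
        (1, 1, 1, "2021-01-02 08:00:00"),
        (2, 1, 1, "2021-01-05 09:00:00"),
        (3, 1, 0, "2021-01-06 09:00:00"),  # not completed, ignored
        (4, 1, 1, "2021-01-12 10:00:00"),
        (5, 2, 1, "2021-01-07 18:00:00"),
    ],
)

query = """
select order_id, user_id, finish_time, prev_finish_time, next_finish_time
from (
    -- compute neighbours over ALL completed orders of each user first
    select order_id, user_id, finish_time,
           lag(finish_time)  over (partition by user_id order by finish_time) as prev_finish_time,
           lead(finish_time) over (partition by user_id order by finish_time) as next_finish_time
    from orders
    where status = 1
) w
-- then restrict to the requested week (upper bound exclusive)
where finish_time >= '2021-01-04' and finish_time < '2021-01-11'
order by user_id, finish_time
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Filtering by date inside the subquery instead would make the first and last order of each user in the week show NULL neighbours even when earlier or later orders exist.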
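import pandas as pd
# Read the data
data = pd.read_csv('C:/Users/dell/Desktop/文件夹/滴滴笔试9.9/python考察.csv', encoding='UTF-8')
# Add the total column (row-wise sum of the six metrics)
data['total'] = data[['Attack','Defense','Sp.Atk','Sp.Def','Speed','Generation']].sum(axis=1)
# Keep rows whose type1 is Charmeleon, sort by total descending, take the top 10
data[data['type1']=='Charmeleon'].sort_values('total', ascending=False).head(10)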
data is
The result
2. Group the dataset by the values of type1 and count the rows for each value.
data.type1.value_counts()
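If there are any mistakes, please point them out so we can discuss and improve together.
Good luck to everyone in getting the results you want.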
Source: https://blog.csdn.net/qq_43656500/article/details/120241152