联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp

您当前位置:首页 >> Database作业Database作业

日期:2018-08-11 04:30


The below table contains a record of every swipe that occurs on Tinder. It has the id of the user who swipes, the if of the user who is swiped on, the time the swipe occurred, and whether the swipe was a right swipe or a left swipe.


Table name: ‘swipes’

swiping_user_id (integer)swiped_on_user_id (integer)swipe_at (timestamp)swipe_type

(varchar)

195831572472592015-03-15 00:01:05‘right’

582305087328322014-06-10 05:12:05‘left’

794839267672912015-08-10 12:45:01‘left’


1)Write a query that lists all users who have both swiped right and swiped left. (relevant time period: all time)


select distinct a.swiping_user_id as swiping_user_id

from swipes a, swipes b

where a.swiping_user_id=b.swiping_user_id

and a.swipe_type!=b.swipe_type

order by a.swiping_user_id desc;


2)Write a query that determines the % of right swipes per day that resulted in a match. A match occurs when user A right swipes on user B, and user B right swipes on user A.  (relevant time period: all time)


select

 cast(c.swipe_at as date) as swipe_day,

 count(distinct a.swiping_user_id, a.swiped_on_user_id) / count(c.swipe_type) as percentage

from swipes a, swipes b, swipes c

where c.swipe_type = 'right'

     and a.swiping_user_id = b.swiped_on_user_id

     and b.swiping_user_id = a.swiped_on_user_id

     and a.swipe_type = b.swipe_type

     and a.swipe_type = 'right'

     and c.swipe_type = 'right'

A/B Test Question


The table below shows results from an A/B/C test of a product feature on a continuous metric.



MeanStandard Deviation% Lift

(relative to control)N

Control7412,000

Variant 16.73-4.2%12,000

Variant 27.161.4%12,000


3)In 2-3 sentences, describe the results of this test and the recommendation you would make to the product manager. Make any necessary assumptions but please state those assumptions.


#assumption: the higher score the better

#assumption: our sample size is larger than required to prove two variants are statistically different


Variant 2 has higher mean and are more spread out. The lift rate indicated variant 2 has better performance of our targeting model. I would recommend pursue variant 2, meanwhile working on lower the standard deviation.

4)The Product team is looking for interesting insights into user swipe behavior.  Can you think of an interesting analysis you can do with the data from the swipes table above and how it could inform the product?  Please provide the corresponding query to pull the information (if possible).


One thing I would like to improve base on our existing data is app recommendation time. Instead of sending out push notification at noon, we selectively send a notification to users not using the app for more than 24 hours and at his/her most likely hour of using the app.

这儿没思路。。。怎么用sql表达多一天?如果时间超过一天的话,还得求average?

Question 1:

Given a table with following schema: create table test_a(id integer, test_name varchar(100));

1.How can you select all the records where the 'id' column is an even number? All the odd number records?


Select * From test_a

Where test_a.id%2=0 --even number

Select * From test_a

Where test_a.id%2=1 -- odd number


2.There are 2 ways you could remove all of the rows from the table test_a. 'DELETE FROM TEST_A' and 'TRUNCATE TABLE TEST_A'. Which statement would you use? Is there a difference between the 'DELETE' and 'TRUNCATE' statements?


Depends on quantity and what kind of access do I have.


TRUNCATE is a DDL command, executed using a table lock. It removes data by deallocating the data pages used to store the data. It will minimal logging in the transaction log, so it's fast. However, it cannot be used with indexed views.


Delete is a DML command, executed using a row lock. It maintains the log, so it's slower compared to TRUNCATE. It can be used with indexed views.


3.What is the difference between the WHERE and HAVING clauses?


WHERE clause is used for filtering rows while HAVING clause is used to filter groups. Additionally, WHERE is used before the GROUP BY clause, and HAVING is used after the GROUP BY clause.


4.Write a sql statement that would determine if table 'test_a' has duplicate records.


SELECT * FROM test_a

GROUP BY id, test_name

HAVING COUNT (*) > 1


5.Assuming table 'test_a' has duplicate records, can the duplicate records be removed using a single 'DELETE' statement? If so, write the sql statement. If not, write the sql to remove the duplicate rows.


WITH test_distinct AS

(

SELECT *, ROW_NUMBER () Over(Partition BY id ORDER BY id) AS RowNumber

FROM test_a

)

DELETE FROM test_distinct WHERE RowNumber > 1


Question 2:

Three tables need to be joined to produce an aggregated result.

The first table is the events one. Every few seconds, a user clicks on a link somewhere and we receive an associated event log with some metadata.

The columns we receive on each events are:

●      timestamp

●      event_type (one of 'search', 'click', 'land')

●      network (one of 'google' or 'yahoo': the network the event belongs to.)

●      partner

●      device_type

●      domain

●      tag


For each network (google or yahoo) in the events table, we have a corresponding report from the network itself. Each event uses (domain or tag) along with device_type to identify the corresponding row in the network table. Each event only matches to one network.

See the below tables.


The second table is the google_revenue one.

The columns we receive from google are:

●      day

●      domain

●      device_type

●      network_reported_searches (an integer type)

●      network_reported_clicks (an integer type)

●      network_revenue (a decimal type)


The third table is the yahoo_revenue one.

The columns we receive from yahoo are:

●      day

●      tag

●      device_type

●      network_reported_searches (an integer type)

●      network_reported_clicks (an integer type)

●      revenue (a decimal type)

The result table will have aggregated results for each day, partner, and network.

The columns we want are:

●      day

●      partner (from the events table)

●      network (from the events table)

●      count_search_events (an integer type)

●      count_clicks_events (an integer type)

●      sum_network_reported_searches, (an integer type)

●      sum_network_reported_clicks, (an integer type)

●      sum_network_revenue, (a decimal type)


Provide the SQL joining the 3 tables to produce the final result.


(

 select google_revenue.day, events.partner, evets.network, count_search_events(

   select count(event_type) from events where event_type='search'), count_clicks_events(select count(event_type) into google from events where event_type='click'),google_network_reported_searches as sum_network_reported_searches, google_network_reported_clicks as sum_network_reported_clicks,google_network_revenue as sum_network_revenue from google_revenue

 join events on google.domain=events.domain

 and google.device_type=evenets.device_type

) full join

(

 select yahoo_revenue.day, events.partner, evets.network, count_search_events(

   select count(event_type) from events where event_type='search'), count_clicks_events(select count(event_type) into yahoo from events where event_type='click'),yahoo_network_reported_searches as sum_network_reported_searches, yahoo_network_reported_clicks as sum_network_reported_clicks,yahoo_network_revenue as sum_network_revenue from yahoo_revenue

 join events on yahoo.tag=events.tag

 and yahoo.device_type=evenets.device_type

) on google.day=yahoo.day

group by day



版权所有:编程辅导网 2021 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp