Lyft Data Science Assignment
Thank you for taking the time to complete Lyft’s Data Science
Assignment!
Assignment
Lyft ridesharing is a two-sided marketplace with drivers and passengers. Every day new drivers
join the platform and existing drivers either drive or they do not. Suppose you are working as a
Data Scientist on the Driver Retention team whose primary goal is to reduce the rate of churn of
activated drivers (a driver becomes ‘activated’ once they complete their first ride).
The team would like to understand churn better. Explore the data to provide the team with a
deeper understanding of churn at Lyft. Your summary should include:
● The definition (with justification) for a driver to be considered churned.
● An assessment of the current business impact of churn on Lyft.
● Insights on factors affecting churn.
● Insights on segments of drivers more likely to churn.
Next, the team would like to size the opportunity of reducing churn in order to prioritize their
roadmap. The team is considering the following two hypotheses:
i. Doubling the number of rides in an activated driver’s first week.
ii. Another hypothesis you recommend.
Using the data, help the team prioritize these two hypotheses. You should cover:
● How big the opportunities are.
● What the longer-term consequences of each hypothesis might be for the marketplace.
● Which segments of drivers are most likely to be affected by each hypothesis.
● Which hypothesis you have more confidence in.
Finally, suppose the team wants to test the following hypothesis: “eliminating the Prime Time
feature will decrease driver churn”. Design an experiment to do so. Your design should include:
● How you will divide observational units into control and treatment, and a description of
the treatment and control conditions.
● Potential second-order effects on the experience of drivers and
passengers during this experiment.
● The primary and secondary metrics you will track.
● How long you will run the experiment and how you will choose the winning variant.
Submission Instructions
1. Please do not write your name on any submission documents.
2. Using the data provided, aim to spend roughly 5-8 hours answering the questions.
3. Prepare a 20-minute presentation for a panel of Data Scientists. At Lyft, we believe
Data Scientists are most effective when they're telling a story with data. Slides are
typically most effective, but you are welcome to use other formats (e.g. an IPython
notebook, R Markdown, or a Word doc, each converted to PDF) if you prefer.
4. Include all of your working materials (including all code) in a separate PDF.
5. Keep in mind that we will be grading the assignment based on its technical
soundness and depth, business applications and insights, structure and
organization, and completeness and polish.
Data Provided
data/driver_ids.csv
    driver_id            Unique identifier for a driver
    driver_onboard_date  Date on which the driver was onboarded
data/ride_ids.csv
    driver_id            Unique identifier for a driver
    ride_id              Unique identifier for a ride that was completed by the driver
    ride_distance        Ride distance in meters
    ride_duration        Ride duration in seconds
    ride_prime_time      Prime Time applied on the ride
data/ride_timestamps.csv
    ride_id              Unique identifier for a ride that was completed by the driver
    ride_picked_up_at    Timestamp for when the driver picked up the passenger
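As a starting point, the three files can be joined on their shared keys: ride_id links ride_ids.csv to ride_timestamps.csv, and driver_id links rides to driver_ids.csv. The sketch below uses Python with pandas and assumes the column names listed above and that dates and timestamps parse with pandas defaults. The helper names load_rides and driver_summary and the inline sample rows are hypothetical, made up for illustration; they are not part of the provided data.

```python
import io

import pandas as pd


def load_rides(driver_csv, ride_csv, timestamp_csv):
    """Join the three files into one table with one row per ride."""
    drivers = pd.read_csv(driver_csv, parse_dates=["driver_onboard_date"])
    rides = pd.read_csv(ride_csv)
    stamps = pd.read_csv(timestamp_csv, parse_dates=["ride_picked_up_at"])
    # Left joins keep every ride, so a missing timestamp or onboard date
    # shows up as NaT instead of silently dropping the ride.
    return (
        rides.merge(stamps, on="ride_id", how="left")
             .merge(drivers, on="driver_id", how="left")
    )


def driver_summary(rides):
    """Per-driver ride count plus first and last pickup timestamps."""
    return (
        rides.groupby("driver_id")
             .agg(
                 n_rides=("ride_id", "nunique"),
                 first_ride=("ride_picked_up_at", "min"),
                 last_ride=("ride_picked_up_at", "max"),
             )
             .reset_index()
    )


# Hypothetical sample rows (NOT real data), shaped like the three files.
DRIVERS = io.StringIO(
    "driver_id,driver_onboard_date\n"
    "d1,2016-03-29\n"
    "d2,2016-04-01\n"
)
RIDES = io.StringIO(
    "driver_id,ride_id,ride_distance,ride_duration,ride_prime_time\n"
    "d1,r1,1811,327,50\n"
    "d1,r2,3362,809,0\n"
    "d2,r3,2000,400,25\n"
)
STAMPS = io.StringIO(
    "ride_id,ride_picked_up_at\n"
    "r1,2016-04-02 08:15:00\n"
    "r2,2016-04-10 19:30:00\n"
    "r3,2016-04-05 12:00:00\n"
)

rides = load_rides(DRIVERS, RIDES, STAMPS)
summary = driver_summary(rides)
print(summary)
```

To run this against the provided files, pass the paths data/driver_ids.csv, data/ride_ids.csv, and data/ride_timestamps.csv to load_rides in place of the in-memory samples. The per-driver first and last pickup timestamps are only raw ingredients; the churn definition itself is left to the analysis.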