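(2) Technical interview
This round can take different forms: it may be a take-home challenge or an online video interview. The questions focus mainly on programming and coding, or on machine learning.
At Google or Facebook, this round may focus only on the candidate's technical ability, but at Amazon it may also cover past experience. Candidates should therefore be ready to explain the projects they have worked on and the business problems they solved in them.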
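(3) On-site interview
The on-site interview usually means spending a full day at an Amazon office, because there are 5 to 6 rounds in total, including an informal lunch interview.
Each interview lasts about 45-60 minutes and is one-on-one. The interviewer may be the lead of your future team, an HR manager, or a senior manager.
The format varies and may include a case study, a technical presentation, Q&A, whiteboarding, and so on.
As this shows, becoming an Amazon data scientist means passing several rounds of screening, so targeted preparation for the interview questions is critical.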
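On-site interview question categories and sample questions
On-site interview questions fall roughly into the following categories:
(1) Coding (about 37%)
Amazon data scientists have to write code and develop complex algorithms that combine data from multiple sources, so candidates need to show that they have the technical knowledge to analyze and manipulate such data.
Interviewers test candidates on SQL, data structures, algorithms, and modeling. Most candidates choose Python for data structure and algorithm questions, and Python or R for modeling questions.
In most cases candidates can write code on a whiteboard or something similar, but some candidates report that they could only answer verbally. This suggests that Amazon places a high value on communication skills, so it is worth practicing writing scripts on paper and talking through your reasoning out loud.
Some sample questions for reference: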
① SQL
- Write a SQL code to explain month to month user retention rate.
- Describe different JOINs in SQL.
- What is the most advanced query you’ve ever written?
- Given a table with three columns, (id, category, value) and each id has 3 or less categories (price, size, color); how can you find those id's for which the value of two or more categories matches one another?
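As an illustration of the last SQL question above, here is a minimal sketch of one possible answer, run through Python's built-in `sqlite3` module. The table name `attributes`, the helper `ids_with_matching_values`, and the sample rows are all hypothetical. The sketch assumes each (id, category) pair appears at most once and that `value` is never NULL; under those assumptions, an id has two or more matching categories exactly when it has fewer distinct values than rows.

```python
import sqlite3


def ids_with_matching_values(rows):
    """Return ids for which at least two categories share the same value.

    Assumes each (id, category) pair appears at most once and values are
    never NULL (hypothetical constraints, not stated in the question).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE attributes (id INTEGER, category TEXT, value TEXT)")
    conn.executemany("INSERT INTO attributes VALUES (?, ?, ?)", rows)
    query = """
        SELECT id
        FROM attributes
        GROUP BY id
        HAVING COUNT(DISTINCT value) < COUNT(*)  -- a duplicate value exists
        ORDER BY id
    """
    result = [row[0] for row in conn.execute(query)]
    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        (1, "price", "10"), (1, "size", "10"), (1, "color", "red"),
        (2, "price", "5"), (2, "size", "7"),
        (3, "price", "3"), (3, "color", "3"),
    ]
    print(ids_with_matching_values(sample))
```

If the uniqueness assumption does not hold, a self-join on `id` and `value` with differing `category` would be a safer alternative.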
② Data structure and algorithms
- Write a python code for recognizing if entries to a list have the same characters or not. Then what is the computational complexity of it?
- You have an array of integers and you want to find a certain element; what effective algorithm would you use and what is the efficiency of it?
- For a long sorted list and a short (4 element) sorted list, what algorithm would you use to search the long list for the 4 elements?
- Given an unfair coin with the probability of heads not equal to .5, what algorithm could you use to create a list of random 1s and 0s?
- Given a bar plot, imagine you are pouring water from the top. How do you quantify how much water can be kept in the bar chart?
- Write a Python function that displays the first n Fibonacci numbers.
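Two of the questions above lend themselves to short sketches. The function names below are hypothetical, and both are only one of several acceptable answers. `first_n_fibonacci` assumes the sequence starts at 0, 1. `trapped_water` treats the bar plot as a list of non-negative bar heights and uses the common two-pointer approach, which runs in O(n) time and O(1) extra space.

```python
def first_n_fibonacci(n):
    """Return the first n Fibonacci numbers, assuming the sequence starts 0, 1."""
    numbers = []
    a, b = 0, 1
    for _ in range(n):
        numbers.append(a)
        a, b = b, a + b
    return numbers


def trapped_water(heights):
    """Return the units of water held between bars of the given heights."""
    left, right = 0, len(heights) - 1
    left_max = right_max = 0
    total = 0
    while left < right:
        # The shorter side bounds the water level, so advance from that side.
        if heights[left] < heights[right]:
            left_max = max(left_max, heights[left])
            total += left_max - heights[left]
            left += 1
        else:
            right_max = max(right_max, heights[right])
            total += right_max - heights[right]
            right -= 1
    return total


if __name__ == "__main__":
    print(first_n_fibonacci(7))
    print(trapped_water([3, 0, 2]))
```

In an interview it is worth stating the starting convention for the Fibonacci sequence and the complexity of the water computation out loud.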
③ Modeling
- How would you improve a classification model that suffers from low precision?
- We have two models, one with 85% accuracy, one 82%. Which one do you pick?
- When you have time series data by month, and it has large data records, how will you find significant differences between this month and previous month?
- How do you inspect missing data and when are they important?
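(2) Machine learning (about 27%)
Depending on the role, the interviewer may ask candidates for specific ideas about system design and machine learning models.
For deeper machine learning topics, candidates may need to build a hypothetical model or discuss how to improve an existing model tied to a real Amazon business decision.
According to other candidates, machine learning questions also touch on unsupervised learning, the bias-variance tradeoff, principal component analysis, and recurrent neural networks.
Sample questions: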
- How do you interpret logistic regression?
- How does dropout work?
- What is L1 vs L2 regularization?
- What is the difference between bagging and boosting?
- Explain in detail how a 1D CNN works.
- Describe a case where you have solved an ambiguous business problem using machine learning.
- Having a categorical variable with thousands of distinct values, how would you encode it?
- How do you manage an unbalanced data set?
- What is an LSTM? Why use an LSTM? How have you used LSTMs in your experience?
(3) Behavioral questions (about 19%)
What sets Amazon apart on behavioral questions is that it wants prospective employees to frame their answers around Amazon's 16 Leadership Principles.
When preparing, practice telling your "stories" from past experience strategically, emphasizing how you have demonstrated each of the 16 principles.
Amazon's 16 Leadership Principles are:
- Customer Obsession
- Ownership
- Invent and Simplify
- Are Right, A Lot
- Learn and Be Curious
- Hire and Develop the Best
- Insist on the Highest Standards
- Think Big
- Bias for Action
- Frugality
- Earn Trust
- Dive Deep
- Have Backbone; Disagree and Commit
- Deliver Results
- Strive to be Earth's Best Employer
- Success and Scale Bring Broad Responsibility
Sample questions:
- Tell me about a time you made something much simpler for customers. (Principle: Customer Obsession)
- Tell me about a project you worked on that was not successful. What would you do differently? (Principle: Ownership)
- What’s the most innovative idea you’ve ever had? (Principle: Invent and Simplify)
- Tell me about a time you applied judgement to a decision when data was not available. (Principle: Are Right, A Lot)
- Why data science? (Principle: Learn and Be Curious)
- Where do you see yourself within the next 5 years? (Principle: Hire and Develop the Best)
- How would you improve this [project on your resume] if you had more time? (Principle: Insist on the Highest Standards)
- Tell me a time that a goal was hard to achieve. What did you learn from that? (Principle: Insist on the Highest Standards)
- Tell me about your most significant accomplishment. Why was it significant? (Principle: Think Big)
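(4) Statistics (about 17%)
Amazon data scientists have to extract useful insights from large, complex data sets, which makes statistical analysis an important part of their day-to-day work. A solid foundation in statistics is therefore essential.
Candidates should review basic statistics and practice explaining statistical terms concisely, focusing on applied statistics and probability. Topics interviewers have asked about in the past include A/B testing, standardization, and Bayes' theorem.
Sample questions: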
- What is p-value?
- What is the maximum likelihood of getting k heads when you tossed a coin n times? Write down the mathematics behind it.
- There are 4 red balls and 2 blue balls, what's the probability of them not being the same in the 2 picks?
- How would you explain hypothesis testing for a newbie?
- What is cross-validation?
- How do you interpret OLS regression results?
- Explain confidence intervals
- Name the five assumptions of linear regression
- Estimate the disease probability in one city given the probability is very low nationwide. Randomly asked 1000 people in this city, with all negative responses (NO disease). What is the probability of disease in this city?
- What is the difference between linear regression and a t-test?
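Two of the probability questions above can be checked with a few lines of code. This is a minimal sketch with hypothetical function names. For the ball question it assumes the two picks are made without replacement (the question does not say) and simply enumerates all equally likely pairs. For the coin question it evaluates the binomial probability C(n, k) p^k (1-p)^(n-k) of k heads in n tosses; the value of p that maximizes this likelihood is k/n.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def prob_different_colors(n_red, n_blue):
    """Probability that two balls drawn without replacement differ in color."""
    balls = ["red"] * n_red + ["blue"] * n_blue
    pairs = list(combinations(range(len(balls)), 2))  # all equally likely pairs
    mixed = sum(1 for i, j in pairs if balls[i] != balls[j])
    return Fraction(mixed, len(pairs))


def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n independent tosses with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


if __name__ == "__main__":
    print(prob_different_colors(4, 2))  # 4 red and 2 blue balls
    print(binomial_pmf(2, 4, 0.5))
```

With 4 red and 2 blue balls there are 15 possible pairs, 8 of which are mixed, so the enumeration gives 8/15. If the picks were made with replacement instead, the answer would be 2 × (4/6) × (2/6) = 4/9.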
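Reference answers and explanations for common interview questions
Original question: What is the difference between bagging and boosting?
Bagging is a technique that reduces generalization error by combining several models. The main idea is to train several different models separately and then have all of them vote on the output for each test example.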
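The idea of training models separately and letting them vote can be sketched in plain Python. This is only an illustration under stated assumptions, not a production implementation: the base model is a hypothetical one-feature threshold classifier ("stump"), each model is trained on a bootstrap sample (drawn with replacement from the training data), and the names `train_stump`, `bagging_fit`, and `bagging_predict` are invented for this example. An odd number of models is used so that a two-class vote cannot tie.

```python
import random
from collections import Counter


def train_stump(points):
    """Fit a threshold classifier on (x, label) pairs: predict 1 when x >= threshold."""
    best_threshold, best_errors = None, None
    for threshold, _ in points:
        errors = sum((x >= threshold) != bool(y) for x, y in points)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(points, n_models=25, seed=0):
    """Train n_models stumps, each on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    thresholds = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the data, drawn with replacement.
        sample = [rng.choice(points) for _ in points]
        thresholds.append(train_stump(sample))
    return thresholds


def bagging_predict(thresholds, x):
    """Let every stump vote and return the majority label."""
    votes = Counter(int(x >= t) for t in thresholds)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    data = [(x, 0) for x in range(5)] + [(x, 1) for x in range(6, 11)]
    ensemble = bagging_fit(data)
    print(bagging_predict(ensemble, 0), bagging_predict(ensemble, 10))
```

Because each stump sees a slightly different resampling of the data, their individual errors partly cancel out in the vote, which is how bagging lowers variance.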