ASHIVNI SHEKHAWAT
Research scientist at Lyft Inc.

Hypothesis Testing

Term: Definition
H0: Null hypothesis, the default position that there is no change
H1: Alternative hypothesis, the position that there is some change
Type I error: Rejection of H0 when H0 is true
Type II error: Failure to reject H0 when H1 is true
Error: False rejection of H0
α; significance level; false positive rate: Pr(reject H0 | H0 is true); the probability of a type I error
β; false negative rate: 1 – Pr(reject H0 | H1 is true); the probability of a type II error
Power: Pr(reject H0 | H1 is true); 1 – probability of a type II error; 1 – β
P-value: Probability of observing a result at least as extreme as the observed result, given that H0 and all other model assumptions are true
Confidence interval: The set of effect sizes whose tests produce P > α forms a 1 – α confidence interval; e.g., the effect sizes with P > 0.05 form a 1 – 0.05 = 0.95, or 95%, confidence interval
PCE: Per-comparison error rate
PFE: Per-family error rate (expected number of false rejections per family)
FWE: Family-wise error rate (probability of at least one false rejection in the family)
FDR: False discovery rate (expected number of false significances / number of significances)
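These definitions can be made concrete with a small calculation. Below is a sketch of the power of a one-sided z-test (H0: μ = 0 vs. H1: μ = δ > 0 with known σ), using Python's `statistics.NormalDist`; the effect size, σ, and n are made-up illustrative values:

```python
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Power of a one-sided z-test: Pr(reject H0 | H1 is true).

    H0: mu = 0, H1: mu = delta > 0, known sigma, sample size n.
    """
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha)      # rejection threshold under H0
    shift = delta * n ** 0.5 / sigma      # mean of the test statistic under H1
    return 1 - norm.cdf(z_crit - shift)   # power = 1 - beta

# Example: delta = 0.5, sigma = 1, n = 25 gives power close to 0.80
power = z_test_power(delta=0.5, sigma=1.0, n=25)
```

Note that setting delta = 0 recovers the significance level α itself: when H0 is true, the probability of rejection is exactly the false positive rate.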

Classification

Term: Definition
True positive rate; TPR; recall; sensitivity: True positives / All positives, i.e. the fraction of all positives that are correctly classified by the model
False positive rate; FPR; type I error rate: False positives / All negatives, i.e. the fraction of all negatives that are incorrectly classified by the model
False negative rate; FNR; type II error rate; miss rate: False negatives / All positives, i.e. the fraction of all positives that are incorrectly classified by the model
True negative rate; selectivity; specificity: True negatives / All negatives, i.e. the fraction of all negatives that are correctly classified by the model
Accuracy: (True positives + True negatives) / Total population, i.e. the fraction of the population that is correctly classified by the model
Precision; PPV; positive predictive value: True positives / Predicted positives, i.e. the fraction of true positives among all predicted positives
FDR; false discovery rate: False positives / Predicted positives, i.e. the fraction of false positives among all predicted positives
Population prevalence: Positives / Population, i.e. the fraction of the population that is positive
F1 score: 2 × Precision × Recall / (Precision + Recall), i.e. the harmonic mean of precision and recall
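The table above maps directly onto the four cells of a confusion matrix. A minimal sketch (the counts in the example call are made up):

```python
def classification_metrics(tp, fp, fn, tn):
    """Confusion-matrix metrics matching the definitions above."""
    precision = tp / (tp + fp)                  # PPV
    recall = tp / (tp + fn)                     # TPR, sensitivity
    fpr = fp / (fp + tn)                        # type I error rate
    accuracy = (tp + tn) / (tp + fp + fn + tn)  # fraction correct
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall,
            "fpr": fpr, "accuracy": accuracy, "f1": f1}

m = classification_metrics(tp=8, fp=2, fn=4, tn=6)
```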

Reinforcement Learning

Multi-armed Bandits

Term: Definition
A_t: Action taken by the agent at time step t
Q_t(a): Estimate of the expected reward from action a at time step t
N_t(a): The number of times action a has been played up to time step t
R_i: Reward obtained on the i-th play of action a
UCB: Upper-Confidence-Bound algorithm for action selection in bandits
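The UCB rule picks the action maximizing the estimated reward Q_t(a) plus an exploration bonus c·sqrt(ln t / N_t(a)), so rarely tried actions get a boost and untried actions are played first. A minimal sketch (the constant c = 2 and the toy values are illustrative):

```python
import math

def ucb_select(q, n, t, c=2.0):
    """Pick the action maximizing Q_t(a) + c * sqrt(ln t / N_t(a)).

    q: reward estimates Q_t(a); n: play counts N_t(a); t: time step.
    An action that has never been played gets an infinite score.
    """
    scores = [
        q_a + c * math.sqrt(math.log(t) / n_a) if n_a > 0 else math.inf
        for q_a, n_a in zip(q, n)
    ]
    return scores.index(max(scores))

# The rarely played arm wins despite a similar estimate (exploration bonus):
a = ucb_select(q=[0.5, 0.6], n=[10, 2], t=12)
```

When play counts are equal the bonus cancels and the rule reduces to greedy selection on Q_t(a).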

Markov Decision Processes

Term: Definition
MDP: Markov Decision Process
S_t, R_t, A_t: State, reward, and action at time step t
S, R, A(s): Set of all valid states, rewards, and actions (in state s)
Finite MDP: An MDP in which the sets of all states, rewards, and actions are finite
Trajectory: The sequence of states, actions, and rewards starting at some given state: S_0, A_0, R_1, S_1, A_1, R_2, …
State dynamics: Also known as the dynamics of an MDP, given by the four-argument function p(s′, r | s, a). This function gives the probability of reaching state s′ and receiving reward r by taking action a in state s.
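The four-argument dynamics function can be made concrete with a toy finite MDP; the states, actions, and probabilities below are made up for illustration:

```python
# p[(s_next, r, s, a)] = Pr(next state s_next, reward r | state s, action a)
# Toy two-state MDP: "charge" from "low" surely leads to "high";
# "work" from "high" usually pays off but sometimes drains to "low".
p = {
    ("high", 1.0, "low", "charge"): 1.0,
    ("high", 2.0, "high", "work"): 0.7,
    ("low", 0.0, "high", "work"): 0.3,
}

def prob(s_next, r, s, a):
    """The four-argument dynamics function; unlisted outcomes have probability 0."""
    return p.get((s_next, r, s, a), 0.0)

def check_dynamics(p):
    """For each (s, a) pair, probabilities over (s_next, r) must sum to 1."""
    totals = {}
    for (s_next, r, s, a), pr in p.items():
        totals[(s, a)] = totals.get((s, a), 0.0) + pr
    return all(abs(total - 1.0) < 1e-12 for total in totals.values())
```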

Limit Theorems

Weak law of large numbers

The sample average converges in probability to its expected value.

Strong law of large numbers

The sample average converges almost surely to its expected value.

Central limit theorem

The average of a large number of IID random variables, when shifted by the population average and scaled by the square root of the number of variables, approaches a normal distribution. More precisely, let {x1, x2, …, xn} be n IID random variables such that E(xi) = μ and Var(xi) = σ². Let Sn = (x1 + x2 + … + xn)/n; then √n(Sn – μ) → N(0, σ²) in distribution.
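A quick simulation illustrates the statement, here with uniform variables (μ = 0.5, σ² = 1/12); the sample sizes are arbitrary and the code is a stdlib-only sketch:

```python
import math
import random

random.seed(0)

# x_i ~ Uniform(0, 1): mu = 0.5, sigma^2 = 1/12
mu, sigma = 0.5, math.sqrt(1 / 12)
n, trials = 500, 1000

# Standardize each sample mean: sqrt(n) * (S_n - mu) / sigma should be ~ N(0, 1)
zs = []
for _ in range(trials):
    s_n = sum(random.random() for _ in range(n)) / n
    zs.append(math.sqrt(n) * (s_n - mu) / sigma)

# Roughly 95% of the standardized means should fall within +/- 1.96
inside = sum(1 for z in zs if abs(z) < 1.96) / trials
```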

Extreme value theorem (Fisher–Tippett–Gnedenko)

The maximum (minimum) of a large number of IID random variables, when centered and scaled properly, converges to a generalized extreme value distribution. The Weibull, Gumbel, and Fréchet distributions are members of this family. The centering and scaling here are not as universal as in the central limit theorem.

Bootstrap theorem (for sample mean)

Let {x1, x2, …, xn} be an i.i.d. sample with sample mean X̅, sample standard deviation S, and a finite population variance. Let X̅B denote the mean of a bootstrap resample drawn with replacement from {x1, x2, …, xn}. Then √n(X̅B – X̅)/S approaches the standard normal distribution in the limit of large n.
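This theorem is what justifies the percentile bootstrap confidence interval for the mean. A minimal stdlib-only sketch (the data are simulated, not from any real source):

```python
import random
import statistics

random.seed(1)

data = [random.gauss(10, 2) for _ in range(200)]  # made-up i.i.d. sample
x_bar = statistics.fmean(data)

# Bootstrap distribution of the sample mean
boot_means = []
for _ in range(2000):
    resample = random.choices(data, k=len(data))  # draw with replacement
    boot_means.append(statistics.fmean(resample))

# Percentile 95% confidence interval for the population mean
boot_means.sort()
lo, hi = boot_means[49], boot_means[1949]  # ~2.5th and ~97.5th percentiles
```

Because the bootstrap means are approximately normal around X̅, the interval (lo, hi) behaves like the familiar X̅ ± 1.96·S/√n interval without requiring a normality assumption on the data.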