Here I list some books/papers/blogs etc. that I have read recently and found interesting. I try to link the source as much as I can. I hope this might provide some food for thought to a kindred spirit.
- [Research Paper] Network A/B Testing: From Sampling to Estimation, Huan Gui, Ya Xu, Anmol Bhasin, Jiawei Han, Proceedings of the 24th International Conference on World Wide Web May 2015 Pages 399–409. (link, pdf)
The authors discuss the problem of A/B testing in real social networks. They demonstrate the presence of network effects, and suggest a randomized cluster-based partitioning algorithm that helps reduce the variance and bias of the ATE (Average Treatment Effect) estimate.
- [Research Paper] Network bucket testing, Lars Backstrom, Jon Kleinberg, Proceedings of the 20th international conference on World wide web March 2011 Pages 615–624 (link, pdf)
The authors discuss the problem of running A/B tests and sampling without bias and with low variance in densely connected graphs, where user behavior depends on the variant that their neighbors see. They propose a few methods to address this problem.
- [Research Paper] Partitioning Nominal Attributes in Decision Trees, Don Coppersmith, Se June Hong and Jonathan R.M. Hosking, Data Mining and Knowledge Discovery volume 3, pages 197–217 (1999) (pdf)
The paper discusses a criterion for optimal splits on categorical variables in multiclass classification trees. Searching for the optimal split is computationally too expensive, and the authors suggest a PCA-based approximation that is linear in the number of distinct values of the attribute.
- [Review Paper] Learning from Imbalanced Data, Haibo He and Edwardo A. Garcia, IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 9, September 2009 (link, needs access; pdf of a masters thesis which forms the basis of the paper)
A detailed discussion of the problem of class imbalance in classification problems, and a review of the techniques to handle class imbalance.
- [Research Paper] Improving the Sensitivity of Online Controlled Experiments: Case Studies at Netflix, Huizhi Xie and Juliette Aurisset, KDD ’16 August 13-17, 2016, San Francisco, CA, USA (pdf)
The authors discuss three variance reduction techniques, namely stratified sampling, post-stratified sampling, and CUPED. They present data from Netflix experiments on the efficacy of each technique. CUPED and post-stratified sampling are found to outperform simple stratified sampling because of the limitations of real-time online assignment. All variance reduction techniques outperform simple random sampling.
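To make the CUPED idea concrete, here is a minimal sketch on synthetic data. It assumes a single pre-experiment covariate and a single group of users; the function name `cuped_adjust` and the simulated numbers are my own illustrations, not taken from the paper. The adjusted metric is `y - theta * (x - mean(x))` with `theta = cov(x, y) / var(x)`, which keeps the mean unchanged while removing the variance explained by the covariate.

```python
import random
import statistics


def cuped_adjust(metric, covariate):
    """Return CUPED-adjusted values: y - theta * (x - mean(x))."""
    mean_x = statistics.fmean(covariate)
    mean_y = statistics.fmean(metric)
    # Sample covariance between the covariate and the metric.
    cov_xy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(covariate, metric)
    ) / (len(metric) - 1)
    theta = cov_xy / statistics.variance(covariate)
    return [y - theta * (x - mean_x) for x, y in zip(covariate, metric)]


# Synthetic data (an assumption for illustration): the in-experiment
# metric is strongly correlated with the pre-experiment covariate.
rng = random.Random(0)
pre = [rng.gauss(10.0, 3.0) for _ in range(5000)]
post = [0.8 * x + rng.gauss(0.0, 1.0) for x in pre]

adjusted = cuped_adjust(post, pre)
print("variance before:", statistics.variance(post))
print("variance after: ", statistics.variance(adjusted))
```

In an actual experiment, `theta` would typically be estimated on data pooled across the treatment and control groups, so that the same adjustment is applied to both.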
- [Research Paper] The Netflix Recommender System: Algorithms, Business Value, and Innovation, Carlos A. Gomez-Uribe and Neil Hunt, ACM Trans. Manage. Inf. Syst. 6, 4, Article 13 (December 2015), 19 pages.
The authors describe various recommendation tasks at Netflix, such as the personalized video ranker (PVR), the top-N ranker, the trending now, the continue watching, the because-you-watched ranker, page generation, evidence presentation, search, etc. Next they describe the business value generated by these recommenders, which is the net revenue. The revenue is directly proportional to the number of active users, which is influenced by user acquisition, retention, and re-activation. Since most experiments influence the experience of the currently active users, the focus here is on retention. However, retention is harder to measure, and is more long term, so they focus on more measurable and medium term metrics. The metrics quantifying the health of the recommendation system could be things like the effective catalogue size, the take rate, and other more classical recommender metrics such as MRR, MAP, NDCG, etc. They also look at user engagement metrics such as the average time to play, days without play, number of videos watched per user, video abandonment rate, etc. From a business point of view they focus on the hours of video played, but with an appreciation of the diminishing returns. They present a discussion of the possible network effects as well.
- [Blog] Innovating Faster on Personalization Algorithms at Netflix Using Interleaving, Joshua Parks, Juliette Aurisset, Michael Ramm, Nov 2017.
The authors describe the interleaving technique and its applications in experimentation at Netflix.
- [Research Paper] Large-scale validation and analysis of interleaved search evaluation, O Chapelle, T Joachims, F Radlinski, ACM Transactions on Information Systems March 2012 Article No.: 6 (link, pdf)
- [Research Paper] BLEU: a Method for Automatic Evaluation of Machine Translation, Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu, Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 311-318. (pdf)
The original research paper that introduced the BLEU metric for NLP. I find BLEU to be very reminiscent of the cluster expansion method from quantum mechanics. In cluster expansion we expand the energy as a series in correlation functions of increasing order, which is closely analogous to expanding a measure of accuracy in terms of n-gram accuracy.
- [Research Paper] A Call for Clarity in Reporting BLEU Scores, Matt Post, Amazon Research, Berlin, Germany. (pdf)
The author proposes a standardization of the BLEU metric so that results from different publications can be compared apples to apples.
- [Blog] Evaluating Text Output in NLP: BLEU at your own risk, Rachael Tatman (link)
A very readable discussion of the pros and cons of the BLEU metric.
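For readers who have not seen how BLEU is put together, the following is a simplified sketch: clipped (modified) n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It assumes a single reference, whitespace tokenization, and no smoothing, and the function names are my own. It is not the standardized implementation that Post's paper argues for, so its scores should not be compared with published numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) occurring in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of the candidate against one reference."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it
    # appears in the reference.
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def simple_bleu(candidate, reference, max_n=4):
    """Unsmoothed, single-reference, sentence-level BLEU."""
    precisions = [
        modified_precision(candidate, reference, n) for n in range(1, max_n + 1)
    ]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, one zero precision zeroes the score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity_penalty * math.exp(log_mean)


reference = "the cat is on the mat".split()
print(simple_bleu("the cat is on the mat".split(), reference))
print(modified_precision("the the the the the the the".split(), reference, 1))
```

The second call uses the degenerate candidate from the original paper: "the" appears seven times in the candidate but only twice in the reference, so the clipped unigram precision is 2/7 rather than 7/7.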
Sorry, didn’t keep track of stuff. Also, traveled back from India to the US. It’s a real task in this COVID madness.
- [Book] Reinforcement Learning: An Introduction, Second edition. Richard S. Sutton and Andrew G. Barto. The MIT Press. Cambridge, Massachusetts (pdf)
A very readable book on reinforcement learning. The authors do not shy away from mathematical details where necessary, but the focus is certainly on readability. Detailed proofs are omitted. Written by leaders in the field, the book provides a thorough overview of the reinforcement learning landscape.
- [Coursera Specialization] Reinforcement Learning Specialization, Alberta Machine Learning Institute, University of Alberta.
There are four courses in this specialization taught by Profs. Martha White and Adam White (yes, they are related). The courses are a great resource if you want to get a good introduction to RL. They follow the book linked above. You can audit all courses, but will have to pay if you want to get the graded exercises and the certification. The money is well spent if you can spare it.
March, Apr 2020
Not much reading; got busy with some personal stuff.
- [Book Chapter] Confidence Intervals vs Bayesian Intervals, Jaynes E.T., Kempthorne O. Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science, vol 6b. Springer, Dordrecht (pdf)
Jaynes provides a physicist’s perspective on statistics. Interesting read.
- [Research Paper] Testing a Point Null Hypothesis: The Irreconcilability of P Values and Evidence, Berger, J. O. and Sellke, T., Journal of the American Statistical Association, Volume 82, 1987 – Issue 397. (pdf)
A very pointed critique of the frequentist hypothesis testing methodology, with fascinating examples.
- [Research Paper] Bayesian Estimation Supersedes the t Test, Kruschke, J. K., Journal of Experimental Psychology: General, 2013, Vol. 142, No. 2, 573–603. (pdf)
A detailed comparison of Bayesian and frequentist testing frameworks with links to working code and numerical examples.
- [Research Paper] The fallacy of placing confidence in confidence intervals, Morey, R. D., et al. Psychonomic Bulletin & Review volume 23, pages 103–123 (2016) (link)
A critique of the indiscriminate use of confidence intervals. Demonstrates the counterintuitive properties of confidence intervals with very accessible examples.
- [Research Paper] The Earth Is Round (p < 0.05), Cohen, J. American Psychologist, Vol. 49, No. 12, 997-1003 (pdf).
A scathing critique of NHST (Null Hypothesis Significance Testing). A good portion of the paper is dedicated to NHST as practiced in the behavioral sciences and psychology (apparently they do not even report confidence intervals), but there is a good amount of general discussion.
- [Blog] Is Bayesian A/B Testing Immune to Peeking? Not Exactly, Robinson, D. (link)
Discusses the exploration of Bayesian testing at Stack Exchange. A good reference for Bayesian testing in general.
- [Blog] The Power of Bayesian A/B Testing, Frasco, M. (link)
Discusses the use of Bayesian testing at Convoy (a unicorn, and a great startup in general).
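As a rough illustration of the quantity that Bayesian A/B tests usually report, here is a sketch that estimates the posterior probability that variant B converts better than variant A. It assumes a Beta-Binomial model with a uniform Beta(1, 1) prior and uses simple Monte Carlo sampling; the function name and the conversion counts are made up for illustration and do not come from either blog post.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, draws=20000, seed=0):
    """Monte Carlo estimate of P(rate_B > rate_A) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior of each conversion rate is
        # Beta(1 + successes, 1 + failures).
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws


# Hypothetical counts: 100/1000 conversions for A, 150/1000 for B.
print(prob_b_beats_a(100, 1000, 150, 1000))
```

As Robinson's post points out, repeatedly checking this probability and stopping as soon as it crosses a threshold is not automatically safe, so the number is best read together with a stopping rule fixed in advance.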
- [Unpublished manuscript] Bootstrap: A statistical method, Singh, K., Xie, M., Rutgers University (pdf).
Discusses the practical aspects of the bootstrap, with a small section on the theoretical underpinnings.
- [Lecture notes] Bootstrap: Why it works? (A lecture in Advanced Statistical Inference, Stat 613, Texas A&M University), Rao, S. S. (pdf).
Discusses the theoretical aspects of why the bootstrap works.
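The basic bootstrap recipe is short enough to sketch. The example below computes a percentile bootstrap confidence interval for the mean by resampling the data with replacement. It is only one of several bootstrap interval constructions, and the function name, sample, and defaults are my own choices rather than anything from the two references.

```python
import random
import statistics


def bootstrap_ci(sample, statistic=statistics.fmean,
                 n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic."""
    rng = random.Random(seed)
    n = len(sample)
    # Recompute the statistic on resamples drawn with replacement.
    estimates = sorted(
        statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Synthetic sample (an assumption for illustration).
rng = random.Random(1)
data = [rng.gauss(5.0, 2.0) for _ in range(200)]
low, high = bootstrap_ci(data)
print(f"sample mean = {statistics.fmean(data):.3f}, "
      f"95% CI = ({low:.3f}, {high:.3f})")
```

Passing a different `statistic`, for example `statistics.median`, reuses the same procedure for estimators that have no convenient closed-form standard error.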
- [Research Paper] Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations, Greenland, S. et al. Eur. J. Epidemiol (2016) 31:337–350 (pdf).
Discusses 25 misinterpretations prevalent in statistical testing literature. Great read for all practitioners of statistical testing.
- [Research Paper] The Google File System, Ghemawat, S. et al. Proc. ACM Symp. OSP (2003) 29-43 (pdf).
Describes the Google File System in great detail. Great read for folks interested in distributed computing and big data.
- [Research Paper] An introduction to ROC analysis, Fawcett, T., Pattern Recognition Letters, Vol. 27 Issue 8, 2006, pp. 861-874, ISSN 0167-8655 (pdf).
Very readable introduction to ROC and AUC. Goes into reasonable depth; explores theoretical as well as numerical aspects.
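One interpretation of AUC is that it equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The sketch below computes AUC directly from that definition by comparing every positive/negative pair, counting ties as one half. It is quadratic in the number of examples and meant only as an illustration; the function name and the toy data are mine.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # A correctly ordered pair counts 1, a tie counts 0.5.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3]
print(roc_auc(labels, scores))  # 3 of the 4 pairs are ordered correctly: 0.75
```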
- [Book] The emperor of all maladies, Mukherjee, S.
A compelling account of humanity’s eons-long battle against cancer. This book is as much about cancer as it is about the process of scientific discovery. It tells how science is driven in equal parts by careful and painstaking research, chance discoveries, blunders, heroes sung and unsung, flashes of genius, and centuries of being blind to the obvious. One of the best books I have read.