Essential AI Papers to Explore for Staying Current
Chapter 1: The Importance of Reading in AI and Data Science
Artificial Intelligence is one of the fastest-evolving fields in science and, often under the name Data Science, one of the most sought-after skills of recent years. The field covers a wide range of applications, usually categorized by input type: text, audio, image, video, or graph. It can also be divided by problem formulation: supervised, unsupervised, or reinforcement learning. Keeping up with all of it is overwhelming and can be frustrating. With that in mind, I offer a selection of reading recommendations to keep you informed about both recent and classic breakthroughs in AI and Data Science.
Most of the papers mentioned focus on image and text, yet many of their underlying principles are broadly applicable and extend beyond mere vision and language tasks. For each recommendation, I outline reasons for reading (or revisiting) the paper and provide further materials for those interested in delving deeper into specific topics.
Before we begin, I wish to express my regrets to the Audio and Reinforcement Learning communities for not including works from these areas, as my familiarity with them is limited.
Section 1.1: AlexNet (2012)
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
In 2012, the authors used GPUs to train a large Convolutional Neural Network (CNN) for the ImageNet challenge. This was a bold move, since CNNs were considered too computationally heavy for a problem of that scale. To everyone's astonishment, they took first place with a Top-5 error rate of approximately 15%, far ahead of the second-place team, which reached around 26% using conventional image processing techniques.
Reason #1: While most of us know AlexNet's historical significance, not everyone knows which of the techniques we use today were already present back then. You may be surprised by how familiar many of the concepts used in the paper are, such as dropout and ReLU.
Reason #2: The proposed network had 60 million parameters, an astonishing number by the standards of 2012. Nowadays, models with over a billion parameters are commonplace. Reading the AlexNet paper gives a good sense of how much has changed since then.
Further Reading: To trace the evolution of ImageNet champions, consider reading the ZF Net, VGG, Inception-v1, and ResNet papers. The last of these achieved superhuman performance and shifted the focus of subsequent competitions. Today, ImageNet is mainly used for Transfer Learning and for validating low-parameter models.
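To make the Top-5 error metric mentioned above concrete, here is a minimal sketch in Python. A prediction counts as correct if the true label is among the five highest-scoring classes. The scores and labels are made up for illustration and are not taken from the paper.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is NOT among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by score, highest first; keep the first k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy example: 3 samples, 6 classes (hypothetical scores).
scores = [
    [0.05, 0.10, 0.40, 0.20, 0.15, 0.10],  # true class 3: second-best guess
    [0.30, 0.25, 0.20, 0.15, 0.09, 0.01],  # true class 5: ranked last
    [0.70, 0.10, 0.10, 0.05, 0.03, 0.02],  # true class 0: best guess
]
labels = [3, 5, 0]

print(f"Top-1 error: {top_k_error(scores, labels, k=1):.2f}")  # 0.67
print(f"Top-5 error: {top_k_error(scores, labels, k=5):.2f}")  # 0.33
```

The first sample shows why Top-5 is more forgiving than Top-1: the model's best guess is wrong, but the true class is still among its top five.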
Section 1.2: MobileNet (2017)
Howard, Andrew G., et al. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).
MobileNet is recognized as one of the most notable "low-parameter" networks, making it well-suited for resource-constrained devices and enhancing real-time applications such as object recognition on mobile devices. The central concept behind MobileNet and similar models is to break down costly operations into a series of smaller, more efficient operations, which can be significantly faster and require fewer parameters.
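As a rough illustration of that idea, the sketch below compares the parameter count of a standard convolution with that of a depthwise separable one (a per-channel depthwise convolution followed by a 1x1 pointwise convolution), which is the factorization MobileNet relies on. The layer sizes are assumptions chosen for the example, and biases are ignored.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes the channels
    return depthwise + pointwise


# Hypothetical layer: 3x3 kernel, 256 input channels, 256 output channels.
kernel, c_in, c_out = 3, 256, 256
standard = standard_conv_params(kernel, c_in, c_out)
separable = depthwise_separable_params(kernel, c_in, c_out)

print(standard)                                     # 589824
print(separable)                                    # 67840
print(f"{standard / separable:.1f}x fewer weights")  # 8.7x fewer weights
```

For 3x3 kernels the saving approaches 9x as the number of output channels grows, since the ratio works out to 1 / (1/c_out + 1/kernel^2).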
Reason #1: Most of us lack the resources available to large tech companies. Understanding the principles behind low-parameter networks is essential for building models that are cheaper to train and deploy. In my experience, depthwise convolutions can save substantial amounts on cloud inference with almost no loss of accuracy.
Reason #2: It is a common belief that larger models yield better performance. Papers like MobileNet demonstrate that there is much more to model efficacy than merely increasing filter count; elegance and efficiency are also crucial.
Further Reading: To date, MobileNet v2 and v3 have been released, bringing improvements in both accuracy and size. In parallel, other researchers have devised ways to shrink models further with minimal accuracy loss, including compact architectures such as SqueezeNet.
Chapter 2: Key Papers in AI Development
Section 2.1: Attention is All You Need (2017)
Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems. 2017.
This paper introduced the Transformer model. Previously, language models relied heavily on Recurrent Neural Networks (RNNs) for sequence-to-sequence tasks. However, RNNs are notoriously slow and hard to parallelize across multiple GPUs. In contrast, the Transformer is built entirely on Attention layers, which score how relevant each element of a sequence is to every other element. The proposed architecture not only improved significantly on the state of the art but also trained much faster than prior RNN models.
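The core operation can be sketched in a few lines. Below is a minimal, dependency-free version of scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, applied as self-attention to a toy sequence. It is a single-head sketch for intuition only: the learned projections, multiple heads, and masking of the full Transformer are left out, and the input vectors are made up.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of equal-length query/key/value vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Relevance of every key to this query, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy self-attention: the same 3-element sequence serves as queries, keys and values.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(x, x, x)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1 and says how much one element attends to every element of the sequence. Because no row depends on the result of another, all of them can be computed in parallel, unlike the step-by-step updates of an RNN.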
Reason #1: Today, most new architectures in Natural Language Processing (NLP) stem from the Transformer. Models like GPT-2 and BERT are at the cutting edge of innovation. Understanding the Transformer is crucial for comprehending subsequent NLP models.
Reason #2: Many transformer models have hundreds of millions to billions of parameters. While the literature on MobileNets addresses efficient model design, NLP research has concentrated on making training more efficient. Together, these perspectives provide a comprehensive toolkit for efficient training and inference.
Reason #3: Although the transformer model is predominantly associated with NLP, the proposed Attention mechanism boasts broad applicability. Models such as Self-Attention GAN illustrate the utility of global-level reasoning across various tasks. New research on Attention applications emerges regularly.
Further Reading: I highly recommend the BERT and SAGAN papers. The former extends the Transformer model, while the latter applies the Attention mechanism to images within a GAN framework.
Section 2.2: Stop Thinking with Your Head / Reformer (~2020)
Merity, Stephen. "Single Headed Attention RNN: Stop Thinking With Your Head." arXiv preprint arXiv:1911.11423 (2019).
Kitaev, Nikita, Łukasz Kaiser, and Anselm Levskaya. "Reformer: The Efficient Transformer." arXiv preprint arXiv:2001.04451 (2020).
While Transformer/Attention models have gained significant attention, they tend to be resource-intensive and not well-suited for typical consumer hardware. Both papers critique the architecture and propose computationally efficient alternatives to the Attention module. As with the discussion surrounding MobileNet, elegance is key.
Reason #1: "Stop Thinking With Your Head" is an entertaining read, which in itself is a valid reason to explore it.
Reason #2: Large corporations can quickly scale their research across hundreds of GPUs, but most of us cannot. Scaling up model size is not the only route to improvement; making efficient use of existing resources is equally important.
Further Reading: As these papers were published in late 2019 and 2020, there is limited related literature. Consider revisiting the MobileNet paper for additional insights on efficiency.
Section 2.3: Simple Baselines for Human Pose Estimation (2018)
Xiao, Bin, Haiping Wu, and Yichen Wei. "Simple baselines for human pose estimation and tracking." Proceedings of the European conference on computer vision (ECCV). 2018.
In contrast to most papers, which propose novel techniques to push the state of the art, this paper argues that a straightforward model built on current best practices can be surprisingly effective. The authors introduced a human pose estimation network that consists solely of a backbone network followed by three deconvolution layers. At the time, their method achieved the best results on the COCO benchmark despite its simplicity.
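To see how little machinery this involves, the sketch below traces the spatial size of the feature map through such a head. The numbers are assumptions chosen for illustration: a 256x192 input, a backbone that downsamples by a factor of 32, and deconvolutions with kernel 4, stride 2, and padding 1. Check the paper for the exact configuration.

```python
def deconv_output_size(size, kernel=4, stride=2, padding=1):
    """Output length of a transposed convolution (no output padding, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel


def heatmap_size(height, width, backbone_stride=32, num_deconvs=3):
    """Spatial size after the backbone and a stack of upsampling deconvolutions."""
    h, w = height // backbone_stride, width // backbone_stride
    for _ in range(num_deconvs):
        # With kernel 4, stride 2, padding 1, each deconvolution doubles the resolution.
        h, w = deconv_output_size(h), deconv_output_size(w)
    return h, w


# Assumed input resolution of 256x192 pixels.
print(heatmap_size(256, 192))  # (64, 48)
```

Under these assumptions, the backbone shrinks the image to 8x6 and the three deconvolutions bring it back up to 64x48, the resolution at which keypoint heatmaps would be predicted.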
Reason #1: Simplicity can often yield the most effective results. While we may be tempted to pursue intricate and flashy architectures, a baseline model might be faster to implement and still achieve comparable outcomes. This paper serves as a reminder that not all effective models must be complex.
Reason #2: Scientific progress occurs incrementally. Each new study advances the state-of-the-art, but it doesn't always have to be a linear journey. Sometimes it's beneficial to reevaluate and explore alternative paths.
Reason #3: Proper data augmentation, effective training schedules, and robust problem formulation often hold more significance than many acknowledge.
Further Reading: If you are interested in Pose Estimation, consider reviewing this comprehensive state-of-the-art analysis.
Section 2.4: Bag of Tricks for Image Classification (2019)
He, Tong, et al. "Bag of tricks for image classification with convolutional neural networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
Often, what one needs isn't a groundbreaking new model but rather a set of practical tricks. In many papers, just one or two new techniques may lead to slight performance improvements, yet these are frequently overlooked amid major contributions. This paper compiles a collection of tips used throughout the literature, summarizing them for our benefit.
Reason #1: Most tips are simple to apply.
Reason #2: There is a high likelihood that you are unaware of several of these techniques; they are not the typical "use ELU" suggestions.
Further Reading: Numerous other tricks exist, some tailored to specific problems and others more general. One topic that deserves more attention is the use of class and sample weights. Consider reading this paper on class weights for unbalanced datasets.
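As a small taste of class weighting, here is a minimal sketch of one common heuristic: weight each class inversely to its frequency, using n_samples / (n_classes * class_count). This is the same formula scikit-learn uses for its "balanced" option. It is not necessarily the scheme proposed in the paper mentioned above, and the toy labels are made up.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count of each class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical imbalanced dataset: 90 cats, 10 dogs.
labels = ["cat"] * 90 + ["dog"] * 10
weights = balanced_class_weights(labels)

for cls, weight in sorted(weights.items()):
    print(f"{cls}: {weight:.3f}")
# cat: 0.556
# dog: 5.000
```

Multiplying each sample's loss by the weight of its class makes the 10 dog examples contribute as much to the total loss as the 90 cat examples.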
With these key papers and further reading suggestions, I believe you have a wealth of material to explore. This list is by no means exhaustive, but I have aimed to highlight some of the most insightful and influential works I have encountered. Please share any additional papers you think should be included.
Happy reading!