Computer Science
- Teach Yourself CS: I especially recommend the computer architecture course.
- A Relational Model of Data for Large Shared Data Banks: Edgar F. Codd demonstrates that formal logic and set theory can be used to ensure data integrity. This paper led to SQL and essentially all modern RDBMSs.
- Fast Algorithms for Mining Association Rules: Seminal research on discovering associations between items (e.g. products that are often bought together) in massive transaction datasets; introduces the Apriori algorithm.
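- The Hadoop Distributed File System: How HDFS stores massive datasets across clusters of commodity machines by splitting files into large blocks and replicating each block on several nodes.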
- Resilient Distributed Datasets: The core distributed computing abstraction used in Apache Spark.
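To give a feel for the association-rule paper, here is a minimal Python sketch of Apriori's frequent-itemset phase. It assumes each transaction is a set of hashable items and that `min_support` is an absolute transaction count (the paper states support as a fraction of transactions). The function name `apriori` and the sample baskets are hypothetical, for illustration only. Candidate generation here unions pairs of frequent (k-1)-itemsets and then applies the paper's pruning rule (every (k-1)-subset must be frequent). This is a simplification of the paper's sorted-prefix join, but after pruning it yields the same candidate set.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every frequent itemset.

    Assumption: min_support is an absolute number of transactions.
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items to find the frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune: an itemset can only be frequent if all its
            # (k-1)-subsets are frequent (the Apriori property).
            if all(frozenset(sub) in frequent for sub in combinations(union, k - 1)):
                candidates.add(union)

        # One pass over the data to count each surviving candidate.
        cand_counts = {cand: 0 for cand in candidates}
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    cand_counts[cand] += 1

        frequent = {s: n for s, n in cand_counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
itemsets = apriori(baskets, min_support=3)

# Confidence of the rule {beer} -> {diapers} = support(beer, diapers) / support(beer).
confidence = itemsets[frozenset({"beer", "diapers"})] / itemsets[frozenset({"beer"})]
print(confidence)  # 1.0
```

With a threshold of 3 baskets, the candidate {bread, milk, diapers} survives pruning but appears in only 2 baskets, so the search stops at pairs. Rules are then read off the frequent itemsets by dividing supports, as in the last line.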