Last revised 12-14-98. (minor)
Circa 1966, most programs ran on computers without virtual memory. Load modules were read in, and then execution began. A sequentially read file would typically be buffered so that much of the disc latency was masked by computation.
Then came time-sharing systems with demand-paged virtual memory, which was hyped as the future.
Circa 1970 it was realized that demand paging had real performance disadvantages. Each page fault led to a disc latency. The pages of a load module were brought in one page at a time, each incurring that latency, which usually took much longer than reading the whole load module file in a single operation. Similarly, a data file on disc could be mapped into the virtual address space, but when it was read sequentially there was a disc latency for each page fault; there was no buffering.
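The arithmetic behind this complaint can be sketched. The figures below are illustrative assumptions, not measurements from the period: a 256 KB load module, 4 KB pages, 25 ms of disc latency per separate request, and an assumed 5 ms of extra transfer time when the module is read in one sequential operation.

```python
# Illustrative assumptions, not period measurements.
PAGE = 4 * 1024            # page size in bytes
MODULE = 256 * 1024        # load module size in bytes
LATENCY_MS = 25            # seek + rotational delay per disc request
SEQ_TRANSFER_MS = 5        # extra transfer time for one big sequential read

pages = MODULE // PAGE                  # number of pages in the module
demand_paged_ms = pages * LATENCY_MS    # one full disc latency per page fault
single_read_ms = LATENCY_MS + SEQ_TRANSFER_MS  # one latency, then streaming

print(pages, demand_paged_ms, single_read_ms)
```

With these numbers, faulting the module in a page at a time pays the disc latency 64 times, while a single sequential read pays it once; the ratio, not the particular milliseconds, is the point.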
Back then, the remedy was to revert to what had been done before paging: fully loading a load module before beginning execution, and buffering sequentially read files. Today, I think the analogous thing is to treat the lowest-level SRAM cache as main memory and the DRAM as a peripheral.
Many of today's cache designs use the equivalent of demand paging. For efficiency, accesses to the next lower level of the memory hierarchy should be grouped to minimize latency. Also, what is overwritten in the current level should be data (or code) that won't be needed for some time, which is often different from the least recently used data (or code).
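The gap between "least recently used" and "not needed for some time" can be made concrete with a toy fully associative cache simulator. The sketch below is my own illustration, not any particular hardware's policy: it counts misses under LRU and under a farthest-reuse policy (Belady's OPT, which needs future knowledge and so is only a bound) on a cyclic sweep over five blocks with room for four, the classic pattern where LRU always discards exactly the block needed next.

```python
def simulate(trace, capacity, evict):
    """Count misses for a fully associative cache.  `evict(cache, i, trace)`
    picks the resident block to discard when the cache is full at access i."""
    cache, misses = [], 0
    for i, block in enumerate(trace):
        if block in cache:
            cache.remove(block)        # refresh recency: move to the back
            cache.append(block)
            continue
        misses += 1
        if len(cache) == capacity:
            cache.remove(evict(cache, i, trace))
        cache.append(block)
    return misses

def lru(cache, i, trace):
    return cache[0]                    # least recently used sits at the front

def farthest_reuse(cache, i, trace):
    # Belady's OPT: discard the block whose next use is farthest away
    # (or that is never used again).  Requires knowing the future.
    future = trace[i + 1:]
    def next_use(b):
        return future.index(b) if b in future else len(future) + 1
    return max(cache, key=next_use)

# Cyclic sweep: 5 blocks, capacity 4, repeated 6 times.
trace = [0, 1, 2, 3, 4] * 6
lru_misses = simulate(trace, 4, lru)
opt_misses = simulate(trace, 4, farthest_reuse)
print(lru_misses, opt_misses)
```

Under LRU every one of the 30 accesses misses, while farthest-reuse keeps most of the working set resident; real caches can't see the future, but prefetch hints and non-LRU insertion policies try to approximate this behavior for streaming access patterns.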