A Perspective on Future Computer Architecture offers some conjecture on where computer architecture is headed. In particular, much of the superscalar control hardware complexity can be moved into software.
A Perspective on DRAM and Packaging offers some conjecture on what would make an effective year-2000 DRAM and packaging system. Last updated 5-5-04.
A Subsettable Top Level Cache is the text of an issued patent, #6092153. Please search at http://www.uspto.gov for the as-issued version.
Rationale for Multiprocessors on a Chip
The String View of the Memory Hierarchy takes the viewpoint that all data transfers in a memory hierarchy are string transfers, from data files on disc up to the top level cache.
Where to do Memory Mapping (MMU) argues that the MMU (Memory Management Unit) function should be lower in the memory hierarchy, be implemented more like a file system and be mostly in software.
Subsets, A Way To Manage Cache Memory Contents describes how to prevent most thrashing, prefetch data, and make effective use of cache memory.
Another trend is more instructions executed per DRAM cycle time. This makes an efficient cache design all the more important.
Conditional Branches, Interlacing the Paths After describes how a simple instruction fetch unit can provide both paths after a conditional branch.
Future of Computing, It's Multiprocessors describes how multiprocessors will be the optimum path to the compute power needed for low cost graphics.
Hardware Software Codesign Hardware software codesign for demanding applications will often require an efficient memory hierarchy. In many cases, an effective codesign can be implemented as a function added to the arithmetic unit. For example, the MMX extensions to the Pentium can be viewed in this way.
A Not Yet Widely Learned Lesson From the Past (on caches) describes experience with demand paged time-sharing systems of ~30 years ago.
Depending on the application, code and data compression can provide a significant reduction in the size of DRAM and/or flash memory.
On a multiprocessor chip, the individual processors can be relatively simple compared with the highly superscalar designs now coming. For applications with enough parallelism to keep multiple processors busy, I expect the multiple-processors-on-a-chip approach to be 2-3 times faster than the single-processor approach. Graphics applications have a lot of parallelism.
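
As a rough way to reason about the comparison above, the sketch below applies Amdahl's law, which the text does not itself invoke. Every number in it is an assumption chosen for illustration: four simple cores, a superscalar core taken to be 1.5 times faster than one simple core, and a range of parallel fractions. The function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Speedup of n identical processors over one, per Amdahl's law."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


def chip_vs_superscalar(parallel_fraction, n_simple_cores, superscalar_advantage):
    """Throughput of n simple cores relative to one superscalar core.

    superscalar_advantage is an ASSUMED ratio: how much faster the single
    superscalar core is than one simple core on the same code.
    """
    return amdahl_speedup(parallel_fraction, n_simple_cores) / superscalar_advantage


if __name__ == "__main__":
    # Assumed illustrative values, not figures from the article.
    for p in (0.5, 0.9, 0.95, 0.99):
        ratio = chip_vs_superscalar(p, n_simple_cores=4, superscalar_advantage=1.5)
        print(f"parallel fraction {p:.2f}: multiprocessor / superscalar = {ratio:.2f}")
```

With these assumed inputs, a parallel fraction of 0.95 gives a ratio of about 2.3, and the ratio shrinks as the parallel fraction drops, which is why the estimate is limited to applications with enough parallelism.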