Protecting Memory in Microprocessor Systems


Memory Errors and Protection in Advanced Microprocessor Systems

When Intel launched the first commercial microprocessor, the Intel 4004, in 1971, few could have foreseen how drastically this technology would change our lives. Microprocessors grew faster and more powerful in the decades that followed, while also becoming much smaller and more affordable. Today, microprocessors are at the heart of our digital world.

Multiple microprocessors are at the core of every large data center and supercomputer at work in business and industry. For example, Watson, IBM’s Jeopardy-playing supercomputer, includes 2,880 processor cores and can perform up to 80 trillion operations a second. Each microprocessor in such a system has onboard memory in the form of RAM and ROM, while also being part of a larger system with additional memory devices.

Wherever memory is found – on the microprocessor or in SRAM or DRAM memory modules within the system – there is a risk of errors. Memory errors can be either hard or soft. Hard errors are caused by manufacturing defects, connection issues, or other physical hardware problems, and they recur until the faulty part is repaired or replaced. Soft errors, on the other hand, are random, transient events caused by electrical disturbances, such as interference from nearby components or strikes from cosmic rays and alpha particles. While soft errors are more frequent, they do not damage the hardware, so they are less likely to cause a lasting failure.

As applications become increasingly complex and memory-intensive, the risk of memory errors also grows. Operators of multiprocessor systems must add memory capacity to their servers to support these apps, while the storage density of the memory devices themselves also increases. More memory means more opportunities for errors. This is where the concept of memory protection comes into play.

The basic strategy behind memory protection is to manage how memory is allowed to be accessed by the system. Memory management proactively helps to avoid storage violations that can potentially damage a portion of the memory or the data that it contains. This type of memory protection can prevent any single application from taking control of a large amount of memory and causing damage to other apps, memory, or data on the system.
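
The idea of controlling how memory may be accessed can be illustrated with a small Python sketch. This is a toy model, not how any particular operating system or memory controller works: the names ProtectedMemory, ProtectionFault, and grant are hypothetical and chosen only for illustration. Each application is granted one region, and any read or write outside that region is refused.

```python
class ProtectionFault(Exception):
    """Raised when an application touches memory it does not own."""


class ProtectedMemory:
    """Toy model: each application may only access its own region."""

    def __init__(self, size):
        self.cells = [0] * size
        self.regions = {}  # app name -> (start, end), end exclusive

    def grant(self, app, start, length):
        """Assign a region to an app, refusing overlaps with other apps."""
        end = start + length
        if start < 0 or end > len(self.cells):
            raise ValueError("region lies outside physical memory")
        for other, (s, e) in self.regions.items():
            if other != app and start < e and s < end:
                raise ValueError(f"region overlaps memory owned by {other}")
        self.regions[app] = (start, end)

    def _check(self, app, address):
        # An app with no region gets the empty range (0, 0).
        start, end = self.regions.get(app, (0, 0))
        if not (start <= address < end):
            raise ProtectionFault(f"{app} may not access address {address}")

    def write(self, app, address, value):
        self._check(app, address)
        self.cells[address] = value

    def read(self, app, address):
        self._check(app, address)
        return self.cells[address]
```

In this sketch, an application that tries to write into another application's region gets a ProtectionFault and the stored data is left untouched, which is the kind of storage violation the text describes.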

While some memory management and protection features may be built into the operating system, other more sophisticated memory protection strategies can be implemented. These include:

Online Spare Memory Mode

Online spare memory mode allocates a portion of memory to be put aside and made unavailable for normal usage. It sits idle unless and until another channel in the system begins to experience a high rate of memory errors. At this point, data is copied to the spare memory channel while the affected channel is taken offline. There is typically no interruption to system service.
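
The failover behavior described above can be sketched in Python. This is a simplified simulation under stated assumptions, not firmware logic: the class SpareMemorySystem, the method report_correctable_error, and the default threshold of three errors are all hypothetical. Real platforms choose their own thresholds and perform the copy in hardware.

```python
class SpareMemorySystem:
    """Toy model of online spare memory: one idle channel backs up the rest."""

    def __init__(self, active_channels, cells_per_channel, error_threshold=3):
        # Active channels hold data; the spare sits idle until needed.
        self.channels = {i: [0] * cells_per_channel for i in range(active_channels)}
        self.spare = [0] * cells_per_channel
        self.error_counts = {i: 0 for i in range(active_channels)}
        self.error_threshold = error_threshold  # assumed value, platform specific
        self.retired = []  # channels that have been taken offline

    def write(self, channel, offset, value):
        self.channels[channel][offset] = value

    def read(self, channel, offset):
        return self.channels[channel][offset]

    def report_correctable_error(self, channel):
        """Count an error; return True if this call triggered a failover."""
        self.error_counts[channel] += 1
        below_threshold = self.error_counts[channel] < self.error_threshold
        if below_threshold or self.spare is None:
            return False
        # Copy the data to the spare, then route the channel to the spare.
        self.spare[:] = self.channels[channel]
        self.channels[channel] = self.spare
        self.spare = None  # the single spare is now in use
        self.retired.append(channel)
        self.error_counts[channel] = 0
        return True
```

Reads and writes keep using the same channel number after the failover, which mirrors the point that service is typically not interrupted. Once the single spare is consumed, further errors are only counted in this model.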

Mirrored Memory Mode

In mirrored memory mode, the same data is written to two channels at the same time. If one of the memory channels experiences an unrecoverable error, the system looks to the mirrored channel for what it needs. The drawback of mirrored memory mode is that it cuts usable memory capacity in half, because every piece of data is stored twice.
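
A minimal Python sketch of the mirroring idea follows. The names MirroredMemory, mark_failed, and UncorrectableError are hypothetical, and unrecoverable errors are simulated by marking an address as failed; a real memory controller detects them through its error-checking hardware.

```python
class UncorrectableError(Exception):
    """Raised when both copies of a location are unreadable."""


class MirroredMemory:
    """Toy model: every write goes to two channels, reads fall back to the mirror."""

    def __init__(self, size):
        self.primary = [0] * size
        self.mirror = [0] * size
        self.failed = {"primary": set(), "mirror": set()}

    @property
    def installed_cells(self):
        return len(self.primary) + len(self.mirror)

    @property
    def usable_cells(self):
        # Only one copy's worth of capacity is visible to software.
        return len(self.primary)

    def write(self, address, value):
        # The same value is stored on both channels.
        self.primary[address] = value
        self.mirror[address] = value

    def mark_failed(self, channel, address):
        """Simulate an unrecoverable error at one address of one channel."""
        self.failed[channel].add(address)

    def read(self, address):
        if address not in self.failed["primary"]:
            return self.primary[address]
        if address not in self.failed["mirror"]:
            return self.mirror[address]
        raise UncorrectableError(f"both copies of address {address} are bad")
```

The usable_cells and installed_cells properties make the cost visible: the model installs twice as many cells as software can use.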

Lockstep Memory Mode

This strategy uses two memory channels working in “lockstep” as a single memory channel. Error detection and correction can happen concurrently in lockstep memory mode. It is the most reliable memory protection option, yet, like mirrored memory mode, it uses more resources and can cut memory capacity by up to one-third.
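
The pairing of two channels into one logical channel can be sketched in Python. This is a deliberately reduced model with hypothetical names (LockstepMemory, LockstepError): each 16-bit word is split across two channels at the same address, and a single parity bit over the combined word stands in for the error-checking code. Parity can only detect a single-bit error, not correct it; real lockstep implementations use far stronger error-correcting codes, which this sketch does not attempt to reproduce.

```python
class LockstepError(Exception):
    """Raised when the combined word fails its integrity check."""


def parity(value):
    """Return 1 if the value has an odd number of set bits, else 0."""
    return bin(value).count("1") % 2


class LockstepMemory:
    """Toy model: two channels are always accessed together as one."""

    def __init__(self, size):
        self.channel_a = [0] * size  # holds the high byte of each word
        self.channel_b = [0] * size  # holds the low byte of each word
        self.check = [0] * size      # parity bit over the full 16-bit word

    def write(self, address, word):
        if not 0 <= word <= 0xFFFF:
            raise ValueError("word must fit in 16 bits")
        # Both channels are written at the same address in the same step.
        self.channel_a[address] = word >> 8
        self.channel_b[address] = word & 0xFF
        self.check[address] = parity(word)

    def read(self, address):
        # Both halves are fetched together and checked as one unit.
        word = (self.channel_a[address] << 8) | self.channel_b[address]
        if parity(word) != self.check[address]:
            raise LockstepError(f"bit error detected at address {address}")
        return word
```

Because the check covers the word formed from both channels, a flipped bit on either channel is caught when the word is read back.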