Bad Memory Gone Good?
Posted: Sun Jun 12, 2011 11:09 pm
by elray
Last week one of my X31s started acting strange, locking up - not during heavy compute cycles or disk writes, and not similar to WiFi problems I've seen in the past, with the fan blowing hot air at reasonable intervals.
I loaded UBCD to run the Windows Memory Diagnostic; I was not surprised to see failures quickly for the "LRAND, Stride6 and WMATS+" tests, showing consistent single-bit errors, while the "WINVC" succeeded. Not that I know what any of that means - when I googled the results, I still didn't. So I let the test continue to run.
120 hours / 6 days later, now the memory passes all four tests.
Can SDRAM heal itself?
Re: Bad Memory Gone Good?
Posted: Sun Jun 12, 2011 11:15 pm
by Harryc
I would say that it is not likely that RAM would 'heal itself'. It is more likely that you have a dirty or loose contact. Take the modules out and clean the contacts with a white pen eraser.
Re: Bad Memory Gone Good?
Posted: Mon Jun 13, 2011 2:01 am
by rkawakami
Memory really cannot heal itself, although there are certain conditions which may introduce failures or, alternately, "fix" them. Besides what Harry mentions, other factors can be temperature, system voltages and even the data that's stored in memory. Given that it appears you didn't do anything physical to disturb the memory module(s), I'd have to say you experienced an intermittent memory error. The Windows Memory Diagnostic, like most such programs, is designed to run a series of different tests in order to stress the memory. Particular addressing and data sequences lend themselves to detecting various failure mechanisms. In the case of WMD, I could not find any descriptions of the tests (patterns) that they use. I can only assume that MS is using fairly standard memory test algorithms (March or Moving Inversion patterns), along with a combination of data sequences in an effort to adequately test the memory.
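To give a feel for what a March-type pattern looks like, here is a minimal sketch of the textbook March C- sequence run against a simulated memory. It is not what WMD actually does (those patterns are undocumented, as noted above); the `FaultyMemory` class, its stuck-at-0 bit and the 8-bit word width are all assumptions made up for the demonstration.

```python
def march_c_minus(read, write, size, width=8):
    """Run one March C- pass over `size` words.

    Returns a list of (address, expected, got) tuples for every bad read.
    """
    zeros, ones = 0, (1 << width) - 1
    errors = []

    def check(addr, expected):
        got = read(addr)
        if got != expected:
            errors.append((addr, expected, got))

    up = range(size)
    down = range(size - 1, -1, -1)

    for a in up:                 # M0: write 0 to every word
        write(a, zeros)
    for a in up:                 # M1: ascending, read 0 then write 1
        check(a, zeros)
        write(a, ones)
    for a in up:                 # M2: ascending, read 1 then write 0
        check(a, ones)
        write(a, zeros)
    for a in down:               # M3: descending, read 0 then write 1
        check(a, zeros)
        write(a, ones)
    for a in down:               # M4: descending, read 1 then write 0
        check(a, ones)
        write(a, zeros)
    for a in up:                 # M5: final read of 0
        check(a, zeros)
    return errors


class FaultyMemory:
    """Simulated memory with one bit stuck at 0 (hypothetical fault)."""

    def __init__(self, size, bad_addr, bad_bit):
        self.cells = [0] * size
        self.bad_addr = bad_addr
        self.bad_bit = bad_bit

    def read(self, addr):
        return self.cells[addr]

    def write(self, addr, value):
        if addr == self.bad_addr:
            value &= ~(1 << self.bad_bit)  # the stuck bit never stores a 1
        self.cells[addr] = value


mem = FaultyMemory(size=16, bad_addr=5, bad_bit=3)
for addr, expected, got in march_c_minus(mem.read, mem.write, 16):
    print(f"addr {addr}: expected {expected:#04x}, got {got:#04x}")
```

A real diagnostic works on physical addresses with caches out of the way and repeats the pass with several data backgrounds; the simulation only shows the read/write ordering that gives March tests their name.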
If you saw single-bit errors and if they always occurred at the same address, then I'd have to say that it's a "weak" bit. Such errors generally are caused by the sequence of data that is used and/or the temperature the module is operating at. A contact problem would be seen as multiple failures, typically on the same data bit (if the contact problem were on a data input/output pin) or a whole block of addresses (if the contact issue were on an address or clock pin). If it were me, I'd toss the offending module as the error might come back to bite you again. This assumes you are able to repeat the error. In order to cover all of the bases, I'd run the diagnostic with the system in the coldest area you would ever use it in, along with the warmest. There are different failure modes in DRAMs that can occur when the memory is either cold or hot.
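The rule of thumb above can be sketched as a small classifier over an error log. This is only an illustration: the `(address, expected, got)` log format, the `classify_errors` name, the labels and the `block_threshold` of 16 addresses are assumptions, not something any particular diagnostic produces.

```python
def classify_errors(errors, block_threshold=16):
    """Roughly classify a list of (address, expected, got) failures.

    Heuristic only: it follows the rule of thumb in the post, nothing more.
    """
    if not errors:
        return "no errors"

    addrs = sorted({addr for addr, _, _ in errors})

    # Collect every bit position that differed in any failing read.
    bad_bits = set()
    for _, expected, got in errors:
        diff = expected ^ got
        bad_bits.update(i for i in range(diff.bit_length()) if (diff >> i) & 1)

    if len(bad_bits) == 1:
        # Same single bit every time: one address looks like a weak cell,
        # many addresses look like a data pin problem.
        return "weak bit" if len(addrs) == 1 else "data-pin contact?"

    # Several bits wrong across an unbroken run of addresses.
    contiguous = addrs[-1] - addrs[0] + 1 == len(addrs)
    if contiguous and len(addrs) >= block_threshold:
        return "address/clock-pin contact?"

    return "unclassified"


print(classify_errors([(5, 0xFF, 0xF7), (5, 0xFF, 0xF7)]))
print(classify_errors([(a, 0xFF, 0xF7) for a in (1, 9, 40)]))
print(classify_errors([(a, 0xFF, 0x00) for a in range(32, 64)]))
```

The question marks in the labels are deliberate: a pattern like this narrows down where to look, but only reseating or swapping the module confirms it.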
Also, I would not suggest using a pen eraser. That can be abrasive on the contacts. Use a pencil (soft rubber) eraser as that is usually able to get rid of any dirt/contamination without scrubbing the gold traces on the edge of the module too hard.