PHASE II: Undervolting and Stress-testing for stability
Undervolting is a very repetitive activity, as every drop in voltage must be stress-tested to confirm that it is stable. There are some shortcuts though: my prior experience with undervolting has given me some insight into how low a CPU can be safely undervolted at a given clock speed.
But first, let's take a look at the PHC controls and what the numbers mean.
In a terminal window, type:
Code: Select all
cat /sys/devices/system/cpu/cpu*/cpufreq/phc_default_controls
The output on my X61 with T8100 is:
Code: Select all
75:41 74:34 8:28 6:23 136:17
75:41 74:34 8:28 6:23 136:17
It's a sequence of pairs of numbers. Each pair represents a Frequency ID (FID) and a Voltage ID (VID).
The FIDs correspond to the frequencies found in /sys/devices/system/cpu/cpu*/cpufreq/scaling_available_frequencies, so run
Code: Select all
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_available_frequencies
to show those frequencies. In my X61, this outputs:
Code: Select all
2101000 2100000 1600000 1200000 800000
2101000 2100000 1600000 1200000 800000
To be honest, the FIDs aren't that important, as the clock speed will be controlled by manipulating /sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq.
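To make the relationship between the two files easier to see, here is a minimal sketch that lines up each frequency with its FID and VID. The function name pair_freqs is made up for this example, and it assumes both files list their entries in the same order, as they do in the output from my X61 above.

```shell
# Hypothetical helper: print "frequency FID VID", one line per clock speed.
# $1 = one line from scaling_available_frequencies
# $2 = one line from phc_default_controls (FID:VID pairs)
# Assumption: both lists are in the same order.
pair_freqs() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { n = split($0, freq, " "); next }   # remember the frequencies
        NR == 2 {
            for (i = 1; i <= n; i++) {
                split($i, pair, ":")                 # pair[1] = FID, pair[2] = VID
                print freq[i], pair[1], pair[2]
            }
        }'
}

# Example usage, reading cpu0 only:
# d=/sys/devices/system/cpu/cpu0/cpufreq
# pair_freqs "$(cat $d/scaling_available_frequencies)" "$(cat $d/phc_default_controls)"
```

With the X61 values above, the first line printed would be the 2101000 entry with FID 75 and VID 41.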
As for the voltage IDs, there's another set of controls for them.
/sys/devices/system/cpu/cpu0/cpufreq/phc_default_vids contains the default VIDs for cpu0. It can be read by the cat command:
Code: Select all
cat /sys/devices/system/cpu/cpu0/cpufreq/phc_default_vids
Output in my X61:
41 34 28 23 17
As can be seen, these are the same numbers as the VID part (the number after the colon) of each pair in phc_default_controls:
75:41 74:34 8:28 6:23 136:17
/sys/devices/system/cpu/cpu0/cpufreq/phc_vids shows the current VIDs for cpu0, and can be edited to control the voltages. This can be done with an echo command to phc_vids:
The following code is an example, do not run it!!
Code: Select all
echo 40 33 27 22 17 > /sys/devices/system/cpu/cpu0/cpufreq/phc_vids
Note that this only changes the VIDs in cpu0. In practice, voltage controls for all cores are changed at the same time, which can be done using a for loop.
Again, the following code is an example, do not run it!!
Code: Select all
for i in 0 1 ; do echo 40 33 27 22 17 > /sys/devices/system/cpu/cpu$i/cpufreq/phc_vids; done
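The loop above hardcodes cores 0 and 1. As a sketch only, the same idea can be wrapped in a function that finds every core by itself. The name apply_vids and the optional second argument (a directory to use instead of the real sysfs path, handy for a dry run) are my own inventions for this example. Defining the function changes nothing; voltages are only written when it is called as root.

```shell
# Hypothetical helper: write one VID list to every core's phc_vids file.
# $1 = VID list, e.g. "41 34 28 23 17"
# $2 = CPU directory (optional; defaults to the real sysfs path)
apply_vids() {
    vids=$1
    base=${2:-/sys/devices/system/cpu}
    for f in "$base"/cpu[0-9]*/cpufreq/phc_vids; do
        [ -e "$f" ] || continue      # skip if the pattern matched nothing
        echo "$vids" > "$f"
    done
}

# Example (as root), restoring the default VIDs of my X61:
# apply_vids "41 34 28 23 17"
```

The cpu[0-9]* pattern is there so that only the per-core directories (cpu0, cpu1, ...) are touched.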
Now, new voltages can be entered, but the CPU won't necessarily adhere to them. Another program is needed to read the CPU's registers to check its status: read_msr, which can be found here:
http://www.linux-phc.org/forum/viewtopic.php?f=13&t=13
Download read_msr_0.2pre3.tar.bz2, unpack it to a directory of your choice, open a terminal window in said directory, then make read_msr.py executable with chmod +x read_msr.py. This program requires root privileges, so su or sudo is required.
To check the FID/VID status, run read_msr.py as root.
Sample output from my X61:
Code: Select all
MSRTOOL V0.2pre-3 started...
[cpu1] [CURRENT] FID:8 HID:0 DID:0 VID:17
[cpu1] [TARGET] FID:8 HID:1 DID:0 VID:17
[cpu1] [HIGHEST] FID:10 (HID:0 DID:1) VID:34 (not sure if they exist here)
[cpu1] [LOWEST] FID:6 (HID:0 DID:0) VID:23 (not sure if they exist here)
[cpu1] [SLFM] FID:8 VID:17
[cpu1] [IDA] FID:11 VID:41
[cpu1] [CURRENTLY ACTIVE FEATURES] IDA:0 EIST:1
[cpu0] [CURRENT] FID:8 HID:0 DID:0 VID:17
[cpu0] [TARGET] FID:8 HID:0 DID:0 VID:17
[cpu0] [HIGHEST] FID:11 (HID:0 DID:1) VID:41 (not sure if they exist here)
[cpu0] [LOWEST] FID:6 (HID:0 DID:0) VID:23 (not sure if they exist here)
[cpu0] [SLFM] FID:8 VID:17
[cpu0] [IDA] FID:11 VID:41
[cpu0] [CURRENTLY ACTIVE FEATURES] IDA:0 EIST:1
Concentrate on the [CURRENT] and [TARGET] lines. In the course of my experiments, I tried setting the VID to 0; it appeared in the [TARGET] field, but [CURRENT] showed a different value. Usually, the VID can only go as low as the VID of the lowest clock speed, which means the lowest clock speed cannot be undervolted any further. This is the reason why my method begins at the second-lowest frequency.
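Comparing those two lines by eye gets tedious after a few rounds. Here is a rough sketch that does the comparison for you. It assumes the text format shown in the sample output above, and the function name check_msr is made up for this example. It only compares FID and VID, so a difference in HID alone (like cpu1 in my sample) is not flagged.

```shell
# Hypothetical helper: read read_msr output on stdin and compare the
# FID/VID of the [CURRENT] and [TARGET] lines for each core.
# Exit status is 0 if every core matches, 1 otherwise.
check_msr() {
    awk '
        $2 == "[CURRENT]" || $2 == "[TARGET]" {
            fid = ""; vid = ""
            for (i = 3; i <= NF; i++) {
                if ($i ~ /^FID:/) fid = substr($i, 5)
                if ($i ~ /^VID:/) vid = substr($i, 5)
            }
            if ($2 == "[CURRENT]") cur[$1] = fid "/" vid
            else                   tgt[$1] = fid "/" vid
        }
        END {
            bad = 0
            for (c in cur) {
                if (cur[c] == tgt[c]) print c, "OK", cur[c]
                else { print c, "MISMATCH", cur[c], "vs", tgt[c]; bad = 1 }
            }
            exit bad
        }'
}

# Usage (as root): pipe the output of read_msr into it, e.g.
# (your read_msr command) | check_msr
```

Each core gets one line showing FID/VID, marked OK or MISMATCH.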
Next is the final piece of software needed: the stress tester.
I use mprime, which can be found here:
https://www.mersenne.org/download/
Grab the 64-bit if you're running 64-bit Linux, 32-bit otherwise.
Unpack the archive to a directory of your choice, then open a terminal window on said directory.
Now, let's get on with the actual undervolting!!!
First off, we need the following windows open:
1. root terminal window on read_msr.py's directory (if you didn't close it) (henceforth Terminal1)
2. terminal window for mprime (we just did that) (henceforth Terminal2)
3. hardware monitor software to keep track of temperatures (I use psensor)
4. (optional) another root terminal window for clock/voltage control. Doing so compartmentalizes the controls, allowing a clearer view of the current status at any given time. (henceforth Terminal3)
As I've mentioned earlier, the undervolting and stress-testing part is repetitive and time consuming. Set aside an entire afternoon for this.
Another thing: get a pen and paper and write down the default VIDs in a single row. Then as the undervolting proceeds, jot down the VIDs tested for the current clock speed in a column below the default VID. At the start, it will look like:
As the undervolting reaches the higher clocks, it will look more like:
Code: Select all
41 34 28 23 17
37 30 26 20
34 28 24 19
31 26 22 18
29 25 21
27 19
Now we're ready.
1. Check the available clock speed. Do this on either Terminal1 or Terminal3:
Code: Select all
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_available_frequencies
Select the second to the last frequency, then set it as the maximum frequency:
Code: Select all
for i in 0 1 ; do echo (put frequency here and remove the parentheses) > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_max_freq; done
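If you'd rather not pick the frequency by eye, this small sketch pulls the second-to-last entry out of the list. The function name second_lowest_freq is my own, and it assumes the list is sorted from highest to lowest, as in the output from my X61.

```shell
# Hypothetical helper: print the second-to-last frequency in the list.
# $1 = one line from scaling_available_frequencies
# Assumption: the list is sorted from highest to lowest.
second_lowest_freq() {
    echo "$1" | awk 'NF >= 2 { print $(NF - 1) }'
}

# Example usage (as root):
# f=$(second_lowest_freq "$(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies)")
# for i in 0 1 ; do echo $f > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_max_freq; done
```

On my X61 this picks 1200000.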
Get the current VIDs:
Code: Select all
cat /sys/devices/system/cpu/cpu0/cpufreq/phc_vids
Example from my X61:
2. Now, to undervolt, choose a VID slightly lower than the current one. Since we're starting at the second-lowest frequency, this corresponds to the second VID from the right. But how low can we go? Can shortcuts be taken? My experience says yes(!!)
Penryn CPUs (X61 with T8xxx or T9xxx CPUs) are quite frugal with voltages. In my experience, they can use the lowest VID (the rightmost one) up to 1.6GHz. After that it's +2 or +3 VIDs for every succeeding clock speed.
Merom CPUs (X60/X61 with T5xxx or T7xxx CPUs) aren't as frugal. I started by subtracting 3 from the default VID, then 2 for the next, then 1 per round until the stress-test fails. When it fails, I add 2 to the VID and stress-test again. Once I'm satisfied, I move on to the next clock speed and repeat the cycle.
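The Merom routine is just arithmetic, so here is a small sketch of it. It rests on my reading that the drops are cumulative within one clock speed: 3 below default in round 1, 2 more in round 2, then 1 more per round. The function names merom_candidate and retreat_after_failure are invented for this example.

```shell
# Hypothetical helper: VID to try in a given round for one clock speed.
# $1 = default VID, $2 = round number (1, 2, 3, ...)
# Assumed schedule: -3 in round 1, a further -2 in round 2, then -1 per round.
merom_candidate() {
    default=$1
    round=$2
    if [ "$round" -le 1 ]; then
        drop=3
    else
        drop=$((round + 3))      # round 2 -> 5, round 3 -> 6, ...
    fi
    echo $((default - drop))
}

# Hypothetical helper: VID to fall back to after a failed stress-test.
# $1 = the VID that failed
retreat_after_failure() {
    echo $(($1 + 2))
}
```

For a default VID of 34, this gives 31, 29, 28, 27 and so on. If 28 fails the stress-test, the fallback is 30.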
Now, back to undervolting...
Change only the second-to-last VID, as it corresponds to the second-slowest clock speed, which we're testing at this stage:
Going back to my example from my X61:
We change it to
Then feed the entire line to the command below:
Code: Select all
for i in 0 1 ; do echo (put VID list here minus the parentheses) > /sys/devices/system/cpu/cpu$i/cpufreq/phc_vids; done
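Retyping the whole VID list every round invites typos. As a sketch, this function swaps out a single VID counted from the right and prints the new list, without writing anything to the CPU. The name set_vid_from_right is made up, and the value 20 in the usage line is only an illustration, not a recommendation.

```shell
# Hypothetical helper: replace one VID in the list and print the result.
# $1 = VID list, $2 = position counted from the right (1 = rightmost),
# $3 = new VID. Returns non-zero if the position is out of range.
set_vid_from_right() {
    echo "$1" | awk -v pos="$2" -v new="$3" '{
        idx = NF - pos + 1
        if (idx < 1 || idx > NF) exit 1
        $idx = new
        print
    }'
}

# Example: change only the second VID from the right.
# set_vid_from_right "41 34 28 23 17" 2 20
```

The printed line can then be fed to the echo loop above.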
3. On Terminal2, start mprime by running ./mprime from the directory it was unpacked to.
On first run, it will ask to join GIMPS. Answer N because we're just stress-testing.
Next it asks for the number of torture test threads to run. It auto-detects the number of CPU cores so just press Enter.
Next it asks for the type of torture test to run. Select #1 and press Enter.
Next it asks whether to customize settings, answer N. It also asks to run weaker torture tests, answer N.
Finally it asks to accept all the above answers, answer Y.
Then it starts stress-testing.
While testing, go to Terminal1 and check the VIDs with read_msr.
The [CURRENT] and [TARGET] lines must be the same, especially the FIDs and VIDs. This means that the CPU is running at the specified clock speed and using the intended voltage.
Keep the stress-test going for at least 30 minutes. For the lower speeds I even test for up to an hour; because of the low clock speed, the temperature shouldn't get high enough to be cause for worry. If the CPU temperature exceeds 70°C at low clock speeds, the heatsink may need cleaning and repasting.
4. If the stress-test shows no error after 30 minutes to 1 hour, it's time to stop the stress-test.
Go to Terminal2 then press Ctrl-C. mprime will then show this menu:
Code: Select all
Hit enter to continue:
Main Menu
1. Test/Primenet
2. Test/Worker threads
3. Test/Status
4. Test/Continue
5. Test/Exit
6. Advanced/Test
7. Advanced/Time
8. Advanced/P-1
9. Advanced/ECM
10. Advanced/Manual Communication
11. Advanced/Unreserve Exponent
12. Advanced/Quit Gimps
13. Options/CPU
14. Options/Preferences
15. Options/Torture Test
16. Options/Benchmark
17. Help/About
18. Help/About PrimeNet Server
Your choice:
Leave it as it is for now.
5. Now the VID can be lowered again. Go back to Terminal1, and lower the VID again as described in Step 2.
6. To restart the stress-test, go back to Terminal2, type 15, then press Enter. Again, it will ask the same questions, starting from the number of torture test threads to use. While stress-testing, run read_msr again to verify that the new VID is accepted. Don't forget to jot down the VIDs on that piece of paper.
NOTE
When the undervolt is unstable, there will be symptoms aside from mprime spitting out an error. Crashes, hangs, and even sudden restarts can be experienced. That's why it's best to undervolt on a system without any important data in it, as a sudden restart may cause data corruption on the hard drive.
7. At this point, that's pretty much it. Once the lowest stable voltage is found, we can opt to add 1 or 2 to that VID as a safety margin, then move on to the next frequency. As we go up the frequency ladder, the system temperature will also increase. If it reaches 90°C at any point, cancel the stress-test. The heatsink may need cleaning and repasting, or the test may need to be done in a colder room.
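I watch temperatures with psensor, but the 90°C cutoff can also be checked from a terminal. This is a sketch only. It assumes the kernel exposes the CPU temperature in millidegrees Celsius at /sys/class/thermal/thermal_zone0/temp. Which thermal zone is the CPU varies between machines, so check yours first. The function name temp_over_limit is made up.

```shell
# Hypothetical helper: succeed if a temperature is at or above a limit.
# $1 = temperature in millidegrees Celsius (the unit sysfs thermal zones use)
# $2 = limit in degrees Celsius
temp_over_limit() {
    [ "$1" -ge $(($2 * 1000)) ]
}

# Example watch loop. Assumption: thermal_zone0 is the CPU sensor here.
# while sleep 5; do
#     t=$(cat /sys/class/thermal/thermal_zone0/temp)
#     temp_over_limit "$t" 90 && echo "90C reached: cancel the stress-test"
# done
```

The loop only prints a warning; stopping mprime is still done by hand on Terminal2.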
IMPORTANT NOTE
On CPUs with an odd max frequency like 2101000 or 2501000 (in other words, those that end in 1000), it's difficult to truly test the stability at that frequency. That frequency is actually IDA (Intel Dynamic Acceleration), which is a crude version of the Turbo Boost available in the mobile Core i5/i7 CPUs. It's crude because it only activates when one core is busy and the other one is idle. Because of this, it's hard to force it to activate while running a stress-test without jumping through a few hoops.
Said hoops mean installing the msr-tools package (which provides the rdmsr and wrmsr commands), then creating a script file:
Code: Select all
#!/bin/bash
# Enable-IDA
# YMMV
# disable EIST
# https://askubuntu.com/questions/619875/disabling-intel-turbo-boost-in-ubuntu
wrmsr -p0 0x1a0 0x4000850089
wrmsr -p1 0x1a0 0x4000850089
# enable Dual-IDA
# http://forum.notebookreview.com/threads/how-to-enable-intel-dynamic-acceleration-ida-on-both-cores-of-a-core-2-duo.477704/page-48
wrmsr 0x1a0 0x1364862489
echo 0 > /sys/devices/system/cpu/cpu1/online
rdmsr -p0 0x198
wrmsr 0x199 0xa24
wrmsr 0x1a0 0x5364872489
wrmsr 0x1a0 0x1364862489
rdmsr -p0 0x198
echo 1 > /sys/devices/system/cpu/cpu1/online
rdmsr -p0 0x198
rdmsr -p1 0x198
Open a text editor, copy-paste the code above, then save it as "dual-ida-enable.sh" in a directory of your choice, preferably the same one as read_msr.
Like with read_msr, make it executable with chmod +x dual-ida-enable.sh.
And like read_msr, this requires root privileges to run.
This code effectively disables all the other clock speeds and locks it to the IDA clock. The only way I know to revert this is to reboot the machine.
Check the VIDs, edit the leftmost one, then start stress-testing while checking with read_msr to see if the undervolt was accepted.
Monitor the temperature as well, as it's definitely going to be hot!!
Happy undervolting!!