After playing around with the individual drive and array defaults I found that the performance can be improved substantially with a few easy tweaks.
Useful resources
I have looked long and hard for formulas that give sensible results. Ultimately, though, it comes down to testing each value and running a benchmark to measure the difference it makes in your environment.
A few settings that can be adjusted are listed below, in the order in which I would apply them during benchmarking. The settings with the biggest impact come first and those with smaller impact come last, according to my findings.
Settings applied to the mdadm RAID array with defaults on my system (Ubuntu 12.04):
Command to apply setting | Default value | Tweaked value | Description |
---|---|---|---|
blockdev --setra 20480 /dev/md126 | 8192 | 102400 | Read-ahead (512-byte sectors) |
echo 5120 > /sys/block/md126/md/stripe_cache_size | 256 | 5120 | Stripe cache size (entries) |
echo 100000 > /sys/block/md126/md/sync_speed_max | ? | 100000 | Maximum resync/rebuild speed (KiB/s) |
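The array-level commands can be wrapped in a small shell function so the same values are easy to reapply. This is a minimal sketch, not the script used for this post. The function names tune_md_array and set_md_readahead and the SYSFS_ROOT override (which defaults to /sys and lets the function be dry-run against a scratch directory) are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: apply array-level md tuning values.
# SYSFS_ROOT is an assumed override; it defaults to the real /sys.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Usage: tune_md_array <md-device-name> <stripe_cache_size>
tune_md_array() {
    md_name=$1
    stripe_entries=$2
    md_dir="$SYSFS_ROOT/block/$md_name/md"
    # Refuse to continue if the array's md directory does not exist.
    if [ ! -d "$md_dir" ]; then
        echo "tune_md_array: $md_dir not found" >&2
        return 1
    fi
    # Number of stripe cache entries (each holds one page per member disk).
    echo "$stripe_entries" > "$md_dir/stripe_cache_size"
}

# Usage: set_md_readahead <md-device-name> <sectors>
# blockdev --setra takes the read-ahead in 512-byte sectors.
set_md_readahead() {
    blockdev --setra "$2" "/dev/$1"
}
```

On the real system this would be run as root, for example `tune_md_array md126 5120` followed by `set_md_readahead md126 20480` (the value from the command column above).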
It is important to keep in mind that the stripe cache uses a portion of RAM. For example, with stripe_cache_size set to 32768, an mdadm RAID array such as mine uses:
stripe_cache_size * page size * number of disks
= 32768 * 4 KiB * 4 (active disks)
= 512 MiB of RAM
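The same arithmetic can be done in a one-line shell function, which makes it quick to check the RAM cost of each candidate value before testing it. This is a sketch; the function name stripe_cache_mib is hypothetical, and the page size is assumed to be 4 KiB unless a third argument is given.

```shell
#!/bin/sh
# Hypothetical helper: RAM used by the md stripe cache, in MiB.
#   stripe_cache_size * page size (KiB) * number of disks / 1024
# Usage: stripe_cache_mib <stripe_cache_size> <active_disks> [page_kib]
stripe_cache_mib() {
    page_kib=${3:-4}    # assume 4 KiB pages unless told otherwise
    echo $(( $1 * page_kib * $2 / 1024 ))
}
```

With four active disks, `stripe_cache_mib 32768 4` prints 512 and `stripe_cache_mib 5120 4` prints 80, so the tweaked value in the table above costs about 80 MiB. Shell arithmetic is integer-only, so results are truncated to whole MiB.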
In my case I have 4 GB of RAM and the machine's workload is pretty basic, so this is of little concern.
Settings applied to each drive with defaults on my system (Ubuntu 12.04)
Command to apply setting | Default value | Tweaked value | Description |
---|---|---|---|
blockdev --setra 20480 /dev/sdX | 8192 | 102400 | Read-ahead (512-byte sectors) |
echo 1 > /sys/block/sdX/device/queue_depth | 31 | 1 | NCQ queue depth |
echo 64 > /sys/block/sdX/queue/nr_requests | 128 | 64 | Request queue size (nr_requests) |
echo deadline > /sys/block/sdX/queue/scheduler | cfq (noop deadline [cfq]) | deadline | I/O scheduler |
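Because these values have to be applied to every member disk, a loop saves typing. The sketch below is illustrative only: the function name tune_member_disks, the SYSFS_ROOT override and the QUEUE_DEPTH, NR_REQUESTS and SCHEDULER variables are hypothetical, and the disk names passed in must be the actual members of your array. On SATA/SCSI disks the NCQ depth is a device attribute, so it is written under device/ rather than queue/.

```shell
#!/bin/sh
# Hypothetical sketch: apply the per-drive values to each member disk.
# SYSFS_ROOT is an assumed override; it defaults to the real /sys.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}
QUEUE_DEPTH=${QUEUE_DEPTH:-1}
NR_REQUESTS=${NR_REQUESTS:-64}
SCHEDULER=${SCHEDULER:-deadline}

# Usage: tune_member_disks <disk>...   e.g. tune_member_disks sdb sdc
tune_member_disks() {
    for disk in "$@"; do
        blk="$SYSFS_ROOT/block/$disk"
        # NCQ depth is a SCSI device attribute, so it lives under device/.
        echo "$QUEUE_DEPTH" > "$blk/device/queue_depth" || return 1
        # Request queue size and scheduler are block-layer queue settings.
        echo "$NR_REQUESTS" > "$blk/queue/nr_requests" || return 1
        echo "$SCHEDULER" > "$blk/queue/scheduler" || return 1
    done
}
```

The function stops at the first write that fails, so a mistyped disk name is reported rather than silently skipped.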
After hours of testing and a massive spreadsheet, I have values that provide substantial performance gains. Here are some benchmark tests.
Before I apply the values persistently, I will reset them to the defaults by restarting the machine.
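To confirm that a restart really has put the defaults back before benchmarking again, the current values can be read from sysfs. A minimal sketch, assuming the same layout as the tables above; the function name show_settings and the SYSFS_ROOT override are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: print the current array and per-drive settings.
# SYSFS_ROOT is an assumed override; it defaults to the real /sys.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# Usage: show_settings <md-device-name> <disk>...
show_settings() {
    md_name=$1
    shift
    printf '%s stripe_cache_size=%s\n' "$md_name" \
        "$(cat "$SYSFS_ROOT/block/$md_name/md/stripe_cache_size")"
    for disk in "$@"; do
        blk="$SYSFS_ROOT/block/$disk"
        # The scheduler file lists all schedulers, with the active one in [].
        printf '%s queue_depth=%s nr_requests=%s scheduler=%s\n' "$disk" \
            "$(cat "$blk/device/queue_depth")" \
            "$(cat "$blk/queue/nr_requests")" \
            "$(cat "$blk/queue/scheduler")"
    done
}
```

For example, `show_settings md126 sdb sdc sdd sde` (disk names hypothetical). Read-ahead is not in sysfs under these paths, so check it separately with `blockdev --getra /dev/md126`.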
The key iozone benchmark is Stride Read, so let's compare that first: 2657627 before vs 2818608 after. For the dd test, 150 MB/s before and 236 MB/s after.
Let's look at the bonnie output. I will spend the most energy on this, as I think it is the most informative benchmark. As expected, the Sequential block output is similar to the dd result. Running the dd tests was actually redundant, since those results are also contained in the bonnie output, but for the sake of thoroughness I did both.
Sequential block input (reading) isn't much improved by the tweaking. This is not ideal, but unfortunately it is the reality: reads and writes are a balancing act, and an improvement in read performance will mostly cause a reduction in write performance.
Sequential block rewrite reads data and then writes it back, so it essentially combines read and write performance. In this case it is 103900 with the defaults and 136714 with the tweaks in place.
Random seeks is the number of random seeks per second that bonnie can perform; in this case 519 vs 404.
A +++++ means the test finished so quickly that the error margin would be a sizeable percentage of the measurement, so bonnie does not report a result because it would be inaccurate.
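Raw numbers of different magnitudes are easier to compare as relative changes. A small sketch using awk for the floating-point arithmetic; the function name pct_change is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: relative change from <before> to <after>, in percent.
# Usage: pct_change <before> <after>
pct_change() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}
```

Applied to the figures quoted above in the order given: `pct_change 2657627 2818608` prints 6.1 (Stride Read), `pct_change 150 236` prints 57.3 (dd), `pct_change 103900 136714` prints 31.6 (block rewrite) and `pct_change 519 404` prints -22.2 (random seeks).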
Here is a discussion of a script that tunes an mdadm RAID array automatically. I used it as a reference, although I found that some of the settings mentioned there did not make a great difference.
http://ubuntuforums.org/showthread.php?t=1916607
I found that the best way to tweak was to choose some baseline value based on manual tweaking and testing and from there run through a number of values on one setting and compare them to each other. I used the following script to save some time:
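As a rough illustration of this one-setting-at-a-time approach (a hypothetical sketch, not the script referred to above), a sweep can write each candidate value to a sysfs file and run a benchmark after each change. The function name sweep_setting and the marker-line format it prints are assumptions; the benchmark command is passed in as a string.

```shell
#!/bin/sh
# Hypothetical sketch of a one-setting sweep.
# Usage: sweep_setting <sysfs-file> <benchmark-command> <value>...
# Writes each value to the file, prints a marker line, then runs the
# benchmark so the combined log can be split per value afterwards.
sweep_setting() {
    target=$1
    bench=$2
    shift 2
    for value in "$@"; do
        echo "$value" > "$target" || return 1
        echo "### $(basename "$target")=$value"
        sh -c "$bench" || return 1
    done
}
```

For example, as root: `sweep_setting /sys/block/md126/md/stripe_cache_size "bonnie++ -d /mnt/raid -u nobody" 256 1024 4096 8192`, where the mount point /mnt/raid is a placeholder for wherever the array is mounted.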
To analyse the output I used simple greps like the ones below. I ran the benchmark in screen and logged all the screen output with the -L option.
Importing this into Excel and using conditional formatting makes digging through the numbers easier. It is clear from the numbers below that there is no one-size-fits-all solution. Sequential input and output are an example of results that trade off against each other.
Another interesting observation is the large impact the scheduler setting (/sys/block/sdX/queue/scheduler) has on sequential block input and output.
Also, it is useful to note that more cache isn't always better.
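As an alternative to grepping, a short awk filter can turn a screen log (screen -L writes to screenlog.0 for the first window) into a CSV file ready for Excel. This is a hypothetical sketch: it assumes each run in the log is preceded by a marker line such as `### stripe_cache_size=4096`, and that the metric of interest sits on a line starting with a label you choose, with the value in the last field. Adjust both to match your actual benchmark output.

```shell
#!/bin/sh
# Hypothetical sketch: convert a benchmark log into "value,metric" CSV.
# Assumes a marker line "<marker>=<value>" before each run and a result
# line starting with <label> whose last field is the number of interest.
# Usage: log_to_csv <marker> <label> < logfile > results.csv
log_to_csv() {
    awk -v m="$1" -v label="$2" '
        BEGIN { print "value," label }
        # Remember the setting value announced by the latest marker line.
        index($0, m) == 1 { split($0, kv, "="); value = kv[2]; next }
        # Emit one CSV row per matching result line.
        index($0, label) == 1 && value != "" { print value "," $NF }
    '
}
```

For example: `log_to_csv '### stripe_cache_size' 'MyMetric' < screenlog.0 > results.csv`, where MyMetric stands in for the label of the result line you want to chart.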
My choice has been made and I will use this script below to configure it after reboot.
In the next post I plan to implement these values persistently.
Nice work. I'm thinking about the performance drop seen when 16384 and 32768 are being used as values for stripe_cache_size. Could it be your system started swapping at that point?