Sleeping Beauty (http://shira-chan.deviantart.com/art/Twisted-Fairy-Tale-131506604)
This post introduces how the original kernel time system and the tickless system (i.e. CONFIG_NO_HZ) work, along with their pros and cons.
First, a look at the Linux time system:
The original kernel timer system (called the "timer wheel") was based on incrementing a kernel-internal value (jiffies) every timer interrupt. The timer interrupt becomes the default scheduling quantum, and all other timers are based on jiffies. The timer interrupt rate (and jiffy increment rate) is defined by a compile-time constant called HZ. Different platforms use different values for HZ. Historically, the kernel used 100 as the value for HZ, yielding a jiffy interval of 10 ms. With 2.4, the HZ value for i386 was changed to 1000, yielding a jiffy interval of 1 ms. Recently (2.6.13) the kernel changed HZ for i386 to 250. (1000 was deemed too high).
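James's quick notes:
HZ is a compile-time constant. With a value of 100 the timer ticks 100 times per second, so Linux timing has a precision of 10 ms (the resolution can also be raised, to as fine as 100 us).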
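The relation between HZ and the jiffy interval is simple arithmetic, and a small sketch may make it concrete. The helper names below (jiffy_interval_ms, ms_to_jiffies) are made up for illustration and are not kernel APIs; rounding up to a whole jiffy is an assumption chosen so that a timeout never expires early.

```python
def jiffy_interval_ms(hz: int) -> float:
    """Length of one jiffy in milliseconds for a given HZ value."""
    return 1000.0 / hz


def ms_to_jiffies(ms: int, hz: int) -> int:
    """Whole jiffies needed to cover `ms` milliseconds, rounded up.

    Integer arithmetic is used to avoid floating-point rounding surprises.
    """
    return (ms * hz + 999) // 1000


if __name__ == "__main__":
    # The three HZ values mentioned in the text above.
    for hz in (100, 250, 1000):
        print(f"HZ={hz}: one jiffy = {jiffy_interval_ms(hz):g} ms")
```

With HZ=100, a 15 ms timeout needs 2 jiffies (20 ms), which illustrates why a jiffy-based timer cannot be more precise than one tick.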
What is the drawback?
This periodic timer event is often called "the timer tick". The timer tick is simple in its design, but has a significant drawback: the timer tick happens periodically, irrespective of the processor state, whether it's idle or busy. If the processor is idle, it has to wake up from its power saving sleep state every 1, 4, or 10 milliseconds. This costs quite a bit of energy, consuming battery life in laptops and causing unnecessary power consumption in servers.
James's quick note: because the tick is an interrupt, the CPU has to handle it even when it is idle, which increases system power consumption.
How can this be improved? (CONFIG_NO_HZ=y)
Linux has had a partial solution to the timer tick problem for years in the form of the CONFIG_NO_HZ configuration option. If that option is set, the timer tick will be turned off, but only when the CPU is idle. This mode improves the situation considerably; it allows idle CPUs to stay in deep sleep states, reducing power use.
The option is already enabled by default, so the timer tick is switched off whenever a CPU is idle (this is the key point).
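James's note: for example, if a process sleeps for a full second and nothing else is runnable, the CPU really sleeps for that whole second. It does not wake up every 10 ms just to handle the timer tick.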
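The difference can be sketched as a toy count of CPU wakeups during an idle period. This is only a model under simplifying assumptions, not how the kernel is implemented: the function names are hypothetical, and the tickless case assumes the CPU wakes only when a pending timer expires.

```python
from typing import Iterable


def periodic_wakeups(idle_ms: int, hz: int) -> int:
    """Timer ticks that fire during an idle period when the tick never stops."""
    return idle_ms * hz // 1000


def tickless_wakeups(idle_ms: int, timer_expiries_ms: Iterable[int]) -> int:
    """Wakeups when the tick is stopped: only pending timers wake the CPU.

    `timer_expiries_ms` holds expiry times measured from the start of idle.
    """
    return sum(1 for t in timer_expiries_ms if 0 < t <= idle_ms)


if __name__ == "__main__":
    idle_ms = 1000  # the one-second sleep from the note above
    print("periodic tick, HZ=100:", periodic_wakeups(idle_ms, 100))
    print("tickless, one timer at 1000 ms:", tickless_wakeups(idle_ms, [1000]))
```

In this model a one-second idle period costs 100 wakeups with a periodic tick at HZ=100, but a single wakeup when the tick is stopped and only the sleeping process's own timer is pending.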
What is the drawback of enabling NO_HZ?
Indeed, given that letting sleeping CPUs lie is generally a good policy, one might wonder why this behavior is optional at all. The answer is that it increases the cost of moving into and out of the idle state, (very) slightly increasing the time it takes to get an idle CPU back to work. That cost may be considered excessive in highly latency-sensitive environments. For everybody else, disabling the timer tick for idle CPUs is almost certainly the right thing to do; for battery-powered systems that is doubly true.
James's conclusion:
- NO_HZ is currently on by default, i.e. CONFIG_NO_HZ=y.
- Turning NO_HZ on reduces system power consumption, which is good for battery-powered systems.
- Turning NO_HZ on may make timekeeping slightly less precise (see [4]).
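To see how a particular kernel was configured, its config file can be searched for the NO_HZ options. The sketch below parses config text that has already been read into a string. The function name no_hz_options and the sample text are made up for illustration; where the real config lives (commonly /boot/config-<release>, or /proc/config.gz when the kernel exposes it) depends on the distribution.

```python
import re


def no_hz_options(config_text: str) -> dict:
    """Return the CONFIG_NO_HZ* options that are set in kernel config text."""
    options = {}
    for line in config_text.splitlines():
        # Lines such as "# CONFIG_NO_HZ_FULL is not set" start with "#" and
        # are skipped because they do not match.
        match = re.match(r"^(CONFIG_NO_HZ\w*)=(.+)$", line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


if __name__ == "__main__":
    # Hypothetical sample, not taken from a real machine.
    sample = (
        "CONFIG_HZ=250\n"
        "CONFIG_NO_HZ=y\n"
        "# CONFIG_NO_HZ_FULL is not set\n"
    )
    print(no_hz_options(sample))
```

An empty result would suggest the tick is never stopped on that kernel, although newer kernels may spell the option differently (for example CONFIG_NO_HZ_IDLE), which the pattern above also matches.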
References:
[1] An experiment on how NO_HZ on/off affects system power consumption:
http://www.phoronix.com/scan.php?page=article&item=651&num=1
[2] Kernel Time System:
http://elinux.org/Kernel_Timer_Systems
[3] An introduction to the pros and cons of tickless operation:
http://lwn.net/Articles/549580/
[4] Discussion thread:
http://stackoverflow.com/questions/9775042/how-nohz-on-affects-do-timer-in-linux-kernel