Sunday, November 25, 2007 7:09:36 PM
A few days ago I read some articles about the new thread scheduler in FreeBSD, named ULE. So what are the new features of the ULE scheduler, and how does it differ from the 4BSD scheduler (FreeBSD's old scheduler)?
OK... let's discuss it!
The current FreeBSD scheduler (4BSD scheduler) has its roots in the 4.3BSD scheduler.
It has excellent interactive performance and efficient algorithms under small loads. It does not, however, take full advantage of multiple CPUs. It has no support for processor affinity or binding. It also has no mechanism for distinguishing between CPUs of varying capability, which is important for simultaneous multithreading (SMT).
FreeBSD inherited the traditional BSD scheduler when it branched off from 4.3BSD. FreeBSD extended the scheduler's functionality, adding scheduling classes and basic SMP support.
Two new classes, real-time and idle, were added early on in FreeBSD. Idle-priority threads are run only when there are no time-sharing or real-time threads to run. Real-time threads are allowed to run until they block or until a higher-priority real-time thread becomes available. When the SMP project was introduced, an interrupt class was added as well. Interrupt-class threads have the same properties as real-time threads except that their priorities are lower, where lower priorities are given preference in BSD. The classes are simply implemented as subdivisions of the available priority space. The time-sharing class is the only subdivision that adjusts priorities based on CPU usage.
Much effort went into tuning the various parameters of the 4BSD scheduler to achieve good interactive performance under heavy load, as was required by BSD's primary user base. It was very important that systems remain responsive while being used as a server. In addition to this, the nice concept was further refined. To facilitate the use of programs that wish to consume only idle CPU slices, processes with a nice setting more than 20 higher than the least nice currently running process are not permitted to run at all. This allows distributed programs such as SETI or the rc4 cracking project to run without impacting the normal workload of a machine.
The ULE scheduler, by contrast, was designed to address the growing needs of FreeBSD on SMP/SMT platforms and under heavy workloads. It supports CPU affinity and has constant execution time regardless of the number of threads. In addition to these primary performance-related goals, it is also careful to identify interactive tasks and give them the lowest latency response possible. The core scheduling components include several queues, two CPU load-balancing algorithms, an interactivity scorer, a CPU usage estimator, a slice calculator, and a priority calculator.
The original FreeBSD scheduler maintains a global list of threads that it traverses once per second to recalculate their priorities. The use of a single list for all threads means that the performance of the scheduler is dependent on the number of tasks in the system, and as the number of tasks grows, more CPU time must be spent in the scheduler maintaining the list. A design goal of the ULE scheduler was to avoid the need to consider all the runnable threads in the system to make a scheduling decision.
The ULE scheduler creates a set of three queues for each CPU in the system. Having per-processor queues makes it possible to implement processor affinity in an SMP system.
One queue is the idle queue, where all idle threads are stored. The other two queues are designated current and next. Threads are picked to run, in priority order, from the current queue until it is empty, at which point the current and next queues are swapped and scheduling is started again. Threads in the idle queue are run only when the other two queues are empty. Real-time and interrupt threads are always inserted into the current queue so that they will have the least possible scheduling latency. Interactive threads are also inserted into the current queue to keep the interactive response of the system acceptable. A thread is considered to be interactive if the ratio of its voluntary sleep time versus its runtime is below a certain threshold. The interactivity threshold is defined in the ULE code and is not configurable. ULE uses two equations to compute the interactivity score of a thread. For threads whose sleep time exceeds their runtime, the following equation is used:

    score = scaling factor / (sleep / run)
When a thread's runtime exceeds its sleep time, the following equation is used instead:

    score = (2 × scaling factor) − scaling factor / (run / sleep)
The scaling factor is the maximum interactivity score divided by two. Threads that score below the interactivity threshold are considered to be interactive; all others are noninteractive. The sched_interact_update() routine is called at several points in a thread’s existence—for example, when the thread is awakened by a wakeup() call—to update the thread’s runtime and sleep time. The sleep-time and runtime values are allowed to grow only to a certain limit. When the sum of the runtime and sleep time passes the limit, the values are reduced to bring them back into range. An interactive thread whose sleep history was not remembered at all would not remain interactive, resulting in a poor user experience. Remembering an interactive thread’s sleep time for too long would allow the thread to have more than its fair share of the CPU. The amount of history that is kept and the interactivity threshold are the two values that most strongly influence a user’s interactive experience on the system.
Noninteractive threads are put into the next queue and are scheduled to run when the queues are switched. Switching the queues guarantees that a thread gets to run at least once every two queue switches regardless of priority, which ensures fair sharing of the processor.
--------
Now I will try to recompile my FreeBSD kernel and enable the ULE scheduler as the thread scheduler on my system...
[blu3c4t@mahardhika ~]$ cd /usr/src/sys/i386/conf
[blu3c4t@mahardhika /usr/src/sys/i386/conf]$ su
Password:
[root@mahardhika /usr/src/sys/i386/conf]# cp GENERIC ULEKERNEL
[root@mahardhika /usr/src/sys/i386/conf]# vi ULEKERNEL
[edit the configuration file]
1. Comment out the line "options SCHED_4BSD" by putting a # in front of it, and add this line:
options SCHED_ULE # ULE scheduler
2. Find the ident line:
ident GENERIC
Change the line to read :
ident ULEKERNEL
[save the configuration file]
It's time to compile the kernel.
[root@mahardhika /usr/src/sys/i386/conf]# cd /usr/src
[root@mahardhika /usr/src]# make buildkernel KERNCONF=ULEKERNEL
[root@mahardhika /usr/src]# make installkernel KERNCONF=ULEKERNEL
It's done; let's reboot now!
[root@mahardhika /usr/src]# shutdown -r now
Then after the reboot...
[root@mahardhika ~]# uname -a
FreeBSD mahardhika.stttelkom.ac.id 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Sun Nov 25 23:02:06 WIT 2007 blu3c4t@mahardhika.stttelkom.ac.id:/usr/obj/usr/src/sys/mahardhika i386
Summary
To this day there is much debate about the ULE scheduler's performance compared to the 4BSD scheduler, but as a FreeBSD fan, I hope the new scheduler will bring significant improvements to FreeBSD.
You can see ULE vs. 4BSD performance benchmark at this link:
http://www.thejemreport.com/mambo/content/view/113/
Word around the campfire has been that the ULE scheduler is in some way “faster” than the 4BSD scheduler in FreeBSD. While conducting a benchmarking project to compare hardware performance, I performed all of my testing with both the ULE and the 4BSD schedulers to show the difference in performance. Read on for the results.
The Hardware
For this article I acquired the following hardware to use for two systems. They shared the same optical drive, hard drive, video card, RAM, cables, power supply, and chassis. Only the motherboard and CPU were changed to switch from the AMD machine to the Intel machine. This was done to prevent variations that could be caused by hardware manufacturing flaws or differences in output due to brand.
- Asus K8V Deluxe
- AMD Athlon64 3200+
- Thermaltake K8 Silent Boost HSF
- Intel D875PBZ (rev. 301)
- Intel Pentium4 3.2E
- Corsair PC3200 TwinX-LL 1024MB kit
- Western Digital 36GB Raptor SATA hard drive
- Sony DDU1621 DVD-ROM
- ATI Radeon 9800Pro All-In-Wonder 128MB
- Antec TrueBlue 480w power supply
- Skyhawk Galaxy case with front, rear, side, and cowl fans
The heatsink/fan (HSF) unit that I used for the Intel processor came from Intel. It’s a modified version of the traditional socket478 fan, except it has a larger copper core than the previous edition and the fins on the heatsink are in a sort of star pattern. The locking mechanism is the same. The heatsink compound I used was already on the bottom of the heatsink in a small gray pad. Intel provided a syringe of extra compound, but I didn’t have cause to use it.
The Thermaltake K8 Silentboost is an excellent solid-copper HSF. It’s a good thing, too — it was my only choice. It seems there aren’t (or weren’t when I bought this unit two months ago) many manufacturers that make HSFs for AMD64 processors. For the Athlon64 processor I used the standard-issue white heatsink compound, which is verified and certified by AMD.
The RAM was sent to me by Corsair for this and other benchmarking projects. It is the same retail box kit that you can buy through any authorized reseller. I could have requested RAM from a number of other manufacturers, but I chose Corsair for its high level of compatibility with motherboards, its low latency and reliable performance.
The WD Raptor is the fastest SATA drive on the market, and I acquired it for this and other benchmarking projects.
The Radeon 9800Pro AIW was sent to me by ATI for a previous review of SciTech’s SNAP Graphics drivers. I chose it for this review because it is a reasonable choice for a high-end single CPU workstation, and because although I didn’t do any graphics testing in this review, I plan on doing several graphic-intensive benchmarking projects in the future and I will be using this card for those tests. It helps to keep things as standard as possible to maintain cross-compatibility with my reviews.
The Antec TrueBlue 480 is both quiet and powerful. I had a long internal debate over whether I should get this power supply or the Vantec Stealth 420. Both are excellent supplies, but in the end I decided that the slightly cheaper Antec would be a better choice for this project because of its automatic fan control (the Vantec has a manual switch on the back) and its higher voltage and amperage ratings. The blue LED is superfluous — you can hardly see it when the case covers are on.
The Skyhawk case was a poor choice, but it’s all I had available to me. It is not FCC approved because of the acrylic window in the side, although I eliminated that variable by leaving the side cover off for all of my testing. This was also to improve ventilation and maintain a more consistent temperature inside the system. This case is totally unsuitable for a system based on the Prescott core because of its high operating temperature; despite all of the fans it has in it, ten seconds of idle operation with the side cover on forced the CPU fan to speeds of nearly 5000RPM.
Each system was assembled with care and all wires and connectors were correctly connected according to the manual. The BIOS was adjusted as necessary and the RAM sticks were in the proper slots for best performance. I decided to conduct my tests in a real computer chassis because, oddly, no other reviewers seem to do that. They bare-board everything, which means that they will never discover problems intrinsic to chassis assembly, such as the trouble I had with the Prescott system’s fan noise. This benchmarking project was designed to closely mimic a real workstation system, not a fictional lab testing environment.
The Software
The operating system I used was FreeBSD 5.2.1-RELEASE. If you'd like to learn more about how I configured the operating system and how I devised my benchmarking methods, or if you'd like to learn how to benchmark hardware using FreeBSD, I've written a separate article about it here.

I used the standard Unix time command to conduct stopwatch tests, stream and ubench for synthetic tests, and OpenSSL, oggenc, and cdparanoia for my real-world tests. I did not conduct any testing in X; that would be a totally separate review, and the research and testing for it have already begun.
I generated statistics for comparing the schedulers using ministat, which is a part of the FreeBSD base system. I didn’t make any graphics to show differences in performance. If you want to see pretty graphs that mislead readers and suggest flawed conclusions, you’ll be disappointed with this review. You shouldn’t need a graph or chart to put this data in perspective anyway — it’s pretty straightforward.
Stopwatch Tests
All time is listed in seconds and each number represents the mean average of the real time (the total elapsed time), user time (the time it takes to execute the utility), and system overhead time of three distinct test iterations.

It's simple: I timed how long it took to compile the base system with varying numbers of concurrent processes. I also compiled Apache version 2.0.48_3 using no concurrent processes. I experimented with doing three buildworld iterations with ULE, then recompiling the kernel with 4BSD and running the same tests. I found that it didn't produce a measurable difference in the results if I ran all nine scheduler tests in a row (with restarts in between, of course) or if I switched every three.
For the Apache2 build test I built the port and let it download and install all of the necessary dependencies. I then uninstalled Apache2 only — leaving the dependencies in place and the downloaded source code in the distfiles directory — and restarted in single-user mode, where Apache2 was rebuilt and timed. The time includes clean time; the exact command was time make install clean.
[Tables: Pentium4 Real/User/System Time; Athlon64/i386 Real/User/System Time; Athlon64/AMD64 Real/User/System Time — the timing data did not survive; only the table titles remain.]
For my next test I compiled Apache2. With the Pentium4 and the Athlon64 in i386 mode there was no measurable difference in any of the three aspects of compile time. With many more test runs I could have probably shown a very small difference in performance, but after three tests I didn’t think it was worth the effort. However the AMD64 edition had different results:
[Tables: Apache 2 Real/User/System Time — the timing data did not survive; only the table titles remain.]
Synthetic Benchmarks
Synthetic tests can reveal information that you might not otherwise be able to obtain, but in general you should not put a lot of stock in them. These numbers are not necessarily useful for comparing between systems, but they do show a significant difference in scheduler performance.

I tested with two synthetic utilities: stream and ubench. Stream showed no difference in memory bandwidth between the two schedulers, but ubench showed a rather noticeable difference in the Pentium4 system and a slight difference in the Athlon64/AMD64 system. The only test case that was inconclusive was the Athlon64 in i386 mode, which produced exactly the same numbers in two out of three test runs.
[Tables: Ubench on Pentium4; Ubench on Athlon64/AMD64 — the data did not survive; only the table titles remain.]
Ubench seems to be quite buggy, as I never once got it to complete its testing procedure. It would run the CPU test and then exit on a signal 6 (in 64-bit mode) or a signal 11 (in i386 mode) during the memory test. Despite its inaccuracy in judging CPU power, the results it shows for the schedulers are consistent with the results of the other tests: again, 4BSD seems to have a slight advantage.
Real-World Tests
This is the most useful of all of the data I collected because it shows how a system will perform in real-world scenarios. I didn't test a lot of different programs here because many of the tests that I think would be best must be performed in X11. I tried ripping a CD with cdparanoia, but the results were too close to say that there was a meaningful difference between schedulers. The point of this project is to show where there are performance differences, and if there are none then I'm not going to spend the time putting inconclusive data into tables and generating statistics.

Just in case you're curious, the CD I tested with is LA Woman by The Doors, and it took roughly 660 seconds to rip it to the hard drive. From there I encoded the tracks with oggenc from the vorbis-tools port. The times below are, as above, listed in seconds, and they represent mean averages from three separate testing runs. The exact command was time oggenc *, and it was run in a directory containing only the ripped WAV files from the cdparanoia test. The only conclusive difference in times was in the Athlon64/i386 test; the Pentium4 and Athlon64/AMD64 times were virtually the same between schedulers.
[Tables: Oggenc Real/User/Sys Time — the timing data did not survive; only the table titles remain.]
Lastly, I used OpenSSL from the FreeBSD base system as a test. The output was piped to a text file for each run. The exact command used was openssl speed > run1.txt, replacing the number in the text file name to correspond with the number of the test run.
There are so many results generated for the OpenSSL benchmark that it would take days to put it all into proper files, generate statistics and then put it all into tables. To make matters worse, 4BSD was faster in some of the areas of the test and ULE was faster in others. The best I can do is make this information available to anyone who wants to download it to compare it for themselves. Please see the FTP information at the end of this article if you’d like to download the raw data for this and other tests that I performed.
When reviewing the data, please note that OpenSSL in FreeBSD has hand-optimized assembler code for i386 and will therefore be more favorable to the Pentium4 in this test scenario; the AMD64 code is all in C, so the code will perform differently. According to FreeBSD developers it is possible to optimize the code for AMD64 in the same way, but it would increase the clutter of the base system and it’s a matter of debate whether or not it should be done.
Conclusions
The true purpose of collecting all of this data was to attempt to determine what the performance difference between the test systems was, with a special emphasis on showing the difference in performance for the Athlon64 in both 64-bit and 32-bit modes. I performed all of my testing with both the ULE and 4BSD schedulers to see if they made any difference between architectures and technologies, and to provide this data for FreeBSD developers with the hope that they can use it to improve the performance of the newer ULE scheduler. I don't know how useful this data will be to them or to anyone else, but here it is nonetheless.

In general I found the 4BSD scheduler to be faster when the system was running concurrent processes. There is one thing that I regrettably could not show: although I don't have numbers to prove it, compiling multiple programs at once (in multiple terminal windows) will slow the ULE scheduler to a crawl, whereas 4BSD will keep right on going at only a slightly slower pace. I found this happening on all three systems, and it was a problem for me because I multitask extensively.
This review will hopefully stand as both a basis for more testing (for myself and others) and as a measuring stick by which all future comparisons of this nature are judged. I look forward to retesting in the future when more changes have been made, and next time I’ll use more real-world tests.