IPv6 Launch Day at the ReactOS Foundation

April 19th, 2012

The Internet Society has decided to hold an important, worldwide event on the 6th of June 2012: the IPv6 Launch. On that day, several Internet actors (website operators, network operators, hardware vendors) will make what they own IPv6 compatible (in a transparent way!) and will ensure it stays that way. This is not another IPv6 Day: this time, the change is meant to be permanent.

For the Internet, then, this is a major event, and agreeing on a date to launch IPv6 is a great decision. This is why we (the ReactOS Foundation) decided to associate our name with this event. This means that several of the ReactOS servers will be compatible with IPv6 (not the OS itself!). A lot of work remains to be done (even if part of it was already done previously) to ensure everything is ready on the 6th of June. A kind of roadmap is available here: IPv6.

This actually requires work, and not only to update configurations (this could be the easier part…). The real problems come from software that doesn’t support IPv6 properly and that needs a partial or, even worse, a full rewrite. This is what I will be working on up to the 6th of June to get everything fully functional.

As an example, in order to “help” DenyHosts (which doesn’t handle IPv6 even with custom regexps), a new tool has been developed to run alongside it. It catches all the disallowed connections to SSH coming from an IPv6 address and adds the IP to /etc/hosts.deny. All the tools that may be developed will be published on GitHub under the GPL licence (ForbidHostsv6 is already there, follow the link), so that everyone can make use of them in case they also need them for the switch.
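To give an idea of what such a companion tool has to do, here is a minimal sketch in Python. It is not the ForbidHostsv6 code: the log pattern, the function names and the threshold of three failures are assumptions made for the illustration, and real sshd log lines vary with the version and configuration. The only convention relied upon is that TCP wrappers expect IPv6 addresses between square brackets in hosts.deny.

```python
import ipaddress
import re

# Assumed pattern for an sshd "Failed password" log line (hypothetical:
# the exact format depends on the sshd version and its configuration).
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\S+) port \d+"
)


def ipv6_offender(log_line):
    """Return the IPv6 source address of a failed SSH login, or None."""
    match = FAILED_LOGIN.search(log_line)
    if match is None:
        return None
    try:
        address = ipaddress.ip_address(match.group(1))
    except ValueError:
        return None
    # IPv4 offenders are left to DenyHosts.
    return address if address.version == 6 else None


def hosts_deny_entry(address):
    """Format a hosts.deny rule; IPv6 addresses go between brackets."""
    return f"sshd: [{address}]"


def new_entries(log_lines, threshold=3):
    """One deny rule per IPv6 address seen at least `threshold` times.

    The threshold is an assumption of this sketch, not a documented
    behaviour of any existing tool.
    """
    counts = {}
    for line in log_lines:
        address = ipv6_offender(line)
        if address is not None:
            counts[address] = counts.get(address, 0) + 1
    return [hosts_deny_entry(a) for a, n in counts.items() if n >= threshold]


if __name__ == "__main__":
    sample = [
        "sshd[1234]: Failed password for root from 2001:db8::bad port 4242 ssh2"
    ] * 3
    for entry in new_entries(sample):
        print(entry)
```

A real tool would then append the returned lines to /etc/hosts.deny, skipping the addresses that are already listed there.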

Let’s share our knowledge about that switch, to make it as easy as possible (and as smooth as possible as well!).

Nagios plugins

February 28th, 2012

All the servers we have at the ReactOS Foundation are monitored with Icinga (a Nagios fork). Icinga is compatible with both Nagios plugins and NRPE (Nagios Remote Plugin Executor). While the standard Nagios plugins are enough to monitor basic services on our servers, they do not cover the more specific services and values we need to monitor. So, like many system administrators (I believe), I went to Nagios Exchange to get more plugins that would match our needs.

The first problem is that, most of the time, those plugins are overkill and do much more than what is required. They also do not necessarily match what you need (the best example is our hardware RAID: I had to modify a script from Nagios Exchange).

The second major problem is that those plugins are most of the time (not to say all the time) scripts written in Bash, Python, or Perl. While that is suitable for small monitoring setups, it becomes a problem with larger ones (~1000 nodes), where the cost of starting an interpreter for every single check adds up.

Even if we don’t have 1000+ nodes at the Foundation (yet? - unbelievable), I really wanted us to have native plugins, written in C or C++, that specifically match our needs (and get rid of some problems). This led to the development of our own plugins. While our internal script plugins were private, I decided to release the native ones publicly so that everyone facing the same issue can enjoy them.

They always match the name and behavior (arguments and return values, especially the performance data) of the scripts we were using previously.
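For readers who have never written one, the contract a Nagios plugin has to respect is small: an exit code (0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN) and one line of output, optionally followed by performance data after a '|'. The sketch below shows that contract in Python for a RAM check. It is not the code of our check_ram: the function name, label and thresholds are made up for the illustration.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_ram(used_percent, warn=80.0, crit=90.0):
    """Return (exit_code, output_line) for a RAM usage given in percent.

    Thresholds and the "ram_used" label are assumptions of this sketch.
    """
    if not 0.0 <= used_percent <= 100.0:
        return UNKNOWN, "RAM UNKNOWN - invalid usage value"
    if used_percent >= crit:
        status = CRITICAL
    elif used_percent >= warn:
        status = WARNING
    else:
        status = OK
    # Performance data format: label=value[UOM];warn;crit;min;max
    perfdata = f"ram_used={used_percent:.1f}%;{warn:.1f};{crit:.1f};0;100"
    line = f"RAM {STATUS_NAMES[status]} - {used_percent:.1f}% used | {perfdata}"
    return status, line


if __name__ == "__main__":
    code, line = check_ram(42.0)
    print(line)
    # A real plugin would finish by exiting with `code` as its status.
```

Keeping the label and field order of the performance data identical to the previous script is what lets existing graphs carry on without any change when a plugin is replaced.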

For the moment, only check_ram (a 100% internal one), check_acpupsd, and check_mountpoints are available (the only ones developed so far). More are to come.

You can find them at: http://svn.reactos.org/svn/project-tools/trunk/nagios/

Feel free to comment or to send patches to improve them!

ReactOS plans for 2012

January 22nd, 2012

This year, instead of looking back on the previous year to sum up what has been done, I will speak about what is planned.

The first thing is the long-awaited 0.3.14 release. It has been branched and is on its way to being released. This will be an important ReactOS release. We took time to prepare it, and it comes with numerous and great changes. Unfortunately, as several rewrites took place, some regressions are to be expected, and some people may not be able to install ReactOS any longer. In case this happens (in VMs), change your virtualisation software (or downgrade it in case it is VirtualBox). We know that this is a real problem but, in spite of our efforts, we could not get rid of it. Fixing it will require huge work. But, as said, there are ways to work around it.

Most of my work on ReactOS will take place in the background. In 2011, I became one of the ReactOS system administrators, which means ensuring that servers are running properly and are up to date, but also deploying new things to make the developers’ everyday life easier. This year, we plan to deploy two more test bots: one on VMware ESX, another on VMware Player on Linux, both handled through libvirt and our sysreg tool. I have already started implementing the support in our tool.

Another thing will be the (also long-awaited) switch to CMake. We will drop support for our own solution (rbuild) and use CMake to handle ReactOS builds. This has been postponed until the release and until the ReactOS Build Environments (RosBEs) are ready. Both are about to be completed. On RosBE-Unix, I took over from Colin, as he has less time to give to ReactOS. We also have one last build slave running rbuild builds. Once the CMake switch is done, it will be time to tell it goodbye. It will be a real change for the project. The last rbuild builder has done more than 10k builds!

I also plan to deploy another static analysis tool on our server to check ReactOS code quality. We already have cppcheck (with a quite huge configuration, by the way…), which returns pretty good results. We also make use of Coverity with success. The idea would be to add another, rather different tool: sixgill. This tool appears to be used successfully by the Mozilla Foundation.

The Foundation also has a Windows Server 2003 license that we would like to use to set up a test environment for developers. That way, they could write tests for functions, run them on Windows 2003, and implement/fix them the right way on ReactOS. This method is also called test-driven development. Rather useful for us!

Finally, if I still have a bit of time (looks difficult!), I would like to finish all the things I started/planned last year on ReactOS. But I guess it will not be that easy…

Introducing the PierreFS file system

November 27th, 2011

It’s been a long time since the last blog update. One of the reasons for such an absence is that I was really busy at CERN. During my internship there (at the LHCb experiment), my mission was to improve the way their diskless farm works.

It previously relied on the Red Hat diskless tools to allow ~15 nodes to run without disks, getting their OS & their data from another server. But this system and its implementation by Red Hat are really limited. Furthermore, starting with RHEL6 (Red Hat Enterprise Linux 6), support for those tools has been dropped. LHCb had to switch to another method.

One of the solutions that was immediately looked at is file system union. A file system union allows merging several directories (called branches) into one, in a transparent way. Some of those directories can be in read-only mode; changes are then only applied to the writeable directories. So, for LHCb, the underlying idea was to use a file system union to provide easily manageable nodes. One shared root file system would be managed, and then snapshots (one per node) would be merged with the shared root into different directories (one per node). Then, each node would boot from its own directory. That way, each node easily gets its own personality and its own file system, without the constraints of the Red Hat tools.
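The principle of a union can be pictured with a few lines of Python. This is only a user-space toy working on a flat directory, not one of the drivers we evaluated: the class and method names are invented, and deletions, subdirectories and permissions are left out. It shows the two rules that matter: the writeable branch wins on lookup, and a file from the read-only branch is copied to the writeable branch before being modified.

```python
import os
import shutil
import tempfile


class UnionView:
    """Merged view of one read-only branch and one writeable branch."""

    def __init__(self, ro_branch, rw_branch):
        self.ro = ro_branch
        self.rw = rw_branch

    def resolve(self, name):
        """Return the real path of `name`; the writeable branch wins."""
        for branch in (self.rw, self.ro):
            path = os.path.join(branch, name)
            if os.path.exists(path):
                return path
        raise FileNotFoundError(name)

    def listdir(self):
        """Names visible in the merged directory."""
        return sorted(set(os.listdir(self.ro)) | set(os.listdir(self.rw)))

    def append(self, name, text):
        """Apply a change; only the writeable branch is ever modified."""
        target = os.path.join(self.rw, name)
        source = os.path.join(self.ro, name)
        if not os.path.exists(target) and os.path.exists(source):
            shutil.copy2(source, target)  # copy the file up before editing it
        with open(target, "a") as handle:
            handle.write(text)


def demo():
    """One shared root, one node directory; returns (merged, shared) content."""
    with tempfile.TemporaryDirectory() as root:
        shared = os.path.join(root, "shared_root")
        node = os.path.join(root, "node01")
        os.mkdir(shared)
        os.mkdir(node)
        with open(os.path.join(shared, "hostname"), "w") as handle:
            handle.write("generic\n")
        view = UnionView(shared, node)
        view.append("hostname", "node01\n")
        with open(view.resolve("hostname")) as handle:
            merged = handle.read()
        with open(os.path.join(shared, "hostname")) as handle:
            untouched = handle.read()
        return merged, untouched


if __name__ == "__main__":
    print(demo())
```

With one such view per node, the shared root stays identical for everyone while each node accumulates its own changes in its own directory.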

We evaluated several file system union drivers to select the one that fitted our needs best. Unfortunately, while several were not that easy to deploy, they were even worse to use. They were designed for totally different uses (LiveCDs, for example).

This is why, during a weekend, I designed a new union file system that would match their needs. As I am more creative with concepts than with names, I temporarily called it “PierreFS”. Feel free to find another one. While this file system takes most of its concepts from the legacy UnionFS file system, it adds features designed to greatly reduce useless/redundant copyups (i.e. files copied from the read-only branch to the read-write one so that they can be edited). One of the major features of the PierreFS file system is its ability to split data & metadata when performing a copyup. It can handle both separately, to prevent data redundancy and to allow easy updates of the data even when the metadata have been modified. It also has a feature that deletes copyups when it detects they are useless. Finally, this file system comes with some limitations, since it targets specific configurations: it can only merge one read-only branch & one writeable branch.
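The benefit of splitting data & metadata is easier to see on a small in-memory model. The Python sketch below is an illustration only and not the PierreFS code: the classes are invented, "metadata" is reduced to permission bits, and the rule used to declare a copyup useless (it is identical to what the read-only branch already provides) is an assumption of the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    data: dict = field(default_factory=dict)      # name -> file content
    metadata: dict = field(default_factory=dict)  # name -> permission bits


class SplitCopyupView:
    """Union of a read-only and a writeable branch, copying up data and
    metadata independently of each other."""

    def __init__(self, ro, rw):
        self.ro = ro
        self.rw = rw

    def read(self, name):
        return self.rw.data[name] if name in self.rw.data else self.ro.data[name]

    def mode(self, name):
        if name in self.rw.metadata:
            return self.rw.metadata[name]
        return self.ro.metadata[name]

    def chmod(self, name, mode):
        # Metadata-only copyup: the file content is not duplicated.
        self.rw.metadata[name] = mode
        self._drop_useless(name)

    def write(self, name, content):
        # Data copyup: the metadata still come from the read-only branch.
        self.rw.data[name] = content
        self._drop_useless(name)

    def _drop_useless(self, name):
        # Assumed rule: a copyup identical to the read-only version is useless.
        if self.rw.data.get(name) == self.ro.data.get(name):
            self.rw.data.pop(name, None)
        if self.rw.metadata.get(name) == self.ro.metadata.get(name):
            self.rw.metadata.pop(name, None)


def demo():
    shared = Branch(data={"motd": "v1"}, metadata={"motd": 0o644})
    node = Branch()
    view = SplitCopyupView(shared, node)
    view.chmod("motd", 0o600)           # only the metadata are copied up
    data_copied = "motd" in node.data
    shared.data["motd"] = "v2"          # the shared root gets updated
    seen = (view.read("motd"), view.mode("motd"))
    view.chmod("motd", 0o644)           # back to the shared value
    return data_copied, seen, node.metadata


if __name__ == "__main__":
    print(demo())
```

In the demo, the node keeps its own permissions while still seeing the updated content from the shared root, which would not be the case if the whole file had been copied on the first chmod.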

All the features described above (including those from UnionFS) were implemented as a FUSE file system, to test the viability of the file system and to validate the concepts. Unfortunately, this doesn’t include all the features that were envisioned. Some others, like caching, were considered at first (and have not been dropped yet). The driver successfully passed our tests, and met LHCb’s expectations.

I’ve started porting it to the Linux kernel as a driver for CERN. I hope I’ll be able to release it quickly. This work is once again based on the UnionFS team’s work. This is a good basis to learn and understand.

You can find the FUSE driver at sourceforge.net (no packaging has been done - won’t be done).

This file system was presented at the ICALEPCS11 conference in Grenoble, and has been described in the WEMMU05 paper & poster (with a comparison with the other file systems). All the conference material will be available on the conference site.

ReactOS & GSoC (and all the rest…)

March 22nd, 2011

The 18th of March was quite a great day for the ReactOS project. Indeed, on that day, Google released the list of accepted mentoring organisations for the Summer of Code. And for 2011, ReactOS is in. The last time the ReactOS project was accepted was in 2006. So, this has been something quite magical for the project.

But, you may wonder: what is the Google Summer of Code? The idea is simple. Each year, Google selects organisations (i.e. Free Open Source Software projects) based on the applications they send. Then, those organisations, called mentoring organisations, are able to ask for “slots”. Those slots are students that the organisation will mentor on a project, an idea. The important part is that, to ensure students will come (and organisations will apply), Google is offering $5500 per fully completed project ($5000 goes to the student, and $500 to the organisation). That way, students who don’t have an internship can earn money during the summer, and FOSS organisations can earn money as well.

This year, ReactOS will host 5 slots, which means that we will welcome five students. You can find some ideas of projects to work on here. But students are free to propose another subject when sending an application. Students can’t apply yet and have to wait until the 28th of March (before that, they have to learn a bit about the projects).

The mentors you may expect to have on ReactOS aren’t known yet. But you’re likely to find Aleksey Bragin in the list (it would be weird if he wasn’t in!). Actually, the list will be known once students and projects have been chosen. It is done that way to ensure that mentors match the subjects as well as possible (and to prevent any issue with the mentor). Anyway, several ReactOS developers have already applied to be mentors. So no shortage of mentors in sight!

Another topic regarding ReactOS: I presented ReactOS at ISIMA on the 17th of February. Part of the presentation was filmed and you can find the uploaded video here. I may do another ReactOS presentation in France this year, but nothing has been confirmed yet. So, follow the ReactOS news for the moment!

I hope you’ll enjoy the video and the information given in it. If you want to get the PDF used for the ReactOS presentation, you can find it here.

That’s all for the moment (nothing related to development yet). More will follow (about dev & GSoC). And don’t hesitate to join the ReactOS project. GSoC is a great opportunity!

Let’s go back to 2010

January 7th, 2011

Indeed, now that 2010 is over, it may be interesting to go back to it. Not for the same reasons that led ReactOS to go back to 2010 (cf. r50529, a revert needed due to a bug in boot image handling). In fact, it is interesting to go back to 2010 just to see what has been done, what was planned, and what has failed. In the end, what we can say and remember about that year.

First of all, I would like to speak about my servers. As you may know (or not), I currently own two servers, known as www.heisspiter.net and www2.heisspiter.net, and I also rent a third server at OVH, called www3.heisspiter.net. I did not speak about them because I was not really taking care of them, and this led to many serious issues (bad performance, bots, and so on). During a weekend, I decided to switch www3 to IPv6, which finally worked. But it made me understand that my servers needed some love. So, now, I am working on them again. The idea that they had reached some point of maturity and could run on their own is definitely wrong. Indeed, 2009 was already a pretty bad year for heisspiter.net, and 2010 was terrible: no evolution, several issues, servers down… A quick look at the statistics shows that people also stopped coming to the server, and I cannot blame them. 2010 was a really bad year for heisspiter.net, and 2011 cannot be worse. I will just do my best not to make it the same.
An encouraging note regarding heisspiter.net, at least: some evolutions have started. I mentioned IPv6, but a webmail for users has also arrived, the upload service is back, the mail server has been fixed, the servers’ software has been updated, and the heisspiter.net internal tools have been fixed. This actually explains why heisspiter.net has been more stable since December!

Other major point… ReactOS, of course! You may have seen, reading my other posts, that the project made important steps in stability and features: some rewrites, some parts becoming more mature. The MM rewrite, along with the Heap rewrite (for user-mode land), forces developers to fix the ReactOS code so that it corrupts less memory (or corrupts memory less?). And this works, especially when fixes are applied to those rewrites. This also comes after a hard year for ReactOS, with no releases, and nothing to release, due to a broken trunk. But that is in the past now!
My modest goal for 2011 is to prove that ReactOS has now gained some maturity. Some testers are already pushing to get 0.4, and I would like to show we are not that far from it. I will try to show it in my domain of work on ReactOS, i.e. file systems and kernel. With the help of Johannes Anderwald and Art Yerkes, I am attempting to make ReactOS boot with the Microsoft FastFAT driver. Johannes has brought some code for tunnel handling in FsRtl, Art some code for MCBs and CC. Finally, I come with notifications (still) and the motivation to make all that stuff work together (which it does not, at the moment). This is a very, very interesting experiment since it kind of stresses ReactOS and forces me to work on a part of ReactOS I promised I would never work on (I am speaking of CC!). On the other hand, I will also keep working on other parts of the kernel as I did previously, trying to improve it and match Windows 2003. I will also switch a bit to FreeLDR: some bugs are calling me there!

About personal projects, I have been quite active during 2010, even if I did not publish about them. One of the projects I wanted to publish about before I forget is a C++ garbage collector. I designed it for several uses, and in the end it is more a memory manager than a garbage collector. Its purpose is simple: giving you memory whenever you need it and keeping track of it. It can also perform some operations on it to make debugging your program easier, such as memory marking and memory zone tagging. It can also allocate non-paged memory, check for corruption, and so on. It has been designed to work in multi-threaded environments and provides functions for that. For example, when you share memory zones between threads, sometimes you do not even recall who is using what, and for how long. This is where the garbage collector becomes useful. Each thread, when it uses a zone, just needs to reference the memory zone and, once it is done, dereference it. It is a simple mechanism, but it ensures the memory zone will be released once every thread is done with it.
I am not totally done with that project (that is perhaps why I did not publish about it yet), and I plan to finish it and make it a bit closer to a garbage collector by giving it the ability to allocate and release objects.
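The reference/dereference mechanism described above can be sketched in a few lines. The project itself is in C++ and its code is not shown here, so the Python below is only a model of the idea: the class name, the tag field and the choice to take a reference on behalf of each worker before starting it are assumptions of the sketch.

```python
import threading


class MemoryZone:
    """A shared buffer released when its last user dereferences it."""

    def __init__(self, size, tag=""):
        self.buffer = bytearray(size)
        self.tag = tag              # free-form label, handy when debugging
        self._refs = 1              # the allocating thread holds the first reference
        self._lock = threading.Lock()

    def reference(self):
        with self._lock:
            if self._refs == 0:
                raise RuntimeError("zone already released")
            self._refs += 1
        return self

    def dereference(self):
        with self._lock:
            if self._refs == 0:
                raise RuntimeError("zone already released")
            self._refs -= 1
            if self._refs == 0:
                self.buffer = None  # stands in for freeing the memory

    @property
    def released(self):
        with self._lock:
            return self._refs == 0


def worker(zone):
    zone.buffer[0] = 1   # use the zone...
    zone.dereference()   # ...and drop this thread's reference once done


def demo(thread_count=4):
    zone = MemoryZone(16, tag="demo")
    threads = []
    for _ in range(thread_count):
        zone.reference()  # referenced before the worker starts using it
        threads.append(threading.Thread(target=worker, args=(zone,)))
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    alive_after_workers = not zone.released
    zone.dereference()    # the allocating thread is done as well
    return alive_after_workers, zone.released


if __name__ == "__main__":
    print(demo())
```

No thread has to know who else is using the zone: the counter reaches zero exactly when the last user is done, whichever thread that happens to be.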

The other project I have been working on (and am still actively working on) is a “new generation” IRCd written in C++. Its purpose is quite simple: implementing the five RFCs concerning IRC, optionally adding extra features that are often used or needed. But the new thing is that it comes with services built in (if built with them, of course!). This is quite new, and interesting in my opinion. When you need to rapidly deploy an IRCd, configuring both the IRCd and the services (once you have found good ones!) can be a pain. With that IRCd, everything comes included. It also makes services really efficient, as they communicate directly with the server (in a proper way, nothing messy!). And there is no need for SVSMODE, SVSJOIN, etc. commands or equivalents: here you just use the services. At the moment, the core IRCd is almost complete and works really well. Services are mostly nonexistent (except OpServ, obviously).
For 2011, I plan to finish that IRCd, and perhaps to use it on heisspiter.net. Time will tell.

Finally, this is the shortened version, but there would be so much to say… The best thing is to keep reading ReactOS’ mailing lists and this blog to stay informed!

Happy new year ;).

Coming this autumn in ReactOS

September 23rd, 2010

If you are following the ReactOS community, you may have spotted some information about my recent work there. Nothing linked to notifications this time. Let’s go back to how I got the idea to work on that part of ReactOS…

While looking at regressions, I found an interesting issue in bug #5145. In fact, it was not the issue itself that was interesting, but the way it was appearing. It was producing a BSOD with bug check code 0x7B (INACCESSIBLE_BOOT_DEVICE), while I strongly felt it should have failed earlier given the bug described. So, I dug into the code and spotted why it was not failing earlier. Indeed, our ARC names handling was pretty old and was not returning appropriate information (it was designed in an NT4.0 way, and did not report whether it had failed).

When booting, both ReactOS and Windows need a bootloader. On Windows, it’s called ntldr (up to Windows 2003; starting with Windows Vista, Microsoft switched to winload & bootmgr) and on ReactOS it’s called FreeLdr (or even WinLdr). This loader is in charge of loading the kernel (ntoskrnl, or ntkrnlmp on MP architectures) into memory to proceed with the system boot. When starting the kernel, it also provides a LOADER_PARAMETER_BLOCK. This structure contains some important information about the system state, especially the partition from which it boots and on which the system is located, and its disks. But the loader provides this boot data using ARC names (example of an ARC name: multi(0)disk(0)rdisk(0)partition(2)). You may have already seen that Windows does not use such names (nor does ReactOS): it uses \Device\Harddisk0\Partition2. So, the ARC names handling is responsible for linking ARC names to “Windows” names, but also for finding on which device (and more specifically, on which partition) the system is booting. If it cannot find it, the boot process is aborted, and the system is gracefully shut down by a bug check using code 0x69 (IO1_INITIALIZATION_FAILED). This means that IO Initialisation Phase 1 failed. That was the error I was expecting to show up, instead of that inaccessible boot device (which means that the boot device has been found…).
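The naming side of that job can be illustrated with a short Python sketch. It is a deliberate simplification and not the kernel code: it assumes that rdisk(N) maps directly to HarddiskN, whereas the real code has to identify each disk before linking names, and the function and class names are made up. It only shows the shape of the translation and the failure mode when the boot partition cannot be found.

```python
import re

# Only the multi()disk()rdisk()partition() form from the example is handled.
ARC_DISK = re.compile(
    r"^multi\((\d+)\)disk\((\d+)\)rdisk\((\d+)\)partition\((\d+)\)$",
    re.IGNORECASE,
)

IO1_INITIALIZATION_FAILED = 0x69


class BugCheck(Exception):
    """Stands in for the kernel stopping with a bug check code."""

    def __init__(self, code):
        super().__init__(f"bug check 0x{code:02X}")
        self.code = code


def arc_to_nt_device(arc_name):
    """Translate an ARC partition name into an NT device name.

    Assumption of this sketch: rdisk(N) is simply HarddiskN.
    """
    match = ARC_DISK.match(arc_name)
    if match is None:
        raise ValueError(f"unsupported ARC name: {arc_name}")
    rdisk, partition = int(match.group(3)), int(match.group(4))
    return rf"\Device\Harddisk{rdisk}\Partition{partition}"


def find_boot_device(boot_arc_name, arc_links):
    """Look up the boot partition; fail the way the text describes if absent."""
    try:
        return arc_links[boot_arc_name.lower()]
    except KeyError:
        raise BugCheck(IO1_INITIALIZATION_FAILED) from None


if __name__ == "__main__":
    name = "multi(0)disk(0)rdisk(0)partition(2)"
    links = {name: arc_to_nt_device(name)}
    print(find_boot_device(name, links))
```

The important point is the second function: a lookup that cannot find the boot partition has to fail right there, instead of letting the boot go on until something else breaks later.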

That is why, when looking at the ReactOS ARC names handling code (populated with FIXMEs and hacks), I took an important decision: rewriting it properly, and in a Windows NT5 way. It was important, because it meant switching our kernel from one major revision to a higher one: from NT4.0 to NT5.2 (Windows 2003). That version change was partly driven by two new things: the Windows Mount Manager and GPT handling. Indeed, starting with Windows 2000 (NT5.0), Windows has been using a Mount Manager to mount volumes in a PnP way, and it already creates their device names (and references them in the registry). So, the new ARC names handling takes advantage of that Mount Manager to do less work. It also uses new kernel functions (the old ones having been extended) to manage partitions (as the partition table can be either in MBR format or in GPT format).

So, that rewrite turned into a real challenge, especially since I wanted to implement it seriously. I was implementing ALL the called functions, in order to provide a clean implementation, and also a working one, that would not come with a bunch of stubs. Luckily (or smartly, depending on the way you see it), the Microsoft guys left a legacy mode (coming from NT4.0) in the ARC names handling, in case some drivers do not answer the Mount Manager PnP requests. This meant I did not need to implement the whole Mount Manager in ReactOS (we read an empty registry table) and ReactOS can still boot using legacy mode! First issue avoided. But the next issues could not be avoided. The ARC names handling calls IoReadPartitionTableEx(), which is responsible for reading both MBR & GPT. So, I had to implement it… and to add some new features to ReactOS.

I guess you now understand what will come soon in ReactOS. Yes, the ReactOS kernel will now properly handle GPT. Do not misread me: I am only talking about the kernel. All the other parts of the OS that do not rely on the kernel are still unaware of what GPT is. I recently showed my results through a screenshot you can find here. On that screenshot, you can see ReactOS booting from CD for installation, and you can see the new IoReadPartitionTableEx() in action, first displaying the four partitions it found in the MBR of the first disk, and then displaying the two partitions (out of 128) it found in the GPT, with their information. As you may have read, not everything is working yet, and it is also a bit hacked to produce such a result (the GPT partition table checksum computation just fails, as the error message shows). Once I was done with the ARC names handling and the functions needed for GPT, as trunk was still locked, I decided to finish implementing all the missing FSTUB API. In fact, all the functions dealing with GPT were missing. So finally, that “small” patch that aimed to rewrite a part of the kernel to fix some bugs and provide a proper implementation ended up as a full implementation of a package in the kernel.
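Since the checksum is mentioned: a GPT header starts with the "EFI PART" signature and carries a CRC32 of itself, computed with the checksum field set to zero. Here is a minimal Python sketch of that verification, following the 92-byte header layout of the UEFI specification. It has nothing to do with the FSTUB code itself: the function names and the returned dictionary are invented, and the check of the partition entry array CRC is left out.

```python
import struct
import zlib

GPT_SIGNATURE = b"EFI PART"
# signature, revision, header size, header CRC32, reserved, my LBA,
# alternate LBA, first usable LBA, last usable LBA, disk GUID,
# partition entries LBA, entry count, entry size, entries CRC32
GPT_HEADER = struct.Struct("<8sIIIIQQQQ16sQIII")  # 92 bytes


def parse_gpt_header(sector):
    """Validate a GPT header sector and return a few of its fields."""
    if len(sector) < GPT_HEADER.size:
        raise ValueError("sector too short for a GPT header")
    fields = GPT_HEADER.unpack(bytes(sector[:GPT_HEADER.size]))
    if fields[0] != GPT_SIGNATURE:
        raise ValueError("no GPT signature")
    size = fields[2]
    if size < GPT_HEADER.size or size > len(sector):
        raise ValueError("invalid header size")
    # The CRC32 covers `size` bytes, with its own field zeroed.
    zeroed = bytes(sector[:16]) + b"\x00" * 4 + bytes(sector[20:size])
    if zlib.crc32(zeroed) & 0xFFFFFFFF != fields[3]:
        raise ValueError("GPT header checksum mismatch")
    return {
        "entries_lba": fields[10],
        "entry_count": fields[11],
        "entry_size": fields[12],
    }


def build_header(entry_count=128):
    """Build a syntactically valid header for the demo (made-up values)."""
    fields = [GPT_SIGNATURE, 0x00010000, GPT_HEADER.size, 0, 0,
              1, 0, 34, 0, bytes(16), 2, entry_count, 128, 0]
    raw = GPT_HEADER.pack(*fields)
    fields[3] = zlib.crc32(raw) & 0xFFFFFFFF
    return GPT_HEADER.pack(*fields).ljust(512, b"\x00")


if __name__ == "__main__":
    print(parse_gpt_header(build_header()))
```

The 128 entries announced by such a header are slots, most of them empty on a typical disk, which is why a table of 128 can describe only two partitions.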

But not everything is that great. First of all, an issue arose early on while implementing GPT in the ReactOS kernel, which made me mail the ros-dev mailing list. Is there room for improvement? I do think so, especially since I am nearly done with it. Microsoft does not really match the EFI standard (and not only on the point I explained in my mail) and ReactOS could implement it properly. It would be a nice advert for the project. Furthermore, to manage GPT properly, ReactOS would need a better storage stack. Most of our current storage stack comes from the NT4 DDK (as the license permitted it). But if the kernel drops NT4 compatibility, it needs a bit more from the storage stack. I quickly implemented the basic needs, but I could not implement the deeper ones. Those are just hacked away in the storage stack at the moment. And since no one in the project really understands how the storage stack works, it will be a blocking issue. Finally, I still need to fix some issues in the code I implemented to make it work properly. This may take some more time…

Anyway, in spite of those issues, I really hope I will be able to commit a first patch with basic features working in ReactOS this autumn. The only advice I can give you is to follow ReactOS’ life, so as not to miss that. Speaking of which, it looks like a release is coming for ReactOS ;)

ReactOS’ status

August 19th, 2010

It’s been a long time since I last wrote about ReactOS. This was mainly due to me being away from the project. Or, at least, passive. I was working on notifications, used by FSDs (File System Drivers). I finally ended my period of absence by writing a document about notifications, containing all the information I gathered during my research about them. Then, I actively started working on them, coding them into ReactOS.

First of all, let’s quickly sum up what notifications are. To make it short, the purpose of notifications is to notify someone when a change occurs. In an OS, it means you can be notified when a volume is mounted, when a file is added to a directory, when a directory is deleted, and so on. There are two ways to deal with those notifications in Windows/ReactOS. State notifications (mount, unmount, lock, unlock) are handled by the PnP manager. On-disk data changes are handled by both the FSD and FsRtl through a package of functions. So, how to use them? A client application (mostly in user-mode) registers a notification and waits for it to be completed. It’s that easy. Internally, things get harder, but the two systems are equivalent. For state notifications, the PnP manager maintains a list of registered notifications. Each time a state change occurs, the driver (or even the kernel) responsible for that change has to call a PnP function, which is IoReportTargetDeviceChange() or its asynchronous counterpart, IoReportTargetDeviceChangeAsynchronous(). For an FSD, there’s an even easier way to notify, which is calling FsRtlNotifyVolumeEvent() with the event that occurred. FsRtl will do all the needed work and call the PnP manager. Once the PnP manager is called with a reported change, it just browses the notifications list, finds those that match the report, and completes them. The caller is then informed about the change. What about on-disk changes now? It works the same, just replace PnP with FsRtl. FsRtl maintains the list on the FSD’s demand, browses notifications, takes reports, and so on. To register a notification, the FSD just calls FsRtlNotifyFullChangeDirectory(), and to report a change, FsRtlNotifyFullReportChange(). It doesn’t have to do more.
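The register/report cycle described above boils down to a list that is browsed on every report. The Python model below shows that cycle and nothing more: the class and method names are invented, and the real FsRtlNotifyFullChangeDirectory() and FsRtlNotifyFullReportChange() take quite different parameters (IRPs, notify lists, filters expressed as flags, and so on).

```python
from dataclasses import dataclass, field


@dataclass
class Notification:
    directory: str
    event_filter: set
    completed_with: list = field(default_factory=list)


class NotifyList:
    """Toy model of the list kept on behalf of the file system driver."""

    def __init__(self):
        self.pending = []

    def register(self, directory, event_filter):
        """Role played by the registration call: queue a pending request."""
        notification = Notification(directory, set(event_filter))
        self.pending.append(notification)
        return notification

    def report_change(self, directory, name, event):
        """Role played by the report call: complete every matching request."""
        still_pending = []
        completed = 0
        for notification in self.pending:
            if notification.directory == directory and event in notification.event_filter:
                notification.completed_with.append((name, event))
                completed += 1
            else:
                still_pending.append(notification)
        self.pending = still_pending
        return completed


if __name__ == "__main__":
    notify_list = NotifyList()
    watch = notify_list.register("C:\\", {"file_added"})
    notify_list.report_change("C:\\", "newfile.txt", "file_added")
    print(watch.completed_with)
```

A completed notification leaves the list, so a client that wants to keep watching has to register again after each change it receives.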

So, now, what’s present in ReactOS, and what’s not?

That question is hard to answer. But, let’s try to make the answer as clear as possible.

  • About state change notifications: technically, everything has been present in the ReactOS kernel since revision r47837. It isn’t implemented the way it is in Windows, but that’s already a good start for having such notifications. I also added some reports. For example, the FAT driver reports when it successfully mounts a volume. But, in fact, nothing works, for the simple reason that the IoReport* functions need a PDO (Physical Device Object) to perform the report. Now, Windows (and ReactOS as well) calls the drivers to get that PDO. In fact, you have a stack of drivers. The highest level is the FSD, and the lowest level is the one that communicates with the disk. So, to get the PDO, Windows calls the highest-level driver to ask for relations, and the PDO. Then, each driver passes the request to the next lower driver it knows about in the stack, until the last one, which completes the request, giving the PDO. It goes down the stack. This side of notifications isn’t implemented, as it’s done using PnP requests, and our drivers don’t handle them. I recently added that support to our FAT driver (r48560), but as the rest of the drivers don’t handle it, it fixes nothing. Work will have to be done!
  • About on-disk change notifications, it’s the exact opposite. They work, but aren’t implemented. That could look a bit weird, but it isn’t. In fact, those notifications are working in my working copy, but I haven’t released them yet, to have time to fix them. You may have already seen the screenshots I published about them: http://pierre.heisspiter.net/rostests/notifications.png & http://pierre.heisspiter.net/rostests/notifications3.png. The first was the earliest test I did that worked. Full of bugs and leaks, but it worked. In fact, it’s showing a Microsoft application designed to test notifications (why use something else? :P) that you can find here. The way it works is easy. You start it giving a directory, and it will register a notification on that directory for file changes, and a notification on the root directory for directory changes. So, on the first screenshot, you see the application monitoring the C:\ drive, and me saving a new file called newfile.txt to C:\. The application successfully got the report. But given the code at that time, I was really lucky it worked! The second screenshot comes later, and shows both notifications working: saving two files, creating a directory, and the applications (the same app started twice with two different directories) printing about both. There, the code was getting cleaner and cleaner. Now, the code is in a consolidation state, which means I’m trying to make it rock solid, and I’m also trying to understand the details I still don’t get. I hope I can commit that code soon. At first, notifications won’t be as complete as in Windows, but it’s a good step towards having them. The only issue is that nothing is using them on ReactOS. No application (not even explorer) is registering notifications, and no driver is reporting changes (i.e. FastFAT/CDFS). So, something will have to be done there as well.
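To picture the PDO lookup from the first point, here is a small Python model of a request travelling down a driver stack. It is a sketch under assumptions: the driver names and the handles_pnp flag are invented, and a real stack forwards a PnP IRP rather than making a method call. It shows why adding the support to the FAT driver alone fixes nothing: one driver in the chain that does not handle the request is enough to lose the PDO.

```python
class Driver:
    """One level of a storage driver stack (hypothetical model)."""

    def __init__(self, name, lower=None, handles_pnp=True, pdo=None):
        self.name = name
        self.lower = lower              # next lower driver in the stack
        self.handles_pnp = handles_pnp  # does it handle the PnP request?
        self.pdo = pdo                  # only known by the lowest driver

    def query_target_device(self):
        """Forward the request down the stack; the last driver completes it."""
        if not self.handles_pnp:
            return None                 # the request fails at this level
        if self.lower is None:
            return self.pdo
        return self.lower.query_target_device()


def build_stack(middle_handles_pnp):
    """File system driver on top, disk driver in the middle, port driver below."""
    bottom = Driver("port", pdo="PDO(disk0)")
    middle = Driver("disk", lower=bottom, handles_pnp=middle_handles_pnp)
    return Driver("fastfat", lower=middle)


if __name__ == "__main__":
    print(build_stack(True).query_target_device())
    print(build_stack(False).query_target_device())
```

Without a PDO, there is nothing to hand to the IoReport* functions, so the report cannot be done even though the notification list itself is in place.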

Now, what else? After having fought to get a trunk freeze, I switched to fixing ReactOS instead of carrying on with notifications. As you might have noticed if you follow the project, the OS recently regressed to a state never reached before. Only a few applications are working, and getting ReactOS to boot is hard. It appears this is due to some memory corruption, deep in the OS. Finding the origin of such corruption is quite hard. Indeed, MM (Memory Manager) has recently been rewritten, and made less permissive than the version we had previously. So, we’ve got three possibilities: either the new MM is broken, or the rewrite sheds some light on defective ReactOS components (even in the kernel), or… both! Personally, I tend to believe that the last one is ours, even if that sucks. Such revisions show that MM is responsible for a part. Now, what? Is all the rest OK? I don’t think so, and neither does Aleksey Bragin, coordinator of the project, and developer as well. After a short talk, we both agreed that our memory corrupter could be the FSDs. Indeed, those were designed and coded in the early ages of ReactOS, when the kernel was poor. And since then, they have been more hacked than improved to take advantage of new kernel features. I would even go further, actually, and say that the FSDs and the kernel are both responsible for that situation: the FSDs for calling the kernel with bad/broken parameters, and the kernel for lacking proper checks. I’m thinking of a particular part of the kernel here: CC (the Cache Manager). This part is involved in caching data and delaying reads/writes to the disk. So, it’s a heavy MM client. Our current implementation is really poor, due to the complexity of the caching process, and checks are few. So, this plus a broken driver can do huge damage. I think that’s the situation we’re in. Time will tell. I’m currently trying to track down any broken call to CC by our FSDs (really, many are… if only a bit!) that might have side effects.
I’m also thinking about a hackish solution that might be found. For whatever reason, my working copy, with all the changes to FSDs and kernel it has (and even to user-mode components!), is far more stable than trunk. Why? I don’t know. But a hackish solution could be found in it, just to help releasing and to push fastfat_new as a replacement for our current old fastfat driver. But I’ll talk about fastfat_new and all the plans we (Aleksey and I) have for it in another entry.

Just keep in touch with the project: we’ll keep impressing you, in spite of those… blockers! Once they are fixed, the next release will be great. Really.

How to check for a valid pointer?

June 6th, 2010

Recently, for one of the projects I’m working on, I was asked a really naive question: “How can I be sure a function pointer is valid?”. The question may be naive, but the answer isn’t that easy to find. In most programs, when pointers are checked, we just check whether the pointer is null. If it isn’t, we assume the pointer is valid. That test is sufficient in most cases, but sometimes it’s not enough.

So my goal here was to find a way to catch wrong non-null pointers. I immediately thought about huge and crappy solutions where checking a pointer would have been slower than crashing and restarting.
Then, close to giving up, you do stupid stuff, such as typing: man end. Here was the solution. Three external symbols are provided by ld when linking a program, and defined by the loader when the program is started: etext, edata, end (the code below uses the underscore-prefixed variants).
Let’s switch back to the structure of a binary. When you build a program using GCC, without playing with sections, your program is cut into three parts: .text, .data, .bss. The interesting part for us is .text. This is where the program code is stored. So, when you’re using a function pointer, its address must point into the .text section, or it’s not valid. etext, meaning end of text (section), is the first address past the text section, so every function pointer has to be lower than etext. This gives the first way to check a function pointer. Then, I wondered: the pointer also has to be higher than something. But what? And how to find it? The obvious candidate was the base address of the binary in memory. Indeed, when you start a program, its code is loaded into memory. So, I had to find the address of the first instruction. How? While browsing the web, I found that another symbol is also available at link time: _start. This is the address of the program’s entry point in memory (with GCC, this is the C runtime startup code that ends up calling main()). And most of the time, when the OS loads a program, the entry point sits at the beginning of the text section in memory.
So, I wrote the following function:
int is_fct_ptr_valid(void *p)
{
  extern char _etext, _start;
  return (((char*) p < &_etext) && ((char*) p > &_start));
}
This way, you can check more accurately whether a function pointer is valid.
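To see where those boundaries actually fall in a given build, a small sketch like the following can help. It assumes a GNU toolchain on a Unix-like system, where the symbols _start, _etext, _edata and _end are available at link time. layout_is_classic and print_layout are hypothetical helper names made up for this illustration, not part of any library.

```c
#include <stdint.h>
#include <stdio.h>

/* Symbols available at link time (assumed: GNU ld or compatible).
   Only their addresses are meaningful, never their contents. */
extern char _start, _etext, _edata, _end;

/* Hypothetical helper: returns 1 when the image follows the
   .text / .data / .bss order the checks in this post rely on. */
int layout_is_classic(void)
{
  uintptr_t start = (uintptr_t)&_start;
  uintptr_t etext = (uintptr_t)&_etext;
  uintptr_t edata = (uintptr_t)&_edata;
  uintptr_t end = (uintptr_t)&_end;

  return start < etext && etext <= edata && edata <= end;
}

/* Hypothetical helper: prints the boundaries for inspection. */
void print_layout(void)
{
  printf("_start: %p\n", (void *)&_start);
  printf("_etext: %p\n", (void *)&_etext);
  printf("_edata: %p\n", (void *)&_edata);
  printf("_end:   %p\n", (void *)&_end);
}
```

Calling print_layout() at the top of main() and comparing its output with a suspicious pointer is often enough to see on which side of a boundary that pointer falls. The comparisons go through uintptr_t because comparing pointers into unrelated objects is not defined by the C standard.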

Then, another question arose in my brain: “OK, you can check function pointers in a nice way, what about memory now?”. Memory is harder to check, and I was again a bit lost about how to proceed. I found a first way: most of the time, the memory of a program is stored after the end of the program image in memory. So, the pointer address has to be higher than end (cf. previous paragraph). And, in fact, getting the highest address is easy. You just have to use sbrk(0). sbrk is the function you can use to grow the heap of your program; it takes the size of the increment as a parameter. 0 means no increment, so it just returns the current highest address (the program break).
So, I implemented that and tested it. But it failed. I was really stupid to think that would always work. In fact, in a program, you’ve got two kinds of memory: heap and stack. The method described above is only good at checking the heap. Now, the question was: how to check the stack then? There was no direct way to check it, as there was for all the rest. Then, I thought about something a bit tricky. When you call a function, memory for local variables is allocated from the stack. In fact, you just subtract the size you need from SP (Stack Pointer) and then use the stack past that new SP. And you do it each time you call a function, even from inside another function. And main() is the outermost function. So, if you know the address of the stack in main, you know that every other stack pointer has to be lower than it. And when you call the function that checks memory, you know it’s the most recently called function, so the pointer to check has to be higher than the address of a local variable of that function. The tricky method was born.
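As a rough sketch, the heap-only test could look like this. It assumes a Unix-like system with sbrk() and a linker that provides _end. is_brk_heap_ptr and brk_heap_demo are hypothetical names chosen for this illustration.

```c
#define _DEFAULT_SOURCE 1 /* exposes sbrk() on glibc */
#include <stdint.h>
#include <unistd.h>

/* Linker-provided: first address past .bss (assumed: GNU ld or compatible). */
extern char _end;

/* Hypothetical helper: 1 if p lies between the end of the program
   image and the current program break, 0 otherwise. */
int is_brk_heap_ptr(const void *p)
{
  uintptr_t addr = (uintptr_t)p;
  uintptr_t low = (uintptr_t)&_end;
  uintptr_t high = (uintptr_t)sbrk(0);

  return addr >= low && addr < high;
}

/* Hypothetical demo: grow the break by 16 bytes, then check the new block.
   Returns 1 if the block is accepted, 0 if not, -1 if sbrk() failed. */
int brk_heap_demo(void)
{
  void *block = sbrk(16);

  if (block == (void *)-1)
    return -1;
  return is_brk_heap_ptr(block);
}
```

One caveat worth keeping in mind: an allocator is free to get memory from somewhere other than the program break. glibc’s malloc, for instance, serves large requests with mmap, so a perfectly valid malloc’ed pointer can fall outside this range.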
char * sstack;
int is_mem_ptr_valid(void *p)
{
  char estack = 0;
  extern char _end;
  return (((((char*) p > &_end) && (p < sbrk(0))) || (((char*) p < sstack) && ((char*) p > &estack))));
}
int main()
{
  char start_stack;
  sstack = &start_stack;
  /* … */
}
Here, you have the whole process to check memory.
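The stack half of the check can also be isolated. The sketch below assumes a stack that grows towards lower addresses and a GCC or Clang compiler (for the noinline attribute). All the function names are hypothetical and only serve the illustration.

```c
#include <stdint.h>

/* GCC/Clang extension (assumed): keeps each function in its own frame. */
#define NOINLINE __attribute__((noinline))

/* Address of a local variable living in an outer frame. */
static uintptr_t stack_top;

/* Record the upper bound; meant to be called as early as possible. */
void remember_stack_top(const void *outer_local)
{
  stack_top = (uintptr_t)outer_local;
}

/* 1 if p lies strictly between this function's own frame
   and the recorded top, 0 otherwise. */
NOINLINE int is_stack_ptr(const void *p)
{
  volatile char marker = 0;
  uintptr_t addr = (uintptr_t)p;

  return addr > (uintptr_t)&marker && addr < stack_top;
}

/* A callee whose local sits below the frame of its caller. */
NOINLINE static int check_callee_local(void)
{
  volatile int local = 0;

  return is_stack_ptr((const void *)&local);
}

/* Demo: record the top here, then check a local from a deeper call. */
int stack_check_demo(void)
{
  volatile char top = 0;
  int result;

  remember_stack_top((const void *)&top);
  result = check_callee_local();
  return result;
}
```

Note that the compiler decides how the locals of a single function are ordered. A variable declared in the same function as the marker may therefore end up above it and be rejected, which is why the demo checks a local that belongs to a deeper call.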

Now, the test program, to show you the whole process:

/* Pointer checking example */
/* Author: Pierre Schweitzer */

#define _BSD_SOURCE 1
#define _DEFAULT_SOURCE 1 /* glibc 2.20 and later: replaces the deprecated _BSD_SOURCE */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

typedef struct _call_t
{
  void (*function)(void *);
  void * data;
} call_t;

char * sstack;

int is_fct_ptr_valid(void *p)
{
  extern char _etext, _start;
  return (((char*) p < &_etext) && ((char*) p > &_start));
}

int is_mem_ptr_valid(void *p)
{
  char estack = 0;
  extern char _end;
  return (((((char*) p > &_end) && (p < sbrk(0))) || (((char*) p < sstack) && ((char*) p > &estack))));
}

void stupid(void * data)
{
  printf("function's been called :)\n");
}

void callfunction(call_t * ct)
{
  if (is_mem_ptr_valid(ct))
  {
    if (is_fct_ptr_valid(ct->function))
    {
      (ct->function)(ct->data);
    }
    else
    {
      printf("Incorrect function pointer: %p\n", (void *)ct->function);
    }
  }
  else
  {
    printf("Incorrect memory pointer: %p\n", (void *)ct);
  }
}

int main()
{
  char start_stack;
  call_t call1, call2, * call3;

  sstack = &start_stack;

  call1.function = stupid;
  call1.data = NULL;
  call2.function = (void(*)(void *))time(NULL);
  call2.data = NULL;

  call3 = malloc(sizeof(call_t));
  call3->function = stupid;
  call3->data = NULL;

  printf("Test #1: function will be called\n");
  callfunction(&call1);
  printf("Test #2: function error will be raised\n");
  callfunction(&call2);
  printf("Test #3: memory error will be raised\n");
  callfunction((call_t *)time(NULL));
  printf("Test #4: function will be called\n");
  callfunction(call3);

  return 0;
}

The functions given above, even if they are more accurate, are not fail-proof. Several assumptions have been made because they hold in most cases. But if memory isn’t contiguous, or if the entry point isn’t at the beginning of the text section, those functions become meaningless. Furthermore, they only check whether a pointer points into a valid memory zone, not whether the content is valid. Your program can still have issues due to off-by-one mistakes and the like.
But, that’s a nice way to begin! :)

How to check for a valid PDO?

September 27th, 2008

Let’s first define what a PDO is. PDO is an acronym for Physical Device Object. A PDO is a kernel object that describes a real physical device (as opposed to a logical or virtual device). It’s represented in kernel space by a structure called DEVICE_OBJECT (because it applies to all device objects), and this structure is shared between the kernel and drivers. Here is the structure:
typedef struct _DEVICE_OBJECT {
  CSHORT Type;
  USHORT Size;
  LONG ReferenceCount;
  PDRIVER_OBJECT DriverObject;
  struct _DEVICE_OBJECT* NextDevice;
  struct _DEVICE_OBJECT* AttachedDevice;
  PIRP CurrentIrp;
  PIO_TIMER Timer;
  ULONG Flags;
  ULONG Characteristics;
  volatile PVPB Vpb;
  PVOID DeviceExtension;
  DEVICE_TYPE DeviceType;
  CCHAR StackSize;
  union {
    LIST_ENTRY ListEntry;
    WAIT_CONTEXT_BLOCK Wcb;
  } Queue;
  ULONG AlignmentRequirement;
  KDEVICE_QUEUE DeviceQueue;
  KDPC Dpc;
  ULONG ActiveThreadCount;
  PSECURITY_DESCRIPTOR SecurityDescriptor;
  KEVENT DeviceLock;
  USHORT SectorSize;
  USHORT Spare1;
  PDEVOBJ_EXTENSION DeviceObjectExtension;
  PVOID Reserved;
} DEVICE_OBJECT, *PDEVICE_OBJECT;

But before using it, the kernel must check whether it received a good PDO. Why the kernel? Because most of the time a PDO is allocated by a driver using IoCreateDevice, which fails in case of error, so that the driver can stop properly. If it doesn’t fail, the driver keeps the pointer in memory and doesn’t need to check it further. But the kernel receives such pointers from drivers and must ensure they are good (for example, by checking they are not random memory addresses). So, what does the kernel do to check those pointers? First, it checks for a non-null pointer. No need to continue if the pointer is null. After that, it doesn’t check whether the PDO pointer itself is valid; it checks one of its members. As you can see above, there’s a pointer in DEVICE_OBJECT to a DEVOBJ_EXTENSION structure. Internally, another structure is used: EXTENDED_DEVOBJ_EXTENSION. It extends the “normal” structure with more information used by the kernel.
This structure looks like this:
typedef struct _EXTENDED_DEVOBJ_EXTENSION
{
  // …
  ULONG ExtensionFlags;
  PVOID DeviceNode;
  struct _DEVICE_OBJECT* AttachedTo;
  // …
}
EXTENDED_DEVOBJ_EXTENSION, *PEXTENDED_DEVOBJ_EXTENSION;
This structure provides an interesting opaque pointer to a DEVICE_NODE structure. This structure, which drivers should never change, is the one used by the kernel to check the validity of the PDO. So, it first checks whether the DeviceNode pointer is null. If it isn’t, it checks the Flags member of the DEVICE_NODE structure to see whether the DNF_ENUMERATED flag is set (which ensures that the PDO has been correctly initialised). When both conditions hold, the PDO is considered valid.
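To make the sequence of checks concrete without needing a kernel build environment, here is a user-space mock. Everything in it is hypothetical: the mock_ types only keep the members that matter for the check, MOCK_DNF_ENUMERATED is a placeholder value rather than the real constant, and the extra test on the extension pointer is a defensive addition of this sketch, not something the text above attributes to the kernel.

```c
#include <stddef.h>

/* Placeholder value for illustration only; the real DNF_ENUMERATED
   constant lives in the kernel headers. */
#define MOCK_DNF_ENUMERATED 0x10UL

typedef struct mock_device_node {
  unsigned long Flags;
} mock_device_node;

typedef struct mock_devobj_extension {
  void *DeviceNode; /* opaque pointer, as in the structure above */
} mock_devobj_extension;

typedef struct mock_device_object {
  mock_devobj_extension *DeviceObjectExtension;
} mock_device_object;

/* 1 when the device node exists and is flagged as enumerated. */
int mock_is_valid_pdo(const mock_device_object *pdo)
{
  const mock_device_node *node;

  /* The extension test is an extra safety net added by this sketch. */
  if (pdo == NULL || pdo->DeviceObjectExtension == NULL)
    return 0;
  node = pdo->DeviceObjectExtension->DeviceNode;
  if (node == NULL)
    return 0;
  return (node->Flags & MOCK_DNF_ENUMERATED) != 0;
}

/* Demo: bit n of the result is set when case n is reported valid. */
int mock_pdo_demo(void)
{
  mock_device_node enumerated = { MOCK_DNF_ENUMERATED };
  mock_device_node fresh = { 0 };
  mock_devobj_extension ext_ok = { &enumerated };
  mock_devobj_extension ext_fresh = { &fresh };
  mock_devobj_extension ext_none = { NULL };
  mock_device_object pdo_ok = { &ext_ok };
  mock_device_object pdo_fresh = { &ext_fresh };
  mock_device_object pdo_none = { &ext_none };

  return (mock_is_valid_pdo(&pdo_ok) << 0) |
         (mock_is_valid_pdo(&pdo_fresh) << 1) |
         (mock_is_valid_pdo(&pdo_none) << 2) |
         (mock_is_valid_pdo(NULL) << 3);
}
```

Only the first case, an enumerated device node reachable through the extension, passes the check. A node without the flag, a missing node and a null PDO are all rejected.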

When a test fails, what does the kernel do? In fact, there are two different ways to handle that: a cool one, for non-important cases, and a more drastic one for cases where the PDO must be valid. When an invalid PDO is received in a “non-important” function, it can just return with a STATUS_INVALID_DEVICE_REQUEST status and let the caller handle the case. In important cases, the kernel must stop Windows execution to prevent any problems. There is only one solution: a call to the KeBugCheckEx function (producing a BSOD…). KeBugCheckEx is called with specific parameters to help debugging. Its first argument, the bug check code, is set to 0xCA (PNP_DETECTED_FATAL_ERROR). It indicates that the PNP manager (part of the IO branch) encountered an error it can’t recover from. Then comes the first BSOD parameter, 0x2, which indicates that the PNP manager received an invalid PDO. And finally, we pass the PDO address (to be able to get more information using WinDbg). MSDN then mentions a third parameter. Experience shows that Windows (at least XP) doesn’t fill it.

Here is the way we could see that in C. First, we’ll define a helper macro to check the PDO:
/* DeviceNode is declared as an opaque PVOID above, hence the cast to PDEVICE_NODE. */
#define IopIsValidPDO(PhysicalDeviceObject) \
  (((PEXTENDED_DEVOBJ_EXTENSION)(PhysicalDeviceObject)->DeviceObjectExtension)->DeviceNode && \
   (((PDEVICE_NODE)((PEXTENDED_DEVOBJ_EXTENSION)(PhysicalDeviceObject)->DeviceObjectExtension)->DeviceNode)->Flags & DNF_ENUMERATED))

In this macro, we check the PDO as Windows kernel does. Why Iop prefix? Io because it’s part of the IO branch, and p because it’s a private function. And then let’s see the two cases, first one with return:
NTSTATUS IoSomeFunction(PDEVICE_OBJECT PhysicalDeviceObject)
{
  if (!IopIsValidPDO(PhysicalDeviceObject))
  {
    return STATUS_INVALID_DEVICE_REQUEST;
  }
  // …
}

And the second case:
VOID IoSomeFunction(PDEVICE_OBJECT PhysicalDeviceObject)
{
  if (!IopIsValidPDO(PhysicalDeviceObject))
  {
    KeBugCheckEx(PNP_DETECTED_FATAL_ERROR, 0x2, (ULONG_PTR)PhysicalDeviceObject, 0, 0);
  }
  // …
}