Tom's Handy Dandy Space
June 13 How-To: Extend Windows Search to index compound file formats like .ZIP files
Windows Search 3.x/4.x has a great story for how to extend it to index a new file type (implement an IFilter), and it has a great story for how to extend it to support a new data store (by implementing an ISearchProtocol handler), but it does not have a very clear mechanism for indexing a compound file format such as a ZIP file so that individual items in the file can be returned as individual results. This blog attempts to describe how you would go about achieving that with the current WS3.x/4.x architecture. I will use .ZIP as an example.
A zip file starts in the file system namespace (aka, FILE://c:/test/test.zip ) but within it, it can have sub-folders with individual items in them.
Test.Zip
|-folder1
| |-folder2
| |- FileX.txt
|- FileY.doc
The FILE protocol handler will discover when FILE://c:/test/test.zip changes by monitoring file system change logs, and it will invoke an IFilter registered for .ZIP files on that file when the file changes, but it has no knowledge of the internal structure of the .ZIP file itself. The ZIP file is essentially a data store. The trick is to figure out a way to tell the indexer that it is a data store so that individual items can be indexed and retrieved as unique entities.
The first step is to create an ISearchProtocol handler for ZIP files. The key is to design into that protocol handler the ability to bind to the source file. An example way of doing that would be to have the root folder name be the escaped path to the .ZIP file. From there, you can use a hierarchy syntax like any other file format. Here is an example of what I mean:
Using the above sample data for c:\test\test.zip, my unique URLs would be
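The exact URLs depend on the escaping scheme you pick. Purely as an illustration, here is a minimal Python sketch that assumes the source URL is percent-encoded and placed under a `zip:///` root. The scheme layout and the helper names (`escape_source_url`, `make_item_url`, `split_item_url`) are hypothetical and not taken from Windows Search itself.

```python
from typing import Tuple
from urllib.parse import quote, unquote

# Hypothetical root for the ZIP protocol; the layout below is an assumption.
ZIP_ROOT = "zip:///"


def escape_source_url(source_url: str) -> str:
    """Percent-encode the whole source URL so it can act as one path segment."""
    return quote(source_url, safe="")


def make_item_url(source_url: str, inner_path: str = "") -> str:
    """Build a URL for an item inside the archive found at source_url."""
    url = ZIP_ROOT + escape_source_url(source_url)
    parts = [quote(p, safe="") for p in inner_path.split("/") if p]
    return url + ("/" + "/".join(parts) if parts else "")


def split_item_url(item_url: str) -> Tuple[str, str]:
    """Recover (source_url, inner_path) so the handler can bind to the archive."""
    if not item_url.startswith(ZIP_ROOT):
        raise ValueError("not a zip URL: " + item_url)
    escaped, _, inner = item_url[len(ZIP_ROOT):].partition("/")
    inner_path = "/".join(unquote(p) for p in inner.split("/") if p)
    return unquote(escaped), inner_path


if __name__ == "__main__":
    for inner in ("", "folder1", "folder1/folder2", "FileY.doc"):
        print(make_item_url("FILE://c:/test/test.zip", inner))
```

Because the escaped source URL contains no slashes, the first path segment always identifies the archive and everything after it is the path inside the archive.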
As you can see, the protocol handler has the information necessary to bind to the .ZIP file and enumerate the child urls including the inner files so that they can be bound to and indexed by the .DOC and .TXT IFilters. A couple of key details:
Now you have a protocol handler which can process and expose the data of the .ZIP file as individual items. (Of course, for it to work with the shell, you need to implement a shell namespace extension via IShellFolder for the ZIP://* urls, but that's a separate topic.)
There is only one fly in the ointment. Windows Search requires a start page for a protocol so that it knows which urls to do incremental crawls on and which urls should be ignored when they are found. But we can't start with a url for each .ZIP file, because we don't know where each zip file is. The ZIP protocol handler's start page url needs to be able to enumerate at the root all of the escaped paths of all of the .ZIP files (which aren't necessarily in the FILE: namespace...aka, there could be a MAPI url which points to a .ZIP file as an attachment). How do you solve this?
This is where you get a bit tricky. What you should do is register a root such as ZIP:/// as a start page. This basically says, "all zip files start here." Then, when processing the root ZIP: url, your protocol handler should generate the list of child urls to emit by querying Windows Search for all of the urls with System.FileExtension = 'Zip'. Escape those urls to remove the slashes, return them as child urls, and you are good to go. An example query to retrieve what you want would be:
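As a hedged sketch of the root enumeration step described above, the Python below turns a list of indexed urls into child urls for the ZIP:/// root. The query text is an assumption modeled on the Windows Search SQL used elsewhere on this blog; `run_query` is a hypothetical stand-in for however you execute the query (for example through OLE DB), and the percent-escaping is just one possible scheme.

```python
from typing import Callable, Iterable, List
from urllib.parse import quote

# Assumed query text. System.FileExtension values normally include the
# leading period, so '.zip' is used here rather than 'Zip'.
ZIP_QUERY = (
    "SELECT System.ItemUrl FROM SystemIndex "
    "WHERE System.FileExtension = '.zip'"
)


def child_urls_for_root(run_query: Callable[[str], Iterable[str]]) -> List[str]:
    """Reflect every indexed .zip url back as a child of the zip:/// root."""
    return ["zip:///" + quote(url, safe="") for url in run_query(ZIP_QUERY)]


def fake_index(query: str) -> List[str]:
    """Stand-in for a real query against the index (illustration only)."""
    return ["file:c:/test/test.zip", "mapi://store/inbox/attachment.zip"]


if __name__ == "__main__":
    for child in child_urls_for_root(fake_index):
        print(child)
```

Passing the query runner in as a parameter keeps the enumeration logic independent of how the index is actually reached.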
Periodically, Windows Search will do an incremental crawl on your ZIP:/// root url, at which point you simply reflect back the list of zip urls which Windows Search is already maintaining. If a deletion was discovered in the native store where the zip file is stored, then it won't show up in your enumeration and that branch of the tree in the zip will get removed.
NOTE: To bind to the .ZIP data for another protocol handler you should ideally go through the IShellFolder for that url to bind to the storage of the object, and not assume it is always a file...this will allow you to work with attachments in mail stores, etc.
NOTE 2: When emitting child urls for each .ZIP file you should use PKEY_Search_UrlToIndexWithModificationTime to pass the System.DateModified of the actual ZIP file so that the indexer only crawls it if it has changed.
This will work just fine, but there is a further optimization which you can do. Ideally, you would like to have your ZIP urls indexed right when they are created or modified and not have to wait for an incremental crawl to discover their new state. You could in theory monitor the file system yourself for .ZIP file changes, but that's a bit heavy handed, and also won't work for other stores such as MAPI. The best way to accomplish this is a bit tricky again...create an IFilter for the .ZIP file. Whenever your IFilter is invoked, it's because that URL has been discovered or changed. At that point, generate an event for the zip url appropriate for the source url via the IGatherNotifyInline interface. This gives you the ability to immediately tell the indexer there is new data to be indexed without having to wait for the incremental crawl.
January 23 Can someone define the "System.Identity" PKEY_Identity search keyword?
This is the concept of identity which Outlook Express on XP (and Windows Mail on Vista) has. On many machines people login as the default user…but have multiple email accounts set up as identities in OE. When we index, the data for each user in OE is segmented by having the GUID (see HKEY_CURRENT_USER\Identities\ ) for the identity emitted on the emails for each user. Then at query time we constrain the results to items where there is no identity or the identity is the same as the current logged in identity.
If I were to try to come up with a description that the documentation should have: Items which are not controlled by system ACLs but belong to the identity as stored by Outlook Express/Windows Mail under the HKEY_CURRENT_USER\Identities key emit PKEY_Identity as the GUID of that identity formatted as a string without '{', '}', or '-' chars. The Windows Search UI will automatically add a constraint to the queries it does to show only results which have no identity or which match the current identity. If you are building a custom query against the index and want to display just the data for a given identity then you should add a constraint to your query against PKEY_Identity.
Example: To get the current identity look at HKCU\Identities\Last User Id = "{5E72BE7B-8EFC-43E8-B48B-1B3B12768F02}"
The following query will return all items for this identity:
SELECT System.ItemUrl FROM SystemIndex WHERE CONTAINS(System.Identity, '5E72BE7B8EFC43E8B48B1B3B12768F02')
If you want to get everything which is NOT associated with an identity, then you should use
SELECT System.ItemUrl FROM SystemIndex WHERE CONTAINS(System.Identity, '0000000000000000')
If you want all items which have no identity or match the current identity you can do this:
SELECT System.ItemUrl FROM SystemIndex WHERE CONTAINS(System.Identity, '5E72BE7B8EFC43E8B48B1B3B12768F02 OR 0000000000000000')
November 06 Windows Live OneCare FamilySafety is a truly cruddy product
Some disclaimer here.
I have spent 13+ years at Microsoft. I am a big Microsoft fan. I think lots of really talented people at Microsoft make really great software. I also understand how hard it can be to create software.
There is no friggen excuse for the pile that is called Windows Live Family Safety OneCare.
My needs are pretty dang simple. I have 3 computers. I have 2 kids. I want to configure them so that they don't see porn and crap like that. I want them to have access to an email program and messenger so they can chat with family and a limited circle of friends.
So I download and install the latest Microsoft Windows Live suite, which has Windows Live Mail Desktop, Windows Live Messenger, Windows Live Sign-in Assistant and Windows Live OneCare FamilySafety <good lord, I don't think I could make up all of those names...>
The promise is that the windows live brand is going to be such a synergistic experience that it will vault Microsoft to the upper realms of the internet. Instead, they have shown that unless you make software that makes sense and truly works correctly that you might as well stay home.
This software sucks. It shows such a lazy disregard for good software design, and such a lack of passion for the product and the customer, that it sickens me to have the Microsoft name on it. Ah hell, who am I kidding, it's symptomatic of the crap Microsoft has been putting out there for years despite my best efforts and those of the minority of people who can design something half decent.
I don't even know where to start. The installer is nice. Good job with that guys. But then I have to give it my live ID, no problem. It keeps trying to ask me to migrate my old settings from MSN (which was a decent piece of software, if only they had not stopped development on it 3 years ago) by navigating me to a page where I have to log in with my passport, and then it says, "this page doesn't exist." Nice job guys, chalk one up for QA. Eventually it decides to accept me.
So I try to add my kids' passports as children. But each time I try to add them it comes back and says, "something is missing from their profile." So I have to log in to each passport several times until the profile page finally decides that each live-id has enough information. Okaaay. That was truly annoying.
Now to set up the contact list for each child. Is there an import button? Nope. I have to type in each FirstName/LastName/Email for each contact. <sigh> Then I get to do it again for the next child. 50 X 2 = 100 contacts I had to input by hand. OK.
On each computer I use fast user-switching so that each person has their own environment. This means that I have to log in to each OS account. Messenger pops up and I have to enter the live ID at that point...OK, that's fine. When I try to access the internet it asks AGAIN for the passport. (WTF? I just gave it to Messenger and XP!!!!) OK, so I give it the passport again. And I check the box which says, "Hey, do this automatically please." I like that, since each user has their own Windows account.
But it won't let me do that, because my users' accounts don't have passwords. Sooooo I have to go and add passwords for all 4 users on 3 systems. Now I come back to OneCareFamilySafety and it will let me enter the user's passport and password for "automatically login". OK, this has been a royal pain in the ass, but I have managed to do this for all 4 users X 3 machines.
I want to set up email support for my family as well, so I configure Windows Live Desktop Email with Hotmail technology(tm) to access the hotmail accounts of my kids passports. EXCEPT THAT IT WON'T WORK WITH IT!!!! Why should it, IT ONLY COMES FROM THE SAME COMPANY WITH THE SAME INSTALLER!!!!
Turns out you can only use the web version of hotmail with onecare familysafety suckfest. OK, I guess I will configure hotmail to be the default email client. It's not optimal, but I guess it will work. OH YEAH, EXCEPT THAT VISTA DOESN'T LET YOU DO THAT ANY MORE!!! You can't configure a webmail as the default email client on vista!
OKKKKKKAAAAY.....so when I login as my son, ....Messenger pops up and auto-logs in as him. So far so good, except that, if you recall the contact list I spent an hour inputting (where they required me to input firstname/lastname/emailaddress), the messenger UI only shows the email addresses. No, why would I want to use the friendly name I just spent all of that time inputting? Instead of Grandpa Laird, no, I get eckdes1993@onesite.net . Yep, my 7 year old is really gonna figure that one out.
And to make matters even better, the auto-login feature flat out does NOT work. Every single user on every single box, when I login and try to access the internet comes up and asks for the username and password. NOT the last used username, oh no, you get to scroll through the user names and select the one that you have just input in 50 times because this software doesn't coordinate or work well together.
At this point, I just want to get it working well enough that my kids can get to their favorite sites. But what happens when you try to use IE? Why, the default home page of IE is the RUNONCE page, which isn't on the approved list. WHY THE HELL NOT? Don't we trust our own hack home page?
I would approve it, except that when I try to, it again complains that I need to migrate my MSN family settings and, after I type in my password for the 1000th time, redirects me back to the broken link.
I can't think of anything about this experience that worked other than the installer. What a waste of time and money and MY time and disk space.
I THINK THIS ENTIRE TEAM SHOULD BE FIRED. Incompetence on such a grand scale is stunning. Did they really accept money over the last 3 years coming up with *this*? Do they really feel like they are making the world a better place?
This whole experience sickens me...and convinces me that the Microsoft of today is sick as well. What baboonery. If you are going to claim that windows live is going to provide a best of breed experience because of synergy...you have to friggen deliver a best of breed experience AND it has to use synergy.
This is such a load of steaming poo I am going to take a shower to cleanse my soul from the stain it put on it.
p.s. I like the new messenger, I like the new hotmail, I like the domain hosting on hotmail, I like the live writer stuff. This is 99% squarely on the feet of OneCare FamilySafety.
June 12 Windows x64 design decisions befuddle me
I've been looking at porting an application to work on Windows x64. They did some very strange things with this release, which I guess is to make things easier for people porting their code, but IMHO it makes things way more confusing in the long term. The basic rule around 64 bit is that a 64 bit app can only load 64 bit dlls and a 32 bit app can only load 32 bit dlls. That's reasonable enough. Also, Microsoft has gone to great lengths to create a Wow64 layer which allows a 32 bit application to run unmodified on 64 bit Windows. That's cool too. But you start to look at how they did things and it's very strange.
1. Where is System64?
When the transition from 16 to 32 bits was made there was a problem with where things lived. The Windows team came up with a sensible solution. C:\windows\system was for 16 bit dlls (because win3.X used that path for 16 bit windows) and c:\windows\system32 was for 32 bit dlls. So you would think that 64 bit dlls go into c:\windows\system64, right? NOPE. Instead...when you are running a 32 bit app on XP x64, the OS lies to the app and tells it that it is putting its dlls into c:\windows\system32, but instead it puts them into c:\windows\SysWOW64. oookaay...So where do the 64 bit DLLs go? Why, it puts them into c:\windows\system32...of course! So the 64 bit DLLs are in the directory labeled 32 and the 32 bit dlls are in the directory labeled 64. <sigh>
2. Where is InprocServer64?
The original windows with COM servers did the same trick. If you had an inproc server and it was a 16 bit DLL, it was called InProcServer. If you had a 32 bit implementation it was called InProcServer32. So you would expect that if you had an inproc server which was 64 bit it would be registered as InProcServer64, right? NOPE. Again, they decided to have the API lie based on the bitness of the application calling the windows API. So InProcServer32 points to a 32 bit DLL if you are a 32 bit app and a 64 bit dll if you are a 64 bit app.
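Both of the points above hinge on the bitness of the calling process. As a small illustration, the Python sketch below reports the bitness of the running interpreter and the directory that would back its view of "System32" on 64 bit Windows. The helper names are hypothetical, the paths assume a default C:\Windows install, and on shipping systems the 32 bit directory is named SysWOW64.

```python
import struct


def process_bitness() -> int:
    """Return the pointer size of the running process in bits (32 or 64)."""
    return struct.calcsize("P") * 8


def system_dir_backing(bits: int, os_is_64bit: bool = True) -> str:
    """Directory that actually backs 'System32' for a process of this bitness.

    Assumes a default install under C:\\Windows (illustration only).
    """
    if bits == 32 and os_is_64bit:
        # WOW64 redirects the 32 bit view of System32 to this directory.
        return r"C:\Windows\SysWOW64"
    return r"C:\Windows\System32"


if __name__ == "__main__":
    bits = process_bitness()
    print("This process is", bits, "bit")
    print("Its System32 is backed by", system_dir_backing(bits))
```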
What if you want to publish a COM component for both 32 bit and 64 bit? I think you are out of luck. WTF-99, that's what I say. <sigh, sigh>
3. 32 bit COM servers can be used by using COM marshallers...but they don't.
So if you have a 64 bit app and you call a 32 bit COM out of proc server it all works, because COM knows how to marshal across process boundaries. And COM has the ability to make any inproc COM server run in another process (look up DllSurrogate on MSDN)...so you would think that if you had a 64 bit app which tried to use an InprocServer which happened to be the wrong bitness they would just move it into an out of proc 32 bit surrogate. They don't. This would be OK if you could register a COM server for both, but as I said in point #2, I don't think you can. So how am I supposed to create a COM server that can be used by anyone? The only way I can see to do it is to always have it be out of proc. This causes perf problems of course. I don't think I can create both a 32 bit and 64 bit COM handler, so I am not certain how on earth OLE will work when you don't know the bitness of the container you are in. <sigh, sigh, sigh>
I would love to be wrong about these points. It seems like they are bending over backwards for this stuff, at the expense of us having to live with badly named directories and confusing and misleading registry keys for a long long time. Alas...that's my rant against the wind for today.
-Tom
December 14 MSN Toolbar Suite has finally shipped!
I am a developer on the MSN Toolbar Suite doing the desktop search component and it has finally released to the world! Whew, that was a lot of work, but it is really gratifying to see the positive comments out there! A couple of comments that people have made which I would like to respond to are:
A. IE is required (Firefox rulez) etc.
B. Microsoft is just copying Google/Apple/Copernic (fill in your favorite DS product here.)
At any rate, had to get that off my chest. More will come later. Best regards! -Tom
December 02 I'm so tired of C++ and linkers
I had an urgent change to make to our code-base. All I had to do was to move some code from one component over to another component. The actual coding took only about an hour. The amount of time figuring out the magic set of .LIBs, flags, include paths, libpaths, #defines etc. which would allow my project to BUILD was 5 freakin hours! Contrast this with .NET and a system which is self-describing and I'm just crying in my beer. <bah!>
December 01 First Blog in Blog-land
Golly, here I am at work instead of home, and messing with the new spaces. Kinda cool.