Hi, I'm JT and these are my thoughts on community, content management, Plain Black, and WebGUI.

WebGUI 8: Search

User: JT
Date: 8/12/2008 12:14 pm
Views: 1726
Rating: 3

While WebGUI's internal search engine works great, it has become clear to me that we're going to need to replace it with something better.

One of the things I want to do with WebGUI 8 is switch WebGUI from the MyISAM storage engine to InnoDB. This will give us foreign keys and database-level transactions. However, if we make this switch, the search engine will break, because our search relies on MySQL FULLTEXT indexes, and InnoDB doesn't support those.

In addition, the current search isn't capable of stemming. That is, if I type in "jobs", it should also search for "job", and "mining" should also match "mine". There's no way to add that to the current search engine.
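To make the stemming idea concrete, here's a minimal Python sketch. It's a deliberately crude suffix stripper, not Porter's algorithm or whatever a real search engine would use; the function names and the three rules are assumptions chosen only to reproduce the "jobs"/"job" and "mining"/"mine" examples. The point is that query terms and indexed terms are reduced the same way, so they meet at a common stem.

```python
# A toy suffix-stripping stemmer, for illustration only. The rules below are
# assumptions picked to reproduce the "jobs"/"job" and "mining"/"mine" cases.

def toy_stem(word: str) -> str:
    """Reduce a word to a crude stem by stripping a few common suffixes."""
    w = word.lower()
    # Strip "-ing" or a plural "-s", always keeping at least three letters.
    if w.endswith("ing") and len(w) - 3 >= 3:
        w = w[:-3]
    elif w.endswith("s") and not w.endswith("ss") and len(w) - 1 >= 3:
        w = w[:-1]
    # Drop a trailing "e" so "mine" and "mining" meet at the same stem.
    if w.endswith("e") and len(w) - 1 >= 3:
        w = w[:-1]
    return w


def stem_matches(query_word: str, document_word: str) -> bool:
    """True when both words reduce to the same stem."""
    return toy_stem(query_word) == toy_stem(document_word)


print(toy_stem("jobs"), toy_stem("job"))     # job job
print(toy_stem("mining"), toy_stem("mine"))  # min min
print(stem_matches("mining", "mine"))        # True
```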

There are a couple of other features I want to add. They could be added to the current search mechanism, but we might as well add them to the new search instead: spell check and plugins.

Spell check is a rather trivial thing to add. When you search, it checks your search terms for spelling errors and then offers you an alternative search. So if you search for "collabration systm", it could say "Did you mean collaboration system?".
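A rough sketch of how "Did you mean...?" could work, assuming the vocabulary is the set of words already in the search index. It uses Python's standard difflib.get_close_matches, which ranks candidates by a similarity ratio. The function name, the sample vocabulary, and the 0.8 cutoff are assumptions for illustration, not anything WebGUI ships.

```python
import difflib

# Assumed vocabulary: in practice this would come from the words in the index.
VOCABULARY = ["calendar", "collaboration", "search", "system", "wiki"]


def did_you_mean(query, vocabulary=VOCABULARY, cutoff=0.8):
    """Return a corrected query, or None if every term was already known."""
    known = set(vocabulary)
    corrected, changed = [], False
    for term in query.lower().split():
        if term not in known:
            # Best candidate by difflib's similarity ratio, if any clears cutoff.
            candidates = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
            if candidates:
                term, changed = candidates[0], True
        corrected.append(term)
    return " ".join(corrected) if changed else None


suggestion = did_you_mean("collabration systm")
if suggestion:
    print(f"Did you mean {suggestion}?")
```

Unknown words with no close match are left alone, so a query that can't be improved produces no suggestion at all.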

The second thing I want to add is query filter plugins. These plugins would let you attach your own functionality to WebGUI search. For example, if the query matched the pattern number operator number (like 9*9) and there were a plugin that could handle math, that plugin would respond with 81 rather than a search result. That may not seem terribly useful, but consider these ideas. You could have a plugin that searches Google, so the user could type "google:cars" and it would search Google for "cars" and return the results inside your site design. Or, if you were a parts dealer, you could have a search plugin that looked for part-number patterns and returned links to those specific parts in your inventory database. So if the search saw "LNB9065", it could link directly to the "Champion Spark Plug" in your product catalog.
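One way query filter plugins could be wired up, sketched in Python: each plugin registers a regular expression and a handler, the first handler that returns something wins, and anything unclaimed falls through to the normal search. The decorator, plugin names, catalog table, and /catalog/ URL are all hypothetical, not a WebGUI API; division is left out of the math plugin to keep the sketch free of divide-by-zero handling.

```python
import operator
import re
from urllib.parse import quote_plus

# Hypothetical registry of (compiled pattern, handler) pairs, checked in order.
PLUGINS = []


def query_filter(pattern):
    """Register the decorated function as a handler for queries matching pattern."""
    def register(handler):
        PLUGINS.append((re.compile(pattern), handler))
        return handler
    return register


OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


@query_filter(r"^\s*(\d+)\s*([-+*])\s*(\d+)\s*$")
def math_plugin(match):
    """Answer simple integer arithmetic such as 9*9 directly."""
    left, op, right = int(match.group(1)), match.group(2), int(match.group(3))
    return str(OPERATORS[op](left, right))


@query_filter(r"^google:(.+)$")
def google_plugin(match):
    """Hand the rest of the query off to Google."""
    return "https://www.google.com/search?q=" + quote_plus(match.group(1).strip())


# Hypothetical inventory lookup: part number -> product name.
CATALOG = {"LNB9065": "Champion Spark Plug"}


@query_filter(r"^([A-Z]{3}\d{4})$")
def part_number_plugin(match):
    """Link straight to a known part; decline (None) for unknown numbers."""
    part = match.group(1)
    name = CATALOG.get(part)
    return f"{name}: /catalog/{part.lower()}" if name else None


def run_search(query):
    """Give each matching plugin a chance; otherwise fall back to normal search."""
    for pattern, handler in PLUGINS:
        match = pattern.match(query)
        if match:
            response = handler(match)
            if response is not None:
                return response
    return f"(regular full-text search for {query!r})"


for q in ["9*9", "google:cars", "LNB9065", "spark plugs"]:
    print(q, "->", run_search(q))
```

Letting a handler return None to decline means a plugin can claim a pattern loosely (any three letters plus four digits) and still fall back to regular search when the lookup fails.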

Replacing search is a somewhat big to-do item on top of the other stuff we have planned, but since it may bring some small API changes, WebGUI 8 is the perfect place to make it happen.

Replies

Re: WebGUI 8: Search
User: perlmonkey2
Date: 8/12/2008 1:10 pm
Rating: 10
Status: Approved

Will the new search index the pages ahead of time? If so, you could get really crazy and categorize the indexed words on each page based on the 'sense' in which each word was used. Perl has some great WordNet libs to do this, and it would allow results to be grouped into categories. So if someone searches for 'bat', you could keep your articles about flying rodents separate from your articles about baseball instruments. You could even use multi-word searches to determine the sense of each word in the query, and from that decide which category should be shown first. Just my $.04 (used to be $.02, but with inflation and all, it costs twice as much).


Re: WebGUI 8: Search
User: JT
Date: 8/12/2008 1:27 pm
Rating: 17
Status: Approved

We'll likely be using Sphinx (http://www.sphinxsearch.com/) or KinoSearch (http://search.cpan.org/~creamyg/KinoSearch-0.162/lib/KinoSearch.pm) for indexing behind the scenes, which will handle all this automatically for us.


Re: WebGUI 8: Search
User: apeiron
Date: 8/12/2008 1:13 pm
Rating: 25
Status: Approved

 I think stemming should be an option rather than a default. I can think of counterexamples for when stemming is a bad idea. If I search for "The Doors", I certainly am not interested in results for "door".

That said, everything you've mentioned here has me quite interested in wG 8.


Re: WebGUI 8: Search
User: JT
Date: 8/12/2008 1:28 pm
Rating: 5
Status: Approved

Google and other search engines automatically do stemming without making it an option. The end user never knows the difference because the original word is weighted higher than the stemmed word.
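A small sketch of that weighting, assuming a scoring scheme where a literal match counts fully and a stem-only match counts at a fraction. The weights (1.0 and 0.4) and the crude plural-stripping stemmer are assumptions; real engines use proper stemmers and more elaborate scoring. With the "The Doors" example, a page about the band still outranks a page about a door, but the door page isn't lost.

```python
def crude_stem(word: str) -> str:
    """Toy stemmer (assumption): strip a plural "s" from words longer than 3 letters."""
    w = word.lower()
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        return w[:-1]
    return w


EXACT_WEIGHT = 1.0  # assumed weight for a literal match of the query term
STEM_WEIGHT = 0.4   # assumed weight for a match on the stem only


def score(query: str, document: str) -> float:
    """Score a document, weighting exact term matches above stemmed ones."""
    words = document.lower().split()
    stems = [crude_stem(w) for w in words]
    total = 0.0
    for term in query.lower().split():
        exact = words.count(term)
        # Words sharing the stem include the exact matches, so subtract them.
        stem_only = stems.count(crude_stem(term)) - exact
        total += EXACT_WEIGHT * exact + STEM_WEIGHT * stem_only
    return total


docs = [
    "Fix a squeaky door hinge",
    "The Doors played Light My Fire",
]
for doc in sorted(docs, key=lambda d: score("The Doors", d), reverse=True):
    print(round(score("The Doors", doc), 2), doc)
```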


Re: WebGUI 8: Search
User: knowmad
Date: 8/12/2008 2:33 pm
Rating: 3
Status: Approved

JT,

I'm glad to see you are thinking about the future of search. As evidenced by the success of Google, Yahoo, and other search portals, search is an integral feature of an enterprise CMS.

As I've been working on my "WebGUI Search" presentation for the WUC, I've been amazed at the power of this feature as it is currently implemented and have also come across some of the limitations you mention. There are some great features worth keeping such as the extendable document indexing and user-controlled page synopsis.

The key features I see as important in the next version of search include advanced search with multi-field searching and customizable ordering of results, real relevancy scores, and keyword weighting (e.g., keywords in the title, headers, images, or first paragraph carry more relevance than others).
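Keyword weighting by field could look roughly like this sketch: count query-term occurrences in each field and multiply by a per-field weight. The field names and weights are assumptions, and a real engine would also normalize for document length and strip punctuation.

```python
# Assumed per-field weights: title hits matter most, body text least.
FIELD_WEIGHTS = {"title": 5.0, "header": 3.0, "first_paragraph": 2.0, "body": 1.0}


def relevance(query, fields):
    """Sum term occurrences per field, each multiplied by that field's weight."""
    terms = query.lower().split()
    score = 0.0
    for field, text in fields.items():
        words = text.lower().split()
        weight = FIELD_WEIGHTS.get(field, 1.0)  # unknown fields count as body text
        score += weight * sum(words.count(term) for term in terms)
    return score


guide = {"title": "WebGUI search guide", "body": "How the engine works"}
notes = {"title": "Release notes", "body": "search search search fixes"}
for name, doc in [("guide", guide), ("notes", notes)]:
    print(name, relevance("search", doc))
```

One title hit outweighs three body hits here, which is the behavior the weighting is meant to produce.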

However, there are some implementation changes that could be done today to improve the current system. These include the following:

  • logging and reporting of search queries (this is important for understanding how a user interacts with your web site)
  • better control of pagination (similar to SQL Report)
  • better control of length of the synopsis (similar to options available in RSS feeds)
  • more asset info in results (e.g., asset revision date, relevance score)
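For the first bullet, query logging, here's a minimal in-memory sketch: normalize and record each query with its result count, then report the most frequent queries and the ones that returned nothing. The class and method names are hypothetical; in WebGUI this would presumably be a database table rather than a Python list.

```python
import collections
import datetime


class SearchLog:
    """Hypothetical in-memory stand-in for a search-query log table."""

    def __init__(self):
        # Each entry: (UTC timestamp, normalized query, number of results).
        self.entries = []

    def record(self, query, result_count, when=None):
        normalized = " ".join(query.lower().split())
        when = when or datetime.datetime.now(datetime.timezone.utc)
        self.entries.append((when, normalized, result_count))

    def top_queries(self, n=10):
        """Most frequent normalized queries, most common first."""
        return collections.Counter(q for _, q, _ in self.entries).most_common(n)

    def zero_result_queries(self):
        """Distinct queries that found nothing: candidates for content or synonyms."""
        return sorted({q for _, q, count in self.entries if count == 0})


log = SearchLog()
log.record("Spark plugs", 12)
log.record("spark  PLUGS", 12)
log.record("collabration", 0)
print(log.top_queries(3))
print(log.zero_result_queries())
```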

 

William

----
Knowmad Technologies
http://www.knowmad.com


Re: WebGUI 8: Search
User: yhkhoe
Date: 8/13/2008 2:09 pm
Rating: 3
Status: Approved

If the search is going to be rewritten completely, it would be nice if the API weren't written exclusively for searching assets. There are parts of WebGUI that search things other than assets, like users and Thingy's Things. Developing these kinds of features would be easier if they could use (part of) the search API.

Yung


Re: WebGUI 8: Search
User: JT
Date: 8/18/2008 10:26 am
Rating: 2
Status: Approved

It really can't be, though, and it wouldn't help with Thingy anyway. Here's why:

  1. The system needs to adhere to privileges, therefore it needs to know about the assets it's indexing, and it needs to be able to instantiate them. Things are different objects, and simply don't operate by the same rules as assets.
  2. It also needs to know about the assets because it needs to know their lineage so it can limit the scope of the search to that lineage. Things don't have a lineage, though the thingy does.
  3. The search can limit by class name, which again is looking at asset classes, not arbitrary other stuff.
  4. Thing privileges, display mechanisms, URLs, etc. all work differently, so you're going to have to write a specific search system anyway. The same goes for other non-Thingy content.

The good news is that this is exactly why I'm talking about search plugins. A search indexer plugin could be written for Thingy (or anything else, for that matter) to handle non-asset content types.
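Points 1 to 3 amount to filtering index hits after lookup. Here's a sketch of that filtering, under the assumption that an asset's lineage is a string whose prefix is its ancestor's lineage (so "under this page" becomes a prefix test), with a caller-supplied can_view callback standing in for the asset's own privilege check. The Hit type, field names, and sample lineage values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, Set


@dataclass
class Hit:
    asset_id: str
    class_name: str
    lineage: str  # assumed: an ancestor's lineage is a prefix of this string


def filter_hits(
    hits: Iterable[Hit],
    lineage_root: Optional[str] = None,
    class_names: Optional[Set[str]] = None,
    can_view: Callable[[Hit], bool] = lambda hit: True,
) -> Iterator[Hit]:
    """Yield only the hits inside the scope, of an allowed class, and viewable."""
    for hit in hits:
        # Scope: keep only descendants of lineage_root (prefix test).
        if lineage_root and not hit.lineage.startswith(lineage_root):
            continue
        # Type filter: keep only the requested asset classes.
        if class_names and hit.class_name not in class_names:
            continue
        # Privileges: the caller decides, e.g. by instantiating the asset and asking it.
        if not can_view(hit):
            continue
        yield hit


ARTICLE = "WebGUI::Asset::Wobject::Article"
hits = [
    Hit("a1", ARTICLE, "000001000002000001"),
    Hit("a2", "WebGUI::Asset::Wobject::Collaboration", "000001000002000002"),
    Hit("a3", ARTICLE, "000001000003000001"),
    Hit("a4", ARTICLE, "000001000002000003"),
]
visible = filter_hits(
    hits,
    lineage_root="000001000002",
    class_names={ARTICLE},
    can_view=lambda hit: hit.asset_id != "a4",  # pretend a4 is private
)
print([hit.asset_id for hit in visible])
```

Because the privilege check needs the real object, this step can't live purely in the index, which is why non-asset content like Things needs its own indexer plugin.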


Re: WebGUI 8: Search
User: baylink
Date: 8/15/2008 2:06 pm
Rating: 2
Status: Approved

Two (ok, three :-) thoughts, stolen bodily from Other People's Work:

1) (from Palm Pilots) What ought to be searchable is something that only a given wobject actually knows; i.e., you're probably going to have to think about some sort of callback architecture, where each wobject provides a function that returns the indexable data it knows about.

2) (from websites) Within a large website, there may be many domains to which searchability needs to be limitable: sometimes you want to search the text of the website; sometimes you want to search this knowledge base, sometimes you want to search that pile of wiki pages; on large enough sites you only rarely want to search everything.

3) The default for multi-word searches should be *AND* (damnit!). Hey, if ePinions can get this wrong, and *keep it wrong for 8 years*, it's worth mentioning. Peter Merholz was working there when I noticed it, and I mentioned it to him, and it's *still* broken: there's no way to do an AND search on ePinions, which is why I don't bother writing there anymore.
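Points 1 and 3 combine naturally, as this Python sketch shows: each content type supplies its own indexable_text() callback, the indexer just calls it, and multi-word queries are answered by intersecting the per-word ID sets so that every term must match (AND). The class names and the method name are hypothetical, not WebGUI's actual indexing API.

```python
from collections import defaultdict


class Article:
    """Hypothetical content type: it alone decides what text is searchable."""

    def __init__(self, asset_id, title, body):
        self.asset_id, self.title, self.body = asset_id, title, body

    def indexable_text(self):
        return f"{self.title} {self.body}"


class Event:
    """Another hypothetical type with entirely different fields."""

    def __init__(self, asset_id, name, location):
        self.asset_id, self.name, self.location = asset_id, name, location

    def indexable_text(self):
        return f"{self.name} {self.location}"


def build_index(items):
    """Inverted index: word -> set of IDs, built only via each item's callback."""
    index = defaultdict(set)
    for item in items:
        for word in item.indexable_text().lower().split():
            index[word].add(item.asset_id)
    return index


def search_all_terms(index, query):
    """AND semantics: return IDs containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy, so the index isn't mutated
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


index = build_index([
    Article("a1", "Spark plugs", "Champion spark plug guide"),
    Article("a2", "Spark arrestors", "Chimney safety"),
    Event("e1", "Plug swap meet", "Madison"),
])
print(sorted(search_all_terms(index, "spark plug")))
```

An OR search over the same index would return all three items for "spark plug"; intersecting keeps only the one that mentions both words.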


Re: WebGUI 8: Search
User: JT
Date: 8/15/2008 2:18 pm
Rating: 2
Status: Approved

You'll then be happy to know that these are all already features of WebGUI 7's search. =)


Re: WebGUI 8: Search
User: baylink
Date: 8/15/2008 2:24 pm
Rating: 12
Status: Approved

All three of them?

Ok, call me Captain Obvious.  :-)

This is a nice new protocol for us, JT: I say "wouldn't it be nice if" and you say "read the manual, dummy" instead of "tough crap". :-)


Re: WebGUI 8: Search
User: knowmad
Date: 8/15/2008 5:12 pm
Rating: 11
Status: Approved

And I'll be going over all three of these (plus more) in my talk at the WUC.

William

----
Knowmad Technologies
http://www.knowmad.com


Re: WebGUI 8: Search
User: arjan
Date: 8/31/2008 11:44 am
Rating: 9
Status: Approved

I think you made some very good suggestions in your talk at the WUC. The practical difficulties you've experienced are the reason I usually used an SQL Report for search instead of the Search asset. I also liked your suggestions about how to use the search, for example using GET to submit the form to the page the search is on instead of to the Search asset itself.

It's great that it is now possible to show a link to the asset's container instead of to the asset itself. This does mean that the same link can show up several times, although the descriptions differ. I don't know if the assetId of the asset found is available as a template variable in the search results, but if so, then the #assetId part could be added to this container URL. Perhaps something for the new default templates?
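The #assetId idea could be sketched like this: build each result link from the container URL with the asset ID as the fragment, and group hits that share a container so the same page isn't listed several times. This assumes the page template renders an element whose id is the asset ID, which is an assumption about the templates rather than something WebGUI guarantees; the function names and sample data are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def result_link(container_url, asset_id):
    """Link to the container page, with the asset ID as the fragment."""
    scheme, netloc, path, query, _old_fragment = urlsplit(container_url)
    return urlunsplit((scheme, netloc, path, query, asset_id))


def group_by_container(hits):
    """Group (container_url, asset_id, synopsis) hits by container page,
    preserving the order in which containers first appear."""
    grouped = {}
    for container_url, asset_id, synopsis in hits:
        link = result_link(container_url, asset_id)
        grouped.setdefault(container_url, []).append((link, synopsis))
    return grouped


hits = [
    ("/news", "a1", "First matching article"),
    ("/news", "a2", "Second matching article"),
    ("/about", "a3", "Staff page"),
]
for page, entries in group_by_container(hits).items():
    print(page, [link for link, _ in entries])
```

Any existing fragment on the container URL is replaced rather than appended to, so the link always points at the matching asset.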

What I often do when I create an SQL Report for searching is use a) relevance and b) categories. In this example you see an icon to indicate the kind of content you've found: a discussion, a tutorial, a question, etc.

 

Kind regards,

Arjan Widlak

United Knowledge
Internet for the public sector

www.unitedknowledge.nl


Re: WebGUI 8: Search
User: elnino
Date: 8/19/2008 7:48 am
Rating: 5
Status: Approved

Hi! 

I'm glad to hear the search asset will be reviewed. When we initially installed 7.3.x, we found that the search asset was inadequate for our needs, so we ended up using Google's custom search, which is always an option for people.

While I understand why you did it, at the time the WebGUI search allowed searching individual assets (which is great), but it only returned the individual assets [that you are searching] (which was bad for us). We wanted to search the individual assets, but view them in the context of the page layout they were found on. Hopefully that makes sense.

Maybe it's been addressed, if not, I imagine that others are probably looking for a similar function - so I thought I'd mention it if it hasn't already been.

LN


Re: WebGUI 8: Search
User: JT
Date: 8/19/2008 8:32 am
Rating: -3
Status: Approved

I can't remember if it was 7.4 or 7.5, but we added that feature. Look for the "Use Containers" switch in the search asset.


Re: WebGUI 8: Search
User: knowmad
Date: 8/19/2008 8:43 am
Rating: 8    Rate [

+

|

-

]
Status: Approved

It's a new feature of 7.5.x and much appreciated!

----
Knowmad Technologies
http://www.knowmad.com


Re: WebGUI 8: Search
User: elnino
Date: 8/19/2008 8:59 am
Rating: 3
Status: Approved

VERY cool. Thank you! You folks are always on top of it!

LN

