Last post Jul 26, 2006 12:13 PM by baycom
Jun 27, 2004 08:58 PM|filemon007|LINK
Jun 28, 2004 10:45 AM|filemon007|LINK
Jun 28, 2004 11:21 AM|bhopkins|LINK
Jul 01, 2004 04:04 PM|Sedgewick|LINK
Jul 01, 2004 04:07 PM|Sedgewick|LINK
Oct 22, 2005 06:23 AM|baycom|LINK
Oct 22, 2005 08:39 AM|slope|LINK
Oct 22, 2005 12:46 PM|baycom|LINK
We will probably release it as a commercial module, with full source. The timeframe for release is probably a week, but before that I will have a demo (both user and admin) up and running, so that any suggestions can be included in the first release.
Oct 22, 2005 07:06 PM|slope|LINK
Oct 22, 2005 10:09 PM|JMyung|LINK
Oct 23, 2005 04:47 AM|baycom|LINK
Oct 23, 2005 01:51 PM|JMyung|LINK
Oct 24, 2005 02:09 PM|baycom|LINK
I created a quick demo of the search engine at http://www.opendnn.com/Products/OpenSearchEngine/Demo/tabid/67/Default.aspx
I have not given any admin rights, and there is no real explanation of how the search works yet. I will add more details in the days to come.
For now the spider is limited to the site it's on, www.opendnn.com, so you can search for anything on the site. However, in real life you can spider any site you want and as many as you want. As long as they are visible to the Google bot, they will be visible to the search engine.
The search results become visible only once you execute a search. I think this is a nice feature: it lets you avoid displaying an empty results module when there is no need to. The option can be turned on or off.
That's all for now, I will post any updates here, and would appreciate any feedback.
Oct 24, 2005 02:47 PM|simonduz1|LINK
Is there any type of "relevance display" built in to the search results? I searched for multiple keywords and the results were not always what I was looking for. This would be very useful to me.
Thanks for the demo.
Oct 24, 2005 05:32 PM|Sailu_tp|LINK
Oct 24, 2005 07:21 PM|dandrade|LINK
Oct 24, 2005 07:23 PM|slope|LINK
Oct 26, 2005 08:23 AM|baycom|LINK
It's not easy to make a comparison of the default DotNetNuke search and our search engine, primarily because I was not able to find any documentation on the inner workings of the DNN search engine.
What I could find are bits and pieces, and here is the summary:
The DotNetNuke Search is comprised of 4 applications:
The Scheduled Task - This carries out the process of collating and indexing everything
The Search Results module - This handles finding and displaying results
The Module Indexer - This indexes each module instance. The indexer is also responsible for updating the RSS feeds.
The Search Parser - This takes the text, cleans it, and parses it into the word tables that are used to carry out the searches
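To illustrate what "parses it into the word tables" can mean in general, here is a minimal sketch of a word table (inverted index) in Python. It is not DNN's actual parser: the function names, the letters-only tokenizer, and the count-based ordering are all assumptions made for the example.

```python
import re
from collections import defaultdict
from html import unescape

_TAG_RE = re.compile(r"<[^>]+>")    # crude tag stripper, good enough for a sketch
_WORD_RE = re.compile(r"[^\W\d_]+")  # runs of letters only (assumed tokenizer)


def clean(html_text):
    """Strip tags, resolve entities, and lowercase the text."""
    return unescape(_TAG_RE.sub(" ", html_text)).lower()


def build_word_table(documents):
    """Map each word to {doc_id: occurrence count}.

    `documents` is a hypothetical dict of doc_id -> HTML text.
    """
    table = defaultdict(dict)
    for doc_id, text in documents.items():
        for word in _WORD_RE.findall(clean(text)):
            table[word][doc_id] = table[word].get(doc_id, 0) + 1
    return table


def search(table, word):
    """Return doc ids containing `word`, most occurrences first."""
    hits = table.get(word.lower(), {})
    return sorted(hits, key=lambda doc_id: (-hits[doc_id], doc_id))
```

A search then becomes a lookup in the table instead of a scan of every page, which is why the indexing step runs as a scheduled task ahead of time.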
Here is a brief blog:
Possible Issues with the current DNN Search Engine:
- In the blog above, there is also a mention of an out-of-memory error when the sites to be indexed are too large.
- Please correct me if I am wrong, but only the current portal can be indexed and searched. No external websites (even if they are DNN websites) can be indexed.
- Only modules that implement the ISearchable interface can be indexed.
- All indexes are stored in and retrieved from the DB (more SQL space needed, possibly slower than file searches).
- Limited query syntax (e.g., little or no ability to add relevance to a phrase or word, run and/or searches, etc.).
The advantages I see with the module we developed are the following:
- No limitation on the portals that can be indexed, whether these portals are DNN or not.
- If we are talking about indexing a DNN portal, all modules in the site(s) will be indexed, whether they implement the ISearchable interface or not. So this could speed up development.
- All indexes are stored in files, which makes for faster results.
- Advanced query syntax to fine-tune results (see examples at
Perhaps one of the most compelling features (as far as we were concerned when we developed the module) is that the search is based on Lucene, which is an excellent search engine and in continuous evolution, as well as open-source.
The other feature is certainly the ability to crawl and index any site, just like any search engine would.
I hope this helps. I am sure that there are many more pros and cons, and I will be more than happy to address them as they come up. (one of them for example is ordering by relevance... I will be working on that next).
Oct 26, 2005 11:01 AM|DocHoliday|LINK
Oct 26, 2005 11:30 AM|baycom|LINK
Currently Lucene.Net can only index plain text (including HTML), but not .doc, .pdf, etc. See this link for more details:
However, this is something that I am interested in, and we will certainly come up with a future version that does support office documents and .pdf parsing.
Best Regards and Thank you.
Oct 28, 2005 02:33 PM|grasshopper|LINK
Oct 28, 2005 03:38 PM|edgett|LINK
Having the ability to index Microsoft Office and PDF documents would be incredibly popular.
Also, you may want to get in touch with Chris Cant at
http://www.phdcc.com/seesearchwords/ to see about using his SeeSearchWords functionality.
Oct 28, 2005 06:43 PM|baycom|LINK
Oct 28, 2005 06:59 PM|adefwebserver|LINK
Oct 29, 2005 12:30 AM|FaithMan|LINK
Hope your module works multilingually! Since it's not always the case with default DNN 3's search. Looking forward to it!
Oct 29, 2005 03:51 AM|baycom|LINK
It does index and return results in multiple languages with the correct characters. It does so by looking for the charset attribute in the HTML, and then setting the encoding to the charset value. If there is no charset, it will default to Western European.
I have indexed both English and Spanish sites (Spanish has characters such as ñ and many accents í, ó, ü... that are not caught by UTF-8) with good results.
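The charset handling described above can be sketched roughly as follows. This is a Python illustration, not the module's actual code (which is .NET): the function names are made up, and `iso-8859-1` is an assumed stand-in for the "Western European" default.

```python
import codecs
import re

# Assumed fallback; the post only says "Western European".
DEFAULT_CHARSET = "iso-8859-1"

# Matches both <meta charset="..."> and content="text/html; charset=...".
_CHARSET_RE = re.compile(rb"""charset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)""", re.IGNORECASE)


def detect_charset(raw_html, default=DEFAULT_CHARSET):
    """Return the charset declared in the page, or the default if none is usable."""
    match = _CHARSET_RE.search(raw_html[:4096])  # declarations sit near the top
    if not match:
        return default
    name = match.group(1).decode("ascii").lower()
    try:
        codecs.lookup(name)  # reject charset names Python cannot decode
    except LookupError:
        return default
    return name


def decode_page(raw_html):
    """Decode the raw page bytes using the detected (or default) charset."""
    return raw_html.decode(detect_charset(raw_html), errors="replace")
```

Decoding with the declared charset before indexing is what keeps characters such as ñ intact in both the index and the displayed results.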
Oct 29, 2005 12:09 PM|dandrade|LINK
Oct 29, 2005 01:48 PM|baycom|LINK
I would not port Lucene to VB.NET because of its size and the fact that Lucene is always being improved, but the search modules and the spider can be converted to VB.NET seamlessly with a converter tool. It is only when one tries to go from VB.NET to C# that there could be conversion issues.
Nov 14, 2005 08:44 PM|kmotte|LINK
Nov 18, 2005 06:38 AM|slope|LINK
Nov 21, 2005 02:35 PM|dandrade|LINK
Nov 21, 2005 02:57 PM|dandrade|LINK
Nov 21, 2005 05:31 PM|baycom|LINK
Deleting the src .zip from the package was the right thing to do in order to reduce the size. This will not affect the functionality or installation of the module.
Nov 25, 2005 05:44 PM|email@example.com|LINK
Nov 26, 2005 09:48 AM|GHunter|LINK
1. Is there any sort of "Relevance" or "Rating" of the search results? Having the best matches listed first is ideal for multiple site searches.
2. Can the search Spider be blocked? I would hate to buy the module to find my multi-site searches won't work due to the spider being blocked.
3. Can there be multiple instances of the search module on my portal? Like Search Module 1 searches sites A,B,C and Search Module 2 searches sites D,E,F
Dec 05, 2005 02:24 PM|baycom|LINK
Unfortunately there is no such capability built in. It could probably be done (I can point out the spot where all character set validations are done).
Dec 05, 2005 02:29 PM|baycom|LINK
1. There is a relevance rating, and all listings are sorted by relevance. It is just not displayed; the code can easily be modified to show it.
2. The search spider will not spider pages that have the appropriate HTML spider-blocking tags.
3. There can be multiple instances of the search input and search results modules, but they will all show results from the same indexes (all sites that are spidered).
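For point 2, a spider typically checks a page for blocking tags before indexing it. The sketch below is in Python and assumes that "spider blocking tags" means the standard robots meta tag (`<meta name="robots" content="noindex, nofollow">`); the post does not say which tags the module honors, and the function names here are hypothetical.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def robots_directives(html_text):
    parser = _RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


def may_index(html_text):
    """False if the page asks spiders not to index it."""
    return not ({"noindex", "none"} & robots_directives(html_text))


def may_follow(html_text):
    """False if the page asks spiders not to follow its links."""
    return not ({"nofollow", "none"} & robots_directives(html_text))
```

A crawler would call `may_index` before adding a page to the index and `may_follow` before queuing the links found on it.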
Jul 26, 2006 12:13 PM|baycom|LINK
For those that have been following this thread, asking about PDF indexing capabilities,
I would like to announce that we finally have implemented this capability.
As a result, Open-SearchEngine is now capable of indexing regular HTML content, as well as MSOffice documents and PDF documents.
More information on our site:
or on SnowCovered: