A good faceted SiteSearch engine: Solr

October 12, 2007

Solr is an enterprise-ready, Lucene-based search server that supports faceted searching, hit highlighting, and multiple output formats.

Solr features caching, replication for load balancing, faceting, highlighting, "more like this" (similar-document) queries, and a simple HTTP protocol that supports XML, JSON, and other formats. Under the hood, Lucene powers Solr's RESTful web services.
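Because Solr is driven over plain HTTP, a faceted query is just a URL. The sketch below builds one with Python's standard library. It assumes a default example install reachable at `http://localhost:8983/solr/select` (a hypothetical address; adjust host, port, and path for your setup) and uses the standard Solr request parameters `q`, `rows`, `wt`, `facet`, and `facet.field`. The field names `cat` and `manu` are only illustrative.

```python
from urllib.parse import urlencode

# Hypothetical base URL: adjust host, port, and path to match your Solr install.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_facet_query(q, facet_fields, rows=10, fmt="json"):
    """Build a Solr select URL that also asks for facet counts on the given fields."""
    params = [
        ("q", q),             # the user's query
        ("rows", str(rows)),  # how many matching documents to return
        ("wt", fmt),          # response writer, e.g. "json" or "xml"
        ("facet", "true"),    # turn faceting on
    ]
    # facet.field may be repeated, once per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    # Fetch this URL with any HTTP client (urllib, curl, a browser...).
    print(build_facet_query("ipod", ["cat", "manu"]))
```

A list of pairs is used instead of a dict so that `facet.field` can appear more than once in the query string.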

Solr Uses the Lucene Search Library and Extends it!

* A Real Data Schema, with Numeric Types, Dynamic Fields, Unique Keys
* Powerful Extensions to the Lucene Query Language
* Support for Dynamic Faceted Browsing and Filtering
* Advanced, Configurable Text Analysis
* Highly Configurable and User Extensible Caching
* Performance Optimizations
* External Configuration via XML
* An Administration Interface
* Monitorable Logging
* Fast Incremental Updates and Snapshot Distribution
* XML and CSV/Delimited-Text Update Formats
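To make "dynamic faceted browsing and filtering" concrete, here is a toy, in-memory illustration of the idea (not Solr's implementation): search, count how many hits fall under each value of a field, then "drill down" by re-running the search with a field=value filter. The documents and field names are hypothetical sample data.

```python
from collections import Counter

# Hypothetical sample documents; in Solr these would be fields in the index.
DOCS = [
    {"id": 1, "title": "ipod nano", "cat": "music", "manu": "apple"},
    {"id": 2, "title": "ipod shuffle", "cat": "music", "manu": "apple"},
    {"id": 3, "title": "ipod speaker dock", "cat": "audio", "manu": "bose"},
    {"id": 4, "title": "mp3 player", "cat": "music", "manu": "sandisk"},
]


def search(docs, term, filters=None):
    """Return docs whose title contains `term` and that match every field=value filter."""
    filters = filters or {}
    return [
        d for d in docs
        if term in d["title"].split()
        and all(d.get(field) == value for field, value in filters.items())
    ]


def facet_counts(hits, field):
    """Count how many of the current hits carry each distinct value of `field`."""
    return Counter(d[field] for d in hits if field in d)


if __name__ == "__main__":
    hits = search(DOCS, "ipod")
    print(facet_counts(hits, "manu"))  # the counts shown next to each facet link
    # The user clicks "apple": re-run the same search with that value as a filter.
    drilled = search(DOCS, "ipod", {"manu": "apple"})
    print(facet_counts(drilled, "cat"))
```

The facet counts are always computed over the current result set, which is why they change as filters are added.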

An example of a Solr admin screen:

[Screenshot: Solr Admin]


A very fast Site-Search engine: Sphinx (with Python API :-)

October 1, 2007


I found this search engine very, very fast. I ran my own benchmarks and the results were very impressive!

Sphinx is a full-text search engine for database content, but it can also index XML files. It was specially designed to integrate well with SQL databases and scripting languages. For very large datasets, it supports distributed indexes and distributed searching.
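Distributed searching generally follows a scatter-gather pattern: send the query to every shard, let each return its local best hits, and merge them into a global top-k. The sketch below shows that pattern in memory. It is a generic illustration, not Sphinx's code; the shards, document ids, and the word-count scoring are all hypothetical stand-ins for real remote indexes and real ranking.

```python
import heapq

# Hypothetical shards: each maps a document id to its text.
SHARDS = [
    {"a1": "sphinx full text search", "a2": "mysql search"},
    {"b1": "sphinx distributed search with sphinx", "b2": "lucene"},
]


def search_shard(shard, query):
    """Score docs in one shard by counting query-word occurrences (a toy ranking)."""
    terms = query.lower().split()
    results = []
    for doc_id, text in shard.items():
        words = text.lower().split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            results.append((score, doc_id))
    return results


def distributed_search(shards, query, k=10):
    """Scatter the query to every shard, then gather and merge into a global top-k."""
    partial = []
    for shard in shards:
        # A real deployment would query remote shards over the network, in parallel.
        # Each shard only needs to send back its local top-k.
        partial.extend(heapq.nlargest(k, search_shard(shard, query)))
    # Any document in the global top-k must also be in its own shard's local top-k,
    # so merging the local lists is enough.
    return heapq.nlargest(k, partial)


if __name__ == "__main__":
    print(distributed_search(SHARDS, "sphinx search", k=2))
```

Only k results per shard cross the network, which is what keeps this approach cheap as the number of shards grows.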

Features:

– high indexing speed (up to 10 MB/sec on modern CPUs)

– high search speed (avg query is under 0.1 sec on 2-4 GB text collections)

– high scalability (up to 100 GB of text)

– supports distributed searching

– supports MySQL natively (MyISAM and InnoDB tables are both supported)

– supports phrase searching

– supports phrase proximity ranking, providing good relevance

– supports English and Russian stemming

– supports any number of document fields

– supports document groups

– supports stopwords

– supports different search modes (“match all”, “match phrase”)

– generic XML interface, which greatly simplifies custom integration

– pure-PHP searchd client API (i.e. no module compiling, etc.), plus a set of API libraries for popular web scripting languages (PHP, Python, Perl, and Ruby are bundled)
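The difference between the "match all" and "match phrase" modes, and the idea behind phrase proximity ranking, can be shown with a small toy matcher. This is only an illustration of the concepts, not Sphinx's algorithm: the tokenizer is a bare lowercase split, and the proximity weight here is an assumption of ours, namely the longest run of query words that appears consecutively and in order in the document. Sphinx's actual ranking formula differs in its details.

```python
def tokenize(text):
    """Toy tokenizer: lowercase and split on whitespace (no stemming, no stopwords)."""
    return text.lower().split()


def match_all(doc, query):
    """'Match all': every query word must appear somewhere in the document."""
    words = set(tokenize(doc))
    return all(t in words for t in tokenize(query))


def match_phrase(doc, query):
    """'Match phrase': the query words must appear contiguously and in order."""
    d, q = tokenize(doc), tokenize(query)
    n = len(q)
    return n > 0 and any(d[i:i + n] == q for i in range(len(d) - n + 1))


def longest_common_run(a, b):
    """Longest run of words appearing consecutively, in the same order, in both lists."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, 1):
            if x == y:
                # Extend the run that ended one word earlier in both lists.
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def proximity_weight(doc, query):
    """Toy proximity weight: equals the query length for an exact phrase match."""
    return longest_common_run(tokenize(query), tokenize(doc))


if __name__ == "__main__":
    doc = "Sphinx is a fast full text search engine"
    print(match_all(doc, "search full"), match_phrase(doc, "search full"))
    print(proximity_weight(doc, "full text search"))
    print(proximity_weight("full and fast text search", "full text search"))
```

Both documents in the usage lines satisfy "match all" for "full text search", but the exact phrase earns a higher proximity weight, which is how proximity ranking pushes better phrase matches to the top.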

I'm planning to use it more extensively for indexing and searching. For me, it's the end of Lucene (in my products)…

Thanks, Andrew Aksyonoff! (For the record, he began developing Sphinx in 2001.)

Site: www.sphinxsearch.com and Site powered by Sphinx