Simple and extremely useful. URLinfo is a tool for handling web pages: finding information about them, inspecting headers, listing internal and external links, etc…
Useful for debugging, SEO, and link extraction… and it runs on Google’s servers (Google App Engine).
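To make the idea concrete, here is a rough standard-library sketch of the link-classifying part of such a tool (this is not URLinfo’s actual code, and `LinkCollector` is a made-up name; URLinfo itself runs on App Engine):

```python
# Sketch: classify <a href> links on a page as internal or external.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collects links from HTML, splitting them by host."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.base_host = urlparse(base_url).netloc
        self.internal, self.external = [], []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative links
        if urlparse(url).netloc == self.base_host:
            self.internal.append(url)
        else:
            self.external.append(url)

collector = LinkCollector("http://example.org/")
collector.feed('<a href="/about">about</a> <a href="http://python.org/">py</a>')
print(collector.internal)   # internal links, resolved against the base URL
print(collector.external)   # links pointing at other hosts
```

In a real tool you would feed it the body of an HTTP response (and read the response headers from the same request) instead of a literal string.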
I found this search engine very, very fast… I tried to run my own benchmarks and the results are very impressive!
Sphinx is a full-text search engine for database content, but it can also read XML text files. It was specially designed to integrate well with SQL databases and scripting languages. For very large datasets you can have distributed indices and searching.
– high indexing speed (up to 10 MB/sec on modern CPUs)
– high search speed (avg query is under 0.1 sec on 2–4 GB text collections)
– high scalability (up to 100 GB of text)
– supports distributed searching
– supports MySQL natively (MyISAM and InnoDB tables are both supported)
– supports phrase searching
– supports phrase proximity ranking, providing good relevance
– supports English and Russian stemming
– supports any number of document fields
– supports document groups
– supports stopwords
– supports different search modes (“match all”, “match phrase”)
– generic XML interface which greatly simplifies custom integration
– pure-PHP (i.e. no module compiling, etc.) searchd client API — a set of API libraries for popular Web scripting languages (PHP, Python, Perl, and Ruby are bundled)
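A minimal sketch of what a sphinx.conf might look like for a MySQL source (the database, table, and column names here are made up for illustration; the distribution ships a commented sample config with the authoritative options):

```
source blog_posts
{
    type      = mysql
    sql_host  = localhost
    sql_user  = sphinx
    sql_pass  = secret
    sql_db    = blog
    # first selected column is the document id; the rest become full-text fields
    sql_query = SELECT id, title, body FROM posts
}

index blog_posts
{
    source = blog_posts
    path   = /var/data/sphinx/blog_posts
}
```

With a config like this you run the indexer over the source, start searchd, and query it from your scripts through one of the bundled client APIs.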
I’m planning to use it more extensively for indexing and searching… For me it’s the end of Lucene (in my products)…
Thanks, Andrew Aksyonoff! (For the record, he began developing Sphinx in 2001.)
Boilerplate for maintainable, distributable, testable python code by Matt Harrison http://panela.blog-city.com/boilerplate_for_maintainable_distributible_testable_python.htm
Another one: “Python main() functions” by Guido van Rossum.
In addition to syntax changes, Python 3000 (due around 2008) has plenty of new features. This video is a preview of a keynote to be given at PyCon 2007, and a lot of the changes are presented.
Example: In Python 3000, print is no longer a statement, but a function. So:
print "this is a test"
becomes:
print("this is a test")
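Because print becomes an ordinary function, it also takes keyword arguments and composes like any other call:

```python
# Python 3000: print() is a function, so sep/end are keyword arguments
# instead of the old special-case syntax.
print("this is a test")
print("this", "is", "a", "test")            # sep defaults to a space
print("this", "is", "a", "test", sep="-")   # → this-is-a-test
print("no newline", end="")                  # replaces the old trailing comma
```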
I have a suggestion: unit tests for all our modules, now!
A good test suite will save you from a huge headache when all of these changes come down.
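A minimal sketch of what that looks like with the standard unittest module (`slugify` is a made-up helper just to have something to test):

```python
import unittest

def slugify(title):
    """Toy function under test: lowercase and join words with dashes."""
    return "-".join(title.lower().split())

class SlugifyTest(unittest.TestCase):
    def test_spaces_become_dashes(self):
        self.assertEqual(slugify("Hello Brave World"), "hello-brave-world")

    def test_already_clean(self):
        self.assertEqual(slugify("python"), "python")

# Run the suite programmatically (handy inside a larger script):
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("all passed:", result.wasSuccessful())
```

When print becomes a function, or a module you depend on changes its API, a suite like this tells you in seconds what broke.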
Mark Pilgrim’s excellent Universal Feed Parser is a great tool for parsing even ill-formed feeds. Universal Feed Parser is a Python module for downloading and parsing syndicated feeds. It also parses several popular extension modules, including Dublin Core and Apple’s iTunes extensions. To use Universal Feed Parser, you will need Python 2.1 or later. Universal Feed Parser is not meant to run standalone; it is a module for you to use as part of a larger Python program.
With this module you can parse a feed from a string, a local file, or a remote URL:
Beautiful Soup is an HTML/XML parser for Python that can turn even invalid markup into a parse tree. It provides simple, idiomatic ways of navigating, searching, and modifying the parse tree. This class turns HTML into a tree-like nested tag-soup list of Tag objects and text snippets. A Tag object corresponds to an HTML tag. It knows about the HTML tag’s attributes, and contains a representation of everything contained between the original tag and its closing tag (if any). It’s easy to extract Tags that meet certain criteria.
It commonly saves programmers hours or days of work!