Great tool in Python (HTTP headers and HTML links)

May 20, 2008

Simple and extremely useful. URLinfo is a tool for inspecting web pages: it finds information about a page, such as its HTTP headers and its internal and external links.

It is handy for debugging, SEO work or extracting links, and it runs on Google’s servers (Google App Engine):

http://url-info.appspot.com/
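URLinfo’s own code is not published here, but the two jobs it describes (reading a page’s HTTP headers and sorting its links into internal and external ones) can be sketched with Python’s standard library. This is a minimal sketch assuming Python 3. `LinkCollector`, `classify_links` and `fetch_headers` are hypothetical names, and "internal" is assumed to mean "same host as the page".

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Turn relative links like "/about" into absolute URLs.
                self.links.append(urljoin(self.base_url, href))


def classify_links(html, base_url):
    """Split a page's links into internal (same host) and external ones."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    host = urlparse(base_url).netloc
    internal = [u for u in parser.links if urlparse(u).netloc == host]
    external = [u for u in parser.links if urlparse(u).netloc != host]
    return internal, external


def fetch_headers(url):
    """Return the HTTP response headers for url (needs network access)."""
    with urlopen(Request(url, method="HEAD")) as resp:
        return dict(resp.headers.items())


page = '<a href="/about">About</a> <a href="http://python.org/">Python</a>'
internal, external = classify_links(page, "http://example.com/")
print(internal)  # ['http://example.com/about']
print(external)  # ['http://python.org/']
```

Note that with this simple host comparison, `mailto:` and `javascript:` links land in the external list; a real tool would probably filter them out first.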



A very fast site search engine: Sphinx (with a Python API :-)

October 1, 2007


I found this search engine very, very fast. I ran my own benchmarks and the results are very impressive!

Sphinx is a full-text search engine for database content, but it can also index XML files. It was specially designed to integrate well with SQL databases and scripting languages. For very large datasets you can use distributed indexes and distributed searching.

Features:

– high indexing speed (up to 10 MB/sec on modern CPUs)

– high search speed (avg query is under 0.1 sec on 2-4 GB text collections)

– high scalability (up to 100 GB of text)

– supports distributed searching

– supports MySQL natively (MyISAM and InnoDB tables are both supported)

– supports phrase searching

– supports phrase proximity ranking, providing good relevance

– supports English and Russian stemming

– supports any number of document fields

– supports document groups

– supports stopwords

– supports different search modes (“match all”, “match phrase”)

– generic XML interface which greatly simplifies custom integration

– pure-PHP searchd client API (i.e. no module compiling needed), plus a set of API libraries for popular web scripting languages (PHP, Python, Perl, and Ruby are bundled)

I’m planning to use it more extensively for indexing and searching. For me, it’s the end of Lucene (in my products).

Thanks to Andrew Aksyonoff (who, for the record, began developing Sphinx in 2001).

Site: www.sphinxsearch.com (and sites powered by Sphinx)


Maintainable, distributable, testable Python code

September 5, 2007

Some suggestions for writing a main() function:

Boilerplate for maintainable, distributable, testable python code by Matt Harrison http://panela.blog-city.com/boilerplate_for_maintainable_distributible_testable_python.htm

Python main() functions by Guido van Rossum
http://www.artima.com/weblogs/viewpost.jsp?thread=4829
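The core idea shared by both articles is that main() takes its arguments as a parameter and returns an exit status instead of calling sys.exit() itself, so it can be imported and tested. Below is a minimal sketch of that idea, assuming Python 3; the "greet" script is hypothetical, and the linked posts go further (Guido’s version also parses options with getopt).

```python
import sys


def main(argv=None):
    """Hypothetical 'greet' script: return an exit status instead of exiting.

    Taking argv as a parameter (defaulting to the real command line) lets a
    test call main(["Ada"]) directly, without touching sys.argv.
    """
    if argv is None:
        argv = sys.argv[1:]
    if not argv:
        print("usage: greet NAME [NAME ...]", file=sys.stderr)
        return 2  # conventional exit status for a usage error
    for name in argv:
        print("Hello, %s!" % name)
    return 0
```

As a script, the file would end with the usual guard, `if __name__ == "__main__": sys.exit(main())`, so that importing the module (for example from a test) never triggers the exit.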


Python 3000 project (Py3k or Python 3.0)

March 22, 2007

In addition to syntax changes, Python 3000 (due around 2008) has plenty of new features. This video is a preview of a keynote to be given at PyCon 2007, and it already presents a lot of the changes.

Example: in Python 3000, print is no longer a statement but a function. So:

print "this is a test"

becomes:

print("this is a test")
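Because print() is an ordinary function in Python 3, it accepts keyword arguments for the separator, the line ending and the output stream. A short sketch, assuming Python 3 (on Python 2.6 and later, `from __future__ import print_function` at the top of a module enables the same behaviour); the `io.StringIO` buffer is only there so the output can be inspected.

```python
import io
import sys

# The plain call, equivalent to the old statement.
print("this is a test")

# sep= replaces the single space normally placed between arguments.
buf = io.StringIO()
print("a", "b", "c", sep=", ", file=buf)

# end= replaces the trailing newline.
print("no newline", end="", file=buf)

# file= redirects output (Python 2 statement: print >>sys.stderr, "oops").
print("oops", file=sys.stderr)

print(repr(buf.getvalue()))  # 'a, b, c\nno newline'
```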

I have a suggestion: unit tests for all our modules, now!

A good test suite will save you from a huge headache when all of these changes come down.
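A minimal sketch of such a test with the standard library’s unittest module. `word_count` is a hypothetical function standing in for your own code; the point is that each expected behaviour gets its own test method, so a change in the language that breaks one of them shows up immediately.

```python
import unittest


def word_count(text):
    """Hypothetical function under test: count whitespace-separated words."""
    return len(text.split())


class WordCountTest(unittest.TestCase):
    def test_simple_sentence(self):
        self.assertEqual(word_count("this is a test"), 4)

    def test_empty_string(self):
        self.assertEqual(word_count(""), 0)

    def test_extra_whitespace(self):
        self.assertEqual(word_count("  spaced   out  "), 2)
```

In a test module you would normally finish with `if __name__ == "__main__": unittest.main()`, or run the file with `python -m unittest`.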


Python & RSS: Universal Feed Parser

March 19, 2007

Mark Pilgrim’s excellent Universal Feed Parser is a great tool for parsing even ill-formed feeds. Universal Feed Parser is a Python module for downloading and parsing syndicated feeds. It also parses several popular extension modules, including Dublin Core and Apple’s iTunes extensions. To use Universal Feed Parser, you will need Python 2.1 or later. Universal Feed Parser is not meant to run standalone; it is a module for you to use as part of a larger Python program.

With this module you can parse a feed from a string, a local file or a remote URL.


Click here for download and more information.


Python HTML/XML parser (error-tolerant)

March 18, 2007

Beautiful Soup is an HTML/XML parser for Python that can turn even invalid markup into a parse tree. It provides simple, idiomatic ways of navigating, searching, and modifying the parse tree. It turns HTML into a tree-like nested list of Tag objects and text snippets. A Tag object corresponds to an HTML tag: it knows about the tag’s attributes and contains a representation of everything between the original tag and its closing tag (if any). It’s easy to extract the Tags that meet certain criteria.

It commonly saves programmers hours or days of work!
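A small sketch of extracting tags by criteria from broken markup. It assumes the current `bs4` package (`pip install beautifulsoup4`) with Python’s built-in "html.parser" backend; the 2007 release described above was imported differently (`from BeautifulSoup import BeautifulSoup`).

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

# Deliberately sloppy markup: the <p> and <b> tags are never closed.
html = """<html><body>
<p>Intro <b>bold text
<p>See <a href="http://python.org/">Python</a> and <a href="/docs">docs</a>
</body></html>"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns every Tag matching the criteria; href=True skips
# <a> tags that have no href attribute.
links = [a["href"] for a in soup.find_all("a", href=True)]
print(links)  # ['http://python.org/', '/docs']

# A Tag exposes its attributes like a dict and its text via get_text().
first = soup.find("a")
print(first.get_text(), first.get("href"))  # Python http://python.org/
```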


Pygments: A very good Python syntax highlighter

March 16, 2007

Pygments is a syntax highlighting engine written in Python. This means it takes source code (or other markup) in a supported language and outputs a processed version (in various formats) containing syntax highlighting markup.

Here is a small example for highlighting Python code:

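A minimal sketch using Pygments’ documented `highlight()` entry point: a lexer splits the source into tokens and a formatter renders them, here as HTML. The code being highlighted is an arbitrary sample.

```python
from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments.lexers import PythonLexer

code = 'def greet(name):\n    print("Hello, " + name)\n'

# The lexer tokenizes the Python source; the formatter wraps each token
# in a <span class="..."> inside a <div class="highlight"> block.
html = highlight(code, PythonLexer(), HtmlFormatter())
print(html)

# CSS rules that colour those spans, scoped to the .highlight class.
css = HtmlFormatter().get_style_defs(".highlight")
```

Swapping `HtmlFormatter` for another formatter (for example `TerminalFormatter` from the same module) changes the output format without touching the lexer.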