links for 2010-02-20
-
Python module for scraping web pages. Creates a Region object for a chunk of source code, with a start, an end, the source URL, headers, raw content and plain-text content. Does not build a parse tree. Might be interesting to try alongside modules like BeautifulSoup for comparison.
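To make the description concrete, here is a rough sketch of what a region-based approach could look like. It is not the linked module's actual API: the `Region` class below, its field names, and the `first_element` helper are all hypothetical, written only to illustrate the idea of addressing a slice of page source by offsets instead of building a parse tree.

```python
import re
from dataclasses import dataclass
from html import unescape
from typing import Optional


@dataclass
class Region:
    """Hypothetical stand-in for a 'region' of a fetched page (not the real module's class)."""
    url: str        # where the page came from
    headers: dict   # response headers
    source: str     # full page source
    start: int      # offset where this region begins
    end: int        # offset where this region ends

    @property
    def content(self) -> str:
        # Raw markup covered by this region.
        return self.source[self.start:self.end]

    @property
    def text(self) -> str:
        # Plain text: drop tags, decode entities, collapse whitespace.
        stripped = re.sub(r"<[^>]*>", "", self.content)
        return " ".join(unescape(stripped).split())


def first_element(page: Region, tag: str) -> Optional[Region]:
    """Return a sub-region for the first <tag>...</tag> inside page, or None.

    Plain regex matching, no parse tree, so nested elements of the
    same tag are not handled correctly.
    """
    name = re.escape(tag)
    pattern = re.compile(r"<%s\b[^>]*>.*?</%s\s*>" % (name, name), re.I | re.S)
    match = pattern.search(page.source, page.start, page.end)
    if match is None:
        return None
    return Region(page.url, page.headers, page.source, match.start(), match.end())


if __name__ == "__main__":
    html = "<html><body><h1>Hello &amp; <b>welcome</b></h1><p>Bye</p></body></html>"
    page = Region("http://example.com/", {"Content-Type": "text/html"}, html, 0, len(html))
    heading = first_element(page, "h1")
    print(heading.start, heading.end, heading.text)
```

The trade-off compared with something like BeautifulSoup is that offsets into the raw source are cheap and tolerate broken markup, but anything that depends on nesting has to be handled by hand.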
chris @ February 21, 2010