Software: nexus

about

Don't you hate all those RSS feeds that carry only titles and links and no content? This little program will help you. It turns those feeds into ones with full content. It does this by reading the original RSS file, following every link, downloading the content and putting it into the RSS file as HTML snippets.

To separate the real content from fluff like menus and advertising, it applies a little trick. Most web pages use invisible HTML tables for layout. Often the menu is in the left column, some advertising is in the right column, and the real content is in the middle column. But sites differ in how they use the tables. Nexus computes a content-to-tag ratio for each table cell (the markup between <td> tags). Normally the cell with the real content has a lot more content relative to its number of tags than any other cell. So nexus finds the content cell and strips everything else.

This works amazingly well for a lot of pages; for some it fails completely, and for many it is sort of OK. Try it yourself on your favourite feeds.

Note that copyright still applies. You probably don't have the right to redistribute the generated RSS files, but private use should be OK.

status
A proof-of-concept implementation of nexus written in Perl is available. For some RSS feeds it works extremely well; for others the results are rather poor.

download

Download Perl code here: nexus. You will need the XML::RSS and LWP::Simple modules.

future

This program could be improved a lot by writing better heuristics to find the real content.

Relative links (for instance to images) are not fixed, so many images will not appear in the generated RSS. This can be considered a feature or a bug.

Nexus creates ugly RSS files with escaped HTML code. It would be better to create properly namespaced XHTML inside RSS, but for the moment this has to do because the XML::RSS lib does it this way.
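
To make the content-to-tag ratio trick from the about section concrete, here is a minimal sketch in Python. It is not the Perl code of nexus, and several details are assumptions made for illustration: the names CellScorer and best_cell_text are hypothetical, the ratio is taken as text length divided by (tag count + 1), text and tags are credited only to the innermost open cell when tables are nested, every cell is assumed to be closed with an explicit </td>, and the sketch returns plain text rather than an HTML snippet.

```python
from html.parser import HTMLParser


class CellScorer(HTMLParser):
    """Record the text and the number of tags found inside each <td> cell."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_cells = []  # stack of cells currently open (nested tables)
        self.cells = []       # finished cells, in the order they were closed

    def handle_starttag(self, tag, attrs):
        # Assumption: a tag counts against the innermost open cell only.
        if self.open_cells:
            self.open_cells[-1]["tags"] += 1
        if tag == "td":
            self.open_cells.append({"text": [], "tags": 0})

    def handle_endtag(self, tag):
        if tag == "td" and self.open_cells:
            self.cells.append(self.open_cells.pop())

    def handle_data(self, data):
        # Text outside any cell is ignored.
        if self.open_cells:
            self.open_cells[-1]["text"].append(data)


def best_cell_text(html):
    """Return the text of the cell with the highest content-to-tag ratio.

    Returns an empty string when the page contains no closed <td> cell.
    """
    parser = CellScorer()
    parser.feed(html)
    parser.close()

    best_text, best_ratio = "", -1.0
    for cell in parser.cells:
        # Collapse runs of whitespace so layout indentation is not counted.
        text = " ".join("".join(cell["text"]).split())
        ratio = len(text) / (cell["tags"] + 1)  # +1 avoids division by zero
        if ratio > best_ratio:
            best_text, best_ratio = text, ratio
    return best_text
```

Calling best_cell_text on a downloaded page with a three-column layout should return the text of the middle column whenever that column holds long runs of text with few tags, while a link-heavy menu column scores low. A better heuristic, as the future section suggests, would replace the ratio computation in the loop.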