A plaintext RSS crawler

Denis Lehmann 5fc3666cbe edit README		2020-04-14 23:41:37 +02:00
fonts	add Canterbury font	2020-04-14 12:14:40 +02:00
images	add logo images	2020-04-14 12:14:59 +02:00
scripts	correct filename	2020-04-14 23:15:44 +02:00
.gitignore	Initial commit	2020-02-27 12:44:18 +01:00
config.toml	change config	2020-04-14 22:20:59 +02:00
default.nix	add nix python environment	2020-04-13 16:15:27 +02:00
LICENSE	Initial commit	2020-02-27 12:44:18 +01:00
README.md	edit README	2020-04-14 23:41:37 +02:00
requirements.txt	order requirements	2020-04-14 22:48:31 +02:00
spiderss.py	change error handling	2020-04-14 23:22:45 +02:00

README.md

spiderss - a plaintext RSS crawler

spiderss is a plaintext RSS crawler based on feedparser, python-readability and html2text. Articles are parsed as Markdown files from the original article web page and stored on the filesystem.

Features

Categories
Delete articles after a few days
Distinguish new from read articles
Store loved articles forever
OPML import

Installation

NixOS

Just call nix-shell in the project directory.

Legacy OS

Install the requirements with pip install -r requirements.txt.

Usage

./spiderss.py [-h] [-V] [-v] [-c CONFIG]

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show version and exit
  -v, --verbose         verbose output
  -c CONFIG, --config CONFIG
                        config file (default: ./config.toml)

Config

The config file is written in TOML and has the following variables:

base_dir: The base directory where your articles are stored.

max_age: The number of days your articles are kept on the filesystem. Articles in the loved folder are skipped.

feed: A feed element. It has the following attributes:
category: Category of the feed.
name: Name of the feed.
url: URL of the feed.

Why?

Because plaintext is God.

How can I read the articles?

Use your favourite Markdown viewer, or just the console. spiderss integrates nicely with the ranger file manager to browse categories.

How does it work?

Edit the config.toml file to your liking and run the script. The script creates the following folder structure:

base_directory
| - category
    | - feedname
        | - new
        | - read
    | - another feedname
        | - new
        | - read
| - another category
    | - a third feedname
        | - new
        | - read
| - loved

Every feed gets a new and a read subfolder. Article files are stored in the new folder. Move them to the read folder when you're done reading them. You can do this easily, e.g. by setting a keybinding in ranger.
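
The layout and the new-to-read move described above can be sketched with `pathlib` and `shutil`. This is not the code from spiderss.py: `create_layout` and `mark_read` are hypothetical helper names, and the demo feed and article are made up.

```python
import shutil
import tempfile
from pathlib import Path


def create_layout(base_dir: Path, feeds: list) -> None:
    """Create <base>/<category>/<name>/{new,read} plus <base>/loved."""
    (base_dir / "loved").mkdir(parents=True, exist_ok=True)
    for feed in feeds:
        feed_dir = base_dir / feed["category"] / feed["name"]
        for sub in ("new", "read"):
            (feed_dir / sub).mkdir(parents=True, exist_ok=True)


def mark_read(article: Path) -> Path:
    """Move an article from a feed's new folder to its sibling read folder."""
    target = article.parent.parent / "read" / article.name
    shutil.move(str(article), str(target))
    return target


# Demo in a temporary directory so nothing real is touched.
base = Path(tempfile.mkdtemp())
feeds = [{"category": "News", "name": "Example Feed"}]  # hypothetical feed
create_layout(base, feeds)

article = base / "News" / "Example Feed" / "new" / "hello.md"
article.write_text("# Hello\n")
moved = mark_read(article)
print(moved.relative_to(base))
```

A ranger keybinding would do the same move from the file manager; the helper only shows what "moving to read" means on disk.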

A special folder called loved is created on startup in the base_directory. It is intended for articles you want to keep. Articles inside are never deleted, even if they exceed the max_age.
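
A minimal sketch of such an expiry pass follows. It assumes that an article's age is taken from the file's modification time and that articles end in `.md`; spiderss may determine both differently. `remove_expired` is a hypothetical name.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Optional

SECONDS_PER_DAY = 86400


def remove_expired(base_dir: Path, max_age_days: int,
                   now: Optional[float] = None) -> list:
    """Delete *.md files older than max_age_days, skipping <base>/loved."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = []
    # Materialize the listing first so deletions don't disturb the walk.
    for path in list(base_dir.rglob("*.md")):
        if path.relative_to(base_dir).parts[0] == "loved":
            continue  # loved articles are kept forever
        if path.stat().st_mtime < cutoff:  # assumption: age = mtime
            path.unlink()
            removed.append(path)
    return removed


# Demo: one old article, one old loved article, one fresh article.
base = Path(tempfile.mkdtemp())
old = base / "News" / "Example Feed" / "read" / "old.md"
kept = base / "loved" / "keep.md"
fresh = base / "News" / "Example Feed" / "new" / "fresh.md"
for p in (old, kept, fresh):
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text("# Article\n")

ten_days_ago = time.time() - 10 * SECONDS_PER_DAY
os.utime(old, (ten_days_ago, ten_days_ago))
os.utime(kept, (ten_days_ago, ten_days_ago))

removed = remove_expired(base, max_age_days=7)
print([p.name for p in removed])
```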