# spiderss - a plaintext RSS crawler

![spiderss logo](images/logo.png)

__spiderss__ is a plaintext RSS crawler based on [feedparser](https://github.com/kurtmckee/feedparser), [python-readability](https://github.com/buriy/python-readability), and [html2text](https://github.com/Alir3z4/html2text).
Articles are parsed as Markdown files from the original article web page and stored on the filesystem.

## Features

- Categories
- Delete articles after a few days
- Distinguish _new_ from _read_ articles
- Store _loved_ articles forever
- OPML import

## Installation

### NixOS

Run `nix-shell` in the project directory.

### Legacy OS

Install the requirements with `pip install -r requirements.txt`.

## Usage

```
./spiderss.py [-h] [-V] [-v] [-c CONFIG]

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show version and exit
  -v, --verbose         verbose output
  -c CONFIG, --config CONFIG
                        config file (default: ./config.toml)
```

### Config

The config file is written in TOML and has the following variables:

__base_dir__: The base directory where your articles are stored.

__max_age__: The number of days your articles are kept on the filesystem. Articles in the __loved__ folder are skipped.

__[[feed]]__: A feed element with the following attributes:\
__category__: Category of the feed.\
__name__: Name of the feed.\
__url__: URL of the feed.

## Why?

Because plaintext is God.

## How can I read the articles?

Use your favourite Markdown viewer, or just the console.
__spiderss__ integrates nicely with the [ranger](https://github.com/ranger/ranger) file manager to browse categories.

## How does it work?

Edit the `config.toml` file to your liking and run the script.
The script creates the following folder structure:

```
base_directory
| - category
    | - feedname
        | - new
        | - read
    | - another feedname
        | - new
        | - read
| - another category
    | - a third feedname
        | - new
        | - read
| - loved
```
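
The layout above could be created roughly as in the following sketch. `create_layout` is a hypothetical helper written for this example, not a function taken from `spiderss.py`, and it assumes each feed is a dict with `category` and `name` keys as in the config.

```python
from pathlib import Path


def create_layout(base_dir, feeds):
    """Create new/ and read/ folders per feed, plus a top-level loved/ folder."""
    base = Path(base_dir)
    (base / "loved").mkdir(parents=True, exist_ok=True)
    for feed in feeds:
        feed_dir = base / feed["category"] / feed["name"]
        for sub in ("new", "read"):
            # exist_ok makes repeated runs harmless.
            (feed_dir / sub).mkdir(parents=True, exist_ok=True)
```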

Every feed gets a __new__ and a __read__ subfolder.
Article files are stored in the __new__ folder.
Move them to the __read__ folder once you're done reading them.
You can do this easily, for example by setting a keybinding in ranger.
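
If you would rather script the move, a minimal sketch could look like this. `mark_as_read` is a hypothetical helper, not part of spiderss; it assumes the article currently sits in a feed's __new__ folder and moves it to the sibling __read__ folder.

```python
import shutil
from pathlib import Path


def mark_as_read(article):
    """Move an article file from a feed's new/ folder to its sibling read/ folder."""
    article = Path(article)
    if article.parent.name != "new":
        raise ValueError(f"{article} is not inside a 'new' folder")
    read_dir = article.parent.parent / "read"
    read_dir.mkdir(exist_ok=True)
    # shutil.move returns the destination path.
    return Path(shutil.move(str(article), str(read_dir / article.name)))
```

A script like this could then be called from a file manager keybinding with the selected file as its argument.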

A special folder called __loved__ is created in the base directory on startup.
It is intended for articles you want to keep.
Articles inside are never deleted, even if they are older than __max_age__.
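
The expiry rule can be sketched as follows. `remove_expired` is a hypothetical helper, and it assumes an article's age is measured by the file's modification time; spiderss itself may determine age differently.

```python
import time
from pathlib import Path


def remove_expired(base_dir, max_age_days, now=None):
    """Delete article files older than max_age_days, skipping the loved/ folder."""
    base = Path(base_dir)
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    removed = []
    # Materialize the listing first so deletions don't disturb the traversal.
    for path in list(base.rglob("*")):
        if not path.is_file():
            continue
        # Skip anything stored under base_dir/loved.
        if path.relative_to(base).parts[0] == "loved":
            continue
        # Assumption: file modification time stands in for article age.
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```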