Creating posts from RSS feeds in Flarum


    For a few months now, one of my projects has been successfully querying RSS feeds and then using the Flarum API to create a discussion for each new item.

    Want this? Sure you do! Below are the steps, including all the scripts you need to make this work.

    Firstly, you will need the Flarum API client (maicol07/flarum-api-client).

    Installation

    composer require maicol07/flarum-api-client

    Configuration

    In order to start working with the client you might need a Flarum master key:

    • Generate a 40-character random, hard-to-guess string - this is the token needed for this package (you can use a generator for this - a good example is here)
    • Manually add it to the api_keys table using phpMyAdmin, Adminer or another solution.

    The master key is required to access non-public discussions and to run actions otherwise reserved for Flarum administrators.
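
    If you would rather not rely on an online generator, the token can be produced locally. The snippet below is a minimal sketch, assuming PHP 7 or later; generateApiToken() is a hypothetical helper name of my own and is not part of Flarum or the API client.

```php
<?php
// Minimal sketch: generate a 40-character token for Flarum's api_keys table.
// generateApiToken() is a hypothetical helper, not part of Flarum or the API client.
function generateApiToken(int $length = 40): string
{
    if ($length < 2 || $length % 2 !== 0) {
        throw new InvalidArgumentException('Length must be a positive even number');
    }
    // random_bytes() is cryptographically secure; each byte becomes two hex characters.
    return bin2hex(random_bytes(intdiv($length, 2)));
}

// Print one token; paste it into the api_keys table and into details.php.
echo generateApiToken() . "\n";
```

    Run it once from the CLI, then store the value in the api_keys table as described above.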

    Install SimplePie

    Next, install SimplePie to parse the RSS feeds

    composer require simplepie/simplepie

    Create storage DB

    Now access your database using phpmyadmin (or something similar) and create a new database called “feed”

    With the database created, run the following script which will create a table called “queue” with a few simple columns

    CREATE TABLE `queue` (
      `id` bigint(20) NOT NULL,
      `url` varchar(500) NOT NULL,
      `title` varchar(500) NOT NULL,
      `seen` int(1) NOT NULL DEFAULT 0
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
    

    As your “feed” database gets bigger, it’ll need indexes to keep lookups fast. Create them as follows in phpMyAdmin

    ALTER TABLE `queue`
      ADD PRIMARY KEY (`id`),
      ADD KEY `title` (`title`),
      ADD KEY `url` (`url`);
    

    Finally, we’ll set AUTO_INCREMENT on the id column of the table

    ALTER TABLE `queue`
      MODIFY `id` bigint(20) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=1;
    COMMIT;
    

    Create credentials file

    For security reasons, we “include” a details.php file (you can call this whatever you like - just remember to reflect any change of name in the main script below) that lives outside of the web root. We are going to be running this from PHP-CLI anyway, so it shouldn’t be exposed.

    In my case, details.php is included as shown below - it’s located at the root of my domain, but outside of the web root

    include("/var/www/vhosts/metabullet.com/details.php");

    Your details.php file should contain this

    <?php
    
    // Variables for posting to Twitter
    define('CONSUMER_KEY', 'YOUR_KEY'); 
    define('CONSUMER_SECRET', 'YOUR_SECRET');
    define('ACCESS_TOKEN', 'YOUR_ACCESS_TOKEN');
    define('ACCESS_TOKEN_SECRET', 'YOUR_ACCESS_TOKEN_SECRET');
    
    $header = array(
        "Authorization: Token THE_TOKEN_YOU_GENERATED_EARLIER",
        "Content-Type: application/json",
    );
    
    // Create DB connection
    $servername = "localhost";
    $login = "YOUR_DB_USER";
    $dbpw = "YOUR_DB_PASSWORD";
    $dbname = "feed";
    $conn = new mysqli($servername, $login, $dbpw, $dbname);
    
    // Check connection
    if ($conn->connect_error) {
        die("Connection failed: " . $conn->connect_error);
    } else {
        echo "Connected to database\n";
    }
    ?>
    

    Create the RSS parser script

    Create a new PHP file called rssparser.php - again, located outside of the web root

    <?php
    //use Abraham\TwitterOAuth\TwitterOAuth;
    @$url = $argv[1];
    @$max = $argv[2];
    if (!$url) {
        die("\n ***** You must provide a URL to process *****\n");
    }
    if (!$max) {
        die("\n ***** You must provide a quantity to process *****\n");
    }
    include "details.php";
    require 'vendor/autoload.php';
    $feed = new SimplePie();
    $feed->enable_cache();
    $feed->set_cache_location("/home/phenomlab/system/.cache");
    $feed->force_feed();
    $feed->set_timeout(30);
    $feed->set_feed_url("$url");
    $feed->init();
    $feed->handle_content_type();
    $feed->enable_order_by_date(true);
    $number = $feed->get_item_quantity($max);
    foreach ($feed->get_items(0, $number) as $items) {
        echo "\033[32m\nProcessing story | " . $items->get_title() . "\n\033[0m";
        // Clean up the description step by step, feeding each result into the next call
        $description = str_replace("View Entire Post &rsaquo;", "", $items->get_description());
        $description = str_replace("<img", "\n\n<img", $description);
        $description = str_replace('<img src="', '', $description);
        $description = str_replace('" />', '', $description);
        $description = strip_tags(html_entity_decode($description), "<img>") . "\n";
        $description .= "\n" . '[Link to original article](' . $items->get_link() . ')' . "\n\n";
        //echo 'Description: ' . $description . "\n";
        $content = $items->get_content(true);
        //echo '[Link to original article](' .$item->get_link() . ')'."\n";
        // Define variables for use later on in the script
        $subject = $items->get_title();
        $body = trim($description);
        $link = $items->get_link();
        // Query the database for each item. Perform action based on results
        $stmt = $conn->prepare('SELECT url, seen FROM queue WHERE url = ?');
        $stmt->bind_param('s', $link);
        $stmt->execute();
        $stmt->store_result();
        $stmt->bind_result($checklink, $seen);
        $stmt->fetch();
        // Test to see if we have processed these before. If we have, skip them to avoid duplicates
        if (!$checklink || !$seen) {
            echo "Checking " . $link . " \nLine item does not exist - \033[32m[Processing]\n\033[0m ";
            // Processing new items. Insert record into database to prevent duplication on subsequent processing runs
            $seen = 1;
            $stmt = $conn->prepare('INSERT INTO queue (url, title, seen) VALUES(?, ?, ?)');
            $stmt->bind_param("ssi", $link, $subject, $seen);
            $stmt->execute();
            // Process each newly identified unique post into Flarum using the API
            $ch = curl_init();
            curl_setopt($ch, CURLOPT_URL, 'https://hub.phenomlab.net/api/discussions');
            curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
            curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
            curl_setopt($ch, CURLOPT_POST, 22);
            curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode((array(
                'data' => array(
                    'type' => "discussions",
                    'attributes' => array(
                        'title' => "$subject",
                        'content' => "$body",
                    ),
                    'relationships' => array(
                        'tags' => array(
                            'data' => array(
                                array(
                                    'type' => 'tags',
                                    'id' => "23",
                                ),
                            ),
                        ),
                    ),
                ),
            ))));
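            // TLS certificate verification is left enabled (cURL's default)
            $result = curl_exec($ch);
            if ($result === false) {
                // Report transport-level failures instead of printing an empty response
                echo "cURL error: " . curl_error($ch) . "\n";
            } else {
                echo $result . "\n";
            }
            curl_close($ch);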
            //$connection = new TwitterOAuth(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET);
            //$status = $subject . ' ' . $link . ' #infosec #security #technology #phenomlab';
            //$post_tweets = $connection->post("statuses/update", ["status" => $status]);
        }
        // Item has already been processed. Continue loop until count exhausted
        else {
            echo "Checking " . $checklink . "\nLine item already processed - \033[33m[Ignored]\n\033[0m";
        }
    }
    

    Important notes

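    @$max = $argv[2]; is the number of RSS items that the script will parse for each feed URL.

    curl_setopt($ch, CURLOPT_POST, 22); - cURL treats this option as a simple on/off switch, so any non-zero value (“22” here) only tells cURL to send an HTTP POST; it does not select a user. The user I want to post as has the ID 22, and this user needs admin rights. As I understand the Flarum API documentation, the acting user is chosen either by the user_id column on your api_keys row or by appending ; userId=22 to the token in the Authorization header in details.php.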

    array(
        'type' => 'tags',
        'id' => "23"
    )
    

    This array tells the Flarum API which tag to post in. In this case, “23” is the ID of the “news” tag.
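
    To make the user and tag settings easier to change, the header and request body can be built by two small functions. This is only a sketch: buildFlarumHeaders() and buildDiscussionPayload() are hypothetical helper names of my own, and the "; userId=" suffix follows the format given in Flarum's REST API documentation as I understand it, so check it against your Flarum version.

```php
<?php
// Hypothetical helpers for illustration; neither is part of Flarum or maicol07/flarum-api-client.

// Build the request headers. The optional "; userId=" suffix names the user to act as
// (assumption: format as described in Flarum's REST API documentation).
function buildFlarumHeaders(string $token, ?int $userId = null): array
{
    $auth = 'Authorization: Token ' . $token;
    if ($userId !== null) {
        $auth .= '; userId=' . $userId;
    }
    return [$auth, 'Content-Type: application/json'];
}

// Build the JSON body for POST /api/discussions, attaching one or more tag IDs.
function buildDiscussionPayload(string $title, string $content, array $tagIds): string
{
    $tags = [];
    foreach ($tagIds as $id) {
        // JSON:API resource identifiers carry the ID as a string
        $tags[] = ['type' => 'tags', 'id' => (string) $id];
    }
    return json_encode([
        'data' => [
            'type' => 'discussions',
            'attributes' => ['title' => $title, 'content' => $content],
            'relationships' => ['tags' => ['data' => $tags]],
        ],
    ]);
}

// Example: post as user 22 into tag 23, matching the values used in this guide.
$headers = buildFlarumHeaders('THE_TOKEN_YOU_GENERATED_EARLIER', 22);
$payload = buildDiscussionPayload('Example title', 'Example body', [23]);
print_r($headers);
echo $payload . "\n";
```

    The resulting $headers and $payload can be passed to CURLOPT_HTTPHEADER and CURLOPT_POSTFIELDS in place of the inline arrays.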

    Test it!

    To check that your script is working, run it from the CLI in the directory where your files are located. Note that the RSS URL will need to change to the one you’re interested in targeting, and the number afterwards is the number of articles you want to pull at once.

    php rssparser.php http://feeds.bbci.co.uk/news/rss.xml 10

    Watch for the output on the screen. The first time this is run, the script will create posts for all RSS items it has no record of. As each post is created, its URL is recorded in the “feed” database so that subsequent runs do not create duplicates.
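
    The duplicate check can be illustrated without a database. The sketch below is an in-memory stand-in only: the $seenUrls array plays the role of the “queue” table, and splitNewItems() is a hypothetical helper that does not appear in rssparser.php.

```php
<?php
// In-memory sketch of the duplicate check; $seenUrls stands in for the "queue" table.
// splitNewItems() is a hypothetical helper, not part of rssparser.php.
function splitNewItems(array $links, array &$seenUrls): array
{
    $new = [];
    foreach ($links as $link) {
        if (isset($seenUrls[$link])) {
            continue; // already processed on an earlier run
        }
        $seenUrls[$link] = true; // record it so later runs skip it
        $new[] = $link;
    }
    return $new;
}

// Two simulated runs over overlapping feed contents
$seen = [];
$firstRun = splitNewItems(['https://example.com/a', 'https://example.com/b'], $seen);
$secondRun = splitNewItems(['https://example.com/b', 'https://example.com/c'], $seen);
echo count($firstRun) . " new on first run, " . count($secondRun) . " new on second run\n";
```

    In the real script the same decision is made by the SELECT against the queue table, followed by the INSERT for anything not yet seen.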

    Now what?

    I have this rssparser.php scheduled to run every hour.

    Enjoy - let me know if you have any issues getting this to work.
