Creating posts from RSS feeds in Flarum

phenomlab · Edited 15/03/2022, 17:39

One of my projects has been successfully querying RSS feeds for a few months now, then using the Flarum API to post each new item as a discussion.

Want this? Sure you do! Below are the steps, including all the scripts you need to make it work.

Firstly, you will need the Flarum API client (maicol07/flarum-api-client):

Installation

composer require maicol07/flarum-api-client

Configuration

To start working with the API, you will need a Flarum master key:

Generate a random, hard-to-guess 40-character string - this is the token this package needs (you can use a generator for this - a good example is here)
Manually add it to the key column of Flarum's api_keys table using phpMyAdmin, Adminer or a similar tool.

The master key is required to access non-public discussions and to run actions otherwise reserved for Flarum administrators.
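If you would rather not use an online generator, PHP can produce a suitable token itself. A minimal sketch, assuming PHP 7 or later: random_bytes() draws from a cryptographically secure source, and 20 random bytes encode to exactly 40 hex characters. The file name make_token.php is only a placeholder.

```php
<?php
// make_token.php (placeholder name): print a 40-character API token.
// Run once from the CLI and paste the output into the api_keys table.

// 20 cryptographically secure random bytes; each byte becomes two hex characters.
function generate_token(int $bytes = 20): string
{
    return bin2hex(random_bytes($bytes));
}

if (PHP_SAPI === 'cli') {
    echo generate_token() . "\n";
}
```

Run it with php make_token.php and copy the printed string; treat it like a password.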

Install SimplePie

Next, install SimplePie, which the script uses to parse the RSS feeds:

composer require simplepie/simplepie

Create storage DB

Now connect to your database server using phpMyAdmin (or something similar) and create a new database called “feed”.

With the database created, run the following SQL to create a table called “queue” with a few simple columns:

 CREATE TABLE `queue` (
  `id` bigint(20) NOT NULL,
  `url` varchar(500) NOT NULL,
  `title` varchar(500) NOT NULL,
  `seen` int(1) NOT NULL DEFAULT 0
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

As the “queue” table grows, it needs indexes to keep lookups fast (the script searches it by URL for every feed item). Create them as follows in phpMyAdmin:

 ALTER TABLE `queue`
  ADD PRIMARY KEY (`id`),
  ADD KEY `title` (`title`),
  ADD KEY `url` (`url`);

Finally, set AUTO_INCREMENT on the table's id column:

 ALTER TABLE `queue`
  MODIFY `id` bigint(20) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=1;
COMMIT;

Create credentials file

For security reasons, the credentials live in a separate details.php file (you can call it whatever you like - just remember to reflect any change of name in the main script below), stored outside of the web root and pulled in with include. We are going to run the scripts from the PHP CLI anyway, so the file should never be exposed.

In my case, details.php sits at the root of my domain, outside of the web root, and is included like this:

include("/var/www/vhosts/metabullet.com/details.php");

Your details.php file should contain the following:

 <?php
 
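// Variables for posting to Twitter (only used by the optional, commented-out Twitter code in rssparser.php)
define('CONSUMER_KEY', 'YOUR_KEY'); 
define('CONSUMER_SECRET', 'YOUR_SECRET');
define('ACCESS_TOKEN', 'YOUR_ACCESS_TOKEN');
define('ACCESS_TOKEN_SECRET', 'YOUR_ACCESS_TOKEN_SECRET');
 
// Headers sent with every Flarum API request.
// "userId=22" tells Flarum which user to act as; it is ignored if the key's api_keys row already has a user_id
$header = array(
    "Authorization: Token THE_TOKEN_YOU_GENERATED_EARLIER; userId=22",
    "Content-Type: application/json",
);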
 
// Create DB connection
$servername = "localhost";
$login = "YOUR_DB_USER";
$dbpw = "YOUR_DB_PASSWORD";
$dbname = "feed";
$conn = new mysqli($servername, $login, $dbpw, $dbname);
 
// Check connection
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
} else {
    echo "Connected to database\n";
}
?>

Create the RSS parser script

Create a new PHP file called rssparser.php - again, located outside of the web root

 <?php
//use Abraham\TwitterOAuth\TwitterOAuth;
// Usage: php rssparser.php <feed URL> <number of items>
$url = $argv[1] ?? null;
$max = isset($argv[2]) ? (int) $argv[2] : 0;
if (!$url) {
    die("\n ***** You must provide a URL to process *****\n");
}
if ($max < 1) {
    die("\n ***** You must provide a quantity to process *****\n");
}
include "details.php"; // use the full path to your credentials file if it lives elsewhere
require 'vendor/autoload.php';
$feed = new SimplePie();
$feed->enable_cache();
$feed->set_cache_location("/home/phenomlab/system/.cache"); // change to a writable directory on your server
$feed->force_feed();
$feed->set_timeout(30);
$feed->set_feed_url($url);
$feed->init();
$feed->handle_content_type();
$feed->enable_order_by_date(true);
$number = $feed->get_item_quantity($max);
foreach ($feed->get_items(0, $number) as $items) {
    echo "\033[32m\nProcessing story | " . $items->get_title() . "\n\033[0m";
    // Clean up the description. Each step works on the result of the previous one
    $description = (string) $items->get_description();
    $description = str_replace("View Entire Post &rsaquo;", "", $description);
    $description = str_replace("<img", "\n\n<img", $description);
    $description = str_replace('<img src="', '', $description);
    $description = str_replace('" />', '', $description);
    $description = strip_tags(html_entity_decode($description), "<img>") . "\n";
    $description .= "\n" . '[Link to original article](' . $items->get_link() . ')' . "\n\n";
    // Define variables for use later on in the script
    $subject = $items->get_title();
    $body = trim($description);
    $link = $items->get_link();
    // Query the database for each item. Reset the bound variables first so values
    // from the previous item cannot leak into this one when no row is found
    $checklink = null;
    $seen = 0;
    $stmt = $conn->prepare('SELECT url, seen FROM queue WHERE url = ?');
    $stmt->bind_param('s', $link);
    $stmt->execute();
    $stmt->store_result();
    $stmt->bind_result($checklink, $seen);
    $stmt->fetch();
    $stmt->close();
    // Test to see if we have processed these before. If we have, skip them to avoid duplicates
    if (!$checklink || !$seen) {
        echo "Checking " . $link . " \nLine item does not exist - \033[32m[Processing]\n\033[0m ";
        // Processing new items. Insert record into database to prevent duplication on subsequent processing runs
        $seen = 1;
        $stmt = $conn->prepare('INSERT INTO queue (url, title, seen) VALUES(?, ?, ?)');
        $stmt->bind_param("ssi", $link, $subject, $seen);
        $stmt->execute();
        $stmt->close();
        // Process each newly identified unique post into Flarum using the API (change the URL to your own forum)
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, 'https://hub.phenomlab.net/api/discussions');
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
        curl_setopt($ch, CURLOPT_POST, true); // a plain on/off switch; the posting user comes from the API key
        curl_setopt($ch, CURLOPT_POSTFIELDS, json_encode(array(
            'data' => array(
                'type' => "discussions",
                'attributes' => array(
                    'title' => $subject,
                    'content' => $body,
                ),
                'relationships' => array(
                    'tags' => array(
                        'data' => array(
                            array(
                                'type' => 'tags',
                                'id' => "23",
                            ),
                        ),
                    ),
                ),
            ),
        )));
        // Skipping certificate verification is only needed for self-signed certificates;
        // remove this line if your forum has a valid certificate
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
        $result = curl_exec($ch);
        echo $result;
        curl_close($ch);
        //$connection = new TwitterOAuth(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_TOKEN_SECRET);
        //$status = $subject . ' ' . $link . ' #infosec #security #technology #phenomlab';
        //$post_tweets = $connection->post("statuses/update", ["status" => $status]);
    }
    // Item has already been processed. Continue loop until count exhausted
    else {
        echo "Checking " . $checklink . "\nLine item already processed - \033[33m[Ignored]\n\033[0m";
    }
}
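The description clean-up only works when each str_replace() is applied to the result of the previous step. A minimal sketch that pulls the same steps into a standalone function, so you can check what a post body will look like against sample HTML without fetching a feed. clean_description() is a hypothetical name, not part of SimplePie, and the sample HTML is made up.

```php
<?php
// Hypothetical helper: the same clean-up steps as rssparser.php, chained in order.
function clean_description(string $html, string $link): string
{
    $text = str_replace("View Entire Post &rsaquo;", "", $html);
    // Put each image on its own paragraph, then reduce <img src="URL" /> to the bare URL
    $text = str_replace("<img", "\n\n<img", $text);
    $text = str_replace('<img src="', '', $text);
    $text = str_replace('" />', '', $text);
    // Decode entities and drop remaining tags, keeping any <img> tags not matched above
    $text = strip_tags(html_entity_decode($text), "<img>") . "\n";
    $text .= "\n" . '[Link to original article](' . $link . ')' . "\n\n";
    return trim($text);
}

$sample = '<p>Big news &amp; more.</p><img src="https://example.com/a.jpg" /> View Entire Post &rsaquo;';
echo clean_description($sample, 'https://example.com/story') . "\n";
```

Note that an image tag with extra attributes (for example alt="...") will not match the '" />' pattern cleanly, so check a few real items from your feed before relying on it.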

Important notes

$max (the second command-line argument, $argv[2]) is the number of RSS items the script parses for each feed URL.
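A slightly stricter check of the two arguments rejects a malformed URL or a count like "0" or "ten" before any feed is fetched. A sketch only: parse_args() is a hypothetical helper, and it uses PHP's built-in filter_var() validators.

```php
<?php
// Hypothetical helper: validate "php rssparser.php <feed URL> <count>".
// Returns [url, count] or throws InvalidArgumentException with a readable message.
function parse_args(array $argv): array
{
    $url = $argv[1] ?? '';
    $max = $argv[2] ?? '';

    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        throw new InvalidArgumentException("You must provide a valid URL to process");
    }
    // min_range = 1 rejects "0", "-3", "ten", "2.5" and an empty argument
    $count = filter_var($max, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    if ($count === false) {
        throw new InvalidArgumentException("You must provide a positive quantity to process");
    }
    return [$url, $count];
}
```

In rssparser.php this could replace the two die() checks: wrap [$url, $max] = parse_args($argv); in a try block and die() with the exception message on failure.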

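The account that discussions are posted as is decided by the API key, not by cURL: CURLOPT_POST only switches the request to an HTTP POST and treats its value as a plain on/off flag. To post as the user with ID 22, either set the user_id column of your key's row in the api_keys table to 22, or make sure the Authorization header in details.php ends with ; userId=22. This user needs admin rights.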
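A minimal sketch of building the request headers with the acting user spelled out. It assumes Flarum's API-key header format, Authorization: Token <key>; userId=<id>, where the userId part is only needed when the key's api_keys row has no user_id. flarum_headers() is a hypothetical helper, not part of Flarum or the API client.

```php
<?php
// Hypothetical helper: build the HTTP headers for Flarum API requests.
// $userId is the account to post as; pass null if the api_keys row already sets user_id.
function flarum_headers(string $token, ?int $userId = null): array
{
    $auth = "Authorization: Token " . $token;
    if ($userId !== null) {
        $auth .= "; userId=" . $userId;
    }
    return [
        $auth,
        "Content-Type: application/json",
    ];
}

// Equivalent to the $header array in details.php, posting as user 22
$header = flarum_headers("THE_TOKEN_YOU_GENERATED_EARLIER", 22);
```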

 array(
    'type' => 'tags',
    'id' => "23"
)

This array tells the Flarum API which tag to post in. In this case, “23” is the ID of my “news” tag; change it to the ID of the tag you want to use.
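For reference, the request body the script sends is a JSON:API document. A sketch of the same structure as a function, with the tag ID as a parameter so you can swap in your own tag instead of “23”. build_discussion_payload() is a hypothetical name; JSON_THROW_ON_ERROR needs PHP 7.3 or later.

```php
<?php
// Hypothetical helper: build the JSON body for POST /api/discussions,
// matching the structure used in rssparser.php.
function build_discussion_payload(string $title, string $content, string $tagId): string
{
    $payload = [
        'data' => [
            'type' => 'discussions',
            'attributes' => [
                'title' => $title,
                'content' => $content,
            ],
            'relationships' => [
                'tags' => [
                    // JSON:API relationship: a list of {type, id} pairs; IDs are strings
                    'data' => [
                        ['type' => 'tags', 'id' => $tagId],
                    ],
                ],
            ],
        ],
    ];
    // Throw on invalid UTF-8 instead of silently returning false
    return json_encode($payload, JSON_THROW_ON_ERROR);
}

echo build_discussion_payload("Example headline", "Body text", "23") . "\n";
```

The throwing flag matters because a plain json_encode() returns false on a title with broken encoding, and cURL would then send an empty body.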

Test it !

To check that the script works, run it from the CLI in the directory where your files are located. Change the RSS URL to the feed you want to target; the number after it is how many articles to pull at once.

php rssparser.php http://feeds.bbci.co.uk/news/rss.xml 10

Watch the output on the screen. On the first run, the script creates a discussion for every feed item it has no record of. As each discussion is created, the item is recorded in the “feed” database, so subsequent runs do not post duplicates.
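Because the script echoes the raw API response, a success and a failure can look much alike in the scroll-back. A small sketch that summarises the response instead, assuming the usual JSON:API shapes (a data object carrying the new discussion's id on success, an errors array otherwise). describe_response() is a hypothetical helper.

```php
<?php
// Hypothetical helper: turn a Flarum API response into a one-line summary.
// $result is whatever curl_exec() returned (a string, or false on failure).
function describe_response($result): string
{
    if ($result === false || $result === '') {
        return "No response from the API";
    }
    $json = json_decode($result, true);
    if (!is_array($json)) {
        return "Unexpected (non-JSON) response";
    }
    if (isset($json['data']['id'])) {
        return "Created discussion " . $json['data']['id'];
    }
    if (!empty($json['errors'])) {
        $parts = [];
        foreach ($json['errors'] as $error) {
            // Use the error code if present, otherwise the detail text
            $parts[] = ($error['status'] ?? '?') . ' ' . ($error['code'] ?? ($error['detail'] ?? 'unknown error'));
        }
        return "API error: " . implode(', ', $parts);
    }
    return "Unrecognised response";
}
```

In rssparser.php, echo describe_response($result) . "\n"; could replace echo $result;.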

Now what ?

I have this rssparser.php scheduled to run every hour.
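If you follow several feeds, one option is a small wrapper that runs rssparser.php once per feed, so a single hourly job covers them all. A sketch under assumptions: runfeeds.php and the feed list are placeholders, and the commands are only executed when the wrapper is started with --run (otherwise it just prints them).

```php
<?php
// runfeeds.php (placeholder name): run rssparser.php for each feed in turn.
// Usage: php runfeeds.php --run   (without --run it only prints the commands)

// Build one shell-safe command line for a single feed
function build_command(string $script, string $feedUrl, int $max): string
{
    return sprintf('php %s %s %d', escapeshellarg($script), escapeshellarg($feedUrl), $max);
}

// Feed URL => number of items to pull per run
$feeds = [
    'http://feeds.bbci.co.uk/news/rss.xml' => 10,
];

$execute = in_array('--run', $argv ?? [], true);
foreach ($feeds as $feedUrl => $max) {
    $cmd = build_command(__DIR__ . '/rssparser.php', $feedUrl, $max);
    echo $cmd . "\n";
    if ($execute) {
        passthru($cmd); // streams rssparser.php's output to this console
    }
}
```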

Enjoy - let me know if you have any issues getting this to work.

sudonix
