Hello @phenomlab
My VPS (NodeBB + MongoDB + Redis + nginx + Webmin, single Hetzner box) suffered repeated multi-minute interruptions: RAM and swap both at 100%, disk I/O spiking, and CPU pinned. The cause was swap thrashing, not hardware. Kernel logs showed the OOM killer firing repeatedly, always against the ogproxy.service cgroup. The OGProxy Node process ballooned to ~5 GB RSS within minutes before being killed, dragging the whole box into thrashing. MongoDB was fine (~650 MB). This was an application-level problem in OGProxy, so no hosting ticket was warranted.
Root causes
No download limit or timeout on ogs({ url }): a single link could pull gigabytes into memory.
Unbounded, never-expiring cache: cache.put(url, results) stored the full ogs object, including the entire page HTML (results.html), forever. This was the main leak.
Dead code: the favicon/MetaParser/cheerio block was gated on if (results.data && ...), but ogs v6’s root key is result, not data, so it never ran. Removed along with its unused imports.
Client-side hammering: the ACP script re-runs previewLinks() on every ajaxify/posts/chat/composer event; failing links were never removed or cached, so the same URL was re-scraped ~50×/min.
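To get a feel for why caching the full ogs object was the main leak, it can help to measure roughly how much one cache entry retains. The sketch below is illustrative only: approxEntryBytes is a hypothetical helper, and the sample objects merely imitate the shape described above (a result object plus the page html).

```javascript
// Rough estimate of how many bytes a cache entry holds once serialized.
// approxEntryBytes is a hypothetical helper, not part of OGProxy or ogs.
function approxEntryBytes(entry) {
  return Buffer.byteLength(JSON.stringify(entry), 'utf8');
}

// Stand-in objects imitating the shape described above.
const metadataOnly = { result: { ogTitle: 'Example', ogDescription: 'A page' } };
const withHtml = {
  result: metadataOnly.result,
  html: '<html>' + 'x'.repeat(2 * 1024 * 1024) + '</html>', // ~2 MB page body
};

console.log('metadata only:', approxEntryBytes(metadataOnly), 'bytes');
console.log('with html:', approxEntryBytes(withHtml), 'bytes');
```

Serialized size is not the same as in-memory size (V8 may use one or two bytes per character), so treat the numbers as an order-of-magnitude guide.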
Fixes applied
systemd guard rails:
MemoryMax=512M / MemoryHigh=400M: if it ever leaks again, only OGProxy is killed (and auto-restarted), not the whole box.
Network egress guard (IPAddressAllow/IPAddressDeny): blocks OGProxy from reaching private ranges and cloud metadata at the kernel level, even via a redirect. Loopback 127.0.0.1 stays allowed because nginx reverse-proxies to 127.0.0.1:2000, and 127.0.0.53/54 stay allowed for the systemd-resolved DNS stub. Verified: legitimate fetches work, 169.254.169.254 is blocked.
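For anyone wanting to reason about which destinations the unit lets through, here is a small model of the evaluation order systemd documents for these settings: a match in IPAddressAllow grants access, otherwise a match in IPAddressDeny refuses it, otherwise access is granted. This is a sketch under assumptions: it covers IPv4 only, the function names are made up for illustration, and it models the rule order rather than the kernel filter itself.

```javascript
// Model of the evaluation order: allow-list first, then deny-list, else allow.
// IPv4 only; all names here are illustrative.
const ALLOW = ['127.0.0.1/32', '127.0.0.53/32', '127.0.0.54/32'];
const DENY = ['10.0.0.0/8', '172.16.0.0/12', '192.168.0.0/16', '169.254.0.0/16', '100.64.0.0/10'];

// Convert dotted-quad notation to an unsigned 32-bit integer.
function ipv4ToInt(ip) {
  return ip.split('.').reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

// True if ip falls inside the CIDR block (e.g. "10.0.0.0/8").
function inCidr(ip, cidr) {
  const [base, bits] = cidr.split('/');
  const prefix = Number(bits);
  const mask = prefix === 0 ? 0 : (0xffffffff << (32 - prefix)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

function egressDecision(ip) {
  if (ALLOW.some((cidr) => inCidr(ip, cidr))) return 'allow';
  if (DENY.some((cidr) => inCidr(ip, cidr))) return 'deny';
  return 'allow';
}

console.log(egressDecision('169.254.169.254')); // cloud metadata address
console.log(egressDecision('127.0.0.1'));       // nginx upstream on loopback
```

One thing the model makes visible: loopback is not in the deny list, so in this model 127.0.0.1 would pass even without the allow entries. They start to matter if the deny list is ever widened to cover loopback.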
Server (server.js):
timeout: 15s + downloadLimit: 5 MB, to cap memory per request.
redirect: 'follow' with follow: 3. Many sites respond with a 301/302; without following them, requests failed with a misleading “Connect Timeout Error”. The hop count is bounded to limit the SSRF surface.
Browser-like User-Agent + Accept headers.
Success cache 1 h, negative cache 10 min (kills hammering server-side too), 1000-entry cap.
Full error logging (error.result.error + HTTP status; ogs rejects with an object, not an Error).
App-level SSRF guard: static host check + DNS-resolution check (blocks hostnames resolving to private IPs, IPv4 + IPv6).
API key via process.env.OGPROXY_API_KEY with inline fallback.
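On the cache cap: 1000 entries of up to 5 MB each could, in the worst case, still add up to far more than the 512M memory limit, and the posted code empties the whole cache when the cap is hit. A possible alternative is to evict only the oldest entries. The class below is a minimal sketch of that idea, not what server.js does; BoundedTtlCache and its methods are hypothetical names, and it relies on Map keeping insertion order.

```javascript
// Minimal TTL cache with a hard entry cap. Map preserves insertion order,
// so the first key is always the oldest insert. Names are illustrative.
class BoundedTtlCache {
  constructor(maxEntries, now = Date.now) {
    if (!(maxEntries >= 1)) throw new RangeError('maxEntries must be >= 1');
    this.maxEntries = maxEntries;
    this.now = now; // injectable clock, handy for tests
    this.store = new Map();
  }

  get(key) {
    const hit = this.store.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.store.delete(key); // lazily drop expired entries
      return undefined;
    }
    return hit.value;
  }

  put(key, value, ttlMs) {
    this.store.delete(key); // re-inserting moves the key to the newest slot
    while (this.store.size >= this.maxEntries) {
      const oldestKey = this.store.keys().next().value;
      this.store.delete(oldestKey); // evict oldest insert only
    }
    this.store.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  get size() {
    return this.store.size;
  }
}
```

Expired entries are only removed when they are read or pushed out, so a periodic sweep (or a byte budget instead of an entry count) might be worth adding if memory stays tight.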
Client (NodeBB ACP):
:not(.og-processed) on every selector; og-processed class set before the AJAX call so a link is never re-scraped whether it succeeds or fails (root-cause fix for hammering); the generated card’s <a> carries it too.
ignoredHosts now uses bare hostnames, and shouldIgnoreDomain compares against the browser-resolved hostname (which works for both relative and absolute hrefs). This reliably excludes forum-internal links and the proxy’s own subdomain (no more self-scraping).
isFileUrl strips query string / fragment before testing the extension, so image.png?ssl=1 is detected as a file and not sent for preview.
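The two client checks above can also be expressed with the WHATWG URL parser, which separates the path from the query string and fragment for you. The following is a standalone sketch that runs outside the browser; BASE, FILE_EXT and shouldIgnoreHost are assumed names for illustration, and BASE stands in for the page the link appears on.

```javascript
// Hypothetical standalone versions of the two client checks, using the WHATWG
// URL parser instead of the DOM. BASE stands in for the page's own address.
const BASE = 'https://forum.example.com/topic/1';
const FILE_EXT = /\.(jpe?g|png|gif|pdf|docx?|xlsx?|pptx?|zip|rar|svg|webp)$/i;

function isFileUrl(href) {
  try {
    // pathname excludes ?query and #fragment, so "image.png?ssl=1" still matches
    return FILE_EXT.test(new URL(href, BASE).pathname);
  } catch (e) {
    return false; // unparsable href: not treated as a file
  }
}

function shouldIgnoreHost(href, ignoredHosts) {
  try {
    // Relative hrefs resolve against BASE, so they get the forum's hostname
    return ignoredHosts.includes(new URL(href, BASE).hostname);
  } catch (e) {
    return false;
  }
}

console.log(isFileUrl('https://cdn.example.net/image.png?ssl=1'));
console.log(shouldIgnoreHost('/topic/2', ['forum.example.com']));
```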
Result
~5 GB spikes → stable ~25 MB. Hammering gone (each URL appears at most once). Redirect/timeout failures dropped sharply. SSRF closed at both app and kernel level.
Client-server contract (unchanged, do not modify ACP response handling)
Server returns the native ogs object on success → data.result / data.html reach the client intact. Failures return HTTP ≥ 400 → handled by the client’s error: callback. The internal __ogproxyFail negative-cache marker never reaches the client. Preview card images (og:image) are unaffected: the isFileUrl change only affects direct-image links, not the images shown inside cards.
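The rule that the negative-cache marker never reaches the client can be captured in one small function. This is a sketch only; toWireResponse is a hypothetical helper and does not appear in server.js, which performs the same check inline in the route handler.

```javascript
// Hypothetical helper: maps a cache entry to what goes over the wire, so the
// internal __ogproxyFail marker is never serialized to the client.
function toWireResponse(entry) {
  if (entry && entry.__ogproxyFail === true) {
    return { status: 500, body: 'Error scraping Open Graph data (cached)' };
  }
  return { status: 200, body: entry }; // native ogs object, passed through untouched
}

console.log(toWireResponse({ __ogproxyFail: true }).status);
console.log(toWireResponse({ result: { ogTitle: 'Example' }, html: '<html></html>' }).status);
```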
Expected remaining failures (not bugs)
Some sites return 400/500 no matter what: Facebook, Reddit (blocks unauthenticated scraping), press sites with cookie/WAF walls, fb.watch (redirect chains > 3), and direct image links with no extension in the path (ogs correctly rejects non-HTML). These degrade gracefully: the link stays clickable, the failure is negative-cached, and the URL is never re-scraped. Decision: leave as-is; chasing each anti-bot site isn’t worth fragile workarounds or extra load.
Optional follow-ups (no urgency)
Move the API key fully to the env var (it’s already browser-exposed client-side, so not a real secret, but avoids duplicating it in source).
Run OGProxy under a dedicated non-root user (independent security gain; already heavily mitigated by the kernel network guard).
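For the first follow-up, one way to drop the inline fallback would be to fail fast at startup when the variable is missing. A minimal sketch, assuming that refusing to start is acceptable; loadApiKey is a hypothetical helper, and env is a parameter only so the function can be exercised without touching the real environment.

```javascript
// Hypothetical startup check: read the key from the environment only and
// refuse to start without it. `env` is injectable so the function is testable.
function loadApiKey(env = process.env) {
  const key = (env.OGPROXY_API_KEY || '').trim();
  if (!key) {
    throw new Error('OGPROXY_API_KEY is not set');
  }
  return key;
}

// Example with an explicit object instead of the real environment.
console.log(loadApiKey({ OGPROXY_API_KEY: 'example-key' }));
```

With systemd, the variable could be supplied through Environment= or EnvironmentFile= in the unit.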
Files changed
/home/xxxxxxxx/domains/proxy.xxxxxx.xxx/ogproxy/server.js
/etc/systemd/system/ogproxy.service
NodeBB ACP custom JS (link-preview script)
server.js
const express = require('express');
const ogs = require('open-graph-scraper');
const cors = require('cors');
const { URL } = require('url');
const cache = require('memory-cache');
const dns = require('dns').promises;
const net = require('net');
const app = express();
const port = 2000;
// API key from environment, fallback to inline value for compatibility
const apiKey = process.env.OGPROXY_API_KEY || 'YOUR_API_KEY';
// --- Limits / safeguards ---
const REQUEST_TIMEOUT = 15000; // 15s max per fetch
const MAX_CONTENT_BYTES = 5 * 1024 * 1024; // 5 MB max downloaded page
const CACHE_TTL_MS = 60 * 60 * 1000; // success cache: 1h
const FAIL_CACHE_TTL_MS = 10 * 60 * 1000; // negative cache: 10 min
const CACHE_MAX_ENTRIES = 1000; // max cached entries
const MAX_REDIRECTS = 3; // cap redirect hops
// Returns true if an IP string is private / loopback / link-local / reserved
function isBlockedIp(ip) {
if (!ip) return true;
if (net.isIPv4(ip)) {
const p = ip.split('.').map(Number);
if (p[0] === 10) return true;
if (p[0] === 127) return true;
if (p[0] === 0) return true;
if (p[0] === 169 && p[1] === 254) return true; // link-local / cloud metadata
if (p[0] === 192 && p[1] === 168) return true;
if (p[0] === 172 && p[1] >= 16 && p[1] <= 31) return true;
if (p[0] === 100 && p[1] >= 64 && p[1] <= 127) return true; // CGNAT
return false;
}
if (net.isIPv6(ip)) {
const v = ip.toLowerCase();
if (v === '::' || v === '::1') return true; // unspecified / loopback
if (v.startsWith('fc') || v.startsWith('fd')) return true; // unique local (fc00::/7)
if (/^fe[89ab]/.test(v)) return true; // link-local (fe80::/10, i.e. fe80 through febf)
if (v.startsWith('::ffff:')) return isBlockedIp(v.split(':').pop()); // IPv4-mapped
return false;
}
return true; // not a valid IP -> block by default
}
// Static hostname guard (fast reject before any DNS work)
function isBlockedHost(hostname) {
if (!hostname) return true;
const h = hostname.toLowerCase();
return (
h === 'localhost' ||
h.endsWith('.localhost') ||
h.endsWith('.internal') ||
h.endsWith('.local') ||
(net.isIP(h) && isBlockedIp(h)) // literal IP in URL
);
}
// Resolve hostname and ensure no resolved IP is private (anti-SSRF via DNS)
async function resolvesToPublicIp(hostname) {
try {
const records = await dns.lookup(hostname, { all: true });
if (!records || records.length === 0) return false;
return records.every(r => !isBlockedIp(r.address));
} catch (e) {
return false; // DNS failure -> treat as unsafe
}
}
app.use(cors({ origin: 'https://YOURDOMAIN.EXT' }));
app.get('/ogproxy', async (req, res) => {
let { url } = req.query;
const requestApiKey = req.headers['x-api-key'];
if (requestApiKey !== apiKey) {
return res.status(401).send('Unauthorized');
}
if (!url || typeof url !== 'string') {
return res.status(400).send('Missing URL parameter');
}
if (!url.startsWith('http')) {
try {
url = new URL(url, `${req.protocol}://${req.get('host')}`).href;
} catch (e) {
return res.status(400).send('Invalid URL');
}
}
// Parse + protocol check
let parsedUrl;
try {
parsedUrl = new URL(url);
} catch (e) {
console.warn(`OGProxy reject [${url}]: invalid URL`);
return res.status(400).send('Invalid URL');
}
if (!['http:', 'https:'].includes(parsedUrl.protocol)) {
console.warn(`OGProxy reject [${url}]: invalid protocol`);
return res.status(400).send('Invalid protocol');
}
// Static host guard
if (isBlockedHost(parsedUrl.hostname)) {
console.warn(`OGProxy reject [${url}]: forbidden host (static guard)`);
return res.status(403).send('Forbidden host');
}
// Cache hit (success OR negative) — checked before DNS to stay fast
const cachedResult = cache.get(url);
if (cachedResult) {
if (cachedResult.__ogproxyFail === true) {
return res.status(500).send('Error scraping Open Graph data (cached)');
}
return res.json(cachedResult);
}
// DNS-based SSRF guard: make sure the hostname doesn't resolve to a private IP
if (!(await resolvesToPublicIp(parsedUrl.hostname))) {
console.warn(`OGProxy reject [${url}]: resolves to private IP or DNS fail (SSRF guard)`);
cache.put(url, { __ogproxyFail: true }, FAIL_CACHE_TTL_MS);
return res.status(403).send('Forbidden host');
}
// ogs options: timeout + download limit + bounded redirects
const options = {
url,
timeout: REQUEST_TIMEOUT,
downloadLimit: MAX_CONTENT_BYTES,
fetchOptions: {
redirect: 'follow',
follow: MAX_REDIRECTS,
headers: {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
'Accept-Language': 'fr-FR,fr;q=0.9,en;q=0.8',
},
},
};
// Enforce cache cap before inserting a new entry
if (cache.keys().length >= CACHE_MAX_ENTRIES) {
cache.clear();
}
try {
const results = await ogs(options);
cache.put(url, results, CACHE_TTL_MS);
return res.json(results);
} catch (error) {
const reason =
(error && error.result && error.result.error) ||
(error && error.message) ||
'unknown';
const status =
(error && error.response && error.response.status) || 'n/a';
console.error(`OGProxy fail [${url}]: ${reason} (HTTP ${status})`);
cache.put(url, { __ogproxyFail: true }, FAIL_CACHE_TTL_MS);
return res.status(500).send('Error scraping Open Graph data');
}
});
app.listen(port, () => {
console.log(`OGProxy server listening on port ${port}`);
});
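In server.js the app-level guard checks the initial URL only, and redirects are covered by the kernel-level filter. If per-hop checking at the application level is ever wanted, one approach is to follow redirects by hand and re-validate each Location before fetching it. This is a sketch under assumptions, not part of the posted code: resolveRedirects, fetchImpl and isAllowedUrl are hypothetical names, and it assumes a fetch-compatible client and targets that answer HEAD requests. The standard fetch API has no hop-count option (follow comes from node-fetch), so it may also be worth checking whether the installed ogs version honors that field.

```javascript
// Follow redirects by hand so each hop can be re-validated before it is fetched.
// fetchImpl and isAllowedUrl are injected; both are assumptions of this sketch.
async function resolveRedirects(startUrl, { maxHops = 3, fetchImpl = fetch, isAllowedUrl }) {
  let current = startUrl;
  for (let hop = 0; hop <= maxHops; hop++) {
    if (!(await isAllowedUrl(current))) {
      throw new Error(`Blocked hop: ${current}`);
    }
    // redirect: 'manual' returns the 3xx response instead of following it
    const res = await fetchImpl(current, { method: 'HEAD', redirect: 'manual' });
    const location = res.headers.get('location');
    if (res.status < 300 || res.status >= 400 || !location) {
      return current; // final destination reached
    }
    current = new URL(location, current).href; // Location may be relative
  }
  throw new Error(`Too many redirects (> ${maxHops})`);
}
```

The returned URL could then be handed to ogs with redirects disabled, with isAllowedUrl built from the same static and DNS checks used for the initial URL.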
NodeBB ACP client script
// ------------------------------------------
// OGPROXY
// ------------------------------------------
/// Function to preview links
function previewLinks() {
$(document).ready(function() {
// Set this flag to true to enable debug logging
var debug = false;
// Get all the links within the content class (posts) and chat, excluding mentions plugin links AND already-processed links
var links = $(".content a:not(.plugin-mentions-a):not(.plugin-mentions-user):not(.og-processed), [component=\"chat/message/body\"] a:not(.plugin-mentions-a):not(.plugin-mentions-user):not(.og-processed), .preview-container a:not(.plugin-mentions-a):not(.plugin-mentions-user):not(.og-processed), .resolved-message a:not(.og-processed), .adhoc a:not(.og-processed)");
// List of bare hostnames to ignore (forum itself + the OGProxy subdomain, to avoid self-scraping)
var ignoredHosts = [
window.location.hostname,
"proxy.YOURDOMAIN.EXT"
];
// List of paths to ignore
var ignoredPaths = ['/post'];
if (debug) {
// Log the ignored hosts and paths
console.log("OGProxy: Hosts in the ignore list and will not be parsed: " + ignoredHosts.join(", "));
console.log("OGProxy: Paths containing " + ignoredPaths[0] + " are in the ignore list and will not be parsed.");
console.log("OGProxy: Parsing DOM for any URLs that should be converted to previews.");
}
// Iterate over each link
links.each(function() {
var link = $(this);
var url = link.attr("href");
var hostname = link.prop("hostname");
var text = $(this).text();
// Helper function to check if the URL is a file URL.
// Strip query string and fragment first so e.g. "image.png?ssl=1" is still detected.
function isFileUrl(url) {
if (!url) {
return false;
}
var cleanUrl = url.split('?')[0].split('#')[0];
var fileExtensionPattern = /\.(jpg|png|jpeg|gif|pdf|docx?|xlsx?|pptx?|zip|rar|svg|webp)$/i;
return fileExtensionPattern.test(cleanUrl);
}
function isFullPath(url) {
// Regular expression to match a full path URL
var fullPathRegex = /^(?:[a-z]+:)?\/\//i;
// Check if the URL matches the full path pattern
return fullPathRegex.test(url);
}
// Helper function to check if the domain should be ignored
// Uses the browser-resolved hostname (reliable even for relative hrefs)
function shouldIgnoreDomain(linkHostname, url, ignoredHosts) {
if (!linkHostname) {
return false;
}
// Ignore if it points to an ignored host AND hits an ignored path
if (ignoredPaths.some(function(path) { return url && url.includes(path); }) && ignoredHosts.includes(linkHostname)) {
return true;
}
// Ignore any link pointing to an ignored host (forum itself, proxy subdomain)
return ignoredHosts.includes(linkHostname);
}
// Helper function to extract the bare hostname from the URL (kept for compatibility)
function extractDomain(url) {
if (url) {
var domain = url.split('/')[2]?.split(':')[0];
return domain;
}
return null;
}
// Process the link if it's not a file URL, not in the ignored domain list, and it's the only content within its parent element
if (!isFileUrl(url) && !shouldIgnoreDomain(hostname, url, ignoredHosts) && link.parent().contents().length === 1) {
var host = window.location.protocol + "//" + hostname;
var faviconApi = "https://t0.gstatic.com/faviconV2?client=SOCIAL&type=FAVICON&fallback_opts=TYPE,SIZE,URL&url=" + host + "&size=32";
if (debug) {
console.log("OGProxy: Getting favicon for URL: " + url);
}
var website = link.prop("hostname");
var altSite = website.replace(/^www\./, "").replace(/\..+$/, "");
var proxy = "https://proxy.YOURDOMAIN.EXT";
var apiKey = "YOUR_API_KEY";
// Mark this link as processed BEFORE the request, so it is never re-scraped
// on subsequent ajaxify/posts.loaded/composer.preview events, whether the
// request succeeds or fails. This stops the request-hammering loop.
link.addClass('og-processed');
// Send an AJAX request to the proxy server to fetch OpenGraph data for the URL
$.ajax({
url: proxy + "/ogproxy?url=" + encodeURIComponent(url),
method: "GET",
headers: {
'X-Api-Key': apiKey
},
success: function(data) {
var result = data.result;
// Extract relevant data from the OpenGraph result or use fallback values
var rawTitle = $(data.html).filter('title').text();
var altTitle = $(result).filter('meta[property="og:title"]').attr('content');
var altDescription = $(result).filter('meta[property="og:description"]').attr('content');
var tempDescription = "This website did not return any description. It might be behind a login or paywall.";
var altImageUrl = $(result).filter('meta[property="og:image"]').attr('content');
//var tempImage = proxy + "/images/404_3.webp";
var tempImage = proxy + "/images/404.png";
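// The "var url" declared here is hoisted and shadows the outer url, so fall back to the link's href directly
var url = result.requestUrl || link.attr("href");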
var title = rawTitle || result.ogTitle || altTitle;
var description = result.ogDescription || altDescription || tempDescription;
var favicon = faviconApi || result.favicon || data.faviconUrl;
// Guard against an empty ogImage array before reading its first entry
var imageUrl = (result.ogImage && result.ogImage[0] && result.ogImage[0].url) || altImageUrl || tempImage;
// Some websites return a relative path for the image URL, which isn't much use, so we need to change this to full
var fullImagePath = host + imageUrl;
var site = result.ogSiteName || altSite;
if (isFullPath(imageUrl) === false) {
imageUrl = fullImagePath;
}
// Test to see if image is broken in the preview card. This might be the result of hotlinking protection, so the image isn't
// rendered as a result. If this is the case, we replace it with the tempImage to keep things looking nice.
$(document).ready(function() {
$('#card-image img').on('error', function() {
// Image failed to load
// Add logic here to handle the broken image
if (debug) {
console.log("OGProxy: Broken image URL: " + imageUrl + " detected. Replacing with " + tempImage);
}
$(this).attr('src', tempImage); // Replace with a placeholder image
});
});
if (debug) {
console.log("OGProxy: Getting data from URL: " + url);
console.log("OGProxy: Getting image URL: " + imageUrl);
}
// Create the HTML for the link preview card
var cardHtml = '<div class="card card-wrapper og-processed">' +
'<a href="' + url + '" class="og-processed">' +
'<div class="card card-preview">' +
'<div class="card-image-container">' +
'<div id="card-image"><img src="' + imageUrl + '"></div>' +
'</div>' +
'<div class="card-body">' +
'<h4 id="sitetitle" class="card-site-title"><img id="favicon" class="card-favicon" src="' + favicon + '">' + site + '</h4>' +
'<h6 class="card-title">' + title + '</h6>' +
'<p class="card-text">' + truncateDescription(description, 150) + '</p>' +
'</div>' +
'</div>' +
'</a>' +
'</div>';
// Replace the original link with the link preview card
link.replaceWith(cardHtml);
},
error: function() {
if (debug) {
console.log("OGProxy: Error fetching OpenGraph data for URL: " + url);
}
// Link stays in the DOM but is already marked .og-processed,
// so it will not be retried on subsequent events.
}
});
}
});
});
}
// Helper function to truncate the description with ellipsis if it exceeds the specified limit
function truncateDescription(description, limit) {
if (description.length > limit) {
return description.substring(0, limit) + '...';
}
return description;
}
$(window).on('action:ajaxify.end', function(data) {
$(document).ready(function() {
previewLinks()
});
});
$(window).on('action:posts.loaded', function(data) {
$(document).ready(function() {
previewLinks()
});
});
$(window).on('action:posts.edited', function(data) {
$(document).ready(function() {
previewLinks()
});
});
/* TEST BUG */
/*
$(window).on('action:chat.loaded', function(data) {
$(document).ready(function() {
previewLinks()
});
});
*/
$(window).on('action:chat.received', function(data) {
$(document).ready(function() {
previewLinks()
});
});
$(window).on('action:composer.preview', function(data) {
$(document).ready(function() {
previewLinks()
});
});
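One hardening idea for the card markup: the title, description and site name come from third-party pages and are concatenated straight into HTML, so escaping them first may be worthwhile. The helper below is a sketch and not part of the original script; escapeHtml is a hypothetical name, and where it would be applied (for example around title and description when building cardHtml) is left as a suggestion.

```javascript
// Escape the five characters that are significant in HTML text and attributes.
// escapeHtml is a hypothetical helper, not part of the original script.
function escapeHtml(value) {
  return String(value == null ? '' : value)
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// A scraped title containing markup is rendered as text, not as an element.
const scrapedTitle = '<img src=x onerror=alert(1)>';
console.log('<h6 class="card-title">' + escapeHtml(scrapedTitle) + '</h6>');
```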
systemd unit (/etc/systemd/system/ogproxy.service)
[Unit]
Description=OGProxy Server
After=network.target
[Service]
ExecStart=/usr/bin/node /home/XXXXXXXXXXXXXXXX/domains/proxy.XXXXXXXXX.XXX/ogproxy/server.js
WorkingDirectory=/home/XXXXXXXXXXXXXXXX/domains/proxy.XXXXXXXXX.XXX/ogproxy
Restart=always
RestartSec=3
RuntimeMaxSec=86400
# --- Memory safeguards ---
MemoryMax=512M
MemoryHigh=400M
# --- Network egress guard (anti-SSRF at kernel level) ---
# Allow loopback (nginx reverse-proxies here on 127.0.0.1:2000) + DNS stub resolver.
# Block all private ranges and cloud metadata so a redirect can't reach them.
# The app-level isBlockedHost() guard still rejects 127.0.0.1 on the initial URL.
IPAddressAllow=127.0.0.1 127.0.0.53 127.0.0.54
IPAddressDeny=10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 169.254.0.0/16 100.64.0.0/10 fc00::/7 fe80::/10
[Install]
WantedBy=multi-user.target