OGProxy - a replacement for iFramely
-
@DownPW sorry for the delay here - real life is getting in the way, but the latest release will be pushed out soon.
-
no problem.
I was waiting for this new version before changing my server, but I think I’m going to install the old one. Keep up the good work, bro.

-
@DownPW sorry for the delay here - real life is getting in the way, but the latest release will be pushed out soon.
Is the new version still in development @phenomlab ?
I ask because personally, I think OG Proxy is really cool, but it uses a lot of RAM on the server, sometimes up to 2GB.
That’s quite a lot. Optimizing it in this area would be really great.
-
I have a Claude AI Pro account, and I asked it to optimize the code to address the memory consumption problem.
– Here is its answer:
Root causes of the 2 GB RAM usage
1. Unbounded cache (memory-cache)
The original code stores every visited URL forever, with no cap and no reliable automatic expiration. On an active forum with thousands of unique links, this grows indefinitely.
Fix → lru-cache: maximum 500 entries (configurable via CACHE_MAX), automatic TTL expiration at 1 hour. Least-recently-used entries are evicted automatically.
2. Redundant fetches per request
For every uncached URL, the original performs: OGS scrape + axios (full page) + cheerio + meta-parser + axios (favicon). If multiple users click the same link simultaneously, all of these are duplicated.
Fix → In-flight request deduplication (inFlight Map): if 5 requests arrive for youtube.com/xxx at the same time, only one fetch is fired; the other 4 await the same promise.
3. Unbounded favicon buffering
The original downloads any favicon as an arraybuffer without checking its size first. A favicon URL pointing to a large image can load several MB into memory.
Fix → HEAD request first to check Content-Length, then skip if > 50 KB (FAVICON_MAX). The HTML response body is also explicitly freed (htmlRes.data = null) once parsed.
4. Axios with no timeout or size limit
Without a timeout, slow requests remain suspended in memory indefinitely. Without maxContentLength, a 50 MB page is fully loaded into RAM.
Fix → REQ_TIMEOUT=8000ms and MAX_BODY=2MB per outbound request.
5. No compression
Without gzip, large JSON responses consume more bandwidth and remain in transit longer.
Fix → compression middleware.
Near-certain gains:
- The bounded LRU cache will prevent unbounded memory growth. This is most likely the main cause of the 2 GB if your forum has a large number of unique links.
- Timeouts will prevent stalled requests from sitting in memory indefinitely.
- Page and favicon size limits will eliminate pathological edge cases.
Traffic-dependent gains:
- In-flight request deduplication only helps if multiple users click the same link simultaneously — very useful on a busy forum, negligible on a small one.
What may not change:
- If the 2 GB comes from something else (a leak inside open-graph-scraper itself, or Node.js not returning memory to the OS after objects are freed), this won’t fix everything. Node.js tends to hold onto allocated memory even after objects are released; this is normal V8 behaviour.
In that case, a simple complementary solution is to add an automatic nightly restart to the systemd service:
[Service]
Restart=always
# restart after 24h max (systemd does not accept a comment on the same line as a value)
RuntimeMaxSec=86400
This doesn’t fix the root cause but prevents accumulation over several days.
– The JS code (server.js):
"use strict";

/**
 * OGProxy - Optimized server
 * Improvements over original:
 * - LRU cache (bounded) instead of unbounded memory-cache
 * - Request deduplication (coalescing) to avoid redundant fetches
 * - Compression middleware
 * - Rate limiting
 * - Helmet security headers
 * - Axios timeout + max response size
 * - Favicon size cap + skip if too large
 * - Graceful shutdown
 * - Winston structured logging
 * - Joi input validation
 * - .env support
 * - Health check endpoint
 */

require("dotenv").config();

const express = require("express");
const cors = require("cors");
const helmet = require("helmet");
const compression = require("compression");
const rateLimit = require("express-rate-limit");
const morgan = require("morgan");
const winston = require("winston");
const { LRUCache } = require("lru-cache");
const Joi = require("joi");
const axios = require("axios");
const ogs = require("open-graph-scraper");
const cheerio = require("cheerio");
const metaParser = require("meta-parser");
const { URL } = require("url");
const path = require("path");

// ─── Config ────────────────────────────────────────────────────────────────
const PORT = parseInt(process.env.PORT || "2000", 10);
const API_KEY = process.env.API_KEY || "YOUR_API_KEY_HERE";
const ORIGIN = process.env.ORIGIN || "https://your-forum.example.com";
const CACHE_MAX = parseInt(process.env.CACHE_MAX || "500", 10); // max entries
const CACHE_TTL = parseInt(process.env.CACHE_TTL || "3600", 10) * 1000; // ms (default 1h)
const REQ_TIMEOUT = parseInt(process.env.REQ_TIMEOUT || "8000", 10); // ms per outbound request
const MAX_BODY = parseInt(process.env.MAX_BODY || "2", 10) * 1024 * 1024; // MB → bytes
const FAVICON_MAX = parseInt(process.env.FAVICON_MAX || "51200", 10); // bytes (50 KB)
const RATE_WINDOW = parseInt(process.env.RATE_WINDOW || "60", 10) * 1000; // ms
const RATE_LIMIT = parseInt(process.env.RATE_LIMIT || "30", 10); // req per window

// ─── Logger ────────────────────────────────────────────────────────────────
const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || "info",
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  transports: [
    new winston.transports.Console({ format: winston.format.simple() }),
    new winston.transports.File({ filename: "error.log", level: "error" }),
    new winston.transports.File({ filename: "access.log" }),
  ],
});

// ─── LRU Cache (bounded) ───────────────────────────────────────────────────
const cache = new LRUCache({
  max: CACHE_MAX, // max number of items
  ttl: CACHE_TTL, // auto-expire entries
  updateAgeOnGet: false,
});

// ─── In-flight request deduplication ──────────────────────────────────────
// If two requests arrive for the same URL before the first completes,
// the second one waits for the first promise instead of spawning a new fetch.
const inFlight = new Map(); // url → Promise

// ─── Axios instance (shared, keep-alive, bounded) ─────────────────────────
const http = require("http");
const https = require("https");

const axiosInstance = axios.create({
  timeout: REQ_TIMEOUT,
  maxContentLength: MAX_BODY,
  maxBodyLength: MAX_BODY,
  headers: {
    "User-Agent": "Mozilla/5.0 (compatible; OGProxy/2.0; +https://github.com/phenomlab/ogproxy)",
    "Accept-Language": "en-US,en;q=0.9",
  },
  httpAgent: new http.Agent({ keepAlive: true, maxSockets: 10 }),
  httpsAgent: new https.Agent({ keepAlive: true, maxSockets: 10 }),
});

// ─── Input validation ─────────────────────────────────────────────────────
const urlSchema = Joi.object({
  url: Joi.string().uri({ scheme: ["http", "https"] }).required(),
});

// ─── App setup ────────────────────────────────────────────────────────────
const app = express();

// Security headers
app.use(helmet({ contentSecurityPolicy: false }));

// CORS
app.use(cors({ origin: ORIGIN }));

// Gzip compression
app.use(compression());

// HTTP request logging (stream to winston)
app.use(morgan("combined", {
  stream: { write: (msg) => logger.info(msg.trim()) },
}));

// Rate limiting per IP
app.use(
  "/ogproxy",
  rateLimit({
    windowMs: RATE_WINDOW,
    max: RATE_LIMIT,
    standardHeaders: true,
    legacyHeaders: false,
    message: { error: "Too many requests, please try again later." },
  })
);

// Static images directory
app.use("/images", express.static(path.join(__dirname, "images")));

// ─── Health check ─────────────────────────────────────────────────────────
app.get("/health", (_req, res) => {
  res.json({
    status: "ok",
    uptime: process.uptime(),
    cacheSize: cache.size,
    inFlight: inFlight.size,
    memory: process.memoryUsage(),
  });
});

// ─── API key middleware ────────────────────────────────────────────────────
function requireApiKey(req, res, next) {
  const key = req.headers["x-api-key"];
  if (!key || key !== API_KEY) {
    return res.status(401).json({ error: "Unauthorized" });
  }
  next();
}

// ─── Core fetch logic ─────────────────────────────────────────────────────
async function fetchOGData(targetUrl) {
  // 1. OGS scrape
  const { result: ogsResult, error: ogsError } = await ogs({ url: targetUrl });
  if (ogsError) throw new Error(`OGS failed: ${ogsError}`);

  // 2. Fetch HTML (with size cap)
  let metadata = {};
  let faviconB64 = null;

  try {
    const htmlRes = await axiosInstance.get(targetUrl, {
      responseType: "text",
      decompress: true,
    });
    const $ = cheerio.load(htmlRes.data);

    // meta-parser on the raw HTML
    try {
      const parsed = metaParser(htmlRes.data);
      metadata = parsed || {};
    } catch (_) { /* non-fatal */ }

    // 3. Favicon – only fetch if small enough
    let faviconHref =
      $('link[rel="icon"]').attr("href") ||
      $('link[rel="shortcut icon"]').attr("href");

    if (faviconHref) {
      try {
        const base = new URL(targetUrl);
        const faviconUrl = new URL(faviconHref, base).href;

        // HEAD first to check Content-Length before downloading
        const headRes = await axiosInstance.head(faviconUrl).catch(() => null);
        const contentLength = headRes
          ? parseInt(headRes.headers["content-length"] || "0", 10)
          : 0;

        if (contentLength === 0 || contentLength <= FAVICON_MAX) {
          const iconRes = await axiosInstance.get(faviconUrl, {
            responseType: "arraybuffer",
            maxContentLength: FAVICON_MAX,
          });
          const mime = iconRes.headers["content-type"] || "image/x-icon";
          faviconB64 = `data:${mime};base64,` + Buffer.from(iconRes.data).toString("base64");
        } else {
          logger.info(`Favicon too large (${contentLength}B), skipping: ${faviconUrl}`);
        }
      } catch (err) {
        logger.warn("Favicon fetch failed", { url: targetUrl, err: err.message });
      }
    }

    // Free the HTML string early
    htmlRes.data = null;
  } catch (err) {
    logger.warn("HTML fetch failed (using OGS only)", {
      url: targetUrl,
      err: err.message,
    });
  }

  return {
    ...ogsResult,
    metaProperties: metadata,
    faviconUrl: faviconB64,
  };
}

// ─── /ogproxy route ───────────────────────────────────────────────────────
app.get("/ogproxy", requireApiKey, async (req, res, next) => {
  try {
    // Validate input
    let { url: targetUrl } = req.query;
    const { error } = urlSchema.validate({ url: targetUrl });
    if (error) {
      return res
        .status(400)
        .json({ error: `Invalid URL: ${error.details[0].message}` });
    }

    // Normalise (strip trailing slash etc.)
    targetUrl = new URL(targetUrl).href;

    // Cache hit?
    const cached = cache.get(targetUrl);
    if (cached) {
      res.setHeader("X-Cache", "HIT");
      return res.json(cached);
    }
    res.setHeader("X-Cache", "MISS");

    // Deduplicate concurrent requests for the same URL
    if (!inFlight.has(targetUrl)) {
      const promise = fetchOGData(targetUrl)
        .then((data) => {
          cache.set(targetUrl, data);
          return data;
        })
        .finally(() => inFlight.delete(targetUrl));
      inFlight.set(targetUrl, promise);
    }

    const data = await inFlight.get(targetUrl);
    return res.json(data);
  } catch (err) {
    next(err);
  }
});

// ─── Centralised error handler ────────────────────────────────────────────
// eslint-disable-next-line no-unused-vars
app.use((err, _req, res, _next) => {
  logger.error("Unhandled error", { message: err.message, stack: err.stack });
  res.status(500).json({ error: "Internal server error" });
});

// ─── Start ────────────────────────────────────────────────────────────────
const server = app.listen(PORT, () => {
  logger.info(`OGProxy listening on port ${PORT}`);
  logger.info(`Cache: max=${CACHE_MAX} entries, TTL=${CACHE_TTL / 1000}s`);
});

// ─── Graceful shutdown ────────────────────────────────────────────────────
function shutdown(signal) {
  logger.info(`${signal} received – shutting down gracefully`);
  server.close(() => {
    logger.info("HTTP server closed");
    process.exit(0);
  });

  // Force exit after 10 s if still busy
  setTimeout(() => {
    logger.warn("Forcing exit after timeout");
    process.exit(1);
  }, 10_000).unref();
}

process.on("SIGTERM", () => shutdown("SIGTERM"));
process.on("SIGINT", () => shutdown("SIGINT"));

process.on("uncaughtException", (err) => {
  logger.error("Uncaught exception", { err });
  shutdown("uncaughtException");
});

process.on("unhandledRejection", (reason) => {
  logger.error("Unhandled rejection", { reason });
});
–> What do you think of this code?
-
@DownPW yes, I’ve noticed similar and it is to do with the cache because it is stored in memory and not on disk…
However, when I wrote the code, I did allow for a maximum cache age so I’m puzzled as to why this isn’t being honoured.
A review of the code is long overdue to be honest, but it’s slipped down the priority list. The fastest way to resolve this in the short term is to perform a daily restart of the service which will flush the cache.
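To make the "maximum age plus a cap" idea concrete, here is a minimal, dependency-free sketch. It is not the OGProxy code and not the memory-cache or lru-cache API: the class name `BoundedTtlCache` and its options are hypothetical, and it evicts the oldest inserted entry (FIFO) rather than the least recently used one.

```javascript
// Hypothetical helper, not part of OGProxy: a Map-based cache with a
// per-entry maximum age and a hard cap on the number of entries.
class BoundedTtlCache {
  constructor({ maxEntries, ttlMs, now = Date.now }) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for testing
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it on read
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.delete(key); // re-inserting moves the key to the newest position
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    // A Map iterates in insertion order, so the first key is the oldest one.
    while (this.entries.size > this.maxEntries) {
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
  }

  get size() {
    return this.entries.size;
  }
}
```

Note that expired entries are only removed when they are read or pushed out by the cap, so the cap, not the TTL, is what bounds memory in this sketch.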
-
Don’t hesitate to tell me what you think of this modified code.
Until next time.
-
@DownPW Looks ok for the most part, but this looks like it’s truncated?
return { ...ogsResult, metaProperties: metadata, faviconUrl: faviconB64, }; }
-
@DownPW Looks ok for the most part, but this looks like it’s truncated?
return { ...ogsResult, metaProperties: metadata, faviconUrl: faviconB64, }; }
Exactly, I need to look at the code. I’ll get back to you soon.
-
I was experiencing 500 (Internal Server Error) responses from the proxy, visible in the browser console:
GET https://proxy.xxx-xxx.net/ogproxy?url=https%3A%2F%2Fzupimages.net%2Fup%2F26%2F16%2Fld8h.jpg 500 (Internal Server Error)
After investigation, I found two root causes:
1. Direct image URLs being sent to the proxy
The custom JavaScript responsible for detecting links and sending them to the proxy was using the following regex to exclude direct image links:
var fileExtensionPattern = /\.(png|jpeg|gif|pdf|docx?|xlsx?|pptx?|zip|rar|svg)$/i;
Note that .jpg and .webp were missing from the pattern. As a result, links ending in .jpg were not recognized as direct image URLs and were forwarded to the OGProxy, which then tried to scrape them as web pages using open-graph-scraper, causing a 500 error.
The fix was to add the missing extensions:
var fileExtensionPattern = /\.(jpg|png|jpeg|gif|pdf|docx?|xlsx?|pptx?|zip|rar|svg|webp)$/i;
2. The proxy not following HTTP redirects
Some image hosting services (e.g. zupimages.net) return a 301 redirect from the bare domain to www. When curl is told to follow the redirect (-L), the image loads fine:
curl -IL https://zupimages.net/up/26/16/ld8h.jpg
HTTP/2 301 → https://www.zupimages.net/up/26/16/ld8h.jpg
HTTP/2 200
However, the proxy’s axios.get() call does not handle this gracefully when open-graph-scraper is involved, resulting in a 500 error being returned to the client.
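To illustrate point 1, here is a small check of the two patterns against the failing link. The two regexes are copied from above; the helper name `isDirectFile` is made up for this example and is not part of the actual client script.

```javascript
// Patterns copied from the post: the original filter and the fixed one.
const originalPattern = /\.(png|jpeg|gif|pdf|docx?|xlsx?|pptx?|zip|rar|svg)$/i;
const fixedPattern = /\.(jpg|png|jpeg|gif|pdf|docx?|xlsx?|pptx?|zip|rar|svg|webp)$/i;

// Hypothetical helper: true when the link should be skipped
// (treated as a direct file and not sent to the proxy).
function isDirectFile(url, pattern) {
  return pattern.test(url);
}

const link = "https://zupimages.net/up/26/16/ld8h.jpg";
console.log(isDirectFile(link, originalPattern)); // false: .jpg slips through to the proxy
console.log(isDirectFile(link, fixedPattern));    // true: now filtered out client-side
```

Because the pattern is anchored with `$`, a link with a query string after the extension (for example `ld8h.jpg?x=1`) would still not match either version.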
My questions are:
- Is there a known best practice for handling redirect chains in open-graph-scraper?
- Would passing maxRedirects or followRedirect options explicitly to axios or ogs fix this reliably?
- Is there a cleaner way to pre-filter direct image/file URLs before they reach the proxy, ideally at the NodeBB plugin level rather than in custom JS?
Thanks in advance.
-
Hello @phenomlab
My VPS (NodeBB + MongoDB + Redis + nginx + Webmin, single Hetzner box) had repeated multi-minute interruptions: RAM and swap both at 100%, disk I/O spiking, CPU pinned. The cause was swap thrashing, not hardware. Kernel logs showed the OOM killer firing repeatedly, always against the ogproxy.service cgroup. The OGProxy Node process ballooned to ~5 GB RSS within minutes before being killed, dragging the whole box into thrashing. MongoDB was fine (~650 MB). This was an application-level problem in OGProxy, so no hosting ticket was warranted.
Root causes
- No download limit or timeout on ogs({ url }): a single link could pull gigabytes into memory.
- Unbounded, never-expiring cache: cache.put(url, results) stored the full ogs object, including the entire page HTML (results.html), forever. This was the main leak.
- Dead code: the favicon/MetaParser/cheerio block was gated on if (results.data && ...), but ogs v6’s root key is result, not data, so it never ran. Removed along with its unused imports.
- Client-side hammering: the ACP script re-runs previewLinks() on every ajaxify/posts/chat/composer event; failing links were never removed or cached, so the same URL was re-scraped ~50×/min.
Fixes applied
systemd guard rails:
- MemoryMax=512M / MemoryHigh=400M: if it ever leaks again, only OGProxy is killed (and auto-restarts), not the whole box.
- Network egress guard (IPAddressAllow / IPAddressDeny): blocks OGProxy from reaching private ranges and cloud metadata at the kernel level, even via a redirect. Loopback 127.0.0.1 stays allowed because nginx reverse-proxies to 127.0.0.1:2000, and 127.0.0.53/54 stays allowed for the systemd-resolved DNS stub. Verified: legitimate fetches work, 169.254.169.254 is blocked.
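As a rough illustration of these guard rails, a unit fragment could look like the one below. The memory values and the allowed loopback addresses come from the description above. The exact deny list is an assumption, since the real unit file is not shown here.

```ini
# /etc/systemd/system/ogproxy.service (fragment, illustrative only)
[Service]
Restart=always
# Throttle above 400M, hard-kill above 512M
MemoryHigh=400M
MemoryMax=512M
# An address matching IPAddressAllow is permitted even if it also matches IPAddressDeny
IPAddressAllow=127.0.0.1 127.0.0.53 127.0.0.54
# Assumed deny list: loopback, private, link-local (cloud metadata) and IPv6 local ranges
IPAddressDeny=127.0.0.0/8 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 169.254.0.0/16
IPAddressDeny=::1/128 fc00::/7 fe80::/10
```

After editing the unit, `systemctl daemon-reload` followed by a restart of the service applies the change.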
Server (server.js):
- timeout: 15s + downloadLimit: 5 MB, which caps memory per request.
- redirect: 'follow' with follow: 3. Many sites answer with 301/302; without following them they failed with a misleading “Connect Timeout Error”. Hop count is bounded to limit SSRF surface.
- Browser-like User-Agent + Accept headers.
- Success cache 1 h, negative cache 10 min (kills hammering server-side too), 1000-entry cap.
- Full error logging (error.result.error + HTTP status; ogs rejects with an object, not an Error).
- App-level SSRF guard: static host check + DNS-resolution check (blocks hostnames resolving to private IPs, IPv4 + IPv6).
- API key via process.env.OGPROXY_API_KEY with inline fallback.
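The bounded-redirect idea can also be sketched as a stand-alone pre-check that follows redirects by hand and re-validates the host at every hop. This is not what server.js does (it passes the options listed above to ogs). `resolveRedirects`, `isAllowedHost` and `fetchImpl` are hypothetical names, and the sketch assumes a fetch implementation that exposes the 3xx status and Location header in manual mode, which Node's built-in fetch does as far as I know.

```javascript
// Hypothetical helper: follow redirects manually so every hop can be checked
// and the total number of hops stays bounded.
async function resolveRedirects(startUrl, { maxRedirects = 3, isAllowedHost, fetchImpl = fetch }) {
  let current = new URL(startUrl);
  for (let hop = 0; hop <= maxRedirects; hop++) {
    // Re-check the host on every hop, not only on the first URL.
    if (!isAllowedHost(current.hostname)) {
      throw new Error(`Blocked host: ${current.hostname}`);
    }
    // redirect: "manual" asks fetch not to follow redirects on its own.
    const res = await fetchImpl(current.href, { method: "HEAD", redirect: "manual" });
    const location = res.headers.get("location");
    const isRedirect = res.status >= 300 && res.status < 400 && Boolean(location);
    if (!isRedirect) {
      return current.href; // final destination reached
    }
    current = new URL(location, current); // Location may be relative
  }
  throw new Error(`Too many redirects (> ${maxRedirects})`);
}

// Example usage (performs a real network request, so it is left commented out):
// resolveRedirects("https://zupimages.net/up/26/16/ld8h.jpg", {
//   isAllowedHost: (host) => host !== "localhost",
// }).then(console.log);
```

The returned URL could then be handed to the scraper with redirects disabled, so the scraper never visits a hop that was not checked.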
Client (NodeBB ACP):
- :not(.og-processed) on every selector; the og-processed class is set before the AJAX call so a link is never re-scraped whether it succeeds or fails (root-cause fix for hammering); the generated card’s <a> carries it too.
- ignoredHosts now uses bare hostnames and shouldIgnoreDomain compares against the browser-resolved hostname (reliable for relative and absolute hrefs). This reliably excludes forum-internal links and the proxy’s own subdomain (no more self-scraping).
- isFileUrl strips the query string / fragment before testing the extension, so image.png?ssl=1 is detected as a file and not sent for preview.
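A minimal sketch of the last two client-side checks follows. Only the names isFileUrl, shouldIgnoreDomain and ignoredHosts come from the description above; the function bodies, the extension list, the example hostnames and the `BASE` constant are assumptions. In the browser the hostname would come from the anchor element itself; here `new URL(href, BASE)` stands in for that so the snippet also runs under Node.

```javascript
// Assumed extension list and hosts, for illustration only.
const FILE_EXTENSIONS = /\.(jpe?g|png|gif|webp|svg|pdf|docx?|xlsx?|pptx?|zip|rar)$/i;
const ignoredHosts = ["forum.example.com", "proxy.example.com"];
const BASE = "https://forum.example.com/"; // stands in for the page's own origin

function isFileUrl(href) {
  // URL.pathname excludes both the query string and the fragment,
  // so "image.png?ssl=1" is tested as "/image.png".
  const { pathname } = new URL(href, BASE);
  return FILE_EXTENSIONS.test(pathname);
}

function shouldIgnoreDomain(href) {
  // Resolving against the base makes relative hrefs such as "/topic/1"
  // compare equal to the forum's own hostname.
  const { hostname } = new URL(href, BASE);
  return ignoredHosts.includes(hostname);
}

console.log(isFileUrl("https://example.org/image.png?ssl=1")); // true
console.log(shouldIgnoreDomain("/topic/1"));                    // true
```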
Result
~5 GB spikes → stable ~25 MB. Hammering gone (each URL appears at most once). Redirect/timeout failures dropped sharply. SSRF closed at both app and kernel level.
Client ↔ server contract (unchanged, do not modify ACP response handling)
Server returns the native ogs object on success → data.result / data.html reach the client intact. Failures return HTTP ≥ 400 → handled by the client’s error: callback. The internal __ogproxyFail negative-cache marker never reaches the client. Preview card images (og:image) are unaffected; the isFileUrl change only affects direct-image links, not the images shown inside cards.
Expected remaining failures (not bugs)
Some sites return 400/500 no matter what: Facebook, Reddit (blocks unauthenticated scraping), press sites with cookie/WAF walls, fb.watch (redirect chains > 3), and direct image links with no extension in the path (ogs correctly rejects non-HTML). These degrade gracefully: the link stays clickable, is negative-cached, and is never re-scraped. Decision: leave as-is; chasing each anti-bot site isn’t worth fragile workarounds or extra load.
Optional follow-ups (no urgency)
- Move the API key fully to the env var (it’s already browser-exposed client-side, so not a real secret, but avoids duplicating it in source).
- Run OGProxy under a dedicated non-root user (independent security gain; already heavily mitigated by the kernel network guard).
Files changed
- /home/xxxxxxxx/domains/proxy.xxxxxx.xxx/ogproxy/server.js
- /etc/systemd/system/ogproxy.service
- NodeBB ACP custom JS (link-preview script)
server.js

const express = require('express');
const ogs = require('open-graph-scraper');
const cors = require('cors');
const { URL } = require('url');
const cache = require('memory-cache');
const dns = require('dns').promises;
const net = require('net');

const app = express();
const port = 2000;

// API key from environment, fallback to inline value for compatibility
const apiKey = process.env.OGPROXY_API_KEY || 'YOUR_API_KEY';

// --- Limits / safeguards ---
const REQUEST_TIMEOUT = 15000;              // 15s max per fetch
const MAX_CONTENT_BYTES = 5 * 1024 * 1024;  // 5 MB max downloaded page
const CACHE_TTL_MS = 60 * 60 * 1000;        // success cache: 1h
const FAIL_CACHE_TTL_MS = 10 * 60 * 1000;   // negative cache: 10 min
const CACHE_MAX_ENTRIES = 1000;             // max cached entries
const MAX_REDIRECTS = 3;                    // cap redirect hops

// Returns true if an IP string is private / loopback / link-local / reserved
function isBlockedIp(ip) {
  if (!ip) return true;
  if (net.isIPv4(ip)) {
    const p = ip.split('.').map(Number);
    if (p[0] === 10) return true;
    if (p[0] === 127) return true;
    if (p[0] === 0) return true;
    if (p[0] === 169 && p[1] === 254) return true; // link-local / cloud metadata
    if (p[0] === 192 && p[1] === 168) return true;
    if (p[0] === 172 && p[1] >= 16 && p[1] <= 31) return true;
    if (p[0] === 100 && p[1] >= 64 && p[1] <= 127) return true; // CGNAT
    return false;
  }
  if (net.isIPv6(ip)) {
    const v = ip.toLowerCase();
    if (v === '::1') return true;
    if (v.startsWith('fc') || v.startsWith('fd')) return true; // unique local
    if (v.startsWith('fe80')) return true; // link-local
    if (v.startsWith('::ffff:')) return isBlockedIp(v.split(':').pop()); // IPv4-mapped
    return false;
  }
  return true; // not a valid IP -> block by default
}

// Static hostname guard (fast reject before any DNS work)
function isBlockedHost(hostname) {
  if (!hostname) return true;
  const h = hostname.toLowerCase();
  return (
    h === 'localhost' ||
    h.endsWith('.localhost') ||
    h.endsWith('.internal') ||
    h.endsWith('.local') ||
    (net.isIP(h) && isBlockedIp(h)) // literal IP in URL
  );
}

// Resolve hostname and ensure no resolved IP is private (anti-SSRF via DNS)
async function resolvesToPublicIp(hostname) {
  try {
    const records = await dns.lookup(hostname, { all: true });
    if (!records || records.length === 0) return false;
    return records.every(r => !isBlockedIp(r.address));
  } catch (e) {
    return false; // DNS failure -> treat as unsafe
  }
}

app.use(cors({ origin: 'https://YOURDOMAIN.EXT' }));

app.get('/ogproxy', async (req, res) => {
  let { url } = req.query;
  const requestApiKey = req.headers['x-api-key'];

  if (requestApiKey !== apiKey) {
    return res.status(401).send('Unauthorized');
  }

  if (!url || typeof url !== 'string') {
    return res.status(400).send('Missing URL parameter');
  }

  if (!url.startsWith('http')) {
    try {
      url = new URL(url, `${req.protocol}://${req.get('host')}`).href;
    } catch (e) {
      return res.status(400).send('Invalid URL');
    }
  }

  // Parse + protocol check
  let parsedUrl;
  try {
    parsedUrl = new URL(url);
  } catch (e) {
    console.warn(`OGProxy reject [${url}]: invalid URL`);
    return res.status(400).send('Invalid URL');
  }

  if (!['http:', 'https:'].includes(parsedUrl.protocol)) {
    console.warn(`OGProxy reject [${url}]: invalid protocol`);
    return res.status(400).send('Invalid protocol');
  }

  // Static host guard
  if (isBlockedHost(parsedUrl.hostname)) {
    console.warn(`OGProxy reject [${url}]: forbidden host (static guard)`);
    return res.status(403).send('Forbidden host');
  }

  // Cache hit (success OR negative), checked before DNS to stay fast
  const cachedResult = cache.get(url);
  if (cachedResult) {
    if (cachedResult.__ogproxyFail === true) {
      return res.status(500).send('Error scraping Open Graph data (cached)');
    }
    return res.json(cachedResult);
  }

  // DNS-based SSRF guard: make sure the hostname doesn't resolve to a private IP
  if (!(await resolvesToPublicIp(parsedUrl.hostname))) {
    console.warn(`OGProxy reject [${url}]: resolves to private IP or DNS fail (SSRF guard)`);
    cache.put(url, { __ogproxyFail: true }, FAIL_CACHE_TTL_MS);
    return res.status(403).send('Forbidden host');
  }

  // ogs options: timeout + download limit + bounded redirects
  const options = {
    url,
    timeout: REQUEST_TIMEOUT,
    downloadLimit: MAX_CONTENT_BYTES,
    fetchOptions: {
      redirect: 'follow',
      follow: MAX_REDIRECTS,
      headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
        'Accept-Language': 'fr-FR,fr;q=0.9,en;q=0.8',
      },
    },
  };

  // Enforce cache cap before inserting a new entry
  if (cache.keys().length >= CACHE_MAX_ENTRIES) {
    cache.clear();
  }

  try {
    const results = await ogs(options);
    cache.put(url, results, CACHE_TTL_MS);
    return res.json(results);
  } catch (error) {
    const reason = (error && error.result && error.result.error) || (error && error.message) || 'unknown';
    const status = (error && error.response && error.response.status) || 'n/a';
    console.error(`OGProxy fail [${url}]: ${reason} (HTTP ${status})`);
    cache.put(url, { __ogproxyFail: true }, FAIL_CACHE_TTL_MS);
    return res.status(500).send('Error scraping Open Graph data');
  }
});

app.listen(port, () => {
  console.log(`OGProxy server listening on port ${port}`);
});

NodeBB ACP client script
// ------------------------------------------
// OGPROXY
// ------------------------------------------

// Function to preview links
function previewLinks() {
  $(document).ready(function() {
    // Set this flag to true to enable debug logging
    var debug = false;

    // Get all the links within the content class (posts) and chat, excluding mentions plugin links AND already-processed links
    var links = $(".content a:not(.plugin-mentions-a):not(.plugin-mentions-user):not(.og-processed), [component=\"chat/message/body\"] a:not(.plugin-mentions-a):not(.plugin-mentions-user):not(.og-processed), .preview-container a:not(.plugin-mentions-a):not(.plugin-mentions-user):not(.og-processed), .resolved-message a:not(.og-processed), .adhoc a:not(.og-processed)");

    // List of bare hostnames to ignore (forum itself + the OGProxy subdomain, to avoid self-scraping)
    var ignoredHosts = [
      window.location.hostname,
      "proxy.YOURDOMAIN.EXT"
    ];

    // List of paths to ignore
    var ignoredPaths = ['/post'];

    if (debug) {
      // Log the ignored hosts and paths
      console.log("OGProxy: Hosts in the ignore list and will not be parsed: " + ignoredHosts.join(", "));
      console.log("OGProxy: Paths containing " + ignoredPaths[0] + " are in the ignore list and will not be parsed.");
      console.log("OGProxy: Parsing DOM for any URLs that should be converted to previews.");
    }

    // Iterate over each link
    links.each(function() {
      var link = $(this);
      var url = link.attr("href");
      var hostname = link.prop("hostname");
      var text = $(this).text();

      // Helper function to check if the URL is a file URL.
      // Strip query string and fragment first so e.g. "image.png?ssl=1" is still detected.
      function isFileUrl(url) {
        if (!url) {
          return false;
        }
        var cleanUrl = url.split('?')[0].split('#')[0];
        var fileExtensionPattern = /\.(jpg|png|jpeg|gif|pdf|docx?|xlsx?|pptx?|zip|rar|svg|webp)$/i;
        return fileExtensionPattern.test(cleanUrl);
      }

      function isFullPath(url) {
        // Regular expression to match a full path URL
        var fullPathRegex = /^(?:[a-z]+:)?\/\//i;
        // Check if the URL matches the full path pattern
        return fullPathRegex.test(url);
      }

      // Helper function to check if the domain should be ignored
      // Uses the browser-resolved hostname (reliable even for relative hrefs)
      function shouldIgnoreDomain(linkHostname, url, ignoredHosts) {
        if (!linkHostname) {
          return false;
        }
        // Ignore if it points to an ignored host AND hits an ignored path
        if (ignoredPaths.some(function(path) { return url && url.includes(path); }) && ignoredHosts.includes(linkHostname)) {
          return true;
        }
        // Ignore any link pointing to an ignored host (forum itself, proxy subdomain)
        return ignoredHosts.includes(linkHostname);
      }

      // Helper function to extract the bare hostname from the URL (kept for compatibility)
      function extractDomain(url) {
        if (url) {
          var domain = url.split('/')[2]?.split(':')[0];
          return domain;
        }
        return null;
      }

      // Process the link if it's not a file URL, not in the ignored domain list, and it's the only content within its parent element
      if (!isFileUrl(url) && !shouldIgnoreDomain(hostname, url, ignoredHosts) && link.parent().contents().length === 1) {
        var host = window.location.protocol + "//" + hostname;
        var faviconApi = "https://t0.gstatic.com/faviconV2?client=SOCIAL&type=FAVICON&fallback_opts=TYPE,SIZE,URL&url=" + host + "&size=32";

        if (debug) {
          console.log("OGProxy: Getting favicon for URL: " + url);
        }

        var website = link.prop("hostname");
        var altSite = website.replace(/^www\./, "").replace(/\..+$/, "");
        var proxy = "https://proxy.YOURDOMAIN.EXT";
        var apiKey = "YOUR_API_KEY";

        // Mark this link as processed BEFORE the request, so it is never re-scraped
        // on subsequent ajaxify/posts.loaded/composer.preview events, whether the
        // request succeeds or fails. This stops the request-hammering loop.
        link.addClass('og-processed');

        // Send an AJAX request to the proxy server to fetch OpenGraph data for the URL
        $.ajax({
          url: proxy + "/ogproxy?url=" + encodeURIComponent(url),
          method: "GET",
          headers: {
            'X-Api-Key': apiKey
          },
          success: function(data) {
            var result = data.result;

            // Extract relevant data from the OpenGraph result or use fallback values
            var rawTitle = $(data.html).filter('title').text();
            var altTitle = $(result).filter('meta[property="og:title"]').attr('content');
            var altDescription = $(result).filter('meta[property="og:description"]').attr('content');
            var tempDescription = "This website did not return any description. It might be behind a login or paywall.";
            var altImageUrl = $(result).filter('meta[property="og:image"]').attr('content');
            //var tempImage = proxy + "/images/404_3.webp";
            var tempImage = proxy + "/images/404.png";
            // Use a separate name here: "var url = result.requestUrl || url" would hoist a new
            // local url that shadows the link's url, so the fallback would always be undefined.
            var cardUrl = result.requestUrl || url;
            var title = rawTitle || result.ogTitle || altTitle;
            var description = result.ogDescription || altDescription || tempDescription;
            var favicon = faviconApi || result.favicon || data.faviconUrl;
            var imageUrl = result.ogImage && result.ogImage[0].url || altImageUrl || tempImage;

            // Some websites return a relative path for the image URL, which isn't much use, so we need to change this to full
            var fullImagePath = host + imageUrl;
            var site = result.ogSiteName || altSite;
            if (isFullPath(imageUrl) === false) {
              imageUrl = fullImagePath;
            }

            // Test to see if image is broken in the preview card. This might be the result of hotlinking protection, so the image isn't
            // rendered as a result. If this is the case, we replace it with the tempImage to keep things looking nice.
            $(document).ready(function() {
              $('#card-image img').on('error', function() {
                // Image failed to load
                // Add logic here to handle the broken image
                if (debug) {
                  console.log("OGProxy: Broken image URL: " + imageUrl + " detected. Replacing with " + tempImage);
                }
                $(this).attr('src', tempImage); // Replace with a placeholder image
              });
            });

            if (debug) {
              console.log("OGProxy: Getting data from URL: " + cardUrl);
              console.log("OGProxy: Getting image URL: " + imageUrl);
            }

            // Create the HTML for the link preview card
            // (the anchor is closed before the outer wrapper div so the tags nest correctly)
            var cardHtml = '<div class="card card-wrapper og-processed">' +
              '<a href="' + cardUrl + '" class="og-processed">' +
              '<div class="card card-preview">' +
              '<div class="card-image-container">' +
              '<div id="card-image"><img src="' + imageUrl + '"></div>' +
              '</div>' +
              '<div class="card-body">' +
              '<h4 id="sitetitle" class="card-site-title"><img id="favicon" class="card-favicon" src="' + favicon + '">' + site + '</h4>' +
              '<h6 class="card-title">' + title + '</h6>' +
              '<p class="card-text">' + truncateDescription(description, 150) + '</p>' +
              '</div>' +
              '</div>' +
              '</a>' +
              '</div>';

            // Replace the original link with the link preview card
            link.replaceWith(cardHtml);
          },
          error: function() {
            if (debug) {
              console.log("OGProxy: Error fetching OpenGraph data for URL: " + url);
            }
            // Link stays in the DOM but is already marked .og-processed,
            // so it will not be retried on subsequent events.
          }
        });
      }
    });
  });
}

// Helper function to truncate the description with ellipsis if it exceeds the specified limit
function truncateDescription(description, limit) {
  if (description.length > limit) {
    return description.substring(0, limit) + '...';
  }
  return description;
}

$(window).on('action:ajaxify.end', function(data) {
  $(document).ready(function() {
    previewLinks()
  });
});

$(window).on('action:posts.loaded', function(data) {
  $(document).ready(function() {
    previewLinks()
  });
});

$(window).on('action:posts.edited', function(data) {
  $(document).ready(function() {
    previewLinks()
  });
});

/* TEST BUG */
/*
$(window).on('action:chat.loaded', function(data) {
  $(document).ready(function() {
    previewLinks()
  });
});
*/

$(window).on('action:chat.received', function(data) {
  $(document).ready(function() {
    previewLinks()
  });
});

$(window).on('action:composer.preview', function(data) {
  $(document).ready(function() {
    previewLinks()
  });
});

systemd unit (/etc/systemd/system/ogproxy.service)

[Unit]
Description=OGProxy Server
After=network.target

[Service]
ExecStart=/usr/bin/node /home/XXXXXXXXXXXXXXXX/domains/proxy.XXXXXXXXX.XXX/ogproxy/server.js
WorkingDirectory=/home/XXXXXXXXXXXXXXXX/domains/proxy.XXXXXXXXX.XXX/ogproxy
Restart=always
RestartSec=3
RuntimeMaxSec=86400

# --- Memory safeguards ---
MemoryMax=512M
MemoryHigh=400M

# --- Network egress guard (anti-SSRF at kernel level) ---
# Allow loopback (nginx reverse-proxies here on 127.0.0.1:2000) + DNS stub resolver.
# Block all private ranges and cloud metadata so a redirect can't reach them.
# The app-level isBlockedHost() guard still rejects 127.0.0.1 on the initial URL.
IPAddressAllow=127.0.0.1 127.0.0.53 127.0.0.54
IPAddressDeny=10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 169.254.0.0/16 100.64.0.0/10 fc00::/7 fe80::/10

[Install]
WantedBy=multi-user.target

- No download limit or timeout on
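
One remark on the cache cap in server.js: once the cache reaches CACHE_MAX_ENTRIES, cache.clear() empties it completely, so the popular links get thrown away together with the rare ones and have to be scraped again. A gentler option is to evict only the least recently used entry. Below is a minimal sketch of that idea in plain JavaScript, with no extra dependency. The class name BoundedTtlCache, its methods and the example URLs are my own assumptions for illustration, they are not part of OGProxy or of memory-cache.

```javascript
// Hypothetical helper (not part of OGProxy): a size-capped cache with per-entry TTL.
// A Map iterates in insertion order, so its first key is the least recently used one
// as long as every read re-inserts the key it touched.
class BoundedTtlCache {
  constructor(maxEntries, now = Date.now) {
    this.maxEntries = maxEntries;
    this.now = now;            // injectable clock, only there to make the sketch testable
    this.entries = new Map();  // key -> { value, expiresAt }
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    // Re-insert so this key becomes the most recently used
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  put(key, value, ttlMs) {
    this.entries.delete(key); // refresh position if the key already exists
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
    // Evict least recently used entries until the cache is back under the cap
    while (this.entries.size > this.maxEntries) {
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
  }

  get size() {
    return this.entries.size;
  }
}

// Usage with a tiny cap of 2 entries (example URLs are placeholders)
const previewCache = new BoundedTtlCache(2);
previewCache.put('https://a.example/', { ogTitle: 'A' }, 60 * 60 * 1000);
previewCache.put('https://b.example/', { ogTitle: 'B' }, 60 * 60 * 1000);
previewCache.get('https://a.example/'); // touch A, so B is now the oldest
previewCache.put('https://c.example/', { ogTitle: 'C' }, 60 * 60 * 1000);
console.log(previewCache.get('https://b.example/')); // undefined (B was evicted)
console.log(previewCache.size);                      // 2
```

In server.js this kind of object could stand in for the cache.get / cache.put calls, and the cache.clear() block would no longer be needed, since put() enforces the cap itself. Expired entries are only removed when they are read or pushed out by newer ones, which should be acceptable with a cap of around 1000 entries.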