OGProxy - a replacement for iFramely

phenomlab

@DownPW sorry for the delay here - real life is getting in the way, but the latest release will be pushed out soon.

DownPW

no problem.
I was waiting for this new version to change my server but I think I’m going to install the old one.

Keep up the good work, bro.

DownPW

@phenomlab said:

@DownPW sorry for the delay here - real life is getting in the way, but the latest release will be pushed out soon.

Is the new version still in development @phenomlab ?

I ask because personally, I think OG Proxy is really cool, but it uses a lot of RAM on the server, sometimes up to 2GB.
That’s quite a lot.

Optimizing it in this area would be really great.

DownPW

@phenomlab

I have a Claude AI Pro account, and I asked it to optimize the code to fix the memory consumption problem.

Here is its answer:

Root causes of the 2 GB RAM usage

1. Unbounded cache (`memory-cache`)

The original code stores every visited URL forever, with no cap and no reliable automatic expiration. On an active forum with thousands of unique links, this grows indefinitely.

Fix → lru-cache: maximum 500 entries (configurable via CACHE_MAX), automatic TTL expiration at 1 hour. Least-recently-used entries are evicted automatically.

2. Redundant fetches per request

For every uncached URL, the original performs: OGS scrape + axios (full page) + cheerio + meta-parser + axios (favicon). If multiple users click the same link simultaneously, all of these are duplicated.

Fix → In-flight request deduplication (inFlight Map): if 5 requests arrive for youtube.com/xxx at the same time, only one fetch is fired — the other 4 await the same promise.

3. Unbounded favicon buffering

The original downloads any favicon as an arraybuffer without checking its size first. A favicon URL pointing to a large image can load several MB into memory.

Fix → HEAD request first to check Content-Length, then skip if > 50 KB (FAVICON_MAX). The HTML response body is also explicitly freed (htmlRes.data = null) once parsed.

4. Axios with no timeout or size limit

Without a timeout, slow requests remain suspended in memory indefinitely. Without maxContentLength, a 50 MB page is fully loaded into RAM.

Fix → REQ_TIMEOUT=8000ms and MAX_BODY=2MB per outbound request.

5. No compression

Without gzip, large JSON responses consume more bandwidth and remain in transit longer.

Fix → compression middleware.

Near-certain gains:

The bounded LRU cache will prevent unbounded memory growth. This is most likely the main cause of the 2 GB if your forum has a large number of unique links.
Timeouts will prevent stalled requests from sitting in memory indefinitely.
Page and favicon size limits will eliminate pathological edge cases.
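
To make the "bounded cache" idea concrete, here is a minimal dependency-free sketch of a cache with an entry cap and a TTL. It is not the lru-cache implementation and not the original OGProxy code; the class name BoundedTtlCache and its options are assumptions made for illustration only.

```javascript
// Hypothetical helper: a tiny bounded cache with TTL, built on Map.
// Map preserves insertion order, so the first key is the least recently used.
class BoundedTtlCache {
  constructor({ max = 500, ttlMs = 60 * 60 * 1000, now = Date.now } = {}) {
    this.max = max;
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
    this.map = new Map();
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expires) {
      // Expired: drop the entry and report a miss.
      this.map.delete(key);
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, { value, expires: this.now() + this.ttlMs });
    // Evict least recently used entries until we are back under the cap.
    while (this.map.size > this.max) {
      this.map.delete(this.map.keys().next().value);
    }
  }

  get size() {
    return this.map.size;
  }
}
```

With max set to 500, memory held by the cache is bounded by 500 times the size of the largest stored result, however many unique links the forum sees.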

Traffic-dependent gains:

In-flight request deduplication only helps if multiple users click the same link simultaneously — very useful on a busy forum, negligible on a small one.
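
The deduplication idea can be tried in isolation with the sketch below. The helper name coalesce and the fake slowFetch are assumptions for the demo; they stand in for the real scrape and are not part of OGProxy.

```javascript
// Hypothetical helper: wrap an async fetcher so concurrent calls for the
// same key share one pending promise instead of each starting a fetch.
function coalesce(fetcher) {
  const inFlight = new Map(); // key -> pending Promise
  return function (key) {
    if (!inFlight.has(key)) {
      const pending = Promise.resolve()
        .then(() => fetcher(key))
        .finally(() => inFlight.delete(key)); // clean up on success or failure
      inFlight.set(key, pending);
    }
    return inFlight.get(key);
  };
}

// Demo: a fake fetch that counts how often it is really invoked.
let calls = 0;
const slowFetch = (url) => {
  calls += 1;
  return new Promise((resolve) => setTimeout(() => resolve({ url }), 20));
};
const dedupedFetch = coalesce(slowFetch);

// Five concurrent callers ask for the same URL.
const demo = Promise.all(
  Array.from({ length: 5 }, () => dedupedFetch("https://example.com/page"))
).then((results) => ({ calls, results }));
```

Once demo resolves, calls is 1 and all five callers hold the same result object. A request arriving after the first one has settled starts a new fetch, which is why this only helps alongside a cache.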

What may not change:

If the 2 GB comes from something else (a leak inside open-graph-scraper itself, or Node.js not returning memory to the OS after objects are freed), this won’t fix everything. Node.js tends to hold onto allocated memory even after objects are released — this is normal V8 behaviour.

In that case, a simple complementary solution is to have systemd restart the service automatically after 24 hours of runtime:

[Service]
Restart=always
# Restart after 24h max (systemd only accepts comments on their own line)
RuntimeMaxSec=86400

This doesn’t fix the root cause but prevents accumulation over several days.
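
To tell which of these cases applies, it may help to log memory figures periodically. The sketch below relies only on Node's built-in process.memoryUsage(); the function names memorySnapshot and startMemoryLogging are assumptions for illustration.

```javascript
// Hypothetical helper: summarise process.memoryUsage() in megabytes.
function memorySnapshot() {
  const toMb = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  const { rss, heapTotal, heapUsed, external, arrayBuffers } =
    process.memoryUsage();
  return {
    rssMb: toMb(rss),             // total resident memory of the process
    heapTotalMb: toMb(heapTotal), // memory V8 has reserved for the JS heap
    heapUsedMb: toMb(heapUsed),   // memory used by live JS objects
    externalMb: toMb(external),   // memory of C++ objects bound to JS objects
    arrayBuffersMb: toMb(arrayBuffers), // Buffers and ArrayBuffers
  };
}

// Hypothetical helper: log a snapshot at a fixed interval.
function startMemoryLogging(intervalMs = 60000, log = console.log) {
  const timer = setInterval(() => log("memory", memorySnapshot()), intervalMs);
  timer.unref(); // do not keep the process alive just for logging
  return timer;
}
```

As a rough guide, a steadily growing heapUsedMb points at retained JS objects such as cache entries, growing arrayBuffersMb points at buffered response bodies or favicons, and a high rssMb with a small heapUsedMb suggests memory that has been freed but not yet returned to the OS.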

The JS code (server.js):

"use strict";

/**
 * OGProxy - Optimized server
 * Improvements over original:
 *  - LRU cache (bounded) instead of unbounded memory-cache
 *  - Request deduplication (coalescing) to avoid redundant fetches
 *  - Compression middleware
 *  - Rate limiting
 *  - Helmet security headers
 *  - Axios timeout + max response size
 *  - Favicon size cap + skip if too large
 *  - Graceful shutdown
 *  - Winston structured logging
 *  - Joi input validation
 *  - .env support
 *  - Health check endpoint
 */

require("dotenv").config();

const express     = require("express");
const cors        = require("cors");
const helmet      = require("helmet");
const compression = require("compression");
const rateLimit   = require("express-rate-limit");
const morgan      = require("morgan");
const winston     = require("winston");
const { LRUCache }= require("lru-cache");
const Joi         = require("joi");
const axios       = require("axios");
const ogs         = require("open-graph-scraper");
const cheerio     = require("cheerio");
const metaParser  = require("meta-parser");
const { URL }     = require("url");
const path        = require("path");

// ─── Config ────────────────────────────────────────────────────────────────

const PORT         = parseInt(process.env.PORT   || "2000", 10);
const API_KEY      = process.env.API_KEY          || "YOUR_API_KEY_HERE";
const ORIGIN       = process.env.ORIGIN           || "https://your-forum.example.com";
const CACHE_MAX    = parseInt(process.env.CACHE_MAX    || "500",  10);  // max entries
const CACHE_TTL    = parseInt(process.env.CACHE_TTL    || "3600", 10) * 1000; // ms (default 1h)
const REQ_TIMEOUT  = parseInt(process.env.REQ_TIMEOUT  || "8000", 10);  // ms per outbound request
const MAX_BODY     = parseInt(process.env.MAX_BODY      || "2",    10) * 1024 * 1024; // MB → bytes
const FAVICON_MAX  = parseInt(process.env.FAVICON_MAX   || "51200",10);  // bytes (50 KB)
const RATE_WINDOW  = parseInt(process.env.RATE_WINDOW   || "60",   10) * 1000; // ms
const RATE_LIMIT   = parseInt(process.env.RATE_LIMIT    || "30",   10);  // req per window

// ─── Logger ────────────────────────────────────────────────────────────────

const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || "info",
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  transports: [
    new winston.transports.Console({ format: winston.format.simple() }),
    new winston.transports.File({ filename: "error.log",  level: "error" }),
    new winston.transports.File({ filename: "access.log" }),
  ],
});

// ─── LRU Cache (bounded) ───────────────────────────────────────────────────

const cache = new LRUCache({
  max: CACHE_MAX,       // max number of items
  ttl: CACHE_TTL,       // auto-expire entries
  updateAgeOnGet: false,
});

// ─── In-flight request deduplication ──────────────────────────────────────
// If two requests arrive for the same URL before the first completes,
// the second one waits for the first promise instead of spawning a new fetch.

const inFlight = new Map(); // url → Promise

// ─── Axios instance (shared, keep-alive, bounded) ─────────────────────────

const http  = require("http");
const https = require("https");

const axiosInstance = axios.create({
  timeout: REQ_TIMEOUT,
  maxContentLength: MAX_BODY,
  maxBodyLength: MAX_BODY,
  headers: {
    "User-Agent":
      "Mozilla/5.0 (compatible; OGProxy/2.0; +https://github.com/phenomlab/ogproxy)",
    "Accept-Language": "en-US,en;q=0.9",
  },
  httpAgent:  new http.Agent ({ keepAlive: true, maxSockets: 10 }),
  httpsAgent: new https.Agent({ keepAlive: true, maxSockets: 10 }),
});

// ─── Input validation ─────────────────────────────────────────────────────

const urlSchema = Joi.object({
  url: Joi.string().uri({ scheme: ["http", "https"] }).required(),
});

// ─── App setup ────────────────────────────────────────────────────────────

const app = express();

// Security headers
app.use(helmet({ contentSecurityPolicy: false }));

// CORS
app.use(cors({ origin: ORIGIN }));

// Gzip compression
app.use(compression());

// HTTP request logging (stream to winston)
app.use(morgan("combined", {
  stream: { write: (msg) => logger.info(msg.trim()) },
}));

// Rate limiting per IP
app.use(
  "/ogproxy",
  rateLimit({
    windowMs: RATE_WINDOW,
    max: RATE_LIMIT,
    standardHeaders: true,
    legacyHeaders: false,
    message: { error: "Too many requests, please try again later." },
  })
);

// Static images directory
app.use("/images", express.static(path.join(__dirname, "images")));

// ─── Health check ─────────────────────────────────────────────────────────

app.get("/health", (_req, res) => {
  res.json({
    status: "ok",
    uptime: process.uptime(),
    cacheSize: cache.size,
    inFlight: inFlight.size,
    memory: process.memoryUsage(),
  });
});

// ─── API key middleware ────────────────────────────────────────────────────

function requireApiKey(req, res, next) {
  const key = req.headers["x-api-key"];
  if (!key || key !== API_KEY) {
    return res.status(401).json({ error: "Unauthorized" });
  }
  next();
}

// ─── Core fetch logic ─────────────────────────────────────────────────────

async function fetchOGData(targetUrl) {
  // 1. OGS scrape
  const { result: ogsResult, error: ogsError } = await ogs({ url: targetUrl });
  if (ogsError) throw new Error(`OGS failed: ${ogsError}`);

  // 2. Fetch HTML (with size cap)
  let metadata = {};
  let faviconB64 = null;

  try {
    const htmlRes = await axiosInstance.get(targetUrl, {
      responseType: "text",
      decompress: true,
    });

    const $ = cheerio.load(htmlRes.data);

    // meta-parser on the raw HTML
    try {
      const parsed = metaParser(htmlRes.data);
      metadata = parsed || {};
    } catch (_) { /* non-fatal */ }

    // 3. Favicon – only fetch if small enough
    let faviconHref =
      $('link[rel="icon"]').attr("href") ||
      $('link[rel="shortcut icon"]').attr("href");

    if (faviconHref) {
      try {
        const base = new URL(targetUrl);
        const faviconUrl = new URL(faviconHref, base).href;

        // HEAD first to check Content-Length before downloading
        const headRes = await axiosInstance.head(faviconUrl).catch(() => null);
        const contentLength = headRes
          ? parseInt(headRes.headers["content-length"] || "0", 10)
          : 0;

        if (contentLength === 0 || contentLength <= FAVICON_MAX) {
          const iconRes = await axiosInstance.get(faviconUrl, {
            responseType: "arraybuffer",
            maxContentLength: FAVICON_MAX,
          });
          const mime =
            iconRes.headers["content-type"] || "image/x-icon";
          faviconB64 =
            `data:${mime};base64,` +
            Buffer.from(iconRes.data).toString("base64");
        } else {
          logger.info(`Favicon too large (${contentLength}B), skipping: ${faviconUrl}`);
        }
      } catch (err) {
        logger.warn("Favicon fetch failed", { url: targetUrl, err: err.message });
      }
    }

    // Free the HTML string early
    htmlRes.data = null;
  } catch (err) {
    logger.warn("HTML fetch failed (using OGS only)", {
      url: targetUrl,
      err: err.message,
    });
  }

  return {
    ...ogsResult,
    metaProperties: metadata,
    faviconUrl: faviconB64,
  };
}

// ─── /ogproxy route ───────────────────────────────────────────────────────

app.get("/ogproxy", requireApiKey, async (req, res, next) => {
  try {
    // Validate input
    let { url: targetUrl } = req.query;

    const { error } = urlSchema.validate({ url: targetUrl });
    if (error) {
      return res
        .status(400)
        .json({ error: `Invalid URL: ${error.details[0].message}` });
    }

    // Normalise (strip trailing slash etc.)
    targetUrl = new URL(targetUrl).href;

    // Cache hit?
    const cached = cache.get(targetUrl);
    if (cached) {
      res.setHeader("X-Cache", "HIT");
      return res.json(cached);
    }

    res.setHeader("X-Cache", "MISS");

    // Deduplicate concurrent requests for the same URL
    if (!inFlight.has(targetUrl)) {
      const promise = fetchOGData(targetUrl)
        .then((data) => {
          cache.set(targetUrl, data);
          return data;
        })
        .finally(() => inFlight.delete(targetUrl));

      inFlight.set(targetUrl, promise);
    }

    const data = await inFlight.get(targetUrl);
    return res.json(data);
  } catch (err) {
    next(err);
  }
});

// ─── Centralised error handler ────────────────────────────────────────────

// eslint-disable-next-line no-unused-vars
app.use((err, _req, res, _next) => {
  logger.error("Unhandled error", { message: err.message, stack: err.stack });
  res.status(500).json({ error: "Internal server error" });
});

// ─── Start ────────────────────────────────────────────────────────────────

const server = app.listen(PORT, () => {
  logger.info(`OGProxy listening on port ${PORT}`);
  logger.info(`Cache: max=${CACHE_MAX} entries, TTL=${CACHE_TTL / 1000}s`);
});

// ─── Graceful shutdown ────────────────────────────────────────────────────

function shutdown(signal) {
  logger.info(`${signal} received – shutting down gracefully`);
  server.close(() => {
    logger.info("HTTP server closed");
    process.exit(0);
  });

  // Force exit after 10 s if still busy
  setTimeout(() => {
    logger.warn("Forcing exit after timeout");
    process.exit(1);
  }, 10_000).unref();
}

process.on("SIGTERM", () => shutdown("SIGTERM"));
process.on("SIGINT",  () => shutdown("SIGINT"));

process.on("uncaughtException", (err) => {
  logger.error("Uncaught exception", { err });
  shutdown("uncaughtException");
});

process.on("unhandledRejection", (reason) => {
  logger.error("Unhandled rejection", { reason });
});

What do you think of this code?

phenomlab

@DownPW yes, I’ve noticed similar and it is to do with the cache because it is stored in memory and not on disk…

However, when I wrote the code, I did allow for a maximum cache age so I’m puzzled as to why this isn’t being honoured.

A review of the code is long overdue to be honest, but it’s slipped down the priority list. The fastest way to resolve this in the short term is to perform a daily restart of the service which will flush the cache.

DownPW

Don’t hesitate to tell me what you think of this modified code.

Until next time.

phenomlab

@DownPW Looks ok for the most part, but this looks like it’s truncated?

  return {
    ...ogsResult,
    metaProperties: metadata,
    faviconUrl: faviconB64,
  };
}

DownPW

@phenomlab said:

@DownPW Looks ok for the most part, but this looks like it’s truncated?
  return {
    ...ogsResult,
    metaProperties: metadata,
    faviconUrl: faviconB64,
  };
}

Exactly. I need to look at the code again; I'll get back to you soon.

DownPW

I was experiencing 500 (Internal Server Error) responses from the proxy, visible in the browser console:

GET https://proxy.xxx-xxx.net/ogproxy?url=https%3A%2F%2Fzupimages.net%2Fup%2F26%2F16%2Fld8h.jpg 500 (Internal Server Error)

After investigation, I found two root causes:

1. Direct image URLs being sent to the proxy

The custom JavaScript responsible for detecting links and sending them to the proxy was using the following regex to exclude direct image links:

var fileExtensionPattern = /\.(png|jpeg|gif|pdf|docx?|xlsx?|pptx?|zip|rar|svg)$/i;

Note that .jpg and .webp were missing from the pattern. As a result, links ending in .jpg were not recognized as direct image URLs and were forwarded to the OGProxy, which then tried to scrape them as web pages using open-graph-scraper — causing a 500 error.

The fix was to add the missing extensions:

var fileExtensionPattern = /\.(jpg|png|jpeg|gif|pdf|docx?|xlsx?|pptx?|zip|rar|svg|webp)$/i;
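
The difference between the two patterns can be checked quickly in Node or the browser console. The two regexes are copied from above; the sample URLs other than the zupimages one are made up for the test.

```javascript
// Old and new patterns, copied from the post.
const oldPattern = /\.(png|jpeg|gif|pdf|docx?|xlsx?|pptx?|zip|rar|svg)$/i;
const newPattern = /\.(jpg|png|jpeg|gif|pdf|docx?|xlsx?|pptx?|zip|rar|svg|webp)$/i;

// Sample URLs (the example.com ones are illustrative only).
const samples = [
  "https://zupimages.net/up/26/16/ld8h.jpg",
  "https://example.com/photo.webp",
  "https://example.com/report.pdf",
  "https://example.com/article",
];

// For each URL, record whether each pattern treats it as a direct file link.
const report = samples.map((url) => ({
  url,
  old: oldPattern.test(url),
  fixed: newPattern.test(url),
}));

console.table(report);
```

The .jpg and .webp links are only caught by the fixed pattern, the .pdf link is caught by both, and the plain article URL is caught by neither, so it is still sent to the proxy.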

2. The proxy not following HTTP redirects

Some image hosting services (e.g. zupimages.net) return a 301 redirect from the bare domain to www. When curl follows the redirect manually the image loads fine:

curl -IL https://zupimages.net/up/26/16/ld8h.jpg
HTTP/2 301 → https://www.zupimages.net/up/26/16/ld8h.jpg
HTTP/2 200

However, the proxy’s axios.get() call does not handle this gracefully when open-graph-scraper is involved, resulting in a 500 error being returned to the client.

My questions are:

Is there a known best practice for handling redirect chains in open-graph-scraper?
Would passing maxRedirects or followRedirect options explicitly to axios or ogs fix this reliably?
Is there a cleaner way to pre-filter direct image/file URLs before they reach the proxy, ideally at the NodeBB plugin level rather than in custom JS?

Thanks in advance.

DownPW

Hello @phenomlab

My VPS (NodeBB + MongoDB + Redis + nginx + Webmin, single Hetzner box) had repeated multi-minute interruptions: RAM and swap both at 100%, disk I/O spiking, CPU pinned. The cause was swap thrashing, not a hardware fault. Kernel logs showed the OOM killer firing repeatedly, always against the ogproxy.service cgroup. The OGProxy Node process ballooned to ~5 GB RSS within minutes before being killed, dragging the whole box into thrashing. MongoDB was fine (~650 MB). This was an application-level problem in OGProxy, so no hosting ticket was warranted.

Root causes

No download limit or timeout on ogs({ url }): a single link could pull gigabytes into memory.
Unbounded, never-expiring cache: cache.put(url, results) stored the full ogs object, including the entire page HTML (results.html), forever. This was the main leak.
Dead code: the favicon/MetaParser/cheerio block was gated on if (results.data && ...), but ogs v6’s root key is result, not data, so it never ran. Removed along with its unused imports.
Client-side hammering: the ACP script re-runs previewLinks() on every ajaxify/posts/chat/composer event; failing links were never removed or cached, so the same URL was re-scraped ~50×/min.

Fixes applied

systemd guard rails:

MemoryMax=512M / MemoryHigh=400M: if it ever leaks again, only OGProxy is killed (and auto-restarts), not the whole box.
Network egress guard (IPAddressAllow/IPAddressDeny): blocks OGProxy from reaching private ranges and cloud metadata at the kernel level, even via a redirect. Loopback 127.0.0.1 stays allowed because nginx reverse-proxies to 127.0.0.1:2000, and 127.0.0.53/54 stays allowed for the systemd-resolved DNS stub. Verified: legitimate fetches work, 169.254.169.254 is blocked.

Server (server.js):

timeout: 15s + downloadLimit: 5 MB, which caps memory per request.
redirect: 'follow' with follow: 3. Many sites respond with 301/302; without following them, requests failed with a misleading “Connect Timeout Error”. The hop count is bounded to limit the SSRF surface.
Browser-like User-Agent + Accept headers.
Success cache 1 h, negative cache 10 min (kills hammering server-side too), 1000-entry cap.
Full error logging (error.result.error + HTTP status; ogs rejects with an object, not an Error).
App-level SSRF guard: static host check + DNS-resolution check (blocks hostnames resolving to private IPs, IPv4 + IPv6).
API key via process.env.OGPROXY_API_KEY with inline fallback.

Client (NodeBB ACP):

:not(.og-processed) on every selector; og-processed class set before the AJAX call so a link is never re-scraped whether it succeeds or fails (root-cause fix for hammering); the generated card’s <a> carries it too.
ignoredHosts now uses bare hostnames, and shouldIgnoreDomain compares against the browser-resolved hostname (reliable for relative and absolute hrefs). This reliably excludes forum-internal links and the proxy’s own subdomain (no more self-scraping).
isFileUrl strips query string / fragment before testing the extension, so image.png?ssl=1 is detected as a file and not sent for preview.

Result

~5 GB spikes → stable ~25 MB. Hammering gone (each URL appears at most once). Redirect/timeout failures dropped sharply. SSRF closed at both app and kernel level.

Client-server contract (unchanged, do not modify ACP response handling)

Server returns the native ogs object on success → data.result / data.html reach the client intact. Failures return HTTP ≥ 400 → handled by the client’s error: callback. The internal __ogproxyFail negative-cache marker never reaches the client. Preview card images (og:image) are unaffected: the isFileUrl change only affects direct-image links, not the images shown inside cards.

Expected remaining failures (not bugs)

Some sites return 400/500 no matter what: Facebook, Reddit (blocks unauthenticated scraping), press sites with cookie/WAF walls, fb.watch (redirect chains > 3), and direct image links with no extension in the path (ogs correctly rejects non-HTML). These degrade gracefully: the link stays clickable, is negative-cached, and is never re-scraped. Decision: leave as-is; chasing each anti-bot site isn’t worth fragile workarounds or extra load.

Optional follow-ups (no urgency)

Move the API key fully to the env var (it’s already browser-exposed client-side, so not a real secret, but avoids duplicating it in source).
Run OGProxy under a dedicated non-root user (independent security gain; already heavily mitigated by the kernel network guard).

Files changed

/home/xxxxxxxx/domains/proxy.xxxxxx.xxx/ogproxy/server.js
/etc/systemd/system/ogproxy.service
NodeBB ACP custom JS (link-preview script)

`server.js`

const express = require('express');
const ogs = require('open-graph-scraper');
const cors = require('cors');
const { URL } = require('url');
const cache = require('memory-cache');
const dns = require('dns').promises;
const net = require('net');

const app = express();
const port = 2000;

// API key from environment, fallback to inline value for compatibility
const apiKey = process.env.OGPROXY_API_KEY || 'YOUR_API_KEY';

// --- Limits / safeguards ---
const REQUEST_TIMEOUT = 15000;                // 15s max per fetch
const MAX_CONTENT_BYTES = 5 * 1024 * 1024;    // 5 MB max downloaded page
const CACHE_TTL_MS = 60 * 60 * 1000;          // success cache: 1h
const FAIL_CACHE_TTL_MS = 10 * 60 * 1000;     // negative cache: 10 min
const CACHE_MAX_ENTRIES = 1000;               // max cached entries
const MAX_REDIRECTS = 3;                       // cap redirect hops

// Returns true if an IP string is private / loopback / link-local / reserved
function isBlockedIp(ip) {
  if (!ip) return true;
  if (net.isIPv4(ip)) {
    const p = ip.split('.').map(Number);
    if (p[0] === 10) return true;
    if (p[0] === 127) return true;
    if (p[0] === 0) return true;
    if (p[0] === 169 && p[1] === 254) return true;   // link-local / cloud metadata
    if (p[0] === 192 && p[1] === 168) return true;
    if (p[0] === 172 && p[1] >= 16 && p[1] <= 31) return true;
    if (p[0] === 100 && p[1] >= 64 && p[1] <= 127) return true; // CGNAT
    return false;
  }
  if (net.isIPv6(ip)) {
    const v = ip.toLowerCase();
    if (v === '::1') return true;
    if (v.startsWith('fc') || v.startsWith('fd')) return true;  // unique local
    if (v.startsWith('fe80')) return true;                       // link-local
    if (v.startsWith('::ffff:')) return isBlockedIp(v.split(':').pop()); // IPv4-mapped
    return false;
  }
  return true; // not a valid IP -> block by default
}

// Static hostname guard (fast reject before any DNS work)
function isBlockedHost(hostname) {
  if (!hostname) return true;
  const h = hostname.toLowerCase();
  return (
    h === 'localhost' ||
    h.endsWith('.localhost') ||
    h.endsWith('.internal') ||
    h.endsWith('.local') ||
    (net.isIP(h) && isBlockedIp(h))   // literal IP in URL
  );
}

// Resolve hostname and ensure no resolved IP is private (anti-SSRF via DNS)
async function resolvesToPublicIp(hostname) {
  try {
    const records = await dns.lookup(hostname, { all: true });
    if (!records || records.length === 0) return false;
    return records.every(r => !isBlockedIp(r.address));
  } catch (e) {
    return false; // DNS failure -> treat as unsafe
  }
}

app.use(cors({ origin: 'https://YOURDOMAIN.EXT' }));

app.get('/ogproxy', async (req, res) => {
  let { url } = req.query;
  const requestApiKey = req.headers['x-api-key'];

  if (requestApiKey !== apiKey) {
    return res.status(401).send('Unauthorized');
  }
  if (!url || typeof url !== 'string') {
    return res.status(400).send('Missing URL parameter');
  }
  if (!url.startsWith('http')) {
    try {
      url = new URL(url, `${req.protocol}://${req.get('host')}`).href;
    } catch (e) {
      return res.status(400).send('Invalid URL');
    }
  }

  // Parse + protocol check
  let parsedUrl;
  try {
    parsedUrl = new URL(url);
  } catch (e) {
    console.warn(`OGProxy reject [${url}]: invalid URL`);
    return res.status(400).send('Invalid URL');
  }
  if (!['http:', 'https:'].includes(parsedUrl.protocol)) {
    console.warn(`OGProxy reject [${url}]: invalid protocol`);
    return res.status(400).send('Invalid protocol');
  }

  // Static host guard
  if (isBlockedHost(parsedUrl.hostname)) {
    console.warn(`OGProxy reject [${url}]: forbidden host (static guard)`);
    return res.status(403).send('Forbidden host');
  }

  // Cache hit (success OR negative) — checked before DNS to stay fast
  const cachedResult = cache.get(url);
  if (cachedResult) {
    if (cachedResult.__ogproxyFail === true) {
      return res.status(500).send('Error scraping Open Graph data (cached)');
    }
    return res.json(cachedResult);
  }

  // DNS-based SSRF guard: make sure the hostname doesn't resolve to a private IP
  if (!(await resolvesToPublicIp(parsedUrl.hostname))) {
    console.warn(`OGProxy reject [${url}]: resolves to private IP or DNS fail (SSRF guard)`);
    cache.put(url, { __ogproxyFail: true }, FAIL_CACHE_TTL_MS);
    return res.status(403).send('Forbidden host');
  }

  // ogs options: timeout + download limit + bounded redirects
  const options = {
    url,
    timeout: REQUEST_TIMEOUT,
    downloadLimit: MAX_CONTENT_BYTES,
    fetchOptions: {
      redirect: 'follow',
      follow: MAX_REDIRECTS,
      headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
        'Accept-Language': 'fr-FR,fr;q=0.9,en;q=0.8',
      },
    },
  };

  // Enforce cache cap before inserting a new entry
  if (cache.keys().length >= CACHE_MAX_ENTRIES) {
    cache.clear();
  }

  try {
    const results = await ogs(options);
    cache.put(url, results, CACHE_TTL_MS);
    return res.json(results);
  } catch (error) {
    const reason =
      (error && error.result && error.result.error) ||
      (error && error.message) ||
      'unknown';
    const status =
      (error && error.response && error.response.status) || 'n/a';
    console.error(`OGProxy fail [${url}]: ${reason} (HTTP ${status})`);
    cache.put(url, { __ogproxyFail: true }, FAIL_CACHE_TTL_MS);
    return res.status(500).send('Error scraping Open Graph data');
  }
});

app.listen(port, () => {
  console.log(`OGProxy server listening on port ${port}`);
});

NodeBB ACP client script

// ------------------------------------------
// OGPROXY
// ------------------------------------------

// Function to preview links
function previewLinks() {
	$(document).ready(function() {
		// Set this flag to true to enable debug logging
		var debug = false;
		// Get all the links within the content class (posts) and chat, excluding mentions plugin links AND already-processed links
		var links = $(".content a:not(.plugin-mentions-a):not(.plugin-mentions-user):not(.og-processed), [component=\"chat/message/body\"] a:not(.plugin-mentions-a):not(.plugin-mentions-user):not(.og-processed), .preview-container a:not(.plugin-mentions-a):not(.plugin-mentions-user):not(.og-processed), .resolved-message a:not(.og-processed), .adhoc a:not(.og-processed)");
		// List of bare hostnames to ignore (forum itself + the OGProxy subdomain, to avoid self-scraping)
		var ignoredHosts = [
			window.location.hostname,
			"proxy.YOURDOMAIN.EXT"
		];

		// List of paths to ignore
		var ignoredPaths = ['/post'];
		if (debug) {
			// Log the ignored hosts and paths
			console.log("OGProxy: Hosts in the ignore list and will not be parsed: " + ignoredHosts.join(", "));
			console.log("OGProxy: Paths containing " + ignoredPaths[0] + " are in the ignore list and will not be parsed.");
			console.log("OGProxy: Parsing DOM for any URLs that should be converted to previews.");
		}
		// Iterate over each link
		links.each(function() {
			var link = $(this);
			var url = link.attr("href");
			var hostname = link.prop("hostname");
			var text = $(this).text();

			// Helper function to check if the URL is a file URL.
			// Strip query string and fragment first so e.g. "image.png?ssl=1" is still detected.
			function isFileUrl(url) {
				if (!url) {
					return false;
				}
				var cleanUrl = url.split('?')[0].split('#')[0];
				var fileExtensionPattern = /\.(jpg|png|jpeg|gif|pdf|docx?|xlsx?|pptx?|zip|rar|svg|webp)$/i;
				return fileExtensionPattern.test(cleanUrl);
			}

			function isFullPath(url) {
				// Regular expression to match a full path URL
				var fullPathRegex = /^(?:[a-z]+:)?\/\//i;

				// Check if the URL matches the full path pattern
				return fullPathRegex.test(url);
			}

			// Helper function to check if the domain should be ignored
			// Uses the browser-resolved hostname (reliable even for relative hrefs)
			function shouldIgnoreDomain(linkHostname, url, ignoredHosts) {
				if (!linkHostname) {
					return false;
				}
				// Ignore if it points to an ignored host AND hits an ignored path
				if (ignoredPaths.some(function(path) { return url && url.includes(path); }) && ignoredHosts.includes(linkHostname)) {
					return true;
				}
				// Ignore any link pointing to an ignored host (forum itself, proxy subdomain)
				return ignoredHosts.includes(linkHostname);
			}

			// Helper function to extract the bare hostname from the URL (kept for compatibility)
			function extractDomain(url) {
				if (url) {
					var domain = url.split('/')[2]?.split(':')[0];
					return domain;
				}
				return null;
			}

			// Process the link if it's not a file URL, not in the ignored domain list, and it's the only content within its parent element
			if (!isFileUrl(url) && !shouldIgnoreDomain(hostname, url, ignoredHosts) && link.parent().contents().length === 1) {
				var host = window.location.protocol + "//" + hostname;
				var faviconApi = "https://t0.gstatic.com/faviconV2?client=SOCIAL&type=FAVICON&fallback_opts=TYPE,SIZE,URL&url=" + host + "&size=32";
				if (debug) {
					console.log("OGProxy: Getting favicon for URL: " + url);
				}
				var website = link.prop("hostname");
				var altSite = website.replace(/^www\./, "").replace(/\..+$/, "");
				var proxy = "https://proxy.YOURDOMAIN.EXT";
				var apiKey = "YOUR_API_KEY";

				// Mark this link as processed BEFORE the request, so it is never re-scraped
				// on subsequent ajaxify/posts.loaded/composer.preview events, whether the
				// request succeeds or fails. This stops the request-hammering loop.
				link.addClass('og-processed');

				// Send an AJAX request to the proxy server to fetch OpenGraph data for the URL
				$.ajax({
					url: proxy + "/ogproxy?url=" + encodeURIComponent(url),
					method: "GET",
					headers: {
						'X-Api-Key': apiKey
					},
					success: function(data) {
						var result = data.result;
						// Extract relevant data from the OpenGraph result or use fallback values
						var rawTitle = $(data.html).filter('title').text();
						var altTitle = $(result).filter('meta[property="og:title"]').attr('content');
						var altDescription = $(result).filter('meta[property="og:description"]').attr('content');
						var tempDescription = "This website did not return any description. It might be behind a login or paywall.";
						var altImageUrl = $(result).filter('meta[property="og:image"]').attr('content');
						//var tempImage = proxy + "/images/404_3.webp";
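						var tempImage = proxy + "/images/404.png";
						// "var url" is hoisted inside this callback and shadows the outer url,
						// so fall back to the link's href instead of the (undefined) local.
						var url = result.requestUrl || link.attr("href");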
						var title = rawTitle || result.ogTitle || altTitle;
						var description = result.ogDescription || altDescription || tempDescription;
						var favicon = faviconApi || result.favicon || data.faviconUrl;
						var imageUrl = result.ogImage && result.ogImage[0].url || altImageUrl || tempImage;
						// Some websites return a relative path for the image URL, which isn't much use, so we need to change this to full
						var fullImagePath = host + imageUrl;
						var site = result.ogSiteName || altSite;
						if (isFullPath(imageUrl) === false) {
							imageUrl = fullImagePath;
						}
						// Test to see if image is broken in the preview card. This might be the result of hotlinking protection, so the image isn't
						// rendered as a result. If this is the case, we replace it with the tempImage to keep things looking nice.
						$(document).ready(function() {
							$('#card-image img').on('error', function() {
								// Image failed to load
								// Add logic here to handle the broken image
								if (debug) {
									console.log("OGProxy: Broken image URL: " + imageUrl + " detected. Replacing with " + tempImage);
								}
								$(this).attr('src', tempImage); // Replace with a placeholder image
							});
						});

						if (debug) {
							console.log("OGProxy: Getting data from URL: " + url);
							console.log("OGProxy: Getting image URL: " + imageUrl);
						}
						// Create the HTML for the link preview card
						var cardHtml = '<div class="card card-wrapper og-processed">' +
							'<a href="' + url + '" class="og-processed">' +
							'<div class="card card-preview">' +
							'<div class="card-image-container">' +
							'<div id="card-image"><img src="' + imageUrl + '"></div>' +
							'</div>' +
							'<div class="card-body">' +
							'<h4 id="sitetitle" class="card-site-title"><img id="favicon" class="card-favicon" src="' + favicon + '">' + site + '</h4>' +
							'<h6 class="card-title">' + title + '</h6>' +
							'<p class="card-text">' + truncateDescription(description, 150) + '</p>' +
							'</div>' +
							'</div>' +
							'</a>' +
							'</div>';
						// Replace the original link with the link preview card
						link.replaceWith(cardHtml);
					},
					error: function() {
						if (debug) {
							console.log("OGProxy: Error fetching OpenGraph data for URL: " + url);
						}
						// Link stays in the DOM but is already marked .og-processed,
						// so it will not be retried on subsequent events.
					}
				});
			}
		});
	});
}

// Helper function to truncate the description with ellipsis if it exceeds the specified limit
function truncateDescription(description, limit) {
	if (description.length > limit) {
		return description.substring(0, limit) + '...';
	}
	return description;
}

$(window).on('action:ajaxify.end', function(data) {
	$(document).ready(function() {
		previewLinks()
	});
});

$(window).on('action:posts.loaded', function(data) {
	$(document).ready(function() {
		previewLinks()
	});
});

$(window).on('action:posts.edited', function(data) {
	$(document).ready(function() {
		previewLinks()
	});
});

/* TEST BUG */
/*
$(window).on('action:chat.loaded', function(data) {
	$(document).ready(function() {
		previewLinks()
	});
});
*/

$(window).on('action:chat.received', function(data) {
	$(document).ready(function() {
		previewLinks()
	});
});
$(window).on('action:composer.preview', function(data) {
	$(document).ready(function() {
		previewLinks()
	});
});

systemd unit (`/etc/systemd/system/ogproxy.service`)

[Unit]
Description=OGProxy Server
After=network.target

[Service]
ExecStart=/usr/bin/node /home/XXXXXXXXXXXXXXXX/domains/proxy.XXXXXXXXX.XXX/ogproxy/server.js
WorkingDirectory=/home/XXXXXXXXXXXXXXXX/domains/proxy.XXXXXXXXX.XXX/ogproxy
Restart=always
RestartSec=3
RuntimeMaxSec=86400

# --- Memory safeguards ---
MemoryMax=512M
MemoryHigh=400M

# --- Network egress guard (anti-SSRF at kernel level) ---
# Allow loopback (nginx reverse-proxies here on 127.0.0.1:2000) + DNS stub resolver.
# Block all private ranges and cloud metadata so a redirect can't reach them.
# The app-level isBlockedHost() guard still rejects 127.0.0.1 on the initial URL.
IPAddressAllow=127.0.0.1 127.0.0.53 127.0.0.54
IPAddressDeny=10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 169.254.0.0/16 100.64.0.0/10 fc00::/7 fe80::/10

[Install]
WantedBy=multi-user.target

DownPW

OGProxy follow-up: second memory leak found & fixed

Context

After this morning’s fixes (download limit, cache TTL, systemd MemoryMax), the server stayed up, but during the afternoon OGProxy slowly climbed to 464 MB RSS with all 4 GB of swap consumed. The systemd MemoryMax=512M guard rail did its job (it capped OGProxy instead of letting it take the whole box down like before), which bought time to diagnose calmly. This was a second, separate leak, slower than the first.

Root cause

The logs showed the smoking gun:

MaxListenersExceededWarning: Possible EventEmitter memory leak detected.
22 terminated listeners added to [Fetch]. MaxListeners is 21.

Stack: ogs 6.1.0 → undici 5.22.1 on Node 24. ogs v6 implements its timeout option via an AbortSignal passed to undici. With this version combo, when a request is aborted by that internal timeout, the abort listener attached to the Fetch object is not removed. Every timed-out request leaks one listener, and they accumulate in memory.

Trigger: a 10-day-old forum post listing ~10 store.ubisoft.com links. Opening that topic fires ~10 previews in parallel, all hitting the timeout, each leaking a listener. Repeated views over the day pushed it to 464 MB + full swap.

There was also a vicious circle: as the process bloated, its own outbound fetches got slow enough to time out, which created more timeouts, which leaked more listeners. That explains the flood of Connect Timeout Error in the afternoon logs: they were a symptom of the leak, not an external block. Once restarted fresh, those same Ubisoft URLs returned success: true in ~2.4 s.

Fix

Stop using ogs’s internal timeout option (the leaking path). Instead, manage the timeout with our own AbortController + setTimeout, pass the signal via fetchOptions, and always call clearTimeout() in a finally block, so the timer is cancelled on every exit path (success, failure, or timeout) and nothing tied to the request outlives it. Also raised EventEmitter.defaultMaxListeners to 50 as a safety net for legitimate concurrency bursts (like that 10-link post).

Verified: with our own signal aborting at 3 s, a Ubisoft URL completed in 2.4 s (signal is respected by ogs 6.1). After deploy, no more MaxListenersExceededWarning, no more cascade timeouts, and memory now oscillates (217 MB under load → back down to 91 MB at rest) instead of climbing and staying climbed.
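
For anyone who wants the pattern without reading the whole server.js, here is a minimal sketch of the "own AbortController + clearTimeout in finally" idea. `withTimeout` and `fakeRequest` are names made up for this illustration (they are not part of OGProxy or ogs); `fakeRequest` just stands in for a network call that honours the signal.

```javascript
// Hypothetical helper (not part of OGProxy): runs task(signal) and aborts it
// after `ms` milliseconds. The timer is always cleared in `finally`.
async function withTimeout(task, ms) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await task(controller.signal);
  } finally {
    clearTimeout(timer); // runs on success, failure, and timeout alike
  }
}

// Stand-in for a network call that honours the abort signal (illustration only).
function fakeRequest(delayMs, signal) {
  return new Promise((resolve, reject) => {
    const t = setTimeout(() => resolve('done'), delayMs);
    signal.addEventListener('abort', () => {
      clearTimeout(t);
      reject(new Error('aborted'));
    }, { once: true });
  });
}
```

In the server this would presumably wrap the scrape call, along the lines of `withTimeout((signal) => ogs({ url, fetchOptions: { signal } }), REQUEST_TIMEOUT)`, which is what the inline version in server.js does.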

Note on RuntimeMaxSec

The existing RuntimeMaxSec=86400 (forced daily restart) was almost certainly an earlier band-aid masking exactly this leak. Now that the cause is fixed, it can be removed once stability is confirmed over 24–48 h, but it’s harmless to keep for now.

Only `server.js` changed (client ACP + systemd unit unchanged)

const express = require('express');
const ogs = require('open-graph-scraper');
const cors = require('cors');
const { URL } = require('url');
const cache = require('memory-cache');
const dns = require('dns').promises;
const net = require('net');

// Raise the listener ceiling as a safety net against transient concurrency spikes
require('events').EventEmitter.defaultMaxListeners = 50;

const app = express();
const port = 2000;

// API key from environment, fallback to inline value for compatibility
const apiKey = process.env.OGPROXY_API_KEY || 'YOUR_API_KEY_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';

// --- Limits / safeguards ---
const REQUEST_TIMEOUT = 15000;                // 15s max per fetch
const MAX_CONTENT_BYTES = 5 * 1024 * 1024;    // 5 MB max downloaded page
const CACHE_TTL_MS = 60 * 60 * 1000;          // success cache: 1h
const FAIL_CACHE_TTL_MS = 10 * 60 * 1000;     // negative cache: 10 min
const CACHE_MAX_ENTRIES = 1000;               // max cached entries
const MAX_REDIRECTS = 3;                       // cap redirect hops

// Returns true if an IP string is private / loopback / link-local / reserved
function isBlockedIp(ip) {
  if (!ip) return true;
  if (net.isIPv4(ip)) {
    const p = ip.split('.').map(Number);
    if (p[0] === 10) return true;
    if (p[0] === 127) return true;
    if (p[0] === 0) return true;
    if (p[0] === 169 && p[1] === 254) return true;   // link-local / cloud metadata
    if (p[0] === 192 && p[1] === 168) return true;
    if (p[0] === 172 && p[1] >= 16 && p[1] <= 31) return true;
    if (p[0] === 100 && p[1] >= 64 && p[1] <= 127) return true; // CGNAT
    return false;
  }
  if (net.isIPv6(ip)) {
    const v = ip.toLowerCase();
    if (v === '::1') return true;
    if (v.startsWith('fc') || v.startsWith('fd')) return true;  // unique local
    if (v.startsWith('fe80')) return true;                       // link-local
    if (v.startsWith('::ffff:')) return isBlockedIp(v.split(':').pop()); // IPv4-mapped
    return false;
  }
  return true; // not a valid IP -> block by default
}

// Static hostname guard (fast reject before any DNS work)
function isBlockedHost(hostname) {
  if (!hostname) return true;
  const h = hostname.toLowerCase().replace(/^\[|\]$/g, ''); // URL.hostname keeps [] around IPv6 literals
  return (
    h === 'localhost' ||
    h.endsWith('.localhost') ||
    h.endsWith('.internal') ||
    h.endsWith('.local') ||
    (net.isIP(h) && isBlockedIp(h))   // literal IP in URL
  );
}

// Resolve hostname and ensure no resolved IP is private (anti-SSRF via DNS)
async function resolvesToPublicIp(hostname) {
  try {
    const records = await dns.lookup(hostname, { all: true });
    if (!records || records.length === 0) return false;
    return records.every(r => !isBlockedIp(r.address));
  } catch (e) {
    return false; // DNS failure -> treat as unsafe
  }
}

app.use(cors({ origin: 'https://YOUR_DOMAIN.EXT' }));

app.get('/ogproxy', async (req, res) => {
  let { url } = req.query;
  const requestApiKey = req.headers['x-api-key'];

  if (requestApiKey !== apiKey) {
    return res.status(401).send('Unauthorized');
  }
  if (!url || typeof url !== 'string') {
    return res.status(400).send('Missing URL parameter');
  }
  if (!url.startsWith('http')) {
    try {
      url = new URL(url, `${req.protocol}://${req.get('host')}`).href;
    } catch (e) {
      return res.status(400).send('Invalid URL');
    }
  }

  // Parse + protocol check
  let parsedUrl;
  try {
    parsedUrl = new URL(url);
  } catch (e) {
    console.warn(`OGProxy reject [${url}]: invalid URL`);
    return res.status(400).send('Invalid URL');
  }
  if (!['http:', 'https:'].includes(parsedUrl.protocol)) {
    console.warn(`OGProxy reject [${url}]: invalid protocol`);
    return res.status(400).send('Invalid protocol');
  }

  // Static host guard
  if (isBlockedHost(parsedUrl.hostname)) {
    console.warn(`OGProxy reject [${url}]: forbidden host (static guard)`);
    return res.status(403).send('Forbidden host');
  }

  // Cache hit (success OR negative) — checked before DNS to stay fast
  const cachedResult = cache.get(url);
  if (cachedResult) {
    if (cachedResult.__ogproxyFail === true) {
      return res.status(500).send('Error scraping Open Graph data (cached)');
    }
    return res.json(cachedResult);
  }

  // DNS-based SSRF guard: make sure the hostname doesn't resolve to a private IP
  if (!(await resolvesToPublicIp(parsedUrl.hostname))) {
    console.warn(`OGProxy reject [${url}]: resolves to private IP or DNS fail (SSRF guard)`);
    cache.put(url, { __ogproxyFail: true }, FAIL_CACHE_TTL_MS);
    return res.status(403).send('Forbidden host');
  }

  // Enforce cache cap before inserting a new entry
  if (cache.keys().length >= CACHE_MAX_ENTRIES) {
    cache.clear();
  }

  // Manage the timeout ourselves with an AbortController we clean up explicitly.
  // This avoids the listener leak from ogs/undici's internal `timeout` option
  // (ogs 6.x + undici 5.x on Node 24 leaks an abort listener per timed-out request,
  // which slowly fills RAM/swap). clearTimeout() in finally cancels the timer on every exit path.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), REQUEST_TIMEOUT);

  const options = {
    url,
    downloadLimit: MAX_CONTENT_BYTES,
    fetchOptions: {
      signal: controller.signal,
      redirect: 'follow',
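      follow: MAX_REDIRECTS, // node-fetch-style option; undici's fetch does not appear to honor it, so this cap is likely not enforced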
      headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
        'Accept-Language': 'fr-FR,fr;q=0.9,en;q=0.8',
      },
    },
  };

  try {
    const results = await ogs(options);
    cache.put(url, results, CACHE_TTL_MS);
    return res.json(results);
  } catch (error) {
    const reason =
      (error && error.result && error.result.error) ||
      (error && error.message) ||
      'unknown';
    const status =
      (error && error.response && error.response.status) || 'n/a';
    console.error(`OGProxy fail [${url}]: ${reason} (HTTP ${status})`);
    cache.put(url, { __ogproxyFail: true }, FAIL_CACHE_TTL_MS);
    return res.status(500).send('Error scraping Open Graph data');
  } finally {
    // Always clear the timer so it cannot fire after the request has settled
    clearTimeout(timer);
  }
});

app.listen(port, () => {
  console.log(`OGProxy server listening on port ${port}`);
});

Possible upstream-clean alternative (optional)

Upgrading open-graph-scraper to its latest 6.x (which bundles a newer undici) may fix the listener cleanup at the source, letting you go back to the simpler built-in timeout option. Worth checking when convenient, but the AbortController approach above is robust regardless of the undici version, so there’s no rush.

DownPW

OGProxy: Other Memory Saturation Root Cause & Fix

OGProxy was periodically saturating server RAM and swap (up to ~4 GB of
arrayBuffers, swap fully consumed), causing multi-minute service degradation.
After tracing through several misleading leads, the root cause was identified:
OGProxy was downloading entire file-host link bodies into memory when trying
to generate previews.

On a file-sharing forum, links to file hosts (1fichier, etc.) are everywhere.
When OGProxy received a URL like https://1fichier.com/?xxxx, it attempted to
“preview” it, but that URL is a direct file download
(Content-Type: application/octet-stream, Content-Length: 20.6 GB). OGProxy
pulled the file into memory. Critically, neither open-graph-scraper’s
downloadLimit nor an AbortController stopped this, verified by
reproduction: arrayBuffers climbed ~120 MB/s past 4 GB while the abort
timeout was ignored.

Diagnostic path (for reference)

We instrumented the process with a /debug/mem endpoint exposing
process.memoryUsage() + cache size, plus a 30-second sampling trace. This let
us correlate memory spikes with nginx access logs. The trace showed
arrayBuffers jumping from 0 → 457 → 3669 MB in ~5 minutes, correlated via
nginx log to a single GET on a 1fichier link. The cache, EventEmitter
listeners, and image links were all ruled out as primary causes (cache stayed
at <30 entries during the spike; heapUsed stayed low; only arrayBuffers
leaked).

A representative slice of the trace at the moment of the spike:

11:24:39  arrayBuffers=0     rss=161
11:25:09  arrayBuffers=457   rss=427   <- jump in one 30s sample
11:25:39  arrayBuffers=884
11:26:09  arrayBuffers=1437
...
11:30:09  arrayBuffers=3669

No OGProxy fail log line appeared during the spike window: the offending
request neither failed nor completed; it was an in-progress, never-ending
download. The nginx access log for that minute pointed at the 1fichier GET.
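
The 30-second sampling trace is not included in the posted server.js, so here is a rough sketch of how such a trace could be produced. `formatSample` and `startMemoryTrace` are hypothetical names, and the extra heapUsed column is an assumption; only process.memoryUsage() and the 30 s interval come from the post.

```javascript
// Formats one trace line from a process.memoryUsage()-shaped object.
function formatSample(mem, date = new Date()) {
  const mb = (bytes) => Math.round(bytes / 1048576);
  const time = date.toTimeString().slice(0, 8); // HH:MM:SS, local time
  return `${time}  arrayBuffers=${mb(mem.arrayBuffers)}  rss=${mb(mem.rss)}  heapUsed=${mb(mem.heapUsed)}`;
}

// Hypothetical sampler: logs one line every intervalMs (30 s by default).
function startMemoryTrace(intervalMs = 30000, log = console.log) {
  const timer = setInterval(() => log(formatSample(process.memoryUsage())), intervalMs);
  timer.unref(); // do not keep the process alive just for the trace
  return timer;
}
```

Calling `startMemoryTrace()` once at startup would write one line per interval to the service log, which can then be lined up against the nginx access log by timestamp.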

Root cause

open-graph-scraper (ogs) performs its own internal fetch, and for these URLs:

- The downloadLimit option does not reliably abort the body download on
  streamed / chunked responses or on hosts that serve large
  application/octet-stream payloads.
- An AbortController passed via fetchOptions.signal does not propagate
  to the underlying stream read in a way that stops the transfer in time.

Result: a single large file-host link could pull multiple GB into
arrayBuffers before anything intervened.

The fix: bounded streaming fetch

The structural problem is that ogs() controls the fetch and we don’t control
body consumption. The fix moves the fetch into our own code so we control every
byte read:

boundedFetch(url, maxBytes, timeoutMs) performs the HTTP fetch itself,
then:
- Re-checks the final host for SSRF after redirects.
- Rejects any Content-Type other than text/html or application/xhtml
  before reading the body (aborts immediately).
- Reads the body chunk-by-chunk via resp.body.getReader(), tracking total
  bytes, and hard-aborts at 5 MB regardless of what the server claims.
The retrieved HTML is then handed to ogs for parsing only: ogs({ html }).

This makes the protection structural rather than cooperative: no file host
can leak memory regardless of whether it honors HEAD, serves chunked, or
misreports headers.
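
The byte-capped read is the core of this, so here it is isolated as a small sketch that can be tried without any network. It assumes a Node version with a global ReadableStream (18+); `readCapped`, `onLimit` and `makeStream` are names invented for the example, and in the real boundedFetch the `onLimit` step is the controller.abort() call.

```javascript
// Reads a WHATWG ReadableStream of Uint8Array chunks, stopping at maxBytes.
// `onLimit` is a hypothetical hook; in a real fetch it would call
// controller.abort() to stop the transfer.
async function readCapped(stream, maxBytes, onLimit = () => {}) {
  const reader = stream.getReader();
  const chunks = [];
  let total = 0;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    total += value.byteLength;
    if (total > maxBytes) {
      onLimit();
      await reader.cancel(); // release the stream; nothing more is read
      return { ok: false, reason: `body exceeded ${maxBytes} bytes` };
    }
    chunks.push(value);
  }
  return { ok: true, html: Buffer.concat(chunks).toString('utf8') };
}

// Test stream that emits `count` chunks of `size` bytes each (illustration only).
function makeStream(count, size) {
  let sent = 0;
  return new ReadableStream({
    pull(ctrl) {
      if (sent++ < count) ctrl.enqueue(new Uint8Array(size).fill(97)); // 'a'
      else ctrl.close();
    },
  });
}
```

The point of counting bytes ourselves is that the cap holds even when Content-Length is missing or wrong: memory held by this function never exceeds maxBytes plus one chunk.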

Important ogs constraint

You must call ogs({ html }) alone. Passing { html, url } together
throws:

Must specify either `url` or `html`, not both

Because url is omitted, ogs cannot resolve relative og:image paths. This is
fine here: the ACP client already resolves relative image paths itself
(isFullPath() + host + imageUrl), so no client-side change was required.
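
If someone would rather resolve relative image paths on the server instead of relying on the client, a sketch along these lines might work. `resolveImageUrl` is a hypothetical helper and not something the posted server.js does; it only uses the standard URL constructor with a base.

```javascript
// Hypothetical server-side helper: resolves a possibly relative og:image
// value against the URL of the page it came from.
function resolveImageUrl(imageUrl, pageUrl) {
  try {
    return new URL(imageUrl, pageUrl).href;
  } catch (e) {
    return null; // unparseable input: let the caller fall back to a placeholder
  }
}
```

Absolute URLs pass through unchanged, while root-relative (`/img/a.png`) and protocol-relative (`//cdn...`) values are resolved against the page URL, which plain string concatenation of host + path does not handle in every case.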

Other hardening applied in the same pass

- Cache: replaced memory-cache (which creates a per-entry setTimeout
  that retains the cached object, a secondary leak) with a plain Map using
  lazy expiry + a single sweep interval. The stored value is slimmed via
  slimResult(): only error + result + HTML truncated at </head>
  (preserves <title>, drops the multi-MB body and the undici response
  object). Cap 300 entries, 30 min TTL, 10 min negative-cache TTL.
- Negative cache: failed/rejected URLs are cached to prevent re-scrape
  hammering from the client.
- SSRF guards (three layers): static host/IP blocklist (private ranges,
  loopback, link-local, CGNAT, IPv6 ULA/link-local), DNS resolution check, and
  post-redirect re-validation of the final host. (Also backed at the OS level
  by systemd IPAddressDeny on the unit.)
- AbortController + clearTimeout in finally to stop the earlier
  MaxListenersExceededWarning listener leak on timed-out requests.
- nginx rate limit: limit_req_zone (10 r/s, burst 50, nodelay, returns
  429) on the OGProxy vhost's reverse-proxy location. The API key is
  necessarily exposed client-side (it ships in the ACP JS), so it provides no
  real protection on its own; the rate limit is the actual abuse mitigation.
- systemd guard rail: MemoryMax=512M / MemoryHigh=400M so OGProxy can
  never take the whole box down again. This was the silent hero that kept the
  server alive throughout diagnosis.
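
One possible tightening of the post-redirect check, as a sketch only: follow redirects by hand so each hop is validated before it is requested, which also gives a real redirect cap. This is not what the posted server.js does. `fetchWithCheckedRedirects`, `fetchImpl` and `isAllowed` are hypothetical names, and it assumes Node's built-in fetch hands back the actual 3xx response (with its Location header) when called with redirect: 'manual'.

```javascript
// Follows redirects manually so every hop is validated before it is requested.
// `fetchImpl` and `isAllowed` are injected for illustration; in OGProxy they
// would be the global fetch and a wrapper around isBlockedHost() /
// resolvesToPublicIp().
async function fetchWithCheckedRedirects(url, { fetchImpl = fetch, isAllowed, maxRedirects = 3, signal } = {}) {
  let current = url;
  for (let hop = 0; hop <= maxRedirects; hop++) {
    if (!(await isAllowed(new URL(current).hostname))) {
      return { ok: false, reason: `forbidden host at hop ${hop}` };
    }
    const resp = await fetchImpl(current, { redirect: 'manual', signal });
    const location = resp.headers.get('location');
    if (resp.status < 300 || resp.status >= 400 || !location) {
      return { ok: true, resp, finalUrl: current }; // not a redirect: done
    }
    await resp.body?.cancel(); // discard the redirect body before the next hop
    current = new URL(location, current).href; // Location may be relative
  }
  return { ok: false, reason: `more than ${maxRedirects} redirects` };
}
```

With this shape, a redirect towards a blocked host is rejected before any connection to it is attempted, rather than after the fact, and the systemd IPAddressDeny rule stays in place as the backstop.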

Validation

| Test URL | Expected | Result |
| --- | --- | --- |
| `https://1fichier.com/?xxxx` (20.6 GB) | reject, no body read | 415, `arrayBuffers` stays 0 |
| Direct image (`pbs.twimg.com/...jpg`) | reject on content-type | 415 |
| `https://github.com` | full preview | 200, OG title/image/description, HTML truncated at `</head>` |

Process idles at ~100 MB RSS; under load heapUsed oscillates and returns to
baseline (no step-up accumulation).

Reproduction of the bounded fetch against the 20.6 GB link, confirming zero
body is pulled:

arrayBuffers BEFORE: 0 MB
  during: 0 MB
Result 1fichier: REJECTED: non-HTML content-type: application/octet-stream
arrayBuffers AFTER: 0 MB

Note on dependencies

Reproduced on open-graph-scraper 6.1.0 / undici 5.22.1 / Node 24. The
unreliable downloadLimit behavior may be version-specific; a newer undici
might handle aborts on large streams better. The bounded-fetch approach is
robust regardless of the underlying library version, so it is the recommended
long-term fix.

Appendix A: Full `server.js`

const express = require('express');
const ogs = require('open-graph-scraper');
const cors = require('cors');
const { URL } = require('url');
const dns = require('dns').promises;
const net = require('net');

require('events').EventEmitter.defaultMaxListeners = 50;

const app = express();
const port = 2000;

const apiKey = process.env.OGPROXY_API_KEY || '<API_KEY>';

const REQUEST_TIMEOUT = 12000;
const MAX_CONTENT_BYTES = 5 * 1024 * 1024;    // 5 MB hard cap on body
const CACHE_TTL_MS = 30 * 60 * 1000;
const FAIL_CACHE_TTL_MS = 10 * 60 * 1000;
const CACHE_MAX_ENTRIES = 300;
const MAX_REDIRECTS = 3;                      // not enforced below: boundedFetch() uses fetch's default redirect handling

// --- Map cache (lazy expiry, no per-entry timers) ---
const cacheStore = new Map();
function cacheGet(key) {
  const e = cacheStore.get(key);
  if (!e) return null;
  if (Date.now() > e.expires) { cacheStore.delete(key); return null; }
  return e.value;
}
function cacheSet(key, value, ttl) {
  if (cacheStore.size >= CACHE_MAX_ENTRIES) {
    cacheStore.delete(cacheStore.keys().next().value);
  }
  cacheStore.set(key, { value, expires: Date.now() + ttl });
}
setInterval(() => {
  const now = Date.now();
  for (const [k, e] of cacheStore) if (now > e.expires) cacheStore.delete(k);
}, 60 * 1000).unref();

function slimResult(results) {
  if (!results || typeof results !== 'object') return results;
  let slimHtml = '';
  if (typeof results.html === 'string') {
    const headEnd = results.html.search(/<\/head>/i);
    slimHtml = headEnd !== -1 ? results.html.slice(0, headEnd + 7) : results.html.slice(0, 8192);
  }
  return { error: results.error, result: results.result, html: slimHtml };
}

function isBlockedIp(ip) {
  if (!ip) return true;
  if (net.isIPv4(ip)) {
    const p = ip.split('.').map(Number);
    if (p[0] === 10) return true;
    if (p[0] === 127) return true;
    if (p[0] === 0) return true;
    if (p[0] === 169 && p[1] === 254) return true;
    if (p[0] === 192 && p[1] === 168) return true;
    if (p[0] === 172 && p[1] >= 16 && p[1] <= 31) return true;
    if (p[0] === 100 && p[1] >= 64 && p[1] <= 127) return true;
    return false;
  }
  if (net.isIPv6(ip)) {
    const v = ip.toLowerCase();
    if (v === '::1') return true;
    if (v.startsWith('fc') || v.startsWith('fd')) return true;
    if (v.startsWith('fe80')) return true;
    if (v.startsWith('::ffff:')) return isBlockedIp(v.split(':').pop());
    return false;
  }
  return true;
}

function isBlockedHost(hostname) {
  if (!hostname) return true;
  const h = hostname.toLowerCase().replace(/^\[|\]$/g, ''); // URL.hostname keeps [] around IPv6 literals
  return (
    h === 'localhost' || h.endsWith('.localhost') ||
    h.endsWith('.internal') || h.endsWith('.local') ||
    (net.isIP(h) && isBlockedIp(h))
  );
}

async function resolvesToPublicIp(hostname) {
  try {
    const records = await dns.lookup(hostname, { all: true });
    if (!records || records.length === 0) return false;
    return records.every(r => !isBlockedIp(r.address));
  } catch (e) {
    return false;
  }
}

// Bounded streaming fetch: reads the body chunk by chunk and aborts hard at maxBytes.
// Rejects non-HTML content-types before reading any body. Structural protection
// against file hosts (1fichier, etc.) - independent of what the server claims.
async function boundedFetch(url, maxBytes, timeoutMs) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const resp = await fetch(url, {
      redirect: 'follow',
      signal: controller.signal,
      headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
        'Accept-Language': 'fr-FR,fr;q=0.9,en;q=0.8',
      },
    });

    // Re-check final host after redirects (anti-SSRF)
    try {
      const finalHost = new URL(resp.url || url).hostname;
      if (isBlockedHost(finalHost) || !(await resolvesToPublicIp(finalHost))) {
        controller.abort();
        return { ok: false, reason: 'redirect to forbidden host' };
      }
    } catch (e) { /* ignore */ }

    const ctype = (resp.headers.get('content-type') || '').toLowerCase();
    if (ctype && !ctype.includes('text/html') && !ctype.includes('application/xhtml')) {
      controller.abort(); // not HTML: read nothing
      return { ok: false, reason: `non-HTML content-type: ${ctype.split(';')[0]}` };
    }

    if (!resp.body) {
      return { ok: false, reason: 'no response body' };
    }

    const reader = resp.body.getReader();
    const chunks = [];
    let total = 0;
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      total += value.length;
      if (total > maxBytes) {
        controller.abort(); // hard cap reached: stop downloading
        return { ok: false, reason: `body exceeded ${maxBytes} bytes` };
      }
      chunks.push(value);
    }
    const html = Buffer.concat(chunks).toString('utf8');
    return { ok: true, html };
  } catch (e) {
    return { ok: false, reason: (e && e.name === 'AbortError') ? 'timeout/abort' : (e && e.message) || 'fetch error' };
  } finally {
    clearTimeout(timer);
  }
}

app.use(cors({ origin: 'https://YOUR_DOMAIN.EXT' }));

app.get('/debug/mem', (req, res) => {
  const m = process.memoryUsage();
  res.json({
    rss_mb: Math.round(m.rss / 1048576),
    heapUsed_mb: Math.round(m.heapUsed / 1048576),
    external_mb: Math.round(m.external / 1048576),
    arrayBuffers_mb: Math.round(m.arrayBuffers / 1048576),
    cache_entries: cacheStore.size,
  });
});

app.get('/ogproxy', async (req, res) => {
  let { url } = req.query;
  const requestApiKey = req.headers['x-api-key'];

  if (requestApiKey !== apiKey) return res.status(401).send('Unauthorized');
  if (!url || typeof url !== 'string') return res.status(400).send('Missing URL parameter');
  if (!url.startsWith('http')) {
    try { url = new URL(url, `${req.protocol}://${req.get('host')}`).href; }
    catch (e) { return res.status(400).send('Invalid URL'); }
  }

  let parsedUrl;
  try { parsedUrl = new URL(url); }
  catch (e) { console.warn(`OGProxy reject [${url}]: invalid URL`); return res.status(400).send('Invalid URL'); }

  if (!['http:', 'https:'].includes(parsedUrl.protocol)) {
    return res.status(400).send('Invalid protocol');
  }
  if (isBlockedHost(parsedUrl.hostname)) {
    console.warn(`OGProxy reject [${url}]: forbidden host (static guard)`);
    return res.status(403).send('Forbidden host');
  }

  const cached = cacheGet(url);
  if (cached) {
    if (cached.__ogproxyFail === true) return res.status(500).send('Error scraping Open Graph data (cached)');
    return res.json(cached);
  }

  if (!(await resolvesToPublicIp(parsedUrl.hostname))) {
    console.warn(`OGProxy reject [${url}]: resolves to private IP / DNS fail (SSRF)`);
    cacheSet(url, { __ogproxyFail: true }, FAIL_CACHE_TTL_MS);
    return res.status(403).send('Forbidden host');
  }

  // No explicit eviction here: cacheSet() already enforces CACHE_MAX_ENTRIES on insert.

  // Bounded fetch: download the body ourselves, capped at 5 MB, HTML-only.
  const fetched = await boundedFetch(url, MAX_CONTENT_BYTES, REQUEST_TIMEOUT);
  if (!fetched.ok) {
    console.error(`OGProxy reject [${url}]: ${fetched.reason}`);
    cacheSet(url, { __ogproxyFail: true }, FAIL_CACHE_TTL_MS);
    const code = (fetched.reason.startsWith('non-HTML') || fetched.reason.startsWith('body exceeded')) ? 415 : 500;
    return res.status(code).send('Unable to preview this URL');
  }

  try {
    // Parse the already-fetched HTML (no second fetch). Client resolves relative image paths itself.
    const results = await ogs({ html: fetched.html });
    const slim = slimResult(results);
    cacheSet(url, slim, CACHE_TTL_MS);
    return res.json(slim);
  } catch (error) {
    const reason = (error && error.result && error.result.error) || (error && error.message) || 'unknown';
    console.error(`OGProxy fail [${url}]: ${reason}`);
    cacheSet(url, { __ogproxyFail: true }, FAIL_CACHE_TTL_MS);
    return res.status(500).send('Error scraping Open Graph data');
  }
});

app.listen(port, () => {
  console.log(`OGProxy server listening on port ${port}`);
});

Note: /debug/mem is a temporary diagnostic endpoint. Remove it once the
deployment is confirmed stable in production.

Appendix B: nginx rate limit

Zone definition, placed in /etc/nginx/conf.d/ogproxy-ratelimit.conf (included
at the http level; survives vhost regeneration by the panel):

# Rate limit zone for OGProxy - 10 MB shared memory (roughly 80k-160k IPs tracked, depending on per-state size)
# 10 requests/second sustained per IP
limit_req_zone $binary_remote_addr zone=ogproxy_limit:10m rate=10r/s;

Application, inside the reverse-proxy location / of the OGProxy vhost:

location / {
    limit_req zone=ogproxy_limit burst=50 nodelay;
    limit_req_status 429;

    proxy_set_header Host $host;
    proxy_pass http://127.0.0.1:2000;
    proxy_redirect off;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Api-Key $http_x_api_key;
}

burst=50 absorbs the legitimate burst when a user opens a link-heavy topic
(the client fires many preview requests at once); sustained hammering beyond
that is rejected with 429.

Appendix C: systemd unit guard rails

Key directives on ogproxy.service:

[Service]
MemoryHigh=400M
MemoryMax=512M
Restart=always
RestartSec=3

# SSRF egress guard (OS-level backstop to the in-app checks)
IPAddressAllow=127.0.0.1 127.0.0.53 127.0.0.54
IPAddressDeny=10.0.0.0/8 172.16.0.0/12 192.168.0.0/16 169.254.0.0/16 100.64.0.0/10 fc00::/7 fe80::/10

127.0.0.1 must stay allowed because nginx reverse-proxies to OGProxy over
loopback; blocking all loopback breaks the nginx -> ogproxy hop (504s).
