
Export posts out of NodeBB into HTML and Markdown flat files -> Halo ITSM

  • At work, we are moving our Knowledge Base from NodeBB to Halo ITSM, which we need for SOC2 compliance, amongst other things. I had 165 articles in NodeBB and didn’t want to re-type them, or even copy and paste them, so I wrote a Python script that walks the target category and creates a file for each topic.

    Here’s the script to do that. There are a number of prerequisites, which I’ve identified below.

    import os
    import re
    import time
    from datetime import datetime, timezone

    import requests
    import html2text

    # --- CONFIGURATION ---
    # Your forum URL goes here
    BASE_URL = "https://yourforum.com"
    # The category ID you want to target goes here
    CATEGORY_ID = 3
    # In my case, I needed to define a new "home" for the exported files under `/public/uploads`, as this contained all the images I needed to embed in the new flat files. ASSET_DOMAIN is therefore just a basic website I can pull the images from afterwards.
    ASSET_DOMAIN = "https://assetlocation.com"
    # The directories below are created relative to the folder you run the script from (os.makedirs creates them if they don't already exist). They hold the HTML and Markdown copies of the posts.
    HTML_DIR = "nodebb_export_html"
    MD_DIR = "nodebb_export_markdown"
    os.makedirs(HTML_DIR, exist_ok=True)
    os.makedirs(MD_DIR, exist_ok=True)

    # html2text converter: keep links and don't hard-wrap lines
    h = html2text.HTML2Text()
    h.ignore_links = False
    h.body_width = 0


    def format_timestamp(ms):
        # NodeBB timestamps are milliseconds since the epoch; format them as UTC
        return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).strftime('%Y-%m-%d %H:%M:%S UTC')


    page = 1
    total_exported = 0
    print(f"🔄 Starting export for category {CATEGORY_ID} from {BASE_URL}")

    while True:
        print(f"📄 Fetching page {page}...")
        url = f"{BASE_URL}/api/category/{CATEGORY_ID}?page={page}"
        res = requests.get(url, timeout=10)
        if res.status_code != 200:
            print(f"❌ Failed to fetch page {page}: {res.status_code}")
            break

        data = res.json()
        topics = data.get("topics", [])
        if not topics:
            print("✅ No more topics found. Export complete.")
            break

        for topic in topics:
            tid = topic['tid']
            title = topic['title']
            print(f"→ Exporting topic {tid}: {title}")

            topic_url = f"{BASE_URL}/api/topic/{tid}"
            topic_res = requests.get(topic_url, timeout=10)
            if topic_res.status_code != 200:
                print(f"⚠️ Failed to fetch topic {tid}")
                continue

            topic_data = topic_res.json()
            posts = topic_data.get("posts", [])
            # Look for tags at the top level of the topic response first. Depending on the
            # NodeBB version, each tag is either a plain string or an object with a "value" key.
            raw_tags = topic_data.get("tags") or topic_data.get("topic", {}).get("tags", [])
            tags = [t.get("value", "") if isinstance(t, dict) else str(t) for t in raw_tags]
            tag_list = ", ".join(t for t in tags if t)

            safe_title = title.replace(' ', '_').replace('/', '-')
            html_file = f"{HTML_DIR}/{tid}-{safe_title}.html"
            md_file = f"{MD_DIR}/{tid}-{safe_title}.md"

            # --- HTML EXPORT ---
            with open(html_file, "w", encoding="utf-8") as f_html:
                f_html.write(f"<html><head><title>{title}</title></head><body>\n")
                f_html.write(f"<h1>{title}</h1>\n")
                if tag_list:
                    f_html.write(f"<p><strong>Tags:</strong> {tag_list}</p>\n")
                for post in posts:
                    username = post['user']['username']
                    content_html = post['content']
                    timestamp = format_timestamp(post['timestamp'])
                    pid = post['pid']
                    # Rewrite upload paths in src/href attributes so they point at ASSET_DOMAIN
                    content_html = re.sub(
                        r'src=["\'](/assets/uploads/files/.*?)["\']',
                        rf'src="{ASSET_DOMAIN}\1"',
                        content_html
                    )
                    content_html = re.sub(
                        r'href=["\'](/assets/uploads/files/.*?)["\']',
                        rf'href="{ASSET_DOMAIN}\1"',
                        content_html
                    )
                    f_html.write("<div class='post'>\n")
                    f_html.write(f"<h3><strong>Original Author: {username}</strong></h3>\n")
                    f_html.write(f"<p><em>Posted on: {timestamp} &nbsp;|&nbsp; Post ID: {pid}</em></p>\n")
                    f_html.write(f"{content_html}\n")
                    f_html.write("<hr/>\n</div>\n")
                f_html.write("</body></html>\n")

            # --- MARKDOWN EXPORT ---
            with open(md_file, "w", encoding="utf-8") as f_md:
                # Metadata block, read back later by the Halo import script
                f_md.write("<!-- FAQLists: Knowledge Base -->\n")
                if tag_list:
                    f_md.write(f"<!-- Tags: {tag_list} -->\n")
                f_md.write("\n")
                f_md.write(f"# {title}\n\n")
                for post in posts:
                    username = post['user']['username']
                    content_html = post['content']
                    timestamp = format_timestamp(post['timestamp'])
                    pid = post['pid']
                    # Convert HTML to Markdown
                    content_md = h.handle(content_html).strip()
                    # Rewrite upload paths in Markdown images, then in ordinary links
                    content_md = re.sub(
                        r'(!\[.*?\])\((/assets/uploads/files/.*?)\)',
                        rf'\1({ASSET_DOMAIN}\2)',
                        content_md
                    )
                    content_md = re.sub(
                        r'(\[.*?\])\((/assets/uploads/files/.*?)\)',
                        rf'\1({ASSET_DOMAIN}\2)',
                        content_md
                    )
                    f_md.write(f"**Original Author: {username}**\n\n")
                    f_md.write(f"_Posted on: {timestamp} | Post ID: {pid}_\n\n")
                    f_md.write(f"{content_md}\n\n---\n\n")

            total_exported += 1
            print(f"✔ Saved: {html_file} & {md_file}")

        page += 1
        time.sleep(1)  # pause between pages so we don't hammer the forum

    print(f"\n🎉 Done! Exported {total_exported} topics to '{HTML_DIR}' and '{MD_DIR}'")
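    One thing worth hardening: safe_title only replaces spaces and forward slashes, so on Windows a title containing characters such as :, ? or " would make open() fail for that topic. Below is a minimal sketch of a stricter replacement, assuming you want to keep the same underscore/hyphen style. make_safe_filename and the 100-character cap are my own illustrative choices, not anything NodeBB provides.

```python
import re

# Characters Windows does not allow in file names, plus ASCII control characters
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def make_safe_filename(title: str, max_length: int = 100) -> str:
    """Turn a topic title into a file-name-safe string (hypothetical helper)."""
    safe = title.replace(" ", "_")
    # Replace every invalid character with a hyphen, as the original does for "/"
    safe = _INVALID_CHARS.sub("-", safe)
    # Truncate first, then strip: Windows rejects names ending in a dot or space
    safe = safe[:max_length].rstrip(". ")
    return safe or "untitled"


print(make_safe_filename("How to: reset a password?"))  # How_to-_reset_a_password-
print(make_safe_filename("VPN / Remote access"))        # VPN_-_Remote_access
```

    To use it, you would swap the safe_title line in the loop for safe_title = make_safe_filename(title). Bear in mind that the Halo import script further down builds each article's Summary from the file name, so anything replaced here also changes the title that ends up in Halo.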

    Run the script using python scriptname.py.

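    If the script fails, it’s most likely because you don’t have the required modules installed. Of the imports below, os, re and time are part of Python’s standard library, while requests and html2text are third-party packages:

    import os
    import re
    import time
    import requests
    import html2text

    In that case, install the missing ones with pip, for example pip install requests html2text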
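    If you'd rather check up front than wait for an ImportError, a small sketch using importlib.util.find_spec can report which third-party modules are missing. The list is my assumption of what both scripts in this post need: requests and html2text for the export, pandas and markdown for the Excel step, and openpyxl as an engine pandas can use to write .xlsx files.

```python
import importlib.util

# Third-party packages used by the scripts in this post (pip name -> import name)
REQUIRED = {
    "requests": "requests",
    "html2text": "html2text",
    "pandas": "pandas",
    "markdown": "markdown",
    "openpyxl": "openpyxl",
}


def missing_packages(required=REQUIRED):
    """Return the pip package names whose modules cannot be found."""
    return [pkg for pkg, module in required.items()
            if importlib.util.find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All required modules are installed.")
```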

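    To get the exported posts into an Excel file that Halo can bulk import, we then use something like the script below. It needs pandas and markdown, plus an Excel writer engine such as openpyxl, which pandas uses to write .xlsx files (pip install pandas markdown openpyxl).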

    import os
    import re
    from datetime import datetime

    import pandas as pd
    import markdown

    # --- CONFIGURATION ---
    export_dir = "nodebb_export_markdown"
    output_file = "Halo_KB_Import_HTML.xlsx"
    # This value can be whatever suits your needs
    created_by = "Import"
    today = datetime.today().strftime('%Y-%m-%d')

    # --- BUILD DATAFRAME FOR HALO ---
    import_rows = []
    for filename in sorted(os.listdir(export_dir)):
        if not filename.endswith(".md"):
            continue
        filepath = os.path.join(export_dir, filename)
        with open(filepath, "r", encoding="utf-8") as f:
            lines = f.readlines()

        # Default values
        # Change "Knowledge Base" to reflect what you are using in Halo
        faqlists = "Knowledge Base"
        tags = ""

        # Parse metadata comments from the top of the file
        metadata_lines = []
        while lines and lines[0].startswith("<!--"):
            metadata_lines.append(lines.pop(0).strip())
        for line in metadata_lines:
            faq_match = re.match(r"<!-- FAQLists:\s*(.*?)\s*-->", line)
            tag_match = re.match(r"<!-- Tags:\s*(.*?)\s*-->", line)
            if faq_match:
                faqlists = faq_match.group(1)
            if tag_match:
                tags = tag_match.group(1)

        # Convert the remaining Markdown body to HTML for the Details field
        markdown_content = ''.join(lines)
        html_content = markdown.markdown(markdown_content)

        # Extract the summary from the filename ("{tid}-{safe_title}.md")
        summary = filename.split('-', 1)[1].rsplit('.md', 1)[0].replace('_', ' ')

        import_rows.append({
            "Summary": summary,
            "Details": html_content,
            "Resolution": "",
            "DateAdded": today,
            "CreatedBy": created_by,
            "FAQLists": faqlists,
            "Tags": tags
        })

    # --- EXPORT TO EXCEL ---
    df = pd.DataFrame(import_rows)
    df.to_excel(output_file, index=False)
    print(f"✅ Done! Halo HTML import file created: {output_file}")
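    The two scripts share a small convention: the export writes <!-- Key: value --> comments at the top of each Markdown file, and the import strips them off before converting the rest to HTML. The sketch below pulls that parsing step into a standalone function so you can test it against a single file. split_metadata is a hypothetical name, and unlike the script above it accepts any key, not just FAQLists and Tags. The sample text is made up.

```python
import re

# Matches a single "<!-- Key: value -->" metadata comment
META_RE = re.compile(r"<!--\s*(\w+):\s*(.*?)\s*-->")


def split_metadata(text, default_faqlists="Knowledge Base"):
    """Split leading metadata comment lines from the Markdown body."""
    lines = text.splitlines(keepends=True)
    meta = {"FAQLists": default_faqlists, "Tags": ""}
    # Consume comment lines only while they sit at the very top of the file
    while lines and lines[0].startswith("<!--"):
        match = META_RE.match(lines.pop(0).strip())
        if match:
            meta[match.group(1)] = match.group(2)
    return meta, "".join(lines)


sample = (
    "<!-- FAQLists: Knowledge Base -->\n"
    "<!-- Tags: vpn, remote-access -->\n"
    "\n"
    "# Connecting to the VPN\n"
)
meta, body = split_metadata(sample)
print(meta)        # {'FAQLists': 'Knowledge Base', 'Tags': 'vpn, remote-access'}
print(repr(body))  # '\n# Connecting to the VPN\n'
```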

    This generates a file called Halo_KB_Import_HTML.xlsx, which you can use to import each exported post into Halo.
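    Before importing, it may be worth checking article length. Excel limits a single cell to 32,767 characters, so a very long post converted to HTML may not survive being opened and saved in Excel. This sketch (oversized_rows is a hypothetical helper) flags rows in import_rows whose Details field exceeds that limit. I haven't checked how Halo's importer itself treats long fields, so treat it as a pre-flight check only.

```python
EXCEL_CELL_LIMIT = 32_767  # maximum number of characters Excel allows in one cell


def oversized_rows(rows, field="Details", limit=EXCEL_CELL_LIMIT):
    """Return the Summary of every row whose field is longer than the limit."""
    return [row["Summary"] for row in rows if len(str(row.get(field, ""))) > limit]


# Made-up rows shaped like the dictionaries appended to import_rows
sample_rows = [
    {"Summary": "Short article", "Details": "<p>ok</p>"},
    {"Summary": "Very long article", "Details": "x" * 40_000},
]
print(oversized_rows(sample_rows))  # ['Very long article']
```

    In the import script you would call oversized_rows(import_rows) just before building the DataFrame and split or trim anything it reports.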

    Cool eh? Huge time saver 🙂




9 Apr 2025, 16:15

