
Discussion

41 Topics 391 Posts

Topics posted here are designed to attract discussion, thoughts, concerns, wisdom, etc.

Subcategories


  • Do cool things on your website…

    4 Topics
    71 Posts
At work, we are transitioning from NodeBB for our Knowledge Base to Halo ITSM, which we require for SOC2 compliance amongst other things. Because I had 165 articles in NodeBB that I didn’t want to re-type, or even copy and paste, I decided to write a Python script to walk the target category and create a file for each topic. Here’s the script that does exactly that. There are a number of prerequisites, which I’ve identified in the comments below.

```python
import os
import re
import time
from datetime import datetime

import requests
import html2text

# --- CONFIGURATION ---
# Your forum URL goes here
BASE_URL = "https://yourforum.com"
# The category ID you want to target goes here
CATEGORY_ID = 3
# In my case, I needed to define a new "home" for the exported files under
# `/public/uploads`, as this contained all the images I needed to embed into
# the new flat files. ASSET_DOMAIN is therefore nothing more than a basic
# website I can grab the images from afterwards.
ASSET_DOMAIN = "https://assetlocation.com"
# The below directories are created at the same level as the script (they are
# created automatically if they do not exist) and will contain both HTML and
# Markdown copies of the posts.
HTML_DIR = "nodebb_export_html"
MD_DIR = "nodebb_export_markdown"
os.makedirs(HTML_DIR, exist_ok=True)
os.makedirs(MD_DIR, exist_ok=True)

h = html2text.HTML2Text()
h.ignore_links = False
h.body_width = 0

page = 1
total_exported = 0

print(f"🔄 Starting export for category {CATEGORY_ID} from {BASE_URL}")

while True:
    print(f"📄 Fetching page {page}...")
    url = f"{BASE_URL}/api/category/{CATEGORY_ID}?page={page}"
    res = requests.get(url, timeout=10)
    if res.status_code != 200:
        print(f"❌ Failed to fetch page {page}: {res.status_code}")
        break

    data = res.json()
    topics = data.get("topics", [])
    if not topics:
        print("✅ No more topics found. Export complete.")
        break

    for topic in topics:
        tid = topic['tid']
        title = topic['title']
        print(f"→ Exporting topic {tid}: {title}")

        topic_url = f"{BASE_URL}/api/topic/{tid}"
        topic_res = requests.get(topic_url, timeout=10)
        if topic_res.status_code != 200:
            print(f"⚠️ Failed to fetch topic {tid}")
            continue

        topic_data = topic_res.json()
        posts = topic_data.get("posts", [])
        tags = topic_data.get("topic", {}).get("tags", [])
        tag_list = ", ".join(tags) if tags else ""

        safe_title = title.replace(' ', '_').replace('/', '-')
        html_file = f"{HTML_DIR}/{tid}-{safe_title}.html"
        md_file = f"{MD_DIR}/{tid}-{safe_title}.md"

        # --- HTML EXPORT ---
        with open(html_file, "w", encoding="utf-8") as f_html:
            f_html.write(f"<html><head><title>{title}</title></head><body>\n")
            f_html.write(f"<h1>{title}</h1>\n")
            if tag_list:
                f_html.write(f"<p><strong>Tags:</strong> {tag_list}</p>\n")

            for post in posts:
                username = post['user']['username']
                content_html = post['content']
                timestamp = datetime.utcfromtimestamp(post['timestamp'] / 1000).strftime('%Y-%m-%d %H:%M:%S UTC')
                pid = post['pid']

                # Rewrite asset paths in HTML
                content_html = re.sub(
                    r'src=["\'](/assets/uploads/files/.*?)["\']',
                    rf'src="{ASSET_DOMAIN}\1"',
                    content_html
                )
                content_html = re.sub(
                    r'href=["\'](/assets/uploads/files/.*?)["\']',
                    rf'href="{ASSET_DOMAIN}\1"',
                    content_html
                )

                f_html.write("<div class='post'>\n")
                f_html.write(f"<h3><strong>Original Author: {username}</strong></h3>\n")
                f_html.write(f"<p><em>Posted on: {timestamp} &nbsp;|&nbsp; Post ID: {pid}</em></p>\n")
                f_html.write(f"{content_html}\n")
                f_html.write("<hr/>\n</div>\n")

            f_html.write("</body></html>\n")

        # --- MARKDOWN EXPORT ---
        with open(md_file, "w", encoding="utf-8") as f_md:
            # Metadata block
            f_md.write("<!-- FAQLists: Knowledge Base -->\n")
            if tag_list:
                f_md.write(f"<!-- Tags: {tag_list} -->\n")
            f_md.write("\n")
            f_md.write(f"# {title}\n\n")

            for post in posts:
                username = post['user']['username']
                content_html = post['content']
                timestamp = datetime.utcfromtimestamp(post['timestamp'] / 1000).strftime('%Y-%m-%d %H:%M:%S UTC')
                pid = post['pid']

                # Convert HTML to Markdown
                content_md = h.handle(content_html).strip()

                # Rewrite asset paths (images first, then plain links)
                content_md = re.sub(
                    r'(!\[.*?\])\((/assets/uploads/files/.*?)\)',
                    rf'\1({ASSET_DOMAIN}\2)',
                    content_md
                )
                content_md = re.sub(
                    r'(\[.*?\])\((/assets/uploads/files/.*?)\)',
                    rf'\1({ASSET_DOMAIN}\2)',
                    content_md
                )

                f_md.write(f"**Original Author: {username}**\n\n")
                f_md.write(f"_Posted on: {timestamp} | Post ID: {pid}_\n\n")
                f_md.write(f"{content_md}\n\n---\n\n")

        total_exported += 1
        print(f"✔ Saved: {html_file} & {md_file}")

    page += 1
    time.sleep(1)

print(f"\n🎉 Done! Exported {total_exported} topics to '{HTML_DIR}' and '{MD_DIR}'")
```

Run the script using `python scriptname.py`. If the script fails, it’s most likely because you do not have the required modules installed. Of the imports above (`os`, `re`, `time`, `requests`, `html2text`), only `requests` and `html2text` are third-party; the rest ship with Python. You’d install the missing ones using (for example):

```
pip install html2text
```

To get the exported posts into an Excel file where they can all be bulk imported, we’d then use something like the below script:

```python
import os
import re
from datetime import datetime

import markdown
import pandas as pd

# --- CONFIGURATION ---
export_dir = "nodebb_export_markdown"
output_file = "Halo_KB_Import_HTML.xlsx"

# This value can be whatever suits your needs
created_by = "Import"
today = datetime.today().strftime('%Y-%m-%d')

# --- BUILD DATAFRAME FOR HALO ---
import_rows = []

for filename in sorted(os.listdir(export_dir)):
    if filename.endswith(".md"):
        filepath = os.path.join(export_dir, filename)
        with open(filepath, "r", encoding="utf-8") as f:
            lines = f.readlines()

        # Default values.
        # Change "Knowledge Base" to reflect what you are using in Halo.
        faqlists = "Knowledge Base"
        tags = ""

        # Parse metadata comments from the top of the file
        metadata_lines = []
        while lines and lines[0].startswith("<!--"):
            metadata_lines.append(lines.pop(0).strip())

        for line in metadata_lines:
            faq_match = re.match(r"<!-- FAQLists:\s*(.*?)\s*-->", line)
            tag_match = re.match(r"<!-- Tags:\s*(.*?)\s*-->", line)
            if faq_match:
                faqlists = faq_match.group(1)
            if tag_match:
                tags = tag_match.group(1)

        markdown_content = ''.join(lines)
        html_content = markdown.markdown(markdown_content)

        # Extract summary from the filename ({tid}-{safe_title}.md)
        summary = filename.split('-', 1)[1].rsplit('.md', 1)[0].replace('_', ' ')

        import_rows.append({
            "Summary": summary,
            "Details": html_content,
            "Resolution": "",
            "DateAdded": today,
            "CreatedBy": created_by,
            "FAQLists": faqlists,
            "Tags": tags
        })

# --- EXPORT TO EXCEL ---
df = pd.DataFrame(import_rows)
df.to_excel(output_file, index=False)

print(f"✅ Done! Halo HTML import file created: {output_file}")
```

This generates a file called `Halo_KB_Import_HTML.xlsx`, which you can then use to import each exported post into Halo. Cool, eh? Huge time saver.
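If you want to sanity-check the asset-path rewriting before running a full export, the Markdown rules can be exercised in isolation. This is a minimal standalone sketch using the same regexes as the export script; `rewrite_assets` is a helper name introduced here for illustration, not something the script itself defines:

```python
import re

ASSET_DOMAIN = "https://assetlocation.com"  # placeholder, as in the script

def rewrite_assets(md: str) -> str:
    """Prefix relative NodeBB upload paths with the asset domain."""
    # Images first, then plain links; absolute URLs are left untouched
    md = re.sub(r'(!\[.*?\])\((/assets/uploads/files/.*?)\)',
                rf'\1({ASSET_DOMAIN}\2)', md)
    md = re.sub(r'(\[.*?\])\((/assets/uploads/files/.*?)\)',
                rf'\1({ASSET_DOMAIN}\2)', md)
    return md

sample = "![diagram](/assets/uploads/files/123-diagram.png)"
print(rewrite_assets(sample))
# → ![diagram](https://assetlocation.com/assets/uploads/files/123-diagram.png)
```

Note that the image pattern runs first, so by the time the plain-link pattern runs, any rewritten image path already starts with the asset domain and no longer matches the leading `/assets/...` group, which avoids double-prefixing.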
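The two scripts hand metadata to each other through the HTML comments at the top of each Markdown file: the export script writes them, and the Excel builder strips and parses them before converting the remainder to HTML. Here is a quick standalone sketch of that parsing step, using hypothetical sample lines in the same shape the export script produces:

```python
import re

# Hypothetical sample mirroring the metadata block the export script writes
lines = [
    "<!-- FAQLists: Knowledge Base -->\n",
    "<!-- Tags: migration, nodebb -->\n",
    "\n",
    "# Example Article\n",
]

faqlists = "Knowledge Base"  # default, as in the import script
tags = ""

# Pop leading comment lines off the file body
metadata_lines = []
while lines and lines[0].startswith("<!--"):
    metadata_lines.append(lines.pop(0).strip())

for line in metadata_lines:
    faq_match = re.match(r"<!-- FAQLists:\s*(.*?)\s*-->", line)
    tag_match = re.match(r"<!-- Tags:\s*(.*?)\s*-->", line)
    if faq_match:
        faqlists = faq_match.group(1)
    if tag_match:
        tags = tag_match.group(1)

print(tags)        # migration, nodebb
print(len(lines))  # 2 (only the article body remains)
```

Because the comments are popped off `lines`, they never reach the `markdown.markdown()` conversion and so never leak into the HTML that lands in the `Details` column.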
  • PHP is dead? No, it isn't!!

    Pinned php
    3 Votes
    3 Posts
    959 Views
@Madchatthew I can’t see it happening either, to be honest. It’s one of the most mature programming languages there is, and given the figures, it’s clear to see it’s not going anywhere anytime soon.
  • 1 Votes
    2 Posts
    10 Views
@phenomlab this is good. I see this happen a lot. It is unclear because usually it is a group talking and going over things, trying to plan according to the knowledge they have of what is and isn’t working now. And yet some time down the road, something new comes out, something changes, and now what was possible isn’t, and vice versa. I have seen these kinds of situations, and I encourage taking very good notes and keeping them accessible to you/your team, so that a search can be done, the notes appear, and you know exactly why, what, how and where the decision was made. When I say you, I mean anyone in a position to make decisions that bring about change in whatever area it might be.
  • Apple and Meta fined by EU over competition law

    meta apple
    5 Votes
    5 Posts
    950 Views
And BOOM! Personally, I don’t hate American companies. I use their products like everyone else, but I think their economic weight is such that they impose their own rules instead of respecting those of the countries where they do business. And here, for once, the DMA is putting the church back in the middle of the village (a French expression meaning to restore things to their proper order).
  • The plans to put data centres in orbit and on the Moon

    space data servers
    0 Votes
    1 Posts
    402 Views
    No one has replied
  • 50 years of Microsoft

    microsoft 50years
    4 Votes
    7 Posts
    972 Views
@phenomlab said in 50 years of Microsoft: NetBEUI “Netbeeuuuui” in French. Happy birtdead!
  • Ex GCHQ employee risk to national security

    gchq security
    1 Votes
    4 Posts
    747 Views
@phenomlab said in Ex GCHQ employee risk to national security: I can’t believe also that security is so lax that someone without adequate clearance can waltz into a restricted area and take what they want. Yeah, I can’t believe that either. It’s crazy.
  • Which email client do you use?

    email client
    22 Votes
    56 Posts
    13k Views
@phenomlab said in Which email client do you use?: @DownPW Isn’t this more of an email server in its own right rather than an email client? Oh yes, sorry… I didn’t pay attention
  • Microsoft in talks to buy TikTok

    microsoft tiktok
    3 Votes
    2 Posts
    562 Views
@phenomlab well I hope that a better company steps up and puts in a higher bid. If I had the money, I would buy TikTok. That platform is a money maker’s dream. So many people on it now, or at least there were. I think MS will just mess it up like they do everything else. Hell, they can’t even get their own software to work correctly, so how would they even keep that one up and running?
  • Ross Ulbricht pardoned by Trump

    ulbricht silkroad trump
    0 Votes
    3 Posts
    585 Views
    @Panda said in Ross Ulbricht pardoned by Trump: So @phenomlab are you arguing that a ‘Double life +40’ sentence is what you would have been in support of, i.e. no release at any stage?? Yes, exactly. Let’s not forget the reason for the sentence in the first place, plus the fact that he created Silk Road with the intent for it to be used for nefarious purposes, and stood to make a lot of money from it. He fully intended to take advantage of the profit being returned at the expense of those people who died, and couldn’t care less about the demise of others as long as he was able to make money.
  • Australia passes social media ban for under 16s

    social
    11 Votes
    11 Posts
    1k Views
    @phenomlab I agree with you, otherwise they would have already done that.
  • Note Taking App

    notetaking opensource
    4 Votes
    13 Posts
    2k Views
    @Madchatthew A simple PWA would probably suffice in the meantime
  • Best Search Engine?

    google duckduckgo startpage
    1 Votes
    13 Posts
    2k Views
    @phenomlab and that is the exact moment you double click on the browser icon and start typing away what you want to search for haha
  • What's your view on RSS - is it dead technology?

    rss feed syndicate
    15 Votes
    14 Posts
    2k Views
@phenomlab said in What's your view on RSS - is it dead technology?: @JAC would be keen to get your views around RSS feeds I’ve used RSS feeds over the years to pull in articles on forums and websites. I also have an RSS feed app on my phone that contains selective football news; it’s still an incredibly handy tool for me.
  • Reddit users say share plans 'beginning of the end'

    reddit ipo
    8 Votes
    9 Posts
    1k Views
    This is interesting - $116m bet on share positions? https://news.sky.com/story/gamestop-stock-resurgent-as-influencer-roaring-kitty-places-116m-bet-on-retailer-13147356 Anyone else think this sounds very much like insider trading?
  • Google looks to AI paywall option

    google intelligence
    1 Votes
    1 Posts
    608 Views
    No one has replied
  • 4 Votes
    7 Posts
    2k Views
@phenomlab oh no, that is 1 cent in the video, but you are right, the symbols are similar… I just converted it to $1, since it is more intuitive in daily life…
  • The corporate greed of Amazon

    amazon greed
    13 Votes
    11 Posts
    2k Views
    And so it starts. Amazon are going to introduce forced ads even for Prime customers on their platform. To remove them, you have to pay more?? https://news.sky.com/story/amazons-prime-video-to-include-ads-from-2024-unless-you-pay-more-12967202
  • I have a dream, a vanilla one

    forum
    1 Votes
    8 Posts
    1k Views
    @Panda I think we’re already seeing that direction being followed - although probably not in the sense of self-hosted.
  • How long before AI takes over your job?

    intelligence learning future
    16 Votes
    21 Posts
    4k Views
    @crazycells said in How long before AI takes over your job?: sponsored content To me, this is the method to get yourself to the top of the list. Unfair advantage doesn’t even properly describe it.
  • Twitter "rebrands" to X

    twitter x-corp
    17 Votes
    18 Posts
    2k Views
    @crazycells always makes me laugh