I did this for a client last week, but this could apply to any niche where you want to become an authority.
- Load your source list (sketch 1 below)
  - Reads a Google Sheet of community sources (FB groups, subreddits, Discord channels, etc.).
  - Skips any source already processed today via a “LastRan” timestamp check.
- Scrape raw posts (sketch 2 below)
  - Uses a scraper node (Apify, RSS pull, Reddit API, etc.) to fetch the newest posts.
- Spam & noise filter (pre-AI! Sketch 3 below.)
  - A custom JS “Filter Spam” node examines each post’s text and URL.
  - Drops anything with no question, promotional language, obvious lead-gen, or duplicates.
  - Only genuine, question-style posts continue, so you never burn an OpenAI token on ads or fluff.
- AI analysis & response (sketch 4 below)
  - Feeds each filtered question into a LangChain/ChatGPT agent with your strict JSON schema.
  - Gets back a validated object:
    - qualified: true/false
    - metadata (URL, username, datetime, location)
    - a concise, on-brand answer
    - topic tags for categorization
- Normalize & dedupe (sketch 5 below)
  - A single Code node unwraps the LLM output (handles both output: [] and direct arrays), de-dupes by URL + question, and tallies how many Q&A items you generated.
- Write results & update status (sketch 6 below)
  - Appends all new Q&A rows (with qualifiedCount) to your “Q/A Followups” sheet.
  - Updates each source row with LastRan and LastRunTotal so you won’t re-scrape the same group today.
- Notifications & error alerts (sketch 7 below)
  - On success: posts “Processed Successfully” to Slack/Discord/Teams.
  - On error: immediately notifies your team with the error message, so you can fix broken selectors or API issues before they pile up.
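Sketch 1: the “already ran today” check from the load step. A minimal sketch assuming an n8n-style Code node where each sheet row arrives as an item with a LastRan ISO timestamp; the column name follows the post, everything else is illustrative.

```javascript
// Keep only sources whose LastRan date is not today.
// Assumes items = [{ json: { Source: '...', LastRan: '2024-05-01T09:30:00Z' } }, ...]
const today = new Date().toISOString().slice(0, 10); // "YYYY-MM-DD"

return items.filter((item) => {
  const lastRan = String(item.json.LastRan || '').slice(0, 10);
  return lastRan !== today; // skip anything already processed today
});
```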
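Sketch 2: one of the scraper options, pulling the newest posts from a subreddit via Reddit’s public JSON endpoint. The subreddit name and field mapping are placeholders; an Apify actor or RSS pull would slot into the same place.

```javascript
// Fetch the newest posts from a subreddit (subreddit name is a placeholder).
const res = await fetch('https://www.reddit.com/r/yourniche/new.json?limit=25', {
  headers: { 'User-Agent': 'qa-followup-bot/0.1' },
});
const { data } = await res.json();

// Normalize into the shape the rest of the workflow expects.
const posts = data.children.map((c) => ({
  url: `https://www.reddit.com${c.data.permalink}`,
  username: c.data.author,
  datetime: new Date(c.data.created_utc * 1000).toISOString(),
  text: `${c.data.title}\n${c.data.selftext || ''}`,
}));

return posts.map((p) => ({ json: p }));
```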
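Sketch 3: the pre-AI spam filter. The promo keyword list and heuristics here are illustrative, not the actual rules from the client build; tune them to your niche.

```javascript
// Drop promos, link-drops, non-questions, and duplicates before any AI call.
const PROMO = /\b(buy now|discount|promo code|dm me|sign up|free trial)\b/i;
const seen = new Set();

function isGenuineQuestion(post) {
  const text = (post.text || '').trim();
  if (!text.includes('?')) return false;  // must actually ask something
  if (PROMO.test(text)) return false;     // promotional language
  if (/https?:\/\/\S+/.test(text) && text.length < 80) return false; // bare link-drop / lead-gen
  const key = `${post.url}|${text.slice(0, 100)}`;
  if (seen.has(key)) return false;        // duplicate
  seen.add(key);
  return true;
}

return items.filter((item) => isGenuineQuestion(item.json));
```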
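Sketch 4: one way to express the strict JSON schema the agent must return. Field names mirror the post; the exact schema, and whether you enforce it via structured output or a separate validation node, is up to you.

```javascript
// JSON Schema the agent's reply is validated against.
const responseSchema = {
  type: 'object',
  required: ['qualified', 'metadata', 'answer', 'tags'],
  properties: {
    qualified: { type: 'boolean' }, // is this worth answering?
    metadata: {
      type: 'object',
      required: ['url', 'username', 'datetime'],
      properties: {
        url: { type: 'string' },
        username: { type: 'string' },
        datetime: { type: 'string' },
        location: { type: 'string' },
      },
    },
    answer: { type: 'string' }, // concise, on-brand reply
    tags: { type: 'array', items: { type: 'string' } }, // topic tags
  },
};
```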
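Sketch 5: the normalize-and-dedupe Code node. The unwrap logic (output: [] vs. a bare array) and the URL + question key follow the post; the question field name is an assumption.

```javascript
// Unwrap LLM output that may arrive as { output: [...] } or as a bare object/array.
const raw = items.flatMap((item) => {
  const out = item.json.output ?? item.json;
  return Array.isArray(out) ? out : [out];
});

// De-dupe by URL + question (the question field name is hypothetical).
const seen = new Set();
const deduped = raw.filter((qa) => {
  const key = `${qa.metadata?.url}|${qa.question}`;
  if (seen.has(key)) return false;
  seen.add(key);
  return true;
});

// Tally how many qualified Q&A items this run produced.
const qualifiedCount = deduped.filter((qa) => qa.qualified).length;
return deduped.map((qa) => ({ json: { ...qa, qualifiedCount } }));
```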
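Sketch 6: the values written back to each source row after a run. The actual append and update would be Google Sheets nodes; this only shows the status payload (column names follow the post).

```javascript
// Status payload for the source row, so tomorrow's run knows what happened today.
const qualifiedCount = $json.qualifiedCount ?? 0; // carried over from the dedupe node
return [{
  json: {
    LastRan: new Date().toISOString(), // checked by the load step on the next run
    LastRunTotal: qualifiedCount,      // how many Q&A rows this source produced
  },
}];
```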
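Sketch 7: success/error notifications via an incoming webhook. The URL is a placeholder; Discord and Teams webhooks take the same JSON-POST shape, give or take field names.

```javascript
// Post a one-line status message to a chat webhook.
const webhookUrl = 'https://hooks.slack.com/services/XXX/YYY/ZZZ'; // placeholder

async function notify(text) {
  await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text }),
  });
}

// Success path:
await notify('Processed Successfully: 12 new Q&A rows'); // example count
// Error path (wired to the workflow's error branch):
// await notify(`Scrape failed for ${sourceName}: ${errorMessage}`);
```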
Adapting to Any Niche
- Sources → point the source sheet at the communities where your audience lives (FB groups, subreddits, Discord servers).
- Spam filter tuning → adjust the keyword and heuristic rules to your niche’s jargon and spam patterns.
- AI prompt & tags → rewrite the agent prompt, answer voice, and topic tags for your domain.
- Output destinations → send the Q&A rows to whatever sheet or database you work from.
- Notification channels & cadence → pick Slack/Discord/Teams and how often you want run summaries.
With the precheck in place (you can adjust its settings as well), every OpenAI call is spent on a genuine question, maximizing ROI on your token spend and keeping your community engagement both efficient and relevant.