AI News Aggregator & Publisher
This project is an end-to-end automated news aggregation and publishing system. A multi-threaded Python backend continuously monitors multiple Persian economic RSS feeds. For each new article, it uses Playwright to download the fully rendered page, extracts clean text with Trafilatura, and stores the content in a PostgreSQL database. A scheduled job periodically selects the latest articles, filters out duplicates using TF-IDF cosine similarity, and sends the remaining articles to the OpenAI API (GPT-4o) for professional rewriting. The polished content, along with its featured image, is then posted as a draft to a WordPress site through a custom PHP REST API plugin, so raw news arrives in WordPress as review-ready editorial content.
Year
2026
Role
Lead Developer
Technologies
Python, OpenAI API (GPT-4o), PostgreSQL, Playwright, Trafilatura, scikit-learn, WordPress REST API, PHP

Challenge
The primary challenge was to design a robust, fully automated pipeline covering the entire news lifecycle, from discovery across multiple sources to review-ready drafts in WordPress. This meant handling dynamic JavaScript-rendered websites, extracting clean article content from cluttered HTML, preventing duplicate or near-duplicate stories, and building a reliable bridge between a Python backend and a PHP-based CMS (WordPress).
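To make the near-duplicate problem concrete, here is a minimal sketch of TF-IDF cosine-similarity filtering. It uses only the Python standard library for illustration, whereas the project itself relies on scikit-learn. The whitespace tokenizer, the 0.8 threshold, and every function name are assumptions, not the project's actual code.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]  # assumed tokenizer
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Smoothed IDF keeps a positive weight even for very common terms.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_near_duplicates(articles, threshold=0.8):
    """Keep an article only if it is below the threshold against every kept one."""
    vectors = tfidf_vectors(articles)
    kept = []  # indices of articles accepted so far
    for i, vec in enumerate(vectors):
        if all(cosine(vec, vectors[j]) < threshold for j in kept):
            kept.append(i)
    return [articles[i] for i in kept]
```

Articles are compared in order, so the earliest version of a story survives and later near-copies are dropped. The threshold would need tuning against real feed data.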
Solution
An integrated system was developed using a multi-threaded Python application for parallel RSS feed monitoring. Playwright was used for reliable scraping of modern websites, while Trafilatura and scikit-learn handled text extraction and similarity detection. The core of the content enhancement is an OpenAI integration that rewrites articles based on a specific prompt. A custom WordPress plugin was created in PHP to provide a secure REST API endpoint for receiving the processed articles and images, which are then saved as drafts for final editorial review.
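The parallel feed monitoring described above could look roughly like the following sketch. It assumes RSS 2.0 feeds and takes the download step as an injected `fetch` callable so the example runs offline. `FeedMonitor`, `parse_rss_links`, and the in-memory set of seen links are hypothetical names and simplifications; the real system stores its state in PostgreSQL.

```python
import threading
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def parse_rss_links(xml_text):
    """Return the <link> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [item.findtext("link", default="").strip() for item in root.iter("item")]


class FeedMonitor:
    """Polls several feeds in parallel and reports links not seen before."""

    def __init__(self, fetch):
        self._fetch = fetch            # callable: feed URL -> RSS XML text
        self._seen = set()             # stand-in for a database lookup
        self._lock = threading.Lock()  # guards _seen across worker threads

    def _poll_one(self, feed_url):
        links = parse_rss_links(self._fetch(feed_url))
        fresh = []
        with self._lock:
            for link in links:
                if link and link not in self._seen:
                    self._seen.add(link)
                    fresh.append(link)
        return fresh

    def poll(self, feed_urls, max_workers=4):
        """Poll every feed once; return all newly discovered article links."""
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            results = pool.map(self._poll_one, feed_urls)
            return [link for fresh in results for link in fresh]
```

Calling `poll` on a timer gives continuous monitoring. Each new link would then be handed to the Playwright and Trafilatura stages.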
Results
Fully automated end-to-end news aggregation and publishing workflow.
Continuous, parallel monitoring of multiple RSS feeds.
AI-powered content rewriting for improved quality and consistency.
Effective duplicate detection using NLP techniques (TF-IDF).
Seamless integration with WordPress via a custom REST API plugin.
Significant reduction in manual effort for content curation and publishing.
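
As an illustration of the WordPress integration, the Python side of the bridge might build a request like the one sketched below. The endpoint path, the JSON field names, and the bearer-token authentication are all assumptions made for this example, since the plugin's actual contract is not described here. The function only constructs the request and does not send it.

```python
import json
import urllib.request

# Hypothetical endpoint: the real plugin route is not given in this write-up.
ENDPOINT = "https://example.com/wp-json/news-bridge/v1/articles"


def build_draft_request(title, body_html, image_url, token, endpoint=ENDPOINT):
    """Build (but do not send) the POST request handing one article to the plugin."""
    payload = {
        "title": title,
        "content": body_html,
        "featured_image_url": image_url,  # assumed field name
        "status": "draft",                # drafts wait for editorial review
    }
    return urllib.request.Request(
        endpoint,
        # ensure_ascii=False keeps Persian text readable in the JSON body.
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={
            "Content-Type": "application/json; charset=utf-8",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )
```

Sending it would be a matter of passing the result to `urllib.request.urlopen` with a timeout and checking the response status before marking the article as delivered.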