MCP Smart Crawler

by loo-y

Category: Web Scraping & Content Extraction
Tags: web scraping, playwright, crawler, metadata extraction, xiaohongshu

MCP Smart Crawler is a Model Context Protocol (MCP) server that uses Playwright to crawl web content, extract metadata, and download resources. It is designed to work with platforms like Xiaohongshu.

What is MCP Smart Crawler?

MCP Smart Crawler is an MCP server that uses Playwright for browser automation. It is built primarily to extract metadata from Xiaohongshu posts and to download the images and videos they contain.

How to use MCP Smart Crawler?

To use this server, add an entry for it to your MCP client's server settings (usually a JSON configuration file). The entry specifies the command that launches the server (e.g., npx mcp-smart-crawler) and any optional arguments, such as the download folder. Adjust the command and arguments to match how you run the server script.
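As an illustration, the snippet below builds such an entry and prints it as JSON. It assumes a client that reads servers from an "mcpServers" object, which is the layout used by Claude Desktop and several other MCP clients. It also assumes that the --download-folder flag takes its value as the next argument. The server key "smart-crawler" and the path "/path/to/downloads" are hypothetical placeholders. Check your client's documentation for the exact file and key names.

```python
import json


def build_client_config(download_folder: str) -> dict:
    """Return an MCP client config entry that launches the crawler via npx.

    Assumptions: the client reads an "mcpServers" object, and the
    --download-folder flag takes its value as the following argument.
    "smart-crawler" is a hypothetical key; any name works.
    """
    return {
        "mcpServers": {
            "smart-crawler": {
                "command": "npx",
                "args": ["mcp-smart-crawler", "--download-folder", download_folder],
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON you would paste into the client's settings file.
    print(json.dumps(build_client_config("/path/to/downloads"), indent=2))
```

If you run a local checkout instead of the npx package, replace "command" and the first element of "args" with your own launch command.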

Key features of MCP Smart Crawler

Extract metadata (title, description, images) from Xiaohongshu posts.
Download videos and images from Xiaohongshu share links.
Uses Playwright for browser automation.

Use cases of MCP Smart Crawler

Extracting product information from Xiaohongshu posts for market research.
Downloading images and videos from Xiaohongshu for content analysis.
Automating the collection of metadata from Xiaohongshu posts for social media monitoring.
Integrating Xiaohongshu content into other applications via MCP.

FAQ about MCP Smart Crawler

What is Playwright?

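Playwright is a browser automation library from Microsoft that drives Chromium, Firefox and WebKit through a single API. It started as a Node.js library and also has official Python, Java and .NET versions.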

What is MCP?

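MCP stands for Model Context Protocol. It is an open protocol introduced by Anthropic that standardizes how applications such as AI assistants connect to external tools and data sources. Clients and servers exchange JSON-RPC 2.0 messages, typically over standard input/output when the client launches the server locally.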
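With the stdio transport, which is what a client uses when it launches a server with a command such as npx mcp-smart-crawler, each JSON-RPC message is written as a single line of JSON terminated by a newline. The sketch below frames two requests that way. "tools/list" is a standard MCP method. The tool name "extract_metadata" and its "url" argument in the "tools/call" example are hypothetical placeholders, because this page does not list the server's actual tools. A real session must also complete the "initialize" handshake before sending these requests.

```python
import json
from typing import Any, Optional


def frame_request(request_id: int, method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request as one newline-terminated line (MCP stdio framing)."""
    message: dict[str, Any] = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    # json.dumps escapes newlines inside strings, so the output has no raw newline.
    return json.dumps(message, ensure_ascii=False) + "\n"


# Ask the server which tools it exposes (standard MCP method).
list_tools = frame_request(1, "tools/list")

# Call a tool. "extract_metadata" and its "url" argument are HYPOTHETICAL names.
call_tool = frame_request(
    2,
    "tools/call",
    {"name": "extract_metadata", "arguments": {"url": "https://example.com/post"}},
)

print(list_tools, end="")
print(call_tool, end="")
```

In practice you would first send tools/list and read the server's reply to learn the real tool names and argument schemas.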

Can I use this crawler for other websites?

The crawler is built for Xiaohongshu. However, Playwright can automate any website, so the crawler can be adapted to other sites by modifying its extraction logic.
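One way to adapt a crawler like this is to keep the browser automation unchanged and swap only the extraction step for each site. The Python sketch below organizes that step as a registry that picks an extractor based on the URL's domain. The registry, the register decorator and both extractor functions are hypothetical placeholders and are not part of MCP Smart Crawler.

```python
from typing import Callable, Dict
from urllib.parse import urlparse

# An extractor turns rendered page HTML into a metadata dict.
Extractor = Callable[[str], Dict[str, str]]

# HYPOTHETICAL registry: maps a site's domain to its extraction routine.
EXTRACTORS: Dict[str, Extractor] = {}


def register(domain: str) -> Callable[[Extractor], Extractor]:
    """Decorator that registers an extractor for a domain and its subdomains."""
    def decorator(func: Extractor) -> Extractor:
        EXTRACTORS[domain.lower()] = func
        return func
    return decorator


def pick_extractor(url: str) -> Extractor:
    """Return the extractor registered for the URL's host or its nearest parent domain."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Try "www.example.com", then "example.com", then "com".
    for i in range(len(parts)):
        candidate = ".".join(parts[i:])
        if candidate in EXTRACTORS:
            return EXTRACTORS[candidate]
    raise ValueError(f"no extractor registered for host {host!r}")


@register("xiaohongshu.com")
def extract_xiaohongshu(html: str) -> Dict[str, str]:
    # Placeholder: site-specific Xiaohongshu parsing would go here.
    return {"site": "xiaohongshu"}


@register("example.com")
def extract_example(html: str) -> Dict[str, str]:
    # Placeholder for a newly supported site.
    return {"site": "example"}


print(pick_extractor("https://www.xiaohongshu.com/some-post")("<html></html>"))
```

With this design, supporting a new site only requires registering another extractor. The rest of the pipeline stays unchanged.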

How do I configure the download folder?

You can specify the download folder using the --download-folder argument in the MCP server configuration.
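This page does not describe how the server handles this flag internally. The Python sketch below shows one hypothetical way a crawler could read --download-folder and map a resource URL to a file inside that folder. The default of ./downloads, the helper name local_path_for and the file-naming scheme are all assumptions. The actual server runs on Node.js and may behave differently.

```python
import argparse
import os
from pathlib import Path
from typing import List, Optional
from urllib.parse import urlparse


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Hypothetical crawler CLI")
    # ASSUMED default; the real server's default folder is not documented here.
    parser.add_argument("--download-folder", default="./downloads")
    return parser.parse_args(argv)


def local_path_for(download_folder: str, resource_url: str) -> Path:
    """Map a resource URL to a file path inside the download folder (hypothetical scheme)."""
    name = os.path.basename(urlparse(resource_url).path)
    # basename() drops directory parts; also reject empty, "." and ".." so the
    # resulting path always stays inside the download folder.
    if name in ("", ".", ".."):
        name = "resource"
    return Path(download_folder) / name


args = parse_args(["--download-folder", "/tmp/xhs"])
print(local_path_for(args.download_folder, "https://cdn.example.com/img/photo.jpg"))
```

A real downloader would also need to handle name collisions and files whose URLs lack an extension. These details are left out of this sketch.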

What kind of metadata is extracted?

The crawler extracts metadata like the title, description, and images from Xiaohongshu posts.
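This page does not say how the crawler locates these fields. A common approach, shown here only as a sketch, is to take the rendered HTML and read the page's title element plus Open Graph meta tags such as og:title, og:description and og:image. In Playwright, page.content() returns the rendered HTML. Whether Xiaohongshu pages expose these tags, and whether MCP Smart Crawler uses them, are assumptions. The names MetadataParser and extract_metadata are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class MetadataParser(HTMLParser):
    """Collects <title> text and Open Graph meta tags from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.title: Optional[str] = None
        self.og: Dict[str, str] = {}
        self.images: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            prop = attr.get("property") or ""
            content = attr.get("content")
            if prop.startswith("og:") and content is not None:
                if prop == "og:image":
                    # A page may declare several images; keep them all, in order.
                    self.images.append(content)
                else:
                    # Keep the first value seen for each other og: property.
                    self.og.setdefault(prop, content)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title and self.title is None:
            self.title = data.strip()


def extract_metadata(html: str) -> dict:
    """Return title, description and image URLs, preferring Open Graph values."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.og.get("og:title") or parser.title,
        "description": parser.og.get("og:description"),
        "images": parser.images,
    }


sample = (
    "<html><head><title>Fallback</title>"
    '<meta property="og:title" content="Post title">'
    '<meta property="og:description" content="Post text">'
    '<meta property="og:image" content="https://example.com/1.jpg">'
    '<meta property="og:image" content="https://example.com/2.jpg">'
    "</head></html>"
)
print(extract_metadata(sample))
```

The og: values are preferred because social platforms usually fill them with post-specific text. The plain title element is used only as a fallback.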