RSS Post Importer - Use Case - How to Scrape Articles From Another Site and Copy It Over to Your Site
Note: This guide requires:
RSS Post Importer
The RSS Post Importer plugin lets you import posts or pages from an external site to your WordPress site. It can be useful for linking together sites that you own and that should share content with each other, or for scraping similar sites for content related to your site's topic.
You set the plugin up once, and it keeps working after it has been configured, letting you find high-quality content providers and funnel their content through your domain.
The RSS Post Importer is a must-have for affiliate marketing, for event listing sites, for new sites and for keeping content across multiple sites up to date without all the donkey work.
Content Importer Add-on
The Content Importer Add-on allows you to fetch full text of items coming from an RSS feed.
For example, even if an RSS feed was not set up properly and shows only the excerpt of a blog post, with this add-on you can still capture the full content.
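To illustrate what an excerpt-only feed looks like, here is a minimal Python sketch that reads one item from an RSS 2.0 feed. It is not the add-on's code: the feed content, the example.com URL, and the function name list_items are all made up for illustration. The point is that the item carries only a short description plus a link, so the full text has to be fetched from the linked page.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content, used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>https://example.com/first-post/</link>
      <description>Only a short excerpt appears here...</description>
    </item>
  </channel>
</rss>"""


def list_items(feed_xml):
    """Return (title, link, excerpt) for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        items.append((
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("description", default=""),
        ))
    return items


for title, link, excerpt in list_items(SAMPLE_FEED):
    # The feed carries only the excerpt; the full text lives at `link`.
    print(title, "->", link, "| excerpt:", excerpt)
```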
Use Case Front-End
Use Case Assumptions
In this example we will consider how to scrape articles from another site and copy them to your site using the RSS Post Importer plugin and the Content Importer add-on.
The add-on requires the base plugin RSS Post Importer to work. Both are tightly connected, so if you are not familiar with the base plugin, we recommend checking the following links:
- RSS Post Importer - Getting Started (With Video)
- RSS Post Importer - CreativeMinds Products Documentation
We also assume that you already know how to create an RSS feed with the base plugin, so here we only cover how the process changes with the Content Importer add-on. If you don't know how to create an RSS feed, you can learn more in our use case:
Installing the plugin
The process is the same for all CM plugins and add-ons.
- Download the plugin from your customer dashboard
- Log in to WordPress and navigate to the WordPress Admin → Plugins settings
- Click on Add New
- Activate it and add the license
Learn more: Getting Started - Plugin Overview
Creating the RSS feed using add-on
When creating or editing a feed, you will notice a few new fields that appear only if the add-on is active.
- Pause full content extraction
- Auto search and detect content
- Content Selector
- Selectors to exclude
Let's look at these options in more detail, starting with Content Selector.
Content Selector - This field is crucial. Enter here the CSS selector of the element that contains the content. This selector will likely be different for each RSS feed. To find it, follow these steps:
Step 1: Access RSS item
Access an RSS item from the feed you would like to use.
Let's walk through a concrete example: https://www.complex.com/style/rihanna-lvmh-closing-fenty-fashion-house/
Step 2: Inspect the Element
You must find the element which contains the content.
To do so, right-click the content and choose Inspect. It is best to right-click at the beginning of the content. This works in both Chrome and Firefox.
You can confirm you have found the element when the relevant area is highlighted.
Step 3: Isolate class or id
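In this case, the content is inside <div class="article__copy clearfix">.
The most relevant attributes for building a CSS selector are id and class.
Here we can use either article__copy or clearfix, as both are classes of this element. On this page they produce the same effect, although article__copy is the safer choice because a generic utility class such as clearfix may also appear on other elements.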
Step 4: Adding CSS element to the feed page
Now that we have found the element, it's time to add it to the feed page. Head back to the Content Selector field and enter .article__copy.
TIP: Class vs. ID
Each selector needs a symbol in front of the class or id name.
- class - . (dot/period). Example: .article__copy (if class="article__copy")
- id - # (number sign/hash). Example: #article__copy (if id="article__copy")
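The prefix rule above can be sketched in a few lines of Python. This is only an illustration, not part of the plugin: the helper name candidate_selectors is hypothetical, and it simply reads the id and class attributes of one opening tag (copied from the browser inspector) and prints the selectors you could try in the Content Selector field.

```python
from html.parser import HTMLParser


class _TagAttrs(HTMLParser):
    """Records the attributes of the first opening tag it sees."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if self.attrs is None:
            self.attrs = dict(attrs)


def candidate_selectors(opening_tag):
    """Turn the id and class attributes of one tag into CSS selectors."""
    parser = _TagAttrs()
    parser.feed(opening_tag)
    attrs = parser.attrs or {}
    selectors = []
    if attrs.get("id"):
        selectors.append("#" + attrs["id"])      # id -> '#' prefix
    for name in (attrs.get("class") or "").split():
        selectors.append("." + name)             # class -> '.' prefix
    return selectors


# The tag from Step 3 of this guide.
print(candidate_selectors('<div class="article__copy clearfix">'))
```

Each space-separated class name becomes its own selector, which is why the tag from Step 3 yields both a .article__copy and a .clearfix candidate.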
Selector Examples Table
|Document has|You should include in the add-on|
More information: CSS selectors - CSS: Cascading Style Sheets
What if Content Selector doesn't work properly? Let's consider all options:
- Auto search and detect content - Use this if Content Selector is still not working after you have followed the process above. Enabling this option turns off the Content Selector option below.
- Content Selector - Enter the selector of the element whose content should be extracted. In the example above, this is .article__copy
- Selector to exclude - Enter one or more selectors for elements that should be left out. This is useful for removing injected advertisements. Note: you can enter several selectors, separated by commas.
The first two options are mutually exclusive. When Auto search and detect content is unchecked, the plugin looks for the Content Selector value in the target page content. When Auto search and detect content is enabled, Content Selector is ignored during content extraction.
The third option, Selector to exclude, is helpful when the resulting post contains injected advertisements that the plugin engine cannot filter out on its own.
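To make the interplay between Content Selector and Selector to exclude more concrete, here is a rough Python sketch of the idea. It is not the plugin's engine, which is not described in this guide. It is a simplified stand-in that assumes only simple .class and #id selectors; the sample page, the .ad-slot and #promo selectors, and the function name extract_text are all hypothetical.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def _matches(selector, attrs):
    """Check a simple '.class' or '#id' selector against a tag's attributes."""
    if selector.startswith("."):
        return selector[1:] in (attrs.get("class") or "").split()
    if selector.startswith("#"):
        return attrs.get("id") == selector[1:]
    return False


class _Extractor(HTMLParser):
    def __init__(self, include, excludes):
        super().__init__()
        self.include = include
        self.excludes = excludes
        self.depth = 0        # nesting depth inside the included element
        self.skip_depth = 0   # nesting depth inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        if self.skip_depth:
            self.skip_depth += 1
        elif self.depth:
            if any(_matches(s, attrs) for s in self.excludes):
                self.skip_depth = 1   # start skipping this subtree
            else:
                self.depth += 1
        elif _matches(self.include, attrs):
            self.depth = 1            # entered the content element

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
        elif self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, content_selector, selectors_to_exclude=""):
    """Collect text inside content_selector, skipping excluded elements."""
    # Several exclude selectors are given as one comma-separated string.
    excludes = [s.strip() for s in selectors_to_exclude.split(",") if s.strip()]
    parser = _Extractor(content_selector.strip(), excludes)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page used only for illustration.
PAGE = """
<html><body>
  <div class="sidebar">Related links</div>
  <div class="article__copy clearfix">
    <p>First paragraph.</p>
    <div class="ad-slot">Buy now!</div>
    <p>Second paragraph.</p>
  </div>
</body></html>
"""

print(extract_text(PAGE, ".article__copy", ".ad-slot, #promo"))
```

In this sketch the sidebar text is never collected because it sits outside the selected element, and the advertisement text is dropped because its element matches one of the comma-separated exclude selectors.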
And, finally, we have one more option:
- Pause Full content extraction - Check this option to pause full content extraction for this feed alone. Note that feed items will still be imported, but without the full content extraction rule. Moreover, if Pause Full content extraction is unchecked, either Auto search and detect content must be enabled or Content Selector must be filled in.
The setup is now ready, and the add-on will collect the element you have selected. Let's see the result on the Front-End:
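The rules for how these options combine can be summarized in a short Python sketch. The class and field names below are hypothetical and only mirror the option labels in this guide. They are not the plugin's actual settings keys, and raising an error for an incomplete configuration is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass
class FeedExtractionSettings:
    # Hypothetical field names mirroring the option labels in this guide.
    pause_full_content_extraction: bool = False
    auto_search_and_detect: bool = False
    content_selector: str = ""
    selectors_to_exclude: str = ""


def extraction_mode(settings):
    """Return which extraction rule applies to a feed."""
    if settings.pause_full_content_extraction:
        # Items are still imported, just without full content extraction.
        return "paused"
    if settings.auto_search_and_detect:
        # Auto detection takes over; content_selector is ignored.
        return "auto"
    if settings.content_selector.strip():
        return "selector"
    # Assumption: an unpaused feed with neither option set is invalid.
    raise ValueError(
        "Enable 'Auto search and detect content' or fill in 'Content Selector'."
    )


print(extraction_mode(FeedExtractionSettings(content_selector=".article__copy")))
```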
Use Case Front-End