How to: Create a simple news aggregator with Yahoo! Pipes

I’ve noticed that many news sites seem to be in the aggregation business, and with Yahoo! Pipes, it’s not too difficult to create an aggregator of your own.

In this post, I’ll explain how to use Yahoo! Pipes to create a simple aggregator that cites the source of each headline in brackets.

In the Pipes workspace, you’ll notice the left column has lists of modules divided by category. At the bottom of the workspace is the Debugger, which lets you see if your modules are working properly.

To place a module on the page, click and drag it onto the grid.

The first module to drag onto the page is “Fetch Feed.” In its input box, paste the URL of the RSS or Atom feed you want to use. In this case, I used CNN.

To add more sources to your aggregator, drag additional Fetch Feed modules onto the grid and paste in the URLs of your other feeds. I used the New York Times, the Los Angeles Times and the Washington Post.
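Pipes does the downloading and parsing for you. To show roughly what a Fetch Feed module amounts to, here is a minimal Python sketch using only the standard library. It assumes RSS 2.0 feeds (Atom feeds use namespaced `entry` elements and would need extra handling), and the names `fetch_feed` and `parse_rss` are illustrative, not part of Pipes. Each item becomes a dict whose keys mirror the `item.title`, `item.link` and `item.pubDate` fields you see in Pipes.

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_rss(xml_text):
    """Turn an RSS 2.0 document into a list of item dicts.

    Only title, link and pubDate are kept; Atom feeds are not handled.
    """
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "pubDate": item.findtext("pubDate", default="").strip(),
        })
    return items


def fetch_feed(url):
    """Download one feed and parse it (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return parse_rss(response.read())


# One call per source, like dragging one Fetch Feed module per URL:
# feeds = [fetch_feed(url) for url in feed_urls]  # feed_urls is your own list
```

Keeping parsing separate from downloading makes it easy to try the parser on a saved copy of a feed.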

Under “Operators” on the left, find the “Union” module and drag it onto the workspace. This module combines up to five feeds into one. (If you have more than five sources, you can chain Union modules, feeding the output of one into an input of another.)

To connect a fetch feed module to the union module, click the dot at the bottom of the feed module and drag it to one of the dots at the top of the union module. Then drag the dot from the bottom of the union module to the Pipe Output module.
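Conceptually, Union just joins its inputs into one list. The Python sketch below assumes that Union appends each input's items in order, which matches the grouped-by-feed output you'll see in the Debugger. Items are plain dicts, and the `union` name is illustrative.

```python
from itertools import chain


def union(*feeds):
    """Combine several item lists into one, in the order the feeds are given.

    Unlike the Union module, this accepts any number of inputs.
    """
    return list(chain.from_iterable(feeds))


cnn = [{"title": "CNN story 1"}, {"title": "CNN story 2"}]
nyt = [{"title": "NYT story 1"}]
merged = union(cnn, nyt)
print([item["title"] for item in merged])
# ['CNN story 1', 'CNN story 2', 'NYT story 1']
```

Note that each source's items stay together, so the merged list still needs sorting.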

Though it’s not visible in this screencap, if you check the Debugger, you’ll see that the headlines are grouped by feed: all of one source’s items, then all of the next source’s. Since you probably want the headlines in chronological order, add a “Sort” module.

To sort by date, select “item.pubDate” and “descending.” Note: not all feeds have valid pubDates, and items without one may not land where you expect. Also, if you check the Debugger, you’ll see that the pipe is outputting 77 headlines. In the next step, I’ll show you how to reduce the output so the feed loads faster.
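Sorting on `item.pubDate` means reading RSS-style dates such as “Mon, 01 Jun 2009 10:00:00 GMT.” Here is a minimal Python sketch of a newest-first sort using the standard library's `email.utils.parsedate_to_datetime`. How to treat items with missing or invalid dates is an assumption of this sketch: it puts them last, which is not necessarily what Pipes did.

```python
from email.utils import parsedate_to_datetime


def pub_timestamp(item):
    """Return item['pubDate'] as a POSIX timestamp, or None if missing or invalid."""
    try:
        return parsedate_to_datetime(item.get("pubDate", "")).timestamp()
    except (TypeError, ValueError, IndexError):
        # Older Pythons raise TypeError, newer ones ValueError, on bad dates.
        return None


def sort_by_date(items):
    """Newest first, like Sort on item.pubDate descending.

    Items without a usable pubDate go at the end (a choice made for
    this sketch; Pipes may have placed them differently).
    """
    dated = [item for item in items if pub_timestamp(item) is not None]
    undated = [item for item in items if pub_timestamp(item) is None]
    dated.sort(key=pub_timestamp, reverse=True)
    return dated + undated


items = [
    {"title": "older", "pubDate": "Mon, 01 Jun 2009 08:00:00 GMT"},
    {"title": "no date", "pubDate": ""},
    {"title": "newer", "pubDate": "Mon, 01 Jun 2009 12:00:00 GMT"},
]
print([item["title"] for item in sort_by_date(items)])
# ['newer', 'older', 'no date']
```

Converting to timestamps avoids comparing dates that carry a time zone with dates that don't.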

Because stories that newspapers pick up from wire services occasionally have the same headline, I added a “Unique” module, filtering on “item.title,” between the Union and Sort modules; there’s no use sorting duplicates. I also placed a “Truncate” module right before the Pipe Output to limit the feed to 25 headlines.
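Unique and Truncate are simple list operations. In this Python sketch (names illustrative), `unique` keeps the first item it sees for each value of the chosen field and drops later repeats; whether Pipes also kept the first occurrence is an assumption. `truncate` keeps the first `count` items.

```python
def unique(items, field="title"):
    """Drop items whose field value has already been seen (first one wins)."""
    seen = set()
    kept = []
    for item in items:
        key = item.get(field)
        if key not in seen:
            seen.add(key)
            kept.append(item)
    return kept


def truncate(items, count=25):
    """Keep only the first `count` items, like the Truncate module."""
    return items[:count]


items = [{"title": "Wire story"}, {"title": "Local story"}, {"title": "Wire story"}]
print([item["title"] for item in truncate(unique(items), 25)])
# ['Wire story', 'Local story']
```

Running `unique` before sorting means the sort never handles the duplicates, the same reasoning as placing the Unique module ahead of Sort.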

Now we’ll label each feed with its source. To do this, drag a “RegEx” (short for “regular expression”) module into the workspace. In this example, I connect the CNN Fetch Feed module to the RegEx module, and then connect the RegEx module to the Union module.

Because we’ll be appending the source to the headline, select “item.title” in the first drop-down menu. In the next box (after “replace”), type $. In a regular expression, $ matches the end of the text, so whatever you type into the last box is added to the end of the headline. In this case, I typed “ [CNN]” (note the space before the bracket; without it, the bracket runs up against the last letter of the headline).
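The same `$` trick works in Python's `re` module. This is a sketch of the idea, not Pipes' implementation: `label_source` is an illustrative name, and items are dicts with a `"title"` key. The title is stripped first so a stray trailing newline can't give `$` a second place to match.

```python
import re


def label_source(items, label):
    """Append ' [label]' to every title by replacing the end-of-text anchor $.

    Returns new dicts; the input items are left unchanged.
    """
    suffix = " [" + label + "]"  # leading space keeps the bracket off the last word
    labeled = []
    for item in items:
        title = item["title"].strip()  # no trailing newline, so $ sits at the very end
        new_item = dict(item)
        # A function as the replacement inserts the suffix literally,
        # so backslashes in a label would not be treated as escapes.
        new_item["title"] = re.sub(r"$", lambda match: suffix, title, count=1)
        labeled.append(new_item)
    return labeled


cnn = label_source([{"title": "Storm hits coast"}], "CNN")
print(cnn[0]["title"])  # Storm hits coast [CNN]
```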

If you’d like to put the source at the beginning of the headline instead, type ^, which matches the start of the text. In fact, you can replace the whole headline by typing (.*), which matches everything, in that box.
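Here is how the `^` and `(.*)` patterns behave in Python's `re` module. Pipes used its own regex engine, so treat this as an illustration of the patterns rather than of the module. The sample headline is made up, and the `\1` backreference is Python's syntax for reusing the captured text.

```python
import re

title = "Senate passes budget"

# ^ anchors the start of the text, so the label goes in front of the headline.
prefixed = re.sub(r"^", "[CNN] ", title, count=1)

# (.*) matches the whole headline, so the replacement text takes its place.
# count=1 matters in Python 3.7+, where .* can also match the empty string
# at the end and the replacement would otherwise appear twice.
replaced = re.sub(r"(.*)", "Top story from CNN", title, count=1)

# The parentheses capture the headline; Python refers to it as \1.
reused = re.sub(r"(.*)", r"CNN: \1", title, count=1)

print(prefixed)  # [CNN] Senate passes budget
print(replaced)  # Top story from CNN
print(reused)    # CNN: Senate passes budget
```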

You can also use the regex module to alter just about any bit of data fetched in the feed.
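To make the “any field” point concrete, here is a small Python sketch of a RegEx-style step that takes the field name as a parameter. The helper name and the tag-stripping use are hypothetical examples, and a regex is only a crude way to remove HTML; a real HTML parser is more reliable.

```python
import re


def regex_replace(items, field, pattern, replacement):
    """Apply one regex substitution to a chosen field of every item.

    Items that lack the field are passed through unchanged.
    """
    out = []
    for item in items:
        new_item = dict(item)
        if field in new_item:
            new_item[field] = re.sub(pattern, replacement, new_item[field])
        out.append(new_item)
    return out


# Hypothetical use: crudely strip HTML tags from item descriptions.
items = [{"title": "Storm hits coast", "description": "<p>Heavy <b>rain</b> expected.</p>"}]
cleaned = regex_replace(items, "description", r"<[^>]+>", "")
print(cleaned[0]["description"])  # Heavy rain expected.
```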

Once you’ve finished adding RegEx modules for each of the sources, you’ll probably want to change how you’re filtering for unique headlines. Filtering on “item.title” won’t work anymore: the labels you’ve added make identical wire headlines from different papers look different. So we’ll change the field to “item.y:title,” which the RegEx modules leave untouched.
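The Python sketch below shows why the switch matters. Pipes kept its own copy of the title in `item.y:title`; the sketch imitates that with a hypothetical `orig_title` key saved before labeling. Deduplicating on the labeled title misses the repeat, while deduplicating on the saved original catches it. All names are illustrative.

```python
def label_keeping_original(items, label):
    """Label titles but keep an untouched copy, a stand-in for item.y:title."""
    labeled = []
    for item in items:
        new_item = dict(item)
        new_item.setdefault("orig_title", item["title"])  # hypothetical field name
        new_item["title"] = item["title"] + " [" + label + "]"
        labeled.append(new_item)
    return labeled


def unique(items, field):
    """Keep the first item for each distinct value of `field`."""
    seen = set()
    kept = []
    for item in items:
        if item[field] not in seen:
            seen.add(item[field])
            kept.append(item)
    return kept


wire = {"title": "Wire story headline"}
merged = label_keeping_original([wire], "CNN") + label_keeping_original([wire], "Washington Post")

print(len(unique(merged, "title")))       # 2: the labels make the titles differ
print(len(unique(merged, "orig_title")))  # 1: the duplicate is caught
```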

Finally, to filter out a few more items from the Associated Press, I used the “Filter” module between the union and unique modules.
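The post doesn't say which rule the Filter module used, so the rule below is hypothetical. This Python sketch shows the general idea of a filter step: test one field of each item against some text, then either drop the matches (`mode="block"`) or keep only them (`mode="permit"`). The function name and mode names are illustrative.

```python
def filter_items(items, field, text, mode="block"):
    """Block (drop) or permit (keep only) items whose field contains `text`.

    The match is a case-insensitive substring test.
    """
    def matches(item):
        return text.lower() in item.get(field, "").lower()

    if mode == "block":
        return [item for item in items if not matches(item)]
    if mode == "permit":
        return [item for item in items if matches(item)]
    raise ValueError("mode must be 'block' or 'permit'")


# Hypothetical rule: block items whose description mentions the Associated Press.
items = [
    {"title": "Story 1", "description": "By the Associated Press"},
    {"title": "Story 2", "description": "By a staff writer"},
]
print([item["title"] for item in filter_items(items, "description", "associated press")])
# ['Story 2']
```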

That’s it! You can save the pipe, then run it. Here’s the output from the pipe I created.

Remember, you can rearrange and add modules as much as you like. I’ll occasionally apply the Filter and Truncate modules to a single feed source before sending it to its RegEx module, to keep certain items out and to keep any one source from taking over the feed.
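Here is a Python sketch of that per-source clean-up: filter, cap, then label each source before merging. The function name, its parameters and the sample items are hypothetical; the sketch only illustrates the ordering described above.

```python
def prepare_source(items, label, max_items=None, block_word=None):
    """Clean up one source before it is labeled and merged.

    Optionally drops items whose title contains block_word, optionally caps
    the source at max_items, then appends ' [label]' to each title.
    """
    if block_word is not None:
        items = [i for i in items if block_word.lower() not in i["title"].lower()]
    if max_items is not None:
        items = items[:max_items]
    # dict(i, title=...) copies the item with a new title
    return [dict(i, title=i["title"] + " [" + label + "]") for i in items]


cnn_items = [{"title": "CNN story %d" % n} for n in range(1, 6)]
nyt_items = [{"title": "Crossword answers"}, {"title": "NYT story"}]

combined = (
    prepare_source(cnn_items, "CNN", max_items=2)
    + prepare_source(nyt_items, "New York Times", block_word="crossword")
)
print([item["title"] for item in combined])
# ['CNN story 1 [CNN]', 'CNN story 2 [CNN]', 'NYT story [New York Times]']
```

Capping a prolific source before the merge keeps it from crowding out the others, even before the final Truncate.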

3 Responses

  1. bob

    Nice intro to Pipes! I’m always amazed that it exists — free! Here’s to hoping Yahoo keeps it going.

    It seems daunting at first, but you did a nice job making it clear.

    I used Pipes when I built Newsbobber — and then went even deeper and started parsing feeds with SimplePie and routing items into a MySQL database.

    Thanks for posting this.

  2. Nicole

    Thanks! :) I’m afraid I’ve barely scratched the surface in my knowledge of how to use it.

  3. Hart

    Excellent work Nicole! I’ll give this a shot myself.