Tech News

Building a custom technology news aggregator

About the project

The system crawls the latest news from top technology news portals, aggregates it into a central database, and serves the data through an API. In this project we collected data from over 50 portals, including TechCrunch, CNET, and The Verge.
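To make the aggregation step concrete, here is a minimal sketch of a central store that accepts articles from many scrapers and ignores duplicates. It is an illustration only: the project's actual database and schema are not described here, so the use of SQLite, the `articles` table, and the `open_store`, `save_articles`, and `latest` helpers are all assumptions.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the (hypothetical) central article store and create its table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS articles (
               url       TEXT PRIMARY KEY,  -- the URL doubles as the dedup key
               portal    TEXT NOT NULL,
               title     TEXT NOT NULL,
               published TEXT NOT NULL      -- ISO 8601 timestamp, sorts as text
           )"""
    )
    return conn


def save_articles(conn, articles):
    """Insert scraped articles, skipping URLs already stored.

    Returns the number of rows actually added.
    """
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR IGNORE INTO articles (url, portal, title, published) "
            "VALUES (:url, :portal, :title, :published)",
            articles,
        )
    return conn.total_changes - before


def latest(conn, limit=20):
    """Return the newest articles as dicts, ready to serialize for an API."""
    columns = ("portal", "title", "url", "published")
    rows = conn.execute(
        "SELECT portal, title, url, published FROM articles "
        "ORDER BY published DESC LIMIT ?",
        (limit,),
    )
    return [dict(zip(columns, row)) for row in rows]
```

An API endpoint could then return the result of `latest(conn)` as JSON. Keying on the article URL means a scraper can be re-run at any time without creating duplicate rows.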

Technology Used

  • Scrapy
  • Flask
  • Ubuntu
  • Nginx

Problem

Our client needed a robust system that could grab recently published news from some of the top tech news portals. The idea was to build an API on top of it so the data could be used by web and mobile apps. Using existing services was not feasible because:

  • They lack customization
  • They become expensive when scaling

Solution

We solved the problem by creating a custom scraper for each news portal. The idea was to collect the maximum number of news items with as few requests as possible, which keeps the system cost-effective. Since the scrapers were hosted on our server, scaling to large amounts of data was easy and less expensive than with other available services.
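The "many items per request" idea can be sketched as follows. This is not the project's Scrapy code, which is not shown here. It is a standard-library illustration that assumes a portal exposes an RSS feed, so a single response yields a whole batch of articles. The `parse_rss` function, the `SCRAPERS` registry, the `example-portal` key, and the sample feed are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed body, standing in for the response to ONE HTTP request.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Example Portal</title>
<item>
  <title>First headline</title>
  <link>https://example.com/first</link>
  <pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate>
</item>
<item>
  <title>Second headline</title>
  <link>https://example.com/second</link>
  <pubDate>Mon, 01 Jan 2024 11:00:00 GMT</pubDate>
</item>
</channel></rss>"""


def parse_rss(portal, xml_text):
    """Extract every article from a single RSS response body."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append(
            {
                "portal": portal,
                "title": (item.findtext("title") or "").strip(),
                "url": (item.findtext("link") or "").strip(),
                "published": (item.findtext("pubDate") or "").strip(),
            }
        )
    return items


# One parser per portal: a portal without a usable feed would register
# its own function here (for example, one that parses a listing page).
SCRAPERS = {
    "example-portal": parse_rss,
}


def scrape(portal, body):
    """Dispatch a fetched response body to that portal's custom parser."""
    return SCRAPERS[portal](portal, body)


if __name__ == "__main__":
    for article in scrape("example-portal", SAMPLE_FEED):
        print(article["title"], article["url"])
```

Keeping one parser per portal behind a shared `scrape` entry point lets each site be handled in whatever way needs the fewest requests, while the rest of the pipeline sees a uniform list of article dicts.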