Building custom datasets by scraping web data

About the project

Creating datasets for various analysis and research purposes.

Technology Used

  • Scrapy
  • requests
  • BeautifulSoup
  • Ubuntu

Problem

The web is a massive source of data, but that data is often messy. Extracting clean data from it is hard, and most projects run into complications along the way. Building a dataset from web data is not as easy as it sounds.

Solution

Clients often ask us to create datasets by scraping websites that hold massive amounts of information. We have collected data from sites such as IMDb, Wikipedia, Reddit, and Instagram, and built custom datasets to our clients' specifications.
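
As a rough illustration of the kind of work involved, the sketch below parses a small HTML table with BeautifulSoup, drops malformed rows, and writes the cleaned records as CSV. It is a minimal example under stated assumptions, not the pipeline used on any client project: the sample markup, the `films` table id, the column names, and the helper functions `extract_rows` and `rows_to_csv` are all hypothetical. A real job would first download pages (for example with requests or Scrapy) instead of using an inline string.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<table id="films">
  <tr><th>Title</th><th>Year</th></tr>
  <tr><td> Example Film A </td><td>1999</td></tr>
  <tr><td>Example Film B</td><td> 2004 </td></tr>
  <tr><td>Broken row</td></tr>
</table>
"""


def extract_rows(html):
    """Return cleaned {title, year} records from the (assumed) films table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="films")
    rows = []
    for tr in table.find_all("tr"):
        # Header rows use <th>, so they yield no <td> cells and are skipped.
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) != 2:
            continue  # skip rows with missing or extra columns
        title, year = cells
        if not year.isdigit():
            continue  # skip rows whose year is not a plain number
        rows.append({"title": title, "year": int(year)})
    return rows


def rows_to_csv(rows):
    """Serialize the records to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["title", "year"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_rows(SAMPLE_HTML)
print(rows_to_csv(rows))
```

Keeping extraction and serialization in separate functions means the cleaning rules can be adjusted per site without touching the output format.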