I need to generate data around a very specific criteria/topic in order to analyze the results later on. for instance, i need to generate data around a restaurant and its competition that sell beef burgers to students (age 16-23) , with the following criteria: location: Halifax demographics: Halifax, age 16-23; like outdoor eating, spending around 7$ per meal, students use iphone, students use twitter and facebook, students who go to my competition, students who talk about burgers, retaurants reviews..... Income: students with $20,000 income per year and less... Social media: hashtags about the topic, what people are saying about the topic, tags... etc. Any thoughts or suggestions?
This one is tricky because all social media platforms have terms and site rules around collecting personal data. Facebook in particular is touchy about this, so I would not recommend it. Twitter has an API (https://developer.twitter.com/en/docs/tweets/search/overview) that will give you raw data for a small cost, so your hashtag information can be obtained that way. If you are serious about this, then I would identify the top 50-100 businesses that employ students in the area which also keep their pay scale in the range you're looking for. Then using those company's social media pages to identify employees. You are now zeroed on your audience, you just need to fine tune it to filter on the those students that eat out, perhaps by cross referencing Yelp/Google reviews with the employee data obtained.
I would like to re-emphasize here that this sort of data is tricky to obtain as many of the sites containing this data have terms and conditions which discourage collecting personal data.
Are you hoping for a dynamic system that identifies new data sources as they become available? Or will you identify existing sites/pages that give you the data you want and focus on what is generated there? From there, you can obtain the data manually or programatically. Manually, you will actually be visiting those pages on a regular basis, copying and pasting the data there, and then formatting the data to be usable. This is not a scalable solution, but can work if the data sources are limited and you've got the time. Programmatically, you can build small scripts that will make the site requests for you. If you don't program, there are many freelancers/developers that have this skillset and are reasonably priced. There are also 3rd party platforms that are well priced for small scale web crawling.
To sum up, you need to identify where you'll get your data from online (the income data & personal will be the trickiest), then identify how you want to obtain that data (manually or programmatically), and finally format the data into a usable dataset (CSV & JSON being simple and universal).