Create a high quality Image Dataset using EyeEm
01 Nov 2018 by dzlab
Creating a high quality image dataset from EyeEm
The following tutorial walks you through how to create a high quality image dataset from EyeEm. Note: The steps have to be repeated for each class, as we need to collect the URLs one class at a time.
Get a list of URLs
Search and scroll
Go to the EyeEm website and search for the images you are interested in. Try to be as specific as possible so that the search results match the class you’re building the dataset for. In any case, you can always delete unwanted files manually afterwards.
Keep scrolling down until you have enough images, as you will only be able to download the ones that have been loaded into the page. I don’t know if there is a maximum to what EyeEm can return, but I guess the limit is your browser memory.
Download into file
Now you must run some JavaScript code in your browser, which will save the URLs of all the images you want for your dataset.
Press Ctrl+Shift+J on Windows/Linux or Cmd+Opt+J on Mac, and a small window, the JavaScript ‘Console’, will appear. That is where you will paste the JavaScript commands.
You will need to get the URLs of all the images into a CSV file. You can do this by running the following commands:
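// Collect the src of the first child of every element with the thumbnail class used in this post.
urls = Array.from(document.querySelectorAll('.sc-jWBwVP')).map(el => el.children[0].src);
// Open the list as a CSV data URI, one URL per line (encodeURIComponent replaces the deprecated escape).
window.open('data:text/csv;charset=utf-8,' + encodeURIComponent(urls.join('\n')));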
Note: if you have an ad blocker (I highly recommend you install one, check uBlock Origin), you may need to disable it momentarily for the EyeEm website, otherwise you won’t be able to download the CSV file with all the image URLs.
Create directories and upload the URLs file to your server
Upload the URLs file to the root folder, then create a separate folder for each class in that same root folder.
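As a rough illustration of the folder layout described above, here is a minimal sketch using only the Python standard library. The function name `make_class_dirs`, the root folder, and the class names in the usage note are placeholders of mine, not something taken from the original notebook.

```python
from pathlib import Path


def make_class_dirs(root, classes):
    """Create one sub-folder per class under root and return them by class name."""
    root = Path(root)
    dirs = {}
    for name in classes:
        class_dir = root / name
        # exist_ok=True makes the call safe to repeat when a new class is added.
        class_dir.mkdir(parents=True, exist_ok=True)
        dirs[name] = class_dir
    return dirs
```

For example, `make_class_dirs("data/eyeem", ["cats", "dogs"])` would create `data/eyeem/cats` and `data/eyeem/dogs` (hypothetical names), next to which the URLs files can be uploaded.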
Download images
For each class, download the images corresponding to the URLs we got from EyeEm. I first tried the fastai download_images helper function, but it fails because the server response doesn’t contain a Content-Length header. Instead, we will just download the files manually.
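The manual download could look roughly like the sketch below. It is not the code from the original notebook: the function name `download_urls`, the sequential file naming, and the `.jpg` extension are assumptions of mine. It streams each response body to disk with the standard library, so it does not depend on a Content-Length header.

```python
import shutil
import urllib.request
from pathlib import Path


def download_urls(urls_file, dest_dir, timeout=10):
    """Download every URL listed in urls_file (one per line) into dest_dir.

    Returns the list of files written; downloads that fail are skipped.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    lines = Path(urls_file).read_text().splitlines()
    urls = [line.strip() for line in lines if line.strip()]
    saved = []
    for i, url in enumerate(urls):
        # Assumption: images are JPEGs; files are named by their position in the list.
        target = dest / f"{i:08d}.jpg"
        try:
            # Copy the body in chunks until the server closes the stream.
            with urllib.request.urlopen(url, timeout=timeout) as resp, \
                    open(target, "wb") as out:
                shutil.copyfileobj(resp, out)
        except (OSError, ValueError) as err:
            # URLError and socket timeouts are OSError; a malformed URL is ValueError.
            print(f"skipped {url}: {err}")
            target.unlink(missing_ok=True)
            continue
        saved.append(target)
    return saved
```

Calling something like `download_urls("urls_cats.csv", "data/eyeem/cats")` once per class (both names are hypothetical) would fill each class folder. Downloads run one at a time here, which is slow but simple; a thread pool would be the obvious next step.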
Clean up the dataset by removing any corrupted files using the fastai verify_images helper function.
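To give an idea of what this cleanup step does, here is a rough standard-library stand-in. It is not fastai's verify_images and it is much weaker: it only checks that each file starts with a known image signature, which catches empty files and HTML error pages saved as images, but not truncated or otherwise damaged images. The function name `remove_non_images` is mine.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of JPEG, PNG and GIF files.
SIGNATURES = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n", b"GIF87a", b"GIF89a")


def remove_non_images(folder):
    """Delete files in folder that do not start with a known image signature.

    Returns the list of deleted paths.
    """
    removed = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        with open(path, "rb") as f:
            header = f.read(8)
        # bytes.startswith accepts a tuple of candidate prefixes.
        if not header.startswith(SIGNATURES):
            path.unlink()
            removed.append(path)
    return removed
```

In the notebook itself, the fastai helper is the better choice, since it actually tries to open every image.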
View data
Train model
Interpretation
Full Jupyter notebook - link
Note: This work is an adaptation of an original notebook by Jeremy and the fast.ai team - link