Create a high quality Image Dataset using EyeEm
01 Nov 2018 by dzlab
Creating a high quality image dataset from EyeEm
The following tutorial walks you through how to create a high quality image dataset from EyeEm. Note: The steps have to be repeated for each class, as we need to collect the URLs one class at a time.
Get a list of URLs
Search and scroll
Go to the EyeEm website and search for the images you are interested in. Be as specific as possible so that the search results match the class you’re building the dataset for. You can always delete unwanted files manually afterwards.
Keep scrolling down until enough images are loaded, because only the images visible on the page can be downloaded. I don’t know whether EyeEm caps the number of results it returns, but I guess the limit is your browser’s memory.
Download into file
You will need to save the URL of each image to a CSV file. You can do this by running the following commands in your browser’s JavaScript console:
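// '.sc-jWBwVP' is a generated class name; this assumes the <img> is the first child of each match.
urls = Array.from(document.querySelectorAll('.sc-jWBwVP')).map(el => el.children[0].src);
// Open the list (one URL per line) as CSV data so the browser offers it for download.
window.open('data:text/csv;charset=utf-8,' + encodeURIComponent(urls.join('\n')));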
Note: if you have an ad blocker (I highly recommend installing one, such as uBlock Origin), you may need to disable it momentarily for the EyeEm website; otherwise you won’t be able to download the CSV file with all the image URLs.
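Because scrolling can surface the same image more than once, it may be worth tidying the saved file before using it. The sketch below assumes the file is plain text with one URL per line; the function name load_urls is hypothetical and not part of any library.

```python
from pathlib import Path


def load_urls(csv_path):
    """Return the unique http(s) URLs from a one-URL-per-line file, in order."""
    seen = set()
    urls = []
    for line in Path(csv_path).read_text(encoding="utf-8").splitlines():
        url = line.strip()
        # Skip blank lines, non-URL entries (e.g. "undefined"), and repeats.
        if not url.startswith(("http://", "https://")) or url in seen:
            continue
        seen.add(url)
        urls.append(url)
    return urls
```

Keeping the original order means the numbering of downloaded files still follows the order of the search results.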
Create directories and upload the urls file to your server
Upload the urls file to the root folder and create a unique folder for each class in the same root folder.
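The folder layout described above can be sketched with pathlib. The helper name make_class_dirs is hypothetical, and the root folder and class names are whatever you chose for your dataset.

```python
from pathlib import Path


def make_class_dirs(root, classes):
    """Create one sub-folder per class under root and return their paths."""
    root = Path(root)
    dirs = {}
    for name in classes:
        dest = root / name
        # exist_ok lets the cell be re-run without failing on existing folders.
        dest.mkdir(parents=True, exist_ok=True)
        dirs[name] = dest
    return dirs
```

For example, make_class_dirs("data", ["cats", "dogs"]) would create data/cats and data/dogs (illustrative names only).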
For each class, download the images corresponding to the urls we got from EyeEm. I first tried the fastai download_images helper function, but it fails because the server response doesn’t contain a Content-Length header. Instead we will download the files manually.
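A manual download can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the notebook's original code: the function name download_urls is hypothetical, files are numbered in list order, and the .jpg extension is assumed rather than read from the response.

```python
import http.client
import shutil
import urllib.request
from pathlib import Path


def download_urls(urls, dest, timeout=10):
    """Save each URL into dest as 00000000.jpg, 00000001.jpg, ...; return failures."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    failed = []
    for i, url in enumerate(urls):
        target = dest / f"{i:08d}.jpg"
        try:
            # Stream the body to disk; no Content-Length header is required.
            with urllib.request.urlopen(url, timeout=timeout) as resp, \
                    open(target, "wb") as out:
                shutil.copyfileobj(resp, out)
        except (OSError, ValueError, http.client.HTTPException) as err:
            # URLError and timeouts are OSError subclasses; ValueError covers
            # malformed URLs. Remove any partial file and record the failure.
            target.unlink(missing_ok=True)
            failed.append((url, err))
    return failed
```

The returned list pairs each failed URL with its exception, so a failed batch can be inspected or retried without restarting the whole class.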
Clean up the dataset by removing any corrupted files with the fastai verify_images helper function.
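For a quick sanity pass that needs no extra libraries, a lighter stand-in is sketched below. It is not fastai's verify_images: it only compares the first bytes of each file against the JPEG, PNG, and GIF signatures, so it catches empty files or HTML error pages saved under an image name, but not truncated images. The name remove_non_images is hypothetical.

```python
from pathlib import Path

# Leading bytes of common image formats (JPEG, PNG, GIF).
SIGNATURES = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n", b"GIF87a", b"GIF89a")


def remove_non_images(folder):
    """Delete files in folder that do not start with a known image signature."""
    removed = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        with open(path, "rb") as f:
            head = f.read(8)
        # bytes.startswith accepts a tuple and matches any of its entries.
        if not head.startswith(SIGNATURES):
            path.unlink()
            removed.append(path.name)
    return removed
```

Running it on one class folder returns the names of the deleted files, which gives a rough idea of how many downloads went wrong.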
Full Jupyter notebook - link
Note: This work is an adaptation of an original notebook by Jeremy and the fastai team - link