Web scraping from select drop-downs using JS and Chrome

Neetish Raj
3 min read · Feb 14, 2020


I recently took on a small web scraping job where I needed to scrape data from two select drop-downs. It was a simple task, but the code turned out to be handy for many other scraping tasks as well, so I thought I'd document the process.
NOTE: Skip to the bottom if you just want to see the code.

What do you need?

Just a Chrome browser and a working internet connection.

What do you need to know?

  1. JavaScript: Enough to work with the DOM, manipulate data and fetch data.
  2. CSS selectors: The better you know CSS selectors, the less effort web scraping takes.

What are we gonna do?

The site had two select drop-downs.
The first select drop-down had around 1000 option tags, as shown below:

<select name="employee_roles">
<option value="1">Administrator</option>
....
....
<option value="1000">Zonal Head</option>

</select>

Selecting one of the options above triggered a fetch call that populated the options of the second select drop-down.

<select name="employee_skills">
<option value="121">UNIX</option>
....
....
<option value="324">Shell Scripting</option>

</select>

And we wanted all that data in a roles array in JSON format as shown below.

{
"roles": [
{
"role_id": 1,
"role_name": "Administrator",
"role_skills": [
{"skill_id": 121, "skill_name": "UNIX"},
...
...,
{"skill_id": 324, "skill_name": "Shell Scripting"}

]
},
...
...
{
"role_id": 1000,
"role_name": "Zonal Head",
"role_skills": [
{"skill_id": 4001, "skill_name": "Microsoft PowerPoint"},
...

]
}
]
}
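
Before scraping a thousand roles, it can help to have a quick check that whatever we collect matches the shape above. The following is only a sketch: isValidSkill, isValidRole and isValidData are hypothetical helper names that are not part of the original script, the sample record is made up from the example values, and the check is deliberately shallow.

```javascript
// Hypothetical helpers (not from the original script): shallow shape checks
// for the target JSON described above.
function isValidSkill(skill) {
  return skill !== null && typeof skill === "object" &&
    Number.isInteger(skill.skill_id) &&
    typeof skill.skill_name === "string";
}

function isValidRole(role) {
  return role !== null && typeof role === "object" &&
    Number.isInteger(role.role_id) &&
    typeof role.role_name === "string" &&
    Array.isArray(role.role_skills) &&
    role.role_skills.every(isValidSkill);
}

function isValidData(data) {
  return data !== null && typeof data === "object" &&
    Array.isArray(data.roles) &&
    data.roles.every(isValidRole);
}

// Sample record built from the example values in this article.
const sample = {
  roles: [
    {
      role_id: 1,
      role_name: "Administrator",
      role_skills: [{ skill_id: 121, skill_name: "UNIX" }]
    }
  ]
};
console.log(isValidData(sample));
```

Running a check like this on the first few scraped roles catches problems such as string ids or a missing role_skills array before the full run.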

How do we do it?

Open the website in a Chrome tab, then press F12 to open Chrome DevTools.

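Now we play a little with CSS selectors in the console to find the options in the first select drop-down and understand their structure. Chrome's console has a built-in, jQuery-like shorthand: “ $$( ) ” is an alias for document.querySelectorAll and returns every match, while “ $( ) ” returns only the first match unless the page itself loads jQuery.

$$('select[name="employee_roles"] option') // Shows the list of option tags in the first drop-down
$$('select[name="employee_roles"] option').length // Shows the count of option tags
$$('select[name="employee_roles"] option')[0] // returns the first option element
$$('select[name="employee_roles"] option')[0].value // returns the role_id as a string, "1"
$$('select[name="employee_roles"] option')[0].textContent // returns role_name "Administrator"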
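
The console exploration above can be folded into one small helper that turns the option tags into plain objects. This is a sketch under assumptions: optionsToRoles is a hypothetical name, and skipping options with an empty value is a guess that the drop-down may start with a placeholder entry, which the original page may or may not have.

```javascript
// Hypothetical helper: turn an array-like of <option> elements into plain
// { role_id, role_name } objects.
function optionsToRoles(options) {
  return Array.from(options)
    // Assumption: an empty value marks a placeholder such as "Select a role".
    .filter((option) => option.value !== "")
    .map((option) => ({
      role_id: Number(option.value),
      role_name: option.textContent.trim()
    }));
}

// Plain objects stand in for DOM elements here, so the sketch also runs
// outside a browser.
const fakeOptions = [
  { value: "1", textContent: "Administrator" },
  { value: "1000", textContent: " Zonal Head " }
];
console.log(optionsToRoles(fakeOptions));
```

In the DevTools console the same helper would be called with the real elements, for example optionsToRoles(document.querySelectorAll('select[name="employee_roles"] option')).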

We now have access to the role values, so next we need to trigger a fetch call that fetches the corresponding skills for a role. To understand this, I opened the Network tab in Chrome DevTools and studied the GET request URL for “Administrator”, which has role_id = 1. Then I wrote my own fetch call, as shown below.

fetch('https://some-secret-site.com/skills?role_id=1')
.then((response) => {
return response.json();
})
.then((myJson) => {
console.log(myJson);
});

So just by changing the role_id value in the fetch call URL, we can get the skills data for any role. Now we have everything and just need to organize it into a cool script.
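
Since only the role_id changes between requests, the URL building and the fetch call can be wrapped in two small functions. This is a rough sketch, not the original code: skillsUrl and fetchSkills are hypothetical names, the base URL is the placeholder used in this article, and the fetchImpl parameter is an assumption added so the function can be tried without the real site.

```javascript
// Placeholder base URL from the article; the real endpoint was not published.
const SKILLS_URL = "https://some-secret-site.com/skills";

// Hypothetical helper: build the request URL for one role.
function skillsUrl(roleId) {
  const url = new URL(SKILLS_URL);
  url.searchParams.set("role_id", String(roleId));
  return url.href;
}

// Hypothetical helper: fetch the skills for one role and fail loudly on
// HTTP errors. fetchImpl defaults to the global fetch and can be swapped
// for a stand-in when experimenting.
async function fetchSkills(roleId, fetchImpl = fetch) {
  const response = await fetchImpl(skillsUrl(roleId));
  if (!response.ok) {
    throw new Error(
      "Request for role_id " + roleId + " failed with status " + response.status
    );
  }
  return response.json();
}

console.log(skillsUrl(1));
```

Checking response.ok matters because fetch only rejects on network failures; a 404 or 500 response would otherwise reach response.json() and fail with a less helpful parse error.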

Grand Finale! The final web scraping code

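var data = {roles: []};
var isWebScrapingCompleted = false;
var completedCalls = 0; // fetch calls that have finished, successfully or not
// Only the options of the first (roles) drop-down
var list = document.querySelectorAll('select[name="employee_roles"] option');
var url = 'https://some-secret-site.com/skills?role_id=';
// "let" gives each iteration its own i, so the callbacks below still see the right option
for (let i = 0; i < list.length; i++) {
  fetch(url + list[i].value)
    .then((response) => {
      return response.json();
    })
    .then((myJson) => {
      data.roles.push({
        role_id: Number(list[i].value),
        role_name: list[i].textContent,
        role_skills: myJson
      });
    })
    .catch((error) => {
      console.error("Fetch failed for role_id " + list[i].value, error);
    })
    .finally(() => {
      // Responses can arrive in any order, so count finished calls
      // instead of checking for the last role_id
      completedCalls++;
      if (completedCalls === list.length) {
        isWebScrapingCompleted = true;
      }
    });
}
// This block of code is for waiting and logging status every 1 second till we fetch skills data for all the roles.
var myInterval = setInterval(() => {
  if (!isWebScrapingCompleted) {
    console.log("Still making fetch calls. Wait some more!");
  } else {
    console.log("Web Scraping Completed, Hurray!\n Extracting data to JSON format....\n\n");
    console.log(JSON.stringify(data, null, 4));
    clearInterval(myInterval);
  }
}, 1000);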
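
The script above fires every request at once and polls a flag to find out when they are done. One possible alternative, sketched here under assumptions, is to fetch the roles in small batches with async/await and Promise.all. scrapeRoles, the batch size of 10 and the stand-in fakeFetchSkills are all hypothetical and not part of the original script; in the browser, fetchSkills would be a function that calls the real skills URL.

```javascript
// Hypothetical sketch: scrape roles in small batches instead of all at once.
// fetchSkills(roleId) must return a promise for that role's skills array.
async function scrapeRoles(roles, fetchSkills, batchSize = 10) {
  const data = { roles: [] };
  for (let start = 0; start < roles.length; start += batchSize) {
    const batch = roles.slice(start, start + batchSize);
    // Promise.all keeps results in the same order as the batch,
    // so data.roles ends up in drop-down order.
    const results = await Promise.all(
      batch.map(async (role) => ({
        role_id: role.role_id,
        role_name: role.role_name,
        role_skills: await fetchSkills(role.role_id)
      }))
    );
    data.roles.push(...results);
    console.log("Fetched " + data.roles.length + " of " + roles.length + " roles");
  }
  return data;
}

// Usage with a stand-in fetcher so the sketch runs without the real site.
const demoRoles = [
  { role_id: 1, role_name: "Administrator" },
  { role_id: 1000, role_name: "Zonal Head" }
];
const fakeFetchSkills = async (roleId) => [
  { skill_id: roleId * 10, skill_name: "Skill for role " + roleId }
];
const demoResult = scrapeRoles(demoRoles, fakeFetchSkills, 1);
demoResult.then((data) => console.log(JSON.stringify(data, null, 4)));
```

Batching is gentler on the target server than a thousand simultaneous requests, and the returned promise resolves exactly when the last batch finishes, so no polling is needed. Note that Promise.all rejects as soon as one request in a batch fails, so a real run would probably want a try/catch around each fetch.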

Disclaimer: Make sure you read this before performing web-scraping on publicly available data.

