I would like to propose another solution that no one has mentioned yet. There is a library called Selenium, an open-source tool for automating web browsers. It is mainly used for testing web applications, but it is certainly not limited to that. You can write a web crawler with it and interact with pages just as a human would.
As an illustration, I will give you a quick tutorial to show how it works. If you would rather not read the whole post, take a look at this video to see what this library can offer for crawling web pages.
Selenium Components
To begin with, Selenium consists of several components that work together to carry out the actions requested by your Java program. The main component is called WebDriver, and it must be included in your program for it to work properly.
Go to the download site here and download the latest release for your operating system (Windows, Linux, or macOS). It is a ZIP archive containing the driver executable (chromedriver.exe on Windows). Save it on your computer and extract it to a convenient location, such as C:\WebDrivers\User\chromedriver.exe. We will use this location later in the Java program.
The next step is to include the JAR library. Assuming you are using a Maven project to build the Java program, you need to add the following dependency to your pom.xml:
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>3.8.1</version>
</dependency>
Selenium WebDriver Setup
Let us get started with Selenium. The first step is to create a ChromeDriver instance:
System.setProperty("webdriver.chrome.driver", "C:\\WebDrivers\\User\\chromedriver.exe");
WebDriver driver = new ChromeDriver();
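The hard-coded path above only works on Windows. As a minimal sketch (not part of Selenium itself), the helper below picks the driver file name from the `os.name` system property and sets `webdriver.chrome.driver` only if the file actually exists. The class name `DriverPathResolver` and the default folder `drivers` are assumptions made for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper, not part of Selenium.
final class DriverPathResolver {

    // Returns "chromedriver.exe" for Windows os.name values, "chromedriver" otherwise.
    static String driverFileName(String osName) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows ? "chromedriver.exe" : "chromedriver";
    }

    // Sets webdriver.chrome.driver if the driver file exists in driverDir.
    // Returns false (and changes nothing) when the file is missing.
    static boolean configure(Path driverDir) {
        Path driver = driverDir.resolve(driverFileName(System.getProperty("os.name")));
        if (!Files.isRegularFile(driver)) {
            return false;
        }
        System.setProperty("webdriver.chrome.driver", driver.toAbsolutePath().toString());
        return true;
    }

    public static void main(String[] args) {
        // "drivers" is an assumed default folder; pass your own as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "drivers");
        if (configure(dir)) {
            System.out.println("Using " + System.getProperty("webdriver.chrome.driver"));
        } else {
            System.out.println("No driver found in " + dir.toAbsolutePath());
        }
    }
}
```

Calling `DriverPathResolver.configure(...)` before `new ChromeDriver()` would replace the `System.setProperty` line, and a `false` return gives you a chance to print a clear message instead of a startup failure.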
Now it's time to go deeper into the code. The following example shows a simple program that opens a web page and interacts with some of its HTML elements. It is easy to follow, as the comments explain each step. Please take a brief look to understand how the elements are located.
//Launch website
driver.navigate().to("http://www.calculator.net/");
//Maximize the browser
driver.manage().window().maximize();
// Click on Math Calculators
driver.findElement(By.xpath(".//*[@id = 'menu']/div[3]/a")).click();
// Click on Percent Calculators
driver.findElement(By.xpath(".//*[@id = 'menu']/div[4]/div[3]/a")).click();
// Enter value 10 in the first number of the percent Calculator
driver.findElement(By.id("cpar1")).sendKeys("10");
// Enter value 50 in the second number of the percent Calculator
driver.findElement(By.id("cpar2")).sendKeys("50");
// Click Calculate Button
driver.findElement(By.xpath(".//*[@id = 'content']/table/tbody/tr[2]/td/input[2]")).click();
// Get the result text based on its XPath
String result = driver.findElement(By.xpath(".//*[@id = 'content']/p[2]/font/b")).getText();
// Print the result to the console
System.out.println(" The Result is " + result);
Once you are done with your work, the browser window can be closed with:
driver.quit();
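The example above drives a single page. To turn that into a crawler you also need a loop that remembers which URLs were already visited. Here is a rough sketch under my own assumptions: the class `CrawlLoop` and the `linkExtractor` parameter are hypothetical names, and the extractor is passed in as a function so the loop can be tried without a browser. With Selenium, the extractor would typically call `driver.get(url)` and collect `getAttribute("href")` from `driver.findElements(By.tagName("a"))`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of a breadth-first crawl loop; not part of Selenium.
final class CrawlLoop {

    // Visits pages breadth-first from startUrl, stopping after maxPages pages.
    // linkExtractor loads a URL and returns the links found on that page.
    static List<String> crawl(String startUrl, int maxPages,
                              Function<String, List<String>> linkExtractor) {
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        List<String> visited = new ArrayList<>();
        seen.add(startUrl);
        queue.add(startUrl);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            visited.add(url);
            for (String link : linkExtractor.apply(url)) {
                // Skip missing hrefs and anything already queued or visited.
                if (link != null && seen.add(link)) {
                    queue.add(link);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // A made-up in-memory "site" standing in for real pages.
        Map<String, List<String>> site = new HashMap<>();
        site.put("/home", Arrays.asList("/about", "/blog", "/home"));
        site.put("/blog", Arrays.asList("/blog/post-1", "/about"));
        List<String> order = crawl("/home", 10,
                url -> site.getOrDefault(url, Collections.emptyList()));
        System.out.println(order);
    }
}
```

Keeping the browser behind a plain function also means the loop itself can be checked with fake data before pointing it at a real site.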
Selenium Browser Options
There is a lot of functionality you can take advantage of when working with this library. For example, assuming you are using Chrome, you can add the following to your code:
ChromeOptions options = new ChromeOptions();
Here is how to load a Chrome extension through ChromeOptions (File is java.io.File):
options.addExtensions(new File("src/test/resources/extensions/extension.crx"));
This one enables incognito mode:
options.addArguments("--incognito");
These two are for disabling info bars and JavaScript:
options.addArguments("--disable-infobars");
options.addArguments("--disable-javascript");
This one runs the browser in headless mode, so the scraping happens silently in the background without a visible window:
options.addArguments("--headless");
Once you are done setting the options, pass them to the driver:
WebDriver driver = new ChromeDriver(options);
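If you want to switch flags such as headless mode on and off without editing the code, one option is to build the argument list first and hand it over in one go, since `options.addArguments(...)` also accepts a `List<String>`. A small sketch, assuming two made-up system property names (`crawler.headless` and `crawler.incognito`) and a hypothetical helper class `ChromeArguments`:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; the property names below are assumptions, not Selenium settings.
final class ChromeArguments {

    // Builds the list of Chrome command-line switches for the requested modes.
    static List<String> build(boolean headless, boolean incognito) {
        List<String> arguments = new ArrayList<>();
        if (headless) {
            arguments.add("--headless");
        }
        if (incognito) {
            arguments.add("--incognito");
        }
        return Collections.unmodifiableList(arguments);
    }

    // Reads the modes from -Dcrawler.headless=true and -Dcrawler.incognito=true.
    static List<String> fromSystemProperties() {
        return build(Boolean.getBoolean("crawler.headless"),
                     Boolean.getBoolean("crawler.incognito"));
    }

    public static void main(String[] args) {
        // With Selenium on the classpath the list would be used like this:
        //   ChromeOptions options = new ChromeOptions();
        //   options.addArguments(ChromeArguments.fromSystemProperties());
        //   WebDriver driver = new ChromeDriver(options);
        System.out.println(fromSystemProperties());
    }
}
```

Running with `-Dcrawler.headless=true` would then give you a silent run, while leaving the property out keeps the browser window visible for debugging.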
To sum up, let's look at what Selenium has to offer and what makes it stand out from the other solutions proposed in this thread so far:
- Language and Framework Support
- Open Source Availability
- Multi-Browser Support
- Support Across Various Operating Systems
- Ease Of Implementation
- Reusability and Integrations
- Parallel Test Execution and Faster Go-to-Market
- Easy to Learn and Use
- Constant Updates