Web Scraping Google Play Movies & TV with Nodejs


What will be scraped

Full code

If you don't need an explanation, have a look at the full code example in the online IDE:

const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");

puppeteer.use(StealthPlugin());

const searchParams = {
  hl: "en",                        // parameter defines the language to use for the search
  gl: "us",                        // parameter defines the country to use for the search
};

const URL = `https://play.google.com/store/movies?hl=${searchParams.hl}&gl=${searchParams.gl}`;

async function scrollPage(page, scrollContainer) {
  let lastHeight = await page.evaluate(`document.querySelector("${scrollContainer}").scrollHeight`);
  while (true) {
    await page.evaluate(`window.scrollTo(0, document.querySelector("${scrollContainer}").scrollHeight)`);
    await page.waitForTimeout(2000);
    const newHeight = await page.evaluate(`document.querySelector("${scrollContainer}").scrollHeight`);
    if (newHeight === lastHeight) break;
    lastHeight = newHeight;
  }
}

async function getMoviesFromPage(page) {
  const movies = await page.evaluate(() => {
    const mainPageInfo = Array.from(document.querySelectorAll("section.oVnAB")).reduce((result, block) => {
      const categoryTitle = block.querySelector(".kcen6d").textContent.trim();
      const categorySubTitle = block.querySelector(".kMqehf")?.textContent.trim();
      const movies = Array.from(block.parentElement.querySelectorAll(".ULeU3b")).map((movie) => {
        // Google Play class names change periodically; verify selectors with SelectorGadget
        const href = movie.querySelector("a")?.getAttribute("href");
        return {
          title: movie.querySelector(".Epkrse")?.textContent.trim(),
          price: movie.querySelector(".LrNMN.VfPpfd")?.textContent.trim() || "No price",
          originalPrice: movie.querySelector(".LrNMN.SUZt4c")?.textContent.trim(),
          thumbnail: movie.querySelector(".TjRVLb img")?.getAttribute("src"),
          video: movie.querySelector(".TjRVLb video")?.getAttribute("data-trailer-url"),
          link: `https://play.google.com${href}`,
          movieId: href?.slice(href.indexOf("id=") + 3),
        };
      });
      return { ...result, [categoryTitle]: { categorySubTitle, movies } };
    }, {});
    return mainPageInfo;
  });
  return movies;
}

async function getMainPageInfo() {
  const browser = await puppeteer.launch({
    headless: true,
    args: ["--no-sandbox", "--disable-setuid-sandbox"],
  });

  const page = await browser.newPage();

  await page.setDefaultNavigationTimeout(60000);
  await page.goto(URL);
  await page.waitForSelector(".oVnAB");

  await scrollPage(page, ".T4LgNb");

  const movies = await getMoviesFromPage(page);

  await browser.close();

  return movies;
}

getMainPageInfo().then((result) => console.dir(result, { depth: null }));

Preparation

First, we need to create a Node.js* project and add the npm packages puppeteer, puppeteer-extra and puppeteer-extra-plugin-stealth to control Chromium (or Chrome, or Firefox, but for now we work only with Chromium, which is used by default) over the DevTools Protocol in headless or non-headless mode.

To do this, in the directory with our project, open the command line and enter npm init -y, and then npm i puppeteer puppeteer-extra puppeteer-extra-plugin-stealth.
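That is, from the project directory:

npm init -y
npm i puppeteer puppeteer-extra puppeteer-extra-plugin-stealth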

*If you don't have Node.js installed, you can download it from nodejs.org and follow the installation documentation.

Note: you can also use puppeteer without any extensions, but I strongly recommend using it together with puppeteer-extra and puppeteer-extra-plugin-stealth to prevent the website from detecting that you are using headless Chromium or that you are using a web driver. You can check it on the Chrome headless tests website. The screenshot below shows you the difference.

Process

First of all, we need to scroll through all movie listings until there are no more listings loading, which is the hard part of the process and is described below.

The next, easier part is to extract data from the HTML elements after scrolling is completed. The process of getting the right CSS selectors is fairly easy with the SelectorGadget Chrome extension, which lets us grab CSS selectors by clicking on the desired element in the browser. However, it does not always work perfectly, especially when the website makes heavy use of JavaScript.

We have a dedicated Web Scraping with CSS Selectors blog post at SerpApi if you want to know a little bit more about them.
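As a quick sanity check on top of SelectorGadget, you can paste a candidate selector into the browser DevTools console on the Google Play Movies page; a minimal sketch using the category-block selector that appears later in this post:

// Run in the DevTools console: counts how many category blocks the selector matches
document.querySelectorAll("section.oVnAB").length;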

The Gif below illustrates the approach of selecting different parts of the results using SelectorGadget.

Code explanation

Declare puppeteer to control the Chromium browser from the puppeteer-extra library and StealthPlugin to prevent the website from detecting that you are using a web driver from the puppeteer-extra-plugin-stealth library:

const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");

Next, we "say" to puppeteer to use StealthPlugin, define the search parameters (hl - language, gl - country) and the search URL:

puppeteer.use(StealthPlugin());

const searchParams = {
  hl: "en",                        // parameter defines the language to use for the search
  gl: "us",                        // parameter defines the country to use for the search
};

const URL = `https://play.google.com/store/movies?hl=${searchParams.hl}&gl=${searchParams.gl}`;

Next, we write a function to scroll the page to load all the movies:

async function scrollPage(page, scrollContainer) {...}

In this function, first, we need to get the scrollContainer height (using the evaluate() method). Then we use a while loop in which we scroll down the scrollContainer, wait 2 seconds (using the waitForTimeout method), and get the new scrollContainer height.

Next, we check: if newHeight equals lastHeight, we stop the loop. Otherwise, we assign the newHeight value to the lastHeight variable and repeat again until the page has been scrolled down to the end:

let lastHeight = await page.evaluate(`document.querySelector("${scrollContainer}").scrollHeight`);
while (true) {
  await page.evaluate(`window.scrollTo(0, document.querySelector("${scrollContainer}").scrollHeight)`);
  await page.waitForTimeout(2000);
  const newHeight = await page.evaluate(`document.querySelector("${scrollContainer}").scrollHeight`);
  if (newHeight === lastHeight) break;
  lastHeight = newHeight;
}
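Note: in recent Puppeteer releases page.waitForTimeout() has been removed. If your installed version no longer has it, a plain Promise-based delay is a drop-in replacement (a minimal sketch, not part of the original code):

// Hypothetical helper that can replace page.waitForTimeout(2000)
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// ...then inside the loop:
// await delay(2000);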

Next, we write a function to get the movies data from the page:

async function getMoviesFromPage(page) {...}

In this function, we get data from the browser context and save it in the returned movies constant. Next, we need to get all HTML elements that match the "section.oVnAB" selector (querySelectorAll() method). Then we use the reduce() method (it allows us to build an object with the results) to iterate over the array that was built with the Array.from() method:

const movies = await page.evaluate(() => {
  const mainPageInfo = Array.from(document.querySelectorAll("section.oVnAB")).reduce((result, block) => {
    ...
  }, {});
  return mainPageInfo;
});
return movies;

Next, we need to get the categoryTitle and categorySubTitle of each category, and the title, price, originalPrice, thumbnail, video, link and movieId (we can get it from the link using the slice() and indexOf() methods) of each movie in that category (querySelectorAll(), querySelector(), getAttribute(), textContent and trim() methods).

On each iteration step, we return the previous step result (using spread syntax) and add a new category with the name taken from categoryTitle:

const categoryTitle = block.querySelector(".kcen6d").textContent.trim();
const categorySubTitle = block.querySelector(".kMqehf")?.textContent.trim();
const movies = Array.from(block.parentElement.querySelectorAll(".ULeU3b")).map((movie) => {
  // Google Play class names change periodically; verify selectors with SelectorGadget
  const href = movie.querySelector("a")?.getAttribute("href");
  return {
    title: movie.querySelector(".Epkrse")?.textContent.trim(),
    price: movie.querySelector(".LrNMN.VfPpfd")?.textContent.trim() || "No price",
    originalPrice: movie.querySelector(".LrNMN.SUZt4c")?.textContent.trim(),
    thumbnail: movie.querySelector(".TjRVLb img")?.getAttribute("src"),
    video: movie.querySelector(".TjRVLb video")?.getAttribute("data-trailer-url"),
    link: `https://play.google.com${href}`,
    movieId: href?.slice(href.indexOf("id=") + 3),
  };
});
return { ...result, [categoryTitle]: { categorySubTitle, movies } };

Next, we write a function to control the browser and get the information:

async function getMainPageInfo() {...}

In this function, first, we launch the browser using the puppeteer.launch() method with the current options, such as headless: true and args: ["--no-sandbox", "--disable-setuid-sandbox"].

These options mean that we use headless mode and an array of arguments which we need to allow the launch of the browser process in the online IDE. And then we open a new page:

const browser = await puppeteer.launch({
  headless: true,
  args: ["--no-sandbox", "--disable-setuid-sandbox"],
});

const page = await browser.newPage();

Next, we change the default (30 sec) timeout for waiting for selectors to 60000 ms (1 min) to cope with a slow internet connection using the setDefaultNavigationTimeout() method, go to the URL with the goto() method and use the waitForSelector() method to wait until the selector has loaded:

await page.setDefaultNavigationTimeout(60000);
await page.goto(URL);
await page.waitForSelector(".oVnAB");

And finally, we wait until the page has been scrolled, save the movies data from the page in the movies constant, close the browser, and return the received data:

await scrollPage(page, ".T4LgNb");

const movies = await getMoviesFromPage(page);

await browser.close();

return movies;
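Optionally, if a selector throws because the page markup has changed, the browser can be left running. A minimal, hedged variant of the same function that always closes the browser (not part of the original code) could look like this:

// Hypothetical variant of getMainPageInfo() that closes the browser even on errors
async function getMainPageInfoSafe() {
  const browser = await puppeteer.launch({ headless: true, args: ["--no-sandbox", "--disable-setuid-sandbox"] });
  try {
    const page = await browser.newPage();
    await page.setDefaultNavigationTimeout(60000);
    await page.goto(URL);
    await page.waitForSelector(".oVnAB");
    await scrollPage(page, ".T4LgNb");
    return await getMoviesFromPage(page);
  } finally {
    await browser.close();
  }
}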

Now we can launch our parser:

$ node YOUR_FILE_NAME # YOUR_FILE_NAME is the name of your .js file
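If you would rather keep the results than print them, a minimal sketch using Node's built-in fs module (the movies.json filename is arbitrary) could replace the final .then() call:

const fs = require("fs");

getMainPageInfo().then((result) => {
  // Save the scraped categories and movies to a local JSON file
  fs.writeFileSync("movies.json", JSON.stringify(result, null, 2));
});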

Output

{
  "<categoryTitle>": {
    "categorySubTitle": "...",
    "movies": [
      {
        "title": "...",
        "price": "...",
        "originalPrice": "...",
        "thumbnail": "...",
        "video": "...",
        "link": "...",
        "movieId": "..."
      },
      ... and other movies
    ]
  },
  ... and other categories
}

Using Google Play Movies Store API from SerpApi

This section is to show the comparison between the DIY solution and our solution.

The biggest difference is that you don't need to create the parser from scratch and maintain it.

There's also a chance that the request might be blocked at some point by Google; we handle that on our backend, so there's no need to figure out how to do it yourself or to decide which CAPTCHA-solving or proxy provider to use.

First, we need to install google-search-results-nodejs:

  npm i google-search-results-nodejs  

Here's the full code example, if you don't need an explanation:

const SerpApi = require("google-search-results-nodejs");
const search = new SerpApi.GoogleSearch(process.env.API_KEY); // your API key from serpapi.com

const params = {
  engine: "google_play",                        // search engine
  store: "movies",                              // parameter defines the type of Google Play store
  movies_category: "MOVIE",                     // parameter defines the category of the store
  hl: "en",                                     // parameter defines the language to use
  gl: "us",                                     // parameter defines the country to use
};

const getJson = () => {
  return new Promise((resolve) => {
    search.json(params, resolve);
  });
};

const getResults = async () => {
  const json = await getJson();
  const moviesResults = json.organic_results.reduce((result, category) => {
    const { title: categoryTitle, items } = category;
    const movies = items.map((movie) => {
      const { title, link, rating, video = "No video preview", price = "No price", original_price, thumbnail } = movie;
      return { title, link, rating, video, price, original_price, thumbnail };
    });
    return { ...result, [categoryTitle]: movies };
  }, {});
  return moviesResults;
};

getResults().then((result) => console.dir(result, { depth: null }));

Code explanation

First, we need to declare SerpApi from the google-search-results-nodejs library and define the new search instance with your API key from SerpApi:

const SerpApi = require("google-search-results-nodejs");
const search = new SerpApi.GoogleSearch(API_KEY);

Next, we write the necessary parameters for making a request:

const params = {
  engine: "google_play",                        // search engine
  store: "movies",                              // parameter defines the type of Google Play store
  movies_category: "MOVIE",                     // parameter defines the category of the store
  hl: "en",                                     // parameter defines the language to use
  gl: "us",                                     // parameter defines the country to use
};

Next, we wrap the search method from the SerpApi library in a promise so we can work further with the search results:

const getJson = () => {
  return new Promise((resolve) => {
    search.json(params, resolve);
  });
};

And finally, we declare the function getResults that gets data from the page and returns it:

const getResults = async () => {...};

In this function, first, we receive json with the getJson function, then we need to iterate over the organic_results array in the received json. To do this we use the reduce() method (it allows us to build an object with the results). On each iteration step, we return the previous step result (using spread syntax) and add a new category with the name taken from categoryTitle:

const json = await getJson();
const moviesResults = json.organic_results.reduce((result, category) => {
  ...
}, {});
return moviesResults;

Next, we destructure the category element, rename title to the categoryTitle constant, and iterate over the items array to get all movies from this category. To do this we need to destructure the movie element, set the default value "No video preview" for video (because not all movies have a video preview) and "No price" for price, and return these constants:

const { title: categoryTitle, items } = category;
const movies = items.map((movie) => {
  const { title, link, rating, video = "No video preview", price = "No price", original_price, thumbnail } = movie;
  return { title, link, rating, video, price, original_price, thumbnail };
});

After that, we run the getResults function and print all the received information in the console with the console.dir method, which allows you to pass an object with the necessary parameters to change the default output options:

getResults().then((result) => console.dir(result, { depth: null }));

Output

{
  "<categoryTitle>": [
    {
      "title": "...",
      "link": "...",
      "rating": "...",
      "video": "...",
      "price": "...",
      "original_price": "...",
      "thumbnail": "..."
    },
    ... and other movies
  ],
  ... and other categories
}

If you want other functionality added to this blog post (e.g. extracting additional categories) or if you want to see some projects made with SerpApi, write me a message.

