How to get the path of the current script using Node.js?
We can get the path of the current script in Node.js by using the __dirname and __filename module-scope variables.
Consider a project with the following structure: a DemoProject directory containing app.js, and a routes subdirectory containing user.js. The examples below illustrate the use of the __dirname and __filename module-scope variables in Node.js.

Example 1: Determine the path of the current script while executing the app.js file. app.js file:
Output:
Filename is D:\DemoProject\app.js
Directory name is D:\DemoProject

Example 2: Determine the path of the current script while executing the routes\user.js file. user.js file:
Output:
Filename is D:\DemoProject\routes\user.js
Directory name is D:\DemoProject\routes

If you use a site hosting service, such as Wix or Blogger, you might not need to (or be able to) edit your robots.txt file directly. Instead, your provider might expose a search settings page or some other mechanism to tell search engines whether or not to crawl your page. If you want to hide or unhide one of your pages from search engines, search for instructions about modifying your page visibility in search engines on your hosting service; for example, search for "wix hide page from search engines".

You can control which files crawlers may access on your site with a robots.txt file. A robots.txt file lives at the root of your site. So, for the site https://www.example.com/, the robots.txt file lives at https://www.example.com/robots.txt. robots.txt is a plain text file that follows the Robots Exclusion Standard. A robots.txt file consists of one or more rules. Each rule blocks or allows access for all or a specific crawler to a specified file path on the domain or subdomain where the robots.txt file is hosted. Unless you specify otherwise in your robots.txt file, all files are implicitly allowed for crawling.

Here is a simple robots.txt file with two rules:

User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml

Here's what that robots.txt file means:
1. The user agent named Googlebot is not allowed to crawl any URL that starts with https://www.example.com/nogooglebot/.
2. All other user agents are allowed to crawl the entire site.
3. The site's sitemap file is located at https://www.example.com/sitemap.xml.
See the useful robots.txt rules section later in this page for more examples.

Basic guidelines for creating a robots.txt file
Creating a robots.txt file and making it generally accessible and useful involves four steps:
1. Create a file named robots.txt.
2. Add rules to the robots.txt file.
3. Upload the robots.txt file to the root of your site.
4. Test the robots.txt file.
Create a robots.txt file
You can use almost any text editor to create a robots.txt file. For example, Notepad, TextEdit, vi, and emacs can create valid robots.txt files. Don't use a word processor; word processors often save files in a proprietary format and can add unexpected characters, such as curly quotes, which can cause problems for crawlers. Make sure to save the file with UTF-8 encoding if prompted during the save file dialog.

Format and location rules:
- The file must be named robots.txt.
- Your site can have only one robots.txt file.
- The robots.txt file must be located at the root of the site it applies to.
- The robots.txt file must be a UTF-8 encoded plain text file.
How to write robots.txt rules
Rules are instructions for crawlers about which parts of your site they can crawl. Follow these guidelines when adding rules to your robots.txt file:
- A robots.txt file consists of one or more groups of rules.
- Each group begins with a User-agent line naming the crawler the group applies to.
- By default, a user agent can crawl any page or directory that is not blocked by a disallow rule.
- Rules are case-sensitive.
Google's crawlers support the following rules in robots.txt files:
- user-agent: identifies the crawler that the group of rules applies to.
- disallow: a directory or page, relative to the root domain, that the user agent should not crawl.
- allow: a directory or page, relative to the root domain, that the user agent may crawl; used to override a disallow rule.
- sitemap: the fully-qualified URL of a sitemap for the site.
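Putting the four directives together (the paths and the sitemap URL below are illustrative, mirroring the example file earlier in this page):

```text
# Group for one specific crawler
User-agent: Googlebot
Disallow: /private/
Allow: /private/press/

# Group for every other crawler
User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml
```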
All rules, except sitemap, support the * wildcard for a path prefix, suffix, or entire string. Lines that don't match any of these rules are ignored. Read Google's page about its interpretation of the robots.txt specification for the complete description of each rule.

Upload the robots.txt file
Once you've saved your robots.txt file to your computer, you're ready to make it available to search engine crawlers. There's no one tool that can help you with this, because how you upload the robots.txt file to your site depends on your site and server architecture. Get in touch with your hosting company or search its documentation; for example, search for "upload files infomaniak". After you upload the robots.txt file, test whether it's publicly accessible and whether Google can parse it.

Test robots.txt markup
To test whether your newly uploaded robots.txt file is publicly accessible, open a private browsing window (or equivalent) in your browser and navigate to the location of the robots.txt file, for example https://example.com/robots.txt. If you see the contents of your robots.txt file, you're ready to test the markup. Google offers two options for testing robots.txt markup:
- the robots.txt report in Search Console, and
- Google's open-source robots.txt library.
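To make the wildcard behavior concrete, here is a small illustrative matcher, not Google's implementation, that converts a robots.txt path pattern into a regular expression: * matches any run of characters, a trailing $ anchors the pattern to the end of the path, and a pattern is otherwise a prefix match.

```javascript
// Illustrative robots.txt path matching (sketch only, not Google's matcher).
function pathMatches(pattern, path) {
  const anchored = pattern.endsWith("$");
  const body = anchored ? pattern.slice(0, -1) : pattern;
  // Escape regex metacharacters, then turn robots.txt '*' into regex '.*'.
  const escaped = body
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  const re = new RegExp("^" + escaped + (anchored ? "$" : ""));
  return re.test(path);
}

console.log(pathMatches("/*.gif$", "/images/cat.gif"));   // true
console.log(pathMatches("/*.gif$", "/images/cat.gifs"));  // false ($ anchors)
console.log(pathMatches("/junk/", "/junk/old.html"));     // true (prefix match)
```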
Submit robots.txt file to Google
Once you've uploaded and tested your robots.txt file, Google's crawlers will automatically find and start using it. You don't have to do anything. If you updated your robots.txt file and you need to refresh Google's cached copy as soon as possible, learn how to submit an updated robots.txt file.

Useful robots.txt rules
Here are some common useful robots.txt rules:

Disallow crawling of the entire website. Keep in mind that in some situations URLs from the website may still be indexed, even if they haven't been crawled. Note: this does not match the various AdsBot crawlers, which must be named explicitly.
User-agent: *
Disallow: /

Disallow crawling of a directory and its contents. Append a forward slash to the directory name to disallow crawling of a whole directory. Caution: don't use robots.txt to block access to private content; use proper authentication instead. URLs disallowed by the robots.txt file might still be indexed without being crawled, and the robots.txt file can be viewed by anyone, potentially disclosing the location of your private content.
User-agent: *
Disallow: /calendar/
Disallow: /junk/
Disallow: /books/fiction/contemporary/

Allow access to a single crawler. Only Googlebot-news may crawl the whole site.
User-agent: Googlebot-news
Allow: /

User-agent: *
Disallow: /

Allow access to all but a single crawler. Unnecessarybot may not crawl the site; all other bots may.
User-agent: Unnecessarybot
Disallow: /

User-agent: *
Allow: /

Disallow crawling of a single web page. For example, disallow the useless_file.html page located at https://example.com/useless_file.html, and other_useless_file.html in the junk directory.
User-agent: *
Disallow: /useless_file.html
Disallow: /junk/other_useless_file.html

Disallow crawling of the whole site except a subdirectory. Crawlers may only access the public subdirectory.
User-agent: *
Disallow: /
Allow: /public/

Block a specific image from Google Images. For example, disallow the dogs.jpg image.
User-agent: Googlebot-Image
Disallow: /images/dogs.jpg

Block all images on your site from Google Images. Google can't index images and videos without crawling them.
User-agent: Googlebot-Image
Disallow: /

Disallow crawling of files of a specific file type. For example, disallow crawling of all .gif files.
User-agent: Googlebot
Disallow: /*.gif$

Disallow crawling of an entire site, but allow Mediapartners-Google. This implementation hides your pages from search results, but the Mediapartners-Google web crawler can still analyze them to decide what ads to show visitors on your site.
User-agent: *
Disallow: /

User-agent: Mediapartners-Google
Allow: /
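Several of the rules above combine allow and disallow for the same URL (for instance, the "whole site except a subdirectory" example). Google documents that in such conflicts the most specific rule, i.e. the one with the longest matching path, wins. A hedged sketch of that tie-breaking logic, with wildcards omitted for brevity:

```javascript
// Illustrative only: pick the longest matching rule; prefer allow on ties.
// Rules are {type: "allow"|"disallow", path: string} using plain prefix matching.
function isAllowed(rules, path) {
  let best = { type: "allow", path: "" }; // default: everything is crawlable
  for (const rule of rules) {
    if (
      path.startsWith(rule.path) &&
      (rule.path.length > best.path.length ||
        (rule.path.length === best.path.length && rule.type === "allow"))
    ) {
      best = rule;
    }
  }
  return best.type === "allow";
}

// The "whole site except a subdirectory" example from above:
const rules = [
  { type: "disallow", path: "/" },
  { type: "allow", path: "/public/" },
];
console.log(isAllowed(rules, "/public/about.html"));  // true  (longer allow wins)
console.log(isAllowed(rules, "/private/notes.html")); // false (only "/" matches)
```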