Today we will learn how to build your own site's robots.txt and submit it to Google. It's a very important and much-needed step for good SEO on your site.
So Let's start...
What is robots.txt?
Robots.txt is a plain text file that spiders/crawlers read and interpret as a set of rules to follow. It addresses a particular spider/crawler by its UA (User Agent) and directs it where in the site to crawl and where not to.
Robots.txt must be saved in the site's root directory under the exact name robots.txt; any spider visiting your site should first look for http://yoursite.wapka.mobi/robots.txt before accessing the site.
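Because the file always lives at the root, its URL can be derived from any page URL on the site. As a rough sketch in Python (the helper name robots_url is my own, not part of any standard library):

```python
# Sketch: derive the robots.txt location from any page URL on a site.
# robots_url is a hypothetical helper written for this tutorial.
from urllib.parse import urlparse, urlunparse

def robots_url(page_url):
    parts = urlparse(page_url)
    # robots.txt always sits at the root of the host, regardless of the page.
    return urlunparse((parts.scheme, parts.netloc, "/robots.txt", "", "", ""))

print(robots_url("http://yoursite.wapka.mobi/some/deep/page"))
# → http://yoursite.wapka.mobi/robots.txt
```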
What is a spider/crawler?
Spiders, also called crawlers, are programs sent out by search engines to index pages and carry the results back to the search engine. (Spiders are also used by hackers to harvest email addresses for spamming.) Every browser and every spider/crawler has a unique User Agent that it uses while surfing the net.
What is a User Agent?
A User Agent is like an identity card that browsers and spiders present while surfing the net so that they can be recognised.
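As an illustration, here is how a script can announce a crawler-style User Agent with Python's standard urllib. The URL is the placeholder used in this tutorial, and the request is only constructed, never sent:

```python
# Sketch: attach a crawler-style User-Agent header to a request object.
# The request is only built here; nothing goes over the network.
import urllib.request

req = urllib.request.Request(
    "http://yoursite.wapka.mobi/",
    headers={"User-Agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"},
)
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
# → Googlebot/2.1 (+http://www.google.com/bot.html)
```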
I have gone through many Wapka sites and noticed that most of them cannot be indexed by the Google search engine.
WHY?
The reason lies in the default robots.txt file. Before a crawler/spider crawls a site, it first fetches robots.txt and checks which areas it may and may not crawl. The default Wapka robots.txt looks like this:
User-agent: Slurp
Disallow: /
User-agent: *
Disallow:
Crawl-delay: 60
Let me analyze them:
User-agent names the crawler: Slurp is Yahoo's bot, Googlebot is Google's, and so on.
User-agent: * addresses all spiders, while User-agent: Slurp or User-agent: Googlebot targets one particular spider.
Disallow: / means the crawler must not touch or crawl any page of your site.
Disallow: (with nothing after it) means the spider is free to access all your pages.
So
User-agent: Slurp
Disallow: /
means that Slurp (Yahoo's bot, not Google's) must not touch your site; a Disallow: / rule aimed at Googlebot, or at * with no exceptions, is what keeps a site out of Google Search.
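You can verify this behaviour with Python's standard urllib.robotparser. This sketch parses the default Wapka rules shown above and asks what each bot may fetch (the page path is arbitrary):

```python
# Sketch: parse the default Wapka robots.txt and check what each bot may fetch.
import urllib.robotparser

DEFAULT_RULES = """\
User-agent: Slurp
Disallow: /

User-agent: *
Disallow:
Crawl-delay: 60
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(DEFAULT_RULES.splitlines())

# Slurp (Yahoo) is shut out entirely; everyone else may crawl freely.
print(rp.can_fetch("Slurp", "http://yoursite.wapka.mobi/index"))      # → False
print(rp.can_fetch("Googlebot", "http://yoursite.wapka.mobi/index"))  # → True
```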
If you want your site to be crawled and visible on all search engines, then use this:
User-agent: *
Disallow:
Configuration of robots.txt
Log in to Wapka and select your site. Go to Edit Site >> Global Settings >> Head Tags (meta, style, ...).
Now click on Edit Robots File and paste the following into the textarea:
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
Sitemap: http://yoursite.wapka.mobi/sitemap.xml
Change the URL to your own site's URL. You can modify the file further with your own rules, but the robots.txt given above is a solid default for most sites.
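As a quick sanity check, the same urllib.robotparser approach confirms what the recommended file does: everything except /search is open to all bots, while Mediapartners-Google (the AdSense crawler) may fetch anything. The Sitemap line is omitted here since it does not affect crawl permissions:

```python
# Sketch: verify the recommended robots.txt with the standard library parser.
import urllib.robotparser

RECOMMENDED = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(RECOMMENDED.splitlines())

# Ordinary pages are open, /search is blocked for general bots,
# and the AdSense crawler is exempt from the /search block.
print(rp.can_fetch("Googlebot", "http://yoursite.wapka.mobi/page"))                   # → True
print(rp.can_fetch("Googlebot", "http://yoursite.wapka.mobi/search?q=x"))             # → False
print(rp.can_fetch("Mediapartners-Google", "http://yoursite.wapka.mobi/search?q=x"))  # → True
```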
Setting robots.txt on Google
Go to Google Webmaster Tools, select your site, and open the robots.txt Tester tool.
You will see a textarea there; paste your site's robots.txt code into it and click Test/Submit. Your site's robots.txt can be found at http://yoursite.wapka.mobi/robots.txt. A red alert on your robots.txt means the file contains an error; a green alert means the file is OK. Finished!
I hope you enjoyed the tutorial.