WebsiteBaker Community Forum, English, Help & Support (Moderators: Argos, badknight)
Help please, sitemap crawler encountered error 403 and duplicate page
wolga « on: August 17, 2011, 11:37:43 AM »
Hi to all,
My first post and I'm already complaining, sorry, but here goes.
WB was installed in a subfolder, and links were made from the main site to the new installation. Navigating the site is no problem. The problem appears when indexing, first with A1 Sitemap Generator and then with an online crawler.
1. A1 Sitemap Generator encountered an error '403 Forbidden' when it hit the folder /mywebsitebaker. I checked the folder permissions and they all seem OK:
mydomain.com/mywebsitebaker/ = 755
mydomain.com/mywebsitebaker/pages/ = 755
mydomain.com/mywebsitebaker/pages/mycategory1/ = 755
All files have permission 644, and as mentioned, browsing is ok.
The robots file has this entry: Allow: /mywebsitebaker/pages/mycategory1/
A possible cause, and I'm just guessing, is that the Home page is just a link to the domain's home page. That way, a user who is on /mywebsitebaker can go back to the main page.
- Is there a problem of not using WB's home page?
- If installed as a folder does it require another setting in .htaccess? No changes were made in all .htaccess files.
2. So I tested with an online indexer, which managed to crawl beyond /mywebsitebaker, but it found duplicate titles 'myarticle1'. The pages have the URLs '.../myarticle1.php?' and '.../myarticle1.php?page=print'.
Maybe I clicked the print button while testing, but does WB keep a copy of the printed page someplace? Why does it keep a copy with 'page=print'? If many users click this button, what could possibly happen?
Here is some other information:
WB version 2.8.2 Revision 1480
Apache version 2.2.19
PHP version 5.3.6
MySQL version 5.1.56
Appreciate all help, thanks.
DarkViper (Development Team) « Reply #1 on: August 17, 2011, 12:23:17 PM »
A little question:
why do you need an external sitemap generator?
Reading the instructions and thinking for yourself is exhausting... I'd rather let others do the thinking for me...
In 1984: Nineteen Eighty-Four is an unrealistic utopia!!
In 2012: Nineteen Eighty-Four is only a small piece of our reality!!
wolga « Reply #2 on: August 17, 2011, 12:55:46 PM »
Quote from: DarkViper on August 17, 2011, 12:23:17 PM
A little question:
why do you need an external sitemap generator?
I've been used to using them with previous CMSs, especially the free online crawlers, to generate the XML file for Google.
OK, so I just found a sitemap module for WB, but I can't figure out how to use it. WB is installed in a folder, and I can't figure out how to generate a sitemap for the whole site.
NorHei (Forum administrator) « Reply #3 on: August 17, 2011, 01:00:53 PM »
1. There are many sitemap generators on the web that are more or less useless.
2. What happens if you call "http://yourdomain.com/mywebsitebaker/" manually?
(Please use the exact URL; many browsers tend to autocomplete.)
3. As the sitemap generator should only read what it gets over the web, the user rights for directories are irrelevant.
4. Have you tried deleting or renaming robots.txt for testing?
5. Maybe there is a setting in the sitemap generator to follow redirects? WB tends to redirect a lot.
6.
Quote
Maybe I clicked the print button while testing, but does WB keep a copy of the printed page someplace? Why does it keep a copy with 'page=print'. If many users click this button what could possibly happen?
The print button of your browser? I cannot remember WB having a printout function. If it was the print button of your browser, don't worry.
7. If it's not a top secret page, it would be helpful to take a look.
8. A few words about the sitemap module after I've had lunch.
It is easier to change the specification to fit the program than vice versa.
wolga « Reply #4 on: August 17, 2011, 02:23:12 PM »
2. What happens if you call "http://yourdomain.com/mywebsitebaker/" manually?
(Please use the exact URL; many browsers tend to autocomplete.)
-> It goes to the home page.
4. Have you tried deleting or renaming robots.txt for testing?
-> I renamed the robots file and it is scanning now. I'll post the result later.
5. Maybe there is a setting in the sitemap generator to follow redirects? WB tends to redirect a lot.
-> Not sure about this. I will go through the manual to see what it does with redirects.
6. The print button of your browser? I cannot remember WB having a printout function. If it was the print button of your browser, don't worry.
-> Sorry, I didn't notice that. I thought the print button was there by default. I'm using the template 'danfuh-business02'. Here's what it looks like, with the print button at the bottom:
http://websitebaker.at/wb-templates/template-danfuh-business02.html
7. If it's not a top secret page, it would be helpful to take a look.
-> It's under construction and it's awful. Shame on me. I sent you a PM.
wolga « Reply #5 on: August 17, 2011, 03:05:26 PM »
Finished crawling the site. After removing (renaming) robots.txt, the crawler finally managed to access all pages. Thanks, NorHei!
Now I have added 3 entries to robots.txt and restarted the scan.
Allow: /mywebsitebaker/
Allow: /mywebsitebaker/pages/
Allow: /mywebsitebaker/pages/mycategory1/
I hope that will allow crawlers to pass through.
NorHei (Forum administrator) « Reply #6 on: August 17, 2011, 03:16:40 PM »
OK, let's see...
Concerning the duplicate pages:
"Dan fuh" produces a second page for printing, that's right. But it's not a really big problem, as Google will choose one of those pages and ignore (rank lower) the other, since both reside on the same domain.
If you want to avoid this, you can replace or delete the print button.
Maybe you can replace it with a JavaScript print button (link):
Code:
<a href="#" onclick="window.print();return false;">print</a>
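If you want to check a crawler report for such print duplicates yourself, a small script can group the reported URLs by the page they point to. This is only a minimal sketch in Python: the example.com URLs are placeholders, and it assumes that `page=print` is the only parameter the template's print button adds.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: the print view differs from the normal page only by this parameter.
PRINT_PARAMS = {("page", "print")}


def canonical_url(url):
    """Return the URL without the print parameter, an empty '?', or a fragment."""
    parts = urlsplit(url)
    kept = [pair for pair in parse_qsl(parts.query) if pair not in PRINT_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_duplicates(urls):
    """Map each canonical URL to the crawled variants that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[canonical_url(url)].append(url)
    # Keep only pages that were reached under more than one URL.
    return {canon: variants for canon, variants in groups.items() if len(variants) > 1}


# Placeholder URLs standing in for a crawler report.
crawled = [
    "http://example.com/pages/myarticle1.php?",
    "http://example.com/pages/myarticle1.php?page=print",
    "http://example.com/pages/myarticle2.php",
]
duplicates = group_duplicates(crawled)
for canon, variants in duplicates.items():
    print(canon, "<-", variants)
```

Only the first article is listed, because it is the only page reached under two different URLs.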
About the robots.txt:
If it's scanning now, I guess the crawling problem was a robots.txt problem.
Let's see...
Quote
The robots file has this entry: Allow: /mywebsitebaker/pages/mycategory1/
The start page of your site is at http://yourdomain.com/mywebsitebaker/, or better http://yourdomain.com/mywebsitebaker/index.php.
If only /mywebsitebaker/pages/mycategory1/ is allowed, the crawler cannot read the start page.
Even if only /mywebsitebaker/pages/ is allowed, it cannot read it.
So I guess you have to allow /mywebsitebaker/ and disallow all subfolders it should not index.
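Rules like these can be tried out offline before uploading them. The following is a minimal sketch using Python's standard `urllib.robotparser`. It assumes the original file also contained a `Disallow: /` line (the thread only quotes the `Allow` entry), and the page paths, the `private` subfolder and the crawler name `ExampleCrawler` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_lines, url, agent="ExampleCrawler"):
    """Parse robots.txt lines and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# Hypothetical page paths inside the installation.
START = "http://yourdomain.com/mywebsitebaker/index.php"
ARTICLE = "http://yourdomain.com/mywebsitebaker/pages/mycategory1/myarticle1.php"
PRIVATE = "http://yourdomain.com/mywebsitebaker/private/notes.php"

# Assumption: one category is allowed and everything else is disallowed.
narrow = [
    "User-agent: *",
    "Allow: /mywebsitebaker/pages/mycategory1/",
    "Disallow: /",
]

# Allow the whole installation, but list the subfolders to keep out first,
# because this parser applies the first rule that matches.
wide = [
    "User-agent: *",
    "Disallow: /mywebsitebaker/private/",
    "Allow: /mywebsitebaker/",
    "Disallow: /",
]

for name, rules in (("narrow", narrow), ("wide", wide)):
    for url in (START, ARTICLE, PRIVATE):
        print(name, url, allowed(rules, url))
```

With the narrow rules the start page is blocked even though the category is allowed. Note that Python's parser takes the first matching rule in file order; other crawlers may resolve conflicts between Allow and Disallow differently, so treat the result as a hint rather than a guarantee.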
NorHei (Forum administrator) « Reply #7 on: August 17, 2011, 03:34:59 PM »
About the XML sitemap tool for WB: I found it.
http://www.websitebaker2.org/forum/index.php/topic,1751.msg129672.html#msg129672
If you are using this, just put it in the same location as your config.php, name it as you like (with the extension .php), and submit it to Google. "sitemap.php", for example.
When you submit it to Google, it generates an up-to-date sitemap at the moment Google loads the file. You can test it by calling it directly and then choosing "show source code" in your browser.
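To give an idea of what you should see in the source view, here is a minimal sketch of how such an on-the-fly generator might assemble its output. It is written in Python and is not the linked WB script; the page list is hypothetical (a real generator would read the pages from the CMS database), and only the `urlset`/`url`/`loc`/`lastmod` structure follows the sitemaps.org format.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a sitemap document from (url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical pages and dates, for illustration only.
pages = [
    ("http://yourdomain.com/mywebsitebaker/index.php", "2011-08-17"),
    ("http://yourdomain.com/mywebsitebaker/pages/mycategory1/myarticle1.php", "2011-08-17"),
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Because the document is built each time it is requested, it always reflects the current page list, which is the point of serving the sitemap from a script instead of a static file.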
One last thing: please avoid phrases like "Please help" and "Help me" in your threads. Many people just scan the latest posts quickly and decide whether there is something they can help with. If there is something like "Help me" at the beginning of the thread, the main question is often overlooked. Some people even ignore posts with such statements, since they are considered impolite: most people in the support area need help. Thanks in advance, and keep asking if you need help.
wolga « Reply #8 on: August 17, 2011, 04:14:38 PM »
Quote from: NorHei on August 17, 2011, 03:16:40 PM
If you want to avoid this, you can replace or delete the print button.
Maybe you can replace it with a JavaScript print button (link):
Code:
<a href="#" onclick="window.print();return false;">print</a>
OK, deleting the button is preferable, as it would avoid possible future complications.
Quote
About the XML sitemap tool for WB: I found it.
http://www.websitebaker2.org/forum/index.php/topic,1751.msg129672.html#msg129672
If you are using this, just put it in the same location as your config.php, name it as you like (with the extension .php), and submit it to Google. "sitemap.php", for example.
When you submit it to Google, it generates an up-to-date sitemap at the moment Google loads the file. You can test it by calling it directly and then choosing "show source code" in your browser.
Quote
So I guess you have to allow /mywebsitebaker/ and disallow all subfolders it should not index.
Very good point.
I'll go through your advice, testing one step at a time with different entries in robots.txt, and I'll try to set up the sitemap.php.
Thanks very much