Support websites redirecting to the same page when AllowURLRevisit is disabled #763
Merged
asciimoo merged 1 commit into gocolly:master on Apr 17, 2023
Some websites set a session cookie and respond with a redirect to
the same page instead of returning the content right away.
To illustrate the problem, here is what such an HTTP session
might look like:
GET / HTTP/1.1
Host: 127.0.0.1:34931
User-Agent: colly - https://github.com/gocolly/colly/v2
Accept: */*
Accept-Encoding: gzip

HTTP/1.1 302 Found
Content-Type: text/html; charset=utf-8
Location: /
Set-Cookie: session_id=1
Date: Mon, 10 Apr 2023 23:29:29 GMT
Content-Length: 24

<a href="/">Found</a>.

GET / HTTP/1.1
Host: 127.0.0.1:34931
User-Agent: colly - https://github.com/gocolly/colly/v2
Accept: */*
Cookie: session_id=1
Referer: http://127.0.0.1:34931/
Accept-Encoding: gzip

HTTP/1.1 200 OK
Date: Mon, 10 Apr 2023 23:29:29 GMT
Content-Length: 12
Content-Type: text/plain; charset=utf-8

hello world
This fixes a regression introduced in 0be3b71 by specifically
bypassing the revisit check when the current redirect destination equals
the original one.
WGH- added a commit to WGH-/colly that referenced this pull request on Mar 29, 2024:

This was "fixed" in b4ca6a7 (gocolly#763), but the fix turned out to be incomplete. That fix only allowed redirects leading to the same URL as the original destination, and didn't take into account more complicated cases, such as:
* www.example.com
* example.com
* (set cookie)
* example.com
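To see why the chain above slips past a check against the original destination only, compare two rules on that sequence. This is a hypothetical sketch: `sameAsOriginal` mirrors the narrower rule, while `seenInChain` is just one possible broader rule and is not claimed to be what the follow-up commit implements. The URLs are written with an assumed `http://` scheme.

```go
package main

import "fmt"

// sameAsOriginal mirrors the narrower rule: only a redirect back to the
// first URL in the chain is exempt from the revisit check.
func sameAsOriginal(chain []string, dest string) bool {
	return len(chain) > 0 && chain[0] == dest
}

// seenInChain is a hypothetical broader rule: a redirect to any URL that
// already appears in the current chain is exempt.
func seenInChain(chain []string, dest string) bool {
	for _, u := range chain {
		if u == dest {
			return true
		}
	}
	return false
}

func main() {
	// Requests made so far: www.example.com redirected to example.com,
	// which set a cookie and now redirects to itself.
	chain := []string{"http://www.example.com/", "http://example.com/"}
	dest := "http://example.com/"

	fmt.Println(sameAsOriginal(chain, dest)) // false: dest is not the original URL
	fmt.Println(seenInChain(chain, dest))    // true: dest already appears in the chain
}
```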
WGH- added a commit to WGH-/colly that referenced this pull request on Mar 29, 2024, carrying the same message (cherry picked from commit 02570f1).