can_fetch() returns TRUE ...

Hey,

while integrating spiderbar's `can_fetch()` into the robotstxt package I encountered a test case where `can_fetch()` and `paths_allowed(check_method="robotstxt")` return different results.

Consider the following robots.txt file:

```text
User-agent: UniversalRobot/1.0
User-agent: mein-Robot
Disallow: /quellen/dtd/

User-agent: *
Disallow: /unsinn/
Disallow: /temp/
Disallow: /newsticker.shtml
```

Now try this: 

``` r
library(robotstxt)

rtxt <- "# robots.txt zu http://www.example.org/\n\nUser-agent: UniversalRobot/1.0\nUser-agent: mein-Robot\nDisallow: /quellen/dtd/\n\nUser-agent: *\nDisallow: /unsinn/\nDisallow: /temp/\nDisallow: /newsticker.shtml"

paths_allowed(
  paths          = "/temp/some_file.txt", 
  robotstxt_list = list(rtxt), 
  check_method   = "robotstxt",
  bot            = "*"
)
#> [1] FALSE

paths_allowed(
  paths          = "/temp/some_file.txt", 
  robotstxt_list = list(rtxt), 
  check_method   = "spiderbar",
  bot            = "*"
)
#> [1] FALSE

paths_allowed(
  paths          = "/temp/some_file.txt", 
  robotstxt_list = list(rtxt), 
  check_method   = "robotstxt",
  bot            = "mein-Robot"
)
#> [1] FALSE

paths_allowed(
  paths          = "/temp/some_file.txt", 
  robotstxt_list = list(rtxt), 
  check_method   = "spiderbar",
  bot            = "mein-Robot"
)
#> [1] TRUE
```
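
To rule out the robotstxt wrapper as the source of the difference, the same check can be run against spiderbar directly. This is a minimal sketch, assuming `robxp()` accepts the file content as a single string and `can_fetch()` takes `path` and `user_agent` arguments. If the difference originates in spiderbar itself, the second call should also return `TRUE`.

``` r
library(spiderbar)

# Same robots.txt content as in the example above
rtxt <- "# robots.txt zu http://www.example.org/\n\nUser-agent: UniversalRobot/1.0\nUser-agent: mein-Robot\nDisallow: /quellen/dtd/\n\nUser-agent: *\nDisallow: /unsinn/\nDisallow: /temp/\nDisallow: /newsticker.shtml"

# Parse the robots.txt content once
rt <- robxp(rtxt)

# Check the same path for the wildcard agent and for the named bot
can_fetch(rt, path = "/temp/some_file.txt", user_agent = "*")
can_fetch(rt, path = "/temp/some_file.txt", user_agent = "mein-Robot")
```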

**`can_fetch()` seems to ignore the rules that ought to apply to all bots (the `User-agent: *` group) when a specific bot name / user agent is used.**
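
The two results correspond to two different ways of selecting rule groups for a named bot. The base R sketch below is a toy model of that difference, not the implementation of either package: `parse_groups()` and `allowed()` are hypothetical helpers, only `User-agent` and `Disallow` lines are handled, and matching is plain prefix matching.

``` r
# Hypothetical helper: split robots.txt text into groups of user agents
# and their Disallow prefixes (other fields are ignored).
parse_groups <- function(txt) {
  lines <- trimws(strsplit(txt, "\n", fixed = TRUE)[[1]])
  lines <- lines[lines != "" & !startsWith(lines, "#")]
  groups <- list()
  in_rules <- TRUE  # TRUE means the next User-agent line opens a new group
  for (ln in lines) {
    field <- tolower(trimws(sub(":.*$", "", ln)))
    value <- trimws(sub("^[^:]*:", "", ln))
    if (field == "user-agent") {
      if (in_rules) {
        groups[[length(groups) + 1]] <- list(agents = character(), disallow = character())
        in_rules <- FALSE
      }
      g <- length(groups)
      groups[[g]]$agents <- c(groups[[g]]$agents, tolower(value))
    } else if (field == "disallow" && length(groups) > 0) {
      in_rules <- TRUE
      g <- length(groups)
      if (nzchar(value)) groups[[g]]$disallow <- c(groups[[g]]$disallow, value)
    }
  }
  groups
}

# Hypothetical helper: merge_wildcard = TRUE also applies the "*" group to a
# named bot; FALSE uses the "*" group only when no group names the bot.
allowed <- function(groups, path, bot, merge_wildcard) {
  bot <- tolower(bot)
  specific <- Filter(function(g) bot %in% g$agents, groups)
  wildcard <- Filter(function(g) "*" %in% g$agents, groups)
  if (length(specific) == 0) {
    chosen <- wildcard
  } else if (merge_wildcard) {
    chosen <- c(specific, wildcard)
  } else {
    chosen <- specific
  }
  rules <- as.character(unlist(lapply(chosen, function(g) g$disallow)))
  !any(startsWith(path, rules))
}

rtxt <- paste(c(
  "User-agent: UniversalRobot/1.0",
  "User-agent: mein-Robot",
  "Disallow: /quellen/dtd/",
  "",
  "User-agent: *",
  "Disallow: /unsinn/",
  "Disallow: /temp/",
  "Disallow: /newsticker.shtml"
), collapse = "\n")

groups <- parse_groups(rtxt)
path   <- "/temp/some_file.txt"

allowed(groups, path, "mein-Robot", merge_wildcard = TRUE)
#> [1] FALSE
allowed(groups, path, "mein-Robot", merge_wildcard = FALSE)
#> [1] TRUE
```

In this model, `merge_wildcard = TRUE` reproduces the `check_method = "robotstxt"` result and `merge_wildcard = FALSE` reproduces the `check_method = "spiderbar"` result for `mein-Robot`.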


