@xiaokangwang Could you help review this? (I am worried about causing unexpected problems...)
- add value/attribute checker (check for missing space) - allow multiple spaces - sort attributes - improve readability
- remove unnecessary variable - improve readability
This reverts e640ac2. It is problematic, and I will implement a new one.
- refactor inclusion logic - add basic deduplicate
A domain rule is always added to the list corresponding to the filename it resides in. Additionally, you can now add affiliations to a domain rule, and the rule will also be added to each list named by an affiliation. Each affiliation begins with `&`, followed by the name of the affiliation. This helps us reduce the number of data files without compromising functionality, and avoids writing the same rule in different files.
only for domain/full subdomains without attr
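The affiliation syntax described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper name `targetLists` is hypothetical, and it assumes list names are used verbatim as written in the rule and the filename.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// targetLists returns every list a rule line would be added to:
// the list named after the file it resides in, plus one list per
// "&affiliation" token. Hypothetical helper, for illustration only.
func targetLists(filename, line string) []string {
	seen := map[string]bool{filename: true}
	for _, tok := range strings.Fields(line) {
		// An affiliation is "&" followed by a non-empty name.
		if strings.HasPrefix(tok, "&") && len(tok) > 1 {
			seen[tok[1:]] = true
		}
	}
	lists := make([]string, 0, len(seen))
	for name := range seen {
		lists = append(lists, name)
	}
	sort.Strings(lists) // deterministic order for display
	return lists
}

func main() {
	fmt.Println(targetLists("google", "youtube.com &youtube &category-entertainment"))
	// prints: [category-entertainment google youtube]
}
```

Using a set keyed by list name means that an affiliation repeating the file's own name does not add the rule twice.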
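It should be fairly complete now. Since there are no objections, I am merging it. Once merged, the deduplication feature can take effect right away. As for the rules, I will later use the new program to remove from the cn lists the entries with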
var (
	TypeChecker  = regexp.MustCompile(`^(domain|full|keyword|regexp|include)$`)
	ValueChecker = regexp.MustCompile(`^[a-z0-9!\.-]+$`)
	AttrChecker  = regexp.MustCompile(`^[a-z0-9!-]+$`)
	SiteChecker  = regexp.MustCompile(`^[A-Z0-9!-]+$`)
)
Parse. Don't validate.
Data validation is very important and I'm not against it. I'm just saying that this is the wrong and unsustainable way of doing it. Just google "parse, don't validate".
The first TypeChecker can easily be integrated into the parsing logic as normal code. The rest of the checks are not difficult to implement in proper code. You can ask an LLM to write it for you, with tests included.
I'm afraid I still don't understand what you mean. All of these checks are done in the func parseEntry (except the one that checks the filename in loadData, which runs even earlier). Every rule is validated when its plain source string is parsed into an Entry struct, at the start of generating refMap, so all Entry values are safe to use in the subsequent steps.
All rule types (full, domain, keyword, regexp, and even include at this stage) use the same Entry struct. It seems unnecessary to create distinct structs for different types of rules.
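For readers following the disagreement, here is a rough sketch of what validating while parsing into a single Entry struct could look like. It is not the PR's parseEntry: the struct fields, the default type of `domain` for a bare value, the whitespace splitting, and the decision to skip ValueChecker for regexp rules are all assumptions made for illustration. Only the three regular expressions are taken from the diff quoted above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Patterns copied from the quoted diff.
var (
	typeChecker  = regexp.MustCompile(`^(domain|full|keyword|regexp|include)$`)
	valueChecker = regexp.MustCompile(`^[a-z0-9!\.-]+$`)
	attrChecker  = regexp.MustCompile(`^[a-z0-9!-]+$`)
)

// Entry is a hypothetical shape shared by all rule types.
type Entry struct {
	Type  string
	Value string
	Attrs []string // tokens written as @name
	Affs  []string // tokens written as &name
}

// parseEntry turns one source line into an Entry or returns an error,
// so callers never see an unchecked rule.
func parseEntry(line string) (Entry, error) {
	fields := strings.Fields(line) // tolerates multiple spaces
	if len(fields) == 0 {
		return Entry{}, fmt.Errorf("empty rule")
	}
	// Assumption: a value without a "type:" prefix is a domain rule.
	e := Entry{Type: "domain", Value: fields[0]}
	if typ, val, ok := strings.Cut(fields[0], ":"); ok {
		e.Type, e.Value = typ, val
	}
	if !typeChecker.MatchString(e.Type) {
		return Entry{}, fmt.Errorf("invalid type %q", e.Type)
	}
	if e.Type == "regexp" {
		if _, err := regexp.Compile(e.Value); err != nil {
			return Entry{}, fmt.Errorf("invalid regexp %q: %w", e.Value, err)
		}
	} else if !valueChecker.MatchString(e.Value) {
		// Also catches a missing space, as in "example.com@ads".
		return Entry{}, fmt.Errorf("invalid value %q", e.Value)
	}
	for _, tok := range fields[1:] {
		switch {
		case strings.HasPrefix(tok, "@") && attrChecker.MatchString(tok[1:]):
			e.Attrs = append(e.Attrs, tok[1:])
		case strings.HasPrefix(tok, "&") && attrChecker.MatchString(tok[1:]):
			e.Affs = append(e.Affs, tok[1:])
		default:
			return Entry{}, fmt.Errorf("invalid token %q", tok)
		}
	}
	return e, nil
}

func main() {
	e, err := parseEntry("full:www.example.com  @ads &category-ads")
	fmt.Printf("%+v %v\n", e, err)
	_, err = parseEntry("example.com@ads")
	fmt.Println(err)
}
```

In this shape the regular expressions still do the character-level checks, but they run inside the parser, and the only way to obtain an Entry is through a successful parse.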
var (
	refMap    = make(map[string][]*Entry)
	plMap     = make(map[string]*ParsedList)
	finalMap  = make(map[string][]*Entry)
	cirIncMap = make(map[string]bool) // Used for circular inclusion detection
)
The use of maps here does not feel justified to me, at least most of them.
I also don't like that we are now stacking up global variables.
I am a golang beginner with little experience. I don't have much of a preference about global variables or code style; I just want to get things done. There seems to be no factually wrong logic or results. If you want to make this code better or more elegant, just open new PRs (no offense) @database64128
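This implements the ideas I described here.

`include:filename @attr1 @attr2 @-attr3 @-attr4` means: include the rules in filename that have `@attr1` and `@attr2` and have neither `@attr3` nor `@attr4`. More practical uses:
- Writing `include:bytedance @-!cn` in `geolocation-cn` will no longer pull domains carrying the `@!cn` attribute, such as tiktok, into `geolocation-cn`.
- `category-ads` can use `include:xxx @ads` directly, and the `xxx-ads` files currently in the data directory can then be merged into `xxx`, reducing the number of files.

Affiliations use a format similar to attributes but with the `&` symbol, so they do not conflict with attributes, and a rule can have several. The rule is additionally added to the list corresponding to each affiliation. For example, writing `youtube.com &youtube &category-entertainment` in `data/google` adds `[domain:]youtube.com` to all three of `geosite:google`, `geosite:youtube`, and `geosite:category-entertainment`.

This reduces the number of files in data without affecting the final `dlc.dat`, and avoids having to repeat the same rule in several files, which is hard to manage (a sub-application can be merged into its parent company's file, such as youtube into google; files with few rules can be merged into the parent category). Even if no separate file `data/affiliation` exists in the data directory, `geosite:affiliation` can still be used.

- Simple deduplication: only one copy of fully identical rules is kept. All rule types are supported.
- Advanced deduplication: redundant subdomain rules of type full/domain without attributes are removed. For example, when `domain:example.org` exists in the same list, `a.b.c.example.org` is no longer included. Only full/domain rules are supported. To keep the `@ads` mechanism working, domains with attributes are not removed. To improve efficiency (or to ensure compatibility?), domains with only two levels are not removed: even if the list contains `domain:cn`, `domain:example.cn` is still kept. This can be changed if it is not needed.

Better deduplication reduces the size of the generated binary file and can improve runtime efficiency (although the effect is small). It also makes it possible to add fine-grained yet redundant entries like those in #3097.

The other main change is format checking for type/value/attributes/affiliations (the regex check was already added some time ago). For the rest, see the commit messages.

These changes were written by hand, line by line, under Gemini's guidance, and I have tested them myself many times. However, my golang skills are limited and I am learning as I go, so there may be problems.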
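The advanced deduplication rule described above could be sketched along these lines. This is a simplified illustration under stated assumptions, not the PR's implementation: the `Entry` fields and function names are hypothetical, only attribute-free `domain:` rules are treated as covering parents, and every strict parent suffix is checked.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry is a hypothetical, reduced rule shape for this sketch.
type Entry struct {
	Type  string
	Value string
	Attrs []string
}

// dedupeSubdomains drops attribute-free full/domain rules whose value
// is a strict subdomain of an attribute-free domain rule in the same list.
func dedupeSubdomains(entries []Entry) []Entry {
	parents := make(map[string]bool)
	for _, e := range entries {
		if e.Type == "domain" && len(e.Attrs) == 0 {
			parents[e.Value] = true
		}
	}
	kept := make([]Entry, 0, len(entries))
	for _, e := range entries {
		if !isRedundant(e, parents) {
			kept = append(kept, e)
		}
	}
	return kept
}

func isRedundant(e Entry, parents map[string]bool) bool {
	// Only full/domain rules without attributes are candidates.
	if (e.Type != "domain" && e.Type != "full") || len(e.Attrs) > 0 {
		return false
	}
	// Domains with only two levels (one dot) are always kept.
	if strings.Count(e.Value, ".") < 2 {
		return false
	}
	// Walk up the parent chain: a.b.example.org -> b.example.org -> ...
	rest := e.Value
	for {
		_, parent, ok := strings.Cut(rest, ".")
		if !ok {
			return false
		}
		if parents[parent] {
			return true
		}
		rest = parent
	}
}

func main() {
	list := []Entry{
		{Type: "domain", Value: "example.org"},
		{Type: "full", Value: "a.b.c.example.org"},
		{Type: "domain", Value: "ads.example.org", Attrs: []string{"ads"}},
		{Type: "domain", Value: "cn"},
		{Type: "domain", Value: "example.cn"},
	}
	for _, e := range dedupeSubdomains(list) {
		fmt.Println(e.Type + ":" + e.Value)
	}
}
```

In this example only `full:a.b.c.example.org` is dropped: the `@ads` rule is kept because it carries an attribute, and `domain:example.cn` is kept because it has only two levels.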