
Refactor main.go #3119

Merged
MkQtS merged 12 commits into master from maingo
Jan 20, 2026


Contributor

@MkQtS MkQtS commented Dec 31, 2025

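  1. Improve support for selective inclusion

This implements an idea I proposed earlier: include:filename @attr1 @attr2 @-attr3 @-attr4 includes the rules in filename that carry @attr1 and @attr2 and carry neither @attr3 nor @attr4.

More practical uses: writing include:bytedance @-!cn in geolocation-cn no longer pulls domains with the @!cn attribute, such as tiktok, into geolocation-cn. category-ads can use include:xxx @ads directly, so the xxx-ads files currently in the data directory can be merged into xxx, reducing the number of files.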
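The matching rule described above can be sketched as follows. This is only an illustration, not the code in main.go: the `Entry` type, `matchFilters`, `selectEntries`, and the sample data are hypothetical, and attributes are assumed to be stored without the leading `@`.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry is a hypothetical, simplified rule: a value plus its attributes
// (stored without the leading "@").
type Entry struct {
	Value string
	Attrs []string
}

// hasAttr reports whether e carries the attribute name.
func hasAttr(e Entry, name string) bool {
	for _, a := range e.Attrs {
		if a == name {
			return true
		}
	}
	return false
}

// matchFilters reports whether e satisfies every filter.
// "@x" requires attribute x; "@-x" requires that x is absent.
func matchFilters(e Entry, filters []string) bool {
	for _, f := range filters {
		f = strings.TrimPrefix(f, "@")
		if strings.HasPrefix(f, "-") {
			if hasAttr(e, f[1:]) {
				return false
			}
		} else if !hasAttr(e, f) {
			return false
		}
	}
	return true
}

// selectEntries returns the entries that satisfy all filters.
func selectEntries(entries []Entry, filters []string) []Entry {
	var out []Entry
	for _, e := range entries {
		if matchFilters(e, filters) {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	// Made-up sample data standing in for a data file.
	list := []Entry{
		{Value: "bytedance.com"},
		{Value: "tiktok.com", Attrs: []string{"!cn"}},
	}
	// Equivalent of "include:bytedance @-!cn".
	for _, e := range selectEntries(list, []string{"@-!cn"}) {
		fmt.Println(e.Value)
	}
}
```

An empty filter list matches every entry, which corresponds to a plain include:filename.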

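  2. Add affiliation syntax

The format is similar to attributes but uses the & symbol, so it does not conflict with attributes, and a rule may have several affiliations. The rule is additionally added to the list named by each affiliation.

For example, writing youtube.com &youtube &category-entertainment in data/google adds [domain:]youtube.com to all three of geosite:google, geosite:youtube, and geosite:category-entertainment.

This reduces the number of files in data without affecting the final dlc.dat file, and it avoids repeating the same rule in several files, which is hard to manage (a sub-application can be merged into its parent company's file, e.g. youtube into google; files with few rules can be merged into their parent category). Even if the data directory has no separate file data/affiliation, geosite:affiliation still works.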
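A minimal sketch of how a rule with affiliations fans out into several lists. It is not the merged implementation: `addRule` and the `lists` map are hypothetical names, attributes are ignored, and a rule without an explicit type is assumed to default to `domain:`, as the `[domain:]` notation above suggests.

```go
package main

import (
	"fmt"
	"strings"
)

// addRule adds the rule on line to the list named after its file and to
// every list named by an "&affiliation" token. Tokens may be separated
// by any amount of whitespace. Attributes ("@...") are ignored here.
func addRule(lists map[string][]string, file, line string) {
	fields := strings.Fields(line)
	if len(fields) == 0 {
		return
	}
	rule := fields[0]
	if !strings.Contains(rule, ":") {
		rule = "domain:" + rule // assumed default type
	}
	lists[file] = append(lists[file], rule)
	for _, f := range fields[1:] {
		if strings.HasPrefix(f, "&") && len(f) > 1 {
			name := f[1:]
			lists[name] = append(lists[name], rule)
		}
	}
}

func main() {
	lists := make(map[string][]string)
	addRule(lists, "google", "youtube.com &youtube &category-entertainment")
	for _, name := range []string{"google", "youtube", "category-entertainment"} {
		fmt.Println(name, lists[name])
	}
}
```

Because the target lists are created on demand in the map, no file named after the affiliation needs to exist.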

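  3. Support rule deduplication

Basic deduplication: of completely identical rules, only one is kept. This works for all rule types.

Advanced deduplication: redundant subdomain rules of type full/domain that carry no attributes are removed. For example, when a list contains domain:example.org, it will no longer include a.b.c.example.org.

Only full/domain rules are supported. To keep the @ads mechanism working, domains with attributes are not removed. For efficiency (and perhaps compatibility?), domains with only two levels are not removed: even if the list contains domain:cn, domain:example.cn is still kept. This can be changed if it is not needed.

Better deduplication reduces the size of the generated binary file and can improve runtime efficiency (though the effect is small). It also makes it possible to add fine-grained but redundant entries like those in #3097.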
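The two deduplication passes can be sketched roughly as below. This is an assumption-laden illustration rather than the merged code: `Entry`, `dedup`, and `redundant` are hypothetical names, attributes are assumed to be already sorted, and only strict parent domains are considered when looking for a covering domain: rule.

```go
package main

import (
	"fmt"
	"strings"
)

// Entry is a hypothetical, simplified rule.
type Entry struct {
	Type  string // "domain", "full", "keyword" or "regexp"
	Value string
	Attrs []string // assumed to be sorted already
}

// redundant reports whether e is an attribute-free domain/full rule with
// more than two labels whose strict parent is already listed as an
// attribute-free domain: rule.
func redundant(e Entry, domains map[string]bool) bool {
	if (e.Type != "domain" && e.Type != "full") || len(e.Attrs) > 0 {
		return false
	}
	labels := strings.Split(e.Value, ".")
	if len(labels) <= 2 {
		return false // two-level domains are always kept
	}
	for i := 1; i < len(labels); i++ {
		if domains[strings.Join(labels[i:], ".")] {
			return true
		}
	}
	return false
}

// dedup drops exact duplicates, then drops redundant subdomain rules.
func dedup(entries []Entry) []Entry {
	seen := make(map[string]bool)
	domains := make(map[string]bool)
	var uniq []Entry
	for _, e := range entries {
		key := e.Type + ":" + e.Value + "@" + strings.Join(e.Attrs, ",")
		if seen[key] {
			continue
		}
		seen[key] = true
		uniq = append(uniq, e)
		if e.Type == "domain" && len(e.Attrs) == 0 {
			domains[e.Value] = true
		}
	}
	var out []Entry
	for _, e := range uniq {
		if !redundant(e, domains) {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	list := []Entry{
		{Type: "domain", Value: "example.org"},
		{Type: "domain", Value: "example.org"},
		{Type: "full", Value: "a.b.c.example.org"},
		{Type: "domain", Value: "ads.example.org", Attrs: []string{"ads"}},
	}
	for _, e := range dedup(list) {
		fmt.Println(e.Type, e.Value, e.Attrs)
	}
}
```

Collecting the attribute-free domain: rules in the first pass keeps the result independent of the order in which rules appear in the list.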

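  4. Other improvements

Mainly, format checks were added for type/value/attributes/affiliations (the regex check was already added a while ago). See the commit messages for the rest.

These changes were written by hand, line by line, under the guidance of Gemini, and I have tested them many times myself. However, my golang skills are limited and I learned as I went, so there may be problems.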

@MkQtS MkQtS requested a review from Loyalsoldier December 31, 2025 14:08
@MkQtS MkQtS force-pushed the maingo branch 2 times, most recently from 88a7447 to cbd681a Compare January 9, 2026 06:58
Contributor Author

MkQtS commented Jan 9, 2026

@xiaokangwang Could you help to review this? (I am worried about causing any unexpected problems...)

@DeepChirp DeepChirp mentioned this pull request Jan 11, 2026
@MkQtS MkQtS changed the title Refine partial include Refactor main.go Jan 11, 2026
MkQtS added 12 commits January 20, 2026 15:35
- add value/attribute checker(check missing space)
- allow multiple spaces
- sort attributes
- improve readability
- remove unnecessary variable
- improve readability
This reverts e640ac2

It is problematic and I will implement a new one
- refactor inclusion logic
- add basic deduplicate
A domain rule is always added to the list corresponding to the filename
it resides in. Additionally, you can now add affiliations to a domain
rule, and the rule will be added to the list specified by the
affiliation. Each affiliation begins with `&`, followed by the name
of the affiliation.

This helps us reduce the number of data files without compromising
functionality, and avoids writing the same rule in different files.
only for domain/full subdomains without attr
Contributor Author

MkQtS commented Jan 20, 2026

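It should be fairly complete now. Since there are no objections, I am merging it. Deduplication will take effect right after the merge. As for the rules, I will later use the new program to remove rules with the @!cn attribute from the various cn lists (no other repository file changes for now; let's first see whether anything breaks).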

@MkQtS MkQtS merged commit ec95fed into master Jan 20, 2026
1 check passed
@MkQtS MkQtS deleted the maingo branch January 20, 2026 12:58
Comment on lines +32 to +37
var (
	TypeChecker  = regexp.MustCompile(`^(domain|full|keyword|regexp|include)$`)
	ValueChecker = regexp.MustCompile(`^[a-z0-9!\.-]+$`)
	AttrChecker  = regexp.MustCompile(`^[a-z0-9!-]+$`)
	SiteChecker  = regexp.MustCompile(`^[A-Z0-9!-]+$`)
)
Contributor

Parse. Don't validate.

Contributor Author

Why don't you like the validity check here? I think these checks are somewhat necessary. There has been a check for regexp since #3055 and #3064, and I added more checks during this refactoring. These checks let us find and correct potentially inappropriate rules.

Contributor

Data validation is very important and I'm not against it. I'm just saying that this is the wrong and unsustainable way of doing it. Just google "parse, don't validate".

The first TypeChecker can easily be integrated into parsing logic as normal code. The rest of the checks are not difficult to implement in proper code. You can ask an LLM to write it for you, with tests included.

Contributor Author

I'm afraid I still don't understand what you mean. All these checks are done in the func parseEntry (except the filename check in loadData, which runs even earlier). Every rule is validated when its plain source string is parsed into an Entry struct, at the start of generating refMap, so every Entry is safe to use in the rest of the process.

All rule types (full, domain, keyword, regexp, and even include at this stage) use the same Entry struct. It seems unnecessary to create new distinct structs for different types of rules.

Comment on lines +39 to +44
var (
	refMap    = make(map[string][]*Entry)
	plMap     = make(map[string]*ParsedList)
	finalMap  = make(map[string][]*Entry)
	cirIncMap = make(map[string]bool) // Used for circular inclusion detection
)
Contributor

The use of maps here does not feel justified to me, at least most of them.

I also don't like that we are now stacking up global variables.
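As a rough illustration of the alternative, the state could live in a struct that is passed around instead of in package-level variables. The sketch below shows this for circular inclusion detection only; the `processor` type, its fields, and `resolve` are hypothetical and are not taken from main.go.

```go
package main

import "fmt"

// processor holds the state that would otherwise be global.
type processor struct {
	includes map[string][]string // list name -> lists it includes (hypothetical input)
	visiting map[string]bool     // lists currently being resolved
	done     map[string]bool     // lists already resolved
}

func newProcessor(includes map[string][]string) *processor {
	return &processor{
		includes: includes,
		visiting: make(map[string]bool),
		done:     make(map[string]bool),
	}
}

// resolve walks the inclusion graph depth-first and returns an error if
// a list ends up including itself, directly or indirectly.
func (p *processor) resolve(name string) error {
	if p.done[name] {
		return nil
	}
	if p.visiting[name] {
		return fmt.Errorf("circular inclusion at %q", name)
	}
	p.visiting[name] = true
	defer delete(p.visiting, name)
	for _, inc := range p.includes[name] {
		if err := p.resolve(inc); err != nil {
			return err
		}
	}
	p.done[name] = true
	return nil
}

func main() {
	p := newProcessor(map[string][]string{"a": {"b"}, "b": {"a"}})
	fmt.Println(p.resolve("a"))
}
```

Each run gets its own `processor`, so tests can build small inclusion graphs without resetting shared state.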

Contributor Author

MkQtS commented Jan 23, 2026

I am a golang beginner with little experience. I don't have much preference about global variables or code style; I just want to get things done. There seems to be no actual error in the logic or the results. If you want to make this code better or more elegant, just open new PRs (no offense) @database64128

@farshadjanu1 farshadjanu1 left a comment

.
