This repository was archived by the owner on Dec 27, 2025. It is now read-only.

Wiki.java FAQ/TODO/whinge list and thoughts #154

@MER-C

Pull requests are welcome on some of these; please ask for my thoughts first.

MediaWiki annoyances and wishlist

a.k.a. why can't I do X?

Missing features

Vectorization

  • getLastRevision
  • getDeletedText: adding more titles gives all deleted revisions in those pages...
  • getDeletedHistory (do after reverse is culled)
  • getDeletedRevisions
  • getBlockList (users only). Add filters and support IPs properly.
  • Go wide in parse() to fetch the parsed text, original wikitext, wikilinks, categories, external links, sections and templates, all at the same time. This could be the base of a Page object (yes, I finally have meaningful data to put in there.)
  • Make text a field of Revision, have getRevisions fetch text optionally, and have Revision.getText lazily load the text (with a warning that it shouldn't be used in loops).
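
The lazy-loading idea in the last bullet could look roughly like the following. This is a sketch, not the existing Revision class: `LazyRevision` and the `fetcher` hook (standing in for whatever single-revision API request Wiki.java would make) are hypothetical. Text is fetched on the first getText() call and cached afterwards, which is exactly why calling it in a loop over many revisions would cost one request per revision.

```java
import java.util.Objects;
import java.util.function.LongFunction;

/** Sketch only: a revision whose text is either prefetched or loaded on demand. */
class LazyRevision
{
    private final long revid;
    // Hypothetical hook standing in for a single-revision API request.
    private final LongFunction<String> fetcher;
    private String text;
    private boolean textLoaded;

    LazyRevision(long revid, String prefetchedText, LongFunction<String> fetcher)
    {
        this.revid = revid;
        this.fetcher = Objects.requireNonNull(fetcher);
        this.text = prefetchedText;
        this.textLoaded = prefetchedText != null;
    }

    /**
     * Returns the revision text, fetching it on first use if it was not
     * prefetched. Each lazy fetch is one network round trip, so do not call
     * this in a loop; fetch text in bulk via getRevisions instead.
     */
    String getText()
    {
        if (!textLoaded)
        {
            text = fetcher.apply(revid);
            textLoaded = true;
        }
        return text;
    }
}
```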

General FIXMEs

  • parse+diff: missingtitle is a generic error message that represents real, unrecoverable assertion errors in other methods.
  • LogEntry details handling (Refactor LogEntry details to a HashMap #126)
  • Simplify site info caching.
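
One possible shape for the LogEntry details refactoring (#126) is sketched below. The class name, accessors and the "duration" key used in the usage example are illustrative assumptions, not the current LogEntry API. A string map fits log details because each log type, and each extension, attaches its own set of parameters.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Sketch only: log entry details held as a read-only map instead of fixed fields. */
class LogEntrySketch
{
    private final String type;
    private final String action;
    private final Map<String, String> details;

    LogEntrySketch(String type, String action, Map<String, String> details)
    {
        this.type = type;
        this.action = action;
        // Defensive copy, exposed read-only so callers cannot mutate the entry.
        this.details = Collections.unmodifiableMap(new HashMap<>(details));
    }

    String getType()
    {
        return type;
    }

    String getAction()
    {
        return action;
    }

    Map<String, String> getDetails()
    {
        return details;
    }
}
```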

Deprecated API removal

  • Change signatures of parse and diff. Deprecate some trampolines.
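
Assuming "trampoline" here means a convenience overload that only forwards to the general method, deprecating one might look like this. Both signatures are hypothetical stand-ins, not the real parse() API, and the body of the general method just echoes its input in place of an API call.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: a deprecated convenience overload forwarding to a general method. */
class ParseApiSketch
{
    /** Hypothetical general entry point taking all options in one map. */
    static String parse(Map<String, Object> content)
    {
        // Stand-in for the real API request: echoes what it was asked to parse.
        return "parsed:" + content.get("title");
    }

    /** Old-style trampoline kept for compatibility; forwards to the general method. */
    @Deprecated(forRemoval = true)
    static String parse(String title)
    {
        Map<String, Object> content = new HashMap<>();
        content.put("title", title);
        return parse(content);
    }
}
```

Keeping the trampoline as a one-line forwarder means callers get a compiler warning rather than a break, and removal later is a pure deletion.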

WMF specific

Utilities

  • CSV export (revisions, log entries, user info?, page info?)
  • Diff parsing refactoring? I'd like to see machine-readable diffs first.
  • Export of tabular data to wiki table (may be useful, just an idea at the moment)
  • LogEntry -> wikitext table
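
Exporting tabular data to a wiki table (which would also cover LogEntry once each entry is flattened to a row of strings) needs little more than standard MediaWiki table syntax: `{|` to open, `!` for header cells, `|-` between rows, `|` for data cells and `|}` to close. The class and method names below are hypothetical. Literal pipes are escaped as the HTML entity `&#124;`; newlines and other markup inside cells are not handled in this sketch.

```java
import java.util.List;

/** Sketch only: renders headers and rows as a wikitext table, one cell per line. */
class WikiTableSketch
{
    /** Escapes a literal pipe, which would otherwise be read as a cell separator. */
    static String escapeCell(String cell)
    {
        return cell.replace("|", "&#124;");
    }

    static String toWikitable(List<String> headers, List<List<String>> rows)
    {
        StringBuilder sb = new StringBuilder("{| class=\"wikitable\"\n");
        for (String h : headers)
            sb.append("! ").append(escapeCell(h)).append('\n');
        for (List<String> row : rows)
        {
            sb.append("|-\n");
            for (String cell : row)
                sb.append("| ").append(escapeCell(cell)).append('\n');
        }
        return sb.append("|}\n").toString();
    }
}
```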

Tools

  • Explore stuff in paid-for spam
  • UserLinkAdditionFinder: servlet version -- limited to one user per request if useful.
  • UserLinkAdditionFinder/CCIAnalyzer: do not return links or analyze text that was already there. Requires diff parsing refactoring.
  • CCIAnalyzer: aggressive mode (Add aggressive mode to CCIAnalyzer #97)
  • Transition the user watchlist into a generic mass contribution fetcher, particularly from categories and lists of users. The tool should support new pages only (for spam sockfarms).
  • ContributionSurveyor: split long surveys into multiple text files, 2000 articles per file, and serve them ZIPped.
  • AdminStats: protections.
  • AdminStats: writeup and plots.
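
For the ContributionSurveyor item, splitting a survey into files of 2000 articles and packaging them as a ZIP needs only java.util.zip. This sketch assumes the survey is already a list of per-article lines and uses hypothetical file names (survey-1.txt, survey-2.txt, ...); it does not reflect how ContributionSurveyor is currently structured. A servlet could write the returned bytes straight to its response with a ZIP content type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Sketch only: splits survey lines into chunks and zips one text file per chunk. */
class SurveyZipSketch
{
    static final int ARTICLES_PER_FILE = 2000;

    static byte[] zipSurvey(List<String> articleLines)
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes))
        {
            int part = 1;
            for (int start = 0; start < articleLines.size(); start += ARTICLES_PER_FILE)
            {
                int end = Math.min(start + ARTICLES_PER_FILE, articleLines.size());
                zip.putNextEntry(new ZipEntry("survey-" + part++ + ".txt"));
                String chunk = String.join("\n", articleLines.subList(start, end)) + "\n";
                zip.write(chunk.getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        }
        catch (IOException ex)
        {
            // Unlikely with an in-memory stream, but the zip methods declare it.
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Reads back the entry names, e.g. for a quick sanity check. */
    static List<String> entryNames(byte[] zipBytes)
    {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes)))
        {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry())
                names.add(e.getName());
        }
        catch (IOException ex)
        {
            throw new UncheckedIOException(ex);
        }
        return names;
    }
}
```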

Non-problems and implementation notes

  • Why is X (e.g. assertion modes, log types, namespaces) not implemented as an Enum? MediaWiki has a large library of extensions, and each extension may add more possible values. Furthermore, the site owner may add other possible values (e.g. more namespaces). Wiki.java only covers MediaWiki as shipped with no extensions, and an Enum, being fixed at compile time, could not represent the values that extensions or site configuration add.
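
To illustrate the point with namespaces: plain int constants cover the core values, while any other number a wiki reports is still a valid value rather than something the type system rejects. In this sketch the constants use core MediaWiki namespace IDs (0 main, 1 talk, 2 user, 6 file, 14 category); the class, the `register` method and the use of 100 as a site-defined namespace are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: namespaces as open int values rather than a closed Enum. */
class NamespaceSketch
{
    // Core namespaces shipped with MediaWiki.
    static final int MAIN_NAMESPACE = 0;
    static final int TALK_NAMESPACE = 1;
    static final int USER_NAMESPACE = 2;
    static final int FILE_NAMESPACE = 6;
    static final int CATEGORY_NAMESPACE = 14;

    // Meant to be filled from the wiki's own site info at runtime, so that
    // extension- and site-defined namespaces sit alongside the core ones.
    private final Map<Integer, String> names = new HashMap<>();

    void register(int id, String name)
    {
        names.put(id, name);
    }

    /** Unknown IDs are still valid ints; they just have no known name yet. */
    String nameOf(int id)
    {
        return names.getOrDefault(id, "(unknown namespace " + id + ")");
    }
}
```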
