perf: improve inference speed and Windows compatibility#170

Open
williamyang2024 wants to merge 1 commit into bytedance:master from williamyang2024:feat/windows-perf-improvements

Conversation

@williamyang2024

- Wrap model.generate() in torch.inference_mode() to reduce VRAM usage
- Add use_cache=True for faster token generation
- Add sys.stdout line-buffering for real-time progress visibility
- Add requirements.win.txt for Windows dependency setup
- Add run_demo.py as a convenient local runner script
- Fix whitespace/formatting in markdown_utils.py
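The two generation-side changes and the stdout change above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the `generate_reply` helper, its arguments, and the tokenizer handling are assumptions for the example.

```python
import sys

import torch


# Line-buffer stdout so progress messages appear as soon as each line is
# printed (mirrors the PR's real-time progress change). reconfigure() is
# available on standard text streams in Python 3.7+; the hasattr guard
# keeps this safe when stdout has been replaced (e.g. by a test harness).
if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(line_buffering=True)


def generate_reply(model, tokenizer, prompt, max_new_tokens=256):
    """Hypothetical helper showing the PR's two generation tweaks."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # torch.inference_mode() disables autograd tracking entirely, so no
    # gradient bookkeeping tensors are kept alive, reducing VRAM usage.
    with torch.inference_mode():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            use_cache=True,  # reuse the KV cache instead of recomputing
                             # attention over the full prefix each step
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

`torch.inference_mode()` is stricter than `torch.no_grad()`: tensors created inside it can never participate in autograd later, which lets PyTorch skip version-counter bookkeeping and is why it tends to save more memory than a plain `no_grad` block.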