Fix: make Dockerfile self-contained with multi-stage build#24277
Fix: make Dockerfile self-contained with multi-stage build#24277Famous077 wants to merge 5 commits intogoogle-gemini:mainfrom
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors the Dockerfile to support a self-contained build process. By transitioning to a multi-stage build, the container now compiles the necessary packages internally, ensuring that users can build the image from a fresh repository clone without needing to perform local builds beforehand. This improves the developer experience and ensures consistent build environments. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a multi-stage Docker build to compile and package the Gemini CLI within the container, eliminating the need for host-side pre-building. Feedback focuses on optimizing the Dockerfile for better caching and smaller image sizes, specifically by managing dependency files more efficiently and excluding the .git directory. Other recommendations include using npm ci for consistent builds, combining npm install commands to resolve package dependencies correctly, and reverting the change from CMD to ENTRYPOINT to preserve user flexibility.
|
|
||
| # Copy ALL source files needed for build | ||
| COPY package*.json ./ | ||
| COPY packages/ ./packages/ |
There was a problem hiding this comment.
Copying the entire packages/ directory before npm install invalidates the Docker layer cache for dependencies whenever any source file changes. This significantly slows down subsequent builds. To optimize caching, copy only the package.json files for all workspaces first, run npm install (or npm ci), and then copy the rest of the source code.
Dockerfile
Outdated
| COPY eslint.config.js ./ | ||
| COPY scripts/ ./scripts/ | ||
| COPY esbuild.config.js ./ | ||
| COPY .git/ ./.git/ |
There was a problem hiding this comment.
Copying the .git directory into the Docker build context and image layers is discouraged as it can be very large, increasing build times and image size. Since the generate-git-commit-info.js script only needs the commit hash, consider passing the hash as a build argument (ARG GIT_COMMIT) and using it to generate the version info, rather than including the entire git history.
Dockerfile
Outdated
| COPY .git/ ./.git/ | ||
|
|
||
| # Skip husky (no need in Docker), install deps and build | ||
| RUN HUSKY=0 npm install && \ |
There was a problem hiding this comment.
|
|
||
| # Copy built artifacts from builder stage (no host pre-build needed!) | ||
| COPY --from=builder /build/packages/cli/dist/google-gemini-cli-*.tgz /tmp/gemini-cli.tgz | ||
| COPY --from=builder /build/packages/core/dist/google-gemini-cli-core-*.tgz /tmp/gemini-core.tgz |
There was a problem hiding this comment.
When installing local tarballs that have inter-dependencies (e.g., @google/gemini-cli depending on @google/gemini-cli-core), it is critical to install them in a single npm install -g command (e.g., npm install -g /tmp/gemini-core.tgz /tmp/gemini-cli.tgz). This allows npm to correctly resolve the workspace dependencies from the provided tarballs. Installing them in separate RUN steps or separate commands will likely fail if the packages are not available in a public registry.
|
|
||
| # default entrypoint when none specified | ||
| CMD ["gemini"] | ||
| ENTRYPOINT ["/usr/local/share/npm-global/bin/gemini"] No newline at end of file |
There was a problem hiding this comment.
Changing the default command from CMD to ENTRYPOINT is a breaking change that prevents users from easily overriding the command (e.g., to run a shell like bash). Additionally, using an absolute path is redundant as the binary is already in the PATH. It is recommended to revert to CMD ["gemini"] to maintain compatibility with existing workflows.
CMD ["gemini"]
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request refactors the Dockerfile into a multi-stage build process, which improves layer caching and allows the entire build and packaging workflow to occur within the container. Feedback indicates that the GIT_COMMIT build argument is currently ineffective because the underlying build script does not check the environment variable and fails to retrieve git information since the .git directory is not present in the builder stage.
| ARG GIT_COMMIT=unknown | ||
| ENV GIT_COMMIT=$GIT_COMMIT |
There was a problem hiding this comment.
The GIT_COMMIT build argument and environment variable are defined here, but the build process (specifically scripts/generate-git-commit-info.js called during npm run build) does not currently utilize this environment variable. The script attempts to run git rev-parse, which will fail inside the container since the .git directory is not copied. This results in the version metadata being set to 'N/A' regardless of the provided build argument. To fix this, the scripts/generate-git-commit-info.js file should be updated to check process.env.GIT_COMMIT before falling back to the git command.
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request optimizes the Docker build process by introducing a multi-stage build, which improves layer caching and reduces the final image size. It also updates the git commit information generation script to support an environment variable, allowing for consistent builds in environments without a .git directory. I have identified an issue in the Dockerfile where copying the scripts directory prematurely invalidates the cache; removing this redundant instruction will improve build performance.
| COPY packages/cli/package*.json ./packages/cli/ | ||
| COPY packages/core/package*.json ./packages/core/ | ||
| COPY packages/vscode-ide-companion/package*.json ./packages/vscode-ide-companion/ | ||
| COPY packages/vscode-ide-companion/scripts/ ./packages/vscode-ide-companion/scripts/ |
There was a problem hiding this comment.
This COPY instruction is likely misplaced and harms build caching. By copying the scripts directory before npm ci, any change to a script file will invalidate this layer and trigger a full npm ci re-installation, even if no dependencies have changed. This is inefficient. Since the entire packages/ directory is copied later on line 26, this line is redundant and should be removed to improve build performance and caching efficiency.
|
I actually tried removing this line as suggested - the build breaks immediately with a MODULE_NOT_FOUND error for generate-notices.js. Turns out the vscode-ide-companion package has a prepare script that calls node ./scripts/generate-notices.js during npm ci, so the scripts folder needs to be present before the install runs. |
|
Hey! While I was making this change, I realized the Docker build context was huge , over 620MB , because .git, node_modules, and all the package-level node_modules folders were getting transmitted to Docker with every single build. Silly as that may sound, that’s a lot of data to be uploading each time - and all those dependencies are being re-installed from scratch inside the container via npm ci anyway. |
scripts/generate-git-commit-info.js
Outdated
| // Check for GIT_COMMIT env var first (e.g. when building inside Docker | ||
| // without a .git directory available) | ||
| if (process.env.GIT_COMMIT) { | ||
| gitCommitInfo = process.env.GIT_COMMIT; |
There was a problem hiding this comment.
I think, here, the value from process.env.GIT_COMMIT be validated first before assignment.
The value from process.env.GIT_COMMIT here (line 48) is assigned to gitCommitInfo without any validation.
That variable is later interpolated into a single-quoted string in the generated TypeScript file on line 71:
export const GIT_COMMIT_INFO = '${gitCommitInfo}';This generated file is imported at runtime by:
packages/core/src/telemetry/clearcut-logger/clearcut-logger.ts(line 64) — telemetrypackages/cli/src/ui/components/AboutBox.tsx(line 9) — version UIpackages/cli/src/ui/commands/bugCommand.ts— bug reports
If someone passes a value containing a single quote (e.g., --build-arg GIT_COMMIT="abc'def"), the generated file becomes:
export const GIT_COMMIT_INFO = 'abc'def'; // ← syntax errorA more adversarial input like --build-arg GIT_COMMIT="'; process.exit(1); //" would produce valid but injected code:
export const GIT_COMMIT_INFO = ''; process.exit(1); //';Since a real git commit hash is always hexadecimal, a small guard here would prevent both accidental breakage and injection:
const envCommit = process.env.GIT_COMMIT;
if (envCommit && /^[0-9a-f]+$/i.test(envCommit)) {
gitCommitInfo = envCommit;
}This way, invalid inputs are silently ignored.
|
Hi @jackwotherspoon , @Adib234 , I've implemented the required changes as per the google-code-assit. Could you please review the PR when you get a chance and let me know if anything needs to be updated or improved. Thank you |
Summary
Fixes Dockerfile to work on a clean
git clonewithout requiring host pre-built artifacts.Details
The Dockerfile used
COPY packages/cli/dist/*.tgzwhich failed if the user hadn'trun
npm run buildlocally first. Converted to a multi-stage build so everythingcompiles inside the container.
Stage 1 (builder): installs git, runs
npm install+npm run build+npm packStage 2 (runtime): copies
.tgzartifacts from builder via--from=builderAdded
HUSKY=0to skip git hooks during container build.Related Issues
Fixes #15859
How to Validate
docker build --build-arg CLI_VERSION_ARG=1.0.0 -t gemini .docker run --rm gemini --version1.0.0Before this fix, step 2 fails with:
ERROR: lstat /packages/cli/dist: no such file or directoryPre-Merge Checklist