Closed
43 commits
56d4543
Add mixed-line-ending test case
nagromc Jun 13, 2017
3d4fb41
Add mixed-line-ending processor
nagromc Jun 13, 2017
16b7c7a
Use Enum to list line ending types in mixed_line_ending
nagromc Jun 15, 2017
afaa97c
Add enum to mixed_line_ending `--fix` option
nagromc Jun 17, 2017
5186664
Split argument parsing from main mixed line ending function
nagromc Jun 18, 2017
93194b9
Change --fix option from mixed_line_ending
nagromc Jun 18, 2017
b2b0d59
Use enum instead of raw argparse result
nagromc Jun 26, 2017
ad0062a
Add filenames option
nagromc Jun 26, 2017
466f9e1
Add line ending detection
nagromc Jun 26, 2017
aaf134c
Add line ending conversion
nagromc Jun 28, 2017
0a8b929
Change files according to --fix option
nagromc Jun 29, 2017
2b28f4f
Rename variable names
nagromc Jun 29, 2017
22b2282
Reuse variable definition
nagromc Jun 29, 2017
f477582
Add logs to mixed_line_ending.py
nagromc Jun 29, 2017
b1294b8
Add unit test for mixed_line_ending
nagromc Jul 3, 2017
4270b56
Refactor MixedLineDetection
nagromc Jul 4, 2017
a1ffbfa
Add mixed line detection
nagromc Jul 4, 2017
c6c4c4a
Refactor mixed_line_ending
nagromc Jul 4, 2017
614893f
Fix _process_fix_auto to return the right value
nagromc Jul 4, 2017
a1e1421
Refactor mixed_line_ending
nagromc Jul 4, 2017
609d011
Improve logging for force line ending
nagromc Jul 4, 2017
63bb1fd
Add unit test for mixed_line_ending --fix={cr,crlf}
nagromc Jul 4, 2017
3dbeeee
Improve test coverage
nagromc Jul 4, 2017
d0016c5
Ignore .cache/
nagromc Jul 8, 2017
ba63d1b
Refactor _process_no_fix
nagromc Jul 8, 2017
2b6ad97
Fix 13 tests for Python 3.4 & 3.5
nagromc Jul 9, 2017
d16d04a
Fix the 5 remaining tests for Python 3.4 & 3.5
nagromc Jul 9, 2017
8bc4af4
Refactor file opening
nagromc Jul 9, 2017
4fc9624
Refactor _detect_line_ending
nagromc Jul 10, 2017
1937788
Add support for CR line ending for mixed_line_ending.py
nagromc Jul 10, 2017
560e1c2
Add mixed-line-ending hook declaration
nagromc Jul 10, 2017
0335ebf
Update README.md
nagromc Jul 17, 2017
55658c4
Merge remote-tracking branch 'upstream/master' into mixed-line-ending
nagromc Jul 18, 2017
ab2a849
Clean up against add-trailing-comma
nagromc Jul 18, 2017
41ff0e1
Use new features from pre-commit 0.15.0
nagromc Jul 18, 2017
eb0c3ba
Update README.md based on maintainer's wish
nagromc Jul 18, 2017
f795097
Add Pyhton 2.7 'enum34' dependency declaration
nagromc Jul 20, 2017
f58b552
Refactor pre_commit_hooks/mixed_line_ending.py
nagromc Jul 20, 2017
4be276c
Remove non relevant comments and make others more explicit
nagromc Jul 20, 2017
ef4a323
Refactor pre_commit_hooks/mixed_line_ending.py
nagromc Jul 20, 2017
4d3d8e1
Remove -v/--verbose option on mixed_line_ending.py
nagromc Jul 20, 2017
f9915cb
Remove _check_filenames in mixed_line_ending.py
nagromc Jul 20, 2017
0e223bc
Refactor mixed_line_ending.py
nagromc Jul 20, 2017
9 changes: 9 additions & 0 deletions .pre-commit-hooks.yaml
@@ -191,6 +191,15 @@
  # for backward compatibility
  files: ''
  minimum_pre_commit_version: 0.15.0
- id: mixed-line-ending
  name: Mixed line ending
  description: Replaces or checks mixed line ending
  entry: mixed-line-ending
  language: python
  types: [text]
  # for backward compatibility
  files: ''
  minimum_pre_commit_version: 0.15.0
- id: name-tests-test
  name: Tests should end in _test.py
  description: This verifies that test files are named correctly
5 changes: 5 additions & 0 deletions README.md
@@ -58,6 +58,11 @@ Add this to your `.pre-commit-config.yaml`
    - To remove the coding pragma pass `--remove` (useful in a python3-only codebase)
- `flake8` - Run flake8 on your python files.
- `forbid-new-submodules` - Prevent addition of new git submodules.
- `mixed-line-ending` - Replaces or checks mixed line endings.
    - `--fix={auto,crlf,lf,no}`
        - `auto` - Automatically replaces mixed line endings with the most frequent one. This is the default.
        - `crlf`, `lf` - Forces all line endings to CRLF or LF, respectively.
        - `no` - Checks for mixed line endings without modifying any file.
- `name-tests-test` - Assert that files in tests/ end in `_test.py`.
    - Use `args: ['--django']` to match `test*.py` instead.
- `no-commit-to-branch` - Protect specific branches from direct checkins.
6 changes: 6 additions & 0 deletions hooks.yaml
@@ -130,6 +130,12 @@
  entry: upgrade-your-pre-commit-version
  files: ''
  minimum_pre_commit_version: 0.15.0
- id: mixed-line-ending
  language: system
  name: upgrade-your-pre-commit-version
  entry: upgrade-your-pre-commit-version
  files: ''
  minimum_pre_commit_version: 0.15.0
- id: name-tests-test
  language: system
  name: upgrade-your-pre-commit-version
212 changes: 212 additions & 0 deletions pre_commit_hooks/mixed_line_ending.py
@@ -0,0 +1,212 @@
import argparse
import re
import sys

from enum import Enum


class LineEnding(Enum):
    CR = b'\r', 'cr', re.compile(b'\r(?!\n)', re.DOTALL)
    CRLF = b'\r\n', 'crlf', re.compile(b'\r\n', re.DOTALL)
    LF = b'\n', 'lf', re.compile(b'(?<!\r)\n', re.DOTALL)

    def __init__(self, string, opt_name, regex):
        self.string = string
        self.str_print = repr(string)
        self.opt_name = opt_name
        self.regex = regex


class MixedLineEndingOption(Enum):
    AUTO = 'auto', None
    NO = 'no', None
    CRLF = LineEnding.CRLF.opt_name, LineEnding.CRLF
    LF = LineEnding.LF.opt_name, LineEnding.LF

    def __init__(self, opt_name, line_ending_enum):
        self.opt_name = opt_name
        self.line_ending_enum = line_ending_enum


class MixedLineDetection(Enum):
    NOT_MIXED = 1, False, None
    UNKNOWN = 2, False, None
Member

ah, we could remove the equality hack if we do something simple like "lf wins on ties" or something. Just an idea

Contributor Author

Well, isn't it platform-related? I mean, Windows users would not appreciate having their files converted to LF line endings.

Member

if they're 50/50 I think it's probably fine?

Contributor Author

Actually, mixed_line_ending.py is able to detect LF, CRLF, and CR. So it would be 33% each.

Member

that makes it even more rare, I'd say just pick one of them if there's ties (since it also simplifies other code elsewhere iirc)

Up to you though, this is fine as is :)
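
The "lf wins on ties" idea from the thread above can be sketched roughly as follows. This is only an illustration of the suggestion, not code from the PR: `pick_line_ending` and its `preference` parameter are hypothetical names, and the counts are passed in as a plain dict rather than the PR's enums.

```python
def pick_line_ending(counts, preference=('lf', 'crlf', 'cr')):
    """Return the most frequent line ending, or None if the file is not mixed.

    `counts` maps a line-ending name to its number of occurrences.
    `preference` (hypothetical) is the tie-break order: earlier names win.
    """
    # Keep only the endings that actually occur, in preference order.
    present = [name for name in preference if counts.get(name, 0) > 0]
    if len(present) < 2:
        return None  # zero or one kind of ending: nothing is mixed

    # max() returns the first maximal item, so the preference order
    # decides ties and no separate "unknown" state is needed.
    return max(present, key=lambda name: counts[name])


print(pick_line_ending({'lf': 3, 'crlf': 3}))  # lf
print(pick_line_ending({'lf': 1, 'crlf': 4}))  # crlf
print(pick_line_ending({'lf': 5}))             # None
```

With a deterministic tie-break like this, a detection result is either "not mixed" or a concrete line ending, which is the simplification the reviewer alludes to.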

    MIXED_MOSTLY_CRLF = 3, True, LineEnding.CRLF
    MIXED_MOSTLY_LF = 4, True, LineEnding.LF
    MIXED_MOSTLY_CR = 5, True, LineEnding.CR

    def __init__(self, index, mle_found, line_ending_enum):
        # TODO hack to prevent enum overriding
        self.index = index
        self.mle_found = mle_found
        self.line_ending_enum = line_ending_enum


ANY_LINE_ENDING_PATTERN = re.compile(
    b'(' + LineEnding.CRLF.regex.pattern +
    b'|' + LineEnding.LF.regex.pattern +
    b'|' + LineEnding.CR.regex.pattern + b')',
)


def mixed_line_ending(argv=None):
    options = _parse_arguments(argv)

    filenames = options['filenames']
    fix_option = options['fix']

    if fix_option == MixedLineEndingOption.NO:
        return _process_no_fix(filenames)
    elif fix_option == MixedLineEndingOption.AUTO:
        return _process_fix_auto(filenames)
    # when a line ending character is forced with --fix option
    else:
        return _process_fix_force(filenames, fix_option.line_ending_enum)


def _parse_arguments(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '-f',
        '--fix',
        choices=[m.opt_name for m in MixedLineEndingOption],
        default=MixedLineEndingOption.AUTO.opt_name,
        help='Replace line ending with the specified. Default is "auto"',
    )
    parser.add_argument('filenames', nargs='*', help='Filenames to fix')
    args = parser.parse_args(argv)

    fix, = (
        member for name, member
        in MixedLineEndingOption.__members__.items()
        if member.opt_name == args.fix
    )

    options = {
        'fix': fix, 'filenames': args.filenames,
    }

    return options


def _detect_line_ending(filename):
    with open(filename, 'rb') as f:
        buf = f.read()

    le_counts = {}
Member

can unindent after here, as we've read the entire file by this point and no longer need the file object around (And you can regain a level of indentation)

Contributor Author

You are right. Done.

I was wondering actually: is it a good practice to read the entire file?

Member

In this case I think it's fine. If it's checked into git there's a reasonable expectation that the file is a manageable size. Most of the other hooks do the same

    for le_enum in LineEnding:
        le_counts[le_enum] = len(le_enum.regex.findall(buf))

    mixed = False
    le_found_previously = False
    most_le = None
    max_le_count = 0

    for le, le_count in le_counts.items():
        le_found_cur = le_count > 0

        mixed |= le_found_previously and le_found_cur
        le_found_previously |= le_found_cur

        if le_count == max_le_count:
            most_le = None
        elif le_count > max_le_count:
            max_le_count = le_count
            most_le = le

    if not mixed:
        return MixedLineDetection.NOT_MIXED

    for mld in MixedLineDetection:
        if (
                mld.line_ending_enum is not None and
                mld.line_ending_enum == most_le
        ):
            return mld

    return MixedLineDetection.UNKNOWN


def _process_no_fix(filenames):
    print('Checking if the files have mixed line ending.')

    mle_filenames = []
    for filename in filenames:
        detect_result = _detect_line_ending(filename)

        if detect_result.mle_found:
            mle_filenames.append(filename)

    mle_found = len(mle_filenames) > 0

    if mle_found:
        # print() does not interpolate '%s' itself, so format explicitly
        print(
            'The following files have mixed line endings:\n\t%s' %
            '\n\t'.join(mle_filenames),
        )

    return 1 if mle_found else 0


def _process_fix_auto(filenames):
    mle_found = False

    for filename in filenames:
        detect_result = _detect_line_ending(filename)

        if detect_result == MixedLineDetection.NOT_MIXED:
            print('The file %s has no mixed line ending' % (filename,))
        elif detect_result == MixedLineDetection.UNKNOWN:
            print(
                'Could not define most frequent line ending in '
                'file %s. File skipped.' % (filename,),
            )

            mle_found = True
        else:
            le_enum = detect_result.line_ending_enum

            print(
                'The file %s has mixed line ending with a '
                'majority of %s. Converting...' %
                (filename, le_enum.str_print),
            )

            _convert_line_ending(filename, le_enum.string)
            mle_found = True

            print(
                'The file %s has been converted to %s line ending.' %
                (filename, le_enum.str_print),
            )

    return 1 if mle_found else 0


def _process_fix_force(filenames, line_ending_enum):
    for filename in filenames:
        _convert_line_ending(filename, line_ending_enum.string)

        print(
            'The file %s has been forced to %s line ending.' %
            (filename, line_ending_enum.str_print),
        )

    return 1


def _convert_line_ending(filename, line_ending):
    with open(filename, 'rb+') as f:
        bufin = f.read()

        # convert line ending
        bufout = ANY_LINE_ENDING_PATTERN.sub(line_ending, bufin)

        # write the result in the file replacing the existing content
        f.seek(0)
        f.write(bufout)
        f.truncate()


if __name__ == '__main__':
    sys.exit(mixed_line_ending())
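
The conversion step in `_convert_line_ending` relies on a single alternation of the three per-ending patterns. The standalone sketch below illustrates that idea on in-memory bytes; `ANY_LINE_ENDING` and `convert` are illustrative names rather than the module's own, and the pattern is written out by hand instead of being assembled from the `LineEnding` enum.

```python
import re

# CRLF is tried first so that a "\r\n" pair is replaced as one unit;
# the lookarounds keep a bare LF or bare CR from matching half of a CRLF.
ANY_LINE_ENDING = re.compile(b'\r\n|(?<!\r)\n|\r(?!\n)')


def convert(data, line_ending):
    """Replace every CR, CRLF or LF in `data` with `line_ending` (bytes)."""
    return ANY_LINE_ENDING.sub(line_ending, data)


mixed = b'a\r\nb\nc\rd'
print(convert(mixed, b'\n'))    # b'a\nb\nc\nd'
print(convert(mixed, b'\r\n'))  # b'a\r\nb\r\nc\r\nd'
```

Because the replacement bytes contain no backslashes, `re.sub` inserts them literally; a replacement that did contain a backslash would need escaping or a callable.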
2 changes: 2 additions & 0 deletions setup.py
@@ -31,6 +31,7 @@
        'simplejson',
        'six',
    ],
    extras_require={':python_version=="2.7"': ['enum34']},
    entry_points={
        'console_scripts': [
            'autopep8-wrapper = pre_commit_hooks.autopep8_wrapper:main',
@@ -53,6 +54,7 @@
            'file-contents-sorter = pre_commit_hooks.file_contents_sorter:main',
            'fix-encoding-pragma = pre_commit_hooks.fix_encoding_pragma:main',
            'forbid-new-submodules = pre_commit_hooks.forbid_new_submodules:main',
            'mixed-line-ending = pre_commit_hooks.mixed_line_ending:mixed_line_ending',
            'name-tests-test = pre_commit_hooks.tests_should_end_in_test:validate_files',
            'no-commit-to-branch = pre_commit_hooks.no_commit_to_branch:main',
            'pretty-format-json = pre_commit_hooks.pretty_format_json:pretty_format_json',
11 changes: 11 additions & 0 deletions testing/resources/mixed_line_ending.txt
@@ -0,0 +1,11 @@
This line ends with 'LF'
This line ends with 'CRLF'
This line ends with 'LF'
This line ends with 'CRLF'
This line ends with 'LF'
This line ends with 'CRLF'
This line ends with 'LF'
This line ends with 'CRLF'
This line ends with 'LF'
This line ends with 'CRLF'
This line ends with 'LF'
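
The diff view cannot show the actual bytes of this fixture, so the following sketch assumes each line really ends the way its text claims. Under that assumption it rebuilds the 11 lines in memory and counts each kind of ending with the same three patterns the hook uses; the variable names are illustrative.

```python
import re

# Rebuild the fixture: lines alternate LF / CRLF, starting and ending with LF.
lines = []
for i in range(11):
    if i % 2 == 0:
        lines.append(b"This line ends with 'LF'\n")
    else:
        lines.append(b"This line ends with 'CRLF'\r\n")
data = b''.join(lines)

counts = {
    'cr': len(re.findall(b'\r(?!\n)', data)),
    'crlf': len(re.findall(b'\r\n', data)),
    'lf': len(re.findall(b'(?<!\r)\n', data)),
}
print(counts)  # {'cr': 0, 'crlf': 5, 'lf': 6}
```

With 6 LF against 5 CRLF, a file like this would be classified as mixed with an LF majority, so `--fix=auto` would be expected to convert it to LF.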