-
Notifications
You must be signed in to change notification settings - Fork 101
Description
Hi,
I work building (big) Excel financial models and source control is a nightmare. I love git (for small personal coding projects) but it doesn't work with Excel, as we all know. I came across your project yesterday after following a link from http://programmaticallyspeaking.com/git-diffing-excel-files.html and it's a bit of a revelation for me.
I've been giving some thought to how I'd approach making your tool more functional for the types of spreadsheet I work with and below are my high level thoughts on how the tool could be structured. The key idea being to track transformations first and then cell diff (hopefully) aligned cells. I know you have this on your to do list and to my way of thinking tracking cell transformations requires quite a different program structure, which as well as being difficult is possibly why you've not got round to it yet.
I know nothing about java and am not a trained programmer but am happy to try to help out (day-job workload permitting) if you're keen. Alternately if the changes are too radical/not where you want to go I could just fork you and go my own way. Let me know.
Anyway here's my thoughts:
Maintain 3 spreadsheets
- From spreadsheet (immutable)
- To spreadsheet (immutable)
- Working spreadsheet, a clone of the From spreadsheet to start with but is gradually edited by applying the changes identified in to two transform sections below
FIND TRANSFORMATIONS
idea is to repress high-volume cell difference reporting that can be easily explained by the single transformation that created the bulk changes
SHEET TRANSFORMATIONS
- (REPORT) Sheets re-ordered
- (REPORT) Sheet renamed
- (REPORT) Sheet copied
- (REPORT) Sheet deleted
CELL TRANSFORMATIONS
- (REPORT) Rows added
- (REPORT) Rows deleted
- (REPORT) Rows cut & pasted (i.e. moved)
- (REPORT) Rows copied & inserted (i.e. duplicated)
- (REPORT) Columns added
- (REPORT) Columns deleted
- (REPORT) Columns cut & pasted (i.e. moved)
- (REPORT) Columns copied & inserted (i.e. duplicated)
- (REPORT) Cells deleted (shift left)
- (REPORT) Cells deleted (shift up)
- (REPORT) Cells inserted (shift left)
- (REPORT) Cells inserted (shift down)
- Possibly need to run this loop 2 (or more) times per sheet to catch instances where, for example, columns have been inserted and then rows
FIND CELL DIFFERENCES
Can now compare the cells in the Working spreadsheet to the To spreadsheet
- Need a method to translate the cell address through the transformations to infer the original cell reference in the From spreadsheet (i.e. reverse apply the transformations) for reporting
FORMULAE CONDENSE
- Condense the sheet down to just unique formulae on the sheet and the blocks of cells that all share the same formula
- This is important as in big Excel files it is common practice to fill formulae. For example a spreadsheet might have 500 columns and each row will have a filled-across formula. If the diff reports on each of the 500 differences (per edited row) then you'll spam the output and not be able to easily determine what's changed
- In VBA you'd do this by comparing the FormulaR1C1 property of the cells for equivalence
- Output should be a paired list of (block) cell references and the formula in the top, left cell of the block
CELL DIFFERENCES
- (REPORT) Where cell blocks with matching formulae have been extended or reduced
- (REPORT) Where matching cell blocks have different formulae
- (REPORT) Where non-matching cell blocks with different formulae (need to think about this one a bit)