If you're a software developer/programmer, the open-source distributed version control system called Git can be a lifesaver. How many times have you worked on some code, made changes, then broken things and couldn't undo your mistake? That's a nightmare for developers, especially if the code is in production.
Originally developed by Linus Torvalds (of Linux fame), Git was designed by software developers for software developers. It tracks code changes line by line, and if you use GitHub or GitLab, you can easily see those changes, approve/discard them, and push them into production using a CI/CD system.
Many Cloud providers already provide this capability and I've used GitHub with Hugo on AWS Amplify to create a CI/CD workflow. It works like a charm; I make changes in my local repository, commit the changes, create a pull request, and then approve it. AWS Amplify gets a notification that the repository has changed and it rebuilds the Hugo-powered website. It's all very efficient and nice.
Since I use this nearly every day in my professional life at H2O, I started to wonder what else I could use Git and GitHub for. Over the holidays I bumped into an old engineering colleague and we went out to lunch. We talked about the latest flooding happening in New Jersey and a host of other civil engineering-related things.
He mentioned that his office was moving even more towards digital. Granted, all the site design and surveying work was already digital, in CADD format using AutoCAD, but more and more of the septic design documents and reports were either in PDF or Word format, or were being scanned in.
The problem he was having was how to version control everything. At first, a simple file renaming strategy worked, like below:
2023-06-01-filename1.docx OR 2023-06-01-filename1.dwg
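A naming strategy like this is easy to automate. Here's a minimal Python sketch of the idea, assuming the convention is simply an ISO date, a hyphen, and the original file name. The function name dated_name is my own invention for illustration and not something his office actually used.

```python
from datetime import date
from pathlib import Path


def dated_name(original: str, when: date) -> str:
    """Prefix a file name with an ISO date (YYYY-MM-DD).

    Hypothetical helper: assumes the convention is
    <date>-<original file name>, as in the example above.
    """
    # Path(...).name drops any leading directories and keeps the extension.
    return f"{when.isoformat()}-{Path(original).name}"


print(dated_name("filename1.docx", date(2023, 6, 1)))  # 2023-06-01-filename1.docx
print(dated_name("filename1.dwg", date(2023, 6, 1)))   # 2023-06-01-filename1.dwg
```

A nice side effect of putting the ISO date first is that a plain alphabetical sort of the directory is also a chronological sort.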
After a while, he'd have all kinds of files with the same file extension and different dates cluttering up his server.
Was there a possible solution to handle all this? My first thoughts went straight to Git and GitHub.
Using modern and not-so-modern versioning tools
When one gets the final document or CADD file, it's assumed that all the changes have been incorporated. The way we worked back in those days to make sure all the changes were incorporated was using a check set, a red pen, and yellow & blue highlighters.
A designer would print out a design planset for review and give it to the engineer to mark up. The engineer would review the planset and use a red pen to "markup" any changes, deletions, or additions. Once completed, this planset would be returned to the designer and they would work on implementing the changes.
Once the changes were ready, the designer would print out a revised planset and give it back to the engineer along with the marked-up set from before. The engineer would use a yellow highlighter to mark over his/her red markups to confirm that the changes were completed.
This process would repeat itself many times over the life of a project and it consumed a lot of paper. In today's world, that much paper consumption should be anathema but it's not, and it drives me crazy.
Once the final check planset had been redlined, changed, and yellow highlighted, it would go to the final quality control group. That group would review the redlines and the yellow highlighted areas and verify that the work was indeed complete. If it was, they would go over the yellow highlighted areas with a blue highlighter. This would turn the marked-up area green, signifying that it was "good to go."
While this method is a good process, it doesn't address the digital aspect of all these files. For example, the designer probably has versions 1, 2, and 3 of CADD files or other binaries for each step of the review process. On top of that, their desk and the rest of the office become awash in a lot of paper.
So is Git the right solution here?
The answer is: maybe.
Maybe it's just easier to use a red pen, highlighters, and a lot of printer paper?
Git and GitHub versioning control strategy
I'm willing to bet that any reader from the civil engineering industry has at least one "junk.dwg" file. I know I've had dozens in file directories locally and on the network. Every CADD manager at an established firm will have some methodology to keep his or her files organized. They would religiously back up the files on "tape" and if a digital file got destroyed, they'd have to find it on the backup tape and restore it.
That process was cumbersome but it worked.
I thought to myself, there's got to be a better way, so I started looking around the Internet to see if engineering companies were using a better version control system. It turns out that people were already discussing Git for versioning these kinds of files at least 9 years ago.
While many of the commenters on Reddit liked the general idea of Git, they didn't like the fact that it couldn't show the differences between one CADD file version and the other, much like it does for software code. That, of course, was and is a perceived weakness in using a software development tool for something other than its intended use.
However, I believe that Git has some value to provide. With proper commit messages and branches, you can control documents and CADD files as a whole. Yet this process is also its biggest weakness: it's a very linear flow for version control of binary files, not code.
For example, you might have version 1 of a proposal that gets committed as a DOCX file but then make updates to the original file under a different branch that's named "new wage rates."
Sounds great until you realize that the binary file you updated gets added to the commit history. If your binary was 1 GB in size and you changed it and updated the repository, then you have two 1 GB binaries: the original and the updated one. This can lead to huge repositories and you'll need a lot more storage. Of course, GitHub now offers Git Large File Storage (LFS), but is committing large binary blobs the right way to think about this problem?
I don't think so.
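To put a rough number on that storage problem, here's a back-of-the-envelope sketch in Python. It assumes the worst case, where every committed revision of the binary is kept in full. Git does compress what it stores, so a real repository may come in smaller, but file formats that are already compressed tend to save very little. The function name is hypothetical.

```python
def worst_case_repo_size_gb(file_size_gb: float, revisions: int) -> float:
    """Upper-bound estimate of history size for one binary file.

    Assumption: each committed revision is stored in full, with no
    savings from compression or deltas between versions.
    """
    if revisions < 1:
        raise ValueError("revisions must be at least 1")
    return file_size_gb * revisions


print(worst_case_repo_size_gb(1.0, 2))   # 2.0, the two 1 GB binaries above
print(worst_case_repo_size_gb(1.0, 10))  # 10.0 after ten rounds of markups
```

Ten rounds of redlines on a single 1 GB file and you're looking at roughly 10 GB of history for that one file, and every clone of the repository carries it along.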
So what's the solution? I'm not sure there is a perfect one. Perhaps the answer is, "it depends." Maybe it's a hybrid approach?
Is a hybrid approach the right way?
Without adding another layer of process, a hybrid approach is tantalizing to think about. This approach would use a combination of cloud backups throughout the life of the project and Git for archiving the final version. For the most part, any designer or developer will be working through multiple versions and iterations of their code or drawings. Why not create project working directories that automatically back up all your work during the day?
There are tons of free and paid services like Dropbox or OneDrive that let you back up in the background and can save your butt if something bad were to happen to your network or computer. Once the project is final and needs to be archived, then you can push it to Git, with one set of binary blobs and everything.
But that just adds another layer of process. What if we streamlined this even further and dropped Git altogether?
Dropping Git altogether for cloud backup
I started this article trying to understand if GitHub is the right way to handle version control for CADD drawings and project-related files. Git works great for software but was never designed for binary blobs like *.DWG files. You can't see what changed in the binary file, so what good is it?
In my search for a versioning tool was I trying to cram a metaphorical square peg into a round hole? Was I adding another process layer that wasn't needed at all? Was using cloud backups in near real-time a better option?
Yes. Yes to all those questions.
The answer is as simple as using a cloud backup storage system that syncs your project directory throughout the day as you work. Enforcing a naming (and/or date) convention for version control of binary objects will be highly effective and keep things organized. Then "junk.dwg" and other stale CADD drawing versions will need to be deleted on an annual basis to free up storage and keep things tidy.
In summary, you could use GitHub and Git for your CADD files, but it merely adds another process layer that can be easily handled with cloud backup and a naming convention for your CADD files. After all, why complicate things when you don't need to?
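Both the naming check and the annual cleanup can be scripted. Below is a minimal Python sketch, assuming the date-prefixed convention from earlier (YYYY-MM-DD-name.ext). The rule it applies, flagging anything that breaks the convention plus every dated version except the newest, is my own assumption about what "cleanup" should mean, and the function names are hypothetical. It only lists candidates; it never deletes anything.

```python
import re
from datetime import date

# Assumed convention: YYYY-MM-DD-<name>.<extension>
DATED_NAME = re.compile(r"^(\d{4})-(\d{2})-(\d{2})-(.+)\.(\w+)$")


def parse_dated_name(name):
    """Return (date, base file name) or None if the name breaks the convention."""
    match = DATED_NAME.match(name)
    if match is None:
        return None
    year, month, day, stem, ext = match.groups()
    try:
        when = date(int(year), int(month), int(day))
    except ValueError:  # e.g. month 13 or day 32
        return None
    return when, f"{stem}.{ext}"


def cleanup_candidates(names):
    """List files to review for deletion: anything that is not named per the
    convention, plus all but the newest dated version of each file."""
    newest = {}      # base file name -> (date, full name) of newest version seen
    candidates = []
    for name in names:
        parsed = parse_dated_name(name)
        if parsed is None:
            candidates.append(name)  # "junk.dwg" and friends
            continue
        when, base = parsed
        current = newest.get(base)
        if current is None or when > current[0]:
            if current is not None:
                candidates.append(current[1])  # older version is superseded
            newest[base] = (when, name)
        else:
            candidates.append(name)
    return sorted(candidates)


files = [
    "2023-06-01-filename1.dwg",
    "2023-07-15-filename1.dwg",
    "2023-06-01-filename1.docx",
    "junk.dwg",
]
print(cleanup_candidates(files))
# ['2023-06-01-filename1.dwg', 'junk.dwg']
```

In practice you'd feed it the file names from a project directory and have a human look over the list before anything gets removed.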