Even among very tech-savvy engineers, there is still a lot of confusion between what it means to store design information in a file versus a database. In a recent LinkedIn discussion that popped up in my feed, I noticed the following question:
“I'm trying to find terminology to help differentiate whether data is stored in individual CAD files or in a database. I don't want to confuse individual files stored in PDM / PLM as ‘in a database’ but I'm struggling with how to make it clear. How can I ask that in a way that a typical engineer would understand?”
My first response to that question would be: “Why should engineers need to know the intricate details of how their data is stored?”
All engineers really need to know is what they can and can’t do with their data and how that affects their daily workload. They may need to know how much setup is involved and whether a certain workflow or sequence of events must be followed in order to make the system work the way they want. The more an engineer has to think about data management, the more it negatively impacts their design processes, their deadlines and, potentially, their budgets.
Making a clear distinction between files and different types of database storage is indeed a complex issue. Judging by the responses (more than 100 comments at the time of this writing), it seems that everybody understands the notional differences, but their personal experiences and interpretations vary wildly. Some may dismiss this as semantics, but there are clear advantages to one particular method of data storage. In any case, these are the finer details that engineers should not have to worry about or get involved in.
So let’s review the different ways that CAD data can be stored and how they impact your business.
Storing data in files is the easiest method to explain. Anybody who has ever used a computer to create something – an image, a spreadsheet or a CAD design – has saved their data into a discrete file on their local hard drive.
Each file type has its own unique data structure that records all the information in a format only the software that created it can understand. Therefore, anybody with the same version of the same software as you can open and edit your file. Proprietary file structures can also be easily reverse engineered to extract the necessary information (the DWG file format being the prime example).
Most experts would agree that computer files are an extremely insecure method of storing data.
Files are easily mislaid, corrupted or overwritten. Managing files using Windows Explorer is the stuff of nightmares, especially when you’re trying to find the latest revision of a drawing to manufacture or you have multiple designers working on the same project. Files get copied and emailed everywhere and nobody knows who has what or which copy is the latest version. That’s why every CAD vendor has its own bolt-on Product Data Management (PDM) software that they can charge extra for.
The idea that a PDM system stores your CAD files “in a database” is misleading. Your data is not stored in a PDM database; it is still stored in files on a file server, with all the inherent issues mentioned above. What is stored in the PDM database is data about your data, otherwise known as “metadata.” It’s like keeping a Rolodex or library card system to keep track of where files are kept and a brief summary of what each file contains. To find a file, you consult your index card to find where the file is stored and then copy it to your local hard drive, known as “checking out” the file (just like a library book).
Most PDM systems are built on what is called a relational database. These databases store metadata in fixed tables with rigid schemas and pointers linking multiple tables together. This enables data to be structured and categorized so that it can be indexed, searched and easily manipulated using Structured Query Language (SQL) queries. For a CAD file, a table may state what type of file it is, where it is stored, list all of its custom properties and have links to the assembly or project that it belongs to. A simple SQL query is able to extract all the information about a file and its relational hierarchy to all the other files in your database.
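The kind of metadata tables and query described above can be sketched in a few lines. This is a minimal illustration using Python's built-in `sqlite3` module, not the schema of any real PDM product: the table names (`files`, `assembly_links`), the columns, the file paths and the `components_of` helper are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical PDM metadata schema; every name here is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE files (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        file_type TEXT NOT NULL,   -- e.g. 'part' or 'assembly'
        path      TEXT NOT NULL,   -- where the real file sits on the file server
        revision  TEXT NOT NULL
    );
    CREATE TABLE assembly_links (
        assembly_id  INTEGER NOT NULL REFERENCES files(id),
        component_id INTEGER NOT NULL REFERENCES files(id)
    );
""")

conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
    [
        (1, "gearbox.asm", "assembly", "//server/vault/gearbox.asm", "B"),
        (2, "housing.prt", "part", "//server/vault/housing.prt", "C"),
        (3, "shaft.prt", "part", "//server/vault/shaft.prt", "A"),
    ],
)
conn.executemany("INSERT INTO assembly_links VALUES (?, ?)", [(1, 2), (1, 3)])


def components_of(assembly_name):
    """Return (name, revision, path) for each component of an assembly."""
    rows = conn.execute(
        """
        SELECT c.name, c.revision, c.path
        FROM files AS a
        JOIN assembly_links AS l ON l.assembly_id = a.id
        JOIN files AS c ON c.id = l.component_id
        WHERE a.name = ?
        ORDER BY c.name
        """,
        (assembly_name,),
    )
    return rows.fetchall()


print(components_of("gearbox.asm"))
```

Note that the query only returns the index card: names, revisions and paths. The geometry itself still lives in the files those paths point to.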
This means that finding files is fast, but in order to work on them you must check them out and have them copied locally to your hard drive. Keeping with the library book analogy, once a copy of the file is checked out, it is no longer available. You can still see the library card data and a view-only representation of the file, but the file itself cannot be copied or edited.
Checked-out files are locked by the PDM system to prevent others from checking them out, editing them and overwriting your changes. Nobody else can work on a file until it is checked back into the PDM system and unlocked. This mechanism ensures that those who have files checked out can be easily traced, files can be revision controlled and conflicts between design teams can be avoided. In practice, this becomes more of an obstacle than a benefit, as locked files leave everyone else waiting until the files are checked back in and unlocked before they can get edit access. This forces a serial workflow, causing bottlenecks and unnecessary delays. The bigger the team and the more agile the design process, the bigger this problem becomes.
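To make the bottleneck concrete, here is a toy model of the check-out/check-in lock described above. It is a sketch under simple assumptions (one lock per file, held by one user) and does not reflect any vendor's actual API; the `Vault` class, its methods and the `CheckoutError` exception are hypothetical names.

```python
class CheckoutError(Exception):
    """Raised when a check-out or check-in request cannot be honoured."""


class Vault:
    """Toy model of PDM-style file locking (hypothetical, for illustration)."""

    def __init__(self):
        self._locked_by = {}  # file name -> user currently holding the lock

    def check_out(self, name, user):
        holder = self._locked_by.get(name)
        if holder is not None:
            # Someone already has the file, so this user has to wait.
            raise CheckoutError(f"{name} is checked out by {holder}")
        self._locked_by[name] = user

    def check_in(self, name, user):
        if self._locked_by.get(name) != user:
            raise CheckoutError(f"{name} is not checked out by {user}")
        del self._locked_by[name]

    def holder(self, name):
        """Return the user holding the lock, or None if the file is free."""
        return self._locked_by.get(name)


vault = Vault()
vault.check_out("housing.prt", "alice")
try:
    vault.check_out("housing.prt", "bob")
except CheckoutError as err:
    print(err)  # bob is blocked until alice checks the file back in
```

The lock is what makes the workflow serial: the second designer's request fails until the first one checks in, no matter how small or unrelated the intended change is.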
Files, once checked out, are uncontrolled. They can be copied and emailed at will, posing another huge security threat and the possibility of a supplier manufacturing the wrong version of a part.
The “file-less” label is also misleading. Some CAD systems have introduced a “new” type of file storage system – the so-called “file-less,” “no file” or “zero file” database. The end user only ever interacts with the PDM interface and never sees the actual CAD files. However, the files are downloaded in the background when a user wishes to work on them. If you know where to look, you’ll find them hidden in an obscure area of your hard drive. While these files can be technically referred to as a cache, they still contain the editable CAD data and each file must first be downloaded before the CAD system can open it. This type of system is used in Dassault Systèmes’ 3DEXPERIENCE Platform, as noted by a commenter in the LinkedIn thread.
There are advantages to this method of data storage. First, there is a central file store so that anybody can get the data they need without having to worry where any of the files are located (the same as regular PDM, really). Second, the cached files are sufficiently hidden and obfuscated so that they can only be opened by the installed CAD system and therefore cannot be emailed.
The downside to this method is that it offers no real benefit to the engineer. When a part needs to be edited, it is downloaded to the user’s local hard drive and locked so that nobody else can edit it. Sound familiar? This is akin to checking out the data from a file-based PDM system. Once an edit is complete, the part file is committed back to the database (a.k.a. save and check-in) which, for a large file, can take some time while the file is copied back to the server over the network or internet.
I can see why people are confused and struggle to understand the differences.
Since this type of database is based on SQL with rigid schemas that cannot be easily changed, it requires periodic maintenance downtime to install service packs and upgrades, just like regular PDM.
These are old technologies with a new lick of paint.
The final type of database structure we will discuss is the one that, in CAD terms, is unique to Onshape.
Onshape uses a document-oriented database model which supports various forms of data and completely flexible schemas. It is a highly performant and distributed non-relational database that is used in big data applications and other processing jobs involving data that doesn't fit well in a rigid relational model. Instead of using tables and rows like relational databases, a non-relational database architecture is made up of collections and documents.
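As a rough illustration of what “collections and documents” means, the sketch below models a collection as a list of plain Python dictionaries. It is a generic picture of the document model and is not Onshape's actual data model or any real database client; the `parts` collection, its fields and the `find` helper are all invented for the example.

```python
from collections import defaultdict

# collection name -> list of documents; each document is a free-form dict
collections = defaultdict(list)

# Two documents in the same collection with different shapes.
# No table definition has to be changed to allow this.
collections["parts"].append(
    {"name": "housing", "material": "aluminium", "mass_kg": 1.2}
)
collections["parts"].append(
    {
        "name": "shaft",
        "features": [{"type": "extrude", "depth_mm": 40}],  # nested data
    }
)


def find(collection, **criteria):
    """Return documents whose top-level fields match all given criteria."""
    return [
        doc
        for doc in collections[collection]
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


print(find("parts", name="shaft"))
```

The contrast with the relational approach is that the second document carries a nested list of features that the first one lacks, and neither needed a schema change to be stored.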
“So what?” I hear you say.
This fundamental difference is what enables real-time collaboration, simultaneous editing, instant and secure sharing, version control and release management. No other product development platform comes close. There are definitely NO files here.
In layman’s terms, this means that an entire design team can work on the same project, same assembly, same part and even the same sketch if need be. At the same time. Nothing is locked. All design activities are carried out in parallel – as changes are made, every action is recorded in the database and instantly updated wherever it’s used. There is no save button, no check-in / check-out, no accidental overwrites, and no waiting around for someone else to finish their work before you can start yours.
This enables teams to co-design complex parts and assemblies without having to be physically in the same location. Since every design change is recorded, conflicts are easily resolved. Your team can experiment as much as they like, either in the same workspace or in their own branch, confident in the fact that any errors or bad decisions can always be undone. In short, Onshape gives you unlimited undo/redo.
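The idea that recording every change makes any decision reversible can be shown with a small change log. This is a simplified sketch of the general technique and makes no claim about how Onshape implements it: the `DesignHistory` class, its methods and the single linear history (no branches) are assumptions made for brevity.

```python
class DesignHistory:
    """Ordered log of design changes with undo/redo (illustrative only)."""

    def __init__(self):
        self._log = []     # list of (user, key, value) changes, oldest first
        self._cursor = 0   # how many log entries are currently applied

    def record(self, user, key, value):
        # Simplification: a new change after an undo drops the undone entries.
        del self._log[self._cursor:]
        self._log.append((user, key, value))
        self._cursor += 1

    def undo(self):
        if self._cursor > 0:
            self._cursor -= 1

    def redo(self):
        if self._cursor < len(self._log):
            self._cursor += 1

    def state(self):
        """Rebuild the current design by replaying the applied changes."""
        result = {}
        for _user, key, value in self._log[: self._cursor]:
            result[key] = value
        return result


history = DesignHistory()
history.record("alice", "hole_diameter_mm", 5)
history.record("bob", "hole_diameter_mm", 6)
history.undo()
print(history.state())
```

Because the current design is rebuilt from the log rather than overwritten in place, stepping back is just a matter of replaying fewer entries, and each entry still records who made the change.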
Sharing data with colleagues, suppliers or customers is simple and secure. No CAD data ever leaves our servers. Just like Google Docs, all you need to do is enter a person’s email address, set view or edit permissions and press “Share.” Clicking on the email link will open your design in a web browser or a mobile device. No software or downloads required. This enables design teams to work together from anywhere and design reviews to be carried out in real time on any device. Everybody works on the exact same document, not different copies of the data. Access is just as easily revoked.
These are just some of the powerful, time-saving benefits that Onshape’s unique database architecture delivers. And I haven’t even mentioned CAD.
My explanation above may be a bit long, but to fully understand the differences, one or two sentences just won’t cut it. But I’ll give an abbreviated answer a try anyway:
“All CAD systems, regardless of their PDM technology, store data in a file and suffer the same file-related problems to varying degrees. The exception to this rule is Onshape… (see above).”