An Environment Baseline is the analysis and data collection effort that describes the existing content. It defines the starting context for the content migration. Any content repository that has grown over time, with contributions from many people, will conceal some surprises. When defining the Business Requirements for a Content Migration, it is best to start with an accurate depiction of that content.
There are four key points to building this environment baseline:
- Build a detailed description of the entire structure.
- Patterns or chaos – how simple is it?
- Challenge your own hypothesis.
- Keep your eyes open, and tread lightly.
Detailed description of the entire structure.
Collect information on existing network drive structures and on file types and extensions. It’s extremely valuable to have a searchable and sortable listing of each area of storage: the names of the folders, and the number and types of files within each. By creating text files that represent the levels of folders, you can observe numerically, even statistically, what structure may exist and where files are commonly stored. In a Cersys content migration from a large file system, for example, we use file system scripts to create tab-delimited text files that can subsequently be sorted, split into levels, and so on.
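As an illustration of the kind of file system script described above, here is a minimal Python sketch. It is not the Cersys tooling; the function names, the column layout, and the choice to record one row per folder are assumptions made for this example.

```python
import csv
import os
from collections import Counter


def build_inventory(root):
    """Return one row per folder: (depth, relative path, file count, extension summary)."""
    rows = []
    root = os.path.abspath(root)
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subfolders in a predictable order
        rel = os.path.relpath(dirpath, root)
        # The root itself is level 0; each path separator adds one level.
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        # Count files per extension, case-insensitively; "(none)" marks files without one.
        exts = Counter(os.path.splitext(name)[1].lower() or "(none)" for name in filenames)
        ext_summary = ";".join(f"{ext}={n}" for ext, n in sorted(exts.items()))
        rows.append((depth, rel, len(filenames), ext_summary))
    return rows


def write_inventory(rows, out_path):
    """Write the inventory as a tab-delimited text file with a header row."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(["depth", "folder", "file_count", "extensions"])
        writer.writerows(rows)
```

The resulting file opens directly in a spreadsheet, where it can be sorted by depth or file count, or filtered by extension, to see where content actually accumulates.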
Patterns or chaos – how simple is it?
This is the beginning of the analysis, leading into the business requirements work. As you collect the baseline information, watch for distinct structures or patterns of folder use as they emerge. How many of these can be parameterized with a relationship to existing metadata? Or perhaps new metadata should describe that structure. In other words, the structural information is captured and transformed into metadata.
Other times, the variation represents far less business value. In that case, it is wiser to collapse the exceptions into a simpler set of metadata. Here the goal is to clean up the structure rather than preserve it.
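To make the idea of turning folder structure into metadata concrete, here is a small Python sketch. The folder convention it assumes (department, then a four-digit year, then project) is purely hypothetical, as are the function names and the `classified` flag. Real patterns would come from the baseline data.

```python
import re

# Hypothetical convention: <department>/<four-digit year>/<project>/...
PATTERN = re.compile(
    r"^(?P<department>[^/]+)/(?P<year>\d{4})/(?P<project>[^/]+)(?:/|$)"
)


def path_to_metadata(folder):
    """Map a relative folder path to metadata fields, or to a catch-all bucket."""
    match = PATTERN.match(folder.replace("\\", "/"))  # accept Windows separators
    if match:
        return {**match.groupdict(), "classified": True}
    # Exceptions are collapsed into one simple bucket rather than preserved.
    return {"department": None, "year": None, "project": None, "classified": False}


def coverage(folders):
    """Fraction of folders that fit the assumed pattern."""
    if not folders:
        return 0.0
    hits = sum(1 for folder in folders if path_to_metadata(folder)["classified"])
    return hits / len(folders)
```

A high coverage figure suggests the pattern is worth preserving as metadata. A low one suggests either a different pattern or a case for collapsing the exceptions.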
Challenge your own hypothesis.
What scenarios support it? Find them in the raw data.
What would break it? Look for those scenarios in the raw data. Check the last edit dates: there may be pockets of ongoing activity in areas that most people would assure you are only archives.
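One way to check last edit dates across a large tree is sketched below in Python. The function name, the 365-day default window, and the `now` parameter are assumptions for this example. File modification times can also be reset by copies or backup tools, so treat the results as a prompt for further questions rather than as proof.

```python
import os
import time


def recently_active_folders(root, days=365, now=None):
    """Return {relative folder: number of files modified within the last `days`}."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    active = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        recent = 0
        for name in filenames:
            try:
                mtime = os.path.getmtime(os.path.join(dirpath, name))
            except OSError:
                continue  # unreadable or vanished file: skip it rather than fail
            if mtime >= cutoff:
                recent += 1
        if recent:
            active[os.path.relpath(dirpath, root)] = recent
    return active
```

Running this against a supposed archive area and finding any entries at all is a signal that the "archive only" hypothesis needs another look.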
Keep your eyes open, and tread lightly.
Don’t assume that all the users know where/what all the files are. Don’t assume that anyone knows where/what all the files are. There may be preconceptions about the files – that they are, or are not, already neatly organized, that important data is never kept on user hard drives, etc.
This article is part of a series discussing many aspects of content migration from an unstructured storage system to a document management system. The overview page is here: Content Migration to a DMS – Articles.