Storage, Part I of Many

While on a break from blogging, I’ve been churning about all things In Perpetuum. By churning, I mean creating lots of digital audio and video files, making a short movie, and sipping a campari and worrying about formats and storage. In these dog days, then, when it doesn’t hurt to prop the laptop on icepacks, I am taking steps to Do Something.

Lo these many months ago when I conducted an inventory of my materials, I found that I had different priorities and strategies for preserving music, video, captured sound, and text files. After exploring how best to optimize my devices to capture the best quality audio and video (the better to preserve), I floundered on how to store and manage the files. None of the options seemed like The One Solution.

Since I’m somewhat recently come from an academic institution, I assumed that having The One Solution (supported by central IT, of course) was the only solution. Silly me, forgetting the purpose of this exercise: Save my stuff sans the infrastructure and resources that public and private institutions afford. I’m taking one for the team. In which case, there isn’t The One Solution maintained by central IT. There are many solutions, and it’s complicated enough that I’ll have to write a plan just to remember what to do. The basic parts involve:

  • two 1-terabyte drives that rotate between my domicile and a safe deposit box at my bank
  • 1 terabyte of space in the ether (see comments below)
  • a new laptop to handle video and data processing
  • two old laptops that manage music (the oldest one) and travel (the second oldest one) – of course, backed up to the drives and the ether.

What – you don’t save your old computers?

I’m almost ready to blow my tax return on digital and analog storage, and I’ll provide the numbers in a subsequent post. But I’ve been reading and thinking, especially about online storage, and here’s what I’ve learned:

There’s a language problem: Web hosting. Cloud storage. Network storage. Everybody says they has it cheap, though their grammars aren’t always rights, and theys seem to have many company for one services. (note to self: look at DNS Registration). Everybody says they gives you tools to manages your stuffs. And it encrypted. And it unlimited. Caveat emptor. The web hosting peeps don’t do data storage, but they’re keen on SSL and will let you FTP an unlimited amount of stuff to their site. They do want to throttle traffic and maintain service for everyone who’s hosting a site on their servers, and they do promise much uptime. The cloud/network storage peeps don’t trust you with FTP and want you to use/download their synching tool. They have good thoughts about security mostly, but storage space is at a premium, and they don’t cop to how long it actually takes to send 1 terabyte of information into the cloud for storage. (Note to self: check bandwidth of ISP.) Several sites review web hosting and cloud/network storage options, though it seems that there is a “reward” for reviewers. Mad props to the peeps on the MacRumors forum (http://forums.macrumors.com/showthread.php?t=1127908) who suss out language and storage options.
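About that note to self on ISP bandwidth: a back-of-the-envelope sketch in Python gives a feel for how long a terabyte takes to reach the cloud. The function name, the sample upload speeds, and the 0.8 "efficiency" factor (a rough allowance for protocol overhead and a shared connection) are all my assumptions, not figures from any provider.

```python
TERABYTE = 10**12  # decimal terabyte, in bytes


def upload_days(size_bytes: float, upload_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the days needed to upload size_bytes at upload_mbps (megabits per second).

    efficiency is an assumed fraction of the advertised speed that is
    actually available for the transfer.
    """
    if upload_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("upload_mbps must be positive and efficiency in (0, 1]")
    bits = size_bytes * 8
    seconds = bits / (upload_mbps * 1_000_000 * efficiency)
    return seconds / 86_400  # seconds per day


# Hypothetical upstream speeds; substitute the number from your own ISP.
for mbps in (1, 5, 25):
    print(f"{mbps} Mbps up: about {upload_days(TERABYTE, mbps):.1f} days")
```

Under those assumptions, 1 Mbps upstream works out to roughly 116 days for a single terabyte, which is the part the storage peeps don't cop to.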

There are choices to be made: Just because you’ve created an array of online and offline storage options doesn’t mean you can FTP and overwrite willy nilly. Should there be an Ur machine that contains everything? What if you have a video file from, say, a Canon FS200 that saves a .MOD and a .MOI for each video clip? These files are virtually unplayable (thank goodness for VLC) and un-editable unless converted to another format. What do you save in your three 1-terabyte locations: the .MOD/.MOI files; the converted .DV files; the edited and marked .DV clip that you plan to use for a movie? What about the original file on the SD card? Safe deposit box?

There are other lessons, but this post is long enough, and the WordPress servers are limitless(!), so I can post more thoughts on storage. The lessons learned, for the moment: I have further confirmation that it’s best to make a storage and preservation plan for each type of item that’s dependent on its whole life cycle. Somehow, finding a way to print my high school thesis in Word Perfect 3.x seems like a piece of cake.

Optimal Settings for Image-Capture Devices

1. What are the optimal settings for my image capture devices to promote saving the images, including moving them through several environments? Brew a strong pot of coffee and read on.

After much reading… if you’re going to preserve digital image files, they should 1) be high resolution, 2) contain all of the original pixels, 3) have been minimally manipulated (or not at all), 4) be in a format that accommodates 1) and 2), and 5) be in a format that is not hardware- or software-dependent. The successful combination of these factors should result in a file that can migrate across multiple environments and be viewed in standard image software or on the web.

In the case of my Sony DSC-W5:

  1. The image should be high resolution: The image size should be as large as possible. For this camera, that setting is 5M. Other options were 3M, 1M or VGA.
  2. The image should contain all of its pixels: Briefly, the compression should be low in order to record a high quality picture. This setting is Fine (FINE in the menu).
     Less briefly, digital images are composed of pixels, and each pixel contains 3 bytes of data to capture red, green, and blue. An image that contains all of its bytes creates a large file. The more you compress the pixels, the smaller the file but the more information about the image you lose. A lossless compression algorithm discards no information. It looks for more efficient ways to represent an image, while making no compromises in accuracy. In contrast, lossy algorithms accept some degradation in the image in order to achieve smaller file size. It’s difficult to talk about compression without veering into details of camera sensors, image structure, file formats, etc. I gathered basic info from Bob Atkins’ and Rick Matthews’ sites, but there is a gloriously detailed discussion of color sampling, albeit for video, from Karl Soule.
  3. The image should be minimally manipulated: Move images regularly off the camera into an appropriately named folder on a computer (e.g., yyyymm_archiveImages). I’ll discuss other storage steps elsewhere. If there are images to manipulate, copy them to a separate folder (e.g., yyyymm_workingImages). There is much discussion about the extent to which you can manipulate and save files without experiencing degradation. I’ll save that for the detailed workflow process. In short: Do NOT import directly into iPhoto or image viewer programs. Do NOT use the manufacturer’s software to manage images. Do NOT save the photos to a MyAnything folder. DO establish your own folder structure and transfer the files from camera to computer.
  4. The image should be in a format that accommodates 1) high resolution and 2) low compression: The Sony DSC-W5 creates JPEG files, so I have to make do. Some digital cameras create RAW files, which are proprietary to the manufacturer and therefore in conflict with 5)…
  5. The image should be in a format that is not hardware- or software-dependent: While JPEG is a common format, TIFF or JPEG2000 would be best. There are lots of resources that explain the differences between image file formats, but the US Library of Congress has a great page of Format Descriptions for Still Images. Also, the UPDIG Photographers Guidelines have a great discussion of archiving with regard to formats from a professional photographer’s point of view. Mad props to Ken Fleisher and Peter Krogh for this work.
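To make the "3 bytes per pixel" point concrete, here is a minimal Python sketch of the arithmetic. The 2592 x 1944 dimensions are my assumption for what a "5M" setting records, and the 2 MB JPEG size is purely illustrative, so check both against your own camera and files.

```python
BYTES_PER_PIXEL = 3  # one byte each for red, green, and blue (24-bit color)


def uncompressed_bytes(width: int, height: int, bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """Size of an image if every pixel is stored in full, with no compression."""
    return width * height * bytes_per_pixel


def compression_ratio(raw_bytes: int, file_bytes: int) -> float:
    """How many times smaller the stored file is than the uncompressed image."""
    return raw_bytes / file_bytes


# Assumed pixel dimensions for a "5M" image; not taken from the camera manual.
width, height = 2592, 1944
raw = uncompressed_bytes(width, height)
print(f"{width}x{height}: {raw / 1_000_000:.1f} MB uncompressed")

# Illustrative only: a hypothetical 2 MB JPEG made from that image.
jpeg_bytes = 2_000_000
print(f"compression ratio: about {compression_ratio(raw, jpeg_bytes):.1f} to 1")
```

The gap between the uncompressed size and the file on the card is what the lossy algorithm threw away or squeezed, which is why the low-compression setting matters for preservation.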

Summary: I’ve configured my camera to ensure the best quality images for preservation. However, I have no guarantee that the JPEG format will continue to be a standard in the future. If converting the JPEG images to TIFF won’t increase the image quality (but might increase the file size, requiring more storage – something to test), then the only reason to convert all of my images to TIFFs is to feel secure that I’ve done everything possible, at the file level, to ensure their preservation. And, even if the files are converted to TIFFs, I’ll continue to not overwrite the JPEG images on the storage Memory Stick(s), I’ll continue to store the original JPEG images in separate archival folders, and I’ll continue to back up those folders in different locations. Is it worth it to replicate this process for TIFF files? I’m not sure. More in the next installment…

An excellent read from the Library of Congress: Guidelines for Electronic Preservation of Visual Materials

What’s the goal for my digital images?

I still need to post the results and added questions from the Born Digital Survey that I modified. At the end of that exercise, I created a more succinct goal specifically for digital images, and then developed some guiding questions to structure the rest of my work. Tortuously thorough, I know.

Goal for Digital Images: I want to have an export (save-from-camera), storage, management, access, and sustainability plan in place that lets me, or trusted people, manage and see digital images independent of a digital device or network at any time (within reason) and a migration plan that ensures the images are accessible and manageable at least 50 years in the future.

  • Priority: Remove redundancy in existing digital images.
  • Priority: Maintain datestamp and time for all photos.
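One way to honor the datestamp priority when moving files off a card is sketched below in Python. It is a minimal sketch, not a finished workflow: the function names are hypothetical, it uses the file's modification time as a stand-in for the date taken (an assumption; the embedded camera metadata would be more reliable), and it borrows the yyyymm_archiveImages folder naming idea.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_folder_name(when: datetime) -> str:
    """Build a folder name in the yyyymm_archiveImages pattern."""
    return f"{when:%Y%m}_archiveImages"


def copy_to_archive(source_dir, archive_root):
    """Copy files into dated archive folders, keeping their timestamps.

    Files that already exist in the archive are skipped, never overwritten.
    Returns the list of newly created copies.
    """
    copied = []
    for src in sorted(Path(source_dir).iterdir()):
        if not src.is_file():
            continue
        # Assumption: modification time approximates when the photo was taken.
        taken = datetime.fromtimestamp(src.stat().st_mtime)
        dest_dir = Path(archive_root) / archive_folder_name(taken)
        dest_dir.mkdir(parents=True, exist_ok=True)
        dest = dest_dir / src.name
        if dest.exists():
            continue
        shutil.copy2(src, dest)  # copy2 carries over the modification time
        copied.append(dest)
    return copied


# Hypothetical usage; both paths are placeholders:
# copy_to_archive("/Volumes/CAMERA/DCIM/100MSDCF", "/Users/me/Pictures/archive")
```

Copying rather than moving leaves the originals on the card untouched, so the card itself can still go in the safe deposit box.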

Guiding Questions for Digital Images: I’ll answer them in separate posts and sub-posts.

  1. What are the optimal settings for my image capture devices to promote saving the images, including moving them through several environments?
  2. What is the process for moving digital images off my digital camera?
  3. What is the process for moving digital images off my cell phone, out of email or chat sessions, or from another mobile device?
  4. How should these images be organized once they are off the individual devices (e.g., file/folder structure)?
  5. Do I want to modify the filenames?
  6. Do I want to / need to convert the images to a different format?
  7. How will I indicate the priority level for “saving” the digital images?
  8. How will I indicate the access level associated with the digital images? Should this occur at the folder level?
  9. How will I indicate the rights associated with the digital images?
  10. How do I ensure that the image metadata moves with the image itself?
  11. Where should the digital images be stored: for organization; for migration; for immediate access; for long-term access?
  12. Who should assume control of my digital images if I die?
  13. Who should have access to my Flickr account if I die?
  14. How much storage space do I need to plan for now and going forward?
  15. What’s a reasonable cost to establish and maintain a preservation program for my images in terms of my effort, hardware, software, storage, and general goodwill of friends and family?

In Perpetuum Week 2 Report

The primary Week 2 activity was to take the “Born Digital Blog” AIMS survey and modify it with supplementary questions as needed. The goal was to have a more structured survey to provide more context for my informal Materials Inventory. Unfortunately, taking the AIMS Survey was not as bounded an exercise as I’d hoped. After struggling with the broad scope of the survey, I found it was easier to begin modifying the survey with respect to images, while trying to remain media-neutral, than to answer the whole survey for all of my materials.

Ultimately, I modified the two-part AIMS survey by:

  1. combining “digital environment” (e.g., hardware, software, back-up) questions from Parts I and II for capture in a spreadsheet;
  2. answering relevant “digital creation, use, and management” questions from Part I while interjecting my own; and,
  3. creating a list of “personal context” questions that reflect my priorities and process activities specifically for images.

The final result yielded answers that were structured in such a way that I could better act upon them (the difficult part!), which will be chronicled in subsequent entries.

Brief Analysis of the AIMS Survey for Personal Preservation

The survey is split into two sections. Part I begins with the note: This part of the survey is designed to be a prompt sheet for phone / face-to-face interview with donors by curators / digital archivists. Since I am playing both roles, the asking/answering is not such a fraught event as a one-time donor/curator conversation might be, though the survey asks for follow-up contact info. However, one of the lessons learned from acquiring and cataloging digital materials for an institutional repository is that at least one or two exchanges are required about the materials, the metadata, the policies. For a donor with even a minimal amount of hardware and history of creating digital files, I imagine this would be a prolonged conversation. In fact, I’m curious about the phases of an inventory / acquisition cycle before there is a complete hand-off to the institution, a very human-intensive process. Certainly schlepping materials off a few hard drives makes for quicker acquisition, but time invested at the beginning of the inventory and during acquisition might result in easier organization and less uninformed forensic work in the lab. Part I was very useful as a prompt to consider the range of digital environments where a donor’s content might reside and the informal policies or practices a donor might have regarding use.

Part II begins with the note: This part of the survey is designed to be filled out by digital archivists regarding technical details of the tools used to create digital material. This section was also useful as a prompt to consider hardware, software, networking, internet access, and security issues. However, this information begged to be organized in a spreadsheet, and there was some information from Part I that would make more sense when combined with Part II. Also, there was information from my informal inventory that I wanted to capture, hence the new spreadsheet.

The Born Digital Blog mentioned in a late 2010 post that the AIMS survey was to be put online with a database backend, but I can’t find the exact post at the moment.

I don’t mean to sound like I’m hating on the AIMS survey. It was developed and modified by two very thoughtful groups of people who are working in a different context from me and who have other pressures as well (e.g., library directors; institutional missions; project partners; funding requirements). Thanks to their hard work and willingness to share, I can adapt the survey for the home user or the lone preservationist.

Per the document, “This work is based on the Paradigm records survey published by the Bodleian Library, Oxford University.” Further, “This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License. Revision: July 16, 2010. Born Digital Collections: An Inter-Institutional Model for Stewardship (AIMS).”

Next posts will be the actual artifacts (spreadsheet, modified questions with answers, personal context questions) and some decisions about images.

Materials Inventory

As noted in the In Perpetuum project plan, this week’s activity was to take an inventory of my materials that I want to, shall we say, save. “Preserve” is such a loaded term and doesn’t fully express what I mean. Rather than give a blow-by-blow of this watching-paint-dry exercise, I’ll report some of the highlights, observations, and next steps.

Compulsive, but deductive, organizer that I am, I began by listing the information about the materials to capture. Whoa there, it’s not metadata yet. The result was a more thorough list of my materials (digital and analog) to save, how I’d like to sort the eventual metadata (by Type, Level of Access, and Priority), and other bits of context to capture in the inventory (e.g., dates, type of use, end goal for material, current storage, file type, OS, priority for saving, and unit(s) of measure (total can of worms)).

I was mostly successful in capturing what was outlined. The inventory itself took about 2 hours. Useful items: pen, paper, tape measure, lots of floor space, dust-free area, dust cloth, plastic bag and tape for batteries (tape nodes, save for recycle). I was remiss in not making photos but will do a better documentary job as this progresses. In fairness, I spent several days over the past two months consolidating files on my external drive and computer, so 2 hours is the culmination of a week of work.

Observation 1: In another post, I’ll summarize the amount of storage on all my devices and how much space I’m using. Don’t yawn. The units of measure don’t permit an apples-to-apples comparison between digital and analog when planning for saving or storage, but there is overlap between the two: a 250GB external hard drive in its box also takes up a 10×5 inch space on a shelf.

Observation 2: There is A LOT of redundancy in what’s been saved to date, at least for images. I don’t delete and re-use the image cards from my digital camera, and the images are on a computer, an external hard drive, and Flickr. What to do?
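A first step toward answering "What to do?" is simply finding out which files are exact copies of each other. The Python sketch below is one hedged way to do that: it groups files by SHA-256 checksum, and the function names and any paths are hypothetical. It only catches byte-for-byte duplicates, so a copy that a photo site has resized or re-compressed will not match its original.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, read in chunks to spare memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(roots):
    """Map each checksum to the files sharing it, keeping only true duplicates.

    roots is a list of folders (card, computer, external drive) to scan.
    """
    by_digest = defaultdict(list)
    for root in roots:
        for path in Path(root).rglob("*"):
            if path.is_file():
                by_digest[file_digest(path)].append(path)
    return {d: sorted(paths) for d, paths in by_digest.items() if len(paths) > 1}


# Hypothetical usage; the paths are placeholders:
# for digest, paths in find_duplicates(["/Volumes/External", "/Users/me/Pictures"]).items():
#     print(digest[:12], *paths, sep="\n  ")
```

The sketch only reports duplicates. Deciding which copy is the archival one, and whether to delete the rest, stays a human decision.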

Observation 3: However, there are other files/folders for which I have only one copy. The worst case: At some point I put files from 1987-1999 that were on 3.5-inch floppies onto a PC and have migrated that folder through at least four computers (Mac and PC). It lives on the external hard drive only.

Observation 4: After the inventory, I was a bit overwhelmed. It’s a royal pain to manage my materials, and I like doing this. What is everyone else doing, and what’s being lost? I made the “fire list” of priorities in case I run out of steam, and go figure, I want to save all the materials on tape and film. I’m less concerned about the digital materials (except for my high school senior paper in Word Perfect). In twenty years, will today’s kids have to worry about their term papers being inaccessible in Google Docs? For the record, I would save: an interview with my great-grandmother on micro-cassette; a folder of “vital” documents; all pictures; all video; mix tapes (audio cassette); home VHS tapes; vinyl; my iTunes library; master file of work notes; email; work documents

Observation 5: I’ve spent a lot of time burning email to discs and exporting it from one machine to another, but it’s a really low priority to save. In the interest of “saving” (my time, trees, energy) should I print that special email from my mom, put it in a folder, and delete the million other mundane messages? Also, saving things that I’ve published is dead last on the list. Sorry open access folks, but right now I’m counting on publishers (open or not) to perpetuate the academic record of my work.

Observation 6: Take out your batteries! Especially the alkaline ones if you haven’t used the device for more than a year. I understand some people (not me) like to lick the white stuff that flakes off corroded batteries. Save them, and you and your materials, a trip to the emergency room, and remove the batteries from devices you don’t use but won’t throw away, ahem, recycle.

This post is way too long, so I’ll put next steps in another segment and will post the spreadsheet with inventory. Scintillating reading.

Personal Preservation

I’m beginning a new project to explore the options for preserving my digital (and maybe analog) stuff: images, videos, and texts. This three-month, proof-of-concept endeavor is a classic plan-to-plan activity, but at the conclusion, I should have a solid idea of what I can do, as a layperson, about digital preservation and a plan for doing it. I’ll post the project plan shortly.