Over the last few weeks we have run a number of workshops at Ashridge Business School to plan how to tackle the daunting problem of narrowing down the research we need to do on this project.
To those who attended from Ashridge Business School, PERA, Pearson Education, Redtray, 2SMS, Lifecycle and our team at Milamber – thank you.
We made good progress!
The biggest challenge on this project is its scale, so we need to work out which research is most important to conduct first and will have the greatest impact.
Ashridge is an amazing place – the perfect location for debate, reflection, and sorting out ideas. Ashridge itself is set within 190 acres of beautifully kept gardens, which in turn sit within 5,000 acres of National Trust estate.

Ashridge Business School was founded in 1959 and is now ranked number one in the UK for tailored Executive Education in the 2009 FT rankings. However, the estate dates back to the 13th Century.

So within these settings we started to contemplate the issues we faced.
Our Project is all about creating a next generation Digital Library – a global repository for learning. The point of an EC funding project is that, in order to develop the solution properly, we have to submit a proposal that justifies the research we want to do and works out how best to do it.
One of the critical issues that we face in building a Digital Library is the Digital Preservation of the content.
We mentioned scale earlier so let us start by understanding the scale of the problem.
Imagine, if you will, education or learning content from all over the world being housed in a library. We can imagine and visualise a library of books because most of us will have seen one. But now add to this an archive of digital images, pictures, videos, short texts or whole documents, programs, assessments, simulations, games and audio, in multiple languages, and much more.

Imagine if you could upload content to the library from your own organisation. How much knowledge-based or education-based content does an organisation like Microsoft or IBM hold?
Or what happens if, with current technology, you can upload your own personal content to the library: your notes on lectures, your lessons on business or life?
Now imagine hundreds of thousands of organisations uploading their learning content, and literally millions of people inputting their own learning, be it in text, audio, or video formats, into this library.
You start to understand the problem of scale.
So just housing this content creates lots of problems.
We have to catalogue the library and create links and tags so that people can easily find what they need when they most want it.
We have to find ways to navigate through vast amounts of content to get the right content for you when you need it, in the right format.
We have to preserve the content, because digital content is dynamic: it evolves, and that opens up another can of worms.
Besser’s 5 Problems
Howard Besser, Professor of Cinema Studies and Director of New York University’s Moving Image Archiving & Preservation Program (MIAP), as well as Senior Scientist for Digital Library Initiatives for NYU’s Library, sums up the key issues in preserving digital assets or content in 5 key areas:
a) The Viewing Problem – Digital content needs technology to view it. But technology (software, hardware, formats) evolves so fast: will the technology still be around in, say, 20 years when you want to view the data?
Who remembers Cine Film, or Betamax video tapes?

Another example that is around today is web video content. Video can be housed in different formats. The most popular are Windows Media Player, QuickTime, and Flash. So what happens if one person edits the Flash version of a piece of content and another person edits, with different cuts, the Windows Media Player version? Two different derivative pieces of content have now been created out of the one original. Ten years from now, will all three pieces of content still be around so we can see how the edits have changed from the original?
b) The Scrambling Problem – Content is compressed or scrambled to assist in storage or to protect the intellectual property in the content. The algorithms that do the compressing or scrambling also change over time or are no longer supported. So, for example, if the company that produced the scrambling algorithm or software goes bust, the content cannot be unscrambled or uncompressed and you can’t read or access it. If you do so by ‘unwrapping it yourself’, you could be breaking copyright laws.
c) The Inter-Relation Problem – Digital information or content is often linked to other items. If links are not maintained, then the core information is incorrect, incomplete, or does not make sense. An example is a document that links to a web page that has died.
d) The Custodial Problem – Who looks after the digital document? Do we allocate librarians to do this? And if a change is made to a document, do we need to keep each subsequent version?
e) The Translation Problem – There are several issues here. If software is used to interpret content and the software changes from version to version, could the content be changed, and how meaningful are the changes? What about translating between languages? Different people interpret different meanings in words, so as we translate, the meanings change. Are these minor or material changes?
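Of the five problems, the inter-relation problem is perhaps the easiest to picture in code. What follows is only a minimal sketch in Python, assuming a hypothetical in-memory catalogue in which each item lists the identifiers of the items it links to. The names `catalogue` and `find_broken_links` are made up for illustration and are not part of any existing system.

```python
def find_broken_links(catalogue):
    """Return {item_id: [missing targets]} for links that point at nothing.

    `catalogue` is a hypothetical mapping of item id -> list of linked item ids.
    """
    broken = {}
    for item_id, links in catalogue.items():
        # A link is "dead" if its target is no longer held in the catalogue.
        missing = [target for target in links if target not in catalogue]
        if missing:
            broken[item_id] = missing
    return broken


# Illustrative data only: "web-page-7" is linked to but no longer held.
catalogue = {
    "report-1": ["case-study-2", "web-page-7"],
    "case-study-2": [],
}
print(find_broken_links(catalogue))  # {'report-1': ['web-page-7']}
```

A real library would need to run a check like this continuously and at a vastly larger scale, which is part of why the problem is hard.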
Context
Aside from these 5 issues, Besser alludes to another major problem in the preservation of digital content: the context of that content. Because digital content is dynamic and easily changed, it evolves so fast that we easily lose the original meaning and the original context through its evolution. The more people that touch or edit the content, the faster we lose control.
For example, if you have a group of people in a circle and one person starts by whispering a phrase to the person next to them, who passes on what they have heard to the next person, and so on round the group, the phrase at the end is never the same as the phrase at the start. In fact the differences in meaning by the end can be hilarious, which is why it is played as a children’s game.
Over the years we have used oral traditions to pass on stories or knowledge from one generation to the next. Elders would seek out protégés to pass on parables, or stories entwined with meaning or holding knowledge. The Elder would make sure, by repetition and oversight, that the protégé learnt the oral history accurately in order to pass on the community’s wisdom. It was and still is considered an honour and a mark of responsibility to be given this knowledge.
If you think about it, passing on knowledge in this way is in itself an important process, with parameters that must be adhered to in order to show respect for being given that responsibility. In this way the culture surrounding this process has created a framework for preserving the authenticity of the content or stories being passed on.
Our challenge is that we must do the same in the digital world.
Royalties and who owns what?
In today’s world someone usually owns the underlying content. It might be an individual, a media house, a company or an organisation. For example, if you buy a book, several people or companies get paid for your use of that content: the Author, the Agent, the Publisher, and the Distributor.
So to start with we have to find a way to track the ownership of the underlying content. That can be reasonably simple for a single item like a book, but it gets more complicated when we start to take content apart and mix and match pieces to create hybrid products from multiple pieces of different types of content, which we can do once they are digital in structure.
In the digital world we break down a content asset into objects; an object may be a single chapter out of a book, a case study, or perhaps a short video. We have to keep track of where each content object came from, how much of it we are using, and who owns it, because if we are going to make money out of that object we will have to pay the underlying owner a Royalty.
Because digital content is dynamic and can be changed so easily with authoring tools, we find ourselves with new content derivatives made up of several objects of content. We call these complex content objects. We can take a chapter of a book, a video and a case study, and together they form a new complex object, which we call a “Nugget”. Again, if you think about it, several parties could have contributed content to create that nugget, and our system or digital library has to track who owns what so that if we make money out of that nugget we can pay Royalties back to the owners.
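To make the bookkeeping concrete, here is a rough sketch in Python of how a nugget might record who owns each of its objects. Everything in it is an assumption made for illustration: the field names, the `weight` given to each object, and above all the rule of splitting revenue in proportion to those weights. How royalties would actually be calculated is one of the open questions.

```python
from collections import defaultdict


def split_royalties(nugget, revenue_pence):
    """Split revenue across owners in proportion to each object's weight.

    ASSUMPTION: proportional splitting is an illustrative rule only.
    Integer division means a few pence may be left unallocated.
    """
    total_weight = sum(obj["weight"] for obj in nugget)
    payouts = defaultdict(int)
    for obj in nugget:
        payouts[obj["owner"]] += revenue_pence * obj["weight"] // total_weight
    return dict(payouts)


# Hypothetical nugget: a book chapter, a video and a case study.
nugget = [
    {"object": "book chapter", "owner": "Publisher A", "weight": 2},
    {"object": "video", "owner": "Studio B", "weight": 1},
    {"object": "case study", "owner": "Publisher A", "weight": 1},
]
print(split_royalties(nugget, 1000))  # {'Publisher A': 750, 'Studio B': 250}
```

The point of the sketch is that ownership has to travel with every object inside the nugget; without it there is nothing to base a payment on.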

Effect of Communities on the Growth of the Library.
The corporate world is made up of Large Enterprises (LEs), which can have thousands of employees, and Small and Medium Enterprises (SMEs), which can have very few.
LEs commonly access learning and training content from Learning Management Systems, the Internet and Intranets. Such content is protected by a number of security layers and firewalls. Access to the content and how it can be used is usually controlled by HR and IT departments.
Creating user-generated content, for example through video uploading and sharing, blogging and writing articles, or image and link sharing, is difficult due to this stringent access control. It is therefore difficult for users to add digital content to the content already existing within large corporations.

SMEs, in contrast to LEs, access most of the digital content they need via the Internet, using not only laptops and computers but also mobile devices. Furthermore, due to the smaller size of SMEs and the lack of departmental control mechanisms, it is easier for users to create user-generated content in order to produce value-added information.
These differences in the ability to create digital content can be explained further with an example: a lesson on how to hire someone for work. In an LE, the lesson may consist of a text-based web page, a video on the procedure and a simulation that takes you through a role play of the process. No additional user-generated content can be added easily. In an SME, a similar lesson may contain no text, but a video and simulation plus additional multimedia elements, such as other videos and simulations from users who have carried out similar tasks, and steps of the process written up in a blog.
The impact of these differences is that the evolution of digital content differs. In an LE, the process of creating, archiving and using digital content remains more static and standardised, whereas in SMEs content is more derivative in nature and evolves at a faster rate. As digital objects in SMEs evolve more quickly, their complexity also increases. Further, as the objects evolve and further derivatives are created, the context changes, both in meaning and in usage.
Take a video tutorial about how to fire someone at work. The video contains some audio in it. If a small part of this audio is extracted from the video and used for a different lesson on how to reinstate an employee who has been through a redundancy process (but did not end up being made redundant), then this would change the meaning of the audio clip and change the reason for its use.
We can think about Large Enterprises (LEs) and Small to Medium Enterprises (SMEs) as two different Communities or Eco-systems:
a) LEs act as ‘controlled gated communities’ with Objects being delivered to Groups or often Mass Audiences (e.g. different employees of the NHS).
b) SMEs act as ‘uncontrolled non-gated’ Communities of Individuals with Objects being delivered to niche audiences, acting as Crowds (e.g. all the individuals at the Berlin Wall when it fell).
In LEs, the HR, IT, and Legal departments act as gates. Content objects in this eco-system have to be processed so that they are ‘cleared’ and ‘standardized’, and this process takes time, so derivatives do not evolve as quickly. Also, a single user is creating derivatives for mass audiences, i.e. large groups. The context of a message or brand in the object is therefore much more controlled.
SMEs will be much more active in ‘mashing’ objects (combining objects) together to create new derivatives. Users will also add their own content much more prolifically, creating diversity from the original object’s context. So here we have multiple single users creating new derivative objects. These will move quickly, flow further away from the original context and change meaning, as in the whispering game described earlier. This effect, when influenced by a crowd of individuals and mixed with multiple sources of user-generated content, creates multiple embedded structures that need to be tracked and put in context. We also need an understanding of how the core original object is being transformed away from its original purpose or meaning.
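One way to picture this tracking is as a family tree of objects. The sketch below, in Python, assumes a hypothetical `parents` record that notes which objects each derivative was made from, and that no object is ever derived from itself. The object names echo the firing and reinstatement example above; none of this describes an existing system.

```python
def original_sources(object_id, parents):
    """Return the set of original objects that `object_id` ultimately derives from.

    `parents` is a hypothetical mapping: object id -> list of ids it was made from.
    An object with no recorded parents is treated as an original.
    """
    direct = parents.get(object_id, [])
    if not direct:
        return {object_id}
    sources = set()
    for parent in direct:
        sources |= original_sources(parent, parents)
    return sources


def generations_from_original(object_id, parents):
    """Return the longest chain of derivations between `object_id` and an original."""
    direct = parents.get(object_id, [])
    if not direct:
        return 0
    return 1 + max(generations_from_original(p, parents) for p in direct)


# Illustrative data only.
parents = {
    "firing-video": [],
    "audio-clip": ["firing-video"],
    "user-blog": [],
    "reinstatement-lesson": ["audio-clip", "user-blog"],
}
print(sorted(original_sources("reinstatement-lesson", parents)))
# ['firing-video', 'user-blog']
print(generations_from_original("reinstatement-lesson", parents))  # 2
```

The number of generations is a crude stand-in for how far an object may have drifted from its original context. It says nothing about whether the meaning has actually changed, which is the harder research question.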
Research already being carried out.
As part of our own research we have been linking up with leading Academics across Europe to find out the ‘State of the Art’ in current thinking on Digital Preservation.
For example the EC recently funded a project called CASPAR:

CASPAR – Cultural, Artistic and Scientific knowledge for Preservation, Access and Retrieval – is an integrated project co-financed by the European Union within the Sixth Framework Program Support (Sixth Framework Program – Priority IST-2005-2.5.10 “Access to and preservation of cultural and scientific resources”).
http://www.utc.fr/caspar/wiki/pmwiki.php
There are other Projects that we have been looking into: InterPARES, PLANET, etc.
In our discussions with these leading Academics we have been looking at how our work will take the current ‘State of the Art’ in Digital Preservation further. As we did so it became clear that our next generation Digital Library is pushing the envelope of the current ‘State of the Art’ and that we are moving Digital Preservation into unknown territory. It is therefore critical that we focus on and define the next set of ‘Research’ questions that need to be answered for us to have the greatest impact, not just for our own project but for the advancement of other European companies and organizations that are going to need to preserve digital content.