The Palais des Congrès, Brussels, Belgium
25-27 November 2002
Changes in technology are changing the role of data centres. There is a trend away from the traditional data centre, whose main task was archiving data sets, towards a more service-oriented role.
Data centres can look to libraries for inspiration in redefining their role; libraries provide expertise and guidance in cataloguing. Archives are grey and dusty; libraries are active and open; data centres should strive to resemble the latter rather than the former. Data management needs an equivalent of the ‘Web of Science’: a mechanism to bring up a list of relevant, available, quality-controlled, peer-reviewed data sets.
There is a need to create data and information products, aimed not only at other data managers and scientists, but also at policy makers and society at large. Such products will increase the visibility of the data centres, and so help to attract both funding for further activities and data submissions from scientists.
Some traditional roles of data centres remain important: long-term stewardship of data, integration of data sets, documentation and redistribution of data sets, development of standards and standard operating procedures…
Data centres, and data and information management procedures in general, are very poorly known among marine scientists. Most university programmes offer no training on data management, and no information on data centres or data management procedures… Data management is perceived too much as an IT topic. There is a need to investigate how to put data and information management on the curriculum of academic institutions. This would result in better knowledge of the data centres, and an increased quantity and quality of data submissions.
Data managers should actively seek collaboration with scientists. If data managers have a background in science, it is possible to establish a relationship of trust with the scientists, a smoother collaboration, and greater input from the data managers into the design of data collection. Involving data managers in the planning of projects from a very early stage makes ‘end-to-end data management’ a reality.
The EU has the mandate and the funds to support and improve training for scientists in data management, and could play a role in this.
To a large extent, data centres depend on scientists to submit data. Given how often scientists are unaware of the role, or even the existence, of data centres, this is a potential problem. Several actions can be taken in this respect.
Data sets often result from projects, which usually have a limited time-span. Data management in the short term, within the time-span of the project, usually poses no problem: scientists need data management to produce the deliverables of the project; moreover, making provisions for data management is a prerequisite for having a proposal accepted in the first place. There is an obvious need for activities beyond the duration of the data-generating project, to ensure continued availability of the data. This has always been, and probably should remain, one of the tasks of the data centres.
Funds for long-term data management should not come from research budgets, but rather from operational networks or other mechanisms. Several initiatives of the EU are relevant in this respect. Within Framework 6, there is a possibility to fund the operations of large ‘Networks of Excellence’ that will operate on time spans much longer than a typical project. The Global Monitoring for Environment and Security (GMES) initiative is another potential mechanism.
A certain degree of duplication is unavoidable, and is a fundamental aspect of the scientific process. There has to be room for experimentation and for different attempts at solving the same problem. After some time, however, experimentation should stop, and the attempts should be consolidated into one or a few agreed strategies.
Undesirable duplication can be partly prevented during the project-proposal review process. One of the objectives of the Networks of Excellence, as proposed by the EU, is to increase communication between partners of the network, raising awareness of each other’s activities and hence decreasing the probability of duplication.
There has to be peer-review, as a way to measure and recognise progress, to recognise value and expertise, and as a foundation for standards and accepted procedures. Standards and audit procedures are needed to allow objective peer review. Developing these standards is a task for the data centres.
Peer review is a way to increase compliance with standards. Countries, or even institutions or scientists, could be tempted to work according to locally developed principles; obviously, these will fit local needs and are usually much faster to develop. Doing so, however, can lead to fragmentation and hamper data exchange.
The problems of biological and physical data management are different: physical data sets are often high in volume but low in complexity; biological data sets are low in volume but high in complexity. Taxonomy brings a fifth dimension, beyond position and time, to ocean data management.
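The contrast can be sketched in code. In the minimal example below (field names, species name, and units are invented for illustration, not taken from any data-centre schema), a physical observation is indexed by the four classical dimensions of position and time, while a biological observation adds a taxonomic identity as a further dimension:

```python
from dataclasses import dataclass

@dataclass
class PhysicalObservation:
    # The four classical dimensions: position in space and time.
    lon: float
    lat: float
    depth_m: float
    time_iso: str
    # A single, well-standardised measured value.
    temperature_c: float

@dataclass
class BiologicalObservation:
    lon: float
    lat: float
    depth_m: float
    time_iso: str
    # Taxonomy acts as a fifth dimension: the same point in
    # space-time yields one record per taxon observed.
    taxon: str          # e.g. a species name or a taxonomic code
    abundance: float    # individuals per unit volume (assumed unit)

# One point in space-time can carry many biological records:
obs = BiologicalObservation(2.9, 51.2, 10.0, "2002-11-25T12:00:00Z",
                            "Calanus finmarchicus", 120.0)
print(obs.taxon)
```

The extra dimension is open-ended (there are far more taxa than standard physical parameters), which is one reason biological data sets are low in volume yet high in complexity.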
The lower level of standardisation in biology makes proper documentation accompanying the data sets even more important.
Commonalities are more important than differences: both biological and physical data management share the need for long-term activities, for quality control and peer review, and for the creation of data products.
Participation of developing countries in global programmes is the best way to transfer expertise. Global programmes can operate at several levels, so that they can serve both global and local needs.
Internet access is a problem in many developing countries, and assisting with connectivity and basic telecommunications should be a priority in any capacity-building programme. Where internet access is available, the bandwidth is often very limited, making it virtually impossible to download large volumes of data. As long as this problem remains, data should also be distributed on alternative carriers such as CD-ROM or DVD. Data warehousing and brokering can assist in locating and selecting relevant data sets, and thus limit the volumes of data to be downloaded.
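The brokering idea can be illustrated with a minimal sketch: a user on a low-bandwidth link queries a catalogue of data-set metadata first, and downloads only what matches the region of interest and fits the available bandwidth budget. The catalogue entries, regions, and sizes below are invented for illustration:

```python
# Hypothetical catalogue of data-set metadata (titles, regions and
# sizes are invented, not real data-centre holdings).
CATALOGUE = [
    {"title": "North Sea CTD 1990-2000", "region": "North Sea", "size_mb": 40},
    {"title": "Global SST climatology", "region": "global", "size_mb": 9000},
    {"title": "Benguela zooplankton survey", "region": "Benguela", "size_mb": 12},
]

def broker(region, max_size_mb):
    """Return only the data sets that match the region of interest
    (or are global) and fit within the download budget."""
    return [d["title"] for d in CATALOGUE
            if d["region"] in (region, "global") and d["size_mb"] <= max_size_mb]

# Selecting before downloading keeps transfers within the budget:
print(broker("Benguela", max_size_mb=100))
```

Brokering of this kind moves the selection step to the metadata, so that only a small catalogue query crosses the slow link before any bulk transfer begins.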
Funds to purchase hardware and software, and the expertise to maintain the systems, are also more limited in developing countries. The data management community should provide platform-independent, open-source software that runs on hardware compatible with the technological expertise available. Reliable and stable standards should ensure that data are available in a form that these tools can handle. Capacity-building programmes should be organised around these tools and standards.