Research Data Management: What Actually Works


Every researcher generates data, but not all manage it effectively. Poor data management wastes time, compromises research quality, and can make published findings impossible to reproduce. Australian institutions are improving their data infrastructure, though challenges remain.

The Problem

Research data comes in countless forms. Lab measurements, survey responses, satellite imagery, genetic sequences, interview transcripts, and simulation outputs all require different management approaches.

Many researchers use ad-hoc systems developed during their training. Files scattered across personal computers, external drives, and cloud storage accounts. Inconsistent naming conventions. Minimal documentation. No backup strategy beyond hoping drives don’t fail.
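
Even a small helper can impose order here. A minimal sketch in Python (the project and experiment names are invented for illustration) that builds filenames following one consistent pattern, so files sort chronologically and are searchable by project:

    from datetime import date
    import re

    def make_filename(project, experiment, version, extension, run_date=None):
        """Build a name like 'reefsurvey_site03_2025-06-01_v02.csv' (example only)."""
        run_date = run_date or date.today()

        def clean(s):
            # Lowercase alphanumerics only, so names sort and search cleanly.
            return re.sub(r"[^a-z0-9]+", "", s.lower())

        return (f"{clean(project)}_{clean(experiment)}_"
                f"{run_date.isoformat()}_v{version:02d}.{extension}")

    print(make_filename("Reef Survey", "Site 03", 2, "csv"))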

This approach works until it doesn’t. Lost data can end research careers. Hunting for misplaced files wastes hours. And without documentation, even the researchers who collected the data struggle to understand it once enough time has passed.

The problem intensifies when research is collaborative. Multiple team members need access to shared data, requiring coordination that informal systems can’t support.

Storage Infrastructure

Universities have invested in research data storage systems, but capacity and performance vary. Some institutions offer substantial free storage to researchers; others charge for anything beyond a basic allocation.

Cloud storage has become standard for many researchers. Services like OneDrive, Google Drive, and Dropbox are convenient but weren’t designed for research data: they lack the version control, metadata management, and long-term preservation features that research requires.

Specialized research data platforms offer better functionality. These systems integrate with analysis tools, maintain data lineage, and support collaboration. However, they typically cost more and require more technical expertise to use effectively.

High-performance computing facilities provide storage for large-scale datasets, but access is often restricted to specific types of research. Not every project qualifies, leaving researchers to solve storage challenges independently.

Backup and Security

The 3-2-1 backup rule recommends three copies of data, on two different media types, with one copy off-site. Most researchers don’t achieve this standard.
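
Part of the rule can be automated locally. A minimal sketch in Python, assuming a hypothetical layout with the working copy on the internal disk, a second copy on an external drive, and a third in a folder that an institutional cloud client syncs off-site (all paths are placeholders):

    import shutil
    from pathlib import Path

    SOURCE = Path("~/projects/reef-survey/data").expanduser()       # working copy
    SECOND_MEDIUM = Path("/Volumes/backup-disk/reef-survey")        # external drive
    OFFSITE_STAGING = Path("~/CloudSync/reef-survey").expanduser()  # synced off-site

    def mirror(src, dst):
        """Replace any previous copy at dst with a fresh copy of src."""
        if dst.exists():
            shutil.rmtree(dst)
        shutil.copytree(src, dst)

    for target in (SECOND_MEDIUM, OFFSITE_STAGING):
        mirror(SOURCE, target)
        print(f"backed up {SOURCE} -> {target}")

Run on a schedule (cron or Task Scheduler), something like this turns a good intention into a routine.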

A 2025 survey found that 40% of Australian researchers had experienced data loss at some point in their careers. Causes ranged from hardware failure to accidental deletion to theft of equipment. In most cases, the data was partially or completely unrecoverable.

Security requirements add complexity. Research involving human participants typically requires encryption and access controls. Health data has particularly strict requirements under privacy legislation.
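
At its simplest, encryption at rest means the file on disk is ciphertext. A sketch using the widely used cryptography package (filenames are placeholders; key management, above all where the key lives, is the genuinely hard part and is out of scope here):

    from cryptography.fernet import Fernet  # pip install cryptography

    # Generate once; store the key separately from the data,
    # e.g. in an institutional secrets manager, never beside the file.
    key = Fernet.generate_key()
    fernet = Fernet(key)

    with open("interviews_raw.csv", "rb") as f:       # placeholder filename
        ciphertext = fernet.encrypt(f.read())

    with open("interviews_raw.csv.enc", "wb") as f:
        f.write(ciphertext)

    # Decryption reverses the process with the same key.
    original = fernet.decrypt(ciphertext)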

Some researchers find security requirements burdensome, treating them as compliance exercises rather than genuine protection. This attitude creates vulnerabilities that have occasionally been exploited.

Metadata and Documentation

Data without documentation has limited value. Future users, including the researchers themselves, need to understand what the data represents, how it was collected, and what processing has been applied.

Creating good metadata is time-consuming work that doesn’t directly advance research. It’s easy to defer documentation, intending to do it later. Later rarely comes, and when it does, important details have already been forgotten.

Metadata standards exist for many research domains. Fields like genomics and climate science have well-developed conventions. Other areas lack standardization, leaving researchers to invent their own systems.
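
In fields without a standard, a simple, consistent sidecar file still beats nothing. A sketch that writes a JSON metadata record alongside each dataset; the field names are illustrative, loosely echoing Dublin Core rather than implementing any formal standard:

    import json
    from datetime import datetime, timezone
    from pathlib import Path

    def write_sidecar(data_file, **fields):
        """Write <data_file>.meta.json describing the dataset."""
        record = {
            "file": data_file.name,
            "recorded": datetime.now(timezone.utc).isoformat(),
            **fields,
        }
        sidecar = data_file.with_suffix(data_file.suffix + ".meta.json")
        sidecar.write_text(json.dumps(record, indent=2))
        return sidecar

    write_sidecar(
        Path("site03_temps.csv"),    # placeholder dataset
        creator="J. Researcher",
        description="Hourly water temperature, site 3, logger SN-1182",
        methods="Logger at 2 m depth; raw readings, no QC applied",
        license="CC-BY-4.0",
    )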

Documentation quality often reflects career incentives. Research outputs that advance careers are publications and grants, not well-documented datasets. Until data documentation affects promotion and funding, it’s likely to remain underprioritized.

Data Sharing and Reuse

Funders increasingly require researchers to share data underlying published results. This supports reproducibility and allows others to build on existing work without duplicating data collection.

However, data sharing rates remain lower than these policies would suggest. Researchers cite various reasons: concerns about being scooped, uncertainty about intellectual property, lack of time to prepare data for sharing, and technical difficulties uploading to repositories.

Some concerns are legitimate. In competitive research environments, sharing data before fully exploiting it carries real risk. Balancing openness against strategic interests is genuinely difficult.

Data repositories have proliferated, creating confusion about where to deposit data. Discipline-specific repositories offer better functionality and discoverability than general-purpose platforms, but not all fields have suitable repositories.

Finding and reusing shared data presents its own challenges. Poor documentation limits usability. Incompatible formats require substantial processing before data can be analyzed. License ambiguity creates uncertainty about permitted uses.
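
Much of that processing is mundane but fiddly. A rescue-job sketch with pandas, under assumed quirks (semicolon delimiters, Latin-1 encoding, an ad-hoc missing-value code; inspect the real file before trusting any of these):

    import pandas as pd

    df = pd.read_csv(
        "deposited_data.csv",        # placeholder path
        sep=";",                     # semicolons, not commas
        encoding="latin-1",          # not UTF-8
        na_values=["-999", "N/A"],   # ad-hoc missing-value codes
    )

    # Parse dates explicitly rather than hoping the default guess is right.
    df["sample_date"] = pd.to_datetime(df["sample_date"], dayfirst=True)

    # Re-export as UTF-8 CSV and as Parquet (needs a Parquet engine
    # such as pyarrow), which preserves column types downstream.
    df.to_csv("deposited_data_clean.csv", index=False)
    df.to_parquet("deposited_data_clean.parquet", index=False)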

Data Management Plans

Many funders require data management plans as part of grant applications. These documents describe how data will be collected, stored, documented, and shared throughout the research project.

In practice, data management plans are often treated as compliance documents with limited connection to actual practices. Researchers write what reviewers want to read, then manage data however is convenient.

Some institutions are trying to make data management plans more useful by providing templates and support. When done well, the planning process helps researchers think through data workflows and identify potential problems early.

However, plans made at project start often don’t survive contact with research reality. Data volumes may exceed expectations, collection methods may change, and team composition may shift. Rigid adherence to initial plans can be counterproductive.

Tools and Workflows

Electronic lab notebooks are replacing paper records in many fields. These systems offer advantages like searchability, integration with instruments, and automatic timestamping. However, adoption is uneven, with many researchers preferring familiar paper methods.

Version control systems like Git, originally developed for software, are increasingly used for research data and analysis scripts. They track changes, support collaboration, and enable returning to previous versions. The learning curve is steep, but researchers who adopt these tools typically find them valuable.
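
The basic steps are usually typed at a command line (git init, git add, git commit); here is the same workflow sketched in Python using the GitPython package, with directory and file names as placeholders:

    from git import Repo  # pip install GitPython

    repo = Repo.init("analysis")   # placeholder project directory

    # Track scripts and small derived tables; large raw data generally
    # lives elsewhere, or under an extension such as Git LFS.
    repo.index.add(["clean_data.py", "fit_model.py", "results/summary.csv"])
    repo.index.commit("Add cleaning and model-fitting scripts")

    # History makes "which version produced these numbers?" answerable.
    for commit in repo.iter_commits():
        print(commit.hexsha[:8], commit.summary)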

Data pipelines that automate processing steps reduce errors and improve reproducibility. However, building robust pipelines requires programming skills that not all researchers possess. Some institutions offer research software engineer support, but capacity is limited.
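
A pipeline needn’t be elaborate to pay off. A minimal sketch in plain Python: each step is a named function, the sequence is explicit, and a checksum ties outputs back to the exact input file (all paths and steps are illustrative):

    import hashlib
    from pathlib import Path

    def sha256(path):
        """Fingerprint the input so results can be traced to an exact file."""
        return hashlib.sha256(path.read_bytes()).hexdigest()

    def load(path):
        return path.read_text().splitlines()

    def drop_blank(rows):
        return [r for r in rows if r.strip()]

    def run_pipeline(raw, out):
        print(f"input {raw} sha256={sha256(raw)[:12]}")
        rows = drop_blank(load(raw))
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text("\n".join(rows))
        print(f"wrote {len(rows)} rows to {out}")

    run_pipeline(Path("raw/site03.txt"), Path("derived/site03_clean.txt"))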

Workflow management platforms help coordinate complex multi-step analyses. These tools document the sequence of operations, making it easier to reproduce results and troubleshoot problems. Again, technical barriers limit adoption.
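
Dedicated tools such as Snakemake and Nextflow express this declaratively; reduced to plain Python, the core idea is an ordered list of named steps that is both executed and logged (the steps here are stand-ins):

    from datetime import datetime, timezone

    def acquire():  print("downloading raw files...")        # placeholder steps
    def clean():    print("removing blank and duplicate rows...")
    def analyse():  print("fitting model...")
    def report():   print("rendering figures...")

    # The list itself documents what runs, and in what order.
    WORKFLOW = [("acquire", acquire), ("clean", clean),
                ("analyse", analyse), ("report", report)]

    with open("workflow.log", "a") as log:
        for name, step in WORKFLOW:
            log.write(f"{datetime.now(timezone.utc).isoformat()} start {name}\n")
            step()
            log.write(f"{datetime.now(timezone.utc).isoformat()} done {name}\n")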

Personnel and Skills

Many data management challenges stem from inadequate training. Research methods courses teach statistics and experimental design but rarely cover practical data management.

Research data librarians and data stewards have emerged as specialist roles supporting researchers. These professionals understand both technical requirements and research workflows, bridging gaps that frustrate researchers and IT staff alike.

However, there aren’t enough data professionals to support all researchers who need help. Institutions must prioritize which projects receive intensive support, leaving others to manage independently.

Researchers themselves need better data literacy. This includes technical skills like scripting and database use, but also conceptual understanding of data lifecycle management. Developing this capability requires time that competes with research productivity.

Cost Considerations

Data management isn’t free. Storage costs money, as does staff time for documentation, curation, and support. Long-term data preservation is particularly expensive, requiring ongoing maintenance and migration as technologies change.

Many research budgets don’t adequately account for data management costs. When faced with trade-offs, researchers prioritize laboratory consumables, equipment, and personnel over data infrastructure.

Some funders allow data management costs in grant budgets, but researchers don’t always claim these funds. There’s a perception that proposing data costs makes applications less competitive.

The full economic cost of poor data management likely exceeds the investment required to do it properly. Lost productivity, duplicated work, and inability to reproduce findings add up. However, these costs are diffuse and indirect, making them easy to ignore.

What’s Working

Despite the challenges, progress is happening. Institutional repositories are improving, training programs are expanding, and researchers’ awareness of the importance of data management is growing.

Success stories often involve combinations of good infrastructure, clear policies, and dedicated support staff. When researchers have convenient tools and help using them, compliance improves and benefits become apparent.

Discipline-specific initiatives are particularly effective. Communities that develop shared standards and infrastructure make data management easier for members. Genomics, astronomy, and climate science demonstrate what’s possible when fields prioritize data infrastructure.

Research data management will remain an ongoing challenge as data volumes grow and complexity increases. The question is whether investment in infrastructure and skills will keep pace with expanding needs. For Australian research to remain competitive and reproducible, getting data management right isn’t optional.