Is your “cold data” burning a hole in your pocket and the environment?
Q: Christof, the term "cold data" refers to data that is rarely accessed. What percentage of a company's data typically falls into this category?
A: There's no one-size-fits-all answer to how much data is typically "cold" in an organization, as it can vary considerably by industry and individual organization. However, here are some insights and factors to consider:
General estimates:
- Studies suggest that approximately 60–80% of corporate data can be classified as cold (rarely accessed).
- This percentage tends to increase over time as more data is generated but not actively used.
Factors that influence the percentage of cold data:
- Industry:
  - Industries with strict regulations (finance, healthcare, legal) often need to retain more data for compliance, which enlarges their cold data pool.
  - Companies exposed to rapid data growth (media, research) also tend to accumulate large amounts of inactive data.
- Data management practices:
  - Companies with strict data retention policies and active data lifecycle management have a lower proportion of truly cold data.
  - Companies without established practices tend to accumulate more unclassified and unused data over time.
- Business needs:
  - Certain data types (historical sales figures, previous project files) may no longer matter for day-to-day operations but remain valuable for later analysis or reference.
Q: Clearly, cold data makes up a large share of total data for many companies. Can you explain why analyzing cold data is important?
A: Companies should make the effort to analyze their cold data for several reasons:
- Cost savings: Identifying cold data lets you implement tiered storage, moving that data to cheaper long-term options such as tape libraries and reducing the cost of active storage.
- Security: Reducing the active data footprint shrinks the attack surface available to cyber attackers.
- Efficiency: Archiving cold data keeps core systems organized and improves the performance of frequently used information.
Estimation tips:
- Data classification tools: Software can analyze data access patterns and categorize data by how often it is accessed.
- Consult your internal experts: Interview IT staff, department heads, and data analysts to find out which data is actively used and which could be archived.
Remember that the focus should be on identifying truly inactive data, not just arbitrarily striving for a percentage.
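As an illustration of what a data classification tool does at its very simplest, the Python sketch below groups files by last-access time. The 365-day threshold and the function name `classify_by_last_access` are assumptions for illustration only; real tools use richer signals, and last-access timestamps can be unreliable on filesystems mounted with `noatime` or `relatime`.

```python
import os
import time
from typing import Dict, List, Optional

# Hypothetical cut-off: files not read for this many days are treated as "cold".
COLD_AFTER_DAYS = 365
SECONDS_PER_DAY = 86_400


def classify_by_last_access(
    root: str,
    cold_after_days: int = COLD_AFTER_DAYS,
    now: Optional[float] = None,
) -> Dict[str, List[str]]:
    """Split the files under `root` into 'hot' and 'cold' groups by last-access time."""
    now = time.time() if now is None else now
    cutoff = now - cold_after_days * SECONDS_PER_DAY
    groups: Dict[str, List[str]] = {"hot": [], "cold": []}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                last_access = os.stat(path).st_atime
            except OSError:
                continue  # skip files that vanished or cannot be read
            groups["cold" if last_access < cutoff else "hot"].append(path)
    return groups
```

Summing file sizes per group (for example with `os.path.getsize`) gives the share of capacity that is cold, which is the more relevant measure when comparing against percentage estimates like those above.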
Q: If we focus only on the cost impact, can you estimate the annual cost of storing 1 TB of data on a hard drive?
A: Unfortunately, there's no definitive answer to how much energy, and therefore how much cost, is needed to store 1 TB of data on hard drives in a data center for a year. Here's why, and how we can estimate it:
Factors influencing energy consumption:
- Hard drive type: Power consumption varies between consumer-grade hard drives, enterprise-grade hard drives, and the specific model.
- Idle vs. active power draw: Hard drives consume less power when idle (not actively reading or writing), so how often the data is accessed affects overall power consumption.
- Data center efficiency: Power Usage Effectiveness (PUE) measures the energy efficiency of a data center. A lower PUE means less overhead energy is spent on cooling and other infrastructure.
- Power source: The data center's power source (renewable vs. fossil fuels) has no effect on the hard drive's energy consumption, but it does affect the carbon footprint.
Building an energy estimate:
Let's use some assumptions to get an approximate figure for a single drive holding the data:
- Hard disk: A typical enterprise-class hard drive may consume 5-10 watts when active and 1-3 watts when idle.
- Usage pattern: Let's assume moderate activity, averaging about 5 watts.
- Power Usage Effectiveness (PUE): We use a PUE of 1.5 (a fairly efficient data center). This means that for every watt the disk consumes, another 0.5 watts are used for overhead.
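To see how an average of about 5 watts can arise from the active and idle ranges above, here is a minimal Python sketch of a time-weighted average. The 8 W active draw, 3 W idle draw, and 40% active time are illustrative assumptions, not measurements, and `average_power_w` is a hypothetical helper name.

```python
def average_power_w(active_w: float, idle_w: float, active_fraction: float) -> float:
    """Time-weighted average draw of a drive that is busy `active_fraction` of the time."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_fraction * active_w + (1.0 - active_fraction) * idle_w


# Illustrative assumptions: 8 W active, 3 W idle, busy 40% of the time.
print(round(average_power_w(8.0, 3.0, 0.4), 2))  # 5.0
```

Lowering `active_fraction` pulls the average toward the idle figure, which is why rarely accessed data still costs energy on always-on disks.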
Calculations:
- Daily consumption: 5 W × 24 hours/day = 120 Wh (watt-hours) per day.
- With PUE: 120 Wh × 1.5 = 180 Wh per day.
- Annual consumption: 180 Wh/day × 365 days/year = 65,700 Wh, or about 65.7 kWh (kilowatt-hours) per year.
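The daily and annual arithmetic above can be reproduced with a small Python function. This is a sketch assuming a constant average draw and a fixed PUE; `annual_energy_kwh` is a hypothetical name.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def annual_energy_kwh(avg_power_w: float, pue: float) -> float:
    """Annual facility energy (kWh) for a device drawing avg_power_w continuously."""
    daily_wh = avg_power_w * HOURS_PER_DAY  # device energy per day (Wh)
    daily_wh_with_pue = daily_wh * pue      # add data center overhead (cooling etc.)
    return daily_wh_with_pue * DAYS_PER_YEAR / 1000.0  # Wh -> kWh


hdd_kwh = annual_energy_kwh(avg_power_w=5.0, pue=1.5)
print(f"{hdd_kwh:.1f} kWh/year")  # 65.7 kWh/year
```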
Important Notes:
- This is a rough estimate. Actual consumption may be higher or lower depending on the factors mentioned above.
- Energy costs depend on your electricity provider’s tariffs.
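To turn the annual kWh figure into money, multiply it by your tariff. In this sketch the price of 0.30 per kWh is a placeholder assumption, not a quoted rate; substitute your provider's actual tariff and currency.

```python
def annual_energy_cost(kwh_per_year: float, price_per_kwh: float) -> float:
    """Annual electricity cost: energy used (kWh) times the tariff (price per kWh)."""
    return kwh_per_year * price_per_kwh


# Hypothetical tariff of 0.30 currency units per kWh; replace with your provider's rate.
PRICE_PER_KWH = 0.30
print(f"{annual_energy_cost(65.7, PRICE_PER_KWH):.2f} per year")  # 19.71 per year
```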
Useful resources:
- Backblaze regularly publishes data on hard drive power consumption: https://www.backblaze.com/b2/hard-drive-test-data.html
- PUE is a ratio that describes how efficiently a data center uses electricity. Learn more about PUE here: https://www.thegreengrid.org/
Q: What would be the cost to store the same 1 TB of data in a tape library?
A: Here's how storing 1 TB on tape compares to disk in a data center, with a focus on power consumption:
Advantages of tape libraries:
- Extremely low idle power: Tape libraries draw minimal power when they are not actively accessing data, and a cartridge sitting in its slot consumes virtually no energy.
- Reduced cooling requirements: Because of their low power consumption, tape libraries generate less heat, significantly reducing the energy load on a data center's cooling systems.
- Offline storage: Tape provides inherent offline protection, which mitigates some cybersecurity risks along with the energy costs of the constant online security measures they would otherwise require.
Rough estimate for tape:
It's difficult to give an exact energy figure for tapes, as it depends heavily on how frequently they are accessed. However, in a primarily archival scenario, let's assume the following:
- Monthly tape access: Tapes are read/written once a month for backups or verification.
- Active power consumption: The library and drive draw approximately 200 watts for about three hours during this activity (standby draw between sessions is ignored).
Calculations:
- Monthly consumption: 200 W × 3 hours = 600 Wh, or 0.6 kWh per month.
- Annual consumption: 0.6 kWh/month × 12 months = 7.2 kWh per year.
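The tape arithmetic follows the same pattern, except that energy is only counted during access sessions. This is a minimal sketch assuming, as the estimate above does, that standby draw between sessions is negligible. Note that the disk figure includes a PUE of 1.5 while the tape figure does not; the optional `pue` parameter (our addition, defaulting to 1.0) lets you apply the same overhead for a like-for-like comparison.

```python
def tape_annual_energy_kwh(
    active_power_w: float,
    hours_per_session: float,
    sessions_per_year: int,
    pue: float = 1.0,
) -> float:
    """Annual energy (kWh) for a library that only draws power during access sessions.

    Standby draw between sessions is ignored. `pue` defaults to 1.0
    (no facility overhead) to match the tape estimate in the text.
    """
    session_wh = active_power_w * hours_per_session
    return session_wh * sessions_per_year * pue / 1000.0  # Wh -> kWh


tape_kwh = tape_annual_energy_kwh(200.0, 3.0, 12)
print(f"{tape_kwh:.1f} kWh/year")  # 7.2 kWh/year
```

With `pue=1.5` the tape figure rises to 10.8 kWh/year, which is still far below the disk estimate.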
Comparison:
- HDD estimate: ~65.7 kWh/year
- Tape estimate: ~7.2 kWh/year
Key takeaways:
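A quick way to quantify the gap between the two rough estimates is to compute the absolute saving and the ratio. The inputs are simply the figures from this comparison, so the result is only as good as the assumptions behind them; `compare_energy` is a hypothetical helper.

```python
from typing import Tuple


def compare_energy(hdd_kwh: float, tape_kwh: float) -> Tuple[float, float]:
    """Return (annual saving in kWh, how many times more energy disk uses than tape)."""
    return hdd_kwh - tape_kwh, hdd_kwh / tape_kwh


saving, ratio = compare_energy(65.7, 7.2)
print(f"saving: {saving:.1f} kWh/year, disk uses ~{ratio:.1f}x the energy of tape")
# saving: 58.5 kWh/year, disk uses ~9.1x the energy of tape
```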
- Tape libraries have the potential to significantly reduce energy consumption compared to always-on disk storage, especially for infrequently accessed data.
- This energy advantage directly leads to cost savings and a smaller ecological footprint.
Important Notes:
- If tape data needs to be accessed frequently, energy savings are reduced.
- Tape libraries incur upfront costs, so the total cost of ownership (TCO) comparison with disk must be done over a longer period of time. However, through the Cristie READY program, customers can enjoy all the benefits of tape archiving with no upfront costs thanks to our true OPEX "pay-per-use" model.
Q: Thank you, Christof. Tape clearly offers tremendous energy savings and additional security benefits for cold data. On top of that, Cristie offers tape backup and archiving in our data centers, which are located inside wind turbines and powered directly at the source by renewable energy. Surely this is an incredible one-two punch for companies seeking secure backup and archiving while also massively reducing their carbon footprint?
A: Absolutely! The advantages of our wind farm data centers and the Cristie READY program are far-reaching. Let's explore them in more detail in our next Q&A session.