Data Management in the Cloud

U.S. government agencies, along with private corporations, are increasingly adopting cloud computing for data management. For many years, organizations have struggled with the high costs and ongoing maintenance of traditional architectures. Companies such as Amazon, IBM, and Google, to name a few, are working closely with government agencies to help them accomplish their data management goals by offering secure cloud services. These services are subject to a rigorous FedRAMP authorization, which allows companies to handle government data in the cloud.

The Federal Risk and Authorization Management Program, or FedRAMP, is a government-wide program that provides a standardized approach to security assessment, authorization, and continuous monitoring for cloud products and services. In recent years, many products have gone through the FedRAMP process. Currently, approximately 90 products are FedRAMP authorized and 62 more are in the process of authorization.

The U.S. government has long prioritized the electronic use of data. The 2012 Presidential Memorandum, Managing Government Records, called for a digital transition in government. The memorandum also called for research on the use of automated technologies to “reduce the burden of records management responsibilities.” Demands for data management and data analysis are growing rapidly, and government agencies need an architecture that is cost efficient, energy efficient, scalable, fast, and secure.

Foundations for Data Management Architecture

Data integration helps government agencies gather data residing in different sources to provide business users with a unified view of their business domain. In recent years, government agencies have adopted an integrated architecture to improve data collection and process automation.
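To make the idea of a unified view concrete, the following is a minimal Python sketch of merging records from two sources on a shared key. The function name integrate, the field names, and the sample records are all hypothetical and chosen only for illustration; commercial integration platforms handle far more (schema mapping, conflict resolution, scheduling) than this sketch does.

```python
from collections import defaultdict

def integrate(sources, key="id"):
    """Merge records from several sources into one view keyed by `key`.

    Later sources fill in fields missing from earlier ones; they do not
    overwrite values that are already present.
    """
    unified = defaultdict(dict)
    for records in sources:
        for record in records:
            target = unified[record[key]]
            for field, value in record.items():
                if value is not None:
                    # Keep the first non-empty value seen for each field.
                    target.setdefault(field, value)
    return dict(unified)

# Hypothetical records held by two separate systems.
hr_system = [{"id": 1, "name": "A. Rivera", "office": None}]
facilities = [{"id": 1, "office": "Building 4"}, {"id": 2, "office": "Annex"}]

view = integrate([hr_system, facilities])
```

Here the record with id 1 ends up with a name from the first source and an office from the second, which is the kind of combined view a business user would query.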

Data quality management and metadata management are two other key processes that government agencies are incorporating into their organizations. Data quality management empowers an agency to take a holistic approach to managing data quality: it proactively monitors and cleanses data across the enterprise to maximize the return on investment in data. Metadata management processes collect metadata from a data integration environment and provide a visual map of the data flows within the environment. Together, these three processes (data integration, data quality management, and metadata management) provide a solid foundation for a data management architecture. Informatica and Talend, for example, are among the companies with many years of experience helping government agencies achieve business goals with data management products.
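As a rough illustration of what monitoring and cleansing can mean in practice, the Python sketch below profiles a small set of records for empty required fields and duplicate keys, then applies a simple cleansing pass. The functions profile and cleanse, the field names, and the sample data are assumptions made for this example and are not taken from any vendor's product.

```python
def profile(records, required_fields, key="id"):
    """Return simple data quality findings for a list of dict records."""
    missing = []      # (key value, field) pairs where a required field is empty
    duplicates = []   # key values that appear more than once
    seen = set()
    for record in records:
        for field in required_fields:
            if record.get(field) in (None, ""):
                missing.append((record.get(key), field))
        if record.get(key) in seen:
            duplicates.append(record.get(key))
        seen.add(record.get(key))
    return {"missing": missing, "duplicates": duplicates}

def cleanse(records, key="id"):
    """Trim whitespace in string fields and keep the first record per key."""
    cleaned = {}
    for record in records:
        tidy = {f: v.strip() if isinstance(v, str) else v
                for f, v in record.items()}
        cleaned.setdefault(tidy[key], tidy)
    return list(cleaned.values())

# Hypothetical input with stray whitespace, an empty field, and a duplicate key.
records = [
    {"id": 1, "agency": " GSA "},
    {"id": 2, "agency": ""},
    {"id": 1, "agency": "GSA"},
]
report = profile(records, required_fields=["agency"])
clean = cleanse(records)
```

The profiling step corresponds to monitoring (it only reports problems), while the cleansing step changes the data; keeping the two separate makes it easier to review findings before anything is modified.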

Cloud Data Management Principles

Cloud data management is a way to manage data across cloud platforms, either together with an on-premises storage infrastructure or without it. The cloud typically serves as a data storage tier for disaster recovery, backup, and long-term archiving.

With data management in the cloud, resources can be purchased as needed. Data can also be shared across private and public clouds, as well as with the on-premises storage infrastructure. While some platforms can manage and use data across cloud and on-premises environments, cloud data management takes into account that the data stored on-premises and the data stored in the cloud can be subject to completely different practices.

Indeed, data stored in the cloud has its own rules for data integrity and security. Traditional data management methods may not apply to the cloud, so having management in place designed for the particular requirements of the cloud is vital.

Typical cloud data management components are the following:

  1. Automation and orchestration, including services for application migration, provisioning and deploying virtual machine images and instances, and configuration management;
  2. Cost management, including services for cloud instance right-sizing and user chargeback and billing;
  3. Performance monitoring of the compute, storage, networking, and application infrastructure;
  4. Security, including services for identity and access management (IAM), encryption, and mobile/endpoint security; and
  5. Governance and compliance, including risk assessment/threat analysis, audits, and service and resource governance.
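The cost management component in the list above can be sketched in a few lines of Python. The example below suggests a smaller or larger instance size from average CPU utilization. The size names, the 20% and 80% thresholds, and the function recommend_size are hypothetical values picked for illustration; real right-sizing services typically weigh memory, network, and pricing data as well.

```python
from statistics import mean

# Hypothetical instance sizes, ordered smallest to largest.
SIZES = ["small", "medium", "large", "xlarge"]

def recommend_size(current, cpu_samples, low=20.0, high=80.0):
    """Suggest an instance size from CPU utilization samples (percent).

    Step down one size when average utilization is below `low`, step up
    one size when it is above `high`, otherwise keep the current size.
    """
    avg = mean(cpu_samples)
    index = SIZES.index(current)
    if avg < low and index > 0:
        return SIZES[index - 1]
    if avg > high and index < len(SIZES) - 1:
        return SIZES[index + 1]
    return current

# An underused "large" instance is a candidate for downsizing.
suggestion = recommend_size("large", [10, 12, 8])
```

Moving only one size at a time is a deliberately conservative choice: it limits the impact of a misleading measurement window on a production workload.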

The benefits of using cloud data management include cost savings and the consolidation of processes such as backup, disaster recovery, archiving, and analytics. Some cloud data management companies also offer ransomware protection by keeping data and applications native to the platform in a secure, immutable format.

Best Practices for Cloud Data Management

The journey to the cloud can take many forms and follow diverse paths. An increasing number of organizations are making strategic commitments to the cloud as a preferred computing platform. These commitments involve a wide range of use cases, from operations to analytics to compliance. For example, the survey for TDWI’s Emerging Technologies Best Practices Report revealed that many enterprises already have cloud-based solutions for data warehousing (35% of respondents), analytics (31%), data integration (24%), and Hadoop (19%).

Organizations may move their entire application portfolio or just a single application to the cloud. However, the best practices for data management don’t go away; in fact, they are more important than ever. As more organizations begin their journey to the cloud, they need to plan how they will apply the best practices of data management to ensure that cloud-based, data-driven use cases are successful for end users while also complying with enterprise governance and data standards. The good news is that existing best practices work well in cloud environments, although adjustments and upgrades to existing skills and tool portfolios are usually needed.

Here are some of the key best practices to consider when deploying cloud data management:

  • Identify cloud use cases carefully;
  • Do not treat the cloud separately; manage and govern data holistically, regardless of the data’s platform or location;
  • Put a substantial metadata management infrastructure in place before deploying to the cloud;
  • Choose data integration platforms with the cloud in mind, giving priority to data integration requirements for clouds;
  • Design data management for hybrid platforms rather than entirely for on-premises or cloud platforms; and
  • Make the necessary organizational changes (skills, processes, and people) before embracing the cloud.

Review of Cloud-Based Data Management Service Providers

Rubrik, which brands itself as “the Cloud Data Management Company,” is considered a major cloud data management player. The vendor’s Cloud Data Management scale-out platform uses a single interface to manage data across public and private clouds. In addition to offering ransomware recovery, Rubrik’s platform is the first one to support hybrid cloud environments.

Products that have traditionally excelled in on-premises implementations of data management solutions have also been adapting their services for the cloud. For example, Informatica offers robust cloud data management services, such as integrated contact verification, cloud test data management, customer, product, and supplier 360 cloud services, and cloud data quality radar. Tableau Online provides self-service analytics in the cloud: users can share and collaborate by interacting, editing, and authoring on the web, connect to and access data from a variety of cloud databases, and stay secure in the cloud. Talend, on the other hand, offers two types of products for cloud data management: a fully functional, open source data management solution offering data integration, data profiling, and master data management; and a subscription-based data management solution for organizations with enterprise-scale data management needs.

Other vendors in the cloud data management space include Oracle Cloud, which offers a comprehensive data management platform for traditional and modern solutions that combine a diverse set of workloads and data types; Commvault, which has a cloud data management platform that supports multiple clouds and on-premises data; and VMware, whose vRealize Suite delivers cloud data management for hybrid cloud environments. Komprise and Red Hat’s CloudForms platform also focus on cloud-based data management.