THOUGHT LEADERSHIP ARTICLE
Welcome to the era of the modern data platform
To be truly data-driven, an enterprise must change the way it stores and consumes data. Combining performance and scalability, the cloud-based modern data platform is set to replace legacy on-premises solutions. Yves Cointrelle, Director of Data & Analytics at VISEO, shares some insights on the topic.
The health crisis has definitely validated the move to the cloud. At a time that calls for rapid change, companies no longer want to be constrained by hardware. Waiting weeks or even months to negotiate the purchase of servers, then sizing and configuring them, is simply no longer acceptable.
The cloud allows us to deploy services instantly without worrying about performance issues. Computing power is potentially infinite with the only limit being the budget line. These "move to cloud" strategies concern infrastructure components, but also ERP and business applications when they are not natively cloud-based.
Databases, data lakes, analytical solutions and the associated AI are also moving to the cloud. The result is known as a modern data platform, a term that aims to reconcile the concepts of data lake, data warehouse and analytics in a complete analytical environment that allows you to extract all the value from internal or external data.
The natural extension of the data warehouse and Big Data initiatives
The modern data platform is a natural extension of previous data warehouses and Big Data initiatives. Traditionally, these solutions gather a large volume of data, both raw and refined, which is put to use through analysis, query or AI tools.
For companies that have to deal with large volumes of data, one long-standing option has been dedicated software and hardware solutions (appliances) that guarantee high performance. These solutions suit a broad range of use cases, but their cost has restricted them to sectors such as telecoms, air transport or distribution. The big data approach, by contrast, consists of placing structured and unstructured data in a Hadoop cluster without knowing in advance whether the company will get value from it by finding the appropriate use cases. Designed to manage large volumes of data and suitable for running AI algorithms, these infrastructures have gradually been replaced by solutions such as data lakes or cloud storage accounts. Only customers who are reluctant to use the cloud, by choice or obligation, or who are concerned about the portability of their environment continue to invest in these Hadoop distributions, which are losing ground.
Most of these environments are destined to migrate to the cloud and become modern data platforms. This concept offers the best of all worlds, combining the strengths of existing platforms while addressing their weaknesses. A cloud data platform can process any type of data - management data, images, sounds, videos, data from IoT sensors, etc. - for any user, with a high level of performance and ease of implementation.
This new generation of platforms covers the entire data lifecycle, from acquisition through organization, transformation and storage to valuation. Their serverless approach means that they do not reproduce the traditional hardware architecture, but rather instantiate cloud services to create and query a database, perform analytics (reporting, data visualization) or use AI building blocks.
With cloud pricing, a company pays only for the time it actually uses a natural language processing, image recognition or high performance computing service, without having to invest in dedicated infrastructure for the whole year.
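The pay-per-use reasoning above comes down to simple arithmetic. The sketch below is only an illustration: the hourly rate, the usage pattern and the yearly cost of dedicated infrastructure are hypothetical figures, not actual prices from any provider.

```python
# Minimal sketch of pay-per-use versus dedicated infrastructure.
# Every figure used here is a hypothetical assumption, not a real price.

def pay_per_use_cost(hours_used: float, hourly_rate: float) -> float:
    """Cost when the service is billed only for the hours it runs."""
    return hours_used * hourly_rate


def break_even_hours(dedicated_yearly_cost: float, hourly_rate: float) -> float:
    """Yearly usage above which dedicated infrastructure becomes cheaper."""
    return dedicated_yearly_cost / hourly_rate


# Hypothetical scenario: an image recognition job running 3 hours a week,
# billed at 4.00 per hour, compared with infrastructure costing 20,000 a year.
hours_per_year = 3 * 52
yearly_on_demand = pay_per_use_cost(hours_per_year, 4.0)
threshold = break_even_hours(20_000.0, 4.0)

print(f"Pay-per-use cost for the year: {yearly_on_demand:.2f}")
print(f"Dedicated infrastructure pays off above {threshold:.0f} hours a year")
```

Under these assumed figures, occasional workloads stay far below the break-even point, which is the situation the pay-per-use model is designed for.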
Hyperscalers take the lead
Bringing together all the pieces of the puzzle, American hyperscalers offer this approach with BigQuery for Google Cloud, Redshift for AWS and Azure Synapse Analytics for Microsoft Azure. However, these proprietary environments pose a risk of lock-in by making it difficult for users to move from one cloud to another.
To ensure this necessary portability, vendors such as Snowflake offer agnostic managed database services that work regardless of the platform chosen. Teradata and Oracle's cloud offerings are also competing with hyperscalers' offerings.
Another possibility is to run traditional databases (MySQL, PostgreSQL) in IaaS mode. They run on virtual machines in the cloud, which gives them a certain portability, but they do not reach the performance and scalability levels of native cloud databases.
The modern data platform is not limited to the database. It also includes the tools for ingestion, transformation and upstream feeding, as well as for cataloguing, data quality management, data preparation and data visualization. For the ETL/ELT part, hyperscalers have their own dedicated offerings, with Azure Data Factory from Microsoft, Dataflow from Google Cloud or Glue from AWS. Some organizations also use the Spark and Databricks duo as a data integration platform.
There are also the historical players, Informatica and Talend, and cloud-born pure players like Matillion.
The cloud reshuffles the deck within the IT department
These structural changes in architecture are not without impact within the IT department. The skills of system administrators and database administrators are changing dramatically, and new roles are emerging. The real-time and data-centric dimension of a modern data platform means, for example, that it must be fed with data as it happens rather than in batches.
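The difference between feeding a platform in batches and feeding it as data happens can be sketched in a few lines. This is a simplified illustration in plain Python: the event shape and the function names are hypothetical, and a real platform would rely on a managed ingestion service rather than in-memory lists.

```python
# Minimal sketch contrasting batch loading with event-by-event loading.
# The event shape and function names are hypothetical, for illustration only.
from typing import Iterable, Iterator

Event = dict  # assumed shape: {"sensor": str, "value": float}


def load_in_batches(events: Iterable[Event], batch_size: int) -> Iterator[list]:
    """Batch style: events are held back until a full batch is ready."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, partial batch
        yield batch


def load_as_it_happens(events: Iterable[Event]) -> Iterator[Event]:
    """Streaming style: each event is handed on as soon as it arrives."""
    for event in events:
        yield event


events = [{"sensor": "s1", "value": v} for v in (1.0, 2.0, 3.0)]

for batch in load_in_batches(events, batch_size=2):
    print("batch of", len(batch), "events")

for event in load_as_it_happens(events):
    print("event received:", event["value"])
```

In the batch version, the first event only becomes available once the second one has arrived. In the streaming version, each event is available immediately, which is what a real-time platform requires.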
Consolidating or even eliminating machine rooms or datacenters can also lead to the elimination or outsourcing of supervision-related positions. The move to the modern data platform must therefore be accompanied by a change management strategy with major upskilling and reskilling actions. At the same time, the cloud is a lever for attracting and retaining talent, as young IT professionals naturally want to work in "up to date" environments.
Finally, a company must take into account the total cost of ownership (TCO) of this move to the cloud. Although the cloud is not necessarily cheaper in every use case, managing and extracting value from data generally offers financially attractive scenarios. Furthermore, companies no longer have to worry about predicting costs by trying to anticipate their data volume, and the associated computing power, two or three years ahead, which is another advantage of the cloud.