My predictions for 2021 – Data and analytics

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers].

The year 2020 had a tremendous impact on our lives and drove many changes. Since it was a year of radical changes (which we may or may not have been prepared for, but had to accept), these will certainly influence what 2021 brings us.

I have made a short, curated list of predictions for where data and analytics might head in 2021. For better clarity, I have grouped them into the relevant areas:
– Data Engineering
– Data Analytics
– Machine Learning
– Cloud Technology
– Languages and Roles
– Data Governance

Data Engineering will continue to grow and will see an additional boom in 2021. Data consolidation will keep expanding this role, and the success of any ML project will depend on it even more heavily. A new wave of ETL tools will emerge, making data transition, transformation and availability easier, faster and more reliable. Depending on the infrastructure, these might become even bigger players in data pipelining, data tool chains and ETL: dbt, Panoply, Airflow, Matillion, Dataform and Alteryx. All are vendor-agnostic; some are also great for connecting different tools, platforms and operating systems, and some are also great tools for data analytics. Exclusivity will be bought by developing fast drivers, APIs and connections between different data silos.

Following the expansion of data engineering teams, tasks and operations, people will become more mindful of data strategy, a term that will be used more and more. It broadly describes a strong data management vision: prioritising and aligning data and data analytics activities with key organisational priorities, with goals such as concepts and standards, collaboration, reuse, improved accuracy, access and sharing in mind. This will be driven, especially in Europe, across many organisations by data growth and by the alignment of data teams with organisational goals.

Data Analytics was reshaped to some extent in 2020 by the changing workplace, customer experience and faster digitalisation of daily life. Graph analytics will gain further traction due to the pandemic, cybersecurity and the need for tracking activities. Real-time dashboards and data visualisation will play a greater role in feeding consumers correct and unbiased information, and storytelling will gain further popularity, due to changes in the daily life of every individual. All of these will contribute to understanding the basics of what is going on, making basic business decisions and understanding the underlying reasons why changes have happened. Many aspects of data analytics will play a key role in understanding the dramatic changes and impact of the pandemic and related events. Therefore we can also expect more logs to be generated and kept for longer periods of time, opening up many new opportunities.

Machine Learning (AI) will continue to rise in mid-size to large organisations, and will continue to decline in small organisations. Data scientists will continue to hunger for meaningful training datasets. They will feed their ML algorithms to understand predictions and changes over time, and deliver the results to cloud-based services or SaaS applications. More compute power will also create more pressure on data scientists to capture and ingest every single change. Encapsulated environments will further drive expansion in data science. Platforms such as Databricks will grow in popularity and usability and will help the DataOps ecosystem in large enterprises, making data more actionable for data science.

CI/CD and MLOps will continue to bloom and should gain even more traction in 2021. 2020 was the explosion year, with many startups offering many tools to data scientists; there might be some consolidation, and only a few frontrunner vendors will remain. More focus will be put on developing solutions that require more and more effort due to rapid data changes, pushing the build/deploy cycle of prediction models to a higher frequency. This will also make testing more difficult and version control more complex.

Natural language processing will see even further growth in 2021, mostly due to the digitalisation of many daily processes and the storing of many conversations. The health industry (like other industries) will also gain hugely from NLP.

Machine learning will get further commoditised, as many cloud services and cloud platforms are offering ML out of the box. On the other hand, the need for white-box algorithms (in contrast to black-box ML algorithms) will be met on many platforms, with features ranging from interpretability and explainability to fairness and more.

Cloud technologies will have several players that will advocate new standards. Snowflake will become a top-3 player in the field of data warehousing, bringing new data warehouse concepts to the cloud. Decoupling compute from storage, making it available cross-platform and cross-language, and ingesting any type of data anywhere will bring the cloud closer and into everyday use for big organisations. The cloud will be used even more in 2021 due to changes in the workplace and in how we work, so additional services that make work easier, improve collaboration and support the exchange of work will attract a lot of funding from investors, and many smaller start-ups will flourish.

Live recordings of work in bigger companies will drive appetite in this direction with the help of cloud storage and services. Fog computing (in relation to edge computing) will be the buzzword of the year among companies that deal with IoT and organisations adopting IoT.

Everything as Code will revolutionise the “as Code” concept in 2021, making it a bigger part of DevOps teams.

Languages and Roles will also change in 2021. New data roles such as Cloud Data Prep, Analytics Engineer, Data Trustee and Data-Lake Engineer, and mash-up roles such as DataOps Engineer, will appear more and more in large organisations. Data teams will start aligning their methodologies with core software development for better data understanding and better data services to other data-orientated teams.

DataOps practices will become part of data teams and data engineering and, in 2022 or later, of almost every team, because fast-growing business needs will shape new business use cases and cloud technologies will push data literacy further. In 2021, knowledge of Python, R, Scala, Julia, PowerShell, Spark or machine learning will no longer be an advantage, but rather a prerequisite for any data-orientated position.

Many of the roles that emerged in 2019 and 2020 will further stabilise and will see continued growth.

R and Python, alongside Scala and Julia, will remain and hold an even stronger position in data science. But the need for a general understanding of SQL, JavaScript, Bash/PowerShell, Java and C++ will become even bigger.

Spark will be the key technology for 2021 when we talk about data science and infrastructure, alongside Presto and others. Investing in Spark in 2021 will pay off.

Data Governance will become a much bigger focus in 2021 than it has been in the past 10 years. With the surge of data teams, DataOps and data officers, catalogues, definitions and business rules will be cornerstones of data trust. Having trustworthy data will speed up many of the later data ingestion, preparation and analysis processes, making data much more agile and operationalised for business needs. Governance will be a key link between smart data cleaning and better ETL, data chaining and data processing operations, supporting a stronger data management vision and building strong business cases on top.

Feel free to comment, post your views, agree, disagree, and debate. I know we are bad at making such predictions, but it is always nice to share a vision and have a counter-argument as an incentive for further thinking.

As always, stay healthy and happy coding!
