Platform Engineering is a buzzword in 2023: it ranks fifth in the Gartner Top Strategic Technology Trends 2023 (Figure 1). We hear about it from everyone, everywhere.
“Gartner expects that by 2026, 80% of software engineering organizations will establish platform teams as internal providers of reusable services, components, and tools for application delivery. Platform engineering will ultimately solve the central problem of cooperation between software developers and operators.”
Figure 1. Gartner Top Strategic Technology Trends 2023
In our last article, we described how we consult on, design, develop, and deploy software for our customers using greenfield and brownfield software development approaches (see software-engineering-at-nashtech for details). So how does this relate to Platform Engineering? Do we need to start from scratch, or can we inherit what NashTech Vietnam software engineering has built over many years and grow Platform Engineering from there?
To answer this question, we need to analyze how we have been building software for our customers over the years.
Let’s first look at how the rest of the world builds software. Back in 2018, Evan Bottcher from Thoughtworks wrote an excellent article about digital platforms, in which he defined a digital platform as follows.
Digital platform definition: “A digital platform is a foundation of self-service APIs, tools, services, knowledge and support which are arranged as a compelling internal product. Autonomous delivery teams can make use of the platform to deliver product features at a higher pace, with reduced co-ordination.” (Evan Bottcher, Thoughtworks)
This definition partly overlaps with the NashTech Vietnam software engineering process we apply for our customers, which tries to balance software engineering and team organization factors to deliver more value.
Figure 2 gives a very high-level view of how we design and build software from the ground up, starting from the infrastructure.
Figure 2. Infrastructure design and development view
Starting from a VM, we can build software on top of it and host the application there. To make the application more portable and better suited to cloud environments, we normally develop and package it and its artifacts with container technologies such as Docker, containerd, Podman, and runc. At this stage, we strictly follow the 12-factor App methodology for building software-as-a-service apps that:
- Use declarative formats for setup automation, to minimize time and cost for new developers joining the project.
- Have a clean contract with the underlying operating system, offering maximum portability between execution environments.
- Are suitable for deployment on modern cloud platforms, obviating the need for servers and systems administration.
- Minimize divergence between development and production, enabling continuous deployment for maximum agility.
- Can scale up without significant changes to tooling, architecture, or development practices.
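One concrete piece of the “clean contract with the underlying operating system” above is the 12-factor Config factor: settings live in environment variables, so the same container image runs unchanged in every environment. The sketch below is a minimal Python illustration under our own assumptions, not NashTech code; the variable names DATABASE_URL, PORT, and LOG_LEVEL and the load_config function are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AppConfig:
    """Runtime settings read from the environment (hypothetical names)."""
    database_url: str
    port: int
    log_level: str


def load_config(env: Optional[Mapping[str, str]] = None) -> AppConfig:
    # Read from the real process environment unless a mapping is injected (useful in tests).
    env = os.environ if env is None else env
    # Fail fast on a missing required setting instead of silently using a default.
    database_url = env.get("DATABASE_URL")
    if not database_url:
        raise RuntimeError("DATABASE_URL must be set")
    return AppConfig(
        database_url=database_url,
        port=int(env.get("PORT", "8080")),          # optional, with a default
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )


if __name__ == "__main__":
    # The same code picks up different values in dev, staging, and production.
    print(load_config({"DATABASE_URL": "postgres://localhost/app", "PORT": "9000"}))
```

Keeping configuration out of the image is what lets one build artifact move through every environment, which also supports the “minimize divergence between development and production” goal.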
Microservices and modular monolith architectural styles are natural choices because they allow autonomous deployment and help speed up time-to-market for customer products. Today, event-driven microservices or serverless architectures are the preferred choice for most customers who want to build modern applications (trusted environment framework).
The next step is the standalone application perspective shown in Figure 3.
Figure 3. Application design and development view
As Figure 3 shows, every project needs a foundational framework (on the right) that works with the underlying systems (hardware, POSIX, and other OS standards) and exposes better programming APIs. On top of that sit application frameworks (a web framework, an in-house framework, or an edge connector). Next comes an integration layer such as gRPC, REST, or Kafka. Finally, data services or data semantic layers aggregate, cleanse, and filter data, which is then served back to end users.
This is how we currently work at NashTech for almost all customers in every business segment, except for projects with special requirements such as GDPR (General Data Protection Regulation), PII (Personally Identifiable Information), threat modeling (DevSecOps, secure by design), and PHI (Protected Health Information).
Over time, as the need grew to automate more tasks on top of Kubernetes and its ecosystem, the CNCF community created many tools, services, libraries, and frameworks. Figure 4 shows the landscape at the time of writing.
Figure 4. CNCF landscape
With so many of these tools in the toolbox, developers are pushed into a full-stack role covering every aspect of the application: writing a secure Dockerfile for each service, writing Helm charts to package and configure deployments on the Kubernetes cluster, writing and testing application integrations, running end-to-end tests on Kind in nightly builds, codifying CI/CD pipelines, integrating DevSecOps tools into those pipelines, operating GitOps workflows with Argo CD, tracking DORA metrics, and handling SRE tasks, to name a few. This stresses developers and reduces their productivity. On the other hand, if they rely too heavily on the DevOps team, waiting times in a big team grow and the velocity of delivering software features to market drops.
Many other aspects are covered in the excellent book Team Topologies: Organizing Business and Technology Teams for Fast Flow by Matthew Skelton and Manuel Pais.
Figure 5. 4 types of team topologies
These problems create a heavy cognitive load for developers in a big project team, for example one with many Scrum teams distributed across onshore and offshore sites, a setup that is very common in NashTech Vietnam projects. Our mission is to help developers reduce that cognitive load and refocus on what they like most: writing business code, deploying it to the staging environment, and rolling it out to production as a stream-aligned team (following golden paths and reducing toil). When specific requirements arise, they can get support from a complicated-subsystem team, as well as short-term help from an enabling team.
Figure 6. 3 types of team interaction modes
In this way, we can increase our ability to ship product features to market as fast as possible.
Now, let’s see how Gartner defines the core components of Platform Engineering:
Figure 7. Core platform engineering components
Gartner proposes three core components (reusable components, developer tools, and a self-service developer portal) that help developers scale delivery by addressing three challenges: complex architecture, hybrid infrastructure, and a lack of skills. If these challenges are left unsolved, sooner or later the whole team wastes effort across the software development lifecycle. The next section looks at these three pillars in more detail.
NashTech’s Platform Engineering
Platform engineering definition: “Platform engineering is the discipline of designing and building toolchains and workflows that enable self-service capabilities for software engineering organizations in the cloud-native era. Platform engineers provide an integrated product most often referred to as an “Internal Developer Platform” covering the operational necessities of the entire lifecycle of an application.” (Luca Galante, Humanitec)
To solve this problem for developer teams, we need to form a platform team (at NashTech, this is a virtual team that brings together SWAT members, project technical architects, DevOps engineers, and others). Its purpose is to build a platform that reduces developers’ cognitive load. Applied correctly in our customers’ project teams, this approach can significantly increase their productivity. Over many years of developing software for customers, we have collected best practices in various software engineering areas, and we map and align them with our expertise as shown in Figure 8.
Figure 8. High-level architecture of NashTech Vietnam Platform Engineering
Now let’s dive into each essential component of this holistic view one by one.
Figure 9. The infrastructure of platform engineering
At the bottom sits the infrastructure, which today can be complex and take many forms. Customers might build their product on-premises to benefit from connectivity with their in-house products, or because of privacy concerns in their business. They might build entirely in the public cloud to shift from a CapEx to an OpEx model and leverage the power of public cloud services. Or they might choose a balanced model such as a hybrid cloud. The choice depends on how the customer wants to run their business, and NashTech can support all of these options. See our HashiTalks talk this year, Hybrid Microservices with HashiCorp Stack, on how we build hybrid clouds.
Next, let’s see how NashTech Vietnam software engineering fits into the picture, as shown in Figure 10.
Figure 10. NashTech Vietnam’s current software engineering
- Docker & Kubernetes best practices: we have delivered many projects using Docker and Kubernetes on the offshore side (Vietnam), many of them on hybrid cloud models. However, many customers today prefer managed Kubernetes services such as AKS, EKS, and GKE to reduce the complexity of managing infrastructure. We normally hold an upfront session with customers to analyze the pros and cons of each Kubernetes option so they can make a better choice, and we share best practices for building cloud-native applications (application versioning with container apps, drawing application boundaries to leverage the cloud-native model for faster delivery, and so on).
- Cloud Well-Architected practices: we train our developers at NashTech on the pillars of cloud Well-Architected practices, and customers benefit from that when working with us.
- Cloud adoption framework: in brownfield software development, when customers want an assessment, a lift-and-shift of applications to the cloud, or a full modernization of their applications to take advantage of the public cloud, we leverage the AWS, Azure, or GCP cloud adoption frameworks.
- 4C’s cloud-native security model: when building, deploying, and operating applications on Kubernetes, we normally use the 4C’s cloud-native security model to secure the cloud, cluster, container, and code layers. Many aspects need to be defined and implemented to avoid risks and vulnerabilities in a project; we will address this topic in a separate article soon.
- Zero Trust Architecture (ZTA) approach: with the “never trust, always verify” principle in mind, we apply techniques such as continuous verification of all actions and requests, at both the infrastructure and application levels, for all resources. We limit the “blast radius” to minimize the impact of any external or insider breach that does occur. We advise customers to set up tools that automatically collect contextual logs and metrics so they can respond promptly to any problem.
- OWASP Top 10: we have strictly followed the OWASP guidelines at NashTech since 2017 (the current version of the Top 10 is 2021).
- GDPR, PCI DSS, ISO 27001, ISO 9001: NashTech is certified by major international organizations for quality and information security.
- Monitoring & Observability: we invest considerable effort in researching, defining, and applying tools, and we encourage project teams to use them to gain visibility into the application and infrastructure stack through data rather than guesswork.
- Enterprise architecture integration practices: enterprise application architecture patterns (Martin Fowler), enterprise integration patterns (Gregor Hohpe & Bobby Woolf), and cloud design patterns are normally used as standards in NashTech development teams. To make this possible, we run training courses so that developers apply them when implementing enterprise business features.
- Architecture design and development practices: at NashTech, we are strong in the event-driven, microkernel (plug-in), microservices (MSA), multi-tier (N-tier), and space-based architectural styles. We also use and train on many architecture patterns, such as Domain-Driven Design (DDD) and Command Query Responsibility Segregation (CQRS), MSA patterns, and hexagonal or clean architecture. More will follow in future articles.
- Continuous testing practices: by using well-known patterns such as AAA (Arrange-Act-Assert) and GWT (Given-When-Then), we standardize how developers in the team write unit and integration tests. To make this more effective, we shift left by integrating the tests into CI/CD pipelines so they run automatically.
- DevOps culture and practices such as CI/CD and Infrastructure as Code (IaC): almost every project today wants and uses DevOps, and NashTech has followed this trend for many years. We have experience creating and operating CI/CD pipelines effectively for customers, and automating the provisioning of infrastructure and application components with Terraform and Crossplane.
- Continuous deployment with GitOps, Feature Flags: in some projects, we use Argo CD to improve privacy within project teams by separating the concerns of dev and ops members, which helps avoid leaking sensitive information. We have also worked on projects that applied feature flags so the application could be tested in production, which addressed inefficient use of cloud resources in those projects.
- Continuous security with DevSecOps model (Secure by Design with Threat Modeling methodology): customers normally perform the threat modeling, and we use its output to design our documents and to integrate DevSecOps tools and plugins into the CI/CD pipelines of their projects. This automates security checks, gives earlier feedback, and triggers the operational excellence process.
- Operational Excellence practices: some projects do not apply the DevSecOps model because of the many legacy components in their stack. For these, NashTech offers an assessment service (covering cost, performance, reliability problems, and more) and provides the appropriate methods, software, and tools to address them effectively.
- Chaos engineering: starting this year, NashTech has assigned people to investigate the chaos engineering ecosystem, and we will document it in another article.
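The AAA and GWT patterns named under continuous testing can be sketched as follows. This is a minimal Python example under our own assumptions: apply_discount is a hypothetical business rule, and the tests are plain functions prefixed with test_ using bare assert statements, the style pytest discovers. Because they need no extra setup, the same tests can run as a CI/CD pipeline step, in line with the shift-left idea above.

```python
def apply_discount(total: float, percent: float) -> float:
    """Hypothetical business rule, used only to illustrate the test layout."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(total * (1 - percent / 100), 2)


def test_apply_discount_arrange_act_assert() -> None:
    # Arrange: prepare the inputs and any collaborators.
    total, percent = 200.0, 15
    # Act: exercise the unit under test exactly once.
    result = apply_discount(total, percent)
    # Assert: check only the observable outcome.
    assert result == 170.0


def test_rejects_out_of_range_percent_given_when_then() -> None:
    # Given an order total and a discount above 100 percent
    total, percent = 100.0, 150
    # When the discount rule is applied
    try:
        apply_discount(total, percent)
        rejected = False
    except ValueError:
        rejected = True
    # Then the rule refuses the input
    assert rejected
```

The two tests share the same three-phase shape; GWT simply names the phases in business language, which tends to read better for behavior-level or integration tests.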
Now we move to the next layer, on top of NashTech software engineering, as shown in Figure 11.
Figure 11. Core platform engineering component at NashTech Vietnam
Some projects build a platform team, and such a team will typically provide a Self-service Portal (UI) pillar (the rightmost one). The portal lets developers in stream-aligned teams serve themselves: they can create a sandbox environment, provision on-demand components such as PostgreSQL databases or RabbitMQ instances, and learn about the system by reading the learning paths and guidance in the portal. Spotify’s Backstage is widely used by project teams around the world, and NashTech is no exception: we are all in on it right now.
For the Reusable Components pillar, we want to leverage the practices and components of NashTech software engineering as much as possible. If we can do that, future customers who want to apply Platform Engineering in their projects can benefit a lot. Fortunately, we have been building NashTech Accelerators for many years. We have six accelerators that help customers build software in both brownfield and greenfield development:
- Architectural libraries and templates: a set of “batteries-included” templates enabling the quick, efficient, developer-ready setup of environments based on microservice, API-first, cloud, and headless design patterns. The templates include a range of building blocks for Azure and AWS, drawing on many open-source projects. Our simple StepOne CLI makes all this easy for developers to use.
- Data solutions: A fully customizable “starter pack” designed to accelerate the implementation of data-driven solutions. Includes a template, methodology, and tools that can help organizations quickly develop, deploy, and scale data-driven solutions.
- AI/ML libraries and solutions: Our AI & ML templates help organizations quickly set up an end-to-end solution including many useful features. For example, the templates enable the exposure of AI models via REST APIs, deployment of AI models using Docker, and deployment of end-to-end Azure ML solutions.
- Open-source boilerplate: provides developers with preconfigured templates and tools for creating new applications in Java, Scala, and Kotlin. Developers can quickly start building applications by selecting a template and configuring it to meet their specifications.
- Learning accelerator: a one-stop solution providing curated templates in more than 22 technologies and languages. Our repository contains templates to accelerate your development requirements. Kick-start your development with these ready-to-deploy templates.
- Intelligent automation (RPA): NashTech intelligent automation combines our self-built RPA platform with our self-built AI accelerator, leveraging the power of RPA and the capabilities of AI to deliver fully automated solutions without human intervention. It helps speed up operational processes and bridges your business to the era of intelligent technology.
Beyond the accelerators, we also encourage reusing the knowledge of libraries and frameworks gained over more than 20 years of software development to help our customers.
For the Developer Tools pillar, we use popular tools, libraries, frameworks, and software. To name a few:
- VS Code, JetBrains Toolbox
- SCM (Source Control Management) such as GitHub, GitLab
- Docker, Docker Compose, Tye
- Terraform, Ansible
- Sonar, Gitleaks, Trivy
- Azure Pipelines, GitHub Actions
- Management tools: Jira, Azure DevOps
On top of this solid foundation, which abstracts away the complex infrastructure and processes, we consolidate and build more components and tools, exposing a simple CLI or API through which developers can consume the underlying services and components themselves, without depending on or coordinating with other teams.
This leads us to what NashTech Platform Engineering will look like, as shown in Figure 12.
Figure 12. Put it all together with the holistic view of NashTech Vietnam’s platform engineering
Keeping in mind the principles of building products for developers, data scientists, and end users, we need to create a backlog for the platform engineering team and evolve its items so the platform improves gradually. The goal is to build solid golden paths for developers, data scientists, and end users on the platform.
In this article, we walked through the platform engineering trend and why we need to adopt and prepare for it over the next couple of years. We then described what we already have inside NashTech and which missing pieces we need to fill in over the next couple of months to complete the picture, so that we can better serve customers who want to build platform engineering into their projects. Aspects we did not address include how to organize the current roles at NashTech to fit team topologies (virtual teams), platform engineering maturity levels, and building an internal developer portal with Backstage. The world is moving quickly, and we need to keep pace with it.