CTO insights: lower costs, maintain high-quality apps
Webinar transcript
We used ChatGPT to clean up the grammar and syntax of the transcript.
Transcript highlights
1. Infrastructure-as-Code Transformation
- Key Insight: Infrastructure-as-code (IaC) has revolutionized IT operations by enabling automation, consistency, and scalability. It replaces manual provisioning with machine-readable configuration files, integrating infrastructure into version-controlled workflows.
- Takeaway: Embracing the “cattle, not pets” mindset for infrastructure fosters modular, stateless systems, aligning with cloud-native architectures.
2. Microservices vs. Monoliths
- Key Insight: While microservices promised flexibility and scalability, many organizations have reverted to monolithic architectures or hybrid approaches due to challenges in implementation, maintainability, and team structure.
- Takeaway: Evaluate your organization’s specific needs before adopting a microservices architecture. Simplicity often trumps complexity when resources and expertise are limited.
3. Serverless Computing: Opportunities and Challenges
- Key Insight: Serverless computing offers scalability and abstraction from hardware management but has significant cost and performance trade-offs, especially during cold starts or database interactions.
- Takeaway: Use serverless for optimized, specific use cases rather than general backend needs to balance scalability and cost-effectiveness.
4. DevSecOps and Security Integration
- Key Insight: Security is no longer an afterthought. Integrating tools like Snyk and HashiCorp Vault into CI/CD pipelines ensures vulnerabilities are addressed throughout the software lifecycle.
- Takeaway: Shift-left security minimizes late-stage issues, reducing costs and improving software reliability.
5. FinOps: Managing Cloud Costs
- Key Insight: For many organizations, cloud expenses are a top operational cost, often approaching 40% of revenue. FinOps—dedicated cloud cost management—optimizes spending and ensures efficient resource use.
- Takeaway: Establish FinOps practices early to prevent runaway costs, and regularly audit cloud usage to align with business goals.
6. AI in Development
- Key Insight: Tools like GitHub Copilot accelerate development but require careful implementation to avoid outdated or unsuitable solutions. Over-reliance on AI by junior developers can hinder skill development.
- Takeaway: Use AI as a complement to human expertise, ensuring outputs align with best practices and project needs.
7. Edge Computing’s Emerging Role
- Key Insight: Edge computing enhances performance by running processes closer to users, but tooling and best practices are still evolving.
- Takeaway: Monitor edge computing advancements from leaders like Cloudflare and Fastly to integrate these capabilities into your workflow as they mature.
8. Sustainability and Carbon Optimization
- Key Insight: Moving workloads to carbon-friendly regions and integrating sustainability into cloud practices can significantly reduce environmental impact.
- Takeaway: Prioritize carbon reporting and optimization to comply with regulations and align with global sustainability goals.
Corey Dockendorf: Good morning, everyone, and thank you for joining us for today's webinar. My name is Corey Dockendorf, and I am a Solutions Architect here at Platform.sh. I am joined by my counterpart. Sir, would you like to introduce yourself?
Guillaume Moigneu: Sure. I'm Guillaume, a.k.a. G, and I have been here for 10 years. First, I would like to apologize: I am jetlagged, sick, and French. So we will do our best to give you a good hour.
Corey Dockendorf: Thank you so much for joining us today. I will help moderate this session as we dive into a critical topic for 2025: how to lower costs and maintain high quality while DevOps and infrastructure economics are changing. We will go back and forth between myself and G, discuss a number of topics, and hold a Q&A as well. Please feel free to put your questions in the chat as we go along. So let's start with this, Guillaume: were the 2024 predictions accurate, and where are these tech trends heading today?
Guillaume Moigneu: Every year everybody has their own predictions, and nobody is entirely correct, so we don't really know what will happen. Many things have changed in the past year, so let's start with the adoption of AI. It couldn't be a business webinar if we didn't slap AI or machine learning in somewhere. Everyone was promising us big changes across all the tooling, in all the ways we manage infrastructure and DevOps with AI. What we have actually seen in a year is not that much. Sure, things have changed with LLMs and all the wonderful things we can do with them now, but not really on the infrastructure side. That is a good question for the upcoming year, but as of now we have only seen a couple of features on some products, like GitLab for example, while all the others are trying to retrofit machine learning into this. What I really want to see in the future is how companies can build their own models to optimize things, for example autoscaling based on their own needs rather than generic ones. That is the first thing; we will get deeper into this later, but for now let's go through the 2024 predictions.
Corey Dockendorf: Let's talk about the rise of infrastructure-as-code. Infrastructure-as-code has been transformative for IT operations over the past decade, if not longer. We have seen everybody start to make this change, driven by the need for scalability, consistency, speed, and the management of complex cloud environments. Traditionally, infrastructure was provisioned and maintained by hand, which is not a best practice these days: it requires time and a lot of talent, and a lot of things can go wrong. It is basically time-consuming and error-prone. As organizations increasingly adopted cloud computing and DevOps methodologies, the demand for automation and repeatability in infrastructure management became critical. It became our day-to-day. Infrastructure-as-code solves these challenges by using machine-readable configuration files, whether YAML, JSON, HCL, or anything along those lines, to define and manage what that infrastructure is. So now my infrastructure is described as a piece of code, living in my workflow right alongside my application, with version control covering both. This rise of infrastructure-as-code is also tied closely to a broader shift, and I love this methodology: “cattle, not pets” in infrastructure management, where resources are treated as disposable and interchangeable. This mindset encourages infrastructure to be modular and stateless, making it a natural fit for cloud-native architecture, which we have seen a gigantic shift toward these days, and even for the move to containers and Kubernetes, which we will talk about in a little bit.
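The declarative, desired-state idea behind infrastructure-as-code can be sketched in a few lines: you describe what should exist, and tooling computes the actions needed to reconcile reality with that description. This is a minimal illustration only; the resource names and the two-level state model are invented for the example, and real tools like Terraform or Pulumi do far more.

```python
# Minimal sketch of the declarative idea behind infrastructure-as-code:
# describe the desired state as data, then compute the actions needed to
# reconcile the actual state with it. Resource names are hypothetical.

desired = {
    "web-server": {"type": "vm", "size": "medium"},
    "app-db":     {"type": "database", "size": "small"},
}

actual = {
    "web-server": {"type": "vm", "size": "small"},    # drifted: wrong size
    "old-cache":  {"type": "cache", "size": "small"},  # no longer declared
}

def plan(desired, actual):
    """Diff desired vs. actual state into create/update/delete actions."""
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return sorted(actions)

print(plan(desired, actual))
```

Because the plan is derived from versioned configuration rather than manual steps, the same diff-and-apply run is repeatable across environments, which is exactly the consistency argument made above.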
Guillaume Moigneu: I think we all agree this one has been done. Most companies are actually doing it one way or the other. On the other side, it is not really the case for microservices. I remember reading books about microservices eight years ago, with all those big concepts and best practices. What we've seen in our client base right now is kind of the opposite move. They've tried microservices, put some projects up, and then realized that the architecture wasn't well thought out, or that they don't have enough different teams to justify splitting those services.
What we've seen so far is that for the sake of performance and maintainability, a lot of our clients are moving back to more of a monolith approach or just a few specific microservices, but not a fully distributed architecture. For example, we had a client running 17 microservices for an extranet at a rental company. They decided to move back to just three applications, which made much more sense. They grouped everything into UI, invoicing, and sales, rather than having 17 services communicating with each other.
So, I would say the trend isn't happening in the way we might expect—quite the opposite.
Corey Dockendorf: I completely agree. With the expansion of containerization and Kubernetes, everything is now in containers—your code, your services, everything. However, this isn’t necessarily a new practice.
To Guillaume's point, we’ve seen teams attempt to implement Kubernetes or similar architectures and then revert to a monolithic design. Often, this happens because they lack the necessary development skills or the initial implementation was poor. The code might not have been structured correctly, so instead of scaling seamlessly, it created more problems.
The promise of Kubernetes—like autoscaling, eliminating concerns about application scaling or flexibility—often leads teams to adopt it prematurely. But many later realize the costs associated with poor implementation.
I suspect we may see a renewed interest in containerization and Kubernetes as teams learn from their mistakes and understand the true costs of transitioning from a monolithic structure. Hopefully, the self-healing capabilities and scaling features of Kubernetes will regain prominence because they enable faster innovation and resource optimization.
Guillaume Moigneu: Yes, containers are definitely here to stay, whether based on Kubernetes, Docker, or something else. They provide many advantages compared to virtual machines—especially when it comes to efficiency and scalability.
Another interesting trend is serverless computing. By serverless, I’m referring to services like AWS Lambda, where you have function-as-a-service capabilities rather than completely serverless operations.
With serverless, you don’t manage the servers actively. They exist and run processes, but you’re abstracted away from the underlying hardware. This model was a big hit two or three years ago because of its near-infinite scalability and lack of resource management requirements. However, serverless comes with some downsides.
The first issue is cost. When your application uses more resources or runs larger workloads, the costs can escalate rapidly. Bootstrapping functions, particularly during cold starts, consumes significant resources and time, making it expensive. For many applications, running a traditional daemon 24/7 is more cost-effective.
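The cost trade-off described here can be made concrete with a back-of-the-envelope break-even calculation. All the rates below are illustrative assumptions, not real provider pricing, and the workloads are invented for the example.

```python
# Back-of-the-envelope comparison of per-invocation (serverless) billing
# versus an always-on daemon. All prices are illustrative assumptions,
# not real provider rates.

PRICE_PER_MS = 2.1e-9       # $ per millisecond of function execution (assumed)
PRICE_PER_REQUEST = 2e-7    # $ per invocation (assumed)
DAEMON_MONTHLY = 30.0       # $ per month for a small always-on instance (assumed)

def serverless_monthly_cost(requests_per_month, avg_ms_per_request):
    compute = requests_per_month * avg_ms_per_request * PRICE_PER_MS
    invocations = requests_per_month * PRICE_PER_REQUEST
    return compute + invocations

light = serverless_monthly_cost(1_000_000, 50)     # low-traffic API
heavy = serverless_monthly_cost(100_000_000, 200)  # busy backend

print(f"light: ${light:.2f}/mo, heavy: ${heavy:.2f}/mo, daemon: ${DAEMON_MONTHLY:.2f}/mo")
```

Under these assumed rates, the light workload costs cents per month while the heavy one costs more than the always-on instance, which mirrors the point above: serverless shines for bursty, specific use cases and gets expensive as a general backend.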
The second issue is performance. When serverless functions interact with databases or perform other resource-intensive tasks, they can face locking mechanisms or other execution delays, adding latency. This can ultimately negate some of the expected benefits of serverless.
Lastly, while serverless frameworks and applications initially gained traction, their adoption has slowed. Clients now tend to reserve serverless for highly optimized, specific use cases. Most backends and monolithic architectures still run on traditional servers and processes.
So, serverless remains a mixed bag. It’s not disappearing, but its adoption has plateaued for now. Perhaps we’ll see advancements or new use cases in the coming years.
Corey Dockendorf: That aligns with what we’ve observed. There’s also been a growing emphasis on security integration. Cyber threats have become more sophisticated, and organizations are prioritizing security integration throughout the development and operational workflow. This shift highlights the adoption of DevSecOps, embedding security into every phase of the software development lifecycle.
Instead of treating security as an afterthought or a final step, companies are integrating it directly into their CI/CD pipelines and development processes. Tools like Snyk and OWASP dependency checks are used to scan code for vulnerabilities automatically. Infrastructure security tools, such as HashiCorp Vault, are also being leveraged to secure the overall development lifecycle.
With advanced bot traffic affecting sites regularly—sometimes accounting for nearly 50% of activity—security can no longer be an afterthought. Including it throughout the lifecycle ensures that by the time an application or site goes live, potential vulnerabilities have been addressed, reducing the risk of breaches or downtime.
Guillaume Moigneu: Something that's changed significantly in the past two years is the emergence of new attack vectors targeting dependencies and packages. We've seen this with ecosystems like NPM and PyPI. Nearly all programming languages have been affected.
For example, an attacker takes over a public package on GitHub, injects malicious code, and developers unknowingly pull in the compromised dependency. This can happen with something as common as bootstrapping a Next.js project, which involves hundreds of dependencies. If just one of those has a backdoor, your application is at risk.
This makes static analysis tools and dependency management tools an absolute necessity in your CI/CD pipelines. They’ve become a critical part of modern software development to detect and mitigate these risks.
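The core check a dependency-audit step performs in CI can be sketched simply: compare your pinned dependencies against a list of known-compromised releases and fail the build on any match. The advisory data and package names below are made up for illustration; real tools such as Snyk or pip-audit query curated vulnerability databases rather than a hard-coded dictionary.

```python
# Sketch of a CI dependency-audit gate. The advisories here are fictional;
# real scanners pull from maintained vulnerability databases.

ADVISORIES = {  # hypothetical compromised releases
    ("left-pad-like", "1.3.0"): "malicious postinstall script",
    ("http-client-x", "2.0.1"): "credential-stealing backdoor",
}

def audit(lockfile_entries):
    """Return (package, version, reason) for every flagged dependency."""
    return [
        (name, version, ADVISORIES[(name, version)])
        for name, version in lockfile_entries
        if (name, version) in ADVISORIES
    ]

lockfile = [("left-pad-like", "1.3.0"), ("react-ish", "18.2.0")]
flagged = audit(lockfile)
print(flagged)
```

In a pipeline, a non-empty result would fail the job, so a compromised transitive dependency is caught before it ever reaches a deployable artifact.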
Corey Dockendorf: Absolutely. And while automation is key here, human oversight remains essential. Automation tools can catch a lot, but they can’t replace human intuition and judgment. You still need to review what’s being flagged and ensure nothing slips through the cracks.
Guillaume Moigneu: Agreed. Another trend we’re seeing is edge computing, which can mean many things—running processes directly on IoT devices, in vehicles, or even within CDN points of presence.
At Platform.sh, our approach to edge computing focuses on content delivery and running workloads closer to the end user. This opens up new possibilities for performance optimization and data collection while adhering to privacy laws. For example, we’re exploring ways to collect analytics data in compliance with GDPR while bypassing traditional tracking methods that are increasingly being blocked.
The edge computing space is still evolving, and best practices are hard to pin down. However, major CDN providers like Cloudflare and Fastly are leading the charge, building new capabilities that make edge computing more versatile. Developers will need to adapt and develop new paradigms to harness these innovations effectively.
Corey Dockendorf: And edge computing isn’t limited to CDNs. We’ll likely see it ramp up in public transit systems, healthcare, and mobile data exchanges. The shift to embedding more computation locally on devices, such as AI inference on mobile phones, is going to change workflows for developers significantly.
Guillaume Moigneu: Definitely. However, tooling for edge computing still has limitations. For instance, deploying workers on Cloudflare doesn’t even offer versioning yet. So while the technology is promising, there’s room for improvement in terms of usability and risk management.
Corey Dockendorf: Continuous improvement ties directly into containerization and all these other advancements we’ve been discussing. This is something that’s ongoing and will remain a cornerstone of modern development practices. However, we’ve noticed that not all organizations are leveraging it effectively.
Continuous improvement has become critical for optimizing application performance. It’s no longer acceptable to push bad code. With the tools available today, developers can gain insights and identify inefficiencies early in the pipeline. This not only reduces costs but also enhances the user experience, which is crucial for staying competitive.
Guillaume Moigneu: Tools like Blackfire.io play a pivotal role here. Blackfire not only provides performance monitoring but also allows for profiling, helping teams identify bottlenecks in their code, inefficient database queries, and resource utilization issues. It calculates an impact score, prioritizing the most critical bottlenecks—those that affect the majority of users or consume the most resources.
This level of insight is invaluable. It integrates directly into CI/CD pipelines, ensuring issues are addressed in development and staging environments before they ever reach production. This saves time and money and allows developers to focus on building features rather than troubleshooting.
Corey Dockendorf: Exactly. Developers don’t want to waste cycles digging through logs to find that one bug causing issues. Tools like Blackfire streamline this process, alerting developers to specific problems so they can fix them quickly and get back to innovating.
Guillaume Moigneu: And it’s not just about the tools. DevOps teams often spend significant time on tasks that don’t add value—like updating PHP versions or upgrading servers. These repetitive, non-value-added tasks consume up to 40% of their time. This is a problem we’ve been addressing for the past 12 years at Platform.sh, automating as much as possible to free up teams for higher-impact work.
Corey Dockendorf: Monitoring and analytics go hand in hand with continuous improvement. You can’t improve what you don’t measure. As applications grow more complex and scale up, understanding what’s happening becomes even more critical.
There’s a growing ecosystem of tools—like Datadog, Sentry, and, as we’ve mentioned, Blackfire—that help monitor performance, infrastructure, and user behavior. These tools are essential for identifying pain points and opportunities for optimization.
Guillaume Moigneu: A great example of this is Blackfire’s impact score, which helps prioritize fixes by focusing on transactions that occur frequently and cause the most significant slowdowns. If a transaction runs a million times a month and is slow, addressing it has a much larger impact than optimizing something that only runs twice a month.
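The prioritization idea here, that a slow transaction matters in proportion to how often it runs, can be sketched as a simple score. This is a simplified stand-in for illustration, not Blackfire's actual formula, and the transaction names and timings are invented.

```python
# Illustrative "impact score" in the spirit described above: total time a
# transaction costs per month. A simplified stand-in, not Blackfire's formula.

transactions = [
    # (name, calls per month, avg time in ms) -- hypothetical figures
    ("checkout",        1_000_000,    850),
    ("monthly-report",          2, 45_000),
    ("product-page",   12_000_000,    120),
]

def impact(calls, avg_ms):
    return calls * avg_ms  # total milliseconds spent per month

ranked = sorted(transactions, key=lambda t: impact(t[1], t[2]), reverse=True)
for name, calls, avg_ms in ranked:
    print(name, impact(calls, avg_ms))
```

Note how the 45-second report that runs twice a month lands at the bottom of the ranking, while the merely-slow product page that runs twelve million times lands at the top, which is exactly the argument about where optimization effort pays off.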
But the challenge remains in consolidating monitoring tools. Developers often have their own, sysadmins use another, and DevOps teams have theirs as well. To truly understand what’s happening across the entire stack, there needs to be better integration and collaboration between these tools.
Moving on, let’s talk about the changes brought about by COVID-19. Even though it’s been several years, its impact on working practices is still evident. Many companies are trying to bring employees back to the office, but collaboration remains a significant challenge.
What’s your take on this, Corey?
Corey Dockendorf: Collaboration has become more complicated because we have so many options now. Every tool seems to do something unique, but the lack of standardization causes fragmentation. For example, switching between Zoom, Google Meet, Teams, or Azure calls can be frustrating. You have to adjust your camera, audio, and settings constantly. It seems minor, but these repeated interruptions add up.
The same challenges apply internally within companies. Different teams use different tools—Jira, Trello, or others—and this creates silos unintentionally. You end up with team members not on the same page because they’re working in different systems. It’s bloated and inefficient.
Guillaume Moigneu: Exactly. It would be great to see more unified remote work tools that simplify collaboration instead of complicating it. A lot of this fragmentation comes from teams adopting new tools without a clear strategy, leading to inefficiency.
Now, let’s shift gears and talk about trends that are critical both for us as a company and for our clients. One of the biggest topics is AI. While the impact of AI on infrastructure might not be clear yet, its influence on development is undeniable—for better or worse.
Corey Dockendorf: Tools like GitHub Copilot and Cursor are amazing for speeding up the development process. They help debug code, write faster, and even learn new things. However, these tools require a level of expertise to use effectively. For someone with 20 years of development experience, they’re incredibly useful because you know what to ask and how to evaluate the output. But for junior developers, there’s a risk of over-reliance and skill degradation.
Guillaume Moigneu: Another issue is the potential for outdated or unsuitable solutions being suggested by these tools. Since AI models are trained on publicly available data, they might offer code that doesn’t fit your specific use case or is no longer best practice. Developers must carefully review the results instead of blindly accepting them.
Corey Dockendorf: That’s a great point. AI is as powerful as we make it, but it requires thoughtful implementation. Intellectual property concerns, security risks, and compliance issues add another layer of complexity. Still, ignoring AI isn’t an option. Companies need to start experimenting with these tools, even if they’re not ready to use them in production workflows yet.
Guillaume Moigneu: Exactly. Platforms like Poolside.ai, which are focusing on AI models dedicated to coding, are advancing rapidly. GitHub Copilot is improving daily as well. Companies that start testing and adopting these tools now will have a significant advantage in the coming months.
Corey Dockendorf: Let’s talk about cloud application platforms next. Gartner recently released a quadrant focused on these platforms, highlighting their growing importance. Companies are moving away from managing their own hardware—whether that’s AWS Elasticsearch, Azure VMs, or something else. They want to focus on building applications, not managing infrastructure.
Guillaume Moigneu: Absolutely. Developers want to get started quickly without worrying about firewalls, scaling, or operating system updates. Cloud application platforms handle all of this, allowing developers to focus on coding. These platforms simplify deployment and runtime management, making it easier to build modern cloud applications.
Corey Dockendorf: And this shift isn’t just about speed. It’s also about reducing complexity and lowering costs. Instead of spending time managing VMs or tuning Kubernetes clusters, developers can focus on creating value. That agility makes cloud application platforms an essential part of the development ecosystem.
Guillaume Moigneu: For many organizations, especially non-tech companies, trying to build and maintain their own internal developer platforms (IDPs) has been a challenge. The learning curve is steep, and retaining the right talent is difficult. Even large enterprises in industries like healthcare or manufacturing often struggle with these initiatives.
As I was saying, companies that aren't primarily tech-focused often underestimate the time and budget required to build IDPs. We've seen this with some of the largest enterprises in industries like healthcare and food services. Even multi-billion-dollar companies have failed to make these projects work because they lacked the right strategy, people, and commitment.
For most use cases—especially web applications—cloud application platforms are a much better option. They allow you to get up and running quickly without the overhead of building and maintaining your own infrastructure. That’s not to say there aren’t specific use cases, like IoT or highly customized projects, where building your own platform might make sense. But for most businesses, leveraging existing solutions is the smarter choice.
Corey Dockendorf: Let’s dive into finances—everyone’s favorite topic, right?
Guillaume Moigneu: (laughing) Not mine, but it’s important. When I interviewed startups, one of the key questions I asked was how much of their revenue goes toward cloud costs. The average was about 40%. That’s a huge chunk of revenue being poured into AWS, Azure, or GCP without much oversight.
Startups are often so focused on building their products and gaining users that they don’t prioritize cost management. Two years later, they realize their cloud expenses are out of control and need to dig into it. This is where FinOps becomes critical. Managing cloud costs isn’t just about reducing expenses—it’s about understanding and optimizing them. It often requires a dedicated team to analyze usage, negotiate contracts, and ensure efficiency.
Corey Dockendorf: Post-COVID, the focus on profitability has intensified. Investors are less interested in user growth and more focused on sustainable business models. For many companies, cloud costs are the second-largest expense after payroll. If these costs aren’t managed, they can cripple a business.
Guillaume Moigneu: Absolutely. At Platform.sh, we’ve spent years building our FinOps capabilities. It’s a complex task that requires the right data, people, and processes. For example, we have a six-person team dedicated to cloud cost optimization. They ensure that we’re not wasting resources and that we’re leveraging the most cost-effective options available.
Corey Dockendorf: Another angle to this is the total cost of ownership (TCO). It’s not just about infrastructure costs but also the time spent managing it. Automating tasks—like scaling resources or reducing underutilized instances—can save hundreds of hours, translating into significant cost savings.
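The TCO argument can be made concrete with simple arithmetic: the real monthly cost is the infrastructure bill plus the engineering time spent operating it. All the figures below are assumptions chosen for the example, not benchmarks.

```python
# Rough TCO illustration: infrastructure spend plus the engineering time it
# takes to operate it. All figures are assumed for the sake of the example.

infra_monthly = 8_000       # $ cloud bill per month (assumed)
ops_hours_monthly = 120     # hours on upgrades, scaling, patching (assumed)
loaded_hourly_rate = 90     # $ fully loaded engineer cost per hour (assumed)
automated_share = 0.6       # fraction of ops work automated away (assumed)

manual_tco = infra_monthly + ops_hours_monthly * loaded_hourly_rate
automated_tco = infra_monthly + ops_hours_monthly * (1 - automated_share) * loaded_hourly_rate

print(f"manual: ${manual_tco}/mo, automated: ${automated_tco:.0f}/mo")
```

Even with an unchanged cloud bill, automating a majority of the operational toil cuts the total cost substantially in this sketch, which is the "hundreds of hours" point in dollar terms.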
Guillaume Moigneu: Exactly. I remember a time when we accidentally ran a test cluster with 1,200 CPUs for a month because we didn’t have the processes in place to catch it. These kinds of mistakes are costly and highlight the need for better monitoring and management.
Corey Dockendorf: That ties into another critical concept: sustainability and carbon optimization. This has become a major focus for companies, especially in Europe, where regulations like the Corporate Sustainability Reporting Directive (CSRD) require detailed reporting on carbon emissions.
Guillaume Moigneu: At Platform.sh, we’ve been working on reducing our carbon footprint since signing the Paris Climate Accord in 2017. For example, moving workloads to carbon-friendly regions like Quebec or Sweden can significantly reduce emissions compared to regions with coal-heavy power grids. The network impact is minimal, but the environmental benefits are huge.
For European companies, there’s now a financial incentive as well. They’re starting to face fines for high emissions and are required to report their carbon consumption. To support this, we provide our clients with detailed carbon reports to help them comply with these regulations.
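The reasoning about carbon-friendly regions reduces to one formula: emissions are roughly the energy a workload uses times the carbon intensity of the local grid. The intensity figures and workload size below are illustrative assumptions, not official grid data.

```python
# Rough estimate of moving a workload to a low-carbon region:
# emissions ~ energy used x grid carbon intensity.
# Intensity figures are assumed for illustration, not official data.

monthly_kwh = 2_000  # energy the workload uses per month (assumed)

grid_intensity_g_per_kwh = {  # grams CO2e per kWh (assumed)
    "coal-heavy-region": 700,
    "quebec-hydro":       30,
    "sweden":             40,
}

def monthly_emissions_kg(region):
    return monthly_kwh * grid_intensity_g_per_kwh[region] / 1000

before = monthly_emissions_kg("coal-heavy-region")
after = monthly_emissions_kg("quebec-hydro")
print(f"{before:.0f} kg -> {after:.0f} kg CO2e per month")
```

Under these assumptions the same workload emits more than twenty times less in the hydro-powered region, which is why region placement dominates most other optimizations, and why the small added network latency is usually a worthwhile trade.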
Corey Dockendorf: That brings us to the concept of “shift-left security.” Traditionally, testing—especially security testing—occurs late in the development lifecycle, after the software is built. The problem with this approach is that it can slow down or halt development when issues are discovered.
Guillaume Moigneu: Shift-left security integrates testing earlier in the process, during the planning and creation phases. By anticipating security needs upfront, teams can reduce issues later in development. It’s like tasting the batter before baking the cake—you catch mistakes early before they become costly to fix.
Corey Dockendorf: Exactly. Waiting until the end to test can increase the cost of fixing bugs exponentially—sometimes up to 30 or 40 times more than addressing them early. By incorporating security and quality assurance throughout the development process, teams can resolve issues faster and avoid costly delays.
That’s it for the main discussion. I’m Corey Dockendorf, Solutions Architect at Platform.sh. If you have any questions, comments, or concerns, feel free to ask. We’re also happy to provide custom solutions tailored to your needs. You can book a complimentary one-on-one session with me or another Solutions Architect through the form we’ve posted in the chat. We’d love to hear about what you’re building and see how we can help optimize your workflows for 2025 and beyond.