SurfaceNet: An Observational Data Platform Improved by Cloud

Met Office are responsible for collecting and processing the observation data, gathered at weather stations around the coast and across the world, that is used to analyse the country’s weather and climate. The observations are valuable to several different consumers, from meteorologists forecasting the weather to climate scientists predicting the global trends resulting from global warming.

Met Office have been looking to build a replacement observations platform that is more efficient and better suited to their current needs. CACI have been working with Met Office for the last two years to deliver the next-generation system, ‘SurfaceNet’, with cost-effectiveness and scalability as the primary requirements.

Given Met Office’s ethos of adopting a ‘Cloud First’ approach, and their partnership with AWS, it was an obvious choice to build the system in the AWS Cloud. The first important decision was selecting our main compute resource. The observation data arrives once a minute and, given this spiky arrival pattern, Lambda proved to be the most cost-effective solution, allowing us to pay only for the short periods where compute is actually required. The platform processes several observations from roughly 400 stations every minute – equating to 15 billion observations per month – so any marginal improvement in compute cost soon adds up.
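As a rough sketch of what that per-minute processing might look like (the SQS trigger, message shape and helper name are assumptions for illustration, not SurfaceNet’s actual code), a Lambda handler for a batch of observations can be very small:

```python
import json

def handler(event, context):
    """Illustrative Lambda entry point: process a batch of observation
    messages delivered by an SQS trigger (message format assumed)."""
    for record in event["Records"]:
        observation = json.loads(record["body"])
        process_observation(observation)

def process_observation(obs):
    """Hypothetical placeholder for quality control and persistence."""
    print(f"station={obs.get('station_id')} readings={obs.get('readings')}")
```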

Choosing Lambda complemented our desire for a largely serverless system that minimises maintenance costs, built on other serverless AWS resources such as S3, Aurora Serverless, DynamoDB and SQS. This approach avoided the need to provision and manage servers, and the associated costs. Serverless resources are also highly available by design: Aurora Serverless requires the database to be deployed across at least two Availability Zones, while DynamoDB and S3 intrinsically spread their data over multiple data centres.
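As a minimal illustration of how writes to one of those serverless datastores look (the table name and key schema here are assumptions for the sketch, not SurfaceNet’s real schema):

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("observations")  # hypothetical table name

def store_observation(obs):
    """Write a single observation; DynamoDB replicates the item
    across multiple Availability Zones automatically."""
    table.put_item(
        Item={
            "station_id": obs["station_id"],    # partition key (assumed)
            "observed_at": obs["observed_at"],  # sort key (assumed)
            "readings": obs["readings"],
        }
    )
```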

Most of the data ingest occurs via remote data loggers communicating with the platform over MQTT; AWS IoT Core was the ideal resource for managing this. Using API Gateway, we developed a simple API on top of IoT Core, allowing those administering the system to onboard new loggers, manage their certificates and monitor their statuses. The Simple Email Service (SES) allows ingestion of data from marine buoys and ships that transmit their data via the Iridium satellite network. Both IoT Core and SES are fully managed by AWS, providing an easy way to handle data from a range of protocols with minimal operational overhead.
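A hypothetical onboarding step behind such an admin API might register a logger and issue its certificate with a handful of boto3 calls; the function and its names are illustrative rather than SurfaceNet’s implementation:

```python
import boto3

iot = boto3.client("iot")

def onboard_logger(logger_id: str) -> dict:
    """Register a data logger as an IoT thing and issue it an
    active X.509 certificate for MQTT authentication."""
    cert = iot.create_keys_and_certificate(setAsActive=True)
    iot.create_thing(thingName=logger_id)
    iot.attach_thing_principal(
        thingName=logger_id, principal=cert["certificateArn"]
    )
    # The private key is returned only once, to be delivered
    # securely to the logger.
    return {
        "certificateArn": cert["certificateArn"],
        "certificatePem": cert["certificatePem"],
        "keyPair": cert["keyPair"],
    }
```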

From a development perspective, the stand-out benefit of working in the cloud has been the ability to deploy fully representative environments to test against. Our infrastructure is defined in CloudFormation, enabling each developer to stand up their own copy of the system when adding a new feature. Eliminating the classic ‘works on my machine’ problems that plague local development has allowed rapid iteration cycles and far fewer bugs during testing. The process also means we constantly exercise our ability to deploy the system from scratch, which will come in handy when an unforeseen problem occurs in the future.
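As a sketch of the idea (the stack and parameter names are made up for illustration), each developer can stand up a personal copy of the system namespaced by their username:

```python
import getpass
import boto3

cfn = boto3.client("cloudformation")

def deploy_dev_stack(template_body: str):
    """Stand up a per-developer copy of the system from the same
    CloudFormation template used for the shared environments."""
    stack_name = f"surfacenet-{getpass.getuser()}"  # e.g. surfacenet-jbloggs
    cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Parameters=[{"ParameterKey": "EnvName", "ParameterValue": stack_name}],
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    # Block until the environment is fully provisioned.
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)
```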

Whilst this suggests a flawless venture into the Cloud, the journey hasn’t been without problems. CloudFormation has been incredibly useful, but given the scale and number of resources it has become cumbersome. Despite our best mitigation efforts there is still a large amount of repetition, and the cumulative lines of YAML we have committed are on a par with our lines of Python. We would consider using the newer AWS CDK if we were to approach the project again. Additionally, we started off creating a new repository for each new Lambda, but this has ended up limiting our ability to share code effectively across components, not to mention forcing us to update ~40 repositories whenever we want our buildspecs to use a new version of Python.
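For comparison, a CDK stack in Python replaces pages of repeated YAML with reusable constructs. This fragment is a minimal sketch of the style, not SurfaceNet code:

```python
from aws_cdk import Stack, Duration, aws_lambda as _lambda, aws_sqs as sqs
from aws_cdk.aws_lambda_event_sources import SqsEventSource
from constructs import Construct

class IngestStack(Stack):
    """Illustrative stack: one queue feeding one Lambda, with the IAM
    permissions and event-source wiring generated by the CDK."""

    def __init__(self, scope: Construct, construct_id: str, **kwargs):
        super().__init__(scope, construct_id, **kwargs)
        queue = sqs.Queue(
            self, "ObservationQueue",
            visibility_timeout=Duration.minutes(5),
        )
        fn = _lambda.Function(
            self, "IngestFunction",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="ingest.handler",
            code=_lambda.Code.from_asset("src/ingest"),
        )
        fn.add_event_source(SqsEventSource(queue))
```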

It has been a fascinating couple of years, and a main takeaway has been that large organisations such as Met Office, with large-scale bespoke data problems, see the cloud as the environment of choice for building solutions. The maturity of the AWS platform has shown the cloud to be both robust enough and cheap enough to satisfy the requirements of complex systems such as SurfaceNet, and it will certainly play a big part in the future of both CACI and Met Office.

Serverless Cloud Security Principles

Enterprise IT has evolved from on-premises hardware, to renting space in datacentres, to the cloud, and on to even more abstracted approaches such as the current crop of serverless offerings. At CACI, we have worked with numerous customers to deliver serverless projects, and each time the security considerations have been central to the design. Previous security practices and guidance have focussed on the more traditional routes to enterprise IT, but does that guidance still apply to serverless solutions?

The UK’s National Cyber Security Centre (NCSC) has always provided well-considered and proven guidance on security practices. CACI have recently worked with them on a serverless project and used the opportunity to help review their 14 Cloud Security Principles.

So, how do they hold up?

One of the main benefits of moving to a cloud service is delegating responsibility for the physical infrastructure to a provider. All the cloud security principles that relate to the selection of that provider are still relevant and comprise a very useful set of considerations. Those principles can be used as a checklist of requirements when you decide on a provider; if you look around, the main players in the market have already documented responses to the NCSC’s guidance to make that easier.

A serverless solution typically has a few more moving parts than its monolithic counterpart, and some of the principles become more important as a result. Protecting your data in transit is a fundamental consideration for any project, but with the greater amount of communication between components in a serverless system, and the shared infrastructure those services run on, it becomes an even more pressing concern. Ensuring that connections to datastores, messages sent to queues and calls to REST interfaces are all secured using TLS with a robust key policy goes a long way to addressing this, and many of the services provided by the major players come with these safeguards built in.
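As one concrete measure (the bucket name is illustrative), an S3 bucket policy can deny any request that is not made over TLS, using the well-known aws:SecureTransport condition:

```python
import json
import boto3

s3 = boto3.client("s3")

def require_tls(bucket: str):
    """Attach a policy that denies every request to the bucket
    arriving over plain HTTP rather than TLS."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket}",
                f"arn:aws:s3:::{bucket}/*",
            ],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))
```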

The principle concerning secure development practices is still very relevant, and the adoption of a new style of architecting solutions with serverless components brings its own challenges. If your team do not have a good understanding of the provider’s services and the constraints applied to them – for example, some serverless versions of services only support certain software versions – it is easy to leave routes open to malicious actors. Each of the major providers has a partner programme with companies offering everything from a traditional penetration test to a full architectural review of your solution. It is worth considering whether these external services are appropriate for you.

Ultimately, the last principle in the list is still one of the most important messages: you are responsible for the proper use of the tools you adopt from the provider. If you don’t fully understand what each service does, the constraints around its use and the best practices for using it, you risk undermining whatever protection your provider has built into the service and exposing your solution to attack through malicious or misinformed use. Some of the simplest mistakes have led to massive data breaches: accidentally ticking the box that makes an S3 bucket public allows anybody to download its contents, and numerous high-profile companies have lost control of their data this way.
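A simple safeguard against exactly that mistake is to switch on all four of S3’s public-access blocks for a bucket; a minimal sketch:

```python
import boto3

s3 = boto3.client("s3")

def block_public_access(bucket: str):
    """Guard against the 'accidentally public bucket' mistake by
    enabling every public-access block on the bucket."""
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )
```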

The use of cloud services in general, and serverless options in particular, gives you an almost unlimited ability to scale your solutions to massive problems, but remember – you pay for what you use. Building good service monitoring into your solution, together with a basic understanding of your chosen provider’s pricing model, should give you the peace of mind to fully exploit the power and flexibility of a serverless architecture.
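As a minimal example of that kind of cost guardrail (the alarm name, threshold and SNS topic are assumptions for the sketch), a CloudWatch alarm on AWS’s estimated-charges metric can flag a runaway bill early:

```python
import boto3

# Billing metrics live in us-east-1 regardless of where the workload runs,
# and require billing alerts to be enabled in the account settings.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

def billing_alarm(threshold_usd: float, topic_arn: str):
    """Raise an alert when estimated monthly charges cross a threshold."""
    cloudwatch.put_metric_alarm(
        AlarmName="estimated-charges",  # illustrative name
        Namespace="AWS/Billing",
        MetricName="EstimatedCharges",
        Dimensions=[{"Name": "Currency", "Value": "USD"}],
        Statistic="Maximum",
        Period=21600,  # six hours; billing metrics update a few times a day
        EvaluationPeriods=1,
        Threshold=threshold_usd,
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=[topic_arn],  # hypothetical SNS topic for notifications
    )
```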