Michael Churchman started as a scriptwriter, editor, and producer during the anything-goes early years of the game industry. He spent much of the ‘90s in the high-pressure bundled software industry, where the move from waterfall to faster release was well under way, and near-continuous release cycles and automated deployment were already de facto standards. During that time he developed a semi-automated system for managing localization in over fifteen languages. For the past ten years, he has been involved in the analysis of software development processes and related engineering management issues. He is a regular Fixate.io contributor.
Logs are valuable. Logs generated by a major backend resource that provides clients with access to crucial data are more than just valuable; knowing where they are and being able to manage and understand the information that they contain can mean the difference between smooth, secure operation and degraded performance or even catastrophic failure for your application.
How have monitoring tools evolved over the years? That’s a big question, and one that few people are capable of answering based on personal experience. Monitoring software has been around in one form or another since the early years of computing, and few people who are active in the profession today were working then.
AWS Config is an indispensable service with a bit of an identity problem: it really should be called something like “AWS Monitors Everything And Keeps Your Apps In Compliance,” because it is that important. But since there’s no way to put everything it does in a short, snappy name, “AWS Config” will do.

What does AWS Config do? Basically, it monitors the current and past configurations of your AWS resources, compares those configurations to your target configurations, and reports current configurations, changes to configurations, and the ways in which your resources interact (with reference to configuration).

Let’s take a closer look at what that means and how it works, starting with the “how it works” part…

How AWS Config Works

AWS Config continually monitors the configuration of your AWS resources. It records configuration changes in a normalized format, and makes that information available through its API. It also compares current configurations with configuration standards you have established, and makes that information available in dashboard format via its API.

You can also optionally set AWS Config to send text alerts regarding both configuration changes and its evaluation of existing configurations vs. your configuration standards.

By default, AWS Config tracks the configuration of all of your resources, recording configurations, metadata, attributes, and associated relationships and events. You can, however, tell it to track only specific types of resources.

It takes snapshots of resource configurations, and it records an ongoing stream of resource configuration changes, storing this data in configuration histories. These histories can include software (down to the application level), providing you with a comprehensive record of your AWS operation’s configuration.

Configuration standards are contained in rules. You can use Amazon’s preconfigured set of rules (which may be fully adequate for many operations), customize those rules, or define your own set of rules.
In all cases, AWS Config checks configurations against these rules, and reports the current state of compliance with them.

What AWS Config Means to You

What does this mean for your organization’s AWS operations? Monitoring is vital to any Internet or network-based application or service, of course. Without it, you cannot guarantee the functionality or security of your software. Configuration monitoring has a special role, since it provides direct insight into an application’s state, its relationship with its environment, and the rules and conditions under which it is currently operating.

Most kinds of software monitoring are symptomatic, recording behavior in one form or another, whether it is I/O, CPU or memory use, calls to other modules or system resources, or error messages. This makes it possible to detect many types of trouble and track performance, but it generally does not directly indicate the cause of most functional or performance problems.

Configuration monitoring, on the other hand, can give you a direct view into the possible causes of such problems. How does this work? Since AWS Config allows you to codify configuration rules, let’s start with compliance.

Regulatory Compliance

Many of the online services available today are in regulated industries. This is true of banking and other financial services, of course, but it also applies to such things as health services, insurance, and public utilities. In many cases, failure to comply with regulatory standards for online services can result in significant financial or even legal penalties. These standards (particularly those affecting confidentiality and data security) can be, and often are, reflected in configuration settings.

If, for example, you provide online financial services, you may be required to provide a high level of security for both customer and transaction records, to maintain secure records of all activity, and to detect and record anomalous actions.
At least some of these requirements may in turn require you to maintain specific configuration settings.

If you include the required settings in your customized AWS Config rules, you will have a way to automatically determine whether your site’s configuration has gone out of compliance. You can set AWS Config to automatically send a text alert to the engineers and managers responsible for compliance, so that they can quickly investigate the problem and adjust the configuration to bring your services back into compliance.

In-House Standards

Even if you do not operate in a regulated industry, you may need to comply with in-house standards within your company, particularly when it comes to things such as security and performance, both of which can require you to maintain specific configuration settings. AWS Config can automatically notify you of any configuration changes which may have an effect on security or performance, so that you remain fully compliant with your company’s standards.

Error and Performance Troubleshooting

The configuration histories that AWS Config records can also be very valuable in tracing both errors and performance problems. You can look back through the historical record to find out when specific configuration changes took place, and try to correlate them with software failures or performance degradation.

AWS Config and Sumo

As is often the case with monitoring data, the output from AWS Config becomes considerably more valuable when it is integrated into a comprehensive, analytics-based dashboard system.

The Sumo Logic App for AWS Config provides easy integration of AWS Config data into Sumo’s extensive analytics and dashboard system.
It gives you not only a powerful overview, but also a detailed look at resource modifications, as well as drill-down insight into resource details.

Analytics-based features such as these, which turn AWS Config’s raw data into genuine, multidimensional insights, make it possible to use such data for real-time configuration and performance management, security monitoring, and application optimization.

Monitoring configuration data gives you greater hands-on control over security, performance, and functionality, and it provides you with insights which are simply not available with conventional, behavior-based application monitoring by itself. By combining AWS Config and the power of Sumo Logic’s analytics, you can turn your team into genuine software-management superheroes.

AWS Config: Monitoring Resource Configurations for Compliance is published by the Sumo Logic DevOps Community. Be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.
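The rule-checking idea described in the AWS Config article (record configurations, compare them with rules, report compliance) can be illustrated with a small, self-contained sketch. This is not the AWS Config API: the `ConfigItem` and `Rule` types, the `evaluate` helper, and the settings `encrypted` and `public_read` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ConfigItem:
    """Hypothetical, simplified stand-in for one recorded resource configuration."""
    resource_id: str
    resource_type: str
    settings: Dict[str, object]


@dataclass
class Rule:
    """A named check that returns True when a configuration item complies."""
    name: str
    applies_to: str
    check: Callable[[ConfigItem], bool]


def evaluate(items: List[ConfigItem], rules: List[Rule]) -> List[dict]:
    """Compare every configuration item with every rule for its resource type."""
    results = []
    for item in items:
        for rule in rules:
            if rule.applies_to != item.resource_type:
                continue  # rule does not cover this kind of resource
            status = "COMPLIANT" if rule.check(item) else "NON_COMPLIANT"
            results.append(
                {"resource": item.resource_id, "rule": rule.name, "status": status}
            )
    return results


# Illustrative rules and recorded configurations (all values are made up).
rules = [
    Rule("volume-encrypted", "volume", lambda i: i.settings.get("encrypted") is True),
    Rule("bucket-not-public", "bucket", lambda i: i.settings.get("public_read") is False),
]
items = [
    ConfigItem("vol-1", "volume", {"encrypted": True}),
    ConfigItem("bucket-1", "bucket", {"public_read": True}),
]

report = evaluate(items, rules)
for row in report:
    print(row)
```

In this sketch, a non-compliant row is the point at which an alert to the responsible engineers would be triggered; how that notification is delivered is left out.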
Why do you need patterns for building a successful microservices architecture? Shouldn’t the same basic principles apply, whether you’re designing software for a monolithic or microservices architecture? Those principles do largely hold true at the highest and most abstract levels of design (i.e., the systems level), and at the lowest and most concrete levels (such as classes and functions). But most code design is really concerned with the broad range between those two extremes, and it is there that the very nature of microservices architecture requires not only new patterns for design, but also new patterns for reimagining existing monolithic applications.

The truth is that there is nothing in monolithic architecture that inherently imposes either structure or discipline in design. Almost all programming languages currently in use are designed to enforce structure and discipline at the level of coding, of course, but at higher levels, good design still requires conscious adherence to methodologies that enforce a set of architectural best practices.

Microservices architecture, on the other hand, does impose by its very nature a very definite kind of structural discipline at the level of individual resources. Just as it makes no sense to cut a basic microservice into arbitrary chunks and separate them, it makes equally little sense to bundle an individual service with another related or unrelated service in an arbitrary package, when the level of packaging that you’re working with is typically one package per container.

Microservices Architecture Requires New Patterns

In other words, you really do need new patterns in order to successfully design microservices architecture. The need for patterns starts at the top. If you are refactoring a monolithic program into a microservices-based application, the first pattern that you need to consider is the one that you will use for decomposition.
What pattern will you use as a guide in breaking the program down into microservices? What are the basic decomposition patterns?

At the higher levels of decomposition, it makes sense to consider such functional criteria as broad areas of task-based responsibility (subdomains), or large-scale business/revenue-generating responsibilities (business capabilities). In practice, there is considerable overlap between these two general functional patterns, since a business’ internal large-scale organization of tasks is likely to closely match the organization of business responsibilities. In either case, decomposition at this level should follow the actual corporate-level breakdown of basic business activities, such as inventory, delivery, sales, order processing, etc.

In the subsequent stages of decomposition, you can define groups of microservices, and ultimately individual microservices. This calls for a different and much more fine-grained pattern of decomposition, one which is based largely on interactions within the application, with individual users, or both.

Decomposition Patterns for Microservices Architecture

There are several ways to decompose applications at this level, depending in part on the nature of the application, as well as the pattern for deployment. You can combine decomposition patterns, and in many if not most cases, this will be the most practical and natural approach. Among the key microservice-level decomposition patterns are:

Decomposition by Use Case

In many respects, this pattern is the logical continuation of a large-scale decomposition pattern, since business capabilities and subdomains are both fundamentally use case-based. In this pattern, you first identify use cases: sequences of actions which a user would typically follow in order to perform a task. Note that a user (or actor) does not need to be a person; it can, in fact, be another part of the same application.
A use case could be something as obvious and common as filling out an online form or retrieving and displaying a database record. It could also include tasks such as processing and saving streaming data from a real-time input device, or polling multiple devices to synchronize data. If it seems fairly natural to model a process as a unified set of interactions between actors with an identifiable purpose, it is probably a good candidate for the use case decomposition pattern.

Decomposition by Resources

In this pattern, you define microservices based on the resources (storage, peripherals, databases, etc.) that they access or control. This allows you to create a set of microservices which function as channels for access to individual resources (following the basic pattern of OS-based peripheral/resource drivers), so that resource-access code does not need to be duplicated in other parts of the application. Isolating resource interfaces in specific microservices has the added advantage of allowing you to accommodate changes to a resource by updating only the microservice that accesses it directly.

Decomposition by Responsibilities/Functions

This pattern is likely to be most useful in the case of internal operations which perform a clearly defined set of functions that are likely to be shared by more than one part of the application. Such responsibility domains might include shopping cart checkout, inventory access, or credit authorization. Other microservices could be defined in terms of relatively simple functions (as is the case with many built-in OS-based microservices) rather than more complex domains.

Microservices Architecture Deployment Patterns

Beyond decomposition, there are other patterns of considerable importance in building a microservices-based architecture. Among the key patterns are those for deployment.
There are three underlying patterns for microservices deployment, along with a few variations:

Single Host/Multiple Services

In this pattern, you deploy multiple instances of a service on a single host. This reduces deployment overhead, and allows greater efficiency through the use of shared resources. It has, however, greater potential for conflict and for security problems, since services interacting with different clients may be insufficiently isolated from each other.

Single Service per Host, Virtual Machine, or Container

This pattern deploys each service in its own environment. Typically, this environment will be a virtual machine (VM) or container, although there are times when the host may be defined at a less abstract level. This kind of deployment provides a high degree of flexibility, with little potential for conflict over system resources. Services are either entirely isolated from those used by other clients (as is the case with single-service-per-VM deployment), or can be effectively isolated while sharing some lower-level system resources (as with containers that have appropriate security features). Deployment overhead may be greater than in the single host/multiple services model, but in practice, this may not represent significant cost in time or resources.

Serverless/Abstracted Platform

In this pattern, the service runs directly on pre-configured infrastructure made available as a service (which may be priced on a per-request basis); deployment may consist of little more than uploading the code, with a small number of configuration settings on your part. The deployment system places the code in a container or VM, which it manages. All you need to make use of the microservice is its address. Among the most common serverless environments are AWS Lambda, Azure Functions, and Google Cloud Functions. Serverless deployment requires very little overhead.
It does, however, impose significant limitations, since the uploaded code must be able to meet the (often strict) requirements of the underlying infrastructure. This means that you may have a limited selection of programming languages and interfaces to outside resources. Serverless deployment also typically rules out stateful services.

Applying Other Patterns to Microservices Architecture

There are a variety of other patterns which apply to one degree or another to microservices deployment. These include patterns for communicating with external applications and services, for managing data, for logging, for testing, and for security. In many cases, these patterns are similar for both monolithic and microservices architecture, although some patterns are more likely to be applicable to microservices than others. Fully automated parallel testing in a virtualized environment, for example, is typically the most appropriate pattern for testing VM/container-based microservices.

As is so often the case in software development (as well as more traditional forms of engineering), the key to building a successful microservices architecture lies in finding the patterns that are most suitable to your application, understanding how they work, and adapting them to the particular circumstances of your deployment. Use of the appropriate patterns can provide you with a clear and accurate roadmap to successful microservices architecture refactoring and deployment.

Top Patterns for Building a Successful Microservices Architecture is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.
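As a rough illustration of the "Decomposition by Resources" pattern from the microservices article, the sketch below keeps all access to one resource inside a single service, and has a use-case service depend only on that service's interface. The class names (`InventoryService`, `CheckoutService`) and the in-memory dictionary standing in for a database are assumptions made for the example; in a real deployment the two services would typically run separately and talk over a network API rather than through in-process calls.

```python
from typing import Dict


class InventoryService:
    """Hypothetical resource-facing service: the only code that touches the store."""

    def __init__(self) -> None:
        self._stock: Dict[str, int] = {}  # in-memory stand-in for a real database

    def add(self, sku: str, quantity: int) -> None:
        """Increase the stock level for one item."""
        self._stock[sku] = self._stock.get(sku, 0) + quantity

    def reserve(self, sku: str, quantity: int) -> bool:
        """Reserve stock if enough is available; never let the count go negative."""
        if self._stock.get(sku, 0) < quantity:
            return False
        self._stock[sku] -= quantity
        return True

    def available(self, sku: str) -> int:
        """Report the current stock level for one item."""
        return self._stock.get(sku, 0)


class CheckoutService:
    """Hypothetical use-case service: knows the inventory interface, not the storage."""

    def __init__(self, inventory: InventoryService) -> None:
        self._inventory = inventory

    def place_order(self, sku: str, quantity: int) -> str:
        return "confirmed" if self._inventory.reserve(sku, quantity) else "rejected"


inventory = InventoryService()
inventory.add("widget", 3)
checkout = CheckoutService(inventory)

first = checkout.place_order("widget", 2)   # enough stock: 3 available, 2 requested
second = checkout.place_order("widget", 2)  # only 1 left, so this one is refused
print(first, second, inventory.available("widget"))
```

Because `CheckoutService` never reads the storage directly, replacing the dictionary with a real database would only require changes inside `InventoryService`.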
How can you get the most out of monitoring your AWS Lambda functions? In this post, we’ll take a look at the monitoring and logging data that Lambda makes available, and the value that it can bring to your AWS operations.

You may be thinking, “Why should I even monitor AWS Lambda? Doesn’t AWS take care of all of the system and housekeeping stuff with Lambda? I thought that all the user had to do was write some code and run it!”

A Look at AWS Lambda

If that is what you’re thinking, then for the most part, you’re right. AWS Lambda is designed to be a simple plug-and-play experience from the user’s point of view. Its function is simply to run user-supplied code on request in a standardized environment.

You write the code, specifying some basic configuration parameters, and upload the code, the configuration information, and any necessary dependencies to AWS Lambda. This uploaded package is called a Lambda function. To run the function, you invoke it from an application running somewhere in the AWS ecosystem (EC2, S3, or most other AWS services). When Lambda receives the invoke request, it runs your function in a container; the container pops into existence, does its job, and pops back out of existence. Lambda manages the containers; you don’t need to (and can’t) do anything with them.

So there it is: Lambda. It’s simple, it’s neat, it’s clean, and it does have some metrics which can be monitored, and which are worth monitoring.

Which Lambda Metrics to Monitor?

So, which Lambda metrics are important, and why would you monitor them? There are two kinds of monitoring information which AWS Lambda provides: metrics displayed in the AWS CloudWatch console, and logging data, which is handled by both CloudWatch and the CloudTrail monitoring service. Both types of data are valuable to the user; the nature of that value and the best way to make use of it depend largely on the type of data.
Monitoring Lambda CloudWatch Console Metrics

Because AWS Lambda is strictly a standardized platform for running user-created code, the metrics that it displays in the CloudWatch console are largely concerned with the state of that code. These metrics include the number of invocation requests that a function receives, the number of failures resulting from errors in the function, the number of failures in user-configured error handling, the function’s duration, or running time, and the number of invocations that were throttled as a result of the user’s concurrency limits.

These are useful metrics, and they can tell you a considerable amount about how well the code is working, how well the invocations work, and how the code operates within its environment. They are, however, largely useful in terms of functionality, debugging, and day-to-day (or millisecond-to-millisecond) operations.

Monitoring and Analyzing AWS Lambda Logs

With AWS Lambda, logging data is actually a much richer source of information in many ways. This is because logging provides a cumulative record of actions over time, including all API calls made in connection with AWS Lambda. Since Lambda functions exist for the most part to provide support for applications and websites running on other AWS services, Lambda log data is the main source of data about how a function is doing its job.

“Logs,” you say, like Indiana Jones surrounded by hissing cobras. “Why does it always have to be logs? Digging through logs isn’t just un-fun, boring, and time-consuming. More often than not, it’s counter-productive, or just plain impractical!”

And once again, you’re right. There isn’t much point in attempting to manually analyze AWS Lambda logs. In fact, you have three basic choices: ignore the logs, write your own script for extracting and analyzing log data, or let a monitoring and analytics service do the work for you.
For the majority of AWS Lambda users, the third option is by far the most practical and the most useful.

Sumo Logic’s Log Analytics Dashboards for Lambda

To get a clearer picture of what can be done with AWS Lambda metrics and logging data, let’s take a look at how the Sumo Logic App for AWS Lambda extracts useful information from the raw data, and how it organizes that data and presents it to the user.

On the AWS side, you can use a Lambda function to collect CloudWatch logs and route them to Sumo Logic. Sumo integrates accumulated log and metric information to present a comprehensive picture of your AWS Lambda function’s behavior, condition, and use over time, using three standard dashboards:

The Lambda Overview Dashboard

The Overview dashboard provides a graphic representation of each function’s duration, maximum memory usage, compute usage, and errors. This allows you to quickly see how individual functions perform in comparison with each other. The Overview dashboard also breaks duration, memory, and compute usage down over time, making it possible to correlate Lambda function activity with other AWS-based operations, and it compares the actual values for all three metrics with their predicted values over time. This last set of values (actual vs. predicted) can help you pinpoint performance bottlenecks and allocate system resources more efficiently.

The Lambda Duration and Memory Dashboard

Sumo Logic’s AWS Lambda Duration and Memory dashboard displays duration and maximum memory use for all functions over a 24-hour period in the form of both outlier and trend charts. The Billed Duration by Hour trend chart compares actual billed duration with predicted duration on an hourly basis. In a similar manner, the Unused Memory trend chart shows used, unused, and predicted unused memory size, along with available memory.
These charts, along with the Max Memory Used box plot chart, can be very useful in determining when and how to balance function invocations and avoid excessive memory over- or underuse.

The Lambda Usage Dashboard

The Usage dashboard breaks down requests, duration, and memory usage by function, along with requests by version alias. It includes actual request counts broken down by function and version alias. The Usage dashboard also includes detailed information on each function, including individual request ID, duration, billing, memory, and time information for each request.

The breakdown into individual requests makes it easy to identify and examine specific instances of a function’s invocation, in order to analyze what is happening with that function on a case-by-case level.

It is integrated, dashboard-based analytics such as those presented by the Sumo Logic App for AWS Lambda that make it not only possible but easy to extract useful data from Lambda, and truly make the most of AWS Lambda monitoring.
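For readers curious about the "write your own script" option mentioned in the Lambda article, here is a minimal sketch of extracting per-request duration and memory figures from a log line. It assumes the `REPORT` line layout commonly seen in Lambda's CloudWatch logs; the exact set of fields can vary (extra fields may follow the ones matched here), so the pattern should be treated as an assumption, and the sample line and the `parse_report` helper are illustrative only.

```python
import re
from typing import Dict, Optional

# Assumed layout of a Lambda REPORT log line; fields are separated by whitespace/tabs.
REPORT_PATTERN = re.compile(
    r"REPORT RequestId: (?P<request_id>\S+)\s+"
    r"Duration: (?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed_ms>[\d.]+) ms\s+"
    r"Memory Size: (?P<memory_mb>\d+) MB\s+"
    r"Max Memory Used: (?P<max_used_mb>\d+) MB"
)


def parse_report(line: str) -> Optional[Dict[str, object]]:
    """Return the request's duration and memory figures, or None for other lines."""
    match = REPORT_PATTERN.search(line)
    if match is None:
        return None
    memory = int(match["memory_mb"])
    used = int(match["max_used_mb"])
    return {
        "request_id": match["request_id"],
        "duration_ms": float(match["duration_ms"]),
        "billed_ms": float(match["billed_ms"]),
        "memory_mb": memory,
        "max_used_mb": used,
        "unused_mb": memory - used,  # configured memory that the request never used
    }


# Illustrative sample line (values are made up).
sample = (
    "REPORT RequestId: 3604209a-e9a3-11e6-939a-754dd98c7be3\t"
    "Duration: 12.34 ms\tBilled Duration: 100 ms\t"
    "Memory Size: 128 MB\tMax Memory Used: 18 MB"
)
parsed = parse_report(sample)
print(parsed)
```

A script like this only covers one line format for one function; aggregating the results over time and across functions is the part that an analytics service handles for you.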
Looking for some logging moves that will impress your business partner? In this post, we’ll show you a few. But first, a note of caution:

If you’re going to wow your business partner, make a visiting venture capitalist’s jaw drop, or knock the socks off of a few stockholders, you could always accomplish that with something that has a lot of flash, and not much more than that, or you could show them something that has real and lasting substance, and will make a difference in your company’s bottom line. We’ve all seen business presentations filled with flashy fireworks, and we’ve all seen how quickly those fireworks fade away.

Around here, though, we believe in delivering value: the kind that stays with your organization, and gives it a solid foundation for growth. So, while the logging moves that we’re going to show you do look good, the important thing to keep in mind is that they provide genuine, substantial value, and discerning business partners and investors (the kind that you want to have in your corner) will recognize this value quickly.

Why Is Log Monitoring Useful?

What value should logs provide? Is it enough just to accumulate information so that IT staff can pick through it as required? That’s what most logs do, varying mostly in the amount of information and the level of detail. And most logs, taken as raw data, are very difficult to read and interpret; the most noticeable result of working with raw log data, in fact, is the demand that it puts on IT staff time.

5 Log Monitoring Steps to Success

Most of the value in logs is delivered by means of systems for organizing, managing, filtering, analyzing, and presenting log data. And needless to say, the best, most impressive, most valuable logging moves are those which are made possible by first-rate log management. They include:

Quick, on-the-spot, easy-to-understand analytics.
Pulling up instant, high-quality analytics may be the most impressive move that you can make when it comes to logging, and it is definitely one of the most valuable features that you should look for in any log management system. Raw log data is a gold mine, but you need to know how to extract and refine the gold. A high-quality analytics system will extract the data that’s valuable to you, based on your needs and interests, and present it in ways that make sense. It will also allow you to quickly recognize and understand the information that you’re looking for.

Monitoring real-time data. While analysis of cumulative log data is extremely useful, there are also plenty of situations where you need to see what is going on right at the moment. Many of the processes that you most need to monitor (including customer interaction, system load, resource use, and hostile intrusion/attack) are rapid and transient, and there is no substitute for a real-time view into such events. Real-time monitoring should be accompanied by the capacity for real-time analytics. You need to be able to both see and understand events as they happen.

Fully integrated logging and analytics. There may be processes in software development and operations which have a natural tendency to produce integrated output, but logging isn’t one of them. Each service or application can produce its own log, in its own format, based on its own standards, without reference to the content or format of the logs created by any other process. One of the most important and basic functions that any log management system can perform is log integration, bringing together not just standard log files, but also event-driven and real-time data. Want to really impress partners and investors? Bring up log data that comes from every part of your operation, and that is fully integrated into useful, easily understood output.

Drill-down to key data.
Statistics and aggregate data are important; they give you an overall picture of how the system is operating, along with general, system-level warnings of potential trouble. But the ability to drill down to more specific levels of data (geographic regions, servers, individual accounts, specific services and processes) is what allows you to make use of much of that system-wide data. It’s one thing to see that your servers are experiencing an unusually high level of activity, and quite another to drill down and see an unusual spike in transactions centered around a group of servers in a region known for high levels of online credit card fraud. Needless to say, integrated logging and scalability are essential when it comes to drill-down capability.

Logging throughout the application lifecycle. Logging integration includes integration across time, as well as across platforms. This means combining development, testing, and deployment logs with metrics and other performance-related data to provide a clear, unified, in-depth picture of the application’s entire lifecycle. This in turn makes it possible to look at development, operational, and performance-related issues in context, and see relationships which might not be visible without such cross-system, full lifecycle integration.

Use Log Monitoring to Go for the Gold

So there you have it: five genuine, knock-’em-dead logging moves. They’ll look very impressive in a business presentation, and they’ll tell serious, knowledgeable investors that you understand and care about substance, and not just flash. More to the point, these are logging capabilities and strategies which will provide you with valuable (and often crucial) information about the development, deployment, and ongoing operation of your software.

Logs do not need to be junk piles of unsorted, raw data.
Bring first-rate management and analytics to your logs now, and turn those junk piles into gold.

5 Log Monitoring Moves to Wow Your Business Partner is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.
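The log integration and drill-down moves described in the log monitoring article can be sketched in a few lines. The example below assumes two hypothetical services that log in different formats (one JSON, one pipe-delimited), normalizes both into a common record, and then drills down from an aggregate count to a single region. The field names, log layouts, and sample data are all invented for illustration.

```python
import json
from collections import Counter
from typing import Dict, List


def from_json_line(line: str) -> Dict[str, str]:
    """Normalize a JSON log line (hypothetical keys: svc, region, level)."""
    raw = json.loads(line)
    return {"service": raw["svc"], "region": raw["region"], "level": raw["level"].upper()}


def from_pipe_line(line: str) -> Dict[str, str]:
    """Normalize a pipe-delimited line (hypothetical layout: level|service|region|message)."""
    level, service, region, _message = line.split("|", 3)
    return {"service": service, "region": region, "level": level.upper()}


# Made-up sample logs from two services with unrelated formats.
json_logs = [
    '{"svc": "checkout", "region": "eu-west", "level": "error"}',
    '{"svc": "checkout", "region": "us-east", "level": "info"}',
]
pipe_logs = [
    "ERROR|payments|eu-west|card declined",
    "ERROR|payments|eu-west|timeout contacting gateway",
]

# Integration: one list of records in one shared shape.
records: List[Dict[str, str]] = (
    [from_json_line(line) for line in json_logs]
    + [from_pipe_line(line) for line in pipe_logs]
)

# Aggregate view: errors per region across every service.
errors_by_region = Counter(r["region"] for r in records if r["level"] == "ERROR")

# Drill-down: within one region, which services produce the errors?
eu_errors_by_service = Counter(
    r["service"] for r in records if r["level"] == "ERROR" and r["region"] == "eu-west"
)

print(errors_by_region)
print(eu_errors_by_service)
```

The drill-down step is only possible because both formats were first mapped onto the same fields, which is the practical reason integration comes before analytics.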
When you roll out a new deployment, how do you roll? With a big bang? A blue/green deployment? Or do you prefer a Canary Release?

There’s a lot to be said for the Canary Release strategy of testing new software releases on a limited subset of users. It reduces the risk of an embarrassing and potentially costly public failure of your application to a practical minimum. It allows you to test your new deployment in a real-world environment and under a real-world load. It allows a rapid (and generally painless) rollback. And if there’s a failure of genuinely catastrophic proportions, only a small subset of your users will even notice the problem.

But when you use Canary Release, are you getting everything you can out of the process? A full-featured suite of analytics and monitoring tools is (or should be) an indispensable part of any Canary Release strategy.

The Canary Release Pattern

In a Canary Release, you initially release the new version of your software on a limited number of servers, and make it available to a small subset of your users. You monitor it for bugs and performance problems, and after you’ve taken care of those, you release it to all of your users.

The strategy is named after the practice of taking canaries into coal mines to test the quality of the air; if the canary stopped singing (or died), it meant that the air was going bad. In this case, the “canary” is your initial subset of users; their exposure to your new release allows you to detect and fix the bugs, so your general body of users won’t have to deal with them.

Ideally, in a strategy such as this, you want to get as much useful information as possible out of your initial sample, so that you can detect not only the obvious errors and performance issues, but also problems which may not be so obvious, or which may be relatively slow to develop. This is where good analytic tools can make a difference.
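One common way to pick the "small subset of your users" is to assign each user to a stable bucket and send only the lowest buckets to the new release. The sketch below shows that idea; it is an assumption about how the subset might be chosen, not something the pattern itself prescribes, and the `in_canary` helper and user ID format are hypothetical.

```python
import hashlib


def in_canary(user_id: str, percent: int) -> bool:
    """Deterministically place roughly `percent`% of users in the canary group."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < percent


# Made-up user IDs; a real system would use its own account identifiers.
users = [f"user-{n}" for n in range(1000)]
canary_users = [u for u in users if in_canary(u, 5)]
print(len(canary_users), "of", len(users), "users routed to the canary")
```

Because the bucket depends only on the user ID, a given user sees the same version on every request, which keeps the canary group's logs and analytics consistent over the life of the release.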
Using Analytics to Support a Canary Release In fact, the Canary Release strategy needs at least some analytics in order to work at all. Without any analytics, you would have to rely on extremely coarse-grained sources of information, such as end-user bug reports and obvious crashes at the server end, which are very likely to miss the problems that you actually need to find. Such problems, however, generally will show up in error logs and performance logs. Error statistics will tell you whether the number, type, and concentration (in time or space) of errors is out of the expected range. Even if they can’t identify the specific problem, such statistics can suggest the general direction in which the problem lies. And since error logs also contain records of individual errors, you can at least in theory pinpoint any errors which are likely to be the result of newly-introduced bugs, or of failed attempts to eliminate known bugs. The problem with identifying individual errors in the log is that any given error is likely to be a very small needle in a very large haystack. Analytics tools which incorporate intelligent searches and such features as pattern analysis and detection of unusual events allow you to identify likely signs of a significant error in seconds. Without such tools, the equivalent search might take hours, whether it uses brute force or carefully-crafted regex terms. Even being forced by necessity to do a line-by-line visual scan of an error log, however, is better than having no error log at all. Logs that monitor such things as performance, load, and load distribution can also be useful in the Canary Release strategy. Bugs which don’t produce clearly identifiable errors may show up in the form of performance degradation or excessive traffic. Design problems may also leave identifiable traces in performance logs; poor design can cause traffic jams, or lead to excessive demands on databases and other resources. 
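As a rough illustration of the error-statistics idea above, the following Python sketch compares the error rate in a canary log against the error rate in a baseline log. The log format (`<timestamp> <LEVEL> <message>`), the function names, and the `tolerance` multiplier are all assumptions made for this example; a real analytics tool would use far richer signals than a single ratio.

```python
from collections import Counter


def error_rate(log_lines):
    """Return the fraction of parseable lines whose level field is ERROR.

    Assumes each line looks like "<timestamp> <LEVEL> <message>";
    lines with fewer than two fields are ignored.
    """
    levels = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) >= 2:
            levels[fields[1]] += 1
    total = sum(levels.values())
    return levels["ERROR"] / total if total else 0.0


def canary_is_suspect(baseline_lines, canary_lines, tolerance=2.0):
    """Flag the canary when its error rate exceeds tolerance x the baseline rate.

    The tolerance value is a hypothetical threshold, not a recommendation.
    """
    baseline = error_rate(baseline_lines)
    canary = error_rate(canary_lines)
    return canary > 0 and canary > baseline * tolerance


if __name__ == "__main__":
    baseline_log = ["t1 INFO ok"] * 98 + ["t2 ERROR boom"] * 2
    canary_log = ["t1 INFO ok"] * 90 + ["t2 ERROR boom"] * 10
    print(canary_is_suspect(baseline_log, canary_log))
```

A check like this only says that the canary's errors are out of the expected range; finding the individual errors responsible is still the needle-in-a-haystack search described above.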
You can enhance the value of your analytics, and of the Canary Release itself, if you put together an in-depth demographic profile of the user subset assigned to the release. The criteria which you use in choosing the subset, of course, depend on your needs and priorities, as well as the nature of the release. It may consist of in-house users, of a random selection from the general user base, or of users carefully chosen to represent either the general user base, or specific types of user. In any of these cases, however, it should be possible to assemble a profile of the users in the subset. If you know how the users in the subset make use of your software (which features they access most frequently, how often they use the major features, and at what times of day, how this use is reflected in server loads, etc.), and if you understand how these patterns of use compare to those of your general user base, the process of extrapolation from Canary Release analytics should be fairly straightforward, as long as you are using analytic tools which are capable of distilling out the information that you need. So yes, Canary Release can be one of the most rewarding deployment strategies, when you take full advantage of what it has to offer by making intelligent use of first-rate analytic tools. Then the canary will really sing!
“Of course our pipeline is fully automated! Well, we have to do some manual configuration adjustments on a few of our bare metal servers after we run the install scripts, but you know what I mean…” We do know what you mean, but that is not full automation. Call it what it really is: partial automation in a snowflake environment. A snowflake configuration is ad-hoc and “unique” to the environment at large. But in DevOps, you need to drop unique configurations, and focus on full automation.
What’s Wrong With Snowflake Configurations?
In DevOps, a snowflake is a server that requires special configuration beyond that covered by automated deployment scripts. You do the automated deployment, and then you tweak the snowflake system by hand. For a long time (through the ‘90s, at least), snowflake configurations were the rule. Servers were bare metal, and any differences in hardware configuration or peripherals (as well as most differences in installed software) meant that any changes had to be handled on a system-by-system basis. Nobody even called them snowflakes. They were just normal servers. But what’s normal in one era can become an anachronism or an out-and-out roadblock in another era, and nowhere is this more true than in the world of software development. A fully-automated, script-driven DevOps pipeline works best when the elements that make up the pipeline are uniform. A scripted deployment to a thousand identical servers may take less time and run more smoothly than deployment to half a dozen servers that require manual adjustments after the script has been run. For more on DevOps pipelines, see “How to Build a Continuous Delivery Pipeline.”
No Virtual Snowflakes …
A virtual snowflake might be at home on a contemporary Christmas tree, but there’s no place for virtual snowflakes in DevOps.
Cloud-based virtual environments are by their nature software-configurable; as long as the cloud insulates them from any interaction with the underlying hardware, there is no physical reason for a set of virtual servers running in the same cloud environment to be anything other than identical. Any differences should be based strictly on functional requirements — if there is no functional reason for any of the virtual servers to be different, they should be identical. Why is it important to maintain such a high degree of uniformity? In DevOps, all virtual machines (whether they’re full VMs or Docker) are containers in much the same way as steel shipping containers. When you ship something overseas, you’re only concerned with the container’s functional qualities. Uniform shipping containers are functionally useful because they have known characteristics, and they can be filled and stacked efficiently. This is equally true of even the most full-featured virtual machine when it is deployed as a DevOps container. This is all intrinsic to core DevOps philosophy. The container exists solely to deliver the software or services, and should be optimized for that purpose. When delivery to multiple virtual servers is automated and script-driven, optimization requires as much uniformity as possible in server configurations. For more on containers, see, “Kubernetes vs. Docker: What Does It Really Mean?” › What About Non-Virtual Snowflakes? If you only deal in virtual servers, it isn’t hard to impose the kind of standardization described above. But real life isn’t always that simple; you may find yourself working in an environment where some or all of the servers are bare metal. How do you handle a physical server with snowflake characteristics? Do you throw in the towel, and adjust it manually after each deployment, or are there ways to prevent a snowflake server from behaving like a snowflake? 
As it turns out, there are ways to de-snowflake a physical server — ways that are fully in keeping with core DevOps philosophy. First, however, consider this question: What makes a snowflake server a snowflake? Is it the mere fact that it requires special settings, or is it the need to make those adjustments outside of the automated deployment process (or in a way that interrupts the flow of that process)? A thoroughgoing DevOps purist might opt for the first definition, but in practical terms, the second definition is more than adequate. A snowflake is a snowflake because it must be treated as a snowflake. If it doesn’t require any special treatment, it’s not a snowflake. One way to eliminate the need for special treatment during deployment (as suggested by Daniel Lindner) is to install a virtual machine on the server, and deploy software on the virtual machine. The actual deployment would ignore the underlying hardware and interact only with the virtual system. The virtual machine would fully insulate the deployment from any of the server’s snowflake characteristics. What if it isn’t practical or desirable to add an extra virtual layer? It may still be possible to handle all of the server’s snowflake adjustments locally by means of scripts (or automated recipes, as Martin Fowler put it in his original Snowflake Server post), running on the target server itself. These local scripts would need to be able to recognize elements in the deployment which might require adjustments to snowflake configurations, then translate those requirements into local settings and apply them. If the elements that require local adjustments are available as part of the deployment data, the local scripts might intercept that data as the main deployment script runs. 
But if those elements are not obvious (if, for example, they are part of the compiled application code), it may be necessary to include a table of values which may require local adjustments as part of the deployment script (if not full de-snowflaking, at least a 99.99% de-snowflaking strategy). So, what is the bottom line on snowflake servers? In an ideal DevOps environment, they wouldn’t exist. In the less-than-ideal world where real-life DevOps takes place, they can’t always be eliminated, but you can still neutralize most or all of their snowflake characteristics to the point where they do not interfere with the pipeline. For more on virtual machines, see, “Docker Logs vs Virtual Machine Logs” ›
Is bare metal infrastructure relevant in a DevOps world? The cloud has reduced hardware to little more than a substrate for the pool of resources that is the cloud itself. Those resources are the important part; the hardware is little more than a formality. Or at least that’s been the standard cloud-vs-metal story, until recently. Times change, and everything that was old does eventually become new again — usually because of a combination of unmet needs, improved technology, and a fresh approach. And the bare-metal comeback is no different. Unmet Needs The cloud is a pool not just of generic, but also shared resources (processor speed, memory, storage space, bandwidth). Even if you pay a premium for a greater share of these things, you are still competing with the other premium-paying customers. And the hard truth is that cloud providers can’t guarantee a consistent, high level of performance. Cloud performance depends on the demand placed on it by other users — demand which you can’t control. If you need reliable performance, there is a good chance that you will not find it in the cloud. This is particularly true if you’re dealing with large databases; Big Data tends to be resource-hungry, and it is likely to do better on a platform with dedicated resources down to the bare-metal level, rather than in a cloud, where it may have to contend with dueling Hadoops. The cloud can present sticky compliance issues, as well. If you’re dealing with formal data-security standards, such as those set by the Securities and Exchange Commission or by overseas agencies, verification may be difficult in a cloud environment. Bare metal provides an environment with more clearly-defined, hardware-based boundaries and points of entry. 
Improved Technology Even if Moore’s Law has been slowing down to sniff the flowers lately, there have been significant improvements in hardware capabilities, such as increased storage capacity, and the availability of higher-capacity solid state drives, resulting in a major boost in key performance parameters. And technology isn’t just hardware — it’s also software and system architecture. Open-source initiatives for standardizing and improving the hardware interface layers, along with the highly scalable, low-overhead CoreOS, make lean, efficient bare metal provisioning and deployment a reality. And that means that it’s definitely time to look closely at what bare metal is now capable of doing, and what it can now do better than the cloud. A Fresh Approach As technology improves, it makes sense to take a new look at existing problems, and see what could be done now that hadn’t been possible (or easy) before. That’s where Docker and container technology come in. One of the major drawbacks of bare metal in comparison to cloud systems has always been the relative inflexibility of available resources. You can expand such things as memory, storage, and the number of processors, but the hard limit will always be what is physically available to the system; if you want to go beyond that point, you will need to manually install new resources. If you’re deploying a large number of virtual machines, resource inflexibility can be a serious problem. VMs have relatively high overhead; they require hypervisors, and they need enough memory and storage to contain both a complete virtual machine and a full operating system. All of this requires processor time as well. In the cloud, with its large pool of resources, it isn’t difficult to quickly shift resources to meet rapidly changing demands as virtual machines are created and deleted. 
In a bare-metal system with hardware-dependent resources, this kind of resource allocation can quickly run up against the hard limits of the system. Docker-based deployment, however, can radically reduce the demands placed on the host system. Containers are built to be lean; they use the kernel of the host OS, and they include only those applications and utilities which must be available locally. If a virtual machine is a bulky box that contains the application being deployed, plus plenty of packing material, a container is a thin wrapper around the application. And Docker itself is designed to manage a large number of containers efficiently, with little overhead. On bare metal, the combination of Docker, a lean, dedicated host system such as CoreOS, and an open-source hardware management layer makes it possible to host a much higher number of containers than virtual machines. In many cases, this means that bare metal’s relative lack of flexibility with regard to resources is no longer a factor; if the number of containers that can be deployed using available resources is much greater than the anticipated short-to-medium-term demand, and if the hardware resources themselves are easily expandable, then the cloud really doesn’t offer much advantage in terms of resource flexibility. In effect, Docker moves bare metal from the “can’t use” category to “can use” when it comes to the kind of massive deployments of VMs and containers which are a standard part of the cloud environment. This is an important point — very often, it is this change from “can’t use” to “can use” that sets off revolutions in the way that technology is applied (most of the history of personal computers, for example, could be described in terms of “can’t use”/”can use” shifts), and that change is generally one in perception and understanding as much as it is a change in technology. 
In the case of Docker and bare metal, the shift to “can use” allows system managers and architects to take a close look at the positive advantages of bare metal in comparison to the cloud. Hardware-based solutions, for example, are often the preferred option in situations where access to dedicated resources is important. If consistent speed and reliable performance are important, bare metal may be the best choice. And the biggest surprises may come when designers start asking themselves, “What can we do with Docker on bare metal that we couldn’t do with anything before?” So, does Docker make bare metal relevant? Yes it does, and more than that, it makes bare metal into a new game, with new and potentially very interesting rules.
About the Author
@mazorstorn Michael Churchman started as a scriptwriter, editor, and producer during the anything-goes early years of the game industry, working on the prototype for the ground-breaking laser-disc game Dragon’s Lair. He spent much of the 90s in the high-pressure bundled software industry, where near-continuous release cycles and automated deployment were already de facto standards; during that time he developed a semi-automated system for managing localization in over fifteen languages. For the past ten years, he has been involved in the analysis of software development processes and related engineering management issues.
Until recently, deploying containers on Windows (or on Microsoft’s Azure cloud) meant deploying them on a Linux/UNIX VM managed by a Windows-based hypervisor. Microsoft is a member of the Open Container Initiative, and has been generally quite supportive of such solutions, but they necessarily required the added VM layer of abstraction, rather than being native Windows containers. If you wanted containers, and if you wanted Docker, at some point, you needed Linux or UNIX. Microsoft’s much anticipated Technical Preview of the Docker Engine for Windows Server is out. Let’s look at some of the differences between Docker on Linux and Docker on Windows. First, though, what is the difference between a virtual machine and a container? If you’re reading this, you probably know the answer in detail, but we’ll do a quick run-through anyway.
Containerization From the Past
A virtual machine (or VM) is a complete computer hardware layer (CPU, RAM, I/O, etc.) abstracted to software, and running as if it were a self-contained, independent machine within the host operating system. A VM is managed by a hypervisor application (such as VirtualBox or VMware) running on the host system. The VM itself has an operating system that is completely independent of the host system, so a virtual machine with a Linux OS can run on Windows, or vice versa. While a container may look like a virtual machine, it is a very different sort of device when you look under the hood. Like a VM, a container provides a self-contained, isolated environment in which to run a program. A container, however, uses many of the resources of the host system. At the same time, the applications within the container can’t see or act outside of the bounds of the container. Since the container uses so many of the host system’s basic resources, the container’s OS is essentially the same as the host OS.
While VM hypervisors are available for most operating systems (allowing an individual instance of a VM to be essentially machine-independent), containers developed largely out of the Linux/UNIX world, and have been closely tied to Linux and UNIX systems. Docker has become the de facto standard for deploying and managing containers, and Docker itself is native to the Linux/UNIX world.
Windows Server 2016 and the New Hyper-V
Enter Windows Server Containers, and Windows Server 2016. Windows Server Containers are native Windows containers running on Windows Server 2016, and they come with a Windows implementation of Docker. Now, if you want containers and you want Docker, you can have them directly on Windows. But what does this mean in practice? First and foremost, a Windows Server Container is exactly what the name implies: a Windows container. Just as a Linux-based container makes direct use of the resources of the underlying operating system and is dependent on it, a Windows Server Container uses Windows resources, and is dependent on Windows. This means that you can’t deploy a Windows Server Container directly on a Linux/UNIX system, any more than you can deploy a Linux container directly on a Windows system. (Note: Initially, system resources used in a Windows Server Container instance must exactly match those used by the host system in terms of version number, build, and patch; since Windows Server Containers are still in the technical preview stage, however, this may change.) Microsoft has added a bonus: along with Windows Server Containers, which are standard containers at heart, it is also offering a kind of hybrid container, which it calls a Hyper-V container. A Hyper-V container is more like a standard virtual machine, in that it has its own operating system kernel and memory space, but in other ways, it is dependent on system resources in the manner of a typical container.
Microsoft says that the advantage of Hyper-V containers is that they have greater isolation from the host system, and thus more security, making them a better choice for situations where you do not have full control over what’s going to be going on within the containers. Hyper-V containers can be deployed and managed exactly like Windows Server Containers. How they Compare So, then, is Windows Docker really Docker? Yes, it is. Microsoft has taken considerable care to make sure that all of the Docker CLI commands are implemented in the Windows version; Windows Server Containers (and Hyper-V containers) can be managed either from the Docker command line or the PowerShell command line. Now for some of the “mostly, but not 100%” part: Windows Server Containers and Docker containers aren’t quite the same thing. You can use Docker to create a Windows Server Container from an existing Windows Server Container image, and you can manage the new container using Docker. You can also create and manage Windows Server Containers from PowerShell, but if you provision a container with PowerShell it cannot be managed directly with the Docker client/server and vice versa. You must stick with one provisioning method. (Microsoft, however, has indicated that this may change.) In many ways, these are the complications that you would expect when porting a conceptually rather complex and platform-dependent system from one OS to another. What’s more important is that you can now deploy containers using Docker directly on Windows. It may not be a threat to Linux, but it does keep Microsoft in the game at a time when that game has been shifting more and more towards open-source and generally Linux/UNIX-based DevOps tools. So to answer the question, “Are Windows Server Containers really Docker?” — they are as much Docker as you could reasonably expect, and then some. They are also definitely Windows containers, and Microsoft to the core. 
Docker Engine for Windows Server is not yet GA – If you’re currently running apps from Docker containers running on Linux, check out the Docker Log Analyzer from Sumo Logic.
Log analysis is a first-rate debugging tool for DevOps. But if all you’re using it for is finding and preventing trouble, you may be missing some of the major benefits of log analysis. What else can it offer you? Let’s talk about growth. First of all, not all trouble shows up in the form of bugs or error messages; an “error-free” system can still be operating far below optimal efficiency by a variety of important standards. What is the actual response time from the user’s point of view? Is the program eating up clock cycles with unnecessary operations? Log analysis can help you identify bottlenecks, even when they aren’t yet apparent in day-to-day operations. Use Cases for Log Analysis Consider, for example, something as basic as database access. As the number of records grows, access time can slow down, sometimes significantly; there’s nothing new about that. But if the complexity and the number of tables in the database are also increasing, those factors can also slow down retrieval. If the code that deals with the database is designed for maximum efficiency in all situations, it should handle the increased complexity with a minimum of trouble. The tricky part of that last sentence, however, is the phrase “in all situations”. In practice, most code is designed to be efficient under any conditions which seem reasonable at the time, rather than in perpetuity. A routine that performs an optional check on database records may not present any problem when the number of records is low, or when it only runs occasionally, but it may slow the system down if the number of affected records is too high, or if it is done too frequently. As conditions change, hidden inefficiencies in existing code are likely to make themselves known, particularly if the changes put greater demands on the system. As inefficiencies of this kind emerge (but before they present obvious problems in performance) they are likely to show up in the system’s logs. 
As an example, a gradual increase in the time required to open or close a group of records might appear, which gives you a chance to anticipate and prevent any slowdowns that it might cause. Log analysis can find other kinds of potential bottlenecks as well. For example, intermittent delays in response from a process or an external program can be hard to detect simply by watching overall performance, but they will probably show up in the log files. A single process with significant delays in response time can slow down the whole system. If two processes are dependent on each other, and they each have intermittent delays, they can reduce the system’s speed to a crawl or even bring it to a halt. Log analysis should allow you to recognize these delays, as well as the dependencies which can amplify them.
Log Data Analytics – Beyond Ops
Software operation isn’t the only thing that can be made more efficient by log analysis. Consider the amount of time that is spent in meetings simply trying to get everybody on the same page when it comes to discussing technical issues. It’s far too easy to have a prolonged discussion of performance problems and potential solutions without the participants having a clear idea of the current state of the system. One of the easiest ways to bring such a meeting into focus and shorten discussion time is to provide everybody involved with a digest of key items from the logs, showing the current state of the system and highlighting problem areas. Log analysis can also be a major aid to overall planning by providing a detailed picture of how the system actually performs. It can help you map out the parts of the system that are the most sensitive to changes in performance in other areas, allowing you to avoid making alterations which are likely to degrade performance. It can also reveal unanticipated dependencies, as well as suggesting potential shortcuts in the flow of data.
Understanding Scalability via Log Analysis One of the most important things that log analysis can do in terms of growth is to help you understand how the system is likely to perform as it scales up. When you know the time required to perform a particular operation on 100,000 records, you can roughly calculate the time required to do the same operation with 10,000,000 records. This in turn allows you to consider whether the code that performs the operation will be adequate at a larger scale, or whether you will need to look at a new strategy for producing the same results. Observability and Baseline Metrics A log analysis system that lets you establish a baseline and observe changes to metrics in relation to that baseline is of course extremely valuable for troubleshooting, but it can also be a major aid to growth. Rapid notification of changes in metrics gives you a real-time window into the way that the system responds to new conditions, and it allows you to detect potential sensitivities which might otherwise go unnoticed. In a similar vein, a system with superior anomaly detection features will make it much easier to pinpoint potential bottlenecks and delayed-response cascades by alerting you to the kinds of unusual events which are often signatures of such problems. All of these things — detecting bottlenecks and intermittent delays, as well as other anomalies which may signal future trouble, anticipating changes in performance as a result of changes in scale, recognizing inefficiencies — will help you turn your software (and your organization) into the kind of lean, clean system which is so often necessary for growth. And all of these things can, surprisingly enough, come from something as simple as good, intelligent, thoughtful log analysis.
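The rough scaling estimate mentioned in the scalability discussion above (a timing taken from the logs at 100,000 records, projected to 10,000,000 records) can be sketched in Python as follows. The function name, the timing figure, and the three growth models are assumptions chosen for illustration; which model applies depends on the actual code being measured, and the logs themselves are the best check on the estimate.

```python
import math


def estimate_time(measured_seconds, measured_n, target_n, growth="linear"):
    """Project a measured operation time to a larger record count.

    growth is a hypothetical label for how the operation is assumed to
    scale: "linear" (n), "nlogn" (n log n), or "quadratic" (n squared).
    """
    ratio = target_n / measured_n
    if growth == "linear":
        factor = ratio
    elif growth == "nlogn":
        factor = ratio * math.log(target_n) / math.log(measured_n)
    elif growth == "quadratic":
        factor = ratio ** 2
    else:
        raise ValueError(f"unknown growth model: {growth}")
    return measured_seconds * factor


if __name__ == "__main__":
    # Assumed measurement: 2 seconds for 100,000 records.
    for model in ("linear", "nlogn", "quadratic"):
        projected = estimate_time(2.0, 100_000, 10_000_000, growth=model)
        print(f"{model}: about {projected:.0f} seconds")
```

The gap between the linear and quadratic projections is the practical point: the same logged timing can imply a tolerable wait or an unusable one at scale, which is why it matters to know how the operation actually grows.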
Once again, DevOps is moving the needle. This time, it’s in the open source world, and both open source and commercial software may never be the same again.
As more and more open source projects have become not only successful, but vital to the individuals and organizations that use them, the open source world has begun to receive some (occasionally grudging) respect from commercial software developers. And as commercial developers have become increasingly dependent on open source tools, they have begun to take the open source process itself more seriously.
Today some large corporations have begun to actively participate in the open source world, not merely as users, but as developers of open source projects. SAP, for example, has a variety of projects on GitHub, and Capital One has just launched its open source Hygieia project, also on GitHub. Why would large, corporate, and commercial software developers place their code in open source repositories? Needless to say, they’re making only a limited number of projects available as open source, but for companies used to treating their proprietary source code as a valuable asset that needs to be closely guarded, even limited open source exposure is a remarkable concession. It’s reasonable to assume that they see significant value in the move. What kind of payoff are they looking for? Hiring. The open source community is one of the largest and most accessible pools of programming talent in software history, and a high percentage of the most capable participants are underemployed by traditional standards. Posting an attractive-looking set of open source projects is an easy way to lure new recruits. It allows potential employees to get their feet wet without making any commitments (on either end), and it says, “Hey, we’re a casual, relaxed open source company; working for us will be just like what you’re doing now, only you’ll be making more money!” Recognition.
It’s an easy way to recognize employees who have made a contribution — post a (non-essential) project that they’ve worked on, giving them credit for their work. It’s cheaper than a bonus (or even a trophy), and the recognition that employees receive is considerably more public and lasting than a corporate award ceremony. Development of open source as a resource. Large corporations are already major users and sponsors of open-source software, often with direct involvement at the foundation level. By entering the open source world as active contributors, they are positioning themselves to exert even greater influence on its course of development by engaging the open source community on the ground. Behind the direct move into open source is also the recognition that the basic model of the software industry has largely shifted from selling products, which by their nature are at least somewhat proprietary, to selling services, where the unique value of what is being sold depends much more on the specific combination of services being provided, along with the interactions between the vendor and the customer. The backend code at a website can all be generic; the “brand” — the combination of look-and-feel, services provided, name recognition, and trademarks — is the only thing that really needs to be proprietary. And even when other providers manage to successfully clone a brand, they may come up short, as Facebook’s would-be competitors have discovered. Facebook is an instructive example, because (even though its backend code is unique and largely proprietary) the unique service which it provides, and which gives it its value, is the community of users — something that by its nature isn’t proprietary. In the service model, the uniqueness of tools becomes less and less important. 
In a world where all services used the same basic set of tools, individual service providers could and would still distinguish themselves based on the combinations of services that they offered and the intangibles associated with those services.
This doesn’t mean that the source code for SAP’s core applications is about to become worthless, of course. Its value is intimately tied to SAP’s brand, its reputation, its services, and perhaps more than anything, to the accumulated expertise, knowledge, and experience which SAP possesses at the organizational level. As with Facebook, it would be much easier to clone any of SAP’s applications than it would be to clone these intangibles.
But the shift to services does mean that for large corporate developers like SAP, placing the code for new projects (particularly for auxiliary software not closely tied to their core applications) in an open source repository may be a reasonable option.
The boundary between proprietary and open source software is no longer the boundary between worlds, or between commercial developers and open source foundations. It is now more of a thin line drawn between individual proprietary and open source applications (or components, or even code snippets), which may very possibly operate within the same environment.
For current and future software developers, this does present a challenge, but one which is manageable: to recast themselves not as creators of unique source code, or even as developers of unique applications, but rather as providers of unique packages of applications and services. These packages may include both proprietary and open source elements, but their value will lie in what they offer the user as a package much more than in the intellectual property status of the components.
This kind of packaging has always been smart business, and the most successful software vendors have always made good use of it.
We are rapidly entering a time when it may be the only way to do business.