Michael Churchman

Michael Churchman started as a scriptwriter, editor, and producer during the anything-goes early years of the game industry. He spent much of the ‘90s in the high-pressure bundled software industry, where the move from waterfall to faster release was well under way, and near-continuous release cycles and automated deployment were already de facto standards. During that time he developed a semi-automated system for managing localization in over fifteen languages. For the past ten years, he has been involved in the analysis of software development processes and related engineering management issues. He is a regular Fixate.io contributor.

Posts by Michael Churchman

Blog

The History of Monitoring Tools

DevSecOps 2.0

Log Aggregation vs. APM: No, They’re Not the Same Thing

Are you a bit unsure about the difference between log aggregation and Application Performance Monitoring (APM)? If so, you’re hardly alone. These are closely related types of operations, and it can be easy to conflate them, or to assume that if you are doing one of them, there’s no reason to do the other. In this post, we’ll look at log aggregation vs. APM: the relationship between these two data accumulation and analysis domains, and why it is important to address both of them with a suite of domain-appropriate tools rather than a single tool.

Defining APM

First, let’s look at Application Performance Monitoring, or APM. Note that APM can stand for both Application Performance Monitoring and Application Performance Management, and in most of the important ways these terms refer to the same thing: monitoring and managing the performance of software under real-world conditions, with emphasis on the user experience and the functional purpose of the software. Since we’ll be talking mostly about the monitoring side of APM, we’ll treat the acronym as interchangeable with Application Performance Monitoring, with the implicit understanding that it includes the performance management functions associated with APM.

What does APM monitor, and what does it manage? Most of the elements of APM fall into two key areas: user experience and resource-related performance. While these two areas interact (resource use, for example, can have a strong effect on user experience), there are significant differences in the ways in which they are monitored (and, to a lesser degree, managed).

APM: User Experience

The most basic way to monitor application performance in terms of user experience is to monitor response time. How long does it take after a user clicks on an application input element for the program to display a response?
And more to the point, how long does it take before the program produces a complete response (i.e., a full database record displayed in the correct format, rather than a partial record or a spinning cursor)?

Load is Important

Response time, however, is highly dependent on load: the conditions under which the application operates, and in particular the volume of user requests and other transactions, as well as the demand placed on resources used by the application. To be accurate and complete, user experience APM should include in-depth monitoring and reporting of response time and related metrics under expected load, under peak load (including unreasonably high peaks, since unreasonable conditions and events are rather alarmingly common on the Internet), and under continuous high load (an important but all too often neglected element of performance monitoring and stress testing). Much of the peak-level and continuous high-load monitoring will, of course, need to be done under test conditions, since it requires applying the appropriate load, but it can also be incorporated into real-time monitoring by means of reasonably sophisticated analytics: report performance (and load) when load peaks above a specified level, or when it remains above a specified level for a given minimum period of time.

APM: Resource Use

Resource-based performance monitoring is the other key element of APM. How is the application using resources such as CPU, memory, storage, and I/O? When analyzing these metrics, the important numbers to look at are generally the percentage of the resource used and the percentage still available. This actually falls within the realm of metrics monitoring more than APM, and requires tools dedicated to metrics monitoring. If the percentage used for any resource (such as compute, storage, or memory) approaches the total available, that can (and generally should) be taken as an indication of a potential performance bottleneck.
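As a minimal illustration of the percent-used rule of thumb, a short script can flag any resource approaching its limit. This is a sketch, not a real metrics agent; the resource names, sample values, and the 85% threshold are arbitrary examples:

```python
def utilization_alerts(metrics, threshold=0.85):
    """Flag any resource whose fraction used meets or exceeds the threshold.

    `metrics` maps a resource name to a (used, total) pair in any
    consistent unit (cores, GiB, IOPS, and so on).
    """
    alerts = []
    for resource, (used, total) in metrics.items():
        fraction = used / total
        if fraction >= threshold:
            alerts.append((resource, round(fraction * 100, 1)))
    return alerts

# Example: memory and storage are nearing their totals; CPU is not.
sample = {
    "cpu":     (62, 100),    # percent of compute in use
    "memory":  (15.2, 16),   # GiB used vs. GiB available
    "storage": (440, 500),   # GiB used vs. GiB provisioned
}
print(utilization_alerts(sample))  # [('memory', 95.0), ('storage', 88.0)]
```

In a real deployment, the same comparison would run continuously against live metrics and feed an alerting system rather than a `print` call.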
It may then become necessary to allocate a greater share of the resource in question (either on an ongoing basis, or under specified conditions) in order to avoid such bottlenecks. Remember: bottlenecks don’t just slow down the affected processes. They may also bring all actions dependent on those processes to a halt.

Once Again, Load

Resource use, like response time, should be monitored and analyzed not only under normal expected load, but also under peak and continuous high loads. Continuous high loads in particular are useful for identifying potential bottlenecks which might not otherwise be detected.

Log Aggregation

It should be obvious from the description of APM that it can make good use of logs, since the various logs associated with the deployment of a typical Internet-based application provide a considerable amount of performance-related data. Much of the monitoring that goes into APM, however, is not necessarily log-based, and many of the key functions which logs perform are distinct from those required by APM.

Logs as Historical Records

Logs form an ongoing record of the actions and state of the application, its components, and its environment; in many ways, they serve as a historical record for an application. As we indicated, much of this data is at least somewhat relevant to performance (load-level records, for example), but much of it is focused on areas not closely connected with performance. Logs, for example, are indispensable when it comes to analyzing and tracing many security problems, including attempted break-ins. Log analysis can detect suspicious patterns of user activity, as well as unusual actions on the part of system or application resources. Logs are a key element in maintaining compliance records for applications operating in a regulated environment. They can also be important in identifying details of specific transactions and other events when those events require verification, or are in dispute.
Logs can be very important in tracing the history and development of functional problems, at both the application and infrastructure level, as well as in analyzing changes in the volume or nature of user activity over time. APM tools can also provide historical visibility into your environment, but they do it in a different way and at a different level: they trace performance issues to specific lines of code. This is a different kind of visibility, and it is not a substitute for the insight you gain from using log aggregation with historical data to research or analyze issues after they have occurred.

The Need for Log Aggregation

The two greatest problems associated with logs are the volume of data generated by logging, and the often very large number of different logs generated by the application and its associated resources and infrastructure components. Log aggregation is the process of automatically gathering logs from disparate sources and storing them in a central location. It is generally used in combination with other log management tools, as well as log-based analytics.

It should be clear at this point not only that APM and log aggregation are different, but also that it does not make sense for a single tool to handle both tasks. It is, in fact, asking far too much of any one tool to take care of all of the key tasks required by either domain. Each of them requires a full suite of tools, including monitoring, analytics, a flexible dashboard system, and a full-featured API. A suite of tools that can fully serve both domains, such as that offered by Sumo Logic, can provide you with full-stack visibility and search capability into your network, infrastructure, and application logs.
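In miniature, the gathering half of log aggregation looks something like this. It is a sketch with made-up source names and a simplified "timestamp message" line format; real aggregators also handle shipping, parsing, rotation, and retention:

```python
def aggregate(sources):
    """Merge log lines from disparate sources into one time-ordered
    stream, tagging each entry with where it came from."""
    entries = []
    for name, lines in sources.items():
        for line in lines:
            timestamp, _, message = line.partition(" ")
            entries.append({"source": name, "time": timestamp, "message": message})
    # ISO-8601 timestamps sort correctly as plain strings.
    return sorted(entries, key=lambda e: e["time"])

merged = aggregate({
    "web":  ["2017-07-27T10:00:05Z GET /checkout 200"],
    "auth": ["2017-07-27T10:00:01Z login succeeded for user 42"],
})
print([e["source"] for e in merged])  # ['auth', 'web']
```

The value of centralizing logs this way is exactly what the post describes: one searchable stream instead of many scattered files, ready for analytics downstream.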

July 27, 2017

AWS Config: Monitoring Resource Configurations for Compliance

AWS Config is an indispensable service with a bit of an identity problem: It really should be called something like “AWS Monitors Everything And Keeps Your Apps In Compliance,” because it is that important. But since there’s no way to put everything it does into a short, snappy name, “AWS Config” will do. What does AWS Config do? Basically, it monitors the current and past configurations of your AWS resources, compares those configurations to your target configurations, and reports current configurations, changes to configurations, and the ways in which your resources interact (with reference to configuration). Let’s take a closer look at what that means and how it works, starting with the “how it works” part.

How AWS Config Works

AWS Config continually monitors the configuration of your AWS resources. It records configuration changes in a normalized format and makes that information available through its API. It also compares current configurations with the configuration standards you have established, and makes that information available in dashboard format via its API. AWS Config can also optionally be set to send text alerts regarding both configuration changes and its evaluation of existing configurations against your configuration standards.

By default, AWS Config tracks the configuration of all of your resources, recording configurations, metadata, attributes, and associated relationships and events. You can, however, tell it to track only specific types of resources. It takes snapshots of resource configurations, and it records an ongoing stream of resource configuration changes, storing this data in configuration histories. These histories can include software (down to the application level), providing you with a comprehensive record of your AWS operation’s configuration. Configuration standards are contained in rules.
You can use Amazon’s preconfigured set of rules (which may be fully adequate for many operations), customize those rules, or define your own set of rules. In all cases, AWS Config checks configurations against these rules, and reports the current state of compliance with them.

What AWS Config Means to You

What does this mean for your organization’s AWS operations? Monitoring is vital to any Internet or network-based application or service, of course. Without it, you cannot guarantee the functionality or security of your software. Configuration monitoring has a special role, since it provides direct insight into an application’s state, its relationship with its environment, and the rules and conditions under which it is currently operating. Most kinds of software monitoring are symptomatic, recording behavior in one form or another, whether it is I/O, CPU or memory use, calls to other modules or system resources, or error messages. This makes it possible to detect many types of trouble and track performance, but it generally does not directly indicate the cause of most functional or performance problems. Configuration monitoring, on the other hand, can give you a direct view into the possible causes of such problems. How does this work? Since AWS Config allows you to codify configuration rules, let’s start with compliance.

Regulatory Compliance

Many of the online services available today are in regulated industries. This is true of banking and other financial services, of course, but it also applies to such things as health services, insurance, and public utilities. In many cases, failure to comply with regulatory standards for online services can result in significant financial or even legal penalties. These standards (particularly those affecting confidentiality and data security) can be, and often are, reflected in configuration settings.
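At its core, a rule evaluation is a comparison of recorded settings against required ones. Here is a toy sketch of that idea; the setting names are hypothetical, and real AWS Config rules run as managed rules or Lambda functions inside AWS, not as local code:

```python
def evaluate_rule(configuration, required):
    """Return an AWS Config-style verdict for one resource:
    COMPLIANT if every required setting matches, otherwise
    NON_COMPLIANT plus the settings that drifted."""
    violations = {key: configuration.get(key)
                  for key, expected in required.items()
                  if configuration.get(key) != expected}
    return ("COMPLIANT", {}) if not violations else ("NON_COMPLIANT", violations)

required = {"encryption_at_rest": True, "public_access": False}
drifted  = {"encryption_at_rest": True, "public_access": True}
print(evaluate_rule(drifted, required))  # ('NON_COMPLIANT', {'public_access': True})
```

AWS Config adds what this sketch lacks: continuous recording of actual configurations, a history of changes, and alerting when a resource’s verdict flips to NON_COMPLIANT.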
If, for example, you provide online financial services, you may be required to provide a high level of security for both customer and transaction records, to maintain secure records of all activity, and to detect and record anomalous actions. At least some of these requirements may in turn require you to maintain specific configuration settings. If you include the required settings in your customized AWS Config rules, you will have a way to automatically determine whether your site’s configuration has gone out of compliance. You can set AWS Config to automatically send a text alert to the engineers and managers responsible for compliance, so that they can quickly investigate the problem and adjust the configuration to bring your services back into compliance.

In-House Standards

Even if you do not operate in a regulated industry, you may need to comply with in-house standards within your company, particularly when it comes to things such as security and performance, both of which can require you to maintain specific configuration settings. AWS Config can automatically notify you of any configuration changes which may have an effect on security or performance, so that you remain fully compliant with your company’s standards.

Error and Performance Troubleshooting

The configuration histories that AWS Config records can also be very valuable in tracing both errors and performance problems. You can look back through the historical record to find out when specific configuration changes took place, and try to correlate them with software failures or performance degradation.

AWS Config and Sumo

As is often the case with monitoring data, the output from AWS Config becomes considerably more valuable when it is integrated into a comprehensive, analytics-based dashboard system. The Sumo Logic App for AWS Config provides easy integration of AWS Config data into Sumo’s extensive analytics and dashboard system.
It gives you not only a powerful overview, but also a detailed look at resource modifications, as well as drill-down insight into resource details. Analytics-based features such as these, which turn AWS Config’s raw data into genuine, multidimensional insights, make it possible to use such data for real-time configuration and performance management, security monitoring, and application optimization. Monitoring configuration data gives you greater hands-on control over security, performance, and functionality, and it provides you with insights which are simply not available with conventional, behavior-based application monitoring by itself. By combining AWS Config and the power of Sumo Logic’s analytics, you can turn your team into genuine software-management superheroes.

About the Author

Michael Churchman is involved in the analysis of software development processes and related engineering management issues.

AWS Config: Monitoring Resource Configurations for Compliance is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

Top Patterns for Building a Successful Microservices Architecture

Why do you need patterns for building a successful microservices architecture? Shouldn’t the same basic principles apply whether you’re designing software for a monolithic or a microservices architecture?

Those principles do largely hold true at the highest and most abstract levels of design (i.e., the systems level), and at the lowest and most concrete levels (such as classes and functions). But most code design is really concerned with the broad range between those two extremes, and it is there that the very nature of microservices architecture requires not only new patterns for design, but also new patterns for reimagining existing monolithic applications.

The truth is that there is nothing in monolithic architecture that inherently imposes either structure or discipline in design. Almost all programming languages currently in use are designed to enforce structure and discipline at the level of coding, of course, but at higher levels, good design still requires conscious adherence to methodologies that enforce a set of architectural best practices. Microservices architecture, on the other hand, does impose by its very nature a very definite kind of structural discipline at the level of individual resources. Just as it makes no sense to cut a basic microservice into arbitrary chunks and separate them, it makes equally little sense to bundle an individual service with another related or unrelated service in an arbitrary package, when the level of packaging that you’re working with is typically one package per container.

Microservices Architecture Requires New Patterns

In other words, you really do need new patterns in order to successfully design a microservices architecture. The need for patterns starts at the top. If you are refactoring a monolithic program into a microservices-based application, the first pattern that you need to consider is the one that you will use for decomposition.
What pattern will you use as a guide in breaking the program down into microservices? What are the basic decomposition patterns? At the higher levels of decomposition, it makes sense to consider such functional criteria as broad areas of task-based responsibility (subdomains), or large-scale business/revenue-generating responsibilities (business capabilities). In practice, there is considerable overlap between these two general functional patterns, since a business’ internal large-scale organization of tasks is likely to closely match the organization of business responsibilities. In either case, decomposition at this level should follow the actual corporate-level breakdown of basic business activities, such as inventory, delivery, sales, and order processing.

In the subsequent stages of decomposition, you can define groups of microservices, and ultimately individual microservices. This calls for a different and much more fine-grained pattern of decomposition, one which is based largely on interactions within the application, with individual users, or both.

Decomposition Patterns for Microservices Architecture

There are several ways to decompose applications at this level, depending in part on the nature of the application, as well as the pattern for deployment. You can combine decomposition patterns, and in many if not most cases, this will be the most practical and natural approach. Among the key microservice-level decomposition patterns are:

Decomposition by Use Case

In many respects, this pattern is the logical continuation of a large-scale decomposition pattern, since business capabilities and subdomains are both fundamentally use case-based. In this pattern, you first identify use cases: sequences of actions which a user would typically follow in order to perform a task. Note that a user (or actor) does not need to be a person; it can, in fact, be another part of the same application.
A use case could be something as obvious and common as filling out an online form or retrieving and displaying a database record. It could also include tasks such as processing and saving streaming data from a real-time input device, or polling multiple devices to synchronize data. If it seems fairly natural to model a process as a unified set of interactions between actors with an identifiable purpose, it is probably a good candidate for the use case decomposition pattern.

Decomposition by Resources

In this pattern, you define microservices based on the resources (storage, peripherals, databases, etc.) that they access or control. This allows you to create a set of microservices which function as channels for access to individual resources (following the basic pattern of OS-based peripheral/resource drivers), so that resource-access code does not need to be duplicated in other parts of the application. Isolating resource interfaces in specific microservices has the added advantage of allowing you to accommodate changes to a resource by updating only the microservice that accesses it directly.

Decomposition by Responsibilities/Functions

This pattern is likely to be most useful in the case of internal operations which perform a clearly defined set of functions that are likely to be shared by more than one part of the application. Such responsibility domains might include shopping cart checkout, inventory access, or credit authorization. Other microservices could be defined in terms of relatively simple functions (as is the case with many built-in OS-based microservices) rather than more complex domains.

Microservices Architecture Deployment Patterns

Beyond decomposition, there are other patterns of considerable importance in building a microservices-based architecture. Among the key patterns are those for deployment.
There are three underlying patterns for microservices deployment, along with a few variations:

Single Host/Multiple Services

In this pattern, you deploy multiple instances of a service on a single host. This reduces deployment overhead, and allows greater efficiency through the use of shared resources. It has, however, greater potential for conflict and security problems, since services interacting with different clients may be insufficiently isolated from each other.

Single Service per Host, Virtual Machine, or Container

This pattern deploys each service in its own environment. Typically, this environment will be a virtual machine (VM) or container, although there are times when the host may be defined at a less abstract level. This kind of deployment provides a high degree of flexibility, with little potential for conflict over system resources. Services are either entirely isolated from those used by other clients (as is the case with single-service-per-VM deployment), or can be effectively isolated while sharing some lower-level system resources (i.e., containers with appropriate security features). Deployment overhead may be greater than in the single host/multiple services model, but in practice, this may not represent a significant cost in time or resources.

Serverless/Abstracted Platform

In this pattern, the service runs directly on pre-configured infrastructure made available as a service (which may be priced on a per-request basis); deployment may consist of little more than uploading the code, with a small number of configuration settings on your part. The deployment system places the code in a container or VM, which it manages. All you need to make use of the microservice is its address. Among the most common serverless environments are AWS Lambda, Azure Functions, and Google Cloud Functions. Serverless deployment requires very little overhead.
It does, however, impose significant limitations, since the uploaded code must be able to meet the (often strict) requirements of the underlying infrastructure. This means that you may have a limited selection of programming languages and interfaces to outside resources. Serverless deployment also typically rules out stateful services.

Applying Other Patterns to Microservices Architecture

There are a variety of other patterns which apply to one degree or another to microservices deployment. These include patterns for communicating with external applications and services, for managing data, for logging, for testing, and for security. In many cases, these patterns are similar for both monolithic and microservices architecture, although some patterns are more likely to be applicable to microservices than others. Fully automated parallel testing in a virtualized environment, for example, is typically the most appropriate pattern for testing VM/container-based microservices.

As is so often the case in software development (as well as more traditional forms of engineering), the key to building a successful microservices architecture lies in finding the patterns that are most suitable to your application, understanding how they work, and adapting them to the particular circumstances of your deployment. Use of the appropriate patterns can provide you with a clear and accurate roadmap to successful microservices architecture refactoring and deployment.

About the Author

Michael Churchman is involved in the analysis of software development processes and related engineering management issues.

Top Patterns for Building a Successful Microservices Architecture is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

Making the Most of AWS Lambda Logs

How can you get the most out of monitoring your AWS Lambda functions? In this post, we’ll take a look at the monitoring and logging data that Lambda makes available, and the value that it can bring to your AWS operations. You may be thinking, “Why should I even monitor AWS Lambda? Doesn’t AWS take care of all of the system and housekeeping stuff with Lambda? I thought that all the user had to do was write some code and run it!”

A Look at AWS Lambda

If that is what you’re thinking, then for the most part, you’re right. AWS Lambda is designed to be a simple plug-and-play experience from the user’s point of view. Its function is simply to run user-supplied code on request in a standardized environment. You write the code, specifying some basic configuration parameters, and upload the code, the configuration information, and any necessary dependencies to AWS Lambda. This uploaded package is called a Lambda function.

To run the function, you invoke it from an application running somewhere in the AWS ecosystem (EC2, S3, or most other AWS services). When Lambda receives the invoke request, it runs your function in a container; the container pops into existence, does its job, and pops back out of existence. Lambda manages the containers; you don’t need to (and can’t) do anything with them. So there it is: Lambda. It’s simple, it’s neat, it’s clean, and it does have some metrics which can be monitored, and which are worth monitoring.

Which Lambda Metrics to Monitor?

So, which Lambda metrics are important, and why would you monitor them? There are two kinds of monitoring information which AWS Lambda provides: metrics displayed in the AWS CloudWatch console, and logging data, which is handled by both CloudWatch and the CloudTrail monitoring service. Both types of data are valuable to the user; the nature of that value and the best way to make use of it depend largely on the type of data.
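For reference, the user-supplied side of a Python Lambda function is just a handler. The sketch below is illustrative only; the event shape (here, a `name` field) depends entirely on whatever service invokes the function:

```python
def handler(event, context):
    """Minimal AWS Lambda handler: Lambda calls this once per invocation,
    passing the invoking service's payload as `event` and runtime
    metadata (request ID, remaining time, etc.) as `context`."""
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"Hello, {name}"}

# Called locally for illustration; in production, Lambda supplies both arguments.
print(handler({"name": "Sumo"}, None))  # {'statusCode': 200, 'body': 'Hello, Sumo'}
```

Everything else, including the container lifecycle described above, is Lambda’s concern, which is exactly why the monitoring data it emits is your main window into what the function is doing.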
Monitoring Lambda CloudWatch Console Metrics

Because AWS Lambda is strictly a standardized platform for running user-created code, the metrics that it displays in the CloudWatch console are largely concerned with the state of that code. These metrics include the number of invocation requests that a function receives, the number of failures resulting from errors in the function, the number of failures in user-configured error handling, the function’s duration (or running time), and the number of invocations that were throttled as a result of the user’s concurrency limits. These are useful metrics, and they can tell you a considerable amount about how well the code is working, how well the invocations work, and how the code operates within its environment. They are, however, largely useful in terms of functionality, debugging, and day-to-day (or millisecond-to-millisecond) operations.

Monitoring and Analyzing AWS Lambda Logs

With AWS Lambda, logging data is actually a much richer source of information in many ways. This is because logging provides a cumulative record of actions over time, including all API calls made in connection with AWS Lambda. Since Lambda functions exist for the most part to provide support for applications and websites running on other AWS services, Lambda log data is the main source of data about how a function is doing its job.

“Logs,” you say, like Indiana Jones surrounded by hissing cobras. “Why does it always have to be logs? Digging through logs isn’t just un-fun, boring, and time-consuming. More often than not, it’s counterproductive, or just plain impractical!” And once again, you’re right. There isn’t much point in attempting to manually analyze AWS Lambda logs. In fact, you have three basic choices: ignore the logs, write your own script for extracting and analyzing log data, or let a monitoring and analytics service do the work for you.
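To give a feel for the second option, here is a sketch that pulls per-invocation figures out of the REPORT line Lambda writes to CloudWatch Logs after each invocation. The field layout shown matches the commonly emitted format, but treat the exact layout as an assumption that can change:

```python
import re

REPORT = re.compile(
    r"REPORT RequestId: (?P<request_id>\S+)\s+"
    r"Duration: (?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed_ms>\d+) ms\s+"
    r"Memory Size: (?P<memory_mb>\d+) MB\s+"
    r"Max Memory Used: (?P<used_mb>\d+) MB")

def parse_report(line):
    """Extract per-invocation metrics from a Lambda REPORT log line,
    or return None if the line is not a REPORT line."""
    match = REPORT.search(line)
    return match.groupdict() if match else None

line = ("REPORT RequestId: 3f8a-example\tDuration: 102.57 ms\t"
        "Billed Duration: 200 ms\tMemory Size: 128 MB\tMax Memory Used: 34 MB")
print(parse_report(line)["billed_ms"])  # 200
```

Multiply this by thousands of invocations and dozens of functions, and the appeal of the third option becomes obvious.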
For the majority of AWS Lambda users, the third option is by far the most practical and the most useful.

Sumo Logic’s Log Analytics Dashboards for Lambda

To get a clearer picture of what can be done with AWS Lambda metrics and logging data, let’s take a look at how the Sumo Logic App for AWS Lambda extracts useful information from the raw data, and how it organizes that data and presents it to the user. On the AWS side, you can use a Lambda function to collect CloudWatch logs and route them to Sumo Logic. Sumo integrates accumulated log and metric information to present a comprehensive picture of your AWS Lambda function’s behavior, condition, and use over time, using three standard dashboards:

The Lambda Overview Dashboard

The Overview dashboard provides a graphic representation of each function’s duration, maximum memory usage, compute usage, and errors. This allows you to quickly see how individual functions perform in comparison with each other. The Overview dashboard also breaks duration, memory, and compute usage down over time, making it possible to correlate Lambda function activity with other AWS-based operations, and it compares the actual values for all three metrics with their predicted values over time. This last set of values (actual vs. predicted) can help you pinpoint performance bottlenecks and allocate system resources more efficiently.

The Lambda Duration and Memory Dashboard

Sumo Logic’s AWS Lambda Duration and Memory dashboard displays duration and maximum memory use for all functions over a 24-hour period, in the form of both outlier and trend charts. The Billed Duration by Hour trend chart compares actual billed duration with predicted duration on an hourly basis. In a similar manner, the Unused Memory trend chart shows used, unused, and predicted unused memory size, along with available memory.
These charts, along with the Max Memory Used box plot chart, can be very useful in determining when and how to balance function invocations and avoid excessive memory over- or underuse.

The Lambda Usage Dashboard

The Usage dashboard breaks down requests, duration, and memory usage by function, along with requests by version alias. It includes actual request counts broken down by function and version alias. The Usage dashboard also includes detailed information on each function, including individual request ID, duration, billing, memory, and time information for each request. The breakdown into individual requests makes it easy to identify and examine specific instances of a function’s invocation, in order to analyze what is happening with that function on a case-by-case level.

It is integrated, dashboard-based analytics such as those presented by the Sumo Logic App for AWS Lambda that make it not only possible but easy to extract useful data from Lambda, and truly make the most of AWS Lambda monitoring.

About the Author

Michael Churchman started as a scriptwriter, editor, and producer during the anything-goes early years of the game industry. He spent much of the ‘90s in the high-pressure bundled software industry, where the move from waterfall to faster release was well under way, and near-continuous release cycles and automated deployment were already de facto standards. During that time he developed a semi-automated system for managing localization in over fifteen languages. For the past ten years, he has been involved in the analysis of software development processes and related engineering management issues.

Blog

5 Log Monitoring Moves to Wow Your Business Partner

Looking for some logging moves that will impress your business partner? In this post, we'll show you a few. But first, a note of caution: if you want to wow your business partner, make a visiting venture capitalist's jaw drop, or knock the socks off of a few stockholders, you can do it with something that has a lot of flash and not much more, or you can show them something with real and lasting substance that will make a difference in your company's bottom line. We've all seen business presentations filled with flashy fireworks, and we've all seen how quickly those fireworks fade away. Around here, though, we believe in delivering value—the kind that stays with your organization, and gives it a solid foundation for growth. So, while the logging moves that we're going to show you do look good, the important thing to keep in mind is that they provide genuine, substantial value—and discerning business partners and investors (the kind that you want to have in your corner) will recognize this value quickly.

Why Is Log Monitoring Useful?

What value should logs provide? Is it enough just to accumulate information so that IT staff can pick through it as required? That's what most logs do, varying mostly in the amount of information and the level of detail. And most logs, taken as raw data, are very difficult to read and interpret; the most noticeable result of working with raw log data, in fact, is the demand that it puts on IT staff time.

5 Log Monitoring Steps to Success

Most of the value in logs is delivered by means of systems for organizing, managing, filtering, analyzing, and presenting log data. And needless to say, the best, most impressive, most valuable logging moves are those which are made possible by first-rate log management. They include:

Quick, on-the-spot, easy-to-understand analytics.
Pulling up instant, high-quality analytics may be the most impressive move that you can make when it comes to logging, and it is definitely one of the most valuable features that you should look for in any log management system. Raw log data is a gold mine, but you need to know how to extract and refine the gold. A high-quality analytics system will extract the data that's valuable to you, based on your needs and interests, and present it in ways that make sense. It will also allow you to quickly recognize and understand the information that you're looking for.

Monitoring real-time data.

While analysis of cumulative log data is extremely useful, there are also plenty of situations where you need to see what is going on right at the moment. Many of the processes that you most need to monitor (including customer interaction, system load, resource use, and hostile intrusion/attack) are rapid and transient, and there is no substitute for a real-time view into such events. Real-time monitoring should be accompanied by the capacity for real-time analytics. You need to be able to both see and understand events as they happen.

Fully integrated logging and analytics.

There may be processes in software development and operations which have a natural tendency to produce integrated output, but logging isn't one of them. Each service or application can produce its own log, in its own format, based on its own standards, without reference to the content or format of the logs created by any other process. One of the most important and basic functions that any log management system can perform is log integration, bringing together not just standard log files, but also event-driven and real-time data. Want to really impress partners and investors? Bring up log data that comes from every part of your operation, and that is fully integrated into useful, easily understood output.

Drill-down to key data.
Statistics and aggregate data are important; they give you an overall picture of how the system is operating, along with general, system-level warnings of potential trouble. But the ability to drill down to more specific levels of data—geographic regions, servers, individual accounts, specific services and processes—is what allows you to make use of much of that system-wide data. It's one thing to see that your servers are experiencing an unusually high level of activity, and quite another to drill down and see an unusual spike in transactions centered around a group of servers in a region known for high levels of online credit card fraud. Needless to say, integrated logging and scalability are essential when it comes to drill-down capability.

Logging throughout the application lifecycle.

Logging integration includes integration across time, as well as across platforms. This means combining development, testing, and deployment logs with metrics and other performance-related data to provide a clear, unified, in-depth picture of the application's entire lifecycle. This in turn makes it possible to look at development, operational, and performance-related issues in context, and to see relationships which might not be visible without such cross-system, full-lifecycle integration.

Use Log Monitoring to Go for the Gold

So there you have it—five genuine, knock-'em-dead logging moves. They'll look very impressive in a business presentation, and they'll tell serious, knowledgeable investors that you understand and care about substance, and not just flash. More to the point, these are logging capabilities and strategies which will provide you with valuable (and often crucial) information about the development, deployment, and ongoing operation of your software. Logs do not need to be junk piles of unsorted, raw data. Bring first-rate management and analytics to your logs now, and turn those junk piles into gold.
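The "fully integrated logging" move above amounts to normalizing heterogeneous log formats into one common schema before analysis. A minimal sketch of that idea, assuming just two input formats (a JSON application log and an Apache-style access log; the common schema's field names are our own invention, not any product's):

```python
import json
import re
from datetime import datetime

# Apache common-log prefix: host, identd, user, timestamp, request, status.
APACHE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<req>[^"]*)" (?P<status>\d{3})'
)

def normalize(line):
    """Map a JSON app-log line or an Apache access-log line onto one schema."""
    line = line.strip()
    if line.startswith("{"):                      # JSON application log
        rec = json.loads(line)
        return {"ts": rec["time"], "source": "app",
                "level": rec.get("level", "INFO"), "message": rec.get("msg", "")}
    m = APACHE_RE.match(line)                     # Apache access log
    if m:
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        # Treat 5xx responses as errors for the unified view.
        level = "ERROR" if m.group("status").startswith("5") else "INFO"
        return {"ts": ts.isoformat(), "source": "web",
                "level": level, "message": m.group("req")}
    return {"ts": None, "source": "unknown", "level": "INFO", "message": line}
```

Once every line is in the same shape, drill-down, filtering, and real-time views can all operate on a single stream instead of one pipeline per format.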
5 Log Monitoring Moves to Wow Your Business Partner is published by the Sumo Logic DevOps Community. If you'd like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

Blog

Technical Debt and its Impact on DevOps

“DevOps and technical debt? But isn't DevOps the cure for technical debt?” Well, yes and no, but largely no. It's important to understand the ways in which DevOps is not the cure for technical debt — and particularly the ways in which DevOps may produce its own, characteristic forms of technical debt.

First, a very quick refresher: In software development, technical debt is code that is substandard (in design, execution, or both), and which may cause problems in development, debugging, or operation. It may not be buggy (and often isn't), but its poor design and/or execution makes it more likely that it will eventually result in bugs. More than anything, it can increase the time, effort, and cost of development and release. It's important to note that technical debt usually isn't just a case of bad coding by bad programmers. Very often, it represents a tradeoff: a quick-and-dirty implementation of a new feature or bug fix in order to deliver the new release on time. If delivery on time sells enough software, the eventual cost of the quick-and-dirty code may be worth it. But there will be an eventual cost, and that cost is technical debt.

DevOps (in the context of CI and CD) is often pitched as the solution to the technical debt problem, and to some degree, that claim is accurate. Manual release and manual testing lend themselves far too easily to technical-debt workarounds: servers and test machines with one-of-a-kind, handcrafted environment settings, politely ignoring (or turning off) error messages, and special handling of special cases. The process of automating these tasks forces the DevOps implementation team to confront such workarounds, with a choice between incorporating them (and the technical debt that they represent) into the new, automated system, or eliminating them, along with the underlying debt.
But what about technical debt that doesn't actually interact with the DevOps pipeline, or that only affects developers, without any real impact on QA, deployment, or operations? A self-encapsulated blob of bad code could run through the entire pipeline without even being noticed — but it could still cause problems whenever a developer has to deal with it. The truth is that there isn't any development methodology that can completely eliminate technical debt, if only because the causes of such debt often lie outside the scope of such methodologies. If market conditions make it more cost-efficient for a software company to include third-rate code in a new release, then it will ship third-rate code, no matter what development methodology it uses. That methodology may play a role in determining whether the code is refactored sooner, or later, or not at all, but even then, if the cost of refactoring is greater than the cost of the technical debt, the inferior code may remain in place.

And DevOps itself isn't immune to technical debt, either in terms of generating its own debt, or of being affected by existing debt. If process is code in DevOps, then the code that constitutes the processes can include technical debt: scripts for testing and deployment, or any code that governs the pipeline. And DevOps, with its emphasis on rapid response and continuous release, lends itself all too easily to the kind of quick-and-dirty solutions that “we'll fix later.” Technical debt embedded in pipeline code may be as difficult to address and refactor as technical debt in application code, and its effects may be even more pronounced, since it involves the delivery process itself, and not just the product. The open-source utilities on which DevOps often depends also carry their own technical debt, and too many DevOps teams are lax (if not outright careless) about keeping up with the current versions of those utilities. And typically, a DevOps implementation is not a clean slate.
If an organization is moving from traditional methods of development and operations to DevOps, there are plenty of opportunities for existing practices to insert themselves into the mix — and those practices, along with the workarounds required to accommodate them, will all be technical debt. Not all of the debt that is embodied in the application code will be of the kind that can be easily eliminated — or that will travel through the pipeline without causing any problems. Any application interacts with something, and usually several things — users, databases, operating systems — and even in a virtualized or containerized environment, technical debt may affect those interactions. Automated testing isn’t really immune to technical debt, either; it may have to carry over some of the special-snowflake routines from manual testing, even against the better judgement of the DevOps team. When existing technical debt does get carried over into DevOps, it may be very difficult to tackle, in part because the mere fact that it has been carried over means that it is deeply entrenched in the product, the organization, or both — or that the people in charge of the DevOps transition lack the insight or the leverage required to keep it from being carried over (which can represent a serious problem in itself). By its nature, carried-over technical debt will also be deeply embedded in the DevOps process, because it will be incorporated into that process from the start, and all later changes to that process may be affected by that debt. It may be that DevOps’ real strength in relation to technical debt is its fundamental dynamism. DevOps assumes that nothing is static, that everything will change and should change, and it casts itself as the appropriate framework for change. 
This means that existing technical debt, whether in the application code or in the process, will (at least in theory) eventually be eliminated by the same dynamic process of self-renewal that is at the core of DevOps itself. But no DevOps team can afford to assume that it is free of technical debt.

Figure 1: Cartoon by Roelbob, January 2016, DevOps.com.

Blog

Using Analytics to Support the Canary Release

When you roll out a new deployment, how do you roll? With a big bang? A blue/green deployment? Or do you prefer a Canary Release? There's a lot to be said for the Canary Release strategy of testing new software releases on a limited subset of users. It reduces the risk of an embarrassing and potentially costly public failure of your application to a practical minimum. It allows you to test your new deployment in a real-world environment and under a real-world load. It allows a rapid (and generally painless) rollback. And if there's a failure of genuinely catastrophic proportions, only a small subset of your users will even notice the problem. But when you use Canary Release, are you getting everything you can out of the process? A full-featured suite of analytics and monitoring tools is — or should be — an indispensable part of any Canary Release strategy.

The Canary Release Pattern

In a Canary Release, you initially release the new version of your software on a limited number of servers, and make it available to a small subset of your users. You monitor it for bugs and performance problems, and after you've taken care of those, you release it to all of your users. The strategy is named after the practice of taking canaries into coal mines to test the quality of the air; if the canary stopped singing (or died), it meant that the air was going bad. In this case, the "canary" is your initial subset of users; their exposure to your new release allows you to detect and fix the bugs, so your general body of users won't have to deal with them. Ideally, in a strategy such as this, you want to get as much useful information as possible out of your initial sample, so that you can detect not only the obvious errors and performance issues, but also problems which may not be so obvious, or which may be relatively slow to develop. This is where good analytic tools can make a difference.
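One common way to carve out the canary subset is to hash a stable user identifier, so that each user lands consistently on either the canary or the stable release across sessions. This is a sketch of that technique, not a prescription; the 5% default and the function itself are illustrative:

```python
import hashlib

def in_canary(user_id, percent=5):
    """Deterministically assign roughly `percent`% of users to the canary.

    Hashing the user ID (rather than choosing randomly per request) keeps
    each user pinned to the same release for the duration of the rollout.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100   # uniform bucket in [0, 100)
    return bucket < percent
```

A router or load balancer can then send requests where `in_canary(user_id)` is true to the canary servers, and everyone else to the stable fleet.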
Using Analytics to Support a Canary Release

In fact, the Canary Release strategy needs at least some analytics in order to work at all. Without any analytics, you would have to rely on extremely coarse-grained sources of information, such as end-user bug reports and obvious crashes at the server end, which are very likely to miss the problems that you actually need to find. Such problems, however, generally will show up in error logs and performance logs. Error statistics will tell you whether the number, type, and concentration (in time or space) of errors is out of the expected range. Even if they can't identify the specific problem, such statistics can suggest the general direction in which the problem lies. And since error logs also contain records of individual errors, you can at least in theory pinpoint any errors which are likely to be the result of newly-introduced bugs, or of failed attempts to eliminate known bugs.

The problem with identifying individual errors in the log is that any given error is likely to be a very small needle in a very large haystack. Analytics tools which incorporate intelligent searches and such features as pattern analysis and detection of unusual events allow you to identify likely signs of a significant error in seconds. Without such tools, the equivalent search might take hours, whether it uses brute force or carefully-crafted regex terms. Even being forced by necessity to do a line-by-line visual scan of an error log, however, is better than having no error log at all. Logs that monitor such things as performance, load, and load distribution can also be useful in the Canary Release strategy. Bugs which don't produce clearly identifiable errors may show up in the form of performance degradation or excessive traffic. Design problems may also leave identifiable traces in performance logs; poor design can cause traffic jams, or lead to excessive demands on databases and other resources.
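The error-statistics comparison described above can start as simply as checking the canary's error rate against the baseline's and flagging the release when the gap exceeds an agreed tolerance. This is a deliberately minimal sketch; the function name and the 2% threshold are placeholders, not recommendations:

```python
def canary_healthy(canary_errors, canary_requests,
                   base_errors, base_requests, tolerance=0.02):
    """Return False if the canary's error rate exceeds the baseline's
    by more than `tolerance` (an absolute difference in rates)."""
    if canary_requests == 0 or base_requests == 0:
        return True  # not enough traffic to judge yet
    canary_rate = canary_errors / canary_requests
    base_rate = base_errors / base_requests
    return (canary_rate - base_rate) <= tolerance
```

In practice, a check like this would run once per log capture window, with a failing result triggering an alert or an automated rollback.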
You can enhance the value of your analytics, and of the Canary Release itself, if you put together an in-depth demographic profile of the user subset assigned to the release. The criteria which you use in choosing the subset, of course, depend on your needs and priorities, as well as the nature of the release. It may consist of in-house users, of a random selection from the general user base, or of users carefully chosen to represent either the general user base, or specific types of user. In any of these cases, however, it should be possible to assemble a profile of the users in the subset. If you know how the users in the subset make use of your software (which features they access most frequently, how often they use the major features and at what times of day, how this use is reflected in server loads, etc.), and if you understand how these patterns of use compare to those of your general user base, the process of extrapolation from Canary Release analytics should be fairly straightforward, as long as you are using analytic tools which are capable of distilling out the information that you need. So yes, Canary Release can be one of the most rewarding deployment strategies — when you take full advantage of what it has to offer by making intelligent use of first-rate analytic tools. Then the canary will really sing!

Blog

Snowflake Configurations and DevOps Automation

“Of course our pipeline is fully automated! Well, we have to do some manual configuration adjustments on a few of our bare metal servers after we run the install scripts, but you know what I mean…” We do know what you mean, but that is not full automation. Call it what it really is — partial automation in a snowflake environment. A snowflake configuration is ad hoc and “unique” to the environment at large. But in DevOps, you need to drop unique configurations and focus on full automation.

What's Wrong With Snowflake Configurations?

In DevOps, a snowflake is a server that requires special configuration beyond that covered by automated deployment scripts. You do the automated deployment, and then you tweak the snowflake system by hand. For a long time (through the '90s, at least), snowflake configurations were the rule. Servers were bare metal, and any differences in hardware configuration or peripherals (as well as most differences in installed software) meant that any changes had to be handled on a system-by-system basis. Nobody even called them snowflakes. They were just normal servers. But what's normal in one era can become an anachronism or an out-and-out roadblock in another era — and nowhere is this more true than in the world of software development. A fully automated, script-driven DevOps pipeline works best when the elements that make up the pipeline are uniform. A scripted deployment to a thousand identical servers may take less time and run more smoothly than deployment to half a dozen servers that require manual adjustments after the script has been run. For more on DevOps pipelines, see "How to Build a Continuous Delivery Pipeline" ›

No Virtual Snowflakes

A virtual snowflake might be at home on a contemporary Christmas tree, but there's no place for virtual snowflakes in DevOps.
Cloud-based virtual environments are by their nature software-configurable; as long as the cloud insulates them from any interaction with the underlying hardware, there is no physical reason for a set of virtual servers running in the same cloud environment to be anything other than identical. Any differences should be based strictly on functional requirements — if there is no functional reason for any of the virtual servers to be different, they should be identical. Why is it important to maintain such a high degree of uniformity? In DevOps, all virtual machines (whether they're full VMs or Docker containers) are containers in much the same way as steel shipping containers. When you ship something overseas, you're only concerned with the container's functional qualities. Uniform shipping containers are functionally useful because they have known characteristics, and they can be filled and stacked efficiently. This is equally true of even the most full-featured virtual machine when it is deployed as a DevOps container. This is all intrinsic to core DevOps philosophy: the container exists solely to deliver the software or services, and should be optimized for that purpose. When delivery to multiple virtual servers is automated and script-driven, optimization requires as much uniformity as possible in server configurations. For more on containers, see "Kubernetes vs. Docker: What Does It Really Mean?" ›

What About Non-Virtual Snowflakes?

If you only deal in virtual servers, it isn't hard to impose the kind of standardization described above. But real life isn't always that simple; you may find yourself working in an environment where some or all of the servers are bare metal. How do you handle a physical server with snowflake characteristics? Do you throw in the towel and adjust it manually after each deployment, or are there ways to prevent a snowflake server from behaving like a snowflake?
As it turns out, there are ways to de-snowflake a physical server — ways that are fully in keeping with core DevOps philosophy. First, however, consider this question: What makes a snowflake server a snowflake? Is it the mere fact that it requires special settings, or is it the need to make those adjustments outside of the automated deployment process (or in a way that interrupts the flow of that process)? A thoroughgoing DevOps purist might opt for the first definition, but in practical terms, the second definition is more than adequate. A snowflake is a snowflake because it must be treated as a snowflake. If it doesn’t require any special treatment, it’s not a snowflake. One way to eliminate the need for special treatment during deployment (as suggested by Daniel Lindner) is to install a virtual machine on the server, and deploy software on the virtual machine. The actual deployment would ignore the underlying hardware and interact only with the virtual system. The virtual machine would fully insulate the deployment from any of the server’s snowflake characteristics. What if it isn’t practical or desirable to add an extra virtual layer? It may still be possible to handle all of the server’s snowflake adjustments locally by means of scripts (or automated recipes, as Martin Fowler put it in his original Snowflake Server post), running on the target server itself. These local scripts would need to be able to recognize elements in the deployment which might require adjustments to snowflake configurations, then translate those requirements into local settings and apply them. If the elements that require local adjustments are available as part of the deployment data, the local scripts might intercept that data as the main deployment script runs. 
But if those elements are not obvious (if, for example, they are part of the compiled application code), it may be necessary to include a table of values which may require local adjustments as part of the deployment script (if not full de-snowflaking, at least a 99.99% de-snowflaking strategy). So, what is the bottom line on snowflake servers? In an ideal DevOps environment, they wouldn't exist. In the less-than-ideal world where real-life DevOps takes place, they can't always be eliminated, but you can still neutralize most or all of their snowflake characteristics to the point where they do not interfere with the pipeline. For more on virtual machines, see "Docker Logs vs Virtual Machine Logs" ›
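As a concrete (and entirely hypothetical) sketch of the "table of values" idea, the per-host adjustments could live in a declarative override map that a local script merges over the uniform deployment configuration, so the snowflake's special treatment stays inside the automated flow rather than being applied by hand. All of the hostnames and setting names below are invented for illustration:

```python
import socket

# Hypothetical per-host override table shipped alongside the deployment.
# Only genuine snowflakes appear here; every other host uses the base config.
OVERRIDES = {
    "db-legacy-01": {"worker_threads": 2, "tmp_dir": "/mnt/scratch"},
    "web-frontend-03": {"worker_threads": 8},
}

def apply_local_overrides(base_config, hostname=None):
    """Merge host-specific settings over the uniform deployment config.

    Hosts with no entry get the base config unchanged, so the same
    script runs identically everywhere in the pipeline.
    """
    hostname = hostname or socket.gethostname()
    merged = dict(base_config)
    merged.update(OVERRIDES.get(hostname, {}))
    return merged
```

The point of the design is that the snowflake knowledge is captured as data under version control, not as manual steps someone performs after the deploy.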

Blog

Does Docker Deployment Make Bare Metal Relevant?

Is bare metal infrastructure relevant in a DevOps world? The cloud has reduced hardware to little more than a substrate for the pool of resources that is the cloud itself. Those resources are the important part; the hardware is little more than a formality. Or at least that's been the standard cloud-vs-metal story, until recently. Times change, and everything that was old does eventually become new again — usually because of a combination of unmet needs, improved technology, and a fresh approach. And the bare-metal comeback is no different.

Unmet Needs

The cloud is a pool of resources (processor speed, memory, storage space, bandwidth) that are not only generic but also shared. Even if you pay a premium for a greater share of these things, you are still competing with the other premium-paying customers. And the hard truth is that cloud providers can't guarantee a consistent, high level of performance. Cloud performance depends on the demand placed on it by other users — demand which you can't control. If you need reliable performance, there is a good chance that you will not find it in the cloud. This is particularly true if you're dealing with large databases; Big Data tends to be resource-hungry, and it is likely to do better on a platform with dedicated resources down to the bare-metal level, rather than in a cloud, where it may have to contend with dueling Hadoops. The cloud can present sticky compliance issues, as well. If you're dealing with formal data-security standards, such as those set by the Securities and Exchange Commission or by overseas agencies, verification may be difficult in a cloud environment. Bare metal provides an environment with more clearly defined, hardware-based boundaries and points of entry.
Improved Technology

Even if Moore's Law has been slowing down to sniff the flowers lately, there have been significant improvements in hardware capabilities, such as increased storage capacity and the availability of higher-capacity solid-state drives, resulting in a major boost in key performance parameters. And technology isn't just hardware — it's also software and system architecture. Open-source initiatives for standardizing and improving the hardware interface layers, along with the highly scalable, low-overhead CoreOS, make lean, efficient bare metal provisioning and deployment a reality. And that means that it's definitely time to look closely at what bare metal is now capable of doing, and what it can now do better than the cloud.

A Fresh Approach

As technology improves, it makes sense to take a new look at existing problems, and see what could be done now that hadn't been possible (or easy) before. That's where Docker and container technology come in. One of the major drawbacks of bare metal in comparison to cloud systems has always been the relative inflexibility of available resources. You can expand such things as memory, storage, and the number of processors, but the hard limit will always be what is physically available to the system; if you want to go beyond that point, you will need to manually install new resources. If you're deploying a large number of virtual machines, resource inflexibility can be a serious problem. VMs have relatively high overhead; they require hypervisors, and they need enough memory and storage to contain both a complete virtual machine and a full operating system. All of this requires processor time as well. In the cloud, with its large pool of resources, it isn't difficult to quickly shift resources to meet rapidly changing demands as virtual machines are created and deleted.
In a bare-metal system with hardware-dependent resources, this kind of resource allocation can quickly run up against the hard limits of the system. Docker-based deployment, however, can radically reduce the demands placed on the host system. Containers are built to be lean; they use the kernel of the host OS, and they include only those applications and utilities which must be available locally. If a virtual machine is a bulky box that contains the application being deployed, plus plenty of packing material, a container is a thin wrapper around the application. And Docker itself is designed to manage a large number of containers efficiently, with little overhead. On bare metal, the combination of Docker, a lean, dedicated host system such as CoreOS, and an open-source hardware management layer makes it possible to host a much higher number of containers than virtual machines. In many cases, this means that bare metal’s relative lack of flexibility with regard to resources is no longer a factor; if the number of containers that can be deployed using available resources is much greater than the anticipated short-to-medium-term demand, and if the hardware resources themselves are easily expandable, then the cloud really doesn’t offer much advantage in terms of resource flexibility. In effect, Docker moves bare metal from the “can’t use” category to “can use” when it comes to the kind of massive deployments of VMs and containers which are a standard part of the cloud environment. This is an important point — very often, it is this change from “can’t use” to “can use” that sets off revolutions in the way that technology is applied (most of the history of personal computers, for example, could be described in terms of “can’t use”/”can use” shifts), and that change is generally one in perception and understanding as much as it is a change in technology. 
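Back-of-the-envelope arithmetic shows why the container-vs-VM density gap matters on a fixed-size host. Every number below is an assumption chosen for illustration (host RAM, per-VM guest-OS overhead, per-container overhead), not a benchmark:

```python
def max_instances(host_ram_gb, per_instance_overhead_gb, app_ram_gb):
    """How many app instances fit in host RAM, given per-instance overhead."""
    return int(host_ram_gb // (per_instance_overhead_gb + app_ram_gb))

# Assume a 256 GB bare-metal host and a 0.5 GB application footprint.
# Assume ~1.5 GB overhead per full VM (guest OS, hypervisor bookkeeping)
# versus ~0.05 GB per container (shared host kernel, thin runtime wrapper).
vms = max_instances(256, 1.5, 0.5)
containers = max_instances(256, 0.05, 0.5)
```

Under these assumed figures the same box holds several times as many containers as VMs, which is the "can't use" to "can use" shift the text describes: the fixed hardware ceiling stops being the binding constraint.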
In the case of Docker and bare metal, the shift to "can use" allows system managers and architects to take a close look at the positive advantages of bare metal in comparison to the cloud. Hardware-based solutions, for example, are often the preferred option in situations where access to dedicated resources is important. If consistent speed and reliable performance are important, bare metal may be the best choice. And the biggest surprises may come when designers start asking themselves, "What can we do with Docker on bare metal that we couldn't do before?" So, does Docker make bare metal relevant? Yes it does, and more than that, it makes bare metal into a new game, with new and potentially very interesting rules.

Blog

VPC Flow Logging in a DevOps Environment

Your website (or online application or service) isn’t just a website. It’s also the backend, plus the hosting medium (local hardware, cloud, etc.), plus the IP connection, plus search engine visibility, plus an open-ended (and growing) list of other factors. And one of the most basic factors is the flow of traffic. In many ways, it is the key factor in the dynamic environment in which any website or service lives. Without any traffic, your site doesn’t really exist in the online universe. Your site’s traffic defines its relationship with your user base, and with the broader internet user base in general. So understanding web application traffic flow is critical. Amazon’s Virtual Private Cloud Flow Logging VPC Flow Logging is intended to help you with that. It is a service provided by Amazon which allows you to create separate logs of the flow of traffic for an entire VPC, for a specified subnet, or for a specific cloud-based network interface; when you log a VPC or a subnet, all of its subordinate network interfaces are included in the log. The logs themselves aren’t real-time; they are broken up into capture windows, each of which represents about ten minutes’ worth of activity. While this precludes real-time response to log events, it does allow you to respond (or set up automated responses) with a ten-minute time granularity. Within this limit, the log data provides an extremely valuable record of specific events, along with statistical information which can be used to analyze a variety of metrics. Amazon VPC itself is designed for near-maximum flexibility; you can use it to do, if not everything, at least a large enough subset of everything to meet the DevOps deployment needs of most organizations. This includes creating a precisely controlled mix of private and public-facing resources.
Each line in a VPC Flow Log represents a record of the network flow from a specific IP/port combination to a specific IP/port combination using a specific protocol during the capture window time. The record captures the source and destination IPs and ports, the protocol, the size of the data transferred in packets and bytes, the capture window start and end times, and whether the traffic was accepted or rejected according to the network or security group access control lists (ACLs). Each record also includes some basic housekeeping information, such as the account ID and network interface ID, and the status of the log record itself (normal logging, no data during the capture window, data skipped during the window). VPC Flow Logs’ Impact on DevOps What does this mean for AWS DevOps? The logs are filterable, so, for example, you can create a log of nothing but records with the REJECT action, allowing you to quickly examine and analyze all rejected connection requests. You can also create alarms, which will send notifications of specific types of events, such as a high rate of rejected requests within a specified time period, or unusual levels of data transfer. And you can use API commands to pull data from the logs for use with your own automation tools. The ability to create logs for specific subnets and interfaces makes it possible to monitor and control individual elements of your VPC independently. This is an important capability in a system which combines private and public-facing elements, particularly when the private and public-facing regions interface with each other. VPC Flow Logging can, for example, help you diagnose security and access problems in such a mixed environment by indicating at what point in an interaction an unexpected ACCEPT or REJECT response occurred, and which direction it came from.
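As a concrete sketch, the default (version 2) record layout described above can be parsed with a few lines of Python. The sample lines below mimic the documented field order; the helper names (`parse_record`, `rejected`) are my own, and a production pipeline would read records from CloudWatch Logs rather than hard-coded strings.

```python
from collections import namedtuple

# Default (version 2) VPC Flow Log fields, space-separated.
FIELDS = ("version account_id interface_id srcaddr dstaddr srcport dstport "
          "protocol packets bytes start end action log_status").split()

FlowRecord = namedtuple("FlowRecord", FIELDS)

def parse_record(line):
    """Parse one flow-log line into a FlowRecord, or None if malformed."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        return None
    return FlowRecord(*parts)

def rejected(records):
    """Keep only records whose traffic was rejected by an ACL."""
    return [r for r in records if r.action == "REJECT"]

sample = [
    "2 123456789010 eni-abc123de 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    "2 123456789010 eni-abc123de 172.31.9.69 172.31.9.12 49761 3389 6 20 4249 1418530010 1418530070 REJECT OK",
]
records = [r for r in (parse_record(l) for l in sample) if r]
for r in rejected(records):
    # Prints the rejected connection attempt (here, a probe of RDP port 3389).
    print(r.srcaddr, "->", r.dstaddr, "port", r.dstport)
```

From here, filtering, counting, and alarming are ordinary list and dictionary operations, which is what makes the flat record format so convenient for automation.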
You can also tailor automated responses based on the role of a specific subnet or interface, so, for instance, a high volume of data transfer, or repeated REJECTS on a private subnet would produce a different response than they would with a public-facing interface (or even with another private subnet), based on expected user behavior in relation to the specific subnet or interface. The biggest impact that VPC Flow Logging has on DevOps may come from the ability of third-party analytics applications to interface with the Flow Logs. An analytics suite with access to Flow Log data, for example, can map such things as ACCEPT/REJECT and dropped data statistics by region and time. It can also provide detailed insight into user traffic for specific elements of a site or specific services; transferring the raw Flow Log data to a visual analytics dashboard can make specific traffic patterns or anomalies stand out instantly. Although the data contained in individual Flow Log records may appear to be somewhat basic at first glance, an analytics suite can provide a sophisticated and detailed picture of your site’s traffic by correlating data within individual Flow Logs, with data from other Flow Logs for the same system, and with data from outside sources. When displayed on a visual analytics dashboard with drill-down capabilities, correlated log data can provide you with an in-depth picture of your site’s traffic at varying levels of resolution, and it can provide you with a detailed view of any anomalies, suspicious access attempts, or configuration errors. Sumo Logic App for Amazon VPC Flow Logs The Sumo Logic App for Amazon VPC Flow Logs leverages this data to provide real-time visibility and analysis of your environment. 
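The idea of per-subnet or per-interface automated responses can be sketched as a simple thresholding pass over parsed records. Everything here (interface IDs, thresholds, and the tuple format) is hypothetical, chosen only to illustrate the pattern of applying different expectations to private and public-facing interfaces.

```python
from collections import Counter

# Hypothetical per-interface REJECT thresholds per capture window.
THRESHOLDS = {
    "eni-private1": 5,    # private subnet: any burst of rejects is unusual
    "eni-public1": 500,   # public-facing: low-volume rejects are routine
}
DEFAULT_THRESHOLD = 50

def alert_interfaces(records):
    """Return interface IDs whose REJECT count exceeds their threshold.

    `records` is an iterable of (interface_id, action) pairs, as might be
    extracted from parsed flow-log lines."""
    rejects = Counter(eni for eni, action in records if action == "REJECT")
    return [eni for eni, n in rejects.items()
            if n > THRESHOLDS.get(eni, DEFAULT_THRESHOLD)]

records = [("eni-private1", "REJECT")] * 8 + [("eni-public1", "REJECT")] * 100
print(alert_interfaces(records))  # → ['eni-private1']
```

Note that the same absolute volume of rejects triggers an alert on the private interface but not on the public one, which is exactly the role-based tailoring described above.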
The app consists of predefined searches, plus Live and Interactive Dashboards that give you real-time visibility and data visualization of where network traffic is coming from and going to, the actions that were taken (accepted or rejected), and data volumes (packet and byte counts). You can also share relevant dashboards and alerts with your security teams to streamline operational, security, and auditing processes. Specifically, the Sumo Logic application for Amazon VPC Flow Logs allows you to: Understand where there are latency and failures in your network. Monitor trending behaviors and traffic patterns over time. Generate alarms for observed anomalies and outliers within the network traffic, such as source/destination IP address, number of packets accepted/rejected, and byte count. It’s worth noting that Sumo Logic also provides a community-supported collection method using a Kinesis stream. If you are using Amazon Kinesis and are interested in this option, log into the Sumo Logic Support Portal and see the instructions in Sumo Logic App for Amazon VPC Flow Logs using Kinesis. But for most use cases, the Sumo Logic App for Amazon VPC Flow Logs described here is the preferred solution. Summary Because the VPC Flow Log system is native to the Amazon Cloud environment, it eliminates the overhead and the limitations associated with agent-based logging, including the limited scope of network flow that is visible to most agents. VPC Flow Logs provide a multidimensional, multi-resolution view of your network flow, along with an API that is versatile enough for both automation and sophisticated analytics. All in all, the VPC Flow Log system is one of the most valuable and flexible cloud-based tools available for DevOps today.

Blog

Docker Engine for Windows Server 2016

Until recently, deploying containers on Windows (or on Microsoft’s Azure cloud) meant deploying them on a Linux/UNIX VM managed by a Windows-based hypervisor. Microsoft is a member of the Open Container Initiative, and has been generally quite supportive of such solutions, but they necessarily required the added VM layer of abstraction, rather than being native Windows containers. If you wanted containers, and if you wanted Docker, at some point you needed Linux or UNIX. Now Microsoft’s much anticipated Technical Preview of the Docker Engine for Windows Server is out. Let’s look at some of the differences between Docker on Windows and Docker on Linux. First, though, what is the difference between a virtual machine and containerization? If you’re reading this, you probably know the answer in detail, but we’ll do a quick run-through anyway. Containerization From the Past A virtual machine (or VM) is a complete computer hardware layer (CPU, RAM, I/O, etc.) abstracted to software, and running as if it were a self-contained, independent machine within the host operating system. A VM is managed by a hypervisor application (such as VirtualBox or VMware) running on the host system. The VM itself has an operating system that is completely independent of the host system, so a virtual machine with a Linux OS can run on Windows, or vice versa. While a container may look like a virtual machine, it is a very different sort of device when you look under the hood. Like a VM, a container provides a self-contained, isolated environment in which to run a program. A container, however, uses many of the resources of the host system. At the same time, the applications within the container can’t see or act outside of the bounds of the container. Since the container uses so many of the host system’s basic resources, the container’s OS is essentially the same as the host OS.
While VM hypervisors are available for most operating systems (allowing an individual instance of a VM to be essentially machine-independent), containers developed largely out of the Linux/UNIX world, and have been closely tied to Linux and UNIX systems. Docker has become the de facto standard for deploying and managing containers, and Docker itself is native to the Linux/UNIX world. Windows Server 2016 and the New Hyper-V Enter Windows Server Containers, and Windows Server 2016. Windows Server Containers are native Windows containers running on Windows Server 2016, and they come with a Windows implementation of Docker. Now, if you want containers and you want Docker, you can have them directly on Windows. But what does this mean in practice? First and foremost, a Windows Server Container is exactly what the name implies: a Windows container. Just as a Linux-based container makes direct use of the resources of the underlying operating system and is dependent on it, a Windows Server Container uses Windows resources, and is dependent on Windows. This means that you can’t deploy a Windows Server Container directly on a Linux/UNIX system, any more than you can deploy a Linux container directly on a Windows system. (Note: Initially, the system resources used in a Windows Server Container instance must exactly match those used by the host system in terms of version number, build, and patch; since Windows Server Containers are still in the technical preview stage, however, this may change.) Microsoft has added a bonus: along with Windows Server Containers, which are standard containers at heart, it is also offering a kind of hybrid container, which it calls a Hyper-V container. A Hyper-V container is more like a standard virtual machine, in that it has its own operating system kernel and memory space, but in other ways, it is dependent on system resources in the manner of a typical container.
Microsoft says that the advantage of Hyper-V containers is that they have greater isolation from the host system, and thus more security, making them a better choice for situations where you do not have full control over what will be running inside the containers. Hyper-V containers can be deployed and managed exactly like Windows Server Containers. How They Compare So, then, is Windows Docker really Docker? Yes, it is. Microsoft has taken considerable care to make sure that all of the Docker CLI commands are implemented in the Windows version; Windows Server Containers (and Hyper-V containers) can be managed either from the Docker command line or the PowerShell command line. Now for some of the “mostly, but not 100%” part: Windows Server Containers and Docker containers aren’t quite the same thing. You can use Docker to create a Windows Server Container from an existing Windows Server Container image, and you can manage the new container using Docker. You can also create and manage Windows Server Containers from PowerShell, but if you provision a container with PowerShell, it cannot be managed directly with the Docker client/server, and vice versa. You must stick with one provisioning method. (Microsoft, however, has indicated that this may change.) In many ways, these are the complications that you would expect when porting a conceptually rather complex and platform-dependent system from one OS to another. What’s more important is that you can now deploy containers using Docker directly on Windows. It may not be a threat to Linux, but it does keep Microsoft in the game at a time when that game has been shifting more and more towards open-source and generally Linux/UNIX-based DevOps tools. So to answer the question, “Are Windows Server Containers really Docker?” — they are as much Docker as you could reasonably expect, and then some. They are also definitely Windows containers, and Microsoft to the core.
Docker Engine for Windows Server is not yet GA – If you’re currently running apps from Docker containers running on Linux, check out the Docker Log Analyzer from Sumo Logic.

Blog

Log Analysis in a DevOps Environment

Log analysis is a first-rate debugging tool for DevOps. But if all you’re using it for is finding and preventing trouble, you may be missing some of the major benefits of log analysis. What else can it offer you? Let’s talk about growth. First of all, not all trouble shows up in the form of bugs or error messages; an “error-free” system can still be operating far below optimal efficiency by a variety of important standards. What is the actual response time from the user’s point of view? Is the program eating up clock cycles with unnecessary operations? Log analysis can help you identify bottlenecks, even when they aren’t yet apparent in day-to-day operations. Use Cases for Log Analysis Consider, for example, something as basic as database access. As the number of records grows, access time can slow down, sometimes significantly; there’s nothing new about that. But if the complexity and the number of tables in the database are also increasing, those factors can also slow down retrieval. If the code that deals with the database is designed for maximum efficiency in all situations, it should handle the increased complexity with a minimum of trouble. The tricky part of that last sentence, however, is the phrase “in all situations”. In practice, most code is designed to be efficient under any conditions which seem reasonable at the time, rather than in perpetuity. A routine that performs an optional check on database records may not present any problem when the number of records is low, or when it only runs occasionally, but it may slow the system down if the number of affected records is too high, or if it is done too frequently. As conditions change, hidden inefficiencies in existing code are likely to make themselves known, particularly if the changes put greater demands on the system. As inefficiencies of this kind emerge (but before they present obvious problems in performance) they are likely to show up in the system’s logs. 
As an example, a gradual increase in the time required to open or close a group of records might appear, giving you a chance to anticipate and prevent any slowdowns it might cause. Log analysis can find other kinds of potential bottlenecks as well. For example, intermittent delays in response from a process or an external program can be hard to detect simply by watching overall performance, but they will probably show up in the log files. A single process with significant delays in response time can slow down the whole system. If two processes are dependent on each other, and they each have intermittent delays, they can reduce the system’s speed to a crawl or even bring it to a halt. Log analysis should allow you to recognize these delays, as well as the dependencies which can amplify them. Log Data Analytics – Beyond Ops Software operation isn’t the only thing that can be made more efficient by log analysis. Consider the amount of time that is spent in meetings simply trying to get everybody on the same page when it comes to discussing technical issues. It’s far too easy to have a prolonged discussion of performance problems and potential solutions without the participants having a clear idea of the current state of the system. One of the easiest ways to bring such a meeting into focus and shorten discussion time is to provide everybody involved with a digest of key items from the logs, showing the current state of the system and highlighting problem areas. Log analysis can also be a major aid to overall planning by providing a detailed picture of how the system actually performs. It can help you map out which parts of the system are most sensitive to changes in performance in other areas, allowing you to avoid making alterations which are likely to degrade performance. It can also reveal unanticipated dependencies, as well as suggesting potential shortcuts in the flow of data.
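A gradual latency increase of the kind described above can be surfaced with a simple least-squares trend over log entries. This is a minimal sketch: the log format, field names, and regular expression are hypothetical, and real timestamps would of course need proper parsing.

```python
import re

# Hypothetical log format: "t=<tick> op=db_open duration_ms=<ms>".
LOG_LINE = re.compile(r"t=(?P<t>\d+) op=db_open duration_ms=(?P<d>\d+)")

def latency_trend(lines):
    """Least-squares slope of operation latency over time (ms per tick).

    A persistently positive slope flags a gradual slowdown long before it
    becomes an obvious performance problem."""
    points = []
    for line in lines:
        m = LOG_LINE.search(line)
        if m:
            points.append((int(m.group("t")), int(m.group("d"))))
    n = len(points)
    if n < 2:
        return 0.0
    mean_t = sum(t for t, _ in points) / n
    mean_d = sum(d for _, d in points) / n
    num = sum((t - mean_t) * (d - mean_d) for t, d in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    return num / den if den else 0.0

# Synthetic logs in which the operation slows by 3 ms per tick.
logs = [f"t={t} op=db_open duration_ms={100 + 3 * t}" for t in range(10)]
print(latency_trend(logs))  # → 3.0
```

In practice you would compute this per operation type and alert when the slope stays above some tolerance over several windows; the point is that trend extraction from logs is cheap enough to run continuously.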
Understanding Scalability via Log Analysis One of the most important things that log analysis can do in terms of growth is to help you understand how the system is likely to perform as it scales up. When you know the time required to perform a particular operation on 100,000 records, you can roughly calculate the time required to do the same operation with 10,000,000 records. This in turn allows you to consider whether the code that performs the operation will be adequate at a larger scale, or whether you will need to look at a new strategy for producing the same results. Observability and Baseline Metrics A log analysis system that lets you establish a baseline and observe changes to metrics in relation to that baseline is of course extremely valuable for troubleshooting, but it can also be a major aid to growth. Rapid notification of changes in metrics gives you a real-time window into the way that the system responds to new conditions, and it allows you to detect potential sensitivities which might otherwise go unnoticed. In a similar vein, a system with superior anomaly detection features will make it much easier to pinpoint potential bottlenecks and delayed-response cascades by alerting you to the kinds of unusual events which are often signatures of such problems. All of these things — detecting bottlenecks and intermittent delays, as well as other anomalies which may signal future trouble, anticipating changes in performance as a result of changes in scale, recognizing inefficiencies — will help you turn your software (and your organization) into the kind of lean, clean system which is so often necessary for growth. And all of these things can, surprisingly enough, come from something as simple as good, intelligent, thoughtful log analysis.
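A minimal version of the baseline-and-deviation idea can be written in a few lines. This sketch flags metric samples more than a few standard deviations from a baseline; the threshold and sample data are illustrative, not a recommendation.

```python
import statistics

def anomalies(samples, baseline, z_threshold=3.0):
    """Return samples deviating from the baseline mean by more than
    z_threshold standard deviations. 3.0 is a common default, not a
    universal rule; tune it to your metric's noise level."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return [x for x in samples if abs(x - mean) > z_threshold * stdev]

# Baseline: response times (ms) observed during normal operation.
baseline = [102, 98, 101, 99, 100, 103, 97, 100]
# New window of metrics, with one suspicious spike.
window = [101, 99, 250, 100]
print(anomalies(window, baseline))  # → [250]
```

A real log analysis system does considerably more (rolling baselines, seasonality, correlation across metrics), but even this crude filter illustrates why establishing a baseline is the prerequisite for meaningful anomaly detection.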

Blog

Open Sourcing DevOps

Once again, DevOps is moving the needle. This time, it’s in the open source world, and both open source and commercial software may never be the same again. As more and more open source projects have become not only successful, but vital to the individuals and organizations that use them, the open source world has begun to receive some (occasionally grudging) respect from commercial software developers. And as commercial developers have become increasingly dependent on open source tools, they have begun to take the open source process itself more seriously. Today some large corporations have begun to actively participate in the open source world, not merely as users, but as developers of open source projects. SAP, for example, has a variety of projects on GitHub, and Capital One has just launched its open source Hygeia project, also on GitHub. Why would large, corporate, and commercial software developers place their code in open source repositories? Needless to say, they’re making only a limited number of projects available as open source, but for companies used to treating their proprietary source code as a valuable asset that needs to be closely guarded, even limited open source exposure is a remarkable concession. It’s reasonable to assume that they see significant value in the move. What kind of payoff are they looking for? Hiring. The open source community is one of the largest and most accessible pools of programming talent in software history, and a high percentage of the most capable participants are underemployed by traditional standards. Posting an attractive-looking set of open source projects is an easy way to lure new recruits. It allows potential employees to get their feet wet without making any commitments (on either end), and it says, “Hey, we’re a casual, relaxed open source company — working for us will be just like what you’re doing now, only you’ll be making more money!” Recognition.
It’s an easy way to recognize employees who have made a contribution — post a (non-essential) project that they’ve worked on, giving them credit for their work. It’s cheaper than a bonus (or even a trophy), and the recognition that employees receive is considerably more public and lasting than a corporate award ceremony. Development of open source as a resource. Large corporations are already major users and sponsors of open-source software, often with direct involvement at the foundation level. By entering the open source world as active contributors, they are positioning themselves to exert even greater influence on its course of development by engaging the open source community on the ground. Behind the direct move into open source is also the recognition that the basic model of the software industry has largely shifted from selling products, which by their nature are at least somewhat proprietary, to selling services, where the unique value of what is being sold depends much more on the specific combination of services being provided, along with the interactions between the vendor and the customer. The backend code at a website can all be generic; the “brand” — the combination of look-and-feel, services provided, name recognition, and trademarks — is the only thing that really needs to be proprietary. And even when other providers manage to successfully clone a brand, they may come up short, as Facebook’s would-be competitors have discovered. Facebook is an instructive example, because (even though its backend code is unique and largely proprietary) the unique service which it provides, and which gives it its value, is the community of users — something that by its nature isn’t proprietary. In the service model, the uniqueness of tools becomes less and less important. 
In a world where all services used the same basic set of tools, individual service providers could and would still distinguish themselves based on the combinations of services that they offered and the intangibles associated with those services. This doesn’t mean that the source code for SAP’s core applications is about to become worthless, of course. Its value is intimately tied to SAP’s brand, its reputation, its services, and perhaps more than anything, to the accumulated expertise, knowledge, and experience which SAP possesses at the organizational level. As with Facebook, it would be much easier to clone any of SAP’s applications than it would be to clone these intangibles. But the shift to services does mean that for large corporate developers like SAP, placing the code for new projects (particularly for auxiliary software not closely tied to their core applications) in an open source repository may be a reasonable option. The boundary between proprietary and open source software is no longer the boundary between worlds, or between commercial developers and open source foundations. It is now more of a thin line between proprietary and open source applications (or components, or even code snippets) on an individual basis, and very possibly operating within the same environment. For current and future software developers, this does present a challenge, but one which is manageable: to recast themselves not as creators of unique source code, or even as developers of unique applications, but rather as providers of unique packages of applications and services. These packages may include both proprietary and open source elements, but their value will lie in what they offer the user as a package much more than it lies in the intellectual property rights status of the components. This kind of packaging has always been smart business, and the most successful software vendors have always made good use of it. 
We are rapidly entering a time when it may be the only way to do business.

Blog

DevOps is a Strategy