---
id: archive
title: Archive Log Data to S3 using Installed Collectors
description: Send data to an archive that you can ingest from later.
slug: /help/docs/manage/data-archiving/archive/
canonical: https://www.sumologic.com/help/docs/manage/data-archiving/archive/
---

import useBaseUrl from '@docusaurus/useBaseUrl';

Archive allows you to forward log data from Installed Collectors to AWS S3 buckets to collect at a later time. If you have logs that you do not need to search immediately you can archive them for later use. You can ingest from your archive on-demand with five-minute granularity.

:::important
Do not change the name and location of the archived files in your S3 bucket. Otherwise, ingesting them later will not work properly.
:::

To archive your data you need a processing rule configured to send to an AWS archive destination. First, [create an AWS archive destination](#create-an-aws-archive-destination), then [create archive processing rules](#create-a-processing-rule) to start archiving. Any data that matches the filter expression of an archive processing rule is not sent to Sumo Logic. Instead, it is sent to your AWS archive destination.

:::note
Every archived log message is tagged with the metadata fields specified by the collector and source.
:::

## Create an AWS archive destination

:::note
You need the [Manage S3 Data Forwarding](/docs/manage/users-roles/roles/role-capabilities/#data-management) role capability to create an AWS archive destination.
:::

1. Follow the instructions in [Grant Access to an AWS Product](/docs/send-data/hosted-collectors/amazon-aws/grant-access-aws-product/) to grant Sumo Logic permission to send data to the destination S3 bucket.
1. [**New UI**](/docs/get-started/sumo-logic-ui). In the main Sumo Logic menu select **Data Management**, and then under **Data Collection** select **Data Archiving**. You can also click the **Go To...** menu at the top of the screen and select **Data Archiving**.

   [**Classic UI**](/docs/get-started/sumo-logic-ui-classic). In the main Sumo Logic menu, select **Manage Data > Collection > Data Archiving**.
1. Click **+** to add a new destination.
1. Select **AWS Archive bucket** for **Destination Type**.
1. Configure the following:
   * **Destination Name**. Enter a name to identify the destination.
   * **Bucket Name**. Enter the exact name of the S3 bucket.
     :::note
     You can create only one destination with a particular bucket name. If you try to create a new destination with the bucket name of an existing destination, the new destination replaces the old one.
     :::
   * **Description**. You can provide a meaningful description of the connection.
   * **Access Method**. Select **Role-based access** or **Key access** based on the AWS authentication you are providing. Role-based access is preferred. This was completed in step 1, [Grant Sumo Logic access to an AWS Product](/docs/send-data/hosted-collectors/amazon-aws/grant-access-aws-product).
      * For **Role-based access**, enter the Role ARN that was provided by AWS after creating the role.
      * For **Key access**, enter the **Access Key ID** and **Secret Access Key**. See [AWS documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html) for details.
      * For **AWS EC2 Credentials**, if you are using instance profile credentials on an EC2 instance where an Installed Collector will archive log data to S3, see [Using IAM Roles to Grant Access to AWS Resources on Amazon EC2](https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-roles.html).
   * **S3 Region**. Select the S3 region or keep the default value of Others. The S3 region must match the appropriate S3 bucket created in your Amazon account.
1. Click **Save**.

If Sumo Logic is able to verify the S3 credentials, the destination will be added to the list of destinations and you can start archiving to the destination via processing rules.

## Create a processing rule

A new processing rule type named **Archive messages that match** allows you to archive log data at the source level on Installed Collectors.
:::note
An archive processing rule acts like an exclude (denylist) filter: matching data is not sent to Sumo Logic and is instead sent to your AWS archive bucket.
:::

Archive and forwarding rules are processed after all other processing rule types. When there are multiple archive and forwarding rules, they are processed in the order that they are specified in the UI, top to bottom.

To configure processing rules for archiving using the web application, follow these steps:

:::note
To use JSON to configure a processing rule, use the `Forward` filter type. See an example data forwarding rule.
:::

1. [**New UI**](/docs/get-started/sumo-logic-ui). In the main Sumo Logic menu select **Data Management**, and then under **Data Collection** select **Collection**. You can also click the **Go To...** menu at the top of the screen and select **Collection**.

   [**Classic UI**](/docs/get-started/sumo-logic-ui-classic). In the main Sumo Logic menu, select **Manage Data > Collection > Collection**.
1. Search for the source that you want to configure, and click the **Edit** link for the source. The source must be associated with an Installed Collector.
1. Scroll down to the **Processing Rules** section and click the arrow to expand the section.
1. Click **Add Rule**.
1. Type a **Name** for this rule. (Names have a maximum of 32 characters.)
1. For **Filter**, type a regular expression that defines the messages you want to filter. The rule must match the whole message. For multi-line log messages, to get the lines before and after the line containing your text, wrap the segment with `(?s)` such as `(?s).*matching text(?s).*`. Your regex must be [RE2 compliant](https://github.com/google/re2/wiki/Syntax).
1. Select **Archive messages that match** as the rule type. This option is visible only if you have defined at least one [AWS archive bucket destination](#create-an-aws-archive-destination), as described in the previous section.
1. Select the destination from the dropdown menu.
1. (Optional) Enter a **Prefix** that matches the location to store data in the S3 bucket. The prefix has the following requirements:
   * It cannot start with a forward slash `/`.
   * It must end with a forward slash `/`.
   * It can contain a maximum of 64 characters.
   * The following characters are supported:
      * Alphanumeric characters: 0-9, a-z, A-Z
      * Special characters: - _ . * ' ( )
1. Click **Apply**. The new rule is listed along with any other previously defined processing rules.
1. Click **Save** to save the rules you defined and start archiving data that matches the rule.

## Archive format

Forwarded archive files are prepended with a filename prefix based on the receipt time of your data with the following format:

```
dt=/hour=/minute=////v1/.txt.gzip
```

Collector version 19.361-3+ provides the ability to archive files with five-minute granularity. The format changes with the addition of `v2` and the removal of `v1`.

```
v2/dt=/hour=/minute=////.txt.gzip
```

Example format of an archived log message:

```
{"_id":"763a9b55-d545-4564-8f4f-92fe9db9acea","date":"2019-11-15T13:26:41.293Z","sourceName":"/Users/sumo/Downloads/Logs/ingest.log","sourceHost":"sumo","sourceCategory":"logfile","message":"a log line"}
```

## Batching

By default, the collector will complete writing logs to an archive file once the uncompressed size of the file reaches 5 GB. You can configure the buffer size with the following [collector.properties](/docs/send-data/installed-collectors/collector-installation-reference/collector-properties.md) parameter.

### collector.properties buffer parameter

| Parameter | Description | Data Type | Default |
|:--|:--|:--|:--|
| buffer.max.disk.bytes | The maximum size in bytes of the on-disk buffer per archive destination. When the maximum is reached the oldest modified file(s) are deleted. | Integer | 5368709120 |

## Ingest data from archive

You can ingest a specific time range of data from your archive at any time with an AWS S3 archive source. First, [create an AWS S3 archive source](#create-an-aws-s3-archivesource), then [create an ingestion job](#create-an-ingestion-job).

### Rules

* A maximum of 2 concurrent ingestion jobs is supported. If more jobs are needed contact your Sumo Logic account representative.
* An ingestion job has a maximum time range of 12 hours. If a longer time range is needed contact your Sumo Logic account representative.
* Filenames or object key names must be in either of the following formats:
   * Sumo Logic [archive format](#archive-format)
   * `prefix/dt=YYYYMMDD/hour=HH/fileName.json.gz`
* If the logs from archive do not have timestamps they are only searchable by receipt time.
* If a field is tagged to an archived log message and the ingesting collector or source has a different value for the field, the field values already tagged to the archived log take precedence.
* If the collector or source that archived the data is deleted the ingesting collector and source metadata fields are tagged to your data.
* You can create ingestion jobs for the same time range. However, jobs maintain a 10-day history of ingested data, and any data resubmitted for ingestion within 10 days of its last ingestion will be automatically filtered so it's not ingested.

### Create an AWS S3 archive source

:::note
You need the [Manage Collectors](/docs/manage/users-roles/roles/role-capabilities/#data-management) role capability to create an AWS S3 archive source.
:::

An AWS S3 archive source allows you to ingest your archived data. Configure it to access the AWS S3 bucket that has your archived data.

:::note
To use JSON to create an AWS S3 archive source, reference our AWS Log source parameters and use `AwsS3ArchiveBucket` as the value for `contentType`.
:::

1. [**New UI**](/docs/get-started/sumo-logic-ui). In the main Sumo Logic menu select **Data Management**, and then under **Data Collection** select **Collection**. You can also click the **Go To...** menu at the top of the screen and select **Collection**.

   [**Classic UI**](/docs/get-started/sumo-logic-ui-classic). In the main Sumo Logic menu, select **Manage Data > Collection > Collection**.
1. On the **Collectors** page, click **Add Source** next to a Hosted Collector, either an existing Hosted Collector or one you have created for this purpose.
1. Select **AWS S3 Archive**.
1. Enter a name for the new source. A description is optional.
1. Select an **S3 region** or keep the default value of **Others**. The S3 region must match the appropriate S3 bucket created in your Amazon account.
1. For **Bucket Name**, enter the exact name of your organization's S3 bucket. Be sure to double-check the name as it appears in AWS.
1. For **Path Expression**, enter the wildcard pattern that matches the archive files you'd like to collect. The pattern:
   * can use one wildcard (\*).
   * can specify a [prefix](https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html#object-keys) so only certain files from your bucket are ingested. For example, if your filename is `prefix/dt=/hour=/minute=///v1/.txt.gzip`, you could use `prefix*` to only ingest from those matching files.
   * can **NOT** use a leading forward slash.
   * can **NOT** have the S3 bucket name.
1. For **Source Category**, enter any string to tag to the data collected from this source. Category metadata is stored in a searchable field called `_sourceCategory`.
1. **Fields**. Click the **+Add Field** link to add custom metadata fields. Define the fields you want to associate; each field needs a name (key) and a value.
   :::note
   Fields specified on an AWS S3 archive source take precedence if the archived data has the same fields.
   :::
   * A green circle with a check mark is shown when the field exists and is enabled in the fields table schema.
   * An orange triangle with an exclamation point is shown when the field doesn't exist, or is disabled, in the fields table schema. In this case, an option to automatically add or enable the nonexistent fields to the fields table schema is provided. If a field sent to Sumo Logic does not exist in the fields schema or is disabled, it is ignored (dropped).
1. For **AWS Access** you have two **Access Method** options.
   Select **Role-based access** or **Key access** based on the AWS authentication you are providing. Role-based access is preferred. This was completed in the prerequisite step [Grant Sumo Logic access to an AWS Product](/docs/send-data/hosted-collectors/amazon-aws/grant-access-aws-product/).
   * For **Role-based access**, enter the Role ARN that was provided by AWS after creating the role.
   * For **Key access**, enter the **Access Key ID** and **Secret Access Key**. See [AWS documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html) for details.
1. Create any processing rules you'd like for the AWS source.
1. When you are finished configuring the source, click **Save**.

## Archive page

:::important
You need the [Manage Collectors or View Collectors](/docs/manage/users-roles/roles/role-capabilities/#data-management) role capability to manage or view archive.
:::

The archive page provides a table of all the existing [AWS S3 archive sources](#create-an-aws-s3-archivesource) in your account and ingestion jobs.

[**New UI**](/docs/get-started/sumo-logic-ui/). To access the archive page, in the main Sumo Logic menu select **Data Management**, and then under **Data Collection** select **Archive**. You can also click the **Go To...** menu at the top of the screen and select **Archive**.

[**Classic UI**](/docs/get-started/sumo-logic-ui-classic). To access the archive page, in the main Sumo Logic menu select **Manage Data > Collection > Archive**.

### Details pane

Click on a table row to view the source details. This includes:

* **Name**
* **Description**
* **AWS S3 bucket**
* All **Ingestion jobs** that have been created on the source.
   * Each ingestion job shows the name, time window, and volume of data processed by the job. Click the **Open in Search** icon to the right of the job name to start a search against the data that was ingested by the job.
   * Hover your mouse over the information icon to view who created the job and when.
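
Each ingestion job shown in the details pane covers a single time window and, as listed in the rules above, that window can be at most 12 hours. Backfilling a longer period therefore means planning several jobs. The following is a minimal Python sketch of that planning step only. The function name `plan_ingestion_windows` is hypothetical and not part of any Sumo Logic API; the jobs themselves are still created from the archive page.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Maximum time range of one ingestion job, taken from the rules in this document.
MAX_JOB_WINDOW = timedelta(hours=12)


def plan_ingestion_windows(
    start: datetime,
    end: datetime,
    max_window: timedelta = MAX_JOB_WINDOW,
) -> List[Tuple[datetime, datetime]]:
    """Split [start, end) into consecutive windows no longer than max_window.

    Hypothetical helper for illustration; it only computes time ranges.
    """
    if end <= start:
        raise ValueError("end must be later than start")
    if max_window <= timedelta(0):
        raise ValueError("max_window must be positive")

    windows = []
    cursor = start
    while cursor < end:
        # The last window is shortened so it never runs past the requested end.
        window_end = min(cursor + max_window, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


# Example: a 30-hour backfill becomes three job windows.
for number, (window_start, window_end) in enumerate(
    plan_ingestion_windows(datetime(2019, 11, 15, 0, 0), datetime(2019, 11, 16, 6, 0)),
    start=1,
):
    print(f"job {number}: {window_start.isoformat()} to {window_end.isoformat()}")
```

Because a maximum of 2 concurrent ingestion jobs is supported, windows planned this way would be submitted no more than two at a time.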

## Create an ingestion job

:::note
A maximum of 2 concurrent jobs is supported.
:::

An ingestion job is a request to pull data from your S3 bucket. The job begins immediately and provides statistics on its progress. To ingest from your archive you need an AWS S3 archive source configured to access your AWS S3 bucket with the archived data.

1. [**New UI**](/docs/get-started/sumo-logic-ui). In the main Sumo Logic menu select **Data Management**, and then under **Data Collection** select **Archive**. You can also click the **Go To...** menu at the top of the screen and select **Archive**.

   [**Classic UI**](/docs/get-started/sumo-logic-ui-classic). In the main Sumo Logic menu, select **Manage Data > Collection > Archive**.
1. On the **Archive** page, search for and select the AWS S3 archive source that has access to your archived data.
1. Click **New Ingestion Job** and a window appears where you:
   1. Define a mandatory job name that is unique to your account.
   1. Select the date and time range of archived data to ingest. A maximum of 12 hours is supported.
   1. Click **Ingest Data** to begin ingestion. The status of the job is visible in the details pane of the source in the archive page.

### Job status

An ingestion job will have one of the following statuses:

* **Pending**. The job is queued before scanning has started.
* **Scanning**. The job is actively scanning for objects from your S3 bucket. Your objects could be ingesting in parallel.
* **Ingesting**. The job has completed scanning for objects and is still ingesting your objects.
* **Failed**. The job has failed to complete. Partial data may have been ingested and is searchable.
* **Succeeded**. The job completed ingesting and your data is searchable.

## Search ingested archive data

Once your archive data is ingested with an ingestion job you can search for it as you would any other data ingested into Sumo Logic. On the archive page find and select the archive S3 source that ran the ingestion job to ingest your archive data. In the [details pane](#details-pane), you can click the **Open in Search** link to open a search of the data that was ingested by the job.

:::note
When you search for data in the Frequent or Infrequent Tier, you must explicitly reference the partition.
:::

The metadata field `_archiveJob` is automatically created in your account and assigned to ingested archive data. This field does not count against your fields limit. Ingested archive data has the following metadata assignments:

| Field | Description |
|:----------------|:-------------------------------------|
| `_archiveJob` | The name of the ingestion job assigned to ingest your archive data. |
| `_archiveJobId` | The unique identifier of the ingestion job. |

## Audit ingestion job requests

The [Audit Event Index](/docs/manage/security/audit-indexes/audit-event-index) provides event logs in JSON when ingestion jobs are created, completed, and deleted.