Glue:Resource:aws_glue_crawler

Type: Resource

Tag: aws_glue_crawler

AWS Service: Glue

Description

Primarily used to create an AWS Glue crawler.

Examples

Data stored in DynamoDB

resource "aws_glue_crawler" "example" {
  database_name = aws_glue_catalog_database.example.name
  name          = "example"
  role          = aws_iam_role.example.arn

  dynamodb_target {
    path = "table-name"
  }
}

Data stored in a relational database, with metadata crawled over a JDBC connection

resource "aws_glue_crawler" "example" {
  database_name = aws_glue_catalog_database.example.name
  name          = "example"
  role          = aws_iam_role.example.arn

  jdbc_target {
    connection_name = aws_glue_connection.example.name
    path            = "database-name/%"
  }
}

Data stored in S3

resource "aws_glue_crawler" "example" {
  database_name = aws_glue_catalog_database.example.name
  name          = "example"
  role          = aws_iam_role.example.arn

  s3_target {
    path = "s3://${aws_s3_bucket.example.bucket}"
  }
}

Data stored in the Glue Data Catalog

resource "aws_glue_crawler" "example" {
  database_name = aws_glue_catalog_database.example.name
  name          = "example"
  role          = aws_iam_role.example.arn

  catalog_target {
    database_name = aws_glue_catalog_database.example.name
    tables        = [aws_glue_catalog_table.example.name]
  }

  schema_change_policy {
    delete_behavior = "LOG"
  }

  configuration = <<EOF
{
  "Version":1.0,
  "Grouping": {
    "TableGroupingPolicy": "CombineCompatibleSchemas"
  }
}
EOF
}

Data stored in MongoDB

resource "aws_glue_crawler" "example" {
  database_name = aws_glue_catalog_database.example.name
  name          = "example"
  role          = aws_iam_role.example.arn

  mongodb_target {
    connection_name = aws_glue_connection.example.name
    path            = "database-name/%"
  }
}

Example crawler configuration settings

resource "aws_glue_crawler" "events_crawler" {
  database_name = aws_glue_catalog_database.glue_database.name
  schedule      = "cron(0 1 * * ? *)"
  name          = "events_crawler_${var.environment_name}"
  role          = aws_iam_role.glue_role.arn
  tags          = var.tags

  configuration = jsonencode(
    {
      Grouping = {
        TableGroupingPolicy = "CombineCompatibleSchemas"
      }
      CrawlerOutput = {
        Partitions = { AddOrUpdateBehavior = "InheritFromTable" }
      }
      Version = 1
    }
  )

  s3_target {
    path = "s3://${aws_s3_bucket.data_lake_bucket.bucket}"
  }
}

Arguments (to be translated)

Note: Must specify at least one of dynamodb_target, jdbc_target, s3_target or catalog_target.

Dynamodb Target

JDBC Target

S3 Target

Catalog Target

Note: The delete_behavior of a catalog target doesn't support DEPRECATE_IN_DATABASE.

Note: configuration for catalog target crawlers will have { ... "Grouping": { "TableGroupingPolicy": "CombineCompatibleSchemas" } } by default.

MongoDB Target

Schema Change Policy
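
A minimal sketch of the schema_change_policy block, reusing the S3 example above; update_behavior accepts LOG or UPDATE_IN_DATABASE, and delete_behavior accepts LOG, DELETE_FROM_DATABASE, or DEPRECATE_IN_DATABASE:

resource "aws_glue_crawler" "example" {
  database_name = aws_glue_catalog_database.example.name
  name          = "example"
  role          = aws_iam_role.example.arn

  s3_target {
    path = "s3://${aws_s3_bucket.example.bucket}"
  }

  # Control how the crawler reacts to changed and deleted objects in the source data.
  schema_change_policy {
    update_behavior = "UPDATE_IN_DATABASE" # or "LOG"
    delete_behavior = "DEPRECATE_IN_DATABASE" # or "LOG" / "DELETE_FROM_DATABASE"
  }
}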

Lineage Configuration
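
A minimal sketch of the lineage_configuration block, which switches crawler lineage collection between ENABLE and DISABLE (DISABLE is the default):

resource "aws_glue_crawler" "example" {
  database_name = aws_glue_catalog_database.example.name
  name          = "example"
  role          = aws_iam_role.example.arn

  s3_target {
    path = "s3://${aws_s3_bucket.example.bucket}"
  }

  # Record data lineage for this crawler's runs.
  lineage_configuration {
    crawler_lineage_settings = "ENABLE"
  }
}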

Recrawl Policy
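
A minimal sketch of the recrawl_policy block for an S3 target; CRAWL_NEW_FOLDERS_ONLY limits later runs to folders added since the last crawl, while the default CRAWL_EVERYTHING rescans the whole path:

resource "aws_glue_crawler" "example" {
  database_name = aws_glue_catalog_database.example.name
  name          = "example"
  role          = aws_iam_role.example.arn

  s3_target {
    path = "s3://${aws_s3_bucket.example.bucket}"
  }

  # Incremental crawls are typically paired with LOG schema-change behaviors.
  schema_change_policy {
    update_behavior = "LOG"
    delete_behavior = "LOG"
  }

  # Only crawl folders that were added since the last crawl run.
  recrawl_policy {
    recrawl_behavior = "CRAWL_NEW_FOLDERS_ONLY"
  }
}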

Attribute Reference

Resource Import

To import a Glue crawler, replace the ${crawler_job} parameter in the command below with the crawler's name and run it:

$ terraform import aws_glue_crawler.${crawler_job} ${crawler_job}
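
For example, to import the crawler named example from the samples above into the aws_glue_crawler.example resource:

$ terraform import aws_glue_crawler.example example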

Source: https://www.cnblogs.com/lyk-sx/p/15587605.html