How to import JSON from a file on Cloud Storage into BigQuery
I'm trying to import a file (json.txt) from Cloud Storage into BigQuery via the API, and it throws errors. When the same import is done through the web UI, it works with no errors (I even set maxBadRecords=0). Can someone tell me what I'm doing wrong here? Is the code wrong, or do I need to change some setting in BigQuery somewhere?
The file is a plain-text UTF-8 file with the content below. I stuck to the documentation on BigQuery and JSON imports.
{"person_id":225,"person_name":"John","object_id":1}
{"person_id":226,"person_name":"John","object_id":1}
{"person_id":227,"person_name":"John","object_id":null}
{"person_id":229,"person_name":"John","object_id":1}
When importing, the job throws the following error, "Value cannot be converted to expected type.", for every single line:
{
"reason": "invalid",
"location": "Line:15 / Field:1",
"message": "Value cannot be converted to expected type."
},
{
"reason": "invalid",
"location": "Line:16 / Field:1",
"message": "Value cannot be converted to expected type."
},
{
"reason": "invalid",
"location": "Line:17 / Field:1",
"message": "Value cannot be converted to expected type."
},
{
"reason": "invalid",
"location": "Line:18 / Field:1",
"message": "Value cannot be converted to expected type."
},
{
"reason": "invalid",
"message": "Too many errors encountered. Limit is: 10."
}
]
},
"statistics": {
"creationTime": "1384484132723",
"startTime": "1384484142972",
"endTime": "1384484182520",
"load": {
"inputFiles": "1",
"inputFileBytes": "960",
"outputRows": "0",
"outputBytes": "0"
}
}
}
The file is accessible here:
http://www.sendspace.com/file/7q0o37
My code and schema are as follows:
def insert_and_import_table_in_dataset(tar_file, table, dataset=DATASET)
  # Load job configuration: read newline-delimited JSON from Cloud Storage
  # into the destination table, creating the table if necessary.
  config = {
    'configuration' => {
      'load' => {
        'sourceUris' => ["gs://test-bucket/#{tar_file}"],
        'schema' => {
          'fields' => [
            { 'name' => 'person_id',   'type' => 'INTEGER', 'mode' => 'nullable' },
            { 'name' => 'person_name', 'type' => 'STRING',  'mode' => 'nullable' },
            { 'name' => 'object_id',   'type' => 'INTEGER', 'mode' => 'nullable' }
          ]
        },
        'destinationTable' => {
          'projectId' => @project_id.to_s,
          'datasetId' => dataset,
          'tableId' => table
        },
        'sourceFormat' => 'NEWLINE_DELIMITED_JSON',
        'createDisposition' => 'CREATE_IF_NEEDED',
        'maxBadRecords' => 10,
      }
    },
  }
  # Insert the load job via the BigQuery API.
  result = @client.execute(
    :api_method => @bigquery.jobs.insert,
    :parameters => {
      # 'uploadType' => 'resumable',
      :projectId => @project_id.to_s,
      :datasetId => dataset
    },
    :body_object => config
  )
  # upload = result.resumable_upload
  # @client.execute(upload) if upload.resumable?
  puts result.response.body
  json = JSON.parse(result.response.body)
  # Poll the job until it reaches the DONE state.
  while true
    job_status = get_job_status(json['jobReference']['jobId'])
    if job_status['status']['state'] == 'DONE'
      puts "DONE"
      return true
    else
      puts job_status['status']['state']
      puts job_status
      sleep 5
    end
  end
end
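A note for anyone debugging a load like this: the polling loop above only prints the job state, while the per-record failures (like the "Value cannot be converted to expected type." entries shown earlier) live under status.errors in the job resource. A minimal sketch, assuming get_job_status returns the parsed job resource as a Hash as the loop above implies; print_job_errors is a hypothetical helper, not part of any API:

def print_job_errors(job)
  status = job['status'] || {}
  # errorResult is set when the job as a whole failed.
  puts "Job failed: #{status['errorResult']['message']}" if status['errorResult']
  # errors lists individual bad records, e.g. "Line:15 / Field:1".
  (status['errors'] || []).each do |e|
    puts "#{e['location']}: #{e['message']}"
  end
end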
Can someone tell me what I'm doing wrong? What do I need to fix, and where?
Also, at some point in the future I'd like to use compressed files and import from those. Is "tar.gz" okay for that, or do I need to make it a ".gz"?
Thanks in advance for your help. Appreciate it.
Solution:
A lot of people (myself included) have been hit by the same thing you're hitting:
you are importing a JSON file but not specifying an import format, so it defaults to CSV.
If you set configuration.load.sourceFormat to NEWLINE_DELIMITED_JSON, you should be good to go.
We have a bug filed to make this harder to do, or at least to detect when the file is the wrong type, but I'll bump its priority.
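For reference, the crucial part of the load configuration looks like this. The bucket, project, dataset, and table names below are placeholders, and the sketch assumes the destination table already exists (otherwise a schema block is required, as in the question's code):

# Without 'sourceFormat', BigQuery assumes CSV and rejects JSON rows
# with "Value cannot be converted to expected type."
config = {
  'configuration' => {
    'load' => {
      'sourceUris'   => ['gs://your-bucket/your-file.json'],
      'sourceFormat' => 'NEWLINE_DELIMITED_JSON',
      'destinationTable' => {
        'projectId' => 'your-project',
        'datasetId' => 'your_dataset',
        'tableId'   => 'your_table'
      }
    }
  }
}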
Tags: json, python, import, ruby, google-bigquery. Source: https://codeday.me/bug/20190629/1324451.html