首页 > 数据库> > 带有自引用查询的mysql更新

带有自引用查询的mysql更新

2019-06-26 07:01:11 作者：互联网

我有一个调查表,其中包含(以及其他)以下列

survey_id  - unique id
user_id    - the id of the person the survey relates to
created    - datetime
ip_address - of the submission
ip_count   - the number of duplicates

由于记录集很大,因此动态运行此查询是不切实际的,因此尝试创建一个更新语句,该语句将在ip_count中定期存储“缓存”结果.

ip_count的目的是显示重复的ip_address调查提交的数量已经收到相同的user_id,期限为12个月(/ – 创建日期的6个月).

使用以下数据集,这是预期的结果.

survey_id   user_id    created    ip_address     ip_count  #counted duplicates survey_id
  1            1      01-Jan-12   123.132.123       1      # 2
  2            1      01-Apr-12   123.132.123       2      # 1, 3
  3            2      01-Jul-12   123.132.123       0      # 
  4            1      01-Aug-12   123.132.123       3      # 2, 6
  6            1      01-Dec-12   123.132.123       1      # 4

这是我迄今为止提出的最接近的解决方案,但是这个查询没有考虑到日期限制并且努力想出另一种方法.

UPDATE surveys
JOIN(
  SELECT ip_address, created, user_id, COUNT(*) AS total
  FROM surveys  
  WHERE surveys.state IN (1, 3) # survey is marked as completed and confirmed
  GROUP BY ip_address, user_id
) AS ipCount 
  ON (
    ipCount.ip_address = surveys.ip_address
    AND ipCount.user_id = surveys.user_id
    AND ipCount.created BETWEEN (surveys.created - INTERVAL 6 MONTH) AND (surveys.created + INTERVAL 6 MONTH)
  )
SET surveys.ip_count = ipCount.total - 1 # minus 1 as this query will match on its own id.
WHERE surveys.ip_address IS NOT NULL # ignore surveys where we have no ip_address

谢谢你的帮助提前:)

解决方法:

我没有你的桌子,所以我很难形成一个肯定有用的正确的sql,但我可以为此拍摄,希望能够帮助你…

首先,我需要对自己进行调查的笛卡尔积,并过滤掉我不想要的行

select s1.survey_id x, s2.survey_id y from surveys s1, surveys s2 where s1.survey_id != s2.survey_id and s1.ip_address = s2.ip_address and (s1.created and s2.created fall 6 months within each other)

此输出应包含匹配(根据您的规则)TWICE的每对调查(一次针对第一个位置的每个id,一次针对它位于第二个位置)

然后我们可以在这个输出上做一个GROUP BY来得到一个表,它基本上给了我每个survey_id正确的ip_count

(select x, count(*) c from (select s1.survey_id x, s2.survey_id y from surveys s1, surveys s2 where s1.survey_id != s2.survey_id and s1.ip_address = s2.ip_address and (s1.created and s2.created fall 6 months within each other)) group by x)

所以现在我们有一个表将每个survey_id映射到正确的ip_count.要更新原始表,我们需要将其与此连接并复制值

所以应该看起来像

UPDATE surveys SET s.ip_count = n.c from surveys s inner join (ABOVE QUERY) n on s.survey_id = n.x

那里有一些伪代码,但我认为一般的想法应该有效

我之前从未根据另一个查询的输出更新表.试图从这个问题中猜测正确的语法 – How do I UPDATE from a SELECT in SQL Server?

此外,如果我需要为自己的工作做这样的事情,我不会尝试在单个查询中执行此操作.这将很难维护并且可能存在内存/性能问题.最好让脚本逐行遍历表,在事务中的单行更新,然后再转到下一行.更慢,但更容易理解,可能更轻松的数据库.

标签：mysql,self-join
来源： https://codeday.me/bug/20190626/1291259.html