Select distinct count is really slow

Author: Internet

I have a loop over roughly 7,000 objects, and inside the loop I need to get the distinct count of a list of structs. Currently I am using:

foreach (var product in productsToSearch)
{
    Console.WriteLine("Time elapsed: {0} start", stopwatch.Elapsed);
    var cumulativeCount = 0;
    productStore.Add(product);
    var orderLinesList = totalOrderLines
        .Where(myRows => productStore.Contains(myRows.Sku))
        .Select(myRows => new OrderLineStruct
        {
            OrderId = myRows.OrderId,
            Sku = myRows.Sku
        });
    var differences = totalOrderLines.Except(orderLinesList);
    cumulativeCount = totalOrderLinesCount - differences.Select(x => x.OrderId).Distinct().Count();
    cumulativeStoreTable.Rows.Add(product, cumulativeCount);
    Console.WriteLine("Time elapsed: {0} end", stopwatch.Elapsed);
}

public struct OrderLineStruct
{
    public string OrderId { get; set; }
    public string Sku { get; set; }
}

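Getting the distinct count is very slow. Does anyone know a more efficient way to do this? I tried MoreLinq, which adds a DistinctBy method to LINQ, but it was not noticeably faster. I have also played with PLINQ, but I am not sure where I could parallelize this query.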

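So the timing for each iteration of the loop is:

Time elapsed: 00:00:37.1142047 start

Time elapsed: 00:00:37.8310148 end

= 0.7168101 seconds per iteration
* 7000 iterations = 5017.6707 seconds (83.627845 minutes)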

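It is the Distinct().Count() line that takes the most time (about 0.5 seconds). The differences variable holds several hundred thousand OrderLineStruct values, so any LINQ query over it is slow.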

Update

I modified the loop a little, and it now runs in about 10 minutes instead of more than an hour:

foreach (var product in productsToSearch)
{
    var cumulativeCount = 0;
    productStore.Add(product);
    var orderLinesList = totalOrderLines
        .Join(productStore, myRows => myRows.Sku, p => p, (myRows, p) => myRows)
        .Select(myRows => new OrderLineStruct
        {
            OrderId = myRows.OrderId,
            Sku = myRows.Sku
        });
    totalOrderLines = totalOrderLines.Except(orderLinesList).ToList();
    cumulativeCount = totalOrderLinesCount - totalOrderLines.Select(x => x.OrderId).Distinct().Count();
    cumulativeStoreTable.Rows.Add(product, cumulativeCount);
}

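Having .ToList() on the Except seems to make a difference, and I now remove the already processed order lines after each iteration, which improves the performance of every later iteration.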
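A small sketch of why the .ToList() call can matter: a LINQ to Objects query is re-run every time it is enumerated, while a list produced by ToList() is computed once and then only read. The `MaterializeDemo` class, the `Source` iterator and the `SourceReads` counter are made up for illustration and are not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MaterializeDemo
{
    // Counts how many items have been pulled from the source sequence.
    public static int SourceReads;

    private static IEnumerable<int> Source()
    {
        for (int i = 0; i < 5; i++)
        {
            SourceReads++;
            yield return i;
        }
    }

    // Returns the number of source reads for two Count() calls on a deferred
    // query, and for two Count reads on a list materialized with ToList().
    public static (int Deferred, int Materialized) Run()
    {
        SourceReads = 0;
        IEnumerable<int> deferred = Source().Where(n => n % 2 == 0);
        int first = deferred.Count();   // enumerates the source
        int second = deferred.Count();  // enumerates the source again
        int deferredReads = SourceReads;

        SourceReads = 0;
        List<int> materialized = Source().Where(n => n % 2 == 0).ToList(); // enumerates once
        int third = materialized.Count;   // stored length, no enumeration
        int fourth = materialized.Count;
        int materializedReads = SourceReads;

        return (deferredReads, materializedReads);
    }

    public static void Main()
    {
        var (deferred, materialized) = Run();
        Console.WriteLine($"deferred: {deferred} reads, materialized: {materialized} reads");
    }
}
```

With five source items, the deferred query reads the source ten times for two counts, and the materialized list reads it five times in total.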

Answer:

You are looking for the problem in the wrong place.

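orderLinesList, differences and differences.Select(x => x.OrderId).Distinct() are just chained LINQ to Objects query methods with deferred execution. None of them does any work until the Count() method enumerates the chain, so Count() is where all of their cost shows up.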
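To make the deferred execution point concrete, here is a minimal sketch with made-up data. `DeferredExecutionDemo` and its `PredicateCalls` counter are hypothetical names used only to show when the Where predicate actually runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how many times the Where predicate has been invoked.
    public static int PredicateCalls;

    // Returns the number of predicate calls made before Count() was called.
    public static int Run()
    {
        PredicateCalls = 0;
        var numbers = new List<int> { 1, 2, 2, 3, 3, 3 };

        // Building the chain does no work yet: nothing is enumerated here.
        IEnumerable<int> query = numbers
            .Where(n => { PredicateCalls++; return n > 1; })
            .Distinct();
        int callsBeforeCount = PredicateCalls;   // still 0

        // Count() enumerates the whole chain, so all of the cost lands here.
        int distinctCount = query.Count();       // 2 distinct values: 2 and 3

        Console.WriteLine(
            $"before Count(): {callsBeforeCount}, after: {PredicateCalls}, distinct: {distinctCount}");
        return callsBeforeCount;
    }

    public static void Main() => Run();
}
```

A stopwatch placed around Count() alone would therefore blame that line for work that really belongs to the Where, Except and Distinct steps in front of it.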

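Your processing algorithm is very inefficient. The bottleneck is the orderLinesList query, which iterates the whole totalOrderLines list for every product. That query is in turn chained into (included in) the Except, Distinct and so on, and all of this happens inside the loop, i.e. 7000 times.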

Here is a sample of an efficient algorithm that IMO does the same thing:

Console.WriteLine("Time elapsed: {0} start", stopwatch.Elapsed);
var productInfo =
(
    from product in productsToSearch
    join line in totalOrderLines on product equals line.Sku into orderLines
    select new { Product = product, OrderLines = orderLines }
).ToList();
var lastIndexByOrderId = new Dictionary<string, int>();
for (int i = 0; i < productInfo.Count; i++)
{
    foreach (var line in productInfo[i].OrderLines)
        lastIndexByOrderId[line.OrderId] = i; // Last wins
}
int cumulativeCount = 0;
for (int i = 0; i < productInfo.Count; i++)
{
    var product = productInfo[i].Product;
    foreach (var line in productInfo[i].OrderLines)
    {
        int lastIndex;
        if (lastIndexByOrderId.TryGetValue(line.OrderId, out lastIndex) && lastIndex == i)
        {
            cumulativeCount++;
            lastIndexByOrderId.Remove(line.OrderId);
        }
    }
    cumulativeStoreTable.Rows.Add(product, cumulativeCount);
    // Remove the next line if it was only there to support your processing
    productStore.Add(product);
}
Console.WriteLine("Time elapsed: {0} end", stopwatch.Elapsed);
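To see how the two passes produce the running total, here is a small self-contained sketch of the same idea on made-up data. The `CumulativeOrderCount` class, its `Compute` helper, the tuple shape and the sample orders are assumptions for illustration only. It groups the lines with ToLookup rather than the group join used above, and it assumes every order line's SKU appears in the product list; an order with a line outside that list would still be counted here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CumulativeOrderCount
{
    // Hypothetical helper: for each product, in order, returns how many orders
    // have all of their lines covered by the products seen so far.
    // Assumes every line's SKU occurs in `products`.
    public static List<int> Compute(
        IList<string> products,
        IEnumerable<(string OrderId, string Sku)> lines)
    {
        // Group the lines by SKU once, instead of rescanning them per product.
        var linesBySku = lines.ToLookup(l => l.Sku);

        // Pass 1: remember the index of the last product each order needs.
        var lastIndexByOrderId = new Dictionary<string, int>();
        for (int i = 0; i < products.Count; i++)
            foreach (var line in linesBySku[products[i]])
                lastIndexByOrderId[line.OrderId] = i; // last wins

        // Pass 2: an order becomes complete at its last index; count it once.
        var result = new List<int>();
        int cumulativeCount = 0;
        for (int i = 0; i < products.Count; i++)
        {
            foreach (var line in linesBySku[products[i]])
            {
                if (lastIndexByOrderId.TryGetValue(line.OrderId, out int last) && last == i)
                {
                    cumulativeCount++;
                    lastIndexByOrderId.Remove(line.OrderId);
                }
            }
            result.Add(cumulativeCount);
        }
        return result;
    }

    public static void Main()
    {
        var products = new[] { "A", "B", "C" };
        var lines = new[]
        {
            (OrderId: "o1", Sku: "A"),
            (OrderId: "o2", Sku: "A"), (OrderId: "o2", Sku: "B"),
            (OrderId: "o3", Sku: "C"), (OrderId: "o3", Sku: "A"),
        };
        // Prints "1, 2, 3": o1 completes at A, o2 at B, o3 at C.
        Console.WriteLine(string.Join(", ", Compute(products, lines)));
    }
}
```

Each order line is visited a fixed number of times regardless of how many products there are, which is where the saving over rescanning the full list 7000 times comes from.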

Tags: plinq, morelinq, linq, c
Source: https://codeday.me/bug/20191027/1943482.html