sql - Excluding outlier records in SQL

Question

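I have a data set from which I need the average of a column. Running select avg(x) from y does the trick; however, I need a more accurate figure.

I figured I need a way to filter out records whose values are too high or too low (spikes), so that they can be excluded when the average is calculated.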

score 3 · Accepted Answer

There are three types of average (mean, median and mode), and the one you are currently using is the mean: the sum of all the values divided by the number of values.
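A minimal sketch of how a single spike pulls the mean but not the median. It assumes SQL Server 2012 or later (for PERCENTILE_CONT) and uses a hypothetical table variable `@y` with a column `x`, named after the `avg(x) from y` query in the question:

```sql
-- Hypothetical sample data: five ordinary values and one spike (95).
DECLARE @y TABLE (x int NOT NULL);
INSERT INTO @y (x) VALUES (10), (11), (12), (12), (13), (95);

-- AVG over an int column returns an int in SQL Server, so cast first.
-- PERCENTILE_CONT is only available as a window function, hence OVER ();
-- every row carries the same two values, so TOP (1) just picks one row.
SELECT TOP (1)
       AVG(CAST(x AS float)) OVER ()                          AS mean_x,
       PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY x) OVER () AS median_x
FROM @y;
```

With this data the mean is 25.5 while the median is 12, because the median depends only on the middle values, not on how far out the spike lies.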

You might find it more useful to get the mode, the most frequently occurring value. For example, against the SQL Server Agent job history in msdb:

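-- Mode of run_duration per job: the value that occurs most often among
-- the job-outcome rows of the SQL Server Agent history in msdb.
select j.name,
       (select top 1 h.run_duration
        from msdb.dbo.sysjobhistory h
        where h.step_id = 0          -- step 0 = overall job outcome
        and h.job_id = j.job_id
        group by h.run_duration
        order by count(*) desc) as run_duration
from msdb.dbo.sysjobs j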
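The same idea applied to the question's generic y/x shape, again assuming a hypothetical table variable `@y` (SQL Server 2008 or later for the multi-row VALUES list). `WITH TIES` is an addition not in the query above: it returns every value that shares the highest count instead of an arbitrary one of them:

```sql
-- Hypothetical sample data; 12 occurs twice, every other value once.
DECLARE @y TABLE (x int NOT NULL);
INSERT INTO @y (x) VALUES (10), (11), (12), (12), (13), (95);

-- Mode: group identical values, then keep the group(s) with the highest count.
SELECT TOP (1) WITH TIES
       x        AS mode_x,
       COUNT(*) AS occurrences
FROM @y
GROUP BY x
ORDER BY COUNT(*) DESC;
```

Without `WITH TIES`, SQL Server is free to return any one of the tied values when several share the top count.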

If you do want to discard values that lie more than one standard deviation from the mean, you could compute the average and the standard deviation in a subquery, eliminate the values outside the range average ± standard deviation, and then average the remaining values. Be aware, though, that you start running the risk of getting meaningless values:

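-- Average run_duration per job, keeping only runs within one standard
-- deviation of that job's mean. Note: run_duration is stored as an
-- HHMMSS-encoded int, not as seconds.
select oh.job_id, avg(oh.run_duration) as avg_duration
from msdb.dbo.sysjobhistory oh
inner join (select h.job_id,
                   avg(h.run_duration) as avgduration,
                   stdev(h.run_duration) as stdev_duration
            from msdb.dbo.sysjobhistory h
            where h.step_id = 0      -- same rows as the outer query
            group by h.job_id) as m on m.job_id = oh.job_id
where oh.step_id = 0
and abs(oh.run_duration - m.avgduration) < m.stdev_duration
group by oh.job_id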
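A sketch of the same filter on the question's y/x shape, assuming a hypothetical table variable `@y` and a hypothetical band-width multiplier `@k`. The answer's query corresponds to `@k = 1`; a wider band such as 2 discards fewer values, which is one way to reduce the risk of meaningless results. Window aggregates with `OVER ()` stand in for the join to a grouped subquery:

```sql
-- Hypothetical sample data with one spike (95).
DECLARE @y TABLE (x int NOT NULL);
INSERT INTO @y (x) VALUES (10), (11), (12), (12), (13), (95);

-- Hypothetical band width: keep values within @k standard deviations of the mean.
DECLARE @k float = 1.0;

-- Attach the overall mean and sample standard deviation to every row.
WITH stats AS (
    SELECT x,
           AVG(CAST(x AS float)) OVER () AS mean_x,
           STDEV(x) OVER ()              AS stdev_x
    FROM @y
)
-- Average only the rows that fall inside mean ± @k * stdev.
SELECT AVG(CAST(x AS float)) AS filtered_mean_x
FROM stats
WHERE ABS(x - mean_x) <= @k * stdev_x;
```

For this data the mean is 25.5 and the sample standard deviation is about 34.1, so only 95 falls outside the band and the filtered mean is 11.6.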

score 1 · Accepted Answer


SQL Server also has a STDEV function, so that might help...
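For reference, a minimal sketch of the two standard-deviation aggregates in SQL Server, using a hypothetical table variable `@y`: STDEV is the sample standard deviation (divides by n − 1) and STDEVP is the population standard deviation (divides by n):

```sql
-- Hypothetical sample data with one spike (95).
DECLARE @y TABLE (x int NOT NULL);
INSERT INTO @y (x) VALUES (10), (11), (12), (12), (13), (95);

-- STDEV: sample standard deviation; STDEVP: population standard deviation.
-- Like other aggregates, both ignore NULL values.
SELECT AVG(CAST(x AS float)) AS mean_x,
       STDEV(x)              AS stdev_sample,
       STDEVP(x)             AS stdev_population
FROM @y;
```

Both figures are inflated by the spike, which is why a mean ± one standard deviation band built from them is still wide enough to keep all five ordinary values here.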

Answered 2008-12-09T11:07:26.940
