5

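On Toyota's manufacturing lines, they always know what path a part has traveled, so that they can be sure they can fix it if something goes wrong. Is this applicable to software too?

Every error message should tell me exactly which path it traveled. Some do: the error messages with a stack trace. Is this a correct interpretation? Could it be used anywhere else?

OK, here is the podcast. I think it is interesting.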

http://itc.conversationsnetwork.org/shows/detail3798.html

4

4 Answers

5

A good idea where practicable. Unfortunately, it is usually prohibitively difficult to keep track of the entire history of the state of the machine. You just can't tag each data structure with where you got it from, and the entire state of that object. You might be able to store just the external events and in that way reproduce where everything came from.

Some examples:

I did work on a project where it was practicable and it helped immensely. When we were getting close to shipping, and running out of bugs to fix, we would have our game play in "zero players mode", where the computer would repeatedly play itself all night long with all variations of characters and locales. If it asserted, it would display the random key that started the match. When we came to work in the morning we'd write the key down from our screen (there usually was one) and start it again using that key. Then we'd just watch it until the assert came up, and track it down. The important thing is that we could recreate all the original inputs that led to the error, and rerun it as many times as we wanted, even after recompiles (within limits... the number of fetches from the random number generator could not be changed, although we had a separate RNG for non-game stuff like visual fx). This only worked because each match started after a warm reboot and took only a very small amount of data as input.
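To make the "random key" idea concrete, here is a minimal Python sketch. It is not our actual game code: `play_match`, `overnight_run`, and the health invariant are all made up for illustration. The point is only that every gameplay decision draws from one RNG seeded by the key, while cosmetic effects use a separate RNG.

```python
import random

def play_match(seed: int, rounds: int = 1000) -> int:
    """Simulate one hypothetical match driven entirely by the seed."""
    game_rng = random.Random(seed)   # gameplay RNG: fully determined by the key
    fx_rng = random.Random()         # separate, unseeded RNG for visual effects
    health = 100
    for _ in range(rounds):
        health = min(100, health + game_rng.randint(-10, 10))
        _sparkle = fx_rng.random()   # cosmetic only; never touches game_rng
        # Stand-in for a real game invariant that occasionally breaks.
        assert health > 0, f"health dropped to {health} (seed={seed})"
    return health

def overnight_run(first_seed: int, matches: int):
    """Play many matches; return the seed of the first failing one, or None."""
    for seed in range(first_seed, first_seed + matches):
        try:
            play_match(seed)
        except AssertionError as err:
            print(f"assert hit, replay with seed {seed}: {err}")
            return seed
    return None
```

Calling `play_match` again with the reported seed replays the same sequence of draws, so the assert fires at the same point every time. That holds only as long as the number of draws from `game_rng` does not change between builds.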

I have heard that Bungie used a similar method to try to discover bad geometry in their Halo levels. They would set the dev kits running overnight in a special mode where the indestructible protagonist would move and jump randomly. In the morning they'd look and see if he got stuck in the geometry at some location where he couldn't get out. There may have been grenades involved, too.

On another project we actually logged all user interaction with a timestamp so we could replay it. That works great if you can, but most people have interactions with a changing DB whose entire state might not be stored so easily.
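A rough sketch of that kind of interaction log, in Python. The `Recorder` class, the event kinds, and the `total` state are invented for the example. It also assumes the whole application state can be rebuilt from the events alone, which is exactly the assumption that breaks once an external DB is involved.

```python
import json
import time

class Recorder:
    """Collects user events with a timestamp so they can be replayed later."""
    def __init__(self):
        self.events = []

    def record(self, kind, payload, now=None):
        stamp = time.time() if now is None else now
        self.events.append({"t": stamp, "kind": kind, "payload": payload})

    def dump(self) -> str:
        # One JSON object per line, in the order the events happened.
        return "\n".join(json.dumps(e) for e in self.events)

def apply_event(state, event):
    """Hypothetical application logic: the only place state is changed."""
    if event["kind"] == "add":
        state["total"] += event["payload"]
    elif event["kind"] == "reset":
        state["total"] = 0
    return state

def replay(log_text: str):
    """Rebuild the state from scratch by re-applying every logged event."""
    state = {"total": 0}
    for line in log_text.splitlines():
        state = apply_event(state, json.loads(line))
    return state
```

Because all state changes go through `apply_event`, feeding a saved log to `replay` walks through the same sequence of states the user saw.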

answered 2008-09-21T12:57:24.047
2

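It's not so important in software. If something fails in software, you can usually reproduce the failure and analyse it. Even if it only happens 1 time in 1000, you can often switch all the logging on and run it 1000 times (a simple soak test).

That's far more costly and time-consuming on a manufacturing line, to the point of being impossible.

Having as much information as possible available when something goes wrong the first time isn't a bad thing, but it's not as critical as it is for Toyota.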
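The soak test mentioned above could look roughly like this in Python. `flaky_operation` is a made-up stand-in for whatever fails about 1 time in 1000; only the shape of the loop matters.

```python
import logging
import random

log = logging.getLogger("soak")

def flaky_operation(rng):
    """Hypothetical operation that fails roughly 1 time in 1000."""
    value = rng.randrange(1000)
    log.debug("drew value=%d", value)
    if value == 0:
        raise RuntimeError("rare failure")
    return value

def soak(runs: int, seed: int = 0):
    """Run the operation repeatedly with all logging on; collect failing runs."""
    logging.basicConfig(level=logging.DEBUG)  # switch all the logging on
    rng = random.Random(seed)
    failures = []
    for i in range(runs):
        try:
            flaky_operation(rng)
        except RuntimeError:
            log.exception("run %d failed", i)  # logs the stack trace as well
            failures.append(i)
    return failures
```

With enough runs the rare failure shows up with its full debug log around it, which is something you cannot cheaply do with physical parts.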

answered 2008-09-21T12:19:02.627
0

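This is a good approach. However, be careful not to log too much. Otherwise you won't be able to find the interesting information in the noise, and overall performance will suffer (for example, from the creation of anonymous objects in some languages).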
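One way to limit that cost, sketched in Python with the standard `logging` module: pass the arguments to the logger instead of building the message yourself, so the expensive string is only produced when something actually emits the record. `ExpensiveDump` and `process` are hypothetical names for the example.

```python
import logging

log = logging.getLogger("app")

class ExpensiveDump:
    """Defers building a large debug string until it is actually formatted."""
    def __init__(self, items):
        self.items = items
        self.rendered = 0  # counts how often the string was really built

    def __str__(self):
        self.rendered += 1
        return ", ".join(repr(i) for i in self.items)

def process(items):
    dump = ExpensiveDump(items)
    # The %s argument is formatted only if a handler emits this record;
    # with DEBUG disabled, __str__ is never called.
    log.debug("processing %s", dump)
    return dump.rendered
```

Writing `log.debug(f"processing {dump}")` instead would build the string on every call, whether or not anyone reads it.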

answered 2008-09-21T12:18:39.897
0

Producing error messages with a full stack trace is usually bad security practice.
On the other hand, and more in line with Toyota's intent, every developed module should be traceable back to its original programmer(s), who should be held accountable for shoddy work, bug fixes, security vulnerabilities, etc. Not for disciplinary purposes, but for maintenance and, if necessary, education. And maybe for bonuses, in the opposite situation... ;-)
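On the stack trace point: one common compromise is to keep the full trace in an internal log and show the user only an opaque incident ID. A minimal Python sketch, where `handle_request` and the response dictionary are assumptions for illustration and not any particular framework's API:

```python
import logging
import traceback
import uuid

log = logging.getLogger("errors")

def handle_request(handler):
    """Call handler(); on failure, log the trace internally and hide it from the user."""
    try:
        return {"status": 200, "body": handler()}
    except Exception:
        incident_id = uuid.uuid4().hex
        # The full path the error traveled goes to the internal log only.
        log.error("incident %s\n%s", incident_id, traceback.format_exc())
        # The user sees just an ID that support can look up in the log.
        return {"status": 500, "body": f"Internal error (incident {incident_id})"}
```

The developers still get the complete path for every failure, and nothing about file names or internals leaks into the error message itself.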

answered 2008-09-21T12:56:59.713