Bard versus ChatGPT on DICOM Medical Imaging
Google, after having been (in my opinion) roundly trounced by the abrupt and explosive mainstream adoption of OpenAI’s ChatGPT service, now appears to be competing aggressively to win back the hundreds of millions of lost AI users with the introduction of its new “Bard” chat service.
One interesting aspect of evaluating competing AI chatbot systems is that, similar to the search engine wars of yore, you can literally prompt each system with the exact same text and then simply compare the quality of the responses rendered.
Of course, any assessment of the “quality” of information flowing from these AI systems is pretty subjective, but I think that for certain topics, the utility and accuracy can be pretty obvious.
Building off of my prior posts on ChatGPT’s comprehension of the DICOM Standard, I chose DICOM, yet again, as the topic with which to challenge each system and determine which AI technology gave the better responses.
The following is a comparison of each system using the same prompts and nearly identical prompt flow, though you will see that I had to prod Bard a bit more in order to guide it to completing the task.
Keep in mind that, while I will offer my assessment of each response, you don’t need to be a DICOM expert to see the quality difference in the responses.
ME: Can you explain the purpose of the Basic Offset Table in the DICOM Standard?
Bard: Sure. The Basic Offset Table (BOT) is a data structure in the DICOM Standard that is used to store the offsets to the individual frames of an encapsulated multi-frame image. The BOT is located at the beginning of the encapsulated image data, and it contains a sequence of 32-bit unsigned integer values that represent the byte offsets to the start of each frame. The BOT is used by decoders to quickly locate the individual frames of an encapsulated image.
The BOT is not required for all encapsulated images. For example, if an encapsulated image contains only one frame, then the BOT is not necessary. However, for multi-frame images, the BOT is essential for efficient decoding.
The BOT is a valuable tool for developers who are working with DICOM images. By understanding the BOT, developers can quickly and easily decode encapsulated multi-frame images.
NOTE: Bard “recap” details eliminated for brevity
ChatGPT: Yes, certainly.
The DICOM (Digital Imaging and Communications in Medicine) Standard is a set of rules that allows medical images and associated information to be exchanged between computer systems in a standard format. It's widely used in hospitals and healthcare facilities across the world.
The Basic Offset Table is a component of the DICOM standard, specifically within the Pixel Data element when encapsulated format is used (usually for multi-frame data).
The Basic Offset Table is an array of byte offsets, with each entry in the array representing the byte offset (from the start of the first Item following the Basic Offset Table) of the start of the Item that contains the pixel data of a frame. It is used to quickly access specific frames within the encapsulated Pixel Data without needing to parse the entire data sequence.
In other words, the Basic Offset Table acts as a "map" to quickly locate individual frames within the sequence, which is particularly useful when the file contains multiple frames or is particularly large.
Please note that, for single-frame images, or for multi-frame images that are not compressed, the Basic Offset Table is not used and is typically empty.
So far, the responses of each system are pretty competitive, as both have accurately identified the purpose of the Basic Offset Table (used when encoding and decoding “encapsulated” compressed image data as opposed to “native” raw image data) and the actual structure of the table itself.
To Bard’s credit, it very clearly identified the BOT data structure as a “sequence of 32-bit unsigned integer values,” whereas ChatGPT only specified that the BOT is an “array of byte offsets,” leaving the reader to guess the size of each offset.
ChatGPT, on the other hand, accurately mentions that the BOT is part of the “Pixel Data element,” meaning that the whole data structure belongs to the Pixel Data (7FE0,0010) data element of an encapsulated DICOM data-set, while also offering that the BOT offsets are measured “from the start of the first Item following the Basic Offset Table,” disclosing a little more detail about the overall structure of the BOT and the Items that follow it.
Both systems appear to understand that this method of data encoding is particularly useful for multi-frame images and not really necessary for a single frame.
So far, let’s give this prompt to Bard for the sheer specificity of its “sequence of 32-bit unsigned integer values” while taking note of the broader and more comprehensive detail supplied by ChatGPT.
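To make the structure both systems are describing concrete, here is a minimal sketch of how a decoder might read the BOT using only Python’s standard library. It assumes the buffer begins at the BOT’s Item tag and that the data is little-endian encoded; the function name is my own and not anything either chatbot (or the Standard) supplies.

```python
import struct

def read_basic_offset_table(buf: bytes) -> list[int]:
    """Parse the Basic Offset Table Item at the start of encapsulated Pixel Data.

    Assumes `buf` begins at the BOT's Item tag (FFFE,E000), little-endian encoded.
    Returns the 32-bit unsigned frame offsets (empty if the BOT has zero length).
    """
    group, element, length = struct.unpack_from("<HHI", buf, 0)
    if (group, element) != (0xFFFE, 0xE000):
        raise ValueError("expected an Item tag (FFFE,E000) for the Basic Offset Table")
    # Each offset points to the start of a frame's first fragment, measured from
    # the first Item that follows this table.
    return list(struct.unpack_from(f"<{length // 4}I", buf, 8))
```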
ME: Can you explain what encapsulated data is in DICOM and how it is encoded within the PixelData data element?
Bard: Sure. Encapsulated data is a DICOM data element that can be used to store any type of data, including image data, audio data, and video data. The PixelData data element is used to store the pixel data for an image. When the PixelData data element is set to the Value Representation OB, it can be used to store encapsulated data.
To encode encapsulated data within the PixelData data element, the data is first compressed using one of the DICOM-defined compression algorithms. The compressed data is then stored as a sequence of bytes within the PixelData data element. The length of the PixelData data element is set to the number of bytes of compressed data.
When the PixelData data element is decoded, the compressed data is decompressed using the same compression algorithm that was used to compress the data. The decompressed data is then stored in a buffer. The buffer is then used to create a new DICOM dataset that contains the decoded data.
Encapsulated data is a powerful feature of the DICOM Standard. It allows DICOM datasets to store any type of data, including image data, audio data, and video data. This makes DICOM a versatile and powerful standard for the exchange of medical data.
NOTE: Bard “recap” details eliminated for brevity
ChatGPT: Encapsulated data in the DICOM (Digital Imaging and Communications in Medicine) standard is a way of handling and storing data that may not conform to the standard DICOM data types, most commonly compressed image data.
DICOM files can contain both native (uncompressed) and encapsulated (compressed) pixel data, which is stored in the Pixel Data tag (7FE0, 0010). When the pixel data is encapsulated, it means the image data (which could be a single or multi-frame image) is compressed using a specific compression scheme (e.g., JPEG, JPEG 2000, RLE, etc.), and that compressed data is then broken up into a sequence of fragments, each of which is stored in an Item in the sequence.
Here's a general outline of how encapsulated Pixel Data is structured:
The Pixel Data element begins with a sequence of items tag (FFFE, E000).
The first item in the sequence is the Basic Offset Table. This is an optional table which, if present, contains byte offsets for each frame in a multi-frame image, pointing to the start of the Item containing the first fragment of each frame. If it's not used, it will still be present but have a length of 0.
After the Basic Offset Table, each subsequent item in the sequence represents a fragment of the compressed pixel data. For single-frame images, there will generally be only one fragment. For multi-frame images, there may be one or more fragments per frame. Each fragment is an item in the sequence, starting with the item tag and length, followed by the fragment's pixel data.
The sequence is ended with a Sequence Delimitation Item (FFFE, E0DD) with a length of 0, indicating there are no more fragments.
This encapsulated format allows DICOM to handle a variety of different compression schemes while still providing a consistent way to access the pixel data, regardless of how it's compressed. It's worth noting that specific Transfer Syntax must be defined in the DICOM file to indicate which compression scheme is used.
Here ChatGPT really shines and you can start to see the strengths and weaknesses of each system surface quite strikingly.
First, Bard’s answer, similar to its “32-bit integer” response to the earlier prompt, attempts to be very “to the point” and fact-based, yet it appears to have sacrificed both accuracy and context to achieve this goal.
Bard starts out by indicating that encapsulated data is a “DICOM data element that can be used to store any type of data,” which, while a bit of a nitpick, is NOT the correct concept.
Encapsulated data is a scheme for encoding compressed (generally) data within a DICOM data element, NOT a data element itself.
Bard then goes on to make a series of inaccurate, half-baked statements that roughly belong to the possible response domain of the prompt’s problem-set, but there are many issues with each statement.
First, Bard indicates that “When the PixelData data element is set to the Value Representation OB, it can be used to store encapsulated data,” which is true, but encoding the pixel data using an encapsulated scheme does NOT follow from the fact that the value representation (i.e. DICOM’s notion of the “type” of a data element’s value) is “OB” (Other Byte String); rather, it comes as a consequence of needing to encode an already-encoded data format directly into a DICOM data element. The “OB” value representation is simply used to represent that encoded data.
This scheme is called “encapsulated” because you are encapsulating another encoding scheme (e.g. various JPGs, MP4, whatever) within the DICOM data, itself an encoded scheme.
Bard then goes on to state: “To encode encapsulated data within the PixelData data element, the data is first compressed using one of the DICOM-defined compression algorithms.”
Here again, the overall statement is generally accurate, but adding “one of the DICOM-defined compression algorithms” really undermines the point.
The compression algorithms used for the encoding/decoding (compress/decompress) of the encapsulated data are standard algorithms like JPEG, RLE, or MPEG-4.
These algorithms are NOT “defined” by the DICOM Standard; rather, each is accepted by the Standard as one of the known “Transfer Syntaxes” that can be used to encode the data within a data element’s encapsulated value.
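For context, the compression scheme carried inside encapsulated Pixel Data is signaled by the Transfer Syntax UID in the DICOM file meta information rather than being defined by DICOM itself. A small, purely illustrative subset of those UIDs might look like this (the authoritative list lives in the Standard):

```python
# Illustrative subset of Transfer Syntax UIDs that signal an encapsulated,
# externally defined compression scheme (see the DICOM Standard for the full list).
ENCAPSULATED_TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2.4.50": "JPEG Baseline (Process 1)",
    "1.2.840.10008.1.2.4.90": "JPEG 2000 (Lossless Only)",
    "1.2.840.10008.1.2.4.91": "JPEG 2000",
    "1.2.840.10008.1.2.5": "RLE Lossless",
}
```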
Turning to ChatGPT’s response, I find it simply amazing how comprehensive and informative it is.
It presents both factual big-picture concepts, like the primary point that encapsulated data is “a way of handling and storing data that may not conform to the standard DICOM data types, most commonly compressed image data,” as well as very accurate implementation-level details, like the literal breakdown of the data “fragments” and their associated encoding using the DICOM “Item (FFFE, E000)” and “Sequence Delimitation Item (FFFE, E0DD)” tags, and a reaffirmation of the relationship to the BOT.
ChatGPT then sums up the purpose of encapsulated data and takes care in ensuring that the reader understands that the compression scheme used within the encapsulated data must be expressed through the DICOM transfer syntax.
Let’s give this prompt to ChatGPT for its succinct yet comprehensive and highly informative and accurate response.
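ChatGPT’s outline above maps almost directly onto code. As a rough sketch, assuming little-endian encoding and using only the standard library (the function name and return shape are my own), a parser that walks the Item sequence inside an encapsulated Pixel Data value might look like this:

```python
import struct

ITEM_TAG = (0xFFFE, 0xE000)
SEQUENCE_DELIMITER_TAG = (0xFFFE, 0xE0DD)

def split_encapsulated_pixel_data(buf: bytes) -> tuple[list[int], list[bytes]]:
    """Walk the Item sequence that makes up an encapsulated Pixel Data value.

    Returns (bot_offsets, fragments): the Basic Offset Table entries (possibly
    empty) and the raw bytes of each fragment Item, in order.
    """
    offsets: list[int] = []
    fragments: list[bytes] = []
    pos, first_item = 0, True
    while pos + 8 <= len(buf):
        group, element, length = struct.unpack_from("<HHI", buf, pos)
        pos += 8
        if (group, element) == SEQUENCE_DELIMITER_TAG:
            break  # Sequence Delimitation Item (FFFE,E0DD): no more fragments
        if (group, element) != ITEM_TAG:
            raise ValueError(f"unexpected tag ({group:04X},{element:04X}) in Pixel Data")
        if first_item:
            # The first Item is always the Basic Offset Table; it may be zero length.
            offsets = list(struct.unpack_from(f"<{length // 4}I", buf, pos))
            first_item = False
        else:
            fragments.append(buf[pos:pos + length])
        pos += length
    return offsets, fragments
```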
NOTE: At this point Bard has NOT made any mention of “fragments,” so I need to specifically prompt it a bit more to prod it into the state where ChatGPT was already positioned by the prior prompt.
For that reason, the following two prompts are only asked of Bard.
ME: But what about fragments? How are they used?
Bard: Sure. Fragments are a way of dividing up large pieces of data into smaller pieces that can be transmitted more efficiently. In DICOM, fragments are used to divide up the PixelData data element. This can be useful for images that are too large to be transmitted in a single message.
To create a fragment, the PixelData data element is divided into a number of smaller pieces. Each piece is then encoded and transmitted as a separate message. When the fragments are received, they are reassembled into the original PixelData data element.
Fragments can be used to improve the performance of DICOM communications. By dividing up large pieces of data, fragments can be transmitted more quickly and efficiently. This can improve the overall performance of DICOM systems.
NOTE: Bard “recap” details eliminated for brevity
ME: But how exactly is the encapsulated PixelData encoded using the BOT and the fragments concept?
Bard: The encapsulated PixelData is encoded using the BOT and the fragments concept as follows:
The PixelData data element is divided into a number of fragments.
The fragments are encoded using a compression algorithm.
The compressed fragments are stored in a sequence of bytes.
The BOT is created. The BOT contains the byte offsets to the start of each fragment.
The BOT and the compressed fragments are stored together in the PixelData data element.
When the PixelData data element is decoded, the following steps are performed:
The BOT is read.
The byte offsets in the BOT are used to locate the start of each fragment.
The compressed fragments are decompressed.
The decompressed fragments are reassembled into the original PixelData data element.
The BOT and the fragments concept allows for efficient transmission of large pieces of data. By dividing up the data into smaller pieces, the data can be transmitted more quickly and efficiently. This can improve the overall performance of DICOM systems.
Here is an example of how the BOT and the fragments concept would be used to encode a 100MB image:
The image is divided into 100 fragments, each of which is 1MB in size.
Each fragment is compressed using a compression algorithm.
The compressed fragments are stored in a sequence of bytes.
The BOT is created. The BOT contains the byte offsets to the start of each fragment.
The BOT and the compressed fragments are stored together in the PixelData data element.
When the PixelData data element is decoded, the following steps are performed:
The BOT is read.
The byte offsets in the BOT are used to locate the start of each fragment.
The compressed fragments are decompressed.
The decompressed fragments are reassembled into the original image.
This process allows for the efficient transmission of a large piece of data, such as an image, over a network.
Bard’s responses to the two above prompts are really starting to show a trend, namely that Bard is answering largely factually but with little comprehension or continuity.
In a sense, Bard provided numerous factoids that are roughly correct, but even after all of this additional prompting, it has yet to really attain the same level of understanding of DICOM’s scheme for encapsulated data that ChatGPT had established in response to only my second prompt.
All of the Bard response text above still leaves the reader with a materially weaker understanding of the concept of encapsulated data than ChatGPT conveyed with just eight beautifully structured sentences two prompts ago.
At this point, I’m convinced that ChatGPT is light-years ahead of Bard, which, to me at least, is starting to feel more like a fancy Google auto-complete and search function spitting back the next likely list of 25 factoids rather than truly applying real digital intelligence to the task of interpreting the DICOM Standard.
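For contrast with Bard’s list of steps, here is a rough sketch of the encoding side laid out the way ChatGPT’s earlier outline describes it: a BOT Item, fragment Items (one fragment per frame here purely for simplicity), and a closing Sequence Delimitation Item. The function name and that simplification are mine, not anything either system produced.

```python
import struct

def encapsulate_frames(compressed_frames: list[bytes]) -> bytes:
    """Assemble an encapsulated Pixel Data value from already-compressed frames.

    The Basic Offset Table records each frame's byte offset, measured from the
    first Item that follows the BOT and including every Item's 8-byte header.
    """
    def item(payload: bytes) -> bytes:
        if len(payload) % 2:
            payload += b"\x00"  # Item values must have even length
        return struct.pack("<HHI", 0xFFFE, 0xE000, len(payload)) + payload

    fragment_items = [item(frame) for frame in compressed_frames]

    offsets, running = [], 0
    for frag in fragment_items:
        offsets.append(running)
        running += len(frag)

    bot = item(struct.pack(f"<{len(offsets)}I", *offsets))
    delimiter = struct.pack("<HHI", 0xFFFE, 0xE0DD, 0)  # Sequence Delimitation Item
    return bot + b"".join(fragment_items) + delimiter
```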
Finally, I wrap up my investigation by challenging the systems to take what they know about DICOM’s scheme for encapsulated data and put it to the test on a very nuanced aspect of the Standard, namely the relationship of “frames” to “fragments.”
ME: Can frames span multiple fragments?
Bard: Yes, frames can span multiple fragments. This is because the BOT and the fragments concept is used to encode the PixelData data element, which contains the pixel data for an image. The PixelData data element can be divided into a number of fragments, and each fragment can contain one or more frames. This allows for the efficient transmission of large images that are composed of multiple frames.
For example, a 100MB image that is composed of 100 frames could be divided into 100 fragments, each of which is 1MB in size. Each fragment could contain one frame, or it could contain multiple frames. The BOT would be used to store the byte offsets to the start of each fragment.
NOTE: Bard “recap” details eliminated for brevity
ChatGPT: Yes, a single frame of pixel data in a DICOM file can indeed span multiple fragments. This is especially common with certain types of compression where the compressed data for a single frame is larger than the maximum size of a single fragment.
In such cases, the compressed data for the frame is divided into multiple fragments, each of which is stored as a separate item in the pixel data sequence. The Basic Offset Table would then contain the offset to the start of the first fragment of each frame.
To reassemble the frame, you would start with the first fragment for that frame, as indicated by the Basic Offset Table, and then continue with each subsequent fragment until you reach the start of the next frame (as indicated by the next entry in the Basic Offset Table) or the end of the sequence.
Remember, each fragment is self-contained with its own item tag and length, so you can determine where one fragment ends and the next begins based on these item tags and lengths.
It's important to note that all fragments of a frame have to be concatenated and then decompressed together to correctly interpret the image data. You cannot decompress individual fragments separately when a frame spans multiple fragments.
Again, ChatGPT truly outshines Bard here, not only because Bard is just flatly wrong in stating that “each fragment can contain one or more frames” (something the DICOM Standard explicitly prohibits), but also because of the comprehensive, high-quality nature of ChatGPT’s response.
ChatGPT is not only factually correct, it offers true clarity and continuity with the prior prompts (particularly for the Item encoding), giving the user a very clear understanding of the overall concept as well as the most important specific details.
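To ground ChatGPT’s reassembly description, here is a rough sketch, building on the hypothetical split_encapsulated_pixel_data helper from the earlier sketch (the function name is again mine), of concatenating every fragment that belongs to a single frame before handing the result to a decompressor:

```python
def frame_bytes(offsets: list[int], fragments: list[bytes], frame_index: int) -> bytes:
    """Concatenate every fragment belonging to one frame, ready for decompression.

    `offsets` and `fragments` are as returned by the split_encapsulated_pixel_data
    sketch above; offsets are measured from the first Item following the BOT and
    include each Item's 8-byte header.
    """
    # An empty BOT is common for single-frame data; treat it as one frame at offset 0.
    if not offsets:
        offsets = [0]

    # Reconstruct where each fragment Item starts (8-byte header + payload).
    starts, pos = [], 0
    for frag in fragments:
        starts.append(pos)
        pos += 8 + len(frag)

    frame_start = offsets[frame_index]
    frame_end = offsets[frame_index + 1] if frame_index + 1 < len(offsets) else pos

    # A frame may span several fragments, but a fragment never holds data from more
    # than one frame, so selecting fragments by their starting offset is unambiguous.
    return b"".join(
        frag for start, frag in zip(starts, fragments)
        if frame_start <= start < frame_end
    )
```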
ME: Can multiple frames occupy the same fragment?
Bard: Yes, multiple frames can occupy the same fragment. This is because the BOT and the fragments concept is used to encode the PixelData data element, which contains the pixel data for an image. The PixelData data element can be divided into a number of fragments, and each fragment can contain one or more frames. The size of each fragment is determined by the size of the frames that it contains.
For example, a 100MB image that is composed of 100 frames could be divided into 100 fragments, each of which is 1MB in size. Each fragment could contain one frame, or it could contain multiple frames. The BOT would be used to store the byte offsets to the start of each fragment.
NOTE: Bard “recap” details eliminated for brevity
ChatGPT: No, multiple frames cannot occupy the same fragment in the DICOM Standard. The DICOM specification requires that each frame's compressed data starts at the beginning of a fragment, and the same fragment cannot be shared by multiple frames.
When the DICOM file uses encapsulated (compressed) Pixel Data, the Basic Offset Table is used to specify the start of each frame's data. Each entry in the Basic Offset Table points to the start of the fragment that contains the beginning of a frame.
This strict organization is necessary to avoid ambiguity and to ensure that the image data can be correctly and unambiguously extracted and decompressed.
So in short, while a single frame can span multiple fragments, multiple frames cannot share the same fragment. Each new frame must start with a new fragment.
Finally, with this exchange, ChatGPT has truly trounced Bard.
Not only did ChatGPT best Bard in its factual knowledge of the DICOM Standard by getting the answer to this final prompt correct, but its ability to clearly explain this very nuanced feature of the vast DICOM Standard is truly amazing, and I would argue it would best the explanations provided by many of the most experienced human DICOM engineers.
Don’t get me wrong, Google’s Bard is also pretty impressive, and clearly there is a real battle shaping up in the AI wars; but at the moment, ChatGPT, in my opinion, is far superior, clearly leading the challenge and forging the real path forward for digital intelligence.