https://www.cs.cmu.edu/~jbigham/pubs/pdfs/2013/visualchallenges.pdf

The above is the paper we are going to discuss. In my earlier article, "The Question we are not asking about computer vision", I considered what more we could get out of an image; in short, 'a picture is worth a thousand words'. In this article I am going to discuss the challenges blind people face while using technology designed for them. As I said, visual question answering is one such technology in academia, and there are commercial products such as OrCam that build on a similar concept and combine it with other AI technologies such as OCR.

from IPython.display import YouTubeVideo

YouTubeVideo('Khjes55OQKM', width=800, height=300)

This paper steps into the shoes of blind people to find out why building such technology is so hard on mobile devices or any edge device. The authors gathered data from 5,329 visually impaired users who asked 40,748 questions about photographs they took with their iPhones using an application called VizWiz Social. The paper dates back to 2013, and in my brief search I didn't find the app.
VizWiz Social is an iPhone application that lets a blind person take a picture, speak a question they’d like to know about the picture, and then get an answer back within a minute or so. The diversity of areas and questions this paper covers is truly impressive. For a sighted person, getting dressed doesn't sound like a big deal, but for a visually impaired person it is a very different problem: they cannot see what color a garment is, whether it is inside out, or whether it is ironed. By outlining the types of questions asked most frequently and the problems users struggle with the most, the authors organize them into a taxonomy.

VizWiz Social provides insight into a specific but important subset of challenges faced by blind users, i.e., those that can be represented with a still photograph and brief audio description and that can be answered quickly but asynchronously. Other types of challenges, such as those where a user needs help in a situation requiring conveying and/or receiving continuous information, are beyond the bounds of the current study. The pattern – taking a picture and receiving information about it – is also present in much of the automatic technology in use today, and so is a familiar paradigm.

Questions Asked

The questions posed by VizWiz Social users cover a wide range of accessibility challenges that people face on a daily basis. Questions helped users complete daily tasks (e.g., getting dressed, cooking meals) and can provide information about rare events (e.g., a child’s illness, a mouse in the kitchen).
The researchers drew a random sample of 1,000 questions and assigned each one to a category. How these categories were assigned is impressive, as the picture below shows.

Look at the depth of the categories and the questions being asked. If a user has pills, they may want to know what kind of pills they are, or their expiry date.

Identification: Asking for an object to be identified by name or type
Reading: Asking for a text to be read from some physical object or electronic display
Description: Asking for a description of some visual or physical property of an object
Other: The submission did not contain an answerable question
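The four categories above can be sketched as a small data structure. This is only an illustration of how such a taxonomy might be tallied over a labeled sample; the `QuestionType` enum, the `tally` helper, and the four example questions are my own assumptions and are not taken from the paper or its data.

```python
from collections import Counter
from enum import Enum


class QuestionType(Enum):
    """The four question types listed above."""
    IDENTIFICATION = "Identification"
    READING = "Reading"
    DESCRIPTION = "Description"
    OTHER = "Other"


def tally(labels):
    """Return the share of each question type in a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {qt: (counts[qt] / total if total else 0.0) for qt in QuestionType}


# Hypothetical hand-labeled questions, invented for illustration only.
sample = [
    ("What kind of pills are these?", QuestionType.IDENTIFICATION),
    ("What is the expiry date on this bottle?", QuestionType.READING),
    ("What color is this shirt?", QuestionType.DESCRIPTION),
    ("(blurry photo, no audible question)", QuestionType.OTHER),
]

shares = tally([label for _, label in sample])
for question_type, share in shares.items():
    print(f"{question_type.value}: {share:.0%}")
```

With a real labeled sample in place of the invented one, the same tally would show which kinds of questions dominate.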

We underestimate this idea: simply by asking questions, there can be a wide variety of ways to answer a single problem.

The Other category is used for unanswerable questions, such as those with unusable images, those that cannot be answered from the content shown in the photo, or those in a foreign language. This categorization of questions opens a new door for solving the problem. The results were profound.

This data gives clear insight into the common problems a visually impaired person comes across. The paper goes to some length to describe the most interesting findings, which the given picture shows.

Primary Subject Photographs

The researchers did not stop there. They went on to categorize the questions along another dimension: the primary subject of the photograph that the question was about. This let them focus on subject matter (rather than on the user's intent, as in the prior section) and offers insight into what categories of visual information VizWiz users could not easily access. I encourage you to read more about this in the paper.

Question Urgency

One technological challenge they had to face was getting answers back urgently. That seemed plausible 7 years ago, but I suspect it would still be a challenge today because there are a lot of variables involved. They tackled this problem beautifully as well: what if questions could be segregated by priority? They therefore defined a scale based on urgency.

Within a minute: The question asked must be answered in 60 seconds or less.
Within a few minutes: The question asked must be answered in 1 to 10 minutes.
Within an hour: The question asked must be answered in 10 minutes to 1 hour.
Within the day: The question asked must be answered in 1 to 24 hours.
At any time: The question can be answered at any time.
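The scale above can be written down as a simple lookup. The following is a minimal sketch, not the authors' procedure: the `urgency_bucket` function and the `URGENCY_SCALE` table are hypothetical names, and I am assuming that each upper bound is inclusive (exactly 60 seconds still counts as "Within a minute") and that anything beyond 24 hours, or with no deadline at all, falls under "At any time".

```python
from typing import Optional

# Upper bound in seconds (assumed inclusive) for each level of the scale.
URGENCY_SCALE = [
    (60, "Within a minute"),
    (10 * 60, "Within a few minutes"),
    (60 * 60, "Within an hour"),
    (24 * 60 * 60, "Within the day"),
]


def urgency_bucket(deadline_seconds: Optional[float]) -> str:
    """Map how soon an answer is needed to a level of the urgency scale.

    None means the question has no deadline.
    """
    if deadline_seconds is None:
        return "At any time"
    if deadline_seconds < 0:
        raise ValueError("deadline_seconds must be non-negative")
    for upper_bound, label in URGENCY_SCALE:
        if deadline_seconds <= upper_bound:
            return label
    # Assumption: deadlines longer than a day are treated as "At any time".
    return "At any time"


print(urgency_bucket(45))
print(urgency_bucket(5 * 60))
print(urgency_bucket(None))
```

In the paper the urgency was judged by people reading the questions; a function like this would only be useful once a deadline has already been estimated.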

Similar to these, there are many other areas this paper covers, such as how a user's experience of the app changes over time with continued use. They also found that the quality of the questions changed over time, which suggests the model becomes more robust as the researchers analyze the questions asked and get valid feedback from actual users. The key takeaway for me is "How one must define a problem statement!".