SRS - Sign2Speech

= Introduction =

== Purpose of the requirements document ==
This Software Requirements Specification (SRS) identifies the requirements for the project "Sign2Speech". Sign2Speech is an open source project, and this document presents what has been done so far in order to attract potential new contributors. It serves as a guideline to the functionalities offered and the problems that the system solves.
== Scope of the product ==
Sign2Speech could be used at reception desks or during video conferences to allow signing people to communicate with people who do not know French Sign Language.

The main point of this project is to use the Intel RealSense camera to recognize gestures from French Sign Language and thereby offer a new means of communication.

The program will transcribe gestures performed by a signing person into written words, displayed on the screen of the person who does not know FSL. This communication will take place through a chat application based on WebRTC.
== Glossary ==
* FSL: French Sign Language
* JSON: JavaScript Object Notation. We use this format to store our gesture dictionary.
== References ==
* https://software.intel.com/sites/landingpage/realsense/camera-sdk/v1.1/documentation/html/index.html?doc_manuals_sdk_algorithms.html
* http://rapidjson.org/index.html
* https://github.com/dhbaird/easywsclient
== Overview of the remainder of the document ==
The remainder of the document first gives a general description of the software. The functional and non-functional requirements are then specified in a dedicated part, and the document ends with the planned product evolution.
= General Description =
== Product perspective ==
The main aim of our project is to improve communication between a signing and a non-signing person. To do so, we developed a piece of software that recognizes gestures, finds their meaning in a linked dictionary and sends the translation to a correspondent via a WebSocket channel.
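As an overview of how these three steps could fit together, here is a minimal sketch of a possible main loop in C++, using easywsclient (listed in the References) for the WebSocket part. The helper functions captureAndEncodeGesture() and lookupWord(), the example encoding and the server URL are hypothetical placeholders for illustration, not the actual implementation.

<syntaxhighlight lang="cpp">
#include <string>
#include "easywsclient.hpp"  // WebSocket client library listed in the References

// Hypothetical placeholders for the two processing stages described above.
bool captureAndEncodeGesture(std::string &encoding) {
    // Would use the RealSense hand module (see the Gesture recognition requirement).
    encoding = "EXAMPLE_ENCODING";
    return true;
}

bool lookupWord(const std::string &encoding, std::string &word) {
    // Would walk the JSON dictionary (see the Gesture translation requirement).
    word = (encoding == "EXAMPLE_ENCODING") ? "bonjour" : "";
    return !word.empty();
}

int main() {
    using easywsclient::WebSocket;
    // The chat server address is only an example.
    WebSocket::pointer ws = WebSocket::from_url("ws://example.org:8080/chat");
    if (!ws) return 1;

    std::string encoding, word;
    while (ws->getReadyState() != WebSocket::CLOSED) {
        // 1. Recognize the gesture performed in front of the camera and encode it.
        // 2. Find its meaning in the linked dictionary.
        // 3. Send the translation to the correspondent over the WebSocket channel.
        if (captureAndEncodeGesture(encoding) && lookupWord(encoding, word))
            ws->send(word);
        ws->poll();  // keep the WebSocket connection alive
    }
    delete ws;
    return 0;
}
</syntaxhighlight>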
== Product functions ==
Our application is made of two major parts:

* The recognition and translation application

In this part, the program will try to recognize the gestures executed in front of the camera. It is linked to a dictionary (a JSON file) that contains all the words the program is able to recognize. If the gesture is recognized by the program, its meaning will be printed on the screen; if not, nothing will be displayed.

* The WebRTC chat application

It is used to enable communication between two people, using video and a text chat. It also allows the real-time transmission of the subtitles.
== User characteristics ==
There are two types of users for the application:

* The first user (User1) knows FSL,
* The second user (User2) does not know FSL.

They both want to communicate with each other. User1 will stand in front of the camera and sign, while User2 will watch the monitor to see the translation of the gestures. User2 will be able to reply using the text chat.
== Operating environment ==
The camera's SDK only works on Windows. A good Internet connection is also required for the chat application, which only works with Mozilla Firefox: Chrome restricts WebRTC to HTTPS connections, and other browsers do not provide full WebRTC support.

The user must not use a hotspot or a connection behind a complex NAT: the WebRTC connection process will struggle to traverse such a NAT and will return an error message.
== General constraints ==
Of course, the user needs to have an Intel RealSense camera.

We have identified several factors that have a negative impact on the hand tracking process:

* The user must not wear bracelets or rings,
* The user should, as much as possible, use the camera under natural light rather than artificial light sources,
* The user must wear a monochrome top that contrasts with the color of the skin.

Following these recommendations reduces errors and improves the tracking, but it will still not be perfect because of the imprecision of the camera itself.
= Specific requirements, covering functional, non-functional and interface requirements =
== Requirement X.Y.Z (in Structured Natural Language) ==
=== Gesture recognition ===
'''Description''': This module performs the whole tracking process and the encoding of the gesture

'''Inputs''': Hand and finger data returned by the camera stream

'''Source''': Intel's RealSense camera

'''Outputs''': The encoding corresponding to the gesture (fingers + trajectories)

'''Destination''': Computer's memory

'''Action''': The signing person has to make the gestures in front of the camera (between 20 cm and 1 m away from the camera). The camera tracks the fingers and the hand, and the program gathers this information in order to compute averages and encode the gesture as accurately as possible.
'''Non functional requirements''': Real-time tracking (< 1 second) and margin of error (error scoring result < 8)

'''Pre-condition''': Optimal conditions of use (good light, monochrome top that contrasts with the color of the skin, no rings, no bracelets, ...)

'''Post-condition''': The gesture must have been correctly recognized by the camera in order to be correctly encoded

'''Side-effects''': If the gesture has not been correctly recognized and encoded, it will not be recognized or translated properly afterwards
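As an illustration of the Inputs, Source and Action above, here is a minimal sketch, based on our reading of the RealSense SDK documentation listed in the References, of how hand and finger joint positions can be pulled from the camera stream with the SDK's C++ hand module. Error handling is stripped down, the joint queried is our own choice, and the averaging/encoding step is only hinted at by a comment; the actual module is more involved.

<syntaxhighlight lang="cpp">
#include "pxcsensemanager.h"
#include "pxchandmodule.h"
#include "pxchanddata.h"

int trackHands() {
    // Create the RealSense pipeline and enable the hand tracking module.
    PXCSenseManager *sm = PXCSenseManager::CreateInstance();
    if (!sm) return -1;
    sm->EnableHand();

    PXCHandModule *handModule = sm->QueryHand();
    PXCHandData *handData = handModule->CreateOutput();
    if (sm->Init() < PXC_STATUS_NO_ERROR) return -1;

    // Acquire frames and read joint positions for each tracked hand.
    while (sm->AcquireFrame(true) >= PXC_STATUS_NO_ERROR) {
        handData->Update();
        for (pxcI32 i = 0; i < handData->QueryNumberOfHands(); ++i) {
            PXCHandData::IHand *hand = 0;
            if (handData->QueryHandData(PXCHandData::ACCESS_ORDER_BY_TIME, i, hand)
                    >= PXC_STATUS_NO_ERROR) {
                PXCHandData::JointData joint;
                // Example: 3D position of the index fingertip.
                hand->QueryTrackedJoint(PXCHandData::JOINT_INDEX_TIP, joint);
                // Here the positions of the fingers would be accumulated over several
                // frames to compute the averages used for the gesture encoding.
            }
        }
        sm->ReleaseFrame();
    }
    sm->Release();
    return 0;
}
</syntaxhighlight>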
=== Gesture translation ===
'''Description''': This module implements the translation part of the project

'''Inputs''': The gesture encoding returned by the gesture recognition step

'''Source''': Computer's memory

'''Outputs''': The current node moves inside the dictionary, and the corresponding word is sent if there is a match

'''Destination''': WebSocket channel

'''Action''': Given a gesture encoding, the program looks for its translation in the dictionary and sends it on the WebSocket channel if there is a match

'''Non functional requirements''': Real-time search in the dictionary (< 1 second)

'''Pre-condition''': The encoding must have a translation in the dictionary

'''Post-condition''': The gesture must be correctly translated

'''Side-effects''':
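To make the lookup and sending steps more concrete, here is a minimal sketch combining rapidjson and easywsclient (both listed in the References). The document describes a traversal in which "the current node moves inside the dictionary"; for brevity the sketch flattens this to a single key lookup, and the dictionary schema, file name, example encoding and server URL are assumptions made for illustration only.

<syntaxhighlight lang="cpp">
#include <fstream>
#include <sstream>
#include <string>
#include "rapidjson/document.h"  // JSON parser listed in the References
#include "easywsclient.hpp"      // WebSocket client listed in the References

// Assumed (simplified) dictionary layout: { "<gesture encoding>": "<word>", ... }
std::string translate(const rapidjson::Document &dict, const std::string &encoding) {
    if (dict.HasMember(encoding.c_str()) && dict[encoding.c_str()].IsString())
        return dict[encoding.c_str()].GetString();
    return "";  // no match: nothing will be sent
}

int main() {
    // Load the JSON dictionary from disk (the file name is an assumption).
    std::ifstream in("dictionary.json");
    std::stringstream buffer;
    buffer << in.rdbuf();

    rapidjson::Document dict;
    dict.Parse(buffer.str().c_str());

    // Open the WebSocket channel used as Destination (the URL is an assumption).
    easywsclient::WebSocket::pointer ws =
        easywsclient::WebSocket::from_url("ws://example.org:8080/chat");
    if (!ws) return 1;

    // The encoding would come from the gesture recognition step above.
    std::string word = translate(dict, "EXAMPLE_ENCODING");
    if (!word.empty())
        ws->send(word);  // the translated word reaches the chat in real time

    ws->poll();
    delete ws;
    return 0;
}
</syntaxhighlight>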
=== Learning mode ===
'''Description''': The function of this mode is to allow the user to add as many gestures (with their translations) as he wants to the dictionary

'''Inputs''': Hand and finger data returned by the camera stream, and the meaning of the gesture

'''Source''': Intel's RealSense camera

'''Outputs''': New dictionary (JSON file) containing the new gestures and their meanings

'''Destination''': Computer's memory

'''Action''': The user has to select the learning mode when launching the application and enter the number of gestures that he wants to record. Guidelines are printed on the screen so that the user knows what to do and when to do it. Basically, he has to repeat each gesture 3 times in a row, so that the program can compute the average of the 3 repetitions to minimize the errors. At the end of the recording, the user can choose whether he wants to add a new word (and stay in the learning mode) or not (in which case the normal recognition mode is activated).

'''Non functional requirements''': Real-time tracking (< 1 second)

'''Pre-condition''': Optimal conditions of use (good light, monochrome top that contrasts with the color of the skin, no rings, no bracelets, ...). The user must also have divided his gesture into basic gestures to record.

'''Post-condition''': The new entry in the dictionary must correspond to the gesture that the user intended to perform

'''Side-effects''': Because of the lack of precision of the camera, if the gesture was not correctly recognized, the encoding stored in the dictionary will be wrong
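The sketch below illustrates the averaging and saving steps of the learning mode: three captures of the same (heavily simplified) gesture are averaged and appended to the JSON dictionary with rapidjson. The gesture representation, the field names, the identifier and the file name are assumptions; the real encoding covers several fingers and their trajectories.

<syntaxhighlight lang="cpp">
#include <fstream>
#include "rapidjson/document.h"
#include "rapidjson/writer.h"
#include "rapidjson/stringbuffer.h"

// A heavily simplified gesture sample: a single 3D point per capture.
struct Sample { double x, y, z; };

int main() {
    // The three repetitions of the same gesture recorded by the user.
    Sample reps[3] = { {0.10, 0.20, 0.45}, {0.11, 0.19, 0.44}, {0.09, 0.21, 0.46} };

    // Average the three captures to minimize camera noise, as described in the Action above.
    Sample avg = {0.0, 0.0, 0.0};
    for (const Sample &s : reps) { avg.x += s.x / 3; avg.y += s.y / 3; avg.z += s.z / 3; }

    // Build a new dictionary entry; the schema used here is an assumption.
    rapidjson::Document dict;
    dict.SetObject();
    rapidjson::Document::AllocatorType &alloc = dict.GetAllocator();

    rapidjson::Value entry(rapidjson::kObjectType);
    entry.AddMember("word", "bonjour", alloc);   // meaning entered by the user
    entry.AddMember("x", avg.x, alloc);
    entry.AddMember("y", avg.y, alloc);
    entry.AddMember("z", avg.z, alloc);
    dict.AddMember("gesture_001", entry, alloc); // hypothetical gesture identifier

    // Serialize the updated dictionary back to the JSON file.
    rapidjson::StringBuffer buffer;
    rapidjson::Writer<rapidjson::StringBuffer> writer(buffer);
    dict.Accept(writer);
    std::ofstream out("dictionary.json");
    out << buffer.GetString();
    return 0;
}
</syntaxhighlight>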
= Product Evolution =
* A “real-time” window showing a representation of the hand that the camera is currently analyzing. It would let the user know whether the camera is able to correctly recognize his hand. It could be done with Qt Creator. Our application is not very user-friendly at the moment.
* Support for two-handed signs, which are currently not implemented in our application
* Improvements to trajectory recognition
* Language model
* A better camera