LiveSubtitlesSRS

Introduction

Purpose of the requirements document

This Software Requirements Specification (SRS) identifies the requirements for the project "RealTimeSubtitles". It is a guideline about the features offered and the problems that we will have to solve. The project is open source and hosted on GitHub; the code is organized to allow review by us or by potential new contributors.

Scope of the product

RealTimeSubtitles is an app designed to help partially deaf students in a classroom. The aim is to transcribe a teacher's speech live and to display it as subtitles on the corresponding slide. In addition, students in the classroom can correct the subtitles through a collaborative HMI. We have to use the Google Speech API for the transcription, reveal.js for the slides, and JavaScript.

General Description

Product perspective

The main goal of our project is to help partially deaf students be more autonomous when attending a lecture. This project is proposed by the department for disabled students at the UGA. In addition, we have to design a collaborative HMI that lets students correct the subtitles in real time.


Product functions

The app is divided into two parts:

  • The transcription by GoogleSpeech

First, the API must recognize the teacher's speech and transcribe it in real time. The final results are appended in the right place according to the current slide.

  • The collaborative HMI

Designed for students, it allows a logged-in student to follow a course. While the teacher speaks, the students can either follow the course and read the subtitles, or edit the subtitles to correct the results.
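
The references at the end of this document list socket.io for the real-time part. As a rough illustration only, the sketch below shows one way a new or corrected subtitle fragment could be relayed to every connected client; the event name, port, and file layout are our own assumptions, not a settled design.

```javascript
// Hypothetical relay server for the collaborative HMI (sketch, not the final design).
// Assumes the socket.io package from the references; "subtitle:update" is our own event name.
const http = require('http');
const server = http.createServer();
const io = require('socket.io')(server);

io.on('connection', (socket) => {
  // The teacher page or a student pushes a new or corrected subtitle fragment...
  socket.on('subtitle:update', (data) => {
    // ...and every other connected client receives it immediately.
    socket.broadcast.emit('subtitle:update', data);
  });
});

server.listen(3000);
```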

User characteristics

There are three types of users for our app:

  • The teacher, talking while showing his slides
  • The students editing the subtitles
  • The students reading the subtitles, among them the partially deaf students


Operating environment

The GoogleSpeech API works in Google Chrome. A good Internet connection is required for the transcription.

General constraints

  • The teacher needs to have his slides on reveal.js
  • The teacher needs to talk loudly and not too fast
  • The room has to be quiet (no noise)
  • These conditions reduce errors and help the API transcribe the speech well. However, the result won't be perfect due to the instability of the GoogleSpeech API.


Specific requirements, covering functional, non-functional and interface requirements

Requirement X.Y.Z (in Structured Natural Language)

Speech recognition

Description: Capture the voice and return a textual transcription

Inputs: Voice of a speaker

Source: Human

Outputs: Textual data

Destination: User

Action: A speaker talks into a microphone and the system returns the transcript as text

Non functional requirements: Accurate detection of spoken words

Pre-condition: User has a microphone

Post-condition: Words are detected

Side-effects: Words are not detected, or are detected incorrectly
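
As a rough illustration of this requirement, the sketch below uses the Web Speech API available in Chrome (listed in the references). The language code and the appendToCurrentSlide helper are our own assumptions for illustration; the helper is sketched under the next requirement.

```javascript
// Sketch of continuous recognition with the Web Speech API in Chrome (not the final code).
const recognition = new webkitSpeechRecognition();
recognition.continuous = true;      // keep listening for the whole lecture
recognition.interimResults = true;  // stream partial hypotheses while the teacher talks
recognition.lang = 'fr-FR';         // assumption: lectures are given in French

recognition.onresult = (event) => {
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) {
      // Only final results are appended to the subtitles of the current slide.
      appendToCurrentSlide(result[0].transcript);  // hypothetical helper, see next sketch
    }
  }
};

recognition.onerror = (event) => console.warn('Recognition error:', event.error);

recognition.start();
```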


Render the subtitles to slides

Description: Show the subtitles on the slides

Inputs: words spoken

Source: Speech recognizer

Outputs: slides with subtitles

Destination: slides

Action: Get the spoken words and show them correctly on the slides

Non functional requirements: No loss of data

Pre-condition: Spoken words are detected

Post-condition: Slides are shown with subtitles

Side-effects: Subtitles are badly displayed and hide the slides, or the subtitles are not readable.
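
A minimal sketch of how a final result could be attached to the slide currently shown by reveal.js. Only Reveal.getCurrentSlide() comes from the reveal.js API; the .subtitles container class is our own assumption.

```javascript
// Sketch of the hypothetical appendToCurrentSlide helper used in the recognition sketch above.
function appendToCurrentSlide(text) {
  const slide = Reveal.getCurrentSlide();      // reveal.js: the <section> currently displayed
  let box = slide.querySelector('.subtitles');
  if (!box) {
    // Create a subtitle container the first time something is said on this slide.
    box = document.createElement('div');
    box.className = 'subtitles';
    slide.appendChild(box);
  }
  box.textContent += ' ' + text;
}
```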

Editing subtitles

Description: The user can edit the subtitles: add or correct words

Inputs: A wrongly detected word

Source: Speech recognizer

Outputs: corrected word

Destination: shown subtitles

Action: The user clicks on the word he wants to edit, then edits it with his keyboard. The user clicks on the blank space between words to add a word.

Non functional requirements: Easy to click between words, or add a word

Pre-condition: Words are detected

Post-condition: words are added or modified

Side-effects: A correct word is removed, or the text is not displayed properly.
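
A minimal sketch of how the correction HMI could behave, assuming each word of the subtitle container is wrapped in an editable span and corrections are re-broadcast through the socket.io relay sketched earlier; the class names, the socket variable, and the event name are illustrative assumptions.

```javascript
// Sketch of in-place word editing for the collaborative HMI (illustrative only).
// Assumes "socket" is a connected socket.io client and "box" is the .subtitles container.
function makeEditable(box, socket) {
  // Wrap every word in its own editable <span> so a student can click it and fix it.
  box.innerHTML = box.textContent
    .trim()
    .split(/\s+/)
    .map((w) => `<span class="word" contenteditable="true">${w}</span>`)
    .join(' ');

  // "blur" does not bubble, so listen in the capture phase to catch the end of an edit.
  box.addEventListener('blur', (event) => {
    if (event.target.classList && event.target.classList.contains('word')) {
      // Broadcast the corrected text so every student's subtitles stay in sync.
      socket.emit('subtitle:update', { text: box.textContent });
    }
  }, true);
}
```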


Session login

Description: User has his own session

Inputs: User profile

Source: User profile

Outputs: A logged user

Destination: security manager, session control

Action: The user clicks on the login form and enters his login and password.

Non functional requirements: secured against SQL injection

Pre-condition: The user wants to log in and knows his login and password

Post-condition: The user is logged in

Side-effects: Users are tracked by id. Users cannot delete other users' courses
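
A minimal sketch of the login action, assuming the Meteor accounts system listed in the references is used; the template name, field names, and error handling are our own choices, not a settled design.

```javascript
// Sketch of a Blaze login form handler with Meteor accounts (illustrative only).
Template.loginForm.events({
  'submit form'(event) {
    event.preventDefault();
    const login = event.target.login.value;        // assumed <input name="login">
    const password = event.target.password.value;  // assumed <input name="password">

    Meteor.loginWithPassword(login, password, (error) => {
      if (error) {
        alert('Login failed: ' + error.reason);
      }
      // On success, the session is opened and the user is tracked by Meteor.userId().
    });
  },
});
```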

Product Evolution

  • A different, more efficient speech API
  • Using RealTimeSubtitles in meetings/conferences


References

  • https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html#speechreco-result
  • https://developers.google.com/web/updates/2013/01/Voice-Driven-Web-Apps-Introduction-to-the-Web-Speech-API
  • https://openclassrooms.com/courses/des-applications-ultra-rapides-avec-node-js/socket-io-passez-au-temps-reel
  • https://openclassrooms.com/courses/concevez-votre-site-web-avec-php-et-mysql/tp-page-protegee-par-mot-de-passe
  • https://openclassrooms.com/courses/concevez-votre-site-web-avec-php-et-mysql/variables-superglobales-sessions-et-cookies
  • https://www.youtube.com/watch?v=o0xr1JRZOb4&index=2&list=PLLnpHn493BHFWQGA1PcyQZWAfR96a4CkH
  • https://atmospherejs.com
  • https://www.meteor.com/tutorials/blaze/adding-user-accounts
  • http://meteortips.com/
  • https://github.com/CollectionFS/Meteor-CollectionFS#installation
  • http://getbootstrap.com/getting-started/
  • http://srault95.github.io/meteor-app-base/meteor-collection-helpers/